Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Receiving curl error when using tbl(con, sql("string query")) #540

Closed
mjbroerman opened this issue Sep 18, 2023 · 4 comments · Fixed by #552
Closed

Receiving curl error when using tbl(con, sql("string query")) #540

mjbroerman opened this issue Sep 18, 2023 · 4 comments · Fixed by #552
Labels
bug an unexpected problem or unintended behavior dbplyr 🔧
Milestone

Comments

@mjbroerman
Copy link

Reassigning this from tidyverse/dbplyr#1360

When I use tbl(con, sql("string query")) on BigQuery, I want a remote tibble, but I get a curl error. Low level options work, and after testing parentheses and backticks, I don't think formatting string query is the issue. Reprex below.

Here's my curl version:

matt$ curl --version
curl 7.87.0 (x86_64-apple-darwin22.0) libcurl/7.87.0 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.11 nghttp2/1.51.0
Release-Date: 2022-12-21
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS GSS-API HSTS HTTP2 HTTPS-proxy IPv6 Kerberos Largefile libz MultiSSL NTLM NTLM_WB SPNEGO SSL threadsafe UnixSockets

Reprex

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql
library(DBI)

con <- DBI::dbConnect(
  bigrquery::bigquery(),
  project = "bigquery-public-data",
  dataset = "stackoverflow",
  billing = Sys.getenv("BIGQUERY_BILLING_PROJECT")
)

# works
tbl(con, "tags")
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
#> ! Using an auto-discovered, cached token.
#>   To suppress this message, modify your code or options to clearly consent to
#>   the use of a cached token.
#>   See gargle's "Non-interactive auth" vignette for more details:
#>   <https://gargle.r-lib.org/articles/non-interactive-auth.html>
#> ℹ The bigrquery package is using a cached token for '[email protected]'.
#> Auto-refreshing stale OAuth token.
#> # Source:   table<tags> [?? x 5]
#> # Database: BigQueryConnection
#>        id tag_name         count excerpt_post_id wiki_post_id
#>     <int> <chr>            <int>           <int>        <int>
#>  1 149358 announcement         0        69362460     69362459
#>  2   2136 regsvr32           256        10051236     10051235
#>  3   8287 rpg                256         7080550      7080549
#>  4   9136 dbexpress          256         7663373      7663372
#>  5   9213 multiple-domains   256        22931444     22931443
#>  6  11107 menustrip          256        47237574     47237573
#>  7  15213 fileserver         256         9275976      9275975
#>  8  25272 bad-alloc          256        14261589     14261588
#>  9  34059 pitch              256        37258061     37258060
#> 10  37612 winrar             256        13178390     13178389
#> # ℹ more rows

#  works
DBI::dbGetQuery(con, sql("SELECT * FROM tags"), n = 10)
#> # A tibble: 10 × 5
#>        id tag_name         count excerpt_post_id wiki_post_id
#>     <int> <chr>            <int>           <int>        <int>
#>  1 149358 announcement         0        69362460     69362459
#>  2   2136 regsvr32           256        10051236     10051235
#>  3   8287 rpg                256         7080550      7080549
#>  4   9136 dbexpress          256         7663373      7663372
#>  5   9213 multiple-domains   256        22931444     22931443
#>  6  11107 menustrip          256        47237574     47237573
#>  7  15213 fileserver         256         9275976      9275975
#>  8  25272 bad-alloc          256        14261589     14261588
#>  9  34059 pitch              256        37258061     37258060
#> 10  37612 winrar             256        13178390     13178389
DBI::dbSendQuery(con, sql("SELECT * FROM tags"), n = 10)
#> <BigQueryResult>
#>   Query: SELECT * FROM tags
#>   Has completed: FALSE
#>   Rows fetched: 0

# does not work
tbl(con, sql("SELECT * FROM tags"))
#> Error in curl::curl_fetch_memory(url, handle = handle): URL using bad/illegal format or missing URL

Created on 2023-09-10 with reprex v2.0.2

Session info
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.3.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] DBI_1.1.3    dbplyr_2.3.3 dplyr_1.1.3 
#> 
#> loaded via a namespace (and not attached):
#>  [1] bit_4.0.5         jsonlite_1.8.7    compiler_4.3.1    brio_1.1.3       
#>  [5] tidyselect_1.2.0  reprex_2.0.2      Rcpp_1.0.11       assertthat_0.2.1 
#>  [9] yaml_2.3.7        fastmap_1.1.1     bigrquery_1.4.2   R6_2.5.1         
#> [13] generics_0.1.3    curl_5.0.2        knitr_1.43        tibble_3.2.1     
#> [17] openssl_2.1.0     pillar_1.9.0      rlang_1.1.1       utf8_1.2.3       
#> [21] xfun_0.40         fs_1.6.3          bit64_4.0.5       cli_3.6.1        
#> [25] withr_2.5.0       magrittr_2.0.3    digest_0.6.33     rstudioapi_0.15.0
#> [29] rappdirs_0.3.3    askpass_1.2.0     lifecycle_1.0.3   vctrs_0.6.3      
#> [33] evaluate_0.21     gargle_1.5.2      glue_1.6.2        fansi_1.0.4      
#> [37] rmarkdown_2.24    httr_1.4.7        tools_4.3.1       pkgconfig_2.0.3  
#> [41] htmltools_0.5.6
@talegari
Copy link

talegari commented Nov 1, 2023

I have landed on the same issue. This is a serious blocker for writing some non-trivial queries which need UNNEST which cannot be written with a dplyr/dbplyr verb. @hadley

@talegari
Copy link

talegari commented Nov 2, 2023

Here is a quick and temporary fix:

#' @name bq_tbl_from_query
#' @title Create a tbl object from a sql query for a bigquery connection
#' @description `tbl(con, sql(query))` should work, but fails on `bigquery
#'   (1.4.2)`. This is a interim solution to keep my work going. See the issue
#'   here: https://github.com/r-dbi/bigrquery/issues/540
#' @details
#'
#' 1. create a temporary table (expires in 1 day by default) using
#' `DBI::dbExecute`.
#'
#' 2. Create a `tbl` connection to the temporary table.
#'
#' Package dependencies: `bigrquery`, `DBI`, `dplyr`, `checkmate`, `cli`, `glue`
#'
#' @param con (object) of class 'BigQueryConnection'
#' @param query (string) query string
#' @param dataset (string) dataset where temporary table gets created
#' @param n_days_expiration (integerish) Number of days of expiry for the
#'   temporary table
bq_tbl_from_query = function(con,
                             query,
                             dataset = NULL, 
                             n_days_expiration = 1
                             ){
  
  checkmate::assert_class(con, "BigQueryConnection")
  checkmate::assert_string(query)
  checkmate::assert_string(dataset, null.ok = TRUE)
  checkmate::assert_integerish(n_days_expiration, lower = 1)
  
  # if dataset is NULL, try and extract 'dataset' from connection object
  if (is.null(con_ads@dataset)){
    if (is.null(dataset)) {
      stop("'con' does not have dataset. Please provide 'dataset' input.")
    }
  } else {
    if (is.null(dataset)) {
      dataset = con_ads@dataset
    }
  }
  dataset_exists_flag = bigrquery::bq_dataset_exists(
    bigrquery::bq_dataset(project = con@project, dataset = dataset)
    )
  checkmate::assert_true(dataset_exists_flag)
  
  # set expiration string
  expiration_string = 
    glue::glue("OPTIONS(expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL {n_days_expiration} DAY))")
  
  # pick a random table name and create it
  table_exists_flag = TRUE
  while(table_exists_flag){
    table_name = paste(sample(letters, 20), collapse = "")
    table_exists_flag = bigrquery::bq_table_exists(
      bigrquery::bq_table(project = con@project,
                          dataset = dataset,
                          table = table_name
                          )
      )
  }
  
  create_query_string =
    glue::glue("CREATE TABLE {dataset}.{table_name} {expiration_string} AS ({query})")
  
  exec_flag = (DBI::dbExecute(con, create_query_string) == 0)
  if (!exec_flag){
    stop("Query failed to run")
  }
  
  cli::cli_alert_info(glue::glue("Temporary table created at `{dataset}.{table_name}` will auto-expire in {n_days_expiration} days"))
  
  # create a tbl object
  res = dplyr::tbl(con, glue::glue("{dataset}.{table_name}"))
  
  return(res)
}

@hadley
Copy link
Member

hadley commented Nov 2, 2023

Somewhat more minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)
library(bigrquery)

con <- DBI::dbConnect(
  bigquery(),
  project = "bigquery-public-data",
  dataset = "stackoverflow",
  billing = bq_test_project()
)

tbl(con, dbplyr::sql("SELECT * FROM tags LIMIT 10"))
#> Warning: <BigQueryConnection> uses an old dbplyr interface
#> ℹ Please install a newer version of the package or contact the maintainer
#> This warning is displayed once every 8 hours.
#> ! Using an auto-discovered, cached token.
#>   To suppress this message, modify your code or options to clearly consent to
#>   the use of a cached token.
#>   See gargle's "Non-interactive auth" vignette for more details:
#>   <https://gargle.r-lib.org/articles/non-interactive-auth.html>
#> ℹ The bigrquery package is using a cached token for '[email protected]'.
#> Error in curl::curl_fetch_memory(url, handle = handle): URL rejected: Malformed input to a URL function

Created on 2023-11-02 with reprex v2.0.2

@hadley hadley added bug an unexpected problem or unintended behavior dbplyr 🔧 labels Nov 2, 2023
@hadley hadley added this to the v1.5.0 milestone Nov 3, 2023
@hadley
Copy link
Member

hadley commented Nov 3, 2023

Interestingly wrapping in collect() or as.data.frame() makes the problem go away, so this is caused by the print() method?

Looks like the problem is caused by which branch of op_can_download(x) you travel down.

Looks like that's caused by head(): collect(head(tbl(con, dbplyr::sql("SELECT * FROM tags LIMIT 10")), n = 10)) gives a simpler call stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior dbplyr 🔧
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants