Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save query without loading the data into R environment #630

Open
hope-data-science opened this issue Dec 6, 2024 · 4 comments
Open

Save query without loading the data into R environment #630

hope-data-science opened this issue Dec 6, 2024 · 4 comments

Comments

@hope-data-science
Copy link

I have a pretty large duckdb file and I use a query to generate a larger-than-memory data. How can I save this results elegantly without using compute to load the data into R environment (which might not be possible)?

@krlmlr
Copy link
Collaborator

krlmlr commented Dec 11, 2024

Thanks, we need to document this better, and/or provide utilities.

For now, you could use COPY TO like this:

data_sql <- "..."
output_sql <- paste0("COPY (", data_sql, ") TO 'out.parquet' (FORMAT PARQUET, ...)")
dbExecute(con, output_sql)

Upstream docs: https://duckdb.org/docs/sql/statements/copy#copy--to

@hope-data-science
Copy link
Author

hope-data-science commented Dec 12, 2024

FYI, currently I am using:

## to parquet
duckdb_sink_parquet = \(conn,sql,path) {
  DBI::dbExecute(conn = conn,
                 statement =  stringr::str_glue("COPY ({sql})
          TO '{path}' (FORMAT PARQUET)"))
}

## to a table in the current database (duckdb)
duckdb_sink_table = \(conn,sql,table_name){
  DBI::dbExecute(conn = conn,
                 statement =  stringr::str_glue("CREATE TABLE {table_name} AS {sql}"))
}

I wonder if DBI package has some shortcuts for these functions.

@krlmlr
Copy link
Collaborator

krlmlr commented Dec 12, 2024

Thanks. This is out of scope for DBI, could be in scope for duckdb of for a utility package on top of duckdb.

@hope-data-science
Copy link
Author

hope-data-science commented Dec 12, 2024

Another practice is:

tbl(duckdb_connecntion, "table_name") %>%
    sql_render() %>%
    db_save_query(con = duckdb_connection,sql = ., name = "new_table_name",temporary = FALSE)

I think we really need this sort of functions to implement out-of-memory computation.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants