Arrow Format #5

Open
lmangani opened this issue Dec 2, 2024 · 3 comments
Comments

lmangani commented Dec 2, 2024

How can we stream DuckDB query results using the Arrow IPC format?

github-actions bot commented Dec 2, 2024

👋 Hello!

To stream DuckDB query results using the ARROW IPC format, you can use the ArrowStreamResult class provided by the DuckDB C++ API. This allows you to efficiently stream large query results without materializing the entire result set in memory.

Here is an example of how to achieve this:

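  1. Include the necessary headers:

    #include <memory>
    #include <arrow/api.h>   // arrow::RecordBatch
    #include <duckdb.hpp>
    #include <duckdb/main/client_context.hpp>
    // Header name as given in this answer; check it against your DuckDB version.
    #include <duckdb/main/arrow.hpp>

  2. Create a DuckDB instance and a connection:

    duckdb::DuckDB db(nullptr);      // nullptr opens an in-memory database
    duckdb::Connection con(db);

  3. Execute a query and get the Arrow stream:

    auto result = con.Query("SELECT * FROM your_table");
    // ToArrowStream() is the method named in this answer; verify that it
    // exists in your DuckDB version before relying on it.
    auto arrow_stream = result->ToArrowStream();

  4. Consume the Arrow stream:

    while (true) {
        std::shared_ptr<arrow::RecordBatch> batch;
        arrow_stream->GetNext(&batch);
        if (!batch) {
            break; // No more batches
        }
        // Process the batch
    }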

For more detailed information and additional options, refer to the DuckDB documentation on the C++ API and Arrow integration.

I am a bot powered by Huggingface. Please verify any information provided.

lmangani commented Dec 2, 2024

In DuckDB, Arrow support is provided by an extension.

DuckDB Arrow Extension

This is a DuckDB extension that provides features that need a dependency on the Apache Arrow library.

Features

| function | type | description |
| --- | --- | --- |
| to_arrow_ipc | Table in-out function | Serializes a table into a stream of blobs containing Arrow IPC buffers |
| scan_arrow_ipc | Table function | Scans a list of pointers pointing to Arrow IPC buffers |

We want to use the to_arrow_ipc function to return results as a stream of blobs containing Arrow IPC buffers.

github-actions bot commented Dec 2, 2024

👋 Hello!

To use the to_arrow_ipc function in DuckDB, you need to ensure that the Arrow extension is installed and loaded. Here’s a step-by-step guide on how to do this:

Step 1: Install the Arrow Extension

First, install the Arrow extension. If the extension is hosted in a custom repository, point DuckDB at that repository before installing; otherwise INSTALL arrow on its own is enough.

-- Optional: use a custom extension repository (placeholder URL)
SET custom_extension_repository = 'https://example.com/duckdb-extensions';

-- Install the Arrow extension
INSTALL arrow;

Step 2: Load the Arrow Extension

After installation, you need to load the Arrow extension to use its functions.

LOAD arrow;

Step 3: Use the to_arrow_ipc Function

Once the extension is loaded, you can use the to_arrow_ipc function to serialize a table into a stream of blobs containing Arrow IPC buffers.

-- Example query to serialize a table into Arrow IPC buffers
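-- to_arrow_ipc is a table function, so it goes in the FROM clause
-- and takes the source query as a parenthesized subquery.
SELECT * FROM to_arrow_ipc((SELECT * FROM table_name));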

Example

Assume you have a table named my_table with some data. You can serialize this table into Arrow IPC buffers as follows:

-- Install and load the Arrow extension
INSTALL arrow;
LOAD arrow;

-- Serialize the table into Arrow IPC buffers
SELECT * FROM to_arrow_ipc((SELECT * FROM my_table));
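
On the client side, the blobs returned by to_arrow_ipc still have to be stitched together before an Arrow reader can consume them. Below is a minimal C++ sketch of that step, assuming each result row carries one blob plus a boolean flag marking the schema header. The `IpcRow` struct, its field names, and `AssembleIpcStream` are hypothetical helpers for illustration, not part of DuckDB or the Arrow extension, and the column layout should be checked against the extension version in use.

```cpp
#include <string>
#include <vector>

// One row of the (assumed) to_arrow_ipc output: a binary blob plus a flag
// saying whether the blob is the schema header. Field names are hypothetical.
struct IpcRow {
    std::string blob;  // raw bytes of the blob column
    bool is_header;    // true if this blob carries the schema
};

// Concatenate all blobs into one byte buffer, placing header blobs before
// record-batch blobs and keeping the original order within each group.
std::string AssembleIpcStream(const std::vector<IpcRow> &rows) {
    std::string out;
    for (const auto &row : rows) {
        if (row.is_header) {
            out += row.blob;
        }
    }
    for (const auto &row : rows) {
        if (!row.is_header) {
            out += row.blob;
        }
    }
    return out;
}
```

The sketch only orders and concatenates bytes. Whether an end-of-stream marker has to be appended depends on the Arrow reader used and is not covered here.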

I am a bot powered by Huggingface. Please verify any information provided.
