Benchmarking Arrow Flight - A wire-speed protocol for data transfer, querying and microservices

abs-tudelft/time-to-fly-high

Benchmarking Arrow Flight

To measure client-server performance (local and remote), we used the Arrow Flight benchmark; the Python script is placed here.

Cartesius local node example:

singularity exec /scratch-shared/tahmad/bio_data/flight.simg /arrow/cpp/release/release/arrow-flight-benchmark --server_host tcn541

singularity exec /scratch-shared/tahmad/bio_data/flight.simg /arrow/cpp/release/release/arrow-flight-perf-server --server_host tcn541
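The two commands above can be driven from one script. The following is a minimal sketch, not the script used for the reported measurements: it assumes the perf server has to be listening before the benchmark client connects, and the `startup_wait_s` grace period and the helper names `flight_command` and `run_local_benchmark` are hypothetical. The image path, binary directory, and host name are copied from the example above.

```python
import shlex
import subprocess
import time

# Paths copied from the Cartesius example above.
IMAGE = "/scratch-shared/tahmad/bio_data/flight.simg"
BIN_DIR = "/arrow/cpp/release/release"


def flight_command(binary: str, server_host: str) -> list:
    """Build the `singularity exec` argument list for one Flight binary."""
    return [
        "singularity", "exec", IMAGE,
        f"{BIN_DIR}/{binary}",
        "--server_host", server_host,
    ]


def run_local_benchmark(server_host: str, startup_wait_s: float = 2.0) -> int:
    """Start the perf server, run the benchmark client, then stop the server."""
    server = subprocess.Popen(flight_command("arrow-flight-perf-server", server_host))
    try:
        # Assumed grace period so the server is listening; tune as needed.
        time.sleep(startup_wait_s)
        client = subprocess.run(flight_command("arrow-flight-benchmark", server_host))
        return client.returncode
    finally:
        # Always shut the server down, even if the client fails.
        server.terminate()
        server.wait()


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    print(shlex.join(flight_command("arrow-flight-perf-server", "tcn541")))
    print(shlex.join(flight_command("arrow-flight-benchmark", "tcn541")))
```

Calling `run_local_benchmark("tcn541")` on a node where the container image is available would launch both processes; running the file directly only prints the two command lines.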

We queried the NYC Taxi dataset on remote Dremio (client-server) nodes with a varying number of records (1-16 million). Implementations for the different protocols (ODBC, turbodbc, and Arrow Flight) are available here.
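One way to vary the number of records is to issue the same query with a different `LIMIT` per run. The sketch below is an illustration under stated assumptions: only the 1-16 million range comes from the text above, while the doubling steps, the table name `nyc_taxi`, and the helper names are hypothetical. The generated SQL string can be passed to whichever client (ODBC, turbodbc, or Arrow Flight) is being measured.

```python
# Hypothetical table name; substitute the dataset path used in your Dremio setup.
TABLE = "nyc_taxi"


def record_counts(start_millions: int = 1, stop_millions: int = 16) -> list:
    """Return record counts that double from start to stop (in millions)."""
    counts = []
    n = start_millions
    while n <= stop_millions:
        counts.append(n * 1_000_000)
        n *= 2  # assumed step; the original sweep may use other sizes
    return counts


def limit_query(table: str, n_records: int) -> str:
    """Build a query that returns at most n_records rows from the table."""
    return f"SELECT * FROM {table} LIMIT {n_records}"


if __name__ == "__main__":
    for n in record_counts():
        print(limit_query(TABLE, n))
```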

Starting Dremio:

./dremio-community-15.0.0-202103312106020527-0be9c719/bin/dremio start

To query the NYC Taxi dataset with a varying number of records (0.1-16 million) over a remote DataFusion client-server Flight connection, we used an updated DataFusion Flight client-server implementation.
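Whatever client performs the transfer, each run reduces to timing one fetch and dividing rows by elapsed seconds. The following is a minimal timing sketch, assuming a hypothetical `fetch` callable that takes the requested record count and returns the number of rows received; `fake_fetch` is a stand-in so the example runs without a server and is not part of the DataFusion implementation.

```python
import time
from typing import Callable


def time_fetch(fetch: Callable[[int], int], n_records: int) -> dict:
    """Time one fetch call and report the achieved rows per second."""
    start = time.perf_counter()
    rows = fetch(n_records)
    elapsed = time.perf_counter() - start
    rate = rows / elapsed if elapsed > 0 else float("inf")
    return {
        "requested": n_records,
        "rows": rows,
        "seconds": elapsed,
        "rows_per_second": rate,
    }


def fake_fetch(n_records: int) -> int:
    """Stand-in for a real client call; it only counts up to n_records."""
    return sum(1 for _ in range(n_records))


if __name__ == "__main__":
    # 100,000 records matches the lower end of the 0.1-16 million range above.
    result = time_fetch(fake_fetch, 100_000)
    print(f"{result['rows']} rows in {result['seconds']:.4f} s")
```

Replacing `fake_fetch` with a function that runs the query through the Flight client and returns the received row count would give per-size throughput figures; repeating each size several times helps separate warm-up effects from steady-state numbers.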

Commands for creating the Arrow Flight-based Singularity container:

sudo singularity build -w flight.simg flight.def
sudo singularity shell -w flight.simg
> mkdir /arrow
> cp -r arrow/cpp/release/release /arrow
