duckstax/otterbrix: Otterbrix, a computation framework for semi-structured data processing

About Otterbrix

Overview

OtterBrix is an open-source framework for processing semi-structured data. It combines the reliability of the OLTP approach with the flexibility of OLAP through its multi-dimensional data model. This lets developers ingest JSON, Parquet, ORC, Avro, and other irregularly structured formats directly and run SQL-like queries over them with high performance.

Built with C++, this lightweight engine can be easily integrated into your applications, microservices, or data pipelines. It enables efficient real-time analytics on nested data structures, right where your data resides.

✨ Key Features

Schema-Free Processing: Ingest and query data without defining schemas first
Unified Data Model: Handle both flat (tabular) and hierarchical (nested) data in one engine
High-Performance Analytics: Column-oriented format for real-time operations on semi-structured data
SQL Query Interface: Familiar SQL syntax and DataFrame-like operations
Multi-Modal Storage: rows, columns, documents and high-dimensional vectors
Data Interoperability: Convert between internal formats and standards like JSON, Parquet, ORC, Avro, Arrow
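To make schema-free processing and the unified flat/nested model more concrete, here is a minimal stand-alone Python sketch. It is not the Otterbrix API: the helper names `flatten_records` and `_flatten` are hypothetical. It ingests heterogeneous JSON records with no predeclared schema, flattens nested keys into dotted column paths, and fills paths a record lacks with `None`.

```python
import json
from typing import Any


def _flatten(obj: Any, prefix: str, out: dict) -> None:
    """Recursively turn nested objects into dotted paths; lists are kept as values."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            _flatten(value, f"{prefix}.{key}" if prefix else key, out)
    else:
        out[prefix] = obj


def flatten_records(lines: list[str]) -> dict[str, list[Any]]:
    """Build one column per path from JSON lines, with no schema declared up front."""
    rows = []
    for line in lines:
        flat: dict = {}
        _flatten(json.loads(line), "", flat)
        rows.append(flat)
    # The inferred "schema" is the union of all paths, in first-seen order.
    paths = list(dict.fromkeys(path for row in rows for path in row))
    # Records lacking a path get None, so sparse rows still align column-wise.
    return {path: [row.get(path) for row in rows] for path in paths}


events = [
    '{"id": 1, "user": {"name": "ann"}}',
    '{"id": 2, "user": {"name": "bob", "geo": {"city": "Oslo"}}, "tags": ["a"]}',
]
table = flatten_records(events)
print(table["user.geo.city"])  # → [None, 'Oslo']
```

A new field such as `user.geo.city` simply becomes a new column the first time it appears, which is the behaviour "schema-free" implies; a real engine would use typed columnar buffers rather than Python lists.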

🔍 Use Cases

OtterBrix excels in the following scenarios:

Data Pre-Processing and Filtering

Use OtterBrix in backend services to filter or aggregate streaming data before it reaches heavier data warehouses. Filter events in real-time and reduce load on downstream systems.
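The filter-then-aggregate pattern described above can be sketched in plain Python. This is an illustration of the idea, not Otterbrix code; the event fields `level` and `service` and the threshold are hypothetical. Low-severity events are discarded as they stream past, and only a compact per-service summary would be sent downstream.

```python
from collections import Counter
from typing import Iterable, Iterator


def filter_events(events: Iterable[dict], min_level: int) -> Iterator[dict]:
    """Lazily yield only events at or above min_level; the rest are dropped early."""
    for event in events:
        if event.get("level", 0) >= min_level:
            yield event


def aggregate_by(events: Iterable[dict], key: str) -> Counter:
    """Count events per value of `key`, producing a small summary for downstream."""
    return Counter(event.get(key, "<missing>") for event in events)


stream = [
    {"service": "auth", "level": 40, "msg": "login failed"},
    {"service": "auth", "level": 10, "msg": "heartbeat"},
    {"service": "billing", "level": 50, "msg": "charge error"},
    {"service": "auth", "level": 40, "msg": "token expired"},
]

summary = aggregate_by(filter_events(stream, min_level=30), "service")
print(dict(summary))  # → {'auth': 2, 'billing': 1}
```

Because `filter_events` is a generator, events are never buffered in full; four raw events collapse into two summary entries before anything leaves the service.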

Real-Time Analytics in Microservices

Embed OtterBrix in microservices for local analytic capabilities. Process logs or sensor data and perform dynamic queries without round-trips to central data stores.
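As an example of the kind of local, in-process query a microservice might run over sensor data, here is a hypothetical sliding-window mean in plain Python (not an Otterbrix API). It assumes readings arrive in timestamp order and keeps only those from the last `window` seconds, so each update costs amortized O(1).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Reading:
    ts: float     # timestamp in seconds
    value: float  # sensor measurement


class SlidingWindowMean:
    """Running mean over readings from the last `window` seconds."""

    def __init__(self, window: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self._buf: deque = deque()
        self._sum = 0.0

    def add(self, reading: Reading) -> float:
        """Add a reading and return the mean of the current window."""
        self._buf.append(reading)
        self._sum += reading.value
        # Evict readings that fell out of the window; assumes in-order timestamps.
        while self._buf and self._buf[0].ts <= reading.ts - self.window:
            self._sum -= self._buf.popleft().value
        return self._sum / len(self._buf)


meter = SlidingWindowMean(window=10.0)
means = [meter.add(r) for r in (Reading(0, 20.0), Reading(5, 22.0), Reading(12, 30.0))]
print(means)  # → [20.0, 21.0, 26.0]
```

At `ts=12` the reading from `ts=0` is evicted, so the last mean covers only the readings at 5 and 12.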

Hybrid Data Processing

Handle mixed structured and semi-structured data workloads. Join relational data with JSON documents for unified jobs, data science notebooks, or lightweight data integration.
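Joining relational rows with JSON documents can be illustrated with a classic hash join, sketched below in plain Python. This is not how Otterbrix is confirmed to execute joins; the data and the `hash_join` helper are hypothetical. The relational side is indexed by key (build phase), then each JSON document is parsed and probed by its nested `customer.id` (probe phase), tolerating documents whose shape varies.

```python
import json

# Relational side: (customer_id, name, tier) tuples, e.g. rows from a SQL table.
customers = [
    (1, "Acme", "gold"),
    (2, "Globex", "silver"),
]

# Semi-structured side: JSON documents whose shape varies per record.
orders_json = [
    '{"order": 100, "customer": {"id": 1}, "items": [{"sku": "A", "qty": 2}]}',
    '{"order": 101, "customer": {"id": 2, "region": "EU"}, "items": []}',
    '{"order": 102, "customer": {"id": 1}, "items": [{"sku": "B", "qty": 1}, {"sku": "C", "qty": 4}]}',
]


def hash_join(rows, docs):
    """Inner hash join of tuples (keyed by column 0) with docs keyed by customer.id."""
    by_id = {row[0]: row for row in rows}  # build phase: index the relational side
    for raw in docs:                      # probe phase: stream the JSON documents
        doc = json.loads(raw)
        cust = by_id.get(doc.get("customer", {}).get("id"))
        if cust is not None:
            total_qty = sum(item.get("qty", 0) for item in doc.get("items", []))
            yield {"order": doc["order"], "customer": cust[1], "tier": cust[2], "qty": total_qty}


for joined in hash_join(customers, orders_json):
    print(joined)
```

Missing nested fields resolve to `None` or empty defaults instead of raising, so documents without a matching customer are simply skipped, as an inner join requires.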

Embedded Analytics in Applications

Add advanced querying and reporting capabilities to your applications. Enable features like in-app reports or offline analytics without external database dependencies.

Accelerated Data Processing

Leverage OtterBrix's performance optimizations for heavy JSON processing or data transformations, with potential for GPU acceleration in future releases.

⚡ Performance Benchmarks

Results vary by use case, but OtterBrix typically shows:

3-5x faster analytics on semi-structured data compared to document databases
Similar performance to columnar databases (like DuckDB) on structured data
Up to 10x better memory efficiency when handling deeply nested, sparse data structures
Near-zero latency for real-time filtering and aggregation of streaming data

🚀 Get Started

Get started with Otterbrix using our installation and usage example below:

Installation

Otterbrix is available as a Python package on PyPI. You can install it using pip.

pip install "otterbrix==1.0.1a9"

Basic Usage:

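# Assumes Client is exported by the otterbrix package installed above.
from otterbrix import Client

client = Client()
# schema.table and count are placeholder names; substitute your own.
c = client.execute("SELECT * FROM schema.table WHERE count = 1000;")
c.close()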

Otterbrix integrates with a column-oriented memory format and can represent both flat and hierarchical data for efficient analytical operations.
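To show how a column-oriented format can hold hierarchical data, this sketch mimics the Apache Arrow layout for a nullable `list<int>` column. A validity bitmap marks nulls, an offsets buffer marks where each row starts, and all elements live in one flat child buffer, so row `i` is `child[offsets[i]:offsets[i+1]]`. Plain Python lists and booleans stand in for real buffers and bits; this is not Otterbrix's internal code.

```python
from typing import Optional


def encode_list_column(values: list[Optional[list[int]]]):
    """Encode a nullable list<int> column as (validity, offsets, flat child values)."""
    validity: list[bool] = []
    offsets: list[int] = [0]
    child: list[int] = []
    for entry in values:
        validity.append(entry is not None)
        if entry is not None:
            child.extend(entry)
        # A null slot still gets an offset; it just spans zero child values.
        offsets.append(len(child))
    return validity, offsets, child


def decode_slot(validity, offsets, child, i: int) -> Optional[list[int]]:
    """Recover row i as the slice of the child buffer between two offsets."""
    if not validity[i]:
        return None
    return child[offsets[i]:offsets[i + 1]]


column = [[1, 2], None, [], [3, 4, 5]]
validity, offsets, child = encode_list_column(column)
print(offsets)  # → [0, 2, 2, 2, 5]
print(child)    # → [1, 2, 3, 4, 5]
```

Scanning every element of the column (for example, summing `child`) touches one contiguous buffer, which is why this layout suits analytics, while a null and an empty list remain distinguishable through the validity bitmap.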

📊 Why OtterBrix? Comparison with Alternatives

| Feature/Challenge | DuckDB | Velox | OtterBrix |
|---|---|---|---|
| Memory Usage | Can exhaust available memory on large datasets and does not always spill to disk gracefully | High memory consumption with extensive caching; requires careful tuning | Efficient memory layout optimized for three modes: in-memory, disk-only, and hybrid |
| Semi-Structured Data | Limited support for nested data; requires flattening | Supports complex types but typically requires a schema definition | Native handling of deeply nested structures without performance degradation |
| Integration Model | Embedded library with a direct SQL interface | Component library for building systems; requires significant integration effort | Complete embedded solution with an SQL interface and DataFrame API for direct application integration |
| Performance on JSON | Underperforms on complex JSON structures | Relies on the host system for JSON parsing and schema conversion | Purpose-built for high-performance JSON analytics with optimized path expressions |
| Point Lookups | Can be 10x slower than SQLite on certain point-lookup query patterns | Optimized for analytical queries, not point lookups | Hybrid storage approach balances scans and indexed lookups |
| Concurrency | Limited support for concurrent connections | Depends on the host system's concurrency model | Optimized for in-process concurrency within a single application |
| Memory Layout | Traditional columnar storage | Arrow-compatible columnar format | Hybrid tuple and Arrow format that adapts to data patterns |
| Primary Use Case | Local analytics on structured data | Accelerating large-scale database systems | Application-level processing of semi-structured data |
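Path expressions are the usual way to reach into nested JSON. The plain-Python sketch below shows one common optimization: parse a path like `user.addresses[0].city` once into key/index steps, then reuse the compiled steps across many documents, returning `None` instead of raising when a step is missing. The syntax and the helpers `compile_path` and `evaluate` are hypothetical illustrations, not Otterbrix's path-expression grammar.

```python
import re
from typing import Any, Union

Step = Union[str, int]
# Either a bare key (no dots or brackets) or a bracketed non-negative index.
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def compile_path(path: str) -> list[Step]:
    """Parse a path once into a list of dict keys (str) and list indices (int)."""
    return [int(index) if index else key for key, index in _TOKEN.findall(path)]


def evaluate(steps: list[Step], doc: Any) -> Any:
    """Walk the document step by step; return None as soon as a step is missing."""
    current = doc
    for step in steps:
        if isinstance(step, int):
            if not isinstance(current, list) or step >= len(current):
                return None
        elif not isinstance(current, dict) or step not in current:
            return None
        current = current[step]
    return current


docs = [
    {"user": {"addresses": [{"city": "Berlin"}]}},
    {"user": {"addresses": []}},
    {"user": {"name": "no address field"}},
]
city = compile_path("user.addresses[0].city")  # parsed once, reused per document
print([evaluate(city, d) for d in docs])  # → ['Berlin', None, None]
```

Separating compilation from evaluation means the string parsing cost is paid once per query rather than once per document.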

📊 Architecture

OtterBrix introduces a multi-dimensional document model that bridges document-oriented storage with analytical databases:

┌────────────────────────────────────────┐
│            Application Layer           │
└────────────────────┬───────────────────┘
                     │
┌────────────────────┴────────────────────┐
│           OtterBrix Engine              │
├─────────────────────────────────────────┤
│ ┌─────────────┐        ┌──────────────┐ │
│ │  SQL Query  │        │  DataFrame   │ │
│ │  Interface  │        │  Interface   │ │
│ └─────────────┘        └──────────────┘ │
├─────────────────────────────────────────┤
│ ┌─────────────────────────────────────┐ │
│ │    Multi-Modal Storage              │ │
│ └─────────────────────────────────────┘ │
├─────────────────────────────────────────┤
│ ┌───────────┐ ┌──────────┐ ┌─────────┐  │
│ │ rows      │ │ columns  │ │ ML      │  │
│ │           │ │          │ │ Storage │  │
│ └───────────┘ └──────────┘ └─────────┘  │
└────────────────────┬────────────────────┘
                     │
┌────────────────────┴───────────────────┐
│            Data Sources                │
│  (JSON, Parquet, CSV, Custom Formats)  │
└────────────────────────────────────────┘

Innovative Hybrid Memory Layout

OtterBrix's unique memory architecture combines:

Arrow Format: Efficient for analytical operations across many records
Tuple or Flexible-Tuple Format: Optimal for accessing complete nested objects
Adaptive Selection: Automatically chooses the best representation based on data patterns and access needs

This hybrid approach is designed to deliver columnar-database performance while keeping the flexibility of document stores, which makes it well suited to semi-structured data processing.
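The idea of adaptive selection between a tuple (row) view and a columnar view can be illustrated with a toy store. In this sketch, the class `HybridStore` and its `column_threshold` heuristic are hypothetical and do not describe how Otterbrix actually decides. Queries that touch a small share of fields are served from lazily built columns; point lookups of complete nested objects always use the row view.

```python
from typing import Any


class HybridStore:
    """Toy store with a row view plus lazily materialized column views.

    The threshold rule is a hypothetical illustration of choosing a layout
    by access pattern, not Otterbrix's actual policy.
    """

    def __init__(self, rows: list[dict[str, Any]], column_threshold: float = 0.5) -> None:
        self.rows = rows
        self.column_threshold = column_threshold
        self._columns: dict[str, list[Any]] = {}
        self.fields = sorted({k for row in rows for k in row})

    def _column(self, name: str) -> list[Any]:
        # Build a column once, then reuse it for later scans.
        if name not in self._columns:
            self._columns[name] = [row.get(name) for row in self.rows]
        return self._columns[name]

    def choose_layout(self, needed: list[str]) -> str:
        # Few fields across many records favors columns; most fields favors rows.
        share = len(set(needed)) / max(len(self.fields), 1)
        return "columns" if share < self.column_threshold else "rows"

    def sum_field(self, name: str) -> float:
        """Aggregate one field, skipping records where it is absent."""
        if self.choose_layout([name]) == "columns":
            return sum(v for v in self._column(name) if v is not None)
        return sum(row[name] for row in self.rows if row.get(name) is not None)

    def get(self, i: int) -> dict[str, Any]:
        """Point lookup of a complete (possibly nested) object from the row view."""
        return self.rows[i]


store = HybridStore([
    {"id": 1, "price": 9.5, "meta": {"tags": ["x"]}},
    {"id": 2, "price": 3.0, "meta": {}},
    {"id": 3, "meta": {"tags": []}},
])
print(store.choose_layout(["price"]))                # → columns
print(store.choose_layout(["id", "price", "meta"]))  # → rows
print(store.sum_field("price"))                      # → 12.5
```

Both layouts return the same answers; the choice only affects which representation is scanned, which is the trade-off the hybrid layout is meant to manage.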

💡 Real-World Success Stories

Case Study: Log Analytics Service

A DevOps platform reduced their log processing pipeline latency by 75% by embedding OtterBrix directly into their log collection microservice. This allowed them to perform complex filtering and aggregation at the edge before data reached their central data lake, reducing storage costs by 40% and enabling real-time alerting.

Case Study: IoT Sensor Analysis

An industrial IoT deployment used OtterBrix to process variable-schema sensor data from thousands of devices with different firmware versions. The schema-free approach eliminated the need for constant ETL adjustments as new sensor types were added, while the efficient memory layout allowed running complex analyses on edge devices with limited resources.

Case Study: API Response Processing

A data integration platform leveraged OtterBrix to join and analyze responses from multiple third-party APIs with regularly changing schemas. The hybrid storage model allowed efficient querying across these diverse data structures without performance degradation as APIs evolved.

Development

The Otterbrix team aims to keep the code readable and consistent with the surrounding code where possible. A detailed code style guide is a work in progress.

Contributing

See CONTRIBUTING for contribution requirements.

Security

See the project security policy for information about reporting vulnerabilities.

Build requirements

To build Otterbrix, you need the latest version of Docker.

Building Otterbrix

The current version of Otterbrix can only be built using the provided Dockerfiles. If you need help building Otterbrix, please contact our team.

Troubleshooting

If you encounter any issues, please open an issue here on GitHub.

✨ Acknowledgements

OtterBrix is built on modern C++ techniques and is inspired by advances in both document-oriented and analytical database systems. We thank all contributors and the open-source community.
