OtterBrix is an open-source framework designed to process semi-structured data seamlessly. It merges the reliability of OLTP approach with the flexibility of OLAP, thanks to its innovative multi-dimensional model. This allows developers to directly ingest JSON, Parquet, ORC, Avro or other irregular formats and execute SQL-like queries while ensuring high performance.
Built with C++, this lightweight engine can be easily integrated into your applications, microservices, or data pipelines. It enables efficient real-time analytics on nested data structures, right where your data resides.
- Schema-Free Processing: Ingest and query data without defining schemas first
- Unified Data Model: Handle both flat (tabular) and hierarchical (nested) data in one engine
- High-Performance Analytics: Column-oriented format for real-time operations on semi-structured data
- SQL Query Interface: Familiar SQL syntax and DataFrame-like operations
- Multi-Modal Storage: rows, columns, documents and high-dimensional vectors
- Data Interoperability: Convert between internal formats and standards like JSON, Parquet, ORC, Avro, Arrow
OtterBrix excels in the following scenarios:
Use OtterBrix in backend services to filter or aggregate streaming data before it reaches heavier data warehouses. Filter events in real-time and reduce load on downstream systems.
Embed OtterBrix in microservices for local analytic capabilities. Process logs or sensor data and perform dynamic queries without round-trips to central data stores.
Handle mixed structured and semi-structured data workloads. Join relational data with JSON documents for unified jobs, data science notebooks, or lightweight data integration.
Add advanced querying and reporting capabilities to your applications. Enable features like in-app reports or offline analytics without external database dependencies.
Leverage OtterBrix's performance optimizations for heavy JSON processing or data transformations, with potential for GPU acceleration in future releases.
While specific to each use case, OtterBrix typically shows:
- 3-5x faster analytics on semi-structured data compared to document databases
- Similar performance to columnar databases (like DuckDB) on structured data
- Up to 10x better memory efficiency when handling deeply nested, sparse data structures
- Near-zero latency for real-time filtering and aggregation of streaming data
Get started with Otterbrix using our installation and usage example below:
Otterbrix is available as a Python package on PyPI. You can install it using pip.
pip install "otterbrix==1.0.1a9"
client = Client()
c = client.execute("SELECT * FROM schema.table WHERE count = 1000;")
c.close()
Otterbrix seamlessly integrates with column-oriented memory format and can represent both flat and hierarchical data for efficient analytical operations.
Feature/Challenge | DuckDB | Velox | OtterBrix |
---|---|---|---|
Memory Usage | Can exhaust available memory on large datasets and doesn't always spill to disk gracefully | High memory consumption with extensive caching; requires careful tuning | Efficient memory layout optimized for three modes: in-memory, disk-only and hybrid |
Semi-Structured Data | Limited support for nested data; requires flattening | Supports complex types but typically requires schema definition | Native handling of deeply nested structures without performance degradation |
Integration Model | Embedded library with direct SQL interface | Component library for building systems; requires significant integration effort | Complete embedded solution with SQL interface and DataFrame API for direct application integration |
Performance on JSON | Underperforms when working with complex JSON structures | Relies on host system for JSON parsing and schema conversion | Purpose-built for high-performance JSON analytics with optimized path expressions |
Point Lookups | 10x slower than SQLite on certain patterns of queries involving point lookups | Optimized for analytical queries, not point lookups | Hybrid storage approach balances both scans and indexed lookups |
Concurrency | Limited support for concurrent connections | Depends on host system's concurrency model | Optimized for in-process concurrency within a single application |
Memory Layout | Traditional columnar storage | Arrow-compatible columnar format | Hybrid tuple and Arrow format that adapts to data patterns |
Primary Use Case | Local analytics on structured data | Accelerating large-scale database systems | Application-level processing of semi-structured data |
OtterBrix introduces a multi-dimensional document model that bridges document-oriented storage with analytical databases:
Here's your diagram in proper Markdown format using code blocks, so it renders cleanly and keeps the alignment intact:
ββββββββββββββββββββββββββββββββββββββββββ β Application Layer β ββββββββββββββββββββββ¬ββββββββββββββββββββ β ββββββββββββββββββββββ΄βββββββββββββββββββββ β OtterBrix Engine β βββββββββββββββββββββββββββββββββββββββββββ€ β βββββββββββββββ ββββββββββββββββ β β β SQL Query β β DataFrame β β β β Interface β β Interface β β β βββββββββββββββ ββββββββββββββββ β βββββββββββββββββββββββββββββββββββββββββββ€ β βββββββββββββββββββββββββββββββββββββββ β β β Multi-Modal Storage β β β βββββββββββββββββββββββββββββββββββββββ β βββββββββββββββββββββββββββββββββββββββββββ€ β βββββββββββββ ββββββββββββ βββββββββββ β β β rows β β columns β β ML β β β β β β β β Storage β β β βββββββββββββ ββββββββββββ βββββββββββ β ββββββββββββββββββββββ¬βββββββββββββββββββββ β ββββββββββββββββββββββ΄ββββββββββββββββββββ β Data Sources β β (JSON, Parquet, CSV, Custom Formats) β ββββββββββββββββββββββββββββββββββββββββββ
You can paste this directly into your Markdown file or viewer that supports fenced code blocks (like GitHub, GitLab, or most modern Markdown renderers).
OtterBrix's unique memory architecture combines:
- Arrow Format: Efficient for analytical operations across many records
- Tuple or Flexible-Tuple Format: Optimal for accessing complete nested objects
- Adaptive Selection: Automatically chooses the best representation based on data patterns and access needs
This hybrid approach delivers columnar database performance while maintaining the flexibility of document stores, making it uniquely suited for semi-structured data processing.
A DevOps platform reduced their log processing pipeline latency by 75% by embedding OtterBrix directly into their log collection microservice. This allowed them to perform complex filtering and aggregation at the edge before data reached their central data lake, reducing storage costs by 40% and enabling real-time alerting.
An industrial IoT deployment used OtterBrix to process variable-schema sensor data from thousands of devices with different firmware versions. The schema-free approach eliminated the need for constant ETL adjustments as new sensor types were added, while the efficient memory layout allowed running complex analyses on edge devices with limited resources.
A data integration platform leveraged OtterBrix to join and analyze responses from multiple third-party APIs with regularly changing schemas. The hybrid storage model allowed efficient querying across these diverse data structures without performance degradation as APIs evolved.
The Otterbrix team aims at keeping the code readable and consistent with the surrounding code where possible. A detailed code style guide is work in progress.
See CONTRIBUTING for contribution requirements.
See the project security policy for information about reporting vulnerabilities.
To correctly build Otterbrix, you will need the most current version of Docker.
The current version of Otterbrix can be built in Dockerfiles only. If you need assistance when building Otterbrix, please contact our team.
In case you've encountered any issues, please feel free to create them right here on GitHub!
OtterBrix is built on modern C++ techniques and is inspired by advances in both document-oriented and analytical database systems. We thank all contributors and the open-source community.