Releases: delta-io/delta-sharing
Delta Sharing 0.2.0
We are excited to announce the release of Delta Sharing 0.2.0, which introduces the following improvements and fixes multiple issues:
Improvements:
- Added official Docker images for Delta Sharing Server
- Added an examples project to show how to try the open Delta Sharing Server (#26)
- Added the `conf` directory to the Delta Sharing Server classpath to allow users to add their Hadoop configuration files in the directory (#45)
- Added retry with exponential backoff for REST requests in the Python connector (#49)
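For readers unfamiliar with the technique, retry with exponential backoff can be sketched as below. This is an illustration only, not the connector's actual code: the function name `retry_with_backoff`, its parameters, and the choice to retry on `ConnectionError` are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    request: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying failures with exponentially growing delays.

    Hypothetical helper: the names and defaults here are illustrative and
    are not taken from the Delta Sharing Python connector.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The delay doubles on every attempt and is capped at max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the delay schedule can be checked without waiting. A production version would typically also add random jitter and retry only on errors known to be transient.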
Bug fixes:
- Added the minimum `fsspec` requirement in the Python connector (#23)
- Fixed an issue when files in a table have no stats in the Python connector (#30)
- Improved error handling in the Delta Sharing Server to report 400 Bad Request properly (#32)
- Fixed the table schema when a table is empty in the Python connector (#37)
- Fixed KeyError when there are no shared tables in the Python connector (#50)
Credits: Denny Lee, Matei Zaharia, Shixiong Zhu, Yaohua, Yuhong Chen, dobachi
Delta Sharing 0.1.0
We are excited to announce the release of Delta Sharing 0.1.0.
Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enables organizations to share data in real time regardless of which computing platforms they use. It is a simple REST protocol that securely shares access to part of a cloud dataset and leverages modern cloud storage systems, such as S3, ADLS, or GCS, to reliably transfer data.
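To make the "simple REST protocol" concrete, here is a minimal sketch of how a client might build a request that lists the available shares, using only the Python standard library. It assumes a profile with `endpoint` and `bearerToken` fields and a `/shares` path as described in the protocol specification; the endpoint URL and token below are placeholders, and the helper name `build_list_shares_request` is hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_list_shares_request(profile: dict) -> urllib.request.Request:
    """Build (but do not send) a GET request listing the shares on a server.

    Hypothetical helper for illustration; check the protocol specification
    for the authoritative request format.
    """
    endpoint = profile["endpoint"].rstrip("/")
    return urllib.request.Request(
        f"{endpoint}/shares",
        # The bearer token authenticates the data recipient.
        headers={"Authorization": f"Bearer {profile['bearerToken']}"},
        method="GET",
    )


# Placeholder profile; a real one is issued by the data provider.
profile = json.loads(
    '{"shareCredentialsVersion": 1,'
    ' "endpoint": "https://sharing.example.com/delta-sharing/",'
    ' "bearerToken": "<token>"}'
)
request = build_list_shares_request(profile)
```

Passing `request` to `urllib.request.urlopen` would perform the call against a real server.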
With Delta Sharing, a user accessing shared data can directly connect to it through pandas, Tableau, Apache Spark, Rust, Python, or dozens of other systems that support the open protocol, without having to deploy a specific compute platform first. This makes life simpler for both data providers and consumers. Data providers can share a dataset once to reach a broad range of consumers on any platform, and data consumers can get started using the data in minutes on their existing computing tools.
This repo includes the following components:
- Delta Sharing protocol specification.
- Python Connector: A Python library that implements the Delta Sharing Protocol to read shared tables as pandas DataFrames or Apache Spark DataFrames.
- Apache Spark Connector: An Apache Spark connector that implements the Delta Sharing Protocol to read shared tables from a Delta Sharing Server. The tables can then be accessed in SQL, Python, Java, Scala, or R.
- Delta Sharing Server: A reference implementation server for the Delta Sharing Protocol for development purposes. Users can deploy this server to share existing tables in Delta Lake and Apache Parquet format on modern cloud storage systems.
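As a rough illustration of the Python connector, the sketch below reads a shared table into a pandas DataFrame. It assumes the `delta-sharing` package is installed and that a table is addressed as `<profile-file>#<share>.<schema>.<table>`. The helper names `table_url` and `load_shared_table`, and the profile, share, schema, and table names, are hypothetical.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Compose the table address expected by the connector (assumed format)."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table as a pandas DataFrame.

    The import is deferred so this module can be imported without the
    third-party package installed (pip install delta-sharing).
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(
        table_url(profile_path, share, schema, table)
    )


# Hypothetical names, for illustration only.
url = table_url("config.share", "my_share", "my_schema", "my_table")
```

Calling `load_shared_table("config.share", "my_share", "my_schema", "my_table")` would contact the server named in the profile file, so it needs a valid profile and network access.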
See the documentation for more details.