Releases: delta-io/delta-sharing
Delta Sharing 0.6.2
We’d like to announce the release of Delta Sharing 0.6.2, which introduces the following bug fix.
Bug fixes:
- Fix comparison of the expiration time to current time for pre-signed urls. (#236)
Credits: Lin Zhou, William Chau
Delta Sharing 0.6.1
We’d like to announce the release of Delta Sharing 0.6.1, which introduces the following improvements and bug fixes.
Improvements:
- Spark connector changes to consume size from metadata. (#228)
- Improve Delta Sharing error messages (#235)
Bug fixes:
- Extends DeltaSharingProfileProvider to customize tablePath and refresher (#223)
- Refresh pre-signed urls for cdf and streaming queries (#221, #222)
- Allow 0 for versionAsOf parameter, to be consistent with Delta (#224)
- Fix partitionFilters issue: apply it to all file indices. (#227, #229)
Credits: Abhijit Chakankar, Lin Zhou
Delta Sharing 0.6.0
We are excited to announce the release of Delta Sharing 0.6.0, which introduces the following improvements.
Improvements:
- Support using a Delta Sharing table as a source in Spark Structured Streaming, which allows recipients to stay up to date with the shared data. (#189, #190, #194, #195, #198, #199, #200, #201, #204, #205, #207, #208, #209, #211, #212, #214, #216, #217, #218, #219)
- Fix a few nits in the PROTOCOL documentation (#213)
- Support the timestampAsOf parameter in the Delta Sharing data source. (#186, #187, #188)
Credits: Abhijit Chakankar, Lin Zhou, Xiaotong Sun
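
As an illustration of the streaming source and the timestampAsOf parameter, here is a minimal PySpark sketch. It assumes the Delta Sharing Spark connector is available to the session and that table_url has the form <profile-file-path>#<share-name>.<schema-name>.<table-name>. The function names are hypothetical, and the startingVersion option is an assumption based on the connector's documented reader options rather than something stated in these notes.

```python
def stream_shared_table(spark, table_url, starting_version=0):
    """Return a streaming DataFrame that follows a shared table (sketch).

    `spark` is an existing SparkSession; `table_url` is assumed to look like
    "<profile-file-path>#<share-name>.<schema-name>.<table-name>".
    """
    return (
        spark.readStream.format("deltaSharing")
        # Assumption: the stream starts from this table version.
        .option("startingVersion", str(starting_version))
        .load(table_url)
    )


def read_shared_table_as_of(spark, table_url, timestamp):
    """Return a batch DataFrame of the table as of `timestamp` (sketch)."""
    return (
        spark.read.format("deltaSharing")
        .option("timestampAsOf", timestamp)
        .load(table_url)
    )
```

Both helpers only build the reader; nothing is fetched until the returned DataFrame is used in a query.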
Delta Sharing 0.5.2
Delta Sharing 0.5.2 contains a single change, which adds the ability to override the HTTP headers included in requests to the Delta Sharing server.
- Add a Custom Http Header Provider (#192)
Credits: Xiaotong Sun
Delta Sharing 0.5.1
We are excited to announce the release of Delta Sharing 0.5.1, which introduces the following changes.
Improvements:
- Upgrade AWS SDK to 1.12.189 (#170)
- More tests on the error message when loading table fails (#164)
- Add ability to configure armeria server request timeout (#163)
- Documentation improvements (#171, #179)
Bug fixes:
Credits: Antonio Irizarry, Lin Zhou, Shixiong Zhu, Pat McCauley
Delta Sharing 0.5.0
We are excited to announce the release of Delta Sharing 0.5.0, which introduces the following improvements.
Improvements:
- Support for Change Data Feed which allows clients to fetch incremental changes for the shared tables. (#135, #136, #137, #138, #140, #141, #142, #145, #146, #147, #148, #149, #150, #151, #152, #153, #155, #159)
- Include response body in HTTPError exception in Python library (#124)
- Improve the error message for the /share/schema/table APIs (#120)
- Protocol and REST API documentation improvements (#121, #128, #131)
- Add query_table_version to the rest client (#111)
Credits: Abhijit Chakankar, Alex Ott, Lin Zhou, Shixiong Zhu, William Chau, Xiaotong Sun, harksin, Kohei Toshimitsu, Vuong Nguyen
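
To make the Change Data Feed request concrete, the sketch below builds the URL a client might call to fetch changes between two table versions. The /changes path and the startingVersion and endingVersion query parameters are assumptions taken from the protocol documentation as commonly described; the endpoint, share, schema, and table values are placeholders, and the function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def table_changes_url(endpoint, share, schema, table,
                      starting_version, ending_version=None):
    """Build the URL for a change data feed query (sketch, see assumptions)."""
    # Percent-encode each name so it is safe as a single path segment.
    path = "/shares/{}/schemas/{}/tables/{}/changes".format(
        quote(share, safe=""), quote(schema, safe=""), quote(table, safe="")
    )
    params = {"startingVersion": starting_version}
    if ending_version is not None:
        params["endingVersion"] = ending_version
    return endpoint.rstrip("/") + path + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder endpoint and names, for illustration only.
    print(table_changes_url(
        "https://sharing.example.com/delta-sharing",
        "share1", "default", "events", 0, 2,
    ))
```

A real request would also carry the recipient's bearer token in the Authorization header.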
Delta Sharing 0.4.0
We are excited to announce the release of Delta Sharing 0.4.0, which introduces the following improvements and fixes.
Improvements:
- Support Google Cloud Storage on Delta Sharing Server (#81, #105)
- Add a new API to get the metadata of a Share (#97)
- Protocol and REST API documentation enhancements (#85, #89, #93, #98)
- Allow for customization of recipient profile in Apache Spark connector (#99, #107)
Bug fixes:
- Block managed table creation for Delta Sharing to prevent user confusion (#92)
Credits: Denny Lee, Lin Zhou, Shixiong Zhu, William Chau, Xiaotong Sun, Kohei Toshimitsu
Delta Sharing 0.3.0
We are excited to announce the release of Delta Sharing 0.3.0, which introduces the following improvements and bug fixes:
Improvements:
- Support Azure Blob Storage and Azure Data Lake Gen2 in Delta Sharing Server (#56, #59)
- Apache Spark Connector now can send the limitHint parameter when a user query is using limit (#55)
- load_as_pandas in Python Connector now accepts a limit parameter to allow users fetching only a few rows to explore (#76)
- Apache Spark Connector will re-fetch pre-signed urls before they expire to support long running queries (#69)
- Add a new API to list all tables in a share to save network round trips (#63, #66, #67, #88)
- Add a User-Agent header to requests sent from the Apache Spark Connector and the Python Connector (#75)
- Add an optional expirationTime field to Delta Sharing Profile File Format to provide the token expiration time (#77)
Bug fixes:
- Fix a corner case that list_all_tables may not return correct results in the Python Connector (#84)
Credits: Denny Lee, Felix Cheung, Lin Zhou, Matei Zaharia, Shixiong Zhu, Will Girten, Xiaotong Sun, Yuhong Chen, kohei-tosshy, William Chau
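
The optional expirationTime field lets a client check whether a token is still valid before using it. The sketch below shows one way to do that. The example profile contents, the timestamp layout ("2021-11-12T00:12:29.0Z"), and the token_expired helper are illustrative assumptions, not part of the connectors.

```python
import json
from datetime import datetime, timezone

# Hypothetical profile file contents; the endpoint and token are placeholders.
EXAMPLE_PROFILE = """{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<token>",
  "expirationTime": "2021-11-12T00:12:29.0Z"
}"""


def token_expired(profile_json, now=None):
    """Return True if the profile's optional expirationTime has passed."""
    profile = json.loads(profile_json)
    expiration = profile.get("expirationTime")
    if expiration is None:
        return False  # the field is optional, so there is nothing to check
    # Assumes a UTC timestamp with fractional seconds and a trailing "Z".
    expires_at = datetime.strptime(
        expiration, "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at
```

Passing now explicitly keeps the check deterministic in tests; in normal use it defaults to the current UTC time.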
Delta Sharing 0.2.0
We are excited to announce the release of Delta Sharing 0.2.0, which introduces the following improvements and fixes multiple issues:
Improvements:
- Added official Docker images for Delta Sharing Server
- Added an examples project to show how to try the open Delta Sharing Server (#26)
- Added the conf directory to the Delta Sharing Server classpath to allow users to add their Hadoop configuration files in the directory (#45)
- Added retry with exponential backoff for REST requests in the Python connector (#49)
Bug fixes:
- Added the minimum fsspec requirement in the Python connector (#23)
- Fixed an issue when files in a table have no stats in the Python connector (#30)
- Improve error handling in Delta Sharing Server to report 400 Bad Request properly (#32)
- Fixed the table schema when a table is empty in the Python connector (#37)
- Fixed KeyError when there are no shared tables in the Python connector (#50)
Credits: Denny Lee, Matei Zaharia, Shixiong Zhu, Yaohua, Yuhong Chen, dobachi
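
For readers unfamiliar with the technique, retry with exponential backoff can be sketched as follows. This is a generic illustration, not the Python connector's actual implementation; the function name, parameters, and defaults are hypothetical.

```python
import time


def retry_with_backoff(request, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Call `request()` and retry on failure, doubling the wait each time.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once `max_retries` retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return request()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries; surface the last error
            sleep(base_delay * (2 ** attempt))
```

A production client would normally retry only on transient failures, such as timeouts or server-side errors, rather than on every exception.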
Delta Sharing 0.1.0
We are excited to announce the release of Delta Sharing 0.1.0.
Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enables organizations to share data in real time regardless of which computing platforms they use. It is a simple REST protocol that securely shares access to part of a cloud dataset and leverages modern cloud storage systems, such as S3, ADLS, or GCS, to reliably transfer data.
With Delta Sharing, a user accessing shared data can directly connect to it through pandas, Tableau, Apache Spark, Rust, Python, or dozens of other systems that support the open protocol, without having to deploy a specific compute platform first. This makes life simpler for both data providers and consumers. Data providers can share a dataset once to reach a broad range of consumers on any platform, and data consumers can get started using the data in minutes on their existing computing tools.
This repo includes the following components:
- Delta Sharing protocol specification.
- Python Connector: A Python library that implements the Delta Sharing Protocol to read shared tables as pandas DataFrames or Apache Spark DataFrames.
- Apache Spark Connector: An Apache Spark connector that implements the Delta Sharing Protocol to read shared tables from a Delta Sharing Server. The tables can then be accessed in SQL, Python, Java, Scala, or R.
- Delta Sharing Server: A reference implementation server for the Delta Sharing Protocol for development purposes. Users can deploy this server to share existing tables in Delta Lake and Apache Parquet format on modern cloud storage systems.
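
As a rough sketch of how the Python Connector is used, the snippet below builds a table URL and loads the table into pandas. It assumes the delta-sharing package is installed and that a profile file has been provided by the data provider; the helper names and the example path are hypothetical.

```python
def shared_table_url(profile_path, share, schema, table):
    """Build a table URL: '<profile-file-path>#<share>.<schema>.<table>'."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Load a shared table into a pandas DataFrame (sketch)."""
    # Imported lazily so the URL helper works without the package installed.
    import delta_sharing  # third-party: pip install delta-sharing

    url = shared_table_url(profile_path, share, schema, table)
    return delta_sharing.load_as_pandas(url)


if __name__ == "__main__":
    # Placeholder names; replace them with values from your provider.
    print(shared_table_url("/path/to/profile.share", "share1", "default", "events"))
```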
See the documentation for more details.