rust-v0.16.0 (2023-09-27)
Implemented enhancements:
- Expose Optimize option min_commit_interval in Python #1640
- Expose create_checkpoint_for #1513
- integration tests regularly fail for HDFS #1428
- Add Support for Microsoft OneLake #1418
- add support for atomic rename in R2 #1356
Fixed bugs:
- Writing with large arrow types (e.g. large_utf8), writes wrong partition encoding #1669
- [python] Different stringification of partition values in reader and writer #1653
- Unable to interface with data written from Spark Databricks #1651
- get_last_checkpoint does some unnecessary listing #1643
- PartitionWriter's buffer_len doesn't include incomplete row groups #1637
- Slack community invite link has expired #1636
- delta-rs does not appear to support tables with liquid clustering #1626
- Internal Parquet panic when using a Map type. #1619
- partition_by with "$" on local filesystem #1591
- ProtocolChanged error when performing append write #1585
- Unable to cargo update using git tag or rev on Rust 1.70 #1580
- NoMetadata error when reading deltatable #1562
- Cannot read delta table: Delta protocol violation #1557
- Update the CODEOWNERS to capture the current reviewers and contributors #1553
- [Python] Incorrect file URIs when partition values contain escape character #1533
- add documentation how to Query Delta natively from datafusion #1485
- Python: write_deltalake to ADLS Gen2 issue #1456
- Partition values that have been url encoded cannot be read when using deltalake #1446
- Error optimizing large table #1419
- Cannot read partitions with special characters (including space) with pyarrow >= 11 #1393
- ImportError: deltalake/_internal.abi3.so: cannot allocate memory in static TLS block #1380
- Invalid JSON in log record missing field schemaString for DLT tables #1302
- Special characters in partition path not handled locally #1299
Merged pull requests:
- chore: bump rust crate version #1675 (rtyler)
- fix: change partitioning schema from large to normal string for pyarrow<12 #1671 (ion-elgreco)
- feat: allow to set large dtypes for the schema check in write_deltalake #1668 (ion-elgreco)
- docs: small consistency update in guide and readme #1666 (ion-elgreco)
- fix: exception string in writer.py #1665 (sebdiem)
- chore: increment python library version #1664 (wjones127)
- docs: fix some typos #1662 (ion-elgreco)
- fix: more consistent handling of partition values and file paths #1661 (roeap)
- docs: add docstring to protocol method #1660 (MrPowers)
- docs: make docs.rs build docs with all features enabled #1658 (simonvandel)
- fix: enable offset listing for s3 #1654 (eeroel)
- chore: fix the incorrect Slack link in our readme #1649 (rtyler)
- fix: compensate for invalid log files created by Delta Live Tables #1647 (rtyler)
- chore: proposed updated CODEOWNERS to allow better review notifications #1646 (rtyler)
- feat: expose min_commit_interval to optimize.compact and optimize.z_order #1645 (ion-elgreco)
- fix: avoid excess listing of log files #1644 (eeroel)
- fix: introduce support for Microsoft OneLake #1642 (rtyler)
- fix: explicitly require chrono 0.4.31 or greater #1641 (rtyler)
- fix: include in-progress row group when calculating in-memory buffer length #1638 (BnMcG)
- chore: relax chrono pin to 0.4 #1635 (houqp)
- chore: update datafusion to 31, arrow to 46 and object_store to 0.7 #1634 (houqp)
- docs: update Readme #1633 (dennyglee)
- chore: pin the chrono dependency #1631 (rtyler)
- feat: pass known file sizes to filesystem in Python #1630 (eeroel)
- feat: implement parsing for the new domainMetadata actions in the commit log #1629 (rtyler)
- ci: fix python release #1624 (wjones127)
- ci: extend azure timeout #1622 (wjones127)
- feat: allow multiple incremental commits in optimize #1621 (kvap)
- fix: change map nullable value to false #1620 (cmackenzie1)
- Introduce the changelog for the last couple releases #1617 (rtyler)
- chore: bump python version to 0.10.2 #1616 (wjones127)
- perf: avoid holding GIL in DeltaFileSystemHandler #1615 (wjones127)
- fix: don't re-encode paths #1613 (wjones127)
- feat: use url parsing from object store #1592 (roeap)
- feat: buffered reading of transaction logs #1549 (eeroel)
- feat: merge operation #1522 (Blajda)
- feat: expose create_checkpoint_for to the public #1514 (haruband)
- docs: update Readme #1440 (roeap)
- refactor: re-organize top level modules #1434 (roeap)
- feat: integrate unity catalog with datafusion #1338 (roeap)
rust-v0.15.0 (2023-09-06)
Implemented enhancements:
- Configurable number of retries for transaction commit loop #1595
Fixed bugs:
- Unable to read table using VM Managed Identity on Azure #1462
- Unable to query by partition column #1445
Merged pull requests:
- fix: update python test #1608 (wjones127)
- chore: update datafusion to 30, arrow to 45 #1606 (scsmithr)
- fix: just make pyarrow 12 the max #1603 (wjones127)
- fix: support partial statistics in JSON #1599 (CurtHagenlocher)
- feat: allow configurable number of commit attempts #1596 (cmackenzie1)
- fix: querying on date partitions (fixes #1445) #1594 (watfordkcf)
- refactor: clean up arrow schema defs #1590 (polynomialherder)
- feat: add metadata for operations::write::WriteBuilder #1584 (abhimanyusinghgaur)
- feat: add metadata for deletion vectors #1583 (aersam)
- fix: remove alpha classifier #1578 (marcelotrevisani)
- refactor: use pa.table.cast in delta_arrow_schema_from_pandas #1573 (ion-elgreco)
rust-v0.14.0 (2023-08-01)
Implemented enhancements:
Fixed bugs:
- Excessive integration test sizes causing builds to fail #1550
- Slack invite link is not working #1530
Merged pull requests:
- fix: correct whitespace in delta protocol reader minimum version error message #1576 (polynomialherder)
- chore: move deps to [workspace.dependencies] #1575 (cmackenzie1)
- chore: update datafusion to 28 and arrow to 43 #1571 (cmackenzie1)
- ci: don't run benchmark in debug mode #1566 (wjones127)
- ci: install newer rust for macos python release #1565 (wjones127)
- feat: make find_files public #1560 (yjshen)
- feat!: bulk delete for vacuum #1556 (Blajda)
- chore: address some integration test bloat of disk usage for development #1552 (rtyler)
- docs: port docs to mkdocs #1548 (MrPowers)
- chore: disable incremental builds in CI for saving space #1545 (rtyler)
- fix: revert premature merge of an attempted fix for binary column statistics #1544 (rtyler)
- chore: increment python version #1542 (wjones127)
- feat: add restore command in python binding #1529 (loleek)
rust-v0.13.1 (2023-07-18)
Fixed bugs:
- Revert premature merge of an attempted fix for binary column statistics #1544
rust-v0.13.0 (2023-07-15)
Implemented enhancements:
- Add nested struct supports #1518
- Support FixedLenByteArray UUID statistics as a logical scalar #1483
- Exposing create_add in the API #1458
- Update features table on README #1404
- docs(python): show data catalog options in Python API reference #1347
- Add optimization to only list log files starting at a certain name #1252
- Support configuring parquet compression #1235
- parallel processing in Optimize command #1171
Fixed bugs:
- get_add_actions() MAX is not showing complete value #1534
- Can't get stats's minValues in add actions #1515
- Pyarrow is_null filter not working as expected after loading using deltalake #1496
- Can't write to table that uses generated columns #1495
- Json error: Binary is not supported by JSON when writing checkpoint files #1493
- _last_checkpoint size field is incorrect #1468
- Error when Z Ordering a larger dataset #1459
- Timestamp parsing issue #1455
- File options are ignored when writing delta #1444
- Slack Invite Link No Longer Valid #1425
- cleanup_metadata doesn't remove .checkpoint.parquet files #1420
- The test of reading the data from the blob storage located in Azurite container failed #1415
- The test of reading the data from the bucket located in Minio container failed #1408
- Datafusion: unreachable code reached when parsing statistics with missing columns #1374
- vacuum is very slow on Cloudflare R2 #1366
Closed issues:
- Expose Compression Options or WriterProperties for writing to Delta #1469
- Support out-of-core Z-order using DataFusion #1460
- Expose Z-order in Python #1442
Merged pull requests:
- chore: fix the latest clippy warnings with the newer rustc's #1536 (rtyler)
- docs: show data catalog options in Python API reference #1532 (omkar-foss)
- fix: handle nulls in file-level stats #1520 (wjones127)
- feat: add nested struct supports #1519 (haruband)
- fix: tiny typo in AggregatedStats #1516 (haruband)
- refactor: unify with_predicate for delete ops #1512 (Blajda)
- chore: remove deprecated table functions #1511 (roeap)
- chore: update datafusion and related crates #1504 (roeap)
- feat: implement restore operation #1502 (loleek)
- chore: fix mypy failure #1500 (wjones127)
- fix: avoid writing statistics for binary columns to fix JSON error #1498 (ChewingGlass)
- feat(rust): expose WriterProperties method on RecordBatchWriter and DeltaWriter #1497 (theelderbeever)
- feat: add UUID statistics handling #1484 (atefsaw)
- feat: expose create_add to the public #1482 (atefsaw)
- fix: add sizeInBytes to _last_checkpoint and change size to # of actions #1477 (cmackenzie1)
- fix(python): match Field signatures #1463 (guilhem-dvr)
- feat: handle larger z-order jobs with streaming output and spilling #1461 (wjones127)
- chore: increment python version #1449 (wjones127)
- chore: upgrade to arrow 40 and datafusion 26 #1448 (rtyler)
- feat(python): expose z-order in Python #1443 (wjones127)
- ci: prune CI/CD pipelines #1433 (roeap)
- refactor: remove LoadCheckpointError and ApplyLogError #1432 (roeap)
- feat: update writers to include compression method in file name #1431 (Blajda)
- refactor: move checkpoint and errors into separate module #1430 (roeap)
- feat: add z-order optimize #1429 (wjones127)
- fix: casting when data to be written does not match table schema #1427 (Blajda)
- docs: update README.adoc to fix expired Slack link #1426 (dennyglee)
- chore: remove no-longer-necessary build.rs for Rust bindings #1424 (rtyler)
- chore: remove the delta-checkpoint lambda which I have moved to a new repo #1423 (rtyler)
- refactor: rewrite redundant_async_block #1422 (cmackenzie1)
- fix: update cleanup regex to include checkpoint.parquet files #1421 (cmackenzie1)
- docs: update features table in README #1414 (ognis1205)
- fix: get_prune_stats returns homogenous ArrayRef #1413 (cmackenzie1)
- feat: explicit python exceptions #1409 (roeap)
- feat: implement update operation #1390 (Blajda)
- feat: allow concurrent file compaction #1383 (wjones127)
rust-v0.12.0 (2023-05-30)
Implemented enhancements:
- Release delta-rs 0.11.0 (next release after 0.10.0) #1362
- Support writing statistics for date columns in Rust #1209
Fixed bugs:
- Rust writer in operations makes a lot of data copies #1394
- Unable to read timestamp fields from column statistics #1372
- Unable to write custom metadata via configuration since version 0.9.0 #1353
- .get_add_actions() returns wrong column statistics when dataSkippingNumIndexedCols property of the table was changed #1223
- Ensure decimal statistics are written correctly in Rust #1208
Merged pull requests:
- feat: add list_with_offset to DeltaObjectStore #1410 (ognis1205)
- chore: type-check friendlier exports #1407 (roeap)
- chore: remove ancillary crates from the git tree #1406 (rtyler)
- chore: bump the version for the next release #1405 (rtyler)
- feat: more efficient parquet writer and more statistics #1397 (wjones127)
- perf: improve record batch partitioning #1396 (roeap)
- chore: bump datafusion to 25 #1389 (roeap)
- refactor!: remove DeltaDataType aliases #1388 (cmackenzie1)
- feat: vacuum with concurrent requests #1382 (wjones127)
- feat: add datafusion storage catalog #1381 (roeap)
- docs: updated schema.rs to use the right signature for decimal data type in documentation #1377 (rahulj51)
- fix: delete operation when partition and non partition columns are used #1375 (Blajda)
- fix: add conversion for string for Field::TimestampMicros (#1372) #1373 (cmackenzie1)
- fix: allow user defined config keys #1365 (roeap)
- ci: disable full debug symbol generation #1364 (roeap)
- fix: include stats for all columns (#1223) #1342 (mrjoe7)
rust-v0.11.0 (2023-05-12)
Implemented enhancements:
- Implement simple delete case #832
Merged pull requests:
- chore: update Rust package version #1346 (rtyler)
- fix: replace deprecated arrow::json::reader::Decoder #1226 (rtyler)
- feat: delete operation #1176 (Blajda)
- feat: add wasbs to known schemes #1345 (iajoiner)
- test: add some missing unit and doc tests for DeltaTablePartition #1341 (rtyler)
- feat: write command improvements #1267 (roeap)
- feat: added support for Databricks Unity Catalog #1331 (nohajc)
- fix: double url encode of partition key #1324 (mrjoe7)
rust-v0.10.0 (2023-05-02)
Implemented enhancements:
- Support Optimize on non-append-only tables #1125
Fixed bugs:
- DataFusion integration incorrectly handles partition columns defined "first" in schema #1168
- Datafusion: SQL projection returns wrong column for partitioned data #1292
- Unable to query partitioned tables #1291
Merged pull requests:
- chore: add deprecation notices for commit logic on DeltaTable #1323 (roeap)
- fix: handle local paths on windows #1322 (roeap)
- fix: scan partitioned tables with datafusion #1303 (roeap)
- fix: allow special characters in storage prefix #1311 (wjones127)
- feat: upgrade to Arrow 37 and Datafusion 23 #1314 (rtyler)
- Hide the parquet/json feature behind our own JSON feature #1307 (rtyler)
- Enable the json feature for the parquet crate #1300 (rtyler)
rust-v0.9.0 (2023-04-14)
Implemented enhancements:
- hdfs support #300
- Add decimal primitive type to document #1280
- Improve error message when filtering on non-existent partition columns #1218
Fixed bugs:
- Datafusion table provider: issues with timestamp types #441
- Not matching column names when creating a RecordBatch from MapArray #1257
- All stores created using DeltaObjectStore::new have an identical object_store_url #1188
Merged pull requests:
- Upgrade datafusion to 22 which brings arrow upgrades with it #1249 (rtyler)
- chore: df / arrow changes after update #1288 (roeap)
- feat: read schema from parquet files in datafusion scans #1266 (roeap)
- HDFS storage support via datafusion-objectstore-hdfs #1279 (iajoiner)
- Add description of decimal primitive to SchemaDataType #1281 (ognis1205)
- Fix names and nullability when creating RecordBatch from MapArray #1258 (balbok0)
- Simplify the Store Backend Configuration code #1265 (mrjoe7)
- feat: optimistic transaction protocol #632 (roeap)
- Write support for additional Arrow datatypes #1044 (chitralverma)
- Unique delta object store url #1212 (gruuya)
- improve err msg on use of non-partitioned column #1221 (marijncv)
rust-v0.8.0 (2023-03-10)
Implemented enhancements:
- feat(rust): support additional types for partition values #1170
Fixed bugs:
- File pruning does not occur on partition columns #1175
- Bug: Error loading Delta table locally #1157
- Deltalake 0.7.0 with s3 feature compilation error due to rusoto_dynamodb version conflict #1191
- Writing from a Delta table scan using WriteBuilder fails due to missing object store #1186
Merged pull requests:
- build(deps): bump datafusion #1217 (roeap)
- Implement pruning on partition columns #1179 (Blajda)
- feat: enable passing storage options to Delta table builder via Datafusion's CREATE EXTERNAL TABLE #1043 (gruuya)
- feat: typed commit info #1207 (roeap)
- add boolean, date, timestamp & binary partition types #1180 (marijncv)
- feat: extend configuration handling #1206 (marijncv)
- fix: load command for local tables #1205 (roeap)
- Enable passing Datafusion session state to WriteBuilder #1187 (gruuya)
- chore: increment dynamodb_lock version #1202 (wjones127)
- fix: update out-of-date doc about datafusion #1183 (xudong963)
- feat: move and update Optimize operation #1154 (roeap)
- add test for extract_partition_values #1159 (marijncv)
- fix typo #1166 (spebern)
- chore: remove star dependencies #1139 (wjones127)
rust-v0.7.0 (2023-02-11)
Implemented enhancements:
- Support FSCK REPAIR TABLE Operation #1092
- Expose the Delta Log in a DataFrame that's easy for analysis #1031
- Provide case-insensitive storage options in backend #999
- Support local file path in CreateBuilder::with_location() #998
- Save operational params in the same way with delta io #1054 (ismoshkov)
Fixed bugs:
- DeltaTable DataFusion TableProvider does not support filter pushdown #1064
- DeltaTable DataFusion scan does not prune files properly #1063
- deltalake.DeltaTable constructor hangs in Jupyter #1093
- Transaction log JSON formatting issue when writing data via Python bindings #1017
- crates.io entry is missing link to rustdoc documentation #1076
- URL Registered with ObjectStore registry is different from url in DeltaScan #1018
- Not able to connect to Azure Storage with client id/secret #977
- Deltalake 0.5 crate s3 feature dynamodb version mismatch #973
- Overwrite mode does not work with Azure #939
- Use Chrono without default features #914
- cargo test does not run due to tls conflict #985
- Azure SAS authorization fails with <AuthenticationErrorDetail>Signature fields not well formed. #910
Merged pull requests:
- Make rustls default across all packages #1097 (wjones127)
- Implement filesystem check #1103 (Blajda)
- refactor: move vacuum command to operations module #1045 (roeap)
- feat: enable passing storage options to Delta table builder via DataFusion's CREATE EXTERNAL TABLE #1043 (gruuya)
- feat: improve storage location handling #1065 (roeap)
- Fix to support UTC timezone #1022 (andrei-ionescu)
- feat: harmonize and simplify storage configuration #1052 (roeap)
- feat: expose function to get table of add actions #1033 (wjones127)
- fix: change unexpected field logging level to debug #1112 (houqp)
- fix: datafusion predicate pushdown and dependencies #1071 (roeap)
- fix: azure sas key url encoding #1036 (roeap)
- Add provisional workaround to support CDC #1039 #1042 (Fazzani)
- improve debuggability of json ser/de errors #1119 (houqp)
- Add an example of writing to a delta table with a RecordBatch #1085 (rtyler)
- minor: optimize partition lookup for vacuum loop #1120 (houqp)
- Add missing documentation metadata to Cargo.toml #1077 (johnbatty)
- add test for null_count_schema_for_fields #1135 (marijncv)
- add test for min_max_schema_for_fields #1122 (marijncv)
- add test for get_boolean_from_metadata #1121 (marijncv)
- add test for left_larger_than_right #1110 (marijncv)
- Add test for: to_scalar_value #1086 (marijncv)
- Fix typo in delta-inspect #1072 (byteink)
- chore: update datafusion #1114 (roeap)
rust-v0.6.0 (2022-12-16)
Implemented enhancements:
- Support Apache Arrow DataFusion 15 #1020
- Python package: Loosen version requirements for maturin #1004
- Remove Cargo.lock from library crates and add Cargo.lock to binary ones #1000
- More frequent Rust releases #969
- Thoughts on adding read_delta to pandas #869
- Add the support of the AWS_PROFILE environment variable for S3 #986 (fvaleye)
Fixed bugs:
- Azure SAS signatures ending in "=" don't work #1003
- Fail to compile deltalake crate, need to update dynamodb_lock in crates.io #1002
- error reading delta table to pandas: runtime dropped the dispatch task #975
- MacOS arm64 wheels are generated incorrectly #972
- Overwrite creates new file #960
- The written delta file has corrupted structure #956
- Write mode doesn't work with Azure storage #955
- Python: We don't error on reader protocol v2 #886
- Cannot open a deltatable in S3 using AWS_PROFILE based credentials from a local machine #855
Merged pull requests:
- Support DataFusion 15 #1021 (andrei-ionescu)
- fix truncating signature on SAS #1007 (damiondoesthings)
- Loosen version requirement for maturin #1005 (gyscos)
- Update .gitignore and add/remove Cargo.lock when appropriate #1001 (iajoiner)
- fix: get azure client secret from config #981 (roeap)
- feat: check invariants in write command #980 (roeap)
- Add a new release github action for Python binding: macos with universal2 wheel #976 (fvaleye)
- Bump version of the Python binding to 0.6.4 #970 (fvaleye)
- Handle pandas timestamps #958 (hayesgb)
- test(python): add azure integration tests #912 (wjones127)
* This Changelog was automatically generated by github_changelog_generator