@yingsu00
My initial thought is to unify the compression-related APIs listed in this issue.
Does this proposal make sense, or should we bring in others for a discussion?
Background
Presto uses different compression codec and compressor interfaces across modules.
presto-spi
presto-spi defines PageCompressor, a compressor interface identical to io.airlift.compress.Compressor in aircompressor. The duplication was introduced to reduce the dependencies of presto-spi. However, modules that depend on presto-spi need additional adapters to use the compressors already implemented in aircompressor.
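To illustrate the adapter overhead described above, here is a minimal sketch. It is not Presto code: `LibraryCompressor` and `SpiPageCompressor` are hypothetical stand-ins for io.airlift.compress.Compressor and PageCompressor, the method shapes are assumed, and `CopyCompressor` is a toy that copies bytes unchanged so the example runs without aircompressor.

```java
import java.util.Objects;

public class CompressorAdapterSketch {
    // Hypothetical stand-in for io.airlift.compress.Compressor (method shapes assumed).
    interface LibraryCompressor {
        int maxCompressedLength(int uncompressedSize);

        int compress(byte[] input, int inputOffset, int inputLength,
                byte[] output, int outputOffset, int maxOutputLength);
    }

    // Hypothetical stand-in for the SPI's PageCompressor, which duplicates the same methods.
    interface SpiPageCompressor {
        int maxCompressedLength(int uncompressedSize);

        int compress(byte[] input, int inputOffset, int inputLength,
                byte[] output, int outputOffset, int maxOutputLength);
    }

    // The adapter a dependent module has to write: pure delegation, no added behavior.
    static final class LibraryCompressorAdapter implements SpiPageCompressor {
        private final LibraryCompressor delegate;

        LibraryCompressorAdapter(LibraryCompressor delegate) {
            this.delegate = Objects.requireNonNull(delegate, "delegate is null");
        }

        @Override
        public int maxCompressedLength(int uncompressedSize) {
            return delegate.maxCompressedLength(uncompressedSize);
        }

        @Override
        public int compress(byte[] input, int inputOffset, int inputLength,
                byte[] output, int outputOffset, int maxOutputLength) {
            return delegate.compress(input, inputOffset, inputLength,
                    output, outputOffset, maxOutputLength);
        }
    }

    // Toy implementation that copies bytes unchanged; stands in for a real codec.
    static final class CopyCompressor implements LibraryCompressor {
        @Override
        public int maxCompressedLength(int uncompressedSize) {
            return uncompressedSize;
        }

        @Override
        public int compress(byte[] input, int inputOffset, int inputLength,
                byte[] output, int outputOffset, int maxOutputLength) {
            if (inputLength > maxOutputLength) {
                throw new IllegalArgumentException("output buffer too small");
            }
            System.arraycopy(input, inputOffset, output, outputOffset, inputLength);
            return inputLength;
        }
    }

    public static void main(String[] args) {
        SpiPageCompressor compressor = new LibraryCompressorAdapter(new CopyCompressor());
        byte[] input = {1, 2, 3, 4};
        byte[] output = new byte[compressor.maxCompressedLength(input.length)];
        int written = compressor.compress(input, 0, input.length, output, 0, output.length);
        System.out.println("bytes written: " + written);
    }
}
```

Every method in the adapter only forwards to the delegate, which is the kind of boilerplate a shared interface would make unnecessary.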
presto-orc
presto-orc defines its own codec in CompressionKind, supporting NONE, ZLIB, SNAPPY, LZ4, and ZSTD.
It also introduces its own OrcDecompressor interface.
presto-parquet
presto-parquet defines its own ParquetCompressor interface.
presto-hive
presto-hive defines its own codec in HiveCompressionCodec, supporting NONE, SNAPPY, GZIP, LZ4, and ZSTD.
Proposal
It would be better to unify the compression codec interfaces across modules.
Since most implementations of the interfaces defined in those modules are based on aircompressor, it would be reasonable to adopt aircompressor's interfaces as the base abstraction and extend them where a module needs a specific implementation.
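One possible shape for the codec side of this proposal is sketched below. It is an assumption-laden illustration, not an agreed design: `CompressionCodec`, `ORC_CODECS`, `HIVE_CODECS`, and `resolve` are hypothetical names. The idea is a single shared enum holding the union of the codecs listed above, with each module declaring the subset it supports instead of defining its own enum. ZLIB and GZIP are kept as separate entries because they are different container formats.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

public class UnifiedCodecSketch {
    // Hypothetical shared enum: the union of codecs listed for presto-orc and presto-hive.
    enum CompressionCodec { NONE, ZLIB, GZIP, SNAPPY, LZ4, ZSTD }

    // Each module declares the subset it supports rather than its own codec enum.
    static final Set<CompressionCodec> ORC_CODECS = EnumSet.of(
            CompressionCodec.NONE, CompressionCodec.ZLIB, CompressionCodec.SNAPPY,
            CompressionCodec.LZ4, CompressionCodec.ZSTD);

    static final Set<CompressionCodec> HIVE_CODECS = EnumSet.of(
            CompressionCodec.NONE, CompressionCodec.SNAPPY, CompressionCodec.GZIP,
            CompressionCodec.LZ4, CompressionCodec.ZSTD);

    // Resolves a configured codec name and rejects codecs the module does not support.
    static CompressionCodec resolve(String name, Set<CompressionCodec> supported) {
        CompressionCodec codec = CompressionCodec.valueOf(name.toUpperCase(Locale.ROOT));
        if (!supported.contains(codec)) {
            throw new IllegalArgumentException("Unsupported codec for this module: " + codec);
        }
        return codec;
    }

    public static void main(String[] args) {
        System.out.println(resolve("zstd", ORC_CODECS));
        System.out.println(resolve("gzip", HIVE_CODECS));
    }
}
```

In a fuller version, each enum constant could also supply the matching aircompressor compressor and decompressor, so that modules would no longer need their own factories.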
Benefits
Compression codec definitions and implementations will be consistent across modules, which makes compression easier to manage, eliminates unnecessary adapters, and improves extensibility.