Unify compression codec and APIs across modules #24627

Open
xin-zhang2 opened this issue Feb 25, 2025 · 1 comment

Background

Presto uses different compression codecs and compressor interfaces across modules.

presto-spi

PageCompressor is defined in presto-spi as a compressor interface that is identical to io.airlift.compress.Compressor in aircompressor. The duplication exists to keep presto-spi free of extra dependencies. However, modules that depend on presto-spi need additional adapters to use the compressors already implemented in aircompressor.
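
To illustrate the adapter overhead described above, here is a minimal sketch. `LibraryCompressor` and `SpiPageCompressor` are hypothetical stand-ins for io.airlift.compress.Compressor and PageCompressor; their method shapes are assumptions for illustration, not copied from either project. `CopyCompressor` is a toy implementation that performs no real compression.

```java
import java.util.Objects;

// Hypothetical stand-in for the library-side compressor interface (assumed shape).
interface LibraryCompressor {
    int maxCompressedLength(int uncompressedSize);

    int compress(byte[] input, int inputOffset, int inputLength,
                 byte[] output, int outputOffset, int maxOutputLength);
}

// Hypothetical stand-in for the SPI-side interface, declared with the same shape.
interface SpiPageCompressor {
    int maxCompressedLength(int uncompressedSize);

    int compress(byte[] input, int inputOffset, int inputLength,
                 byte[] output, int outputOffset, int maxOutputLength);
}

// The adapter only forwards calls; it exists purely because the two
// structurally identical interfaces are unrelated types.
final class PageCompressorAdapter implements SpiPageCompressor {
    private final LibraryCompressor delegate;

    PageCompressorAdapter(LibraryCompressor delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate is null");
    }

    @Override
    public int maxCompressedLength(int uncompressedSize) {
        return delegate.maxCompressedLength(uncompressedSize);
    }

    @Override
    public int compress(byte[] input, int inputOffset, int inputLength,
                        byte[] output, int outputOffset, int maxOutputLength) {
        return delegate.compress(input, inputOffset, inputLength,
                output, outputOffset, maxOutputLength);
    }
}

// Toy implementation that copies bytes unchanged, used only to exercise the adapter.
final class CopyCompressor implements LibraryCompressor {
    @Override
    public int maxCompressedLength(int uncompressedSize) {
        return uncompressedSize;
    }

    @Override
    public int compress(byte[] input, int inputOffset, int inputLength,
                        byte[] output, int outputOffset, int maxOutputLength) {
        if (inputLength > maxOutputLength) {
            throw new IllegalArgumentException("output buffer too small");
        }
        System.arraycopy(input, inputOffset, output, outputOffset, inputLength);
        return inputLength;
    }
}

final class AdapterSketch {
    public static void main(String[] args) {
        SpiPageCompressor compressor = new PageCompressorAdapter(new CopyCompressor());
        byte[] input = {1, 2, 3, 4};
        byte[] output = new byte[compressor.maxCompressedLength(input.length)];
        int written = compressor.compress(input, 0, input.length, output, 0, output.length);
        System.out.println(written + " bytes written");
    }
}
```

If both sides shared a single interface, `PageCompressorAdapter` would be unnecessary and the library implementation could be passed in directly.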

presto-orc

presto-orc defines its own codecs in CompressionKind, supporting NONE, ZLIB, SNAPPY, LZ4, and ZSTD.

It also introduces the OrcDecompressor interface.

presto-parquet

presto-parquet defines its own ParquetCompressor interface.

presto-hive

presto-hive defines its own codecs in HiveCompressionCodec, supporting NONE, SNAPPY, GZIP, LZ4, and ZSTD.

Proposal

It would be better to unify the compression codec interfaces across modules.
Since most implementations of the interfaces defined in those modules are based on aircompressor, it would be reasonable to adopt its interfaces as the base abstraction and extend them where a module needs a specific implementation.
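
As a rough sketch of what a shared codec definition could look like, the following replaces the per-module enums with one enum plus per-module support sets. `UnifiedCompressionCodec` and `CodecSupport` are hypothetical names that do not exist in Presto; the support sets simply mirror the codec lists given above for presto-orc and presto-hive. ZLIB and GZIP are kept as distinct entries here, and whether they could share an implementation is left open.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical single codec enum shared by all modules.
enum UnifiedCompressionCodec {
    NONE, SNAPPY, GZIP, ZLIB, LZ4, ZSTD;

    // Case-insensitive lookup; throws IllegalArgumentException for unknown names.
    static UnifiedCompressionCodec fromName(String name) {
        return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
}

// Hypothetical holder for the subset of codecs each module accepts.
final class CodecSupport {
    static final Set<UnifiedCompressionCodec> ORC = Collections.unmodifiableSet(EnumSet.of(
            UnifiedCompressionCodec.NONE,
            UnifiedCompressionCodec.ZLIB,
            UnifiedCompressionCodec.SNAPPY,
            UnifiedCompressionCodec.LZ4,
            UnifiedCompressionCodec.ZSTD));

    static final Set<UnifiedCompressionCodec> HIVE = Collections.unmodifiableSet(EnumSet.of(
            UnifiedCompressionCodec.NONE,
            UnifiedCompressionCodec.SNAPPY,
            UnifiedCompressionCodec.GZIP,
            UnifiedCompressionCodec.LZ4,
            UnifiedCompressionCodec.ZSTD));

    private CodecSupport() {}

    // Returns true when the named codec is in the given module's support set.
    static boolean supportedBy(Set<UnifiedCompressionCodec> module, String codecName) {
        return module.contains(UnifiedCompressionCodec.fromName(codecName));
    }
}

final class CodecSketch {
    public static void main(String[] args) {
        System.out.println("ORC supports zlib: " + CodecSupport.supportedBy(CodecSupport.ORC, "zlib"));
        System.out.println("Hive supports zlib: " + CodecSupport.supportedBy(CodecSupport.HIVE, "zlib"));
    }
}
```

With a shared enum, each module would declare only which codecs it accepts rather than redefining the codecs themselves.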

Benefits

The compression codec definitions and implementations will be consistent across modules, ensuring better compression management, eliminating unnecessary adapters, and enhancing extensibility.

@xin-zhang2

@yingsu00
My initial thought is to unify the compression-related APIs listed above.
Does this proposal make sense? Or should we bring in others for a discussion?
