Conduct market research on c̶a̶t̶a̶l̶o̶g̶s̶ metastores #141
Comments
So I think it would be fair to say most mature enterprises will start to align around these table catalogs, mostly for governance reasons but also for the other benefits afforded by the open table formats they use. Confusingly, this is all slightly different from a category sometimes called Enterprise Data Catalogs, which covers players like Alation, Amundsen, and Atlan. These will try to position themselves as a 'catalog of catalogs':
From a UX point of view I think it would be nice for a user to configure their connections
my_dataset:
  type: UnityCatalogDataset
  # no credentials or other stuff needed
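To make the idea above concrete, here is a minimal Python sketch of how a catalog entry with only a `type` could work if connection details were registered once elsewhere. Everything in it is hypothetical: `CatalogConnection`, `register_connection`, `build_dataset`, and this `UnityCatalogDataset` class are illustrative names, not existing Kedro or Unity Catalog APIs, and using the entry name as the table name is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CatalogConnection:
    """Hypothetical connection settings, configured once per project."""
    endpoint: str
    token: str


# Hypothetical registry of named connections, filled once at startup.
_CONNECTIONS: Dict[str, CatalogConnection] = {}


def register_connection(name: str, connection: CatalogConnection) -> None:
    """Store a connection so dataset entries do not have to repeat it."""
    _CONNECTIONS[name] = connection


class UnityCatalogDataset:
    """Hypothetical dataset: it only knows its table and a connection name."""

    def __init__(self, table: str, connection: str = "default") -> None:
        self._table = table
        self._connection_name = connection

    def describe(self) -> dict:
        # The connection is resolved lazily from the shared registry.
        conn = _CONNECTIONS[self._connection_name]
        return {"table": self._table, "endpoint": conn.endpoint}


_DATASET_TYPES = {"UnityCatalogDataset": UnityCatalogDataset}


def build_dataset(name: str, entry: dict):
    """Build a dataset from a catalog entry such as {"type": "UnityCatalogDataset"}.

    Assumption: the entry name doubles as the table name unless "table" is given.
    """
    options = {key: value for key, value in entry.items() if key != "type"}
    options.setdefault("table", name)
    return _DATASET_TYPES[entry["type"]](**options)
```

With a connection registered once, `build_dataset("my_dataset", {"type": "UnityCatalogDataset"})` needs no credentials in the entry itself.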
Definitely. Someone (or we ourselves) needs to bring some clarity to the terminology. I really like this exercise on orchestrators (infrastructure, scheduling, asset) for example.
We've traditionally mixed the location of the data with the data type (in-memory), which gave us a simple design that nonetheless exhibits significant flaws (kedro-org/kedro#1936 (comment), kedro-org/kedro#770 (comment)). If anything, maybe we should do a better job at separating those, given that your node functions are intimately tied to the in-memory representation of the data (if your node expects a But this is a topic for kedro-org/kedro#1936.
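As a rough illustration of what separating location from in-memory type could mean, here is a small Python sketch. It is not Kedro's design: `Location`, `LOADERS`, and `load` are hypothetical names, and the in-memory "storage" is a stand-in for a real backend.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Location:
    """Where the data lives; says nothing about its in-memory representation."""
    uri: str


def _as_text(raw: str) -> str:
    """Return the raw payload unchanged."""
    return raw


def _as_rows(raw: str) -> list:
    """Parse the raw payload as CSV into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


# Hypothetical registry keyed by the in-memory type the node expects.
LOADERS: Dict[type, Callable[[str], object]] = {str: _as_text, list: _as_rows}


def load(location: Location, as_type: type, read_raw: Callable[[str], str]):
    """Fetch raw data from a location, then convert it to the requested type."""
    return LOADERS[as_type](read_raw(location.uri))
```

The same `Location` can then be loaded as different in-memory types, for example `load(loc, list, storage.__getitem__)` versus `load(loc, str, storage.__getitem__)` with `storage` being a plain dictionary of URI to text.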
Update: Unity Catalog now has a UI: https://github.com/unitycatalog/unitycatalog-ui
There are currently 3 documented integrations (Daft, DuckDB, Trino): http://docs.unitycatalog.io/
And it looks like Spark 4.0 + Delta Lake 4.0 might give a better view on how to use Unity Catalog: https://books.japila.pl/unity-catalog-internals/spark-integration/#demo
But
Polaris was open sourced 3 days ago, see apache/polaris#2 and the blog announcement: https://www.snowflake.com/blog/polaris-catalog-open-source/ Seems to be based on Apache Nessie by Dremio?
To note, both Unity Catalog and Polaris implement Apache Iceberg's REST catalog specification. Looks like we have a winner in terms of HTTP APIs at least.
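For a sense of what a shared HTTP API buys us, here is a small Python sketch that builds a few endpoint URLs following the path layout of the Iceberg REST catalog OpenAPI spec as I understand it (`/v1/config`, `/v1/{prefix}/namespaces`, `/v1/{prefix}/namespaces/{namespace}/tables`). The base URL is a made-up placeholder, the helper names are hypothetical, and joining multi-level namespaces with the `0x1F` unit separator is assumed to be the default encoding; check the spec version your server implements.

```python
from typing import Sequence
from urllib.parse import quote


def _join(base_url: str, *segments: str) -> str:
    """Join URL segments, skipping empty ones (such as an unset prefix)."""
    parts = [base_url.rstrip("/")] + [s.strip("/") for s in segments if s]
    return "/".join(parts)


def config_url(base_url: str) -> str:
    """URL of the catalog configuration endpoint."""
    return _join(base_url, "v1", "config")


def namespaces_url(base_url: str, prefix: str = "") -> str:
    """URL for listing namespaces; the prefix is optional and server-provided."""
    return _join(base_url, "v1", prefix, "namespaces")


def tables_url(base_url: str, namespace: Sequence[str], prefix: str = "") -> str:
    """URL for listing tables in a (possibly multi-level) namespace."""
    # Assumption: namespace levels are joined with the 0x1F unit separator
    # and then percent-encoded as a single path segment.
    encoded = quote("\x1f".join(namespace), safe="")
    return _join(base_url, "v1", prefix, "namespaces", encoded, "tables")
```

Because only the base URL changes, the same client code could in principle point at either catalog, e.g. `tables_url("http://localhost:8181/api/catalog", ["analytics", "daily"])`.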
Some questions I posed about UC: unitycatalog/unitycatalog#208 (comment)
At the moment, each platform offers full read & write capabilities only for its own catalog, and read-only capabilities for its competitors' catalogs:
(source)
And what's more important: data catalogs aren't new, but we're seeing catalogs created for different use cases and business needs: technical, business, and operational (source).
These are just some open source ones [1] that have been in the news recently. But there's also Apache Nessie, the Hive Metastore, the Iceberg REST Catalog, and probably others I'm missing. Then there are the commercial, vendor-driven ones.
And then we have... the Kedro Catalog!
We've sometimes gotten questions like "how does the Kedro Catalog compare to the Unity Catalog?", and the answer is that they're complementary, but this is not immediately clear to users (see kedro-org/kedro-plugins#542).
It's very clear that this is going to be a hot topic of discussion in the data engineering space in the coming months, so we should have a good answer to how Kedro interacts with all of these.
Footnotes
1. counting Polaris as open source