Releases: datahub-project/datahub
v0.14.0
Known Issues
There is an issue with kafka-setup missing a script for new deployments; a hotfix will be released shortly.
What's Changed
- fix(ingest/unity-catalog) upstream lineage for hive_metastore external table with s3 location by @dushayntAW in #10546
- feat(ingestion/looker): ingest explore tags into the DataHub by @sid-acryl in #10547
- fix(instropection): fix configuration application order by @david-leifker in #10579
- fix(ingest/slack): pull real names by @hsheth2 in #10565
- fix(ingest): Remove env deprecation message by @treff7es in #10581
- test(ingest/sql): refactor CLL generator + add tests by @hsheth2 in #10580
- docs(remote-ingestion): update description and deployment instructions by @darnaut in #10574
- fix(ingest): DataProcessInstance.emit_process_end() ignored start_timestamp_millis by @obaltian in #10539
- fix(ingest/metabase): Fix for query template expressions and invalid URNs for Text Cards by @pulsar256 in #10381
- feat(graphql): Support tagging incidents and assertions via GraphQL API by @jjoyce0510 in #10575
- docs(update): updating-datahub by @david-leifker in #10585
- docs: reorder semantics guide to the bottom by @yoonhyejin in #10541
- feat(auth): add viewTests platform privilege by @ksrinath in #10413
- feat(ingestion/SageMaker): Remove deprecated apis and add stateful ingestion capability by @TonyOuyangGit in #10573
- fix(search): fix autocomplete filter by @david-leifker in #10599
- fix(ingest/snowflake): handle column level lineage for dbt temporary tables by @john-claro-cko in #10258
- fix(mae-consumer): fix UpdateIndicesHook ignoring events with forceIndexing property set to true by @Masterchen09 in #10586
- feat(fieldpaths): prevent duplicate field paths by @david-leifker in #10590
- docs: update Town Hall page by @maggiehays in #10588
- fix(search): implement queryByDefault annotation for SearchableRef by @david-leifker in #10603
- fix(ingest/sagemaker): remove unsupported config by @hsheth2 in #10606
- feat(neo4j): combine neo4j statements in addEdge into one statement by @deepgarg-visa in #10598
- feat(neo4j): improve neo4j read query performance by specifying labels by @deepgarg-visa in #10593
- feat(ingest): fetch connections from the backend by @hsheth2 in #10511
- feat(graphql): custom complexity calculator and separate configurable thread pool for graphQL by @RyanHolstien in #10562
- feat(ingest): enable stateful ingestion safety threshold by @hsheth2 in #10516
- fix(ingest/spark): Bumping OpenLineage version to 0.14.0 by @treff7es in #10559
- fix(ingest/dbt): only generate one subtype by @hsheth2 in #10615
- fix(ingest/snowflake): make test connection logs less noisy by @hsheth2 in #10587
- fix(ingest): move status aspect fixer logic by @hsheth2 in #10591
- feat(data quality): update models, add assertions cli with snowflake integration by @mayurinehate in #10602
- fix(gms/autosuggestion): autosuggestion query not returning the result if the query text has a prefix or suffix '-' on the search field by @siladitya2 in #10512
- feat(consumers): mce-consumer throttling based on mae-consumer lag by @david-leifker in #10626
- Add support for runAssertion, runAssertions, and runAssertionsForAsset APIs by @noggi in #10605
- feat(graphql) data contract resolvers for graphql by @jayacryl in #10618
- Revert "feat(graphql) data contract resolvers for graphql" by @jayacryl in #10631
- fix(views): Add relationship annotation to GlobalViewsSettings urn by @pedro93 in #10597
- feat(cli) Delete form references when using delete CLI by @chriscollins3456 in #10629
- feat(ingest/looker): add ownership info to independent looks by @k7ragav in #10624
- log(custom-plugins): add additional logging for spring plugins by @david-leifker in #10627
- refactor(ui/glossary): Clean up term deletion by @asikowitz in #10589
- fix(views): handle unknown view when resolving a view to a filter by @darnaut in #10640
- feat(lineage): change query structure for explored hop limit by @RyanHolstien in #10607
- feat(ingest): measure sink bottlenecking by @hsheth2 in #10628
- fix(ingest/iceberg): update iceberg source to support newer versions of pyiceberg at runtime by @cccs-eric in #10614
- feat(ingest/redshift): Adding way to filter s3 paths in Redshift Source by @treff7es in #10622
- feat(businessAttribute): parallelize-business-attribute-propagation by @deepgarg-visa in #10638
- docs(ingest): remove trailing comma on athena permission by @nephtyws in #10634
- doc(roles): update privileges by @ksrinath in #10528
- docs(subscriptions): adding docs for assertion level subscriptions on managed DH by @jayacryl in #10495
- feat(ingest): add fast query fingerprinting by @hsheth2 in #10619
- fix(ingestion/airflow-plugin): updated the document for developers by @dushayntAW in #10633
- fix(ingest/trino): variable reference before define by @anshbansal in #10646
- feat(entity-client): restli batchGetV2 batchSize fix and concurrency by @david-leifker in #10630
- docs(): Adding API docs for incidents, operations, and assertions by @jjoyce0510 in #10522
- feat(ci): fix conditionals and consolidate change detection by @david-leifker in #10649
- fix(ingest/snowflake): avoid overfetching schemas from datahub by @hsheth2 in #10527
- docs: add note for subResourceType being a fieldPath by @anshbansal in #10660
- fix(ingest/qlik): improve logging for debug by @anshbansal in #10659
- fix(doc): Fix doc typo in transformer by @sid-acryl in #10658
- feat(graphql) data contract resolvers by @jayacryl in #10632
- fix(openapiv3): v3 scroll response fix by @david-leifker in #10654
- Use type: string for enum schemas by @kevin1chun in #10663
- fix(ingestion/airflow-plugin): airflow remove old tasks by @dushayntAW in #10485
- feat(platform): added db2 platform by @pankajmahato-visa in #10601
- feat(ingestion/kafka)-Add support for ingesting schemas from schema registry by @aabharti-visa in #10612
- fix(azure_ad): print request URL on error by @darnaut in #10677
- docs(ingest): Rename csv / s3 / file source and sink by @asikowitz in #10675
- feat(ingest/glue): database parameters extraction by @skrydal in #10665
- fix(azure_ad): fix infinite loop on request error by @darnaut in #10679
- perf(ingestion/fivetran): Connector performance optimization by @shubhamjagtap639 in #10556
- feat(ingest): make query formatting more robust by @hsheth2 in #10678
- feat(cli) Add actors to forms yaml API by @chriscollins3456 in #10683
- doc(glossary): add note for github action for glossary by @anshbansal in http...
v0.13.3
DataHub Release Notes
User Experience
- NEW: Business Attributes: Business Attributes are used to standardize and manage data elements across multiple domains, projects, and applications. By linking dataset attributes to Business Attributes, organizations ensure uniformity and ease of updates, as changes made to a Business Attribute are automatically propagated across all linked datasets. #9863
- Improved UI for Dataset Properties: Added collapse functionality for long dataset properties, making it easier to navigate and view relevant information. #10203
- Pagination for Ingestion Tasks Listing: Added pagination to the tasks listing page, making it easier to manage and navigate through tasks. #10293
- Rich Text Support for Form Descriptions: Added support for rich text in form descriptions, enhancing the user experience. #10425
- New Analytics Charts: Added charts in the Analytics tab to identify Top Users and New Users. #10344
- Enhanced search functionality with customizable autocomplete configuration. #10426
Developer Experience
- Unified CI Workflow Updates: Improved CI build with unified workflow updates and disk space cleanup, making the build process more efficient. #10353
- Improved Logging for GraphQL Requests: Enhanced logging for GraphQL requests, providing better insights and debugging capabilities. #10404
- Enhanced Documentation for Lineage Feature Guide: Updated documentation for the lineage feature guide, making it easier to understand and implement. #10401
- Improved Documentation for SchemaField.label: Updated documentation for SchemaField.label, providing clearer guidance for developers. #10251
- Enhanced CI with Docker Image Publishing: Added Docker image publishing capabilities to the CI workflow, streamlining the deployment process. #10193
- Redesigned Docs Site Feedback Button: Improved the design of the feedback button in the documentation, making it more user-friendly. #10182
Metadata Ingestion
- Improved Data Profiling by early filtering of tables, correctly computing sample row counts, and combining unique count queries per table. #10378, #10319, #10322
- Airflow: Introduced support for BigQueryInsertJobOperator. #10452
- BigQuery: Added support for Table Clones and incremental column-level lineage.
- Snowflake: Improved reporting for usage aggregation and handled lineage errors; Improved ingestion performance with system sampling on very large tables. #10279, #10430
- Glue: Introduced support for delta schemas. #10299
- Redshift: Improved usage extraction by filtering out system queries. #10247
- Mode: Enhanced ingestion for Mode by adding dashboards into containers, improving data visualization and management. #10563
- PowerBI: Added support to automatically extract table lineage between PowerBI and Databricks. #10416
- dbt: Improved dbt ingestion by handling complex SQL and enhancing documentation, providing better data management and insights. #10323
- NiFi: Enhanced ingestion for NiFi with process group as browse path and incremental lineage, improving data organization and tracking. #10202
- Incubating Sigma and CockroachDB sources. #10037, #10226
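To picture what "combining unique count queries per table" in the profiling item above could mean, here is a minimal sketch that builds a single SELECT covering every column instead of one query per column. It is not the profiler's actual implementation; the function name, the `_unique_count` alias suffix, and the sample table and column names are all assumptions made for illustration.

```python
from typing import Sequence


def combined_unique_count_query(table: str, columns: Sequence[str]) -> str:
    """Build one SELECT that computes COUNT(DISTINCT ...) for every column.

    Illustrative only: identifiers are interpolated without quoting, so
    this is not safe for untrusted input.
    """
    if not columns:
        raise ValueError("at least one column is required")
    selects = ", ".join(
        f"COUNT(DISTINCT {col}) AS {col}_unique_count" for col in columns
    )
    return f"SELECT {selects} FROM {table}"


# Hypothetical table and columns, used only to show the shape of the query.
print(combined_unique_count_query("orders", ["customer_id", "status"]))
```

Issuing one statement per table rather than one per column reduces the number of round trips to the warehouse, which is the kind of saving the release note describes.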
Breaking Changes
- DynamoDB Connector: aws_region is now a required configuration. The connector will no longer loop through all AWS regions; instead, it will only use the region passed into the recipe configuration. #10419
- Custom Validators and Mutators: Dropped a previously required constructor. #10389
- FabricType RVW: Added as a new FabricType. No rollbacks allowed once metadata with this fabric type is added without manual cleanups in databases. #10472
For full details on breaking changes, please refer to the updating DataHub documentation.
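As a rough illustration of the DynamoDB change listed above, the sketch below checks a recipe, represented here as a plain Python dict, for the now-required aws_region setting before an upgrade. The helper name `missing_aws_region`, the assumed source/config layout of the dict, and the sample region value are assumptions for illustration; only the aws_region setting itself comes from the release notes.

```python
from typing import Any, Dict


def missing_aws_region(recipe: Dict[str, Any]) -> bool:
    """Return True if a DynamoDB recipe lacks the aws_region setting.

    Hypothetical helper: the recipe is assumed to follow a
    source -> config layout; recipes for other source types are ignored.
    """
    source = recipe.get("source", {})
    if source.get("type") != "dynamodb":
        return False
    config = source.get("config") or {}
    return not config.get("aws_region")


# Example recipe written before this release (no region set).
old_recipe = {"source": {"type": "dynamodb", "config": {}}}
# Same recipe with the region added; "us-west-2" is only a placeholder.
new_recipe = {"source": {"type": "dynamodb", "config": {"aws_region": "us-west-2"}}}

print(missing_aws_region(old_recipe))  # True
print(missing_aws_region(new_recipe))  # False
```

A check like this could be run over existing recipes ahead of the upgrade, since the connector no longer falls back to looping through all regions.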
Contributors
A big thank you to all our contributors for this release!
First-Time Contributors
@bouaouda-achraf, @camilogutierrez, @dotan-mor, @egemenberk, @erikkvale, @guyr-ziprecruiter, @ishtartec, @jonasHanhan, @mrjefflewis, @noggi, @olgapenedo, @paguos, @richenc, @Rosmirose, @sagar-salvi-apptware, @timothyjin
Repeat Contributors
@ajoymajumdar, @deepgarg-visa, @dushayntAW, @filipe-caetano-ovo, @gaurav2733, @kevin1chun, @ksrinath, @Masterchen09, @mayurinehate, @ms32035, @Nelvin73, @rtekal, @sgomezvillamor, @shubhamjagtap639, @siladitya2, @skrydal
DataHub Maintainers
@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @gabe-lyons, @hsheth2, @jayacryl, @jjoyce0510, @RyanHolstien, @shirshanka, @sid-acryl, @treff7es, @yoonhyejin
Thank you all for your hard work and contributions!
What's Changed
- fix(ingest/bigquery): Supporting lineage extraction in case the select query result's target table is set on job by @treff7es in #10191
- fix(retention): fix time-based retention by @trialiya in #10118
- feat(lineage): give via and paths in entity lineage response by @RyanHolstien in #10192
- fix(ingestion/datahub): implemented the filter to ignore/include URN for ingestion by @dushayntAW in #10174
- fix(ingestion/glue): fix to ingest the comment for partition key as description by @dushayntAW in #10189
- feat(ingest/looker): cleanup usage generation code by @hsheth2 in #10153
- fix(dev): fix env file overrides for profiles by @hsheth2 in #10194
- fix(ingestion/hive): ignore sampling for tagged column/table by @dushayntAW in #10096
- fix(ui/property): add collapse for long dataset properties by @gaurav2733 in #10203
- saas release v0.3.1 release notes by @david-leifker in #10205
- fix(ingest/databricks): pin pandas for databricks ingestion by @mayurinehate in #10204
- Fixed issue where the custom defined aspects were missing from the API specification. by @ajoymajumdar in #10208
- feat(ingestion/transformer): Handle overlapping while mapping in extract ownership from tags transformer by @shubhamjagtap639 in #10201
- fix(build): avoid nested gradle commands by @hsheth2 in #10198
- feat(ingest/great_expectations): support in-memory (Pandas) data assets by @bouaouda-achraf in #9811
- ci(workflow): publish docker from pr with label by @david-leifker in #10193
- bump(version): bump classgraph version, add early package filter by @david-leifker in #10207
- fix(ingestion/mongodb): MongoDB source unable to parse datetimes with years > 9999 by @jonasHanhan in #10110
- fix(graphql-core): DomainEntitiesResolver does not support values FacetFilterInput parameter by @siladitya2 in #10188
- fix(graphql-core):Auto completion/suggestion of Domains are not working by @siladitya2 in #10150
- chore(usage-stats): measure time for getting buckets and aggregations by @darnaut in #10220
- test(search): introduce retry for search test by @david-leifker in #10206
- feat(ingest/bigquery): fix support for incremental column lineage by @hsheth2 in #10222
- fix(ingest/dbt): better dbt timestamp parsing by @hsheth2 in #10223
- feat(ingest/sql): normalize bigquery partitioned tables when parsing by @hsheth2 in #10224
- docs: fix feedback button design by @yoonhyejin in #10182
- docs: add discourse to community tab by @yoonhyejin in #10181
- docs: edit the text and destination for sign up link by @yoonhyejin in #10183
- fix(ingestion/datahub): moved urn_pattern config to source config by @dushayntAW in #10215
- fix(ingestio...
v0.13.2
Hotfix Release
Fixes an MCL message deserialization bug that occurs when using the internal schema registry and running specific upgrade jobs:
policyFields (enabled by default):
BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED:true
dataJobNodeCLL (disabled by default):
BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED:false
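The two flags and their defaults listed above can be summarized in a small sketch that resolves which upgrade jobs would be enabled for a given environment. The function name and the case-insensitive "true"/"false" parsing are assumptions for illustration; how DataHub itself reads these variables is not described in these notes.

```python
from typing import Dict, Mapping

# Defaults as stated in the release notes above.
UPGRADE_JOB_DEFAULTS: Dict[str, bool] = {
    "BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED": True,
    "BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED": False,
}


def enabled_upgrade_jobs(env: Mapping[str, str]) -> Dict[str, bool]:
    """Resolve each flag from env, falling back to the documented default.

    Assumption: values are parsed as case-insensitive "true"/"false".
    """
    resolved = {}
    for name, default in UPGRADE_JOB_DEFAULTS.items():
        raw = env.get(name)
        resolved[name] = default if raw is None else raw.strip().lower() == "true"
    return resolved


# With nothing set, only the policyFields job is enabled.
print(enabled_upgrade_jobs({}))
```

Passing `os.environ` instead of an empty mapping would show the effective values for the current process.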
Example Error:
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 1
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 13 out of bounds for length 2
at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:460)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:188)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:260)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
Recovery Directions:
If currently affected, please remove the topic prior to upgrading to v0.13.2 to remove the corrupted message. The default topic name is MetadataChangeLog_Versioned_v1; however, if you've customized the topic name, be sure to remove that topic.
If you are running Kafka per the example Helm chart for prerequisites, the following command will delete the topic.
kubectl exec -it prerequisites-kafka-broker-0 -c kafka -- kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic MetadataChangeLog_Versioned_v1
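For deployments that use a customized topic name, the command above needs the topic swapped out. The following sketch assembles the same kubectl invocation with the topic, pod, and bootstrap server as parameters. The function name and the sample custom topic are hypothetical; the defaults are copied from the command in these notes.

```python
import shlex
from typing import List

DEFAULT_TOPIC = "MetadataChangeLog_Versioned_v1"


def delete_topic_command(
    topic: str = DEFAULT_TOPIC,
    pod: str = "prerequisites-kafka-broker-0",
    bootstrap_server: str = "localhost:9092",
) -> List[str]:
    """Build the kubectl invocation from the release notes for a given topic."""
    return [
        "kubectl", "exec", "-it", pod, "-c", "kafka", "--",
        "kafka-topics.sh", "--bootstrap-server", bootstrap_server,
        "--delete", "--topic", topic,
    ]


# Print the command for a customized topic name (placeholder value).
print(shlex.join(delete_topic_command(topic="MyCustomMCL_v1")))
```

The sketch only prints the command; running it, for example through `subprocess.run`, is left to the operator so the topic name can be reviewed first.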
Full Changelog: v0.13.1...v0.13.2
v0.13.1
DataHub Release Notes
User Experience
- Capture and Manage Common Joins between Datasets: Users can now view and manage common join relationships between datasets, making it easier than ever to capture best practices and bespoke join logic. Watch the walkthrough here! 8325
- Heads up: you'll need to enable the ER_MODEL_RELATIONSHIP_FEATURE_ENABLED env variable to use this feature!
- Enhanced UI Interactions: Users can now enjoy an improved markdown editor and filter policies by active/inactive statuses, resulting in a more intuitive and manageable interface. 9949, 9958
- Visual Context for Groups: You can now include picture links for groups in the UI, adding a richer visual context and enhancing the navigational experience. 9882
- Improved Error Visibility: The UI now displays error messages related to data size limitations, allowing for better troubleshooting and user experience. 10038
Developer Experience
- Enhanced Kafka Compatibility: Updated client version for Kafka setup ensures better compatibility and functionality for developers. 9962
- Optimized Docker Build: Docker setups now respect pip mirrors, optimizing the build process especially in restricted network environments. 9963
- Advanced Error Handling: New error handling for duplicate class names and improved fspath lint error management enhance the code reliability and quality. 9960, 9976
- Latest OpenSearch Image: Incorporation of OpenSearch image version 2.11.0 aligns with the latest stable releases, boosting performance and security. 9984
Metadata Ingestion
- NEW: Dagster Integration: You can now seamlessly ingest your Dagster Pipelines, Jobs, Ops, and lineage into DataHub. 10071
- Expanded Field Classification Support: This release introduces support for field-level classification during ingestion for Redshift, BigQuery, DynamoDB, and SQL Sources. 10013, 10031
- Enhanced Ingestion Capabilities: DataHub now offers stateful ingestion by default, optimizing routines for REST sinks and improving metadata accuracy across diverse sources like dbt and BigQuery. 9934, 10158, 10080
- Better Data Lineage: This release introduced support for OpenLineage in service of the Spark Lineage Beta Plugin; additionally, we now support incremental Column-Level Lineage, improving the accuracy of detecting column-level relationships during ingestion. 9870, 9967, 10090
- Schema Clarity: New descriptions support for JSON schema arrays and a mechanism to escape special characters in BigQuery table descriptions aid in clearer schema validation and ingestion processes. Databricks ingestion now supports Hive Metastore schemas with special characters. 9757, 9932, 10049
Version Upgrades
- Kafka client and OpenSearch image were updated to the latest versions.
Breaking Changes
This release introduces default settings for stateful ingestion and updates in handling dbt ingestion. For details on all breaking changes, view the full documentation here.
Contributors
MASSIVE shoutout to our contributors!
First-Time Contributors
akarsh991, alexs-101, AvaniSiddhapuraAPT, diegmonti, dushayntAW, filipe-caetano-ovo, HuanjieGuo, jayacryl, k7ragav, kopax-polyconseil, LePuppy, Nelvin73, pinakipb2, poorvi767, rae89, trialiya, valeral.
Repeat Contributors
ANich, shubhamjagtap639, sgomezvillamor, siladitya2, skrydal, sumitappt, Masterchen09, mayurinehate, ngamanda, gaurav2733, githendrik, jayasimhankv.
DataHub Maintainers
anshbansal, asikowitz, chriscollins3456, darnaut, david-leifker, eboneil, ethan-cartwright, gabe-lyons, hsheth2, pedro93, RyanHolstien, treff7es, yoonhyejin.
What's Changed
- bump(kafka-setup): client version bump by @david-leifker in #9962
- feat(ingest): throw codegen error on duplicate class names by @hsheth2 in #9960
- feat(docker): respect pip mirrors with uv by @hsheth2 in #9963
- Openlineage endpoint and Spark Lineage Beta Plugin by @treff7es in #9870
- fix(ingest/json-schema): adding support descriptions for array by @AvaniSiddhapuraAPT in #9757
- fix(ingest/redshift): fix bug in lineage v2 table renames by @hsheth2 in #9967
- feat(ingest): speed up to_obj() and validate() by @hsheth2 in #9969
- feat(ingest): fix fspath lint error by @hsheth2 in #9976
- docs: archive old version before 0.12.0 & fix broken links by @yoonhyejin in #9957
- fix(ui/markdown-editor): arrows change field when editing description… by @gaurav2733 in #9949
- feat(ui/policies): add filter for Active/Inactive/All on policy page by @gaurav2733 in #9958
- feat(ui): add option to add picture link for groups by @akarsh991 in #9882
- feat(ingest): add Looks subtype + stop reemitting browsePathV2 by @hsheth2 in #9978
- fix(ingest/bigquery): escape special characters for table descriptions by @AvaniSiddhapuraAPT in #9932
- feat(ui): add loading spin to access management table by @filipe-caetano-ovo in #9974
- fix(ingestion/fivetran): Fix fivetran get connector jobs bug by @shubhamjagtap639 in #9975
- feat(ingest/dbt): generate CLL for all node types by @hsheth2 in #9964
- chore(search): bump OpenSearch image version to 2.11.0 by @darnaut in #9984
- feat(ingest): enable stateful_ingestion by default for DataHub rest sink by @shubhamjagtap639 in #9934
- feat(ingestion/cli): Adding check option to validate allow/deny and path_specs by @treff7es in #9983
- fix(ingest): only import PathSpec when necessary by @hsheth2 in #9989
- feat(config): add configuration to reprocess UI sourced events by @RyanHolstien in #9988
- feat(pluginRegistry): add configuration to reduce runnable frequency by @RyanHolstien in #9990
- build(react): Fix typescript errors in test files by @sumitappt in #9982
- feat(docs): disable last update timestamps by @hsheth2 in #9987
- feat: add versioned content for 0.12.1 by @yoonhyejin in #9944
- doc: add version 0.13.0 by @yoonhyej...
v0.13.0
DataHub v0.13.0 Release Notes Summary
User Experience
- NEW - Asset Documentation Forms & UI-Editable Properties: Define specific documentation requirements via a Form, and empower your asset owners to capture their valuable knowledge via UI-Editable Properties. Watch the demo here!
- NEW - DataHub Incidents: Create and communicate data quality and observability incidents when they inevitably arise. Watch the demo here!
- UI Improvements: Editing secrets, handling forms, and rendering token pages and lineage diagrams have been improved for a smoother user interface experience.
Developer Experience
- Security Upgrades: Core dependencies like shiro-core and FastAPI have been upgraded to fix vulnerabilities, ensuring a safer development environment.
- GraphQL/OpenAPI Enhancements: New GraphQL endpoints and better OpenAPI documentation provide more powerful tools for API interaction, making developers' jobs easier.
- Performance Tuning: Backend improvements for search operations and ingestion processes make the platform faster and more reliable.
Metadata Ingestion
- Platform Integrations: Enhanced support for dbt, Metabase, BigQuery, AWS Glue, Oracle, and Redshift allows for more comprehensive metadata capture, making integration with these platforms smoother.
- Ingestion Framework: The reliability of ingestion has been improved, with new capabilities like support for tags from Tableau datasources and compatibility with Airflow 2.5.0, facilitating a broader range of data synchronization tasks.
- Connector Improvements: Ingestion connectors for external data tools have been streamlined, ensuring easier integration and data synchronization.
Other Improvements and Fixes
- Enhanced internal testing frameworks with Cypress and pytest-random-order for ingestion tests.
- Simplified developer workflows with configurable Docker Compose project names in CLI.
- Addressed various ingestion-related bugs for platforms like Feast and Snowflake.
- Enhanced the UI codebase with TypeScript compilation linting and updated styles.
- Streamlined CI processes for pull requests and linting conditions.
- Version Upgrades: Upgraded pytest-docker, Pegasus, and SQLglot, among others, to improve stability and performance. Addressed security vulnerabilities by upgrading FastAPI, gitdb, and follow-redirects.
Notable Breaking Changes
- Updates to MySQL version for quickstarts and migration to Neo4j 5.x may impact existing setups.
- JDK17 build requirement and Docker Compose > 2.20 needed for building DataHub.
- Python 3.8+ requirement for the acryl-datahub CLI.
- Changes in Unity Catalog ingestion source configs and Redshift lineage generation.
- Deprecation of Spark 2.x and associated JDK8 build requirements.
For full details on breaking changes, please visit DataHub's update guide.
Acknowledgements
A huge thank you to all our contributors for making this release possible. Your hard work and dedication are greatly appreciated.
First-Time Contributors
7onn, Adityamalik123, atjones0011, BlueHorn07, diegoreico, dim-ops, fer-marino, Gerrit-K, gp1105739, ilpianista, ingthorb, KaYunKIM, Kunal-kankriya, muzzacode, nnnkkk7, pankajmahato-visa, rubiojr, ryaminal, scalvanese452, sleeperdeep, stevenayers.
Repeat Contributors
allizex, arunvasudevan, cburroughs, feldjay, gaurav2733, iprentic, KulykDmytro, kushagra-apptware, mayurinehate, nmbryant, noggi, purnimagarg1, rinzool, sgomezvillamor, shubhamjagtap639, siddiquebagwan-gslab, siladitya2, skrydal, sumitappt, TonyOuyangGit, wngus606, yangjiandan, Salman-Apptware.
DataHub Maintainers
anshbansal, asikowitz, chriscollins3456, darnaut, david-leifker, eboneil, ethan-cartwright, gabe-lyons, hsheth2, jjoyce0510, maggiehays, pedro93, RyanHolstien, shirshanka, sid-acryl, treff7es, yoonhyejin.
What's Changed
- fix(ingest/transformer): correct registration by @anshbansal in #9418
- docs(ingest/sql-queries): Rearrange sections by @asikowitz in #9426
- fix: Adjusting the view of the Column Stats by @Salman-Apptware in #9430
- feat(patch): support fine grained lineage patches by @RyanHolstien in #9408
- fix(CVE-2023-6378): update logback classic by @RyanHolstien in #9438
- feat: allow the sidebar size to be draggable by @Salman-Apptware in #9401
- fix(json-schema): do not send invalid URLs by @anshbansal in #9417
- fix(ingest/profiling) Fixing profile eligibility check by @treff7es in #9446
- fix(ingest): avoid git dependency in dbt by @hsheth2 in #9447
- feat(ingest): add retries for tableau by @hsheth2 in #9437
- docs(updating-datahub): update docs for v0.12.1 by @david-leifker in #9441
- feat: Allow specifying Data Product URN via UI by @Salman-Apptware in #9386
- Add button to copy urn of an Ownership Type by @Salman-Apptware in #9452
- docs(ingest/tableau): add token to sink config in sample recipe by @KaYunKIM in #9411
- feat(glossary): add ability to clone glossary term(name and documentation) from term profile menu by @allizex in #9445
- feat(ingestion): Add typeUrn handling to ownership transformers by @skrydal in #9370
- fix(ingest): reduce GraphQL Logs to warning for circuit breaker by @arunvasudevan in #9436
- fix: support Apollo caching for settings / Policies by @Salman-Apptware in #9442
- refactor | PRD-785 | datahub oss: migrate use of useGetAuthenticatedU… by @sumitappt in #9456
- refactor(ui): Minor improvements & refactoring by @jjoyce0510 in #9420
- feat(ingest): add ingest --no-progress option by @BlueHorn07 in #9300
- fix(powerbi): add access token refresh by @anshbansal in #9405
- fix | PRD-463 | Stop trying to ping the track endpoint on login home … by @sumitappt in #9462
- feat(ingest/unity): enable hive metastore ingestion by @mayurinehate in #9416
- feat(ingestion/transformer): create tag if not exist by @siddiquebagwan-gslab in #9076
- fix(ingest): make user_urn and group_urn generation consider user and… by @shirshanka in #9026
- feat(ingestion): Add test_connection methods for important sources by @shubhamjagtap639 in #9334
- docs: fix sample command for container logs by @nnnkkk7 in #9427
- fix(ingest): bump source configs json schema version by @hsheth2 in https://github...
DataHub v0.12.1
Release Highlights
New Features
SQLAlchemy Source Enhancements: Support for view lineage across all SQLAlchemy sources (PR #9039).
Airflow Integration: Retry callback and support for ExternalTaskSensor subclasses added (PR #8514).
Kafka Enhancements: Increased Kafka message size and enabled compression (PR #9038).
JSONSchema Ingestion: Enabled schema-aware JsonSchemaTranslator (PR #8971).
Search Bar Improvements: Added a flag to hide/display the autocomplete query (PR #9104).
SQL Parser Performance: Enhancements and asyncio fixes (PR #9119).
MongoDB Ingestion: Support for stateful ingestion and improved schema inference for lists (PR #9118, PR #9145).
Policy Engine Updates: Refactoring and enhancements, including support for 10k+ policies (PR #9163, PR #9177).
UI Enhancements: Numerous improvements including command-k icons in the search bar, updated Apollo cache, and auto-complete debounce in the search bar (PR #9194, PR #9193, PR #9205).
Fivetran Integration: Connector integration for Fivetran (PR #9018).
Neo4j Database Support: Connection to specific Neo4j databases now supported (PR #9179).
Chart Subtypes in UI: Support for chart subtypes (PR #9186).
Fixes and Improvements
BigQuery Fixes: Resolved issues with lineage filter query, and fixed extracting comments from complex types (PR #9114, PR #8950).
MongoDB Refactoring: Platform instance addition to MongoDB (PR #8663).
Kafka Setup: Adjusted truststore settings for PEM files (PR #8656).
REST API Authorization: Fixed rollback failure when authorization is enabled (PR #9092).
Java Exception Handling: Addressed java.util.ConcurrentModificationException (PR #9090).
UI and Documentation: Fixed filtering logic in UI, corrected documentation errors, and added feature guides (PR #9116, PR #9125, PR #9124, PR #9126, PR #9134, PR #9137, PR #9122, PR #9068).
SQL Server and Snowflake Ingestion: Updated queries and fixed missing view downstream call (PR #9127, PR #8966).
ClickHouse and DB2 Ingestion: Addressed column reflection regression and table properties handling (PR #9143, PR #9128).
Ingestion Improvements: Numerous fixes and enhancements across various ingestion sources (PR #9153, PR #9155, PR #9141, PR #9157, PR #9123).
CI and Build Process: Tweaked workflows, increased gradle retries, and addressed CI errors (PR #9052, PR #9091, PR #9160).
Security Updates: Addressed a zookeeper CVE and other security concerns (PR #9190).
UI Refactoring: Improved entity page loading indicators and renamed button texts (PR #9195, PR #9196).
Policy and Auth Enhancements: Refactored policy locking and added roles to policy engine validation logic (PR #9178).
Testing and Continuous Integration
API Testing: Added tests for managing secrets, access token privilege, and flaky tests fix (PR #9121, PR #9167, PR #9132, PR #9175).
Cypress Test Fixes: Addressed glossary navigation and download_lineage_results tests (PR #9175, PR #9132).
Cleanup and Refactoring
Ingestion Cleanup: Removed legacy memory_leak_detector and refactored ingestion sources (PR #9158, PR #9135, PR #9120, PR #9105).
PDL Refactoring: Refactored Assertion model enums (PR #9191).
Build and Deployment
Release Preparation: Updated files for the 0.12.0 release (PR #9130).
What's Changed
- feat(ingest): support view lineage for all sqlalchemy sources by @mayurinehate in #9039
- fix(ingest/bigquery): Fixing lineage filter query by @treff7es in #9114
- refactor(ingestion/mongodb): Add platform_instance to mongodb by @nicholas-fwang in #8663
- fix(kafka-setup): Don't set truststore pass for PEM files by @mmmeeedddsss in #8656
- fix(ingest): Fix roll back failure when REST_API_AUTHORIZATION_ENABLED is set to true by @TonyOuyangGit in #9092
- (fix): Avoid java.util.ConcurrentModificationException by @rtekal in #9090
- Fix(ingest/bigquery): fix extracting comments from complex types by @maaaikoool in #8950
- docs: add versions 0.12.0 by @yoonhyejin in #9125
- fix(ui) Fix filtering logic for everwhere generating OR filters by @chriscollins3456 in #9116
- build(release): Update files for 0.12.0 release by @pedro93 in #9130
- fix(ingest/sql-server): update queries to use escaped procedure name by @mayurinehate in #9127
- feat(airflow): retry callback, support ExternalTaskSensor subclasses by @richenc in #8514
- docs: fix saasonly flags for some pages by @yoonhyejin in #9124
- fix(ingest/snowflake): missing view downstream cll if platform instance is set by @mayurinehate in #8966
- feat: Add flag to hide/display the autocomplete query for search bar by @kushagra-apptware in #9104
- docs(timeline): correct markdown heading level by @mayurinehate in #9126
- docs(graphql) Correct mutation -> query for searchAcrossLineage examples by @eboneil in #9134
- feat(kafka): increase kafka message size and enable compression by @david-leifker in #9038
- feat(ingest/jsonschema) enable schema-aware `JsonSchemaTranslator` by @KulykDmytro in #8971
- fix(metadata-ingestion): adds default value to _resolved_domain_urn i… by @alexklavensnyt in #9115
- ci: tweak to only run relevant workflows by @anshbansal in #9052
- Fix for flaky download_lineage_results cypress test by @kkorchak in #9132
- docs: Update updating-datahub.md by @pedro93 in #9131
- fix(ingest/clickhouse): pin version to solve column reflection regression by @hsheth2 in #9143
- feat(ingest/looker): cleanup error handling by @hsheth2 in #9135
- feat(ingest): add `entity_supports_aspect` helper by @hsheth2 in #9120
- feat(sqlparser): support more update syntaxes + fix bug with subqueries by @hsheth2 in #9105
- docs: correct broken doc links by @sachinsaju in #9137
- feat(ingest): sql parser perf + asyncio fixes by @hsheth2 in #9119
- feat(quickstart): fix broker InconsistentClusterIdException issues by @hsheth2 in #9148
- fix(policies): remove non-existent policies, fix name by @anshbansal in #9150
- Fix for a test that passed on Oss and failed on Saas by @kkorchak in #9147
- docs(teradata): teradata doc external link 404 fix by @sachinsaju in #9152
- fix(datahub-client): Include relocation for snakeyaml dependency. by @jiateoh in #8911
- fix(ingest): cleanup large images in CI by @hsheth2 in #9153
- build: increase gradle retries by @hsheth2 in #9091
- feat(ingest): bump sqlglot parser by @hsheth2 in #9155
- feat(ingest/mongodb): support stateful ingestion by @TonyOuyangGit in #9118
- API test for managing secrets privilege by @kkorchak in #9121
- fix(ingest): handle exceptions in min, max, mean profiling by @mayurinehate in #9129
- feat: rename Assets tab to Owner Of by @kushagra-apptware in #9141
- fix(ingest/mongodb): fix schema inference for lists of values by @hsheth2 in #9145
- fix(ingest/db2): fix handling for table properties by @deepgarg-visa in #9128
- fix(ingest): fully support MCPs in urn_iter primitive by @hsheth2 in #9157
- fix(ingest/bigquery): use correct row count in null count profiling c… by @mayurinehate in #9123
- docs: add feature guides for subscriptions and notifications by @yoonhyejin in #9122
- docs: unify oidc guides using tabs by @yoonhyejin in #9068
- chore(ingest): remove legacy memory_leak_detector by @hsheth2 in #9158
- feat(ingest/looker): support emitting unused explores by @hsheth2 in #9159
- refactor(policy): refactor policy locking, no functional difference by @david-leifker in #9163
- API test for managing access token privilege by @kkorchak in #9167
- fix(mysql-setup): quote database name by @darnaut in #9169
- fix(health): fix health check ...
v0.12.1rc2
What's Changed
- fix(deprecation): bring frontend in-sync with model by @anshbansal in #9303
- fix: fix the settings height when there are not many items by @Salman-Apptware in #9294
- docs: update recommended CLI by @anshbansal in #9307
- feat(ui): bump frontend dependencies by @ngamanda in #8353
- fix(java) Fixes NPE ES service by @chriscollins3456 in #9311
- feat(config): Configurable bootstrap of ownership types by @skrydal in #9308
- feat: update the "json-schema" version from package.json to solve json-schema vulnerability by @kushagra-apptware in #9289
Full Changelog: v0.12.1...v0.12.1rc2
v0.12.0
v0.12.0 Release Highlights
User Experience
Nested Domains
Nested Domains are here! This provides flexibility in organizing your entities within Domains to match the unique organizational structure of your company.
DataHub Chrome Extension Improvements
The Acryl DataHub Chrome extension now supports PowerBI! This is a super powerful way for your business users to gain DataHub-specific insights directly in the BI tools they use most. Additionally, we now support making edits back to DataHub Entities directly from the Chrome extension.
Access Management Tab for Datasets
Shoutout to @Ramendra761 from the PayPal Team for contributing a new Access Management tab in Dataset Entity pages! The aim of this feature is to enable users to view the required roles for accessing the Dataset, as defined by Roles and/or Policies in the organization’s Access Management System. It also introduces the ability to request access directly from the page.
Metadata Ingestion
Miscellaneous Improvements
- Sampling-Based Profiling: You can now configure sampling-based profiling to address query performance concerns in Snowflake and BigQuery
- Kafka Connect > Snowflake: We now support automatically defining lineage between the two platforms
- Athena: Support for complex and nested schemas
Column-Level Lineage
We are incubating CLL support for the following:
- Airflow plugin v2 now supports automatic extraction of CLL for certain operators, removing the need to annotate DAGs
- dbt
- Redshift
- PowerBI (support for Column-Level Lineage for M-Query)
Incubating Sources
- MLflow
- Teradata
- Unity Catalog Notebooks
- DynamoDB
Developer Experience
- Data Contracts: v0.12.0 introduces underlying models and CLI; UI support to follow
- We now support creating custom models without requiring a fork of the main DataHub project
- Updates to support OpenSearch 2.x and alternate Postgres db in postgres-setup
Other Notable Changes
- Session token configuration has changed: all previously created session tokens will be invalid and users will be prompted to log in. The expiration time has also been shortened, which may result in more login prompts with the default settings. There should be no other interruption due to this change.
Breaking Changes
- #9044 - GraphQL APIs for adding ownership now expect either an `ownershipTypeUrn` referencing a custom ownership type or a (deprecated) `type`. Where before adding an ownership without a concrete type was allowed, this is no longer the case. For simplicity you can use the `type` parameter, which will get translated to a custom ownership type internally if one exists for the type being added.
- #9010 - In the Redshift source's config, `incremental_lineage` now defaults to off.
- #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
- #8942 - Removed the `urn:li:corpuser:datahub` owner for the `Measure`, `Dimension` and `Temporal` tags emitted by the Looker and LookML source connectors.
- #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
- #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include `pip install 'acryl-datahub-airflow-plugin[plugin-v2]'`. To continue using the v1 plugin, set the `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN` environment variable to `true`.
- #8943 - The Unity Catalog ingestion source has a new option `include_metastore`, which will cause all urns to be changed when disabled. This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future. If stateful ingestion is enabled, simply setting `include_metastore: false` will perform all required cleanup. Otherwise, we recommend soft deleting all databricks data via the DataHub CLI: `datahub delete --platform databricks --soft` and then reingesting with `include_metastore: false`.
- #8846 - Changed enum values in resource filters used by policies. `RESOURCE_TYPE` became `TYPE` and `RESOURCE_URN` became `URN`. Any existing policies using these filters (i.e. defined for particular `urns` or `types` such as `dataset`) need to be upgraded manually, for example by retrieving their respective `dataHubPolicyInfo` aspect and changing the part that uses the filter, i.e.

    "resources": {
      "filter": {
        "criteria": [
          {
            "field": "RESOURCE_TYPE",
            "condition": "EQUALS",
            "values": [
              "dataset"
            ]
          }
        ]
      }
    }

  into

    "resources": {
      "filter": {
        "criteria": [
          {
            "field": "TYPE",
            "condition": "EQUALS",
            "values": [
              "dataset"
            ]
          }
        ]
      }
    }

  for example, using the `datahub put` command. Policies can also be removed and re-created via UI.
- #9077 - The BigQuery ingestion source by default sets `match_fully_qualified_names: true`. This means that any `dataset_pattern` or `schema_pattern` specified will be matched on the fully qualified dataset name, i.e. `<project_name>.<dataset_name>`. We attempt to support the old pattern format by prepending `.*\\.` to dataset patterns lacking a period, so in most cases this should not cause any issues. However, if you have a complex dataset pattern, we recommend you manually convert it to the fully qualified format to avoid any potential issues.
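As a rough illustration of the #9077 compatibility behaviour described above, the sketch below prefixes a legacy dataset pattern so that it can match a fully qualified `<project_name>.<dataset_name>` name. The function name `to_fully_qualified` and the plain substring check for a period are assumptions made for this example; it is not the connector's actual code.

```python
import re


def to_fully_qualified(pattern: str) -> str:
    """Prefix a legacy dataset pattern so it can match '<project>.<dataset>'.

    Assumption: "lacking a period" is read literally as "the pattern string
    contains no '.' character". The real connector may decide differently.
    """
    if "." in pattern:
        # Treated as already fully qualified (or too complex to rewrite safely).
        return pattern
    # Regex prefix meaning "any project name followed by a literal dot".
    return r".*\." + pattern


legacy = "analytics"
converted = to_fully_qualified(legacy)
print(converted)
print(bool(re.match(converted, "my-project.analytics")))
```

Under this literal reading, a pattern such as `sales_.*` already contains a period and is left untouched, which is one reason to convert complex patterns by hand.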
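The manual policy upgrade for #8846 could be scripted along the following lines. This is a minimal sketch that operates on a `dataHubPolicyInfo` aspect already loaded as a Python dict; the helper name `upgrade_policy_filter` is hypothetical, and fetching the aspect and writing it back (for example with `datahub put`) is deliberately left out.

```python
import copy
from typing import Any, Dict

# Old -> new filter field names, as listed in the #8846 note.
FIELD_RENAMES = {"RESOURCE_TYPE": "TYPE", "RESOURCE_URN": "URN"}


def upgrade_policy_filter(policy_info: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a policy aspect with renamed filter fields.

    Hypothetical helper: the input is assumed to follow the
    resources -> filter -> criteria layout shown in the release note.
    """
    upgraded = copy.deepcopy(policy_info)  # leave the caller's dict untouched
    resources = upgraded.get("resources") or {}
    criteria = (resources.get("filter") or {}).get("criteria") or []
    for criterion in criteria:
        old_field = criterion.get("field")
        if old_field in FIELD_RENAMES:
            criterion["field"] = FIELD_RENAMES[old_field]
    return upgraded


old_policy = {
    "resources": {
        "filter": {
            "criteria": [
                {"field": "RESOURCE_TYPE", "condition": "EQUALS", "values": ["dataset"]}
            ]
        }
    }
}
new_policy = upgrade_policy_filter(old_policy)
print(new_policy["resources"]["filter"]["criteria"][0]["field"])  # TYPE
```

Returning a copy rather than editing in place keeps the original aspect available for comparison before anything is written back.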
What's Changed
- feat(UI): AccessManagement UI to access the role metadata for a dataset by @Ramendra761 in #8541
- Glossary Navigation Cypress test by @kkorchak in #8804
- ci: upgrade python to 3.10 for builds by @hsheth2 in #8808
- feat(ingestion/looker): Add view file-path as option in view_naming_pattern config by @siddiquebagwan-gslab in #8713
- feat(upgrade): add ability to provide a startingOffset for RestoreIndices by @ukayani in #8539
- fix(index): Do not override the search analyzer for ngram fields by @iprentic in #8818
- test(managed_ingestion): fix managed ingestion test by fixing actions… by @david-leifker in #8820
- docs: add 0.11 docs to docs site by @hsheth2 in #8813
- docs(release): Update updating-datahub.md for 0.11.0 release by @iprentic in #8821
- fix(ingest/mssql): Add UNIQUEIDENTIFIER data type as String by @cjm98332 in #8642
- build(ingest): upgrade to sqlalchemy 1.4, drop 1.3 support by @mayurinehate in #8810
- fix(ingest): use epoch 1 for dev build versions by @hsheth2 in #8824
- ci: make wheel builds more robust by @hsheth2 in #8815
- feat(cli): fix upload ingest cli endpoint by @pedro93 in #8826
- docs(transformer): fix names in sample code of 'pattern_add_dataset_domain' by @Starkie in #8755
- fix(siblingsHook): check number of dbtUpstreams instead of all upStreams by @ethan-cartwright in #8817
- fix(java) Update DataProductMapper to always return a name by @chriscollins3456 in #8832
- build(ingest): Bump jsonschema for Python >= 3.8 by @asikowitz in #8836
- feat(ingest/rest-emitter): Do not raise error on retry failure to get better error messages by @asikowitz in #8837
- ci: add markdown-link-check by @yoonhyejin in #8771
- docs(managed datahub): release notes 0.2.11 by @anshbansal in #8830
- build(ingest): Remove constraint on jsonschema for Python >= 3.8 by @asikowitz in #8842
- fix(build): clean task cleanup generated src by @anshbansal in #8844
- feat(ci): disable ingestion smoke build by @anshbansal in #8845
- fix: fix quickstart page by @yoonhyejin in #8784
- feat(bigquery): add better timers around every API call by @mayurinehate in #8626
- feat(ingestion/dynamodb): Add DynamoDB as new metadata ingestion source by @TonyOuyangGit in #8768
- feat(ingest/bigquery): support bigquery profiling with sampling by @mayurinehate in #8794
- Fix for edit_documentation and glossary_navigation cypress tests by @kkorchak in #8838
- feat(ui/java) Update domains to be nested by @chriscollins3456 in #8841
- dcs(ml-models): enhancing ml model documentation ...
v0.11.0
Release Highlights
Potential Downtime
This release introduces substantial improvements to search ranking which require reindexing indices.
During the reindexing:
- a system-update job will set indices to read-only and create a backup/clone of each index
- new components will be prevented from start-up until the reindex completes
- Helm deployments will go into read-only mode and new ingestion runs will fail
This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, please expect it to take 1 hour for every 2.3 million entities. After the reindex is complete, please check your ingestion runs and re-run any that did not complete.
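For planning purposes, the rough estimate above can be turned into a back-of-the-envelope calculation. The sketch below assumes the duration scales linearly with entity count and treats the quoted 5 minutes as a floor; both are simplifications made for this example, and actual times depend on your cluster.

```python
# Figures taken from the release note; linear scaling is an assumption.
ENTITIES_PER_HOUR = 2_300_000
MINIMUM_MINUTES = 5


def estimate_reindex_minutes(entity_count: int) -> float:
    """Estimate the reindex duration in minutes for a given number of entities."""
    linear_estimate = entity_count / ENTITIES_PER_HOUR * 60
    return max(MINIMUM_MINUTES, linear_estimate)


print(estimate_reindex_minutes(4_600_000))  # 120.0
```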
User Experience
New Search and Browse Experience
We have some really exciting improvements to the DataHub user experience in this release! The new search and browse experience, which was first made available in the previous release behind a feature flag, is now on by default. Check out our release notes for v0.10.5 to get more information and documentation on this new Browse experience.
Improvements to Search
In addition to the ranking changes mentioned above, this release changes how search results are highlighted so you can understand why they match your query. You can also sort your results alphabetically or by last updated time, in addition to relevance. We also now suggest a correction if your query has a typo in it.
Manage Home Page Posts
In this release we now enable you to create and delete pinned announcements on your DataHub homepage! If you have the “Manage Home Page Posts” platform privilege you’ll see a new section in settings called “Home Page Posts” where you can create and delete text posts and link posts that your users see on the home page.
OpenAPI Endpoints Expanded
OpenAPI entity and aspect endpoints have been expanded to improve the developer experience when using this API, with additional aspects to be added in the near future.
Metadata ingestion
Added support for the Confluent S3 Sink Connector, for extracting stored procedures and jobs from MSSQL, and for Snowflake shares. Additionally, the SQL parsing source now converts query logs into CLL and usage.
Developer Experience
The CLI now supports recursive deletes.
Versioned documentation
Starting from this release, we support versioned documentation on the datahub docs site! Select the version you’re on and browse docs specifically at that version.
Performance Improvements
- Batching of default aspects on initial ingestion (SQL)
- Improvements to multi-threading. Ingestion recipes, if previously reduced to 1 thread, can be restored to the 15 thread default.
- Gradle 7 upgrade moderately improves build speed
- DataHub Ingestion slim images reduced in size by 2GB+
Important Bug Fixes
- Glue Schema Registry fixed
Deprecation Notice
- MAE Events are no longer produced. MAE events have been deprecated for over a year.
What's Changed
- feat(ingest/presto-on-hive): enable partition key for presto-on-hive by @zheyu001 in #8380
- feat(classification): allow parallelisation to reduce time by @mayurinehate in #8368
- feat(ingest): Add metabase database id to platform instance mapping by @k-popov in #8359
- feat(ingest): add ability to read other method types than GET for OAS ingest recipes by @jsmilkstein in #8303
- fix(ingest): fix data platform urn in dataset_urn_to_key and dataset_key_to_urn by @Masterchen09 in #8209
- fix(ingest/s3): wrong sorting in case of multi-partition key by @anshbansal in #8536
- fix(ingest/presto): fix presto on hive test failures by @hsheth2 in #8548
- Cypress test for managing groups by @kkorchak in #8520
- feat(ingest/kafka-connect): add support for Confluent S3 Sink Connector by @tusharm in #8298
- Variable rename - Allows deselection of members in add members modal for a group by @Sukeerthi31 in #8529
- fix(ingest/s3): catch no such bucket exception instead of failing by @anshbansal in #8549
- fix(ingest): add tableau sqlglot dep by @hsheth2 in #8552
- fix(ingetion/mssql): convert dataset urns to lowercase by @siddiquebagwan in #8551
- Fix flaky add_user smoke test by @kkorchak in #8471
- feat(ci): use docker registry cache by @hsheth2 in #8544
- fix(glue): restore glue configurations by @RyanHolstien in #8533
- build(release): Update files for 0.10.5 release by @iprentic in #8556
- docs(release): Update updating-datahub.md for 0.10.5 release by @iprentic in #8557
- feat(ingestion/snowflake): use user email-id in urn generation for top users stat by @siddiquebagwan in #8513
- docs(development.md): Minor grammatical error by @PauloGoncalvesLima in #8558
- fix(usage): Update index lifecycle policy to not delete old datahub usage events by @iprentic in #8565
- fix(ui): Simplify background color for Entity Health Status popover by @jjoyce0510 in #8559
- fix: add --write args on pre-commit prettier by @yoonhyejin in #8560
- docs(observe): Add feature doc for Freshness Assertions by @jjoyce0510 in #8547
- docs(updating): add details on Unified Search & Browse experience by @maggiehays in #8568
- fix: fix features section by @yoonhyejin in #8571
- feat(ingest): allow lower freq profiling based on date of month/day of week by @anshbansal in #8489
- fix(stats): default to 3 months by @anshbansal in #8566
- fix(aspect): count query only for relevant aspect index by @iprentic in #8569
- feat(quickstart): bump quickstart start periods more by @hsheth2 in #8573
- Origin/cypress test for managing policies by @kkorchak in #8554
- feat(ui) Show source documentation when editing entity documentation by @chriscollins3456 in #8516
- fix(ingest): handle redaction of configs with int keys by @hsheth2 in #8545
- fix(ingest/snowflake): maintain qualified name casing, do not lowercase by @mayurinehate in #8574
- feat(docs): add github repo links to readme and docs by @yoonhyejin in #8422
- feat(ebean): Add metric in ebean aspect DAO for failed tries, as well as failed operation… by @iprentic in #8576
- refactor(search) Use search across multiple-entities API, deprecate Aggregator classes by @iprentic in #8498
- feat(siblings): dont show multiple platform icons if the siblings are ghost nodes by @gabe-lyons in #8543
- docs(lineage): Add description to make_lineage_mce by @eboneil in #8596
- doc(ingest/log): failure log at pipeline level document by @anshbansal in #8591
- Dataset ownership test by @kkorchak in #8583
- doc(release): release notes for 0.2.10 by @anshbansal in #8599
- docs(release): fix typo by @anshbansal in #8600
- feat(ui): apply views to: domains, containers, terms by @eboneil in #8572
- feat(search): embedded view dropdown by @joshuaeilers in #8598
- fix(ingest/file): remove `entity_type_counts` and `aspect_counts` by @hsheth2 in #8586
- fix(ingest): use hive pure_sasl variant by @hsheth2 in #8570
- Feat(ingest/ldap)fix list index out of range error by @alplatonov in https://githu...
v0.10.5
Release Highlights
NEW: Unified Search and Browse Experience
It’s here, it’s here! We are incredibly excited to roll out our re-designed, streamlined Search and Browse experience. End-users now have a one-stop-shop to search for specific data entities and browse across systems, making it easier than ever to find the most relevant and meaningful resources within DataHub.
Check out the screenshot below and get a full walk-through in this video!

User Experience
- Column-Level Lineage (CLL) visualization update: you can now visualize CLL relationships through DataJobs (i.e. Airflow DAGs)
- Unique Glossary Terms: We now prevent creating duplicate Glossary Term names within a Term Group
- Domains: You can now configure the Documentation tab to be the default landing page within a Domain
- Formatting updates to Row Count to make large numbers more human readable (e.g. 3283337 > 3.2M)
- Stats Tab: Y-axis scale now dynamically set to reflect the minimum & maximum values, improving readability
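The Row Count example above (3283337 shown as 3.2M) suggests truncation to one decimal place rather than rounding. A minimal sketch of that kind of abbreviation follows; the function name `humanize_count`, the suffix letters, and the truncation rule are assumptions for illustration and do not reproduce the DataHub UI code.

```python
def humanize_count(n: int) -> str:
    """Abbreviate a non-negative count, truncating to one decimal place."""
    # Thresholds and suffixes are assumptions made for this sketch.
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            # Integer arithmetic avoids floating-point surprises:
            # 3283337 -> 32 tenths of a million -> "3.2M".
            tenths = n * 10 // threshold
            return f"{tenths // 10}.{tenths % 10}{suffix}"
    return str(n)


print(humanize_count(3283337))  # 3.2M
```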
Metadata ingestion
Ingestion Enhancements:
- BigQuery: Set `platform_instance` using project_id
- PowerBI: Ingest datasets not used in visualizations (tiles/pages)
- Kafka Connect: Ability to set `platform_instance`
- Nifi: Support for basic auth
- Presto on Hive: Extract all table properties from Hive Metastore
- Elasticsearch: Support for basic profiling
- Add advanced configuration for LDAP manager ingestion
Lineage Improvements:
- Schema-aware SQL parsing to derive column-level lineage
- Column-level lineage support for BigQuery, Tableau, and Snowflake View definitions
- Snowflake: Extract Snowpipe S3 lineage
Developer Experience
- Fine-grained ownership policies
- PATCH support for DataJob Inputs/Outputs
- New endpoints to extract size of time-series indices and truncate/cleanup time-series indices in Elasticsearch; support for bulk-deletes
- Initial support for exception reporting via Sentry
- New OpenAPI endpoint to get Task Status
- SDK: Easily generate container URNs
Docs
- Improvements to our File-Based Lineage doc, specifically focused on Fine-Grained Lineage config components (link)
- Code examples of how to manage Posts within DataHub (link)
- Guide to generating custom browse paths for the new search experience (link)
What's Changed
- refractor(classification): datahub classifier init by @mayurinehate in #8193
- fix(glue): fix typo in reported warning, report with flow_urn by @mayurinehate in #8138
- fix(ingest/delta-lake): fix CI issues due to delta lake version bump by @mayurinehate in #8215
- Upgrade kafka and its dependencies to 3.4 in docker compose by @jinlintt in #8161
- chore(release): update default cli for managed ingestion by @pedro93 in #8226
- fix(ownership): Corrects graphQL resolver for entity operations by @pedro93 in #8219
- fix(cli/quickstart): handle docker hangs gracefully by @hsheth2 in #8211
- fix(cli): make quickstart robust to docker race conditions by @hsheth2 in #8233
- fix(search): tag/term should filter for both entity and field level by @anshbansal in #7881
- docs(tests): document test eval endpoint by @anshbansal in #8227
- feat(ingest/bigquery_v2): enable platform instance using project id by @asikowitz in #8216
- feat(stats): make rowcount more human readable by @joshuaeilers in #8232
- docs(es): Update aws deploy docs to correct ElasticSearch version by @iprentic in #8240
- feat(sdk): support patches as MCPs in file source by @hsheth2 in #8220
- fix(apiAuth): add resources where applicable and update docs by @RyanHolstien in #8234
- feat(patch): support datajob input output by @RyanHolstien in #8190
- feat(ingest/unity): Set external url for containers and datasets by @asikowitz in #8238
- docs(airflow): add docs on custom operators by @matthew-coudert-cko in #7913
- chore(release): update datahub upgrade docs by @pedro93 in #8228
- fix(ingestion/tableau): Remove unused field documentViewId by @mohdsiddique in #8225
- feat(ui): create fast path for immediate processing of ui sourced changes by @RyanHolstien in #8200
- fix(ingest/druid) Handling gracefully if no table returned in a schema by @treff7es in #8203
- fix(kafka-setup): bump kafka version by @david-leifker in #8245
- feat(ingestion/powerbi): Ingest datasets not used in PowerBI visualization(tiles/pages) by @mohdsiddique in #8212
- fix(sdk/dataflow): deprecate cluster and use env and platform_instance instead by @shubhamjagtap639 in #8201
- fix(ingest): pass platform correctly to browse path v2 helper by @asikowitz in #8244
- feat(search): Supporting Aggregations for hasX fields by @jjoyce0510 in #8241
- fix(ingest): Call validator on the base urn as well as aspect components when ingesting by @iprentic in #8250
- docs(website): adjust markprompt z-index so it's not covered by nav by @jeffmerrick in #8255
- fix(patch): Fix exception when using default patch for patching missing aspects by @jjoyce0510 in #8221
- fix(custom-search): revert underscore as quoted by @david-leifker in #8163
- chore(ci): add back optional static sleep for tests by @anshbansal in #8258
- chore(checkbox): darken all checkboxes by @joshuaeilers in #8248
- chore(assertions): catch any exception on assertion delete by @joshuaeilers in #8247
- feat(opensearch): Rollover usage events at a file size rather than time-based manner by @iprentic in #8182
- fix(ingest/okta): Set default of okta_profile_to_username_attr to email by @asikowitz in #8263
- feat(ui) Update Search & Browse to be a unified experience by @chriscollins3456 in #8235
- fix(ingest/tableau): split table columns query from datasources query by @mayurinehate in #8217
- fix(ingest/okta): Set default of okta connector to match OIDC defaults by @anshbansal in #8272
- feat(elasticsearch): Add endpoint for getting the size of timeseries indices by @iprentic in #8265
- feat(ingest/delete-cli): Add configurable batch size; update docs by @asikowitz in #8274
- fix aggregation sorting in browsev2 sidebar by @joshuaeilers in #8276
- Support de-selecting browse paths by @joshuaeilers in #8242
- feat(cli): Initial support for sending exceptions to Sentry by @treff7es in #7172
- fix(ingestion/powerbi): use admin api resolver to fetch modified workspaces by @mohdsiddique in #8273
- fix: dbt-athena types mapping for complex types by @svdimchenko in #8264
- feat(graphql) Prevent duplicate glossary term names within a group by @chriscollins3456 in #8187
- Add retries to JavaEntityClient:deleteReferencesTo by @joshuaeilers in #8268
- feat(ingest): Create zero usage aspects by @asikowitz in #8205
- fix(docs) Update Chrome extension docs to reflect current reality by @chriscollins3456 in #8284
- refactor(validations): Add URL-based Routing to Dataset Validations Tab by @jjoyce0510 in #8254
- fix(metadata-io): retry transactions on serialization errors when using a PostgreSQL database by @Masterchen09 in #8278
- docs(ingest/lineage): Update fine grained file lineage docs by @eboneil in https://github...