
Releases: datahub-project/datahub

v1.0.0

17 Mar 14:37

DataHub v1.0.0

Release Highlights

DataHub v1.0.0 is packed with exciting updates, including:

  • A completely redesigned user experience focused on simplified navigation and a visually stunning interface.
  • Unified support for Data & AI, including AI Model Group Versions, AI Model Lineage, Model Stats, and Experiment/Run ingestion.
  • DataHub Iceberg Catalog, allowing users to manage Iceberg tables directly from DataHub.

Read the blog post here!

Changelog

New User Interface: Putting Usability First

With a completely redesigned user interface, DataHub v1.0 represents a fundamental rethinking of how users interact with their metadata and data assets. The new experience includes:

  • Intuitive Platform-Based Navigation - Hierarchically browse data by database and schema in Snowflake, BigQuery, Redshift, Databricks, and more. Combine hierarchical navigation with filtering by data owners, domain, tags, and glossary terms to find the right data fast.
  • Seamless Lineage Exploration - Our reimagined lineage view features multi-level expansion, name-based search, and column-level visibility, making it easier than ever to understand data relationships and impact.
  • Integrated Data Quality - Make confident decisions with deeply integrated quality signals throughout the platform, helping you quickly identify and trust reliable data assets.

DataHub admins can enable the new UI for all users by setting the THEME_V2_DEFAULT environment variable to true. Until then, users can opt into the new experience by navigating to Settings > Appearance > Try New User Experience.

Comprehensive AI Asset Support: Unifying Data and AI

DataHub v1.0 treats AI assets as first-class citizens within the data ecosystem, allowing users to track their entire data-to-AI pipeline in one place.

  • Unified Search and Discovery: Seamlessly search across models, model groups, and traditional data assets in one unified interface.
  • Advanced Versioning System: Track multiple versions of datasets and ML models with detailed performance metrics and clear linkages between versions.
  • Rich Model Statistics: Monitor key metrics across versions, understand performance trends, and make data-driven decisions about model deployment.
  • End-to-End Lineage: Trace data flows from raw inputs through models to final outputs, with complete versioning support.

DataHub Iceberg REST Catalog Beta: Simplifying Data Lake Management

This release introduces an integration with Apache Iceberg, allowing users to manage Iceberg tables directly through DataHub, including:

  • Create and manage Iceberg tables through DataHub
  • Maintain consistent metadata across DataHub and Iceberg
  • Facilitate data discovery by exposing Iceberg table metadata in DataHub
  • Enable secure access to Iceberg tables through DataHub's permissions model

Read the docs here!

DataHub CLI

This release introduces the following improvements to our CLI:

  • Added a container command to apply tags, terms, and owners to all assets within a container. [#12418, #12436]
  • Improved the delete command to optionally reference a file with a list of URNs to be deleted. [#12247]
  • Expanded ingest command to support ingesting MCPs from S3. [#12649]
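The file-based delete (#12247) takes a list of URNs. As a minimal sketch, assuming the file holds one URN per line (check the CLI help for the exact flag and accepted format), such a file could be assembled with the Python standard library. The helpers dataset_urn, urn_file_contents, and write_urn_file are hypothetical, written for illustration only; they are not part of the DataHub SDK.

```python
from pathlib import Path


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in DataHub's urn:li:dataset:(...) format."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def urn_file_contents(urns: list[str]) -> str:
    """Return file contents with one URN per line (assumed format).

    Duplicates are dropped while the original order is preserved.
    """
    unique = list(dict.fromkeys(urns))
    return "".join(f"{urn}\n" for urn in unique)


def write_urn_file(path: Path, urns: list[str]) -> int:
    """Write the URN file and return how many URNs it contains."""
    contents = urn_file_contents(urns)
    path.write_text(contents, encoding="utf-8")
    return contents.count("\n")
```

For example, write_urn_file(Path("urns_to_delete.txt"), [dataset_urn("snowflake", t) for t in tables]) would produce a de-duplicated list ready to hand to the delete command.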

Metadata Ingestion

We’re continuously improving our integrations to add new capabilities and squash bugs.

  • dbt: Added the parameter include_database_name to support including the database name in URN generation. [#12411]
  • Iceberg: Alongside our new Iceberg Catalog API, we’ve made various improvements to our Iceberg integration. [#12744]
  • MLFlow: Significantly revamped our MLFlow connector, adding support for tracking Model Group Versions and Model Stats; tracking Model lineage to underlying datasets; and capturing Experiments and Runs.
  • MSSQL: Improved support for extracting stored procedures from MS SQL. [#12244, #12563]
  • Oracle: Improved the accuracy of column-level lineage resolution.
  • PowerBI: Improved lineage mapping so PowerBI Reports can now contain PowerBI Dashboards. [#12451]
  • Redshift: Added support for data shares and external schemas, including automatic lineage resolution across Redshift namespaces.
  • S3: Added functionality to the S3 ingestion process to ignore paths that do not match the specified depth, resolving warning messages triggered by mismatched paths. [#12326]
  • Snowflake: Added support for Snowflake Streams and Hybrid Tables, and fixed a bug with lineage resolution across table renames. [#12318]
  • Superset (community contribution!): Added support for Superset virtual datasets and lineage. [#12679]
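The dbt include_database_name parameter (#12411) controls whether the database name is part of generated URNs. The sketch below illustrates the effect on a dataset URN. It assumes the name is otherwise built as database.schema.table and that turning the option off simply drops the database segment; dbt_dataset_urn is a hypothetical helper, not the connector's code.

```python
def dbt_dataset_urn(
    platform: str,
    database: str,
    schema: str,
    table: str,
    include_database_name: bool = True,
    env: str = "PROD",
) -> str:
    """Illustrate how include_database_name could change a dataset URN.

    Assumption: with the flag off, the database segment is omitted and the
    dataset name becomes schema.table.
    """
    parts = [database, schema, table] if include_database_name else [schema, table]
    name = ".".join(parts)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"
```

Because a URN is an entity's identity, changing this setting on an existing deployment would likely produce new entities rather than rename the old ones, so it is worth choosing before the first ingestion run.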

Additionally, we’re working on a new integration with Vertex AI. Please reach out if you’re interested in joining the beta.

Of course, this only scratches the surface of changes. This release contains 100+ improvements across 25 different integrations.

Thank You to our Contributors!

First-Time Contributors

@Bhadhri03 @brock-acryl @cccs-cat001 @davidebriscese @Deepalijain13 @dougbot01 @Haebuk @haon85 @josges @mihai103 @rajatgl17 @Rasnar @rharisi @samanthafigueredo5 @ttekampe

Repeat Contributors

@bda618 @deepgarg-visa @eagle-25 @jayasimhankv @ksrinath @llance @Masterchen09 @mayurinehate @mkamalas @PeteMango @pinakipb2 @remisalmon @sagar-salvi-apptware @svdimchenko @v-tarasevich-blitz-brain

Project Maintainers

@anshbansal @asikowitz @chakru-r @chriscollins3456 @david-leifker @gabe-lyons @hsheth2 @jayacryl @jjoyce0510 @kevinkarchacryl @pedro93 @RyanHolstien @ryota-cloud @sakethvarma397 @sgomezvillamor @shirshanka @skrydal @treff7es @yoonhyejin

View the full changelog: v0.15.0.1...v1.0.0

v1.0.0rc5

13 Mar 09:53
Pre-release

Full Changelog: v1.0.0rc4...v1.0.0rc5

v1.0.0rc4

12 Mar 10:33
Pre-release

Full Changelog: v1.0.0rc3...v1.0.0rc4

v1.0.0rc3

04 Mar 18:14
6097820
Pre-release

What's Changed

New Contributors

Full Changelog: v1.0.0rc2...v1.0.0rc3

v1.0.0rc2

24 Feb 20:04
8dfd8fb
Pre-release

What's Changed


v1.0.0rc1

30 Jan 15:01
a155470
Pre-release
fix(ci): disable ci telemetry modelDocUpload (#12504)

v0.15.0.1

21 Jan 15:43

DataHub v0.15.0.1 Release Notes

🎵 Listen to this release's theme song on Suno: Structured Flow
Shoutout to @DSchmidtDev for the genre inspo this round!

  • Structured Properties

    • Added comprehensive support for managing structured properties, including creation, editing, deletion, and display preferences. Introduced timestamps for tracking creation and modification. [#12100, #11419]
    • Enhanced property display options with badge styling, custom column types, and configurable visibility settings in asset sidebars and schema fields. [#12111, #12052]
    • Added structured property filtering in UI with improved aggregation logic and entity metadata display. Introduced new property validators and display settings. [#12097, #12099]
  • UI Enhancements

    • Enhanced container organization with parent hierarchy labels. [#11705]
    • Added support for markdown in incident descriptions, enabling rich formatting capabilities. [#11759]
    • Improved ingestion reporting with better visibility of successful ingestions with warnings. Enhanced browse paths display for business attributes and schema fields. [#11704, #11585]
    • Added support for timeseries aspects in OpenAPI and customizable date range fields for Analytics charts. [#12096, #11366]
  • Authorization & Authentication

    • Enabled authentication and API authorization by default, with support for URN-wildcard-based policies using STARTS_WITH condition. [#11484, #11441]
    • Added authorization checks for managing Glossary terms, including privileges for ownership, domain management, and link actions. [#11337]
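URN-wildcard policies (#11441) let one policy cover every resource whose URN shares a prefix. The toy model below shows only the matching rule; ResourceFilter and its fields are hypothetical and do not mirror DataHub's actual policy schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceFilter:
    """Hypothetical policy criterion: match URNs exactly or by prefix."""

    values: tuple[str, ...]
    condition: str = "EQUALS"  # "EQUALS" or "STARTS_WITH"

    def matches(self, urn: str) -> bool:
        if self.condition == "EQUALS":
            return urn in self.values
        if self.condition == "STARTS_WITH":
            # Any configured prefix is enough for the policy to apply.
            return any(urn.startswith(prefix) for prefix in self.values)
        raise ValueError(f"unsupported condition: {self.condition}")
```

With a prefix such as "urn:li:dataset:(urn:li:dataPlatform:snowflake,", a single STARTS_WITH filter would cover every Snowflake dataset instead of listing each URN.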

Metadata Ingestion

Ingestion Framework Improvements

  • Enhanced Data Source Support: Expanded ingestion capabilities for multiple platforms, including Superset (with dataset entities, schema fields, and column-level lineage), Feast (supporting tags and owners ingestion), Neo4j, and Cassandra. Added stateful ingestion support for file sources. [#11688, #11784, #11804, #11526, #11822]

  • SQL Processing Improvements: Replaced vulnerable sqlparse dependency with an in-house SQL parser, optimized CLL generation with reduced memory usage, and added special handling for MSSQL case sensitivity. Enhanced multi-query lineage support for Snowflake temporary tables. [#11645, #11708, #11920, #12020]

  • CLI Enhancements: Introduced new commands for managing ingestion, including listing source runs with filtering capabilities, undoing soft deletes with platform filtering, and listing structured properties. Added an offline flag to the SQL parser CLI. [#11740, #11980, #12012, #12283, #11635]

  • Ownership and Metadata Management: Extended ownership transformer capabilities across entities, improved glossary sync to preserve custom ownership types, and added support for multiple ownership types in glossaries and terms. Enhanced Forms CLI with additional filters for subtypes, platform instances, owners, tags, and glossary terms. [#11700, #11545, #12050, #10979]

  • Core Infrastructure Improvements: Implemented unique URN generation for all entities, added support for efficient entity ingestion through get_entity_as_mcps, improved empty field handling, and introduced progress reporting during ingestion. Added execution request cleanup job and support for dropping duplicate schema fields. [#11676, #11425, #11613, #12117, #11765, #12308]
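One of the framework changes above adds support for dropping duplicate schema fields. A minimal sketch of the idea follows, assuming duplicates are identified by fieldPath and the first occurrence wins; drop_duplicate_fields is illustrative and not the framework's implementation.

```python
def drop_duplicate_fields(fields: list[dict]) -> list[dict]:
    """Keep the first schema field seen for each fieldPath (assumed dedup key)."""
    seen: set[str] = set()
    unique: list[dict] = []
    for field in fields:
        path = field["fieldPath"]
        if path in seen:
            continue  # later duplicates are discarded
        seen.add(path)
        unique.append(field)
    return unique
```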

Source-Specific Ingestion Improvements

Airflow

  • Upgraded infrastructure with support for Airflow 2.10, deprecated versions below 2.3, and improved template handling with Jinja support. Added configuration options for dag patterns and environment variables. [#11300, #11371, #11472, #11537, #11579, #12056]
  • Enhanced error handling and debugging with improved logging, fixed plugin stability issues on EMR, and added support for AthenaOperator lineage extraction. Introduced ability to disable plugin without restart. [#11857, #11877, #11880, #12098]

BigQuery

  • Enhanced data modeling capabilities with support for foreign/primary keys, BigLake tables, and improved handling of external tables. Added support for region qualifiers and partition management. [#11686, #11728, #11874, #11940]
  • Improved lineage tracking with GCS data source support and optimized query performance. Added platform resource entity generation from BigQuery labels. [#11442, #11492, #11534, #11602]
  • Enhanced profiling and performance with better type handling and size limits. Fixed issues with tag synchronization and platform instance settings. [#11807, #12060]

Dagster

  • Added support for skipping Asset ingestion, fixed input/output value formatting, and improved compatibility with latest Dagster versions (v1.9.6). Deprecated Python 3.8 support. [#11262, #11481, #12121, #12189]

dbt

  • Improved performance and functionality with node_name_patterns for faster CLL processing, support for multiple test paths, and better handling of custom owner types. [#11450, #11460, #11848]
  • Enhanced lineage handling by preventing cycles in SQL parsing and supporting multiple dataset assertions for tests. Added support for dbt Cloud's Explore page. [#11666, #11451, #12223]

Snowflake

  • Expanded support for various table types, including secure, dynamic, and hybrid tables. Enhanced lineage capabilities for renames, swaps, and external tables. [#11600, #12039, #12094, #12179]
  • Improved authentication with OAuth support and token management. Added incremental property processing and structured property support for tags. [#11888, #12048, #12080, #12285]
  • Enhanced error handling and logging with better parse failure reporting and dot handling in table names. [#12105, #12110, #12153]

Tableau

  • Enhanced project management with new path pattern filtering and improved handling of hidden assets. Added support for access roles and group permissions. [#10855, #11157, #11559]
  • Improved API integration with retry logic for various error codes (502, 504), better authentication handling, and consistent page size application. [#12213, #12216, #12233]
  • Enhanced reporting and debugging capabilities while maintaining efficient performance and proper permission handling. [#12015, #12024, #12175]
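The Tableau connector now retries transient API errors such as 502 and 504. The sketch below shows one common way to do that, retrying with exponential backoff. It is a generic pattern rather than the connector's code; HttpError, call_with_retries, and the delay parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Gateway errors called out in the notes; other statuses fail immediately.
RETRYABLE_STATUSES = frozenset({502, 504})


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable HTTP errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except HttpError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Injecting sleep keeps the function easy to test and lets callers swap in a jittered delay if many workers hit the same server.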

PowerBI

  • Improved M-query parsing with support for comments, better handling of quotes, and DatabricksMultiCloud native query functionality. [#12177, #11743, #11756]
  • Enhanced workspace management with cross-workspace dataset linking and app ingestion support. Added timeouts for M-query parsing. [#11560, #11629, #11753]
  • Improved error reporting and performance optimization with reduced type casting and better organization of responsibilities. [#11763, #12004]

Developer Experience

  • Entity Management: Introduced entity versioning for Datasets and ML Models, with support for version set linking. Improved timeline functionality with better handling of primary key changes and rename events. Added data transformation logic models to enhance data processing capabilities. [#11819, #11843, #12166, #12198]

  • Enhanced Configuration Management: Added new customization options through environment variables and Helm charts, including editable dataset names and configurable garbage collection scheduling. The bootstrap process has been optimized to reduce latency during installation. [#11391, #11518]

  • Development Environment Updates: Added Git support to the ingestion-base image, enabling better source control integration for ingestion workflows. [#11477]

  • Security Logging Enhancement: Improved security audit trails by adding actor URN tracking for unauthorized access attempts. [#12030]

NEW: Garbage Collection

  • Comprehensive Metadata Cleanup: Introduced a new ingestion source, DataHubGC, which acts as a garbage collector for dataflows, data jobs, and data process instances, with configurable retention policies and deletion parameters. Added a dry run mode for testing cleanup operations. [#11102, #11413]

  • Performance Optimizations: Reduced processing time from 1 hour to 15 minutes by implementing batch processing, optimizing queries, and removing unnecessary operations. Increased the default hard delete limit from 10k to 25k entities. [#11809, #12093, #12238]

  • Reliability Improvements: Enhanced garbage collection stability with additional validation checks, improved error handling, and better process visibility through ingestion stage reporting. Fixed issues with entity deletion logic and reference handling to preserve critical lineage relationships. [#12011, #12013, #12027, #12049, #12124, #12226]
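The cleanup described above combines a retention window, a cap on hard deletes (now 25k by default), batch processing, and a dry run mode. The sketch below models only the selection and batching step; the function names and parameters (retention_days, max_deletes, batch_size) are hypothetical and are not DataHubGC's configuration keys.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional


def select_expired(last_seen: dict[str, datetime], retention_days: int, now: datetime) -> list[str]:
    """Return URNs whose last activity is older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(urn for urn, seen in last_seen.items() if seen < cutoff)


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def plan_deletions(
    last_seen: dict[str, datetime],
    retention_days: int,
    now: Optional[datetime] = None,
    max_deletes: int = 25_000,
    batch_size: int = 500,
) -> list[list[str]]:
    """Plan deletion batches, capped at max_deletes entities in total.

    A dry run would report these batches instead of deleting anything.
    """
    now = now or datetime.now(timezone.utc)
    expired = select_expired(last_seen, retention_days, now)[:max_deletes]
    return list(batched(expired, batch_size))
```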

Thank You to Our Contributors!

First-Time Contributors

@AColocho, @alberttwong, @Alice-608, @Bumyu, @chakru-r, @chriscc2, @dejan2609, @donovan-acryl, @eagle-25, @hwmarkcheng, @k-bartlett, @kanavnarula, @kartikey-visa, @kevinkarchacryl, @kousiknandy, @kris48k, @llance, @margaridafernandes-trip, @mikeburke24, @raudzis, @ronybony1990, @ryota-cloud, @shepherd44, @siong-tcha, @ssidorenko, @tanguyantoine, @th0ger, @udays-visa, @udbhav-hbk, @vejeta

Repeat Contributors

@aviv-julienjehannet, @bda618, @bossenti, @darnaut, @deepgarg-visa, @DSchmidtDev, @dushayntAW, @eboneil, @ethan-cartwright, @feldjay, @githendrik, @haeniya, @Jorricks, @Masterchen09, @mkamalas, @Nbagga14, @nicholas-fwang, @noggi, @pankajmahato-visa, @pinakipb2, @rtekal, @sagar-salvi-apptware, @steffengr

DataHub Maintainers

@acrylJonny, @anshbansal, @asikowitz, @chriscollins3456, @david-leifker, @gabe-lyons, @hsheth2, @jayacryl, @jjoyce0510, @maggiehays, @mayurinehate, @pedro93, @RyanHolstien, @sakethvarma397, @sgomezvillamor, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin...


V0.15.0

16 Jan 15:37
3108b53

DataHub v0.15.0 Release Notes

Please refer to v0.15.0.1 for full release notes.

What's Changed


v0.14.1

17 Sep 21:48
6a165a8

DataHub v0.14.1 Release Notes

User Experience

  • Enhanced Data Propagation UI: New features allow viewing propagated column documentation, source information, and asset-level propagation details. This improves visibility into data lineage and enables better understanding of data flow across the organization. (#11047)

  • Improved Search Result Tracking: Added page number to search result click events, enabling better measurement of search ranking performance. This helps users understand and optimize their search experience. (#11151)

  • Fixed Display Issues: Resolved issues with displaying "0" values for last ingested data and improved handling of multilingual characters in descriptions. These fixes ensure more accurate and readable information presentation. (#10840, #10975)

Developer Experience

  • Performance Improvements:

    • Implemented lazy dataLoaders for GraphQL queries, significantly reducing latency for local environments. (#11293)
    • Added option to log slow GraphQL queries, helping identify and address performance bottlenecks. (#11308)
    • Introduced session authorization caching for faster access checks. (#11327)
  • Enhanced Search Capabilities:

    • Added support for custom highlighting fields in GraphQL queries, allowing faster and more customizable data retrieval. (#11339)
    • Implemented new search query functionality to filter by parents/children of Domains or Containers. (#11279)
    • Added support for multiple values in 'CONTAIN', 'START_WITH', and 'END_WITH' operators, enabling more flexible and precise searches. (#11068)
  • API Improvements:

    • Extended throttling to API requests, supporting non-browser ingestion/write requests and manual throttling for better control over system load. (#11325)
    • Added support for 'START_WITH' and 'END_WITH' operators in GraphQL API, enhancing string query capabilities. (#11026)
  • Bug Fixes:

    • Resolved issues with forward slash handling in search queries, empty key-value pairs in Elasticsearch mapping, and support for various data types in object fields. These fixes improve search accuracy and data representation. (#10932, #11004, #11066)
    • Addressed a Postgres regression by upgrading the ebean library from version 12.x to 15.x, resolving a read lock NPE issue. (#11379)
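With #11068, the 'CONTAIN', 'START_WITH', and 'END_WITH' operators accept several values. The sketch below assumes a field matches when any one of the values matches (OR semantics), which is the natural reading but is not spelled out in these notes; matches_condition is a hypothetical helper, not DataHub's filter code.

```python
from typing import Callable


def matches_condition(field_value: str, condition: str, values: list[str]) -> bool:
    """Return True if field_value satisfies the condition for any given value."""
    checks: dict[str, Callable[[str], bool]] = {
        "CONTAIN": lambda v: v in field_value,
        "START_WITH": field_value.startswith,
        "END_WITH": field_value.endswith,
    }
    if condition not in checks:
        raise ValueError(f"unsupported condition: {condition}")
    check = checks[condition]
    return any(check(v) for v in values)
```

Under this reading, one filter such as START_WITH ["dim_", "fct_"] replaces two separate filters joined by OR.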

Metadata Ingestion

  • S3 Integration Enhancements:

    • Enhanced partition support for S3 dataset ingestion, improving metadata representation and enabling advanced partition detection. (#11083)
    • Enhanced S3 ingestion process to support reading specific file types, allowing more granular control over data ingestion. (#11177)
  • BigQuery Improvements:

    • Implemented query log extractor for BigQuery, creating "Query" entities with usage statistics, lineage, and operation details. (#10994)
    • Added support for filtering GCP project ingestion based on project labels, enabling more targeted data collection. (#11169)
    • Implemented query job retries for transient errors, improving system robustness. (#11162)
  • Snowflake Updates:

    • Added support for Iceberg tables in Snowflake access history, enhancing lineage capture capabilities. (#10961)
    • Introduced ability to define clustering key formulas for Snowflake datasets. (#11254)
    • Fixed tag exclusion issues in Snowflake ingestion process. (#11250)
  • New and Updated Connectors:

    • Added ingestion source for SAP Analytics Cloud, expanding DataHub's integration capabilities. (#10958)
    • Enhanced Salesforce connector with customizable API version and improved error messages. (#11145, #11266)
    • Updated Tableau ingestion process with new parameters and improved field type parsing. (#11255, #11202)
  • Other Ingestion Improvements:

    • Added support for MongoDB database ingestion as containers. (#11178)
    • Implemented automatic capturing of Snowflake assets with Pandas I/O Manager in Dagster module. (#11189)
    • Enhanced Fivetran ingestion with destination ID filtering capabilities. (#11277)
    • Added support for browse-only tables in Databricks ingestion. (#10766)

Other Improvements and Fixes

  • Upgraded various dependencies including Kafka, Azure Identity, Acryl-SQLglot, and GraphQL/Spring versions.
  • Improved error handling and logging across multiple components.
  • Enhanced test coverage and reliability.
  • Updated documentation for various features and processes.

Breaking Changes

Notable breaking changes include:

  • Removal of lower method from get_db_name in SQLAlchemySource, affecting URNs of related entities.
  • Changes to default sink mode and aspect handling that require server version 0.14.0+.

See the full details here.

Contributors

We extend our heartfelt thanks to all contributors for their valuable work on this release:

First-Time Contributors

@AaronYang0628, @alexandrebunn, @alisa-aylward-toast, @arpanchakra29, @esselius, @eunseokyang, @ignitz, @milindgupta, @milindgupta9, @Nbagga14, @rohansun, @sakethvarma397, @vignesh-hbk

Repeat Contributors

@deepgarg-visa, @dushayntAW, @feldjay, @filipe-caetano-ovo, @ksrinath, @Masterchen09, @matthew-coudert-cko, @mayurinehate, @nmbryant, @pinakipb2, @prashanthic23, @sagar-salvi-apptware, @siladitya2, @sleeperdeep

DataHub Maintainers

@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @hsheth2, @jjoyce0510, @maggiehays, @pedro93, @RyanHolstien, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin

Your contributions are invaluable in making DataHub better for everyone. Thank you!

What's Changed


v0.14.0.2

21 Aug 15:29

DataHub v0.14.0.2 Release Notes

User Experience

  • Renamed: Validation --> Quality: The Validation tab has been renamed to Quality to make it more intuitive to end-users that it contains outcomes from data quality checks. [#10935]

  • Data Contract UI: A new Data Contract UI is now available under the Quality Tab, allowing users to handle various data assertion types and add/remove contracts more easily. [#10625]

  • Updates to Customized Search Ranking: By default, explore (*) query results are now ranked based on enrichment (tags, terms, owners, description, domains, row/column counts) as well as incident status. [#10774]

  • Custom Dataset Names: Business users can now maintain an editable dataset name separate from default properties, providing more control over dataset identification. [#10608]

  • Documentation Propagation Setting Page: A new settings page has been added to the UI for managing Documentation Propagation, giving users more control over how documentation is shared across the platform. [#11038]

Developer Experience

  • NEW: DataHub Open Assertions Specification:

    • Announcing a universal assertions specification for declaring Data Quality checks and compiling them into artifacts for use by third-party Data Quality tools such as Great Expectations, dbt tests, and Snowflake (via Data Quality DMFs). [#10609]
    • Added ability to define data quality rules using a YAML specification file, enabling users to set assertions like volume metrics and conditions, with the ability to compile and schedule them to run on Snowflake as the assertion backend. [#10602]
  • API and SDK Enhancements:

    • New GraphQL APIs added for managing forms, structured properties, and data contracts. [#10826, #10825, #10632]
    • Updates to Java and Python SDKs to support creating and updating structured properties on assets. [#10823, #10824]
    • Support for conditional write semantics including If-Modified-Since, If-Unmodified-Since, and If-Version-Match in MetadataChangeProposals (MCP) and OpenAPI. [#10868]
  • CLI Improvements:

    • A new check server-config command has been added to test server credentials and retrieve diagnostic information. [#10990]
    • The get command now includes a --details/--no-details flag for more detailed output, facilitating easier issue debugging. [#10815]
    • Updated the CLI to optionally display server configuration settings. [#10676]
    • Added the ability to assign actors (users or groups) to forms in the forms YAML API. [#10683]
  • Improved Logging and Monitoring:

    • Unified request logging implemented across GraphQL, OpenAPI, and Restli requests, including additional information like actor, IP address, and API type. [#10802]
  • Performance Optimizations:

    • Implemented throttling for the mce-consumer based on mae-consumer lag. [#10626]
    • Added an ASYNC_BATCH mode to the rest sink for improved performance. [#10733]
    • Improved the performance of Neo4j read queries by specifying labels, and combined the multiple Neo4j statements in the addEdge function into a single statement. [#10593, #10598]
  • Security Enhancements:

    • Updated encryption and decryption methods with a stronger cryptographic algorithm. [#11059]
    • Optimized regular expressions to prevent potential ReDoS vulnerabilities. [#10315]
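Conditional writes (#10868) such as If-Version-Match follow the optimistic-concurrency pattern: a write succeeds only if the caller's expected version equals the stored one. The toy in-memory store below illustrates that idea; AspectStore, PreconditionFailed, and the versioning scheme are hypothetical and do not reflect how GMS actually stores aspects.

```python
from dataclasses import dataclass, field
from typing import Optional


class PreconditionFailed(Exception):
    """Raised when the expected version does not match the stored version."""


@dataclass
class AspectStore:
    """Toy in-memory store; each key holds (version, value)."""

    _data: dict[str, tuple[int, dict]] = field(default_factory=dict)

    def get(self, key: str) -> Optional[tuple[int, dict]]:
        return self._data.get(key)

    def put(self, key: str, value: dict, if_version_match: Optional[int] = None) -> int:
        """Write value; if if_version_match is given, it must equal the current version.

        A missing key is treated as version 0 (an assumption of this sketch).
        """
        current_version = self._data[key][0] if key in self._data else 0
        if if_version_match is not None and if_version_match != current_version:
            raise PreconditionFailed(
                f"expected version {if_version_match}, found {current_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version
```

If two clients both read version 1 and then write with if_version_match=1, only the first write succeeds; the second must re-read and retry, which is the lost-update protection conditional writes are meant to provide.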

Metadata Ingestion

  • New Ingestion Sources:

    • Azure Blob Storage: Added as a new ingestion source with support for Path Specs. [#10813]
    • Grafana: New connector to ingest dashboards, giving on-call DevOps team members documentation within DataHub. [#10891]
    • IBM DB2: Added support for this platform. [#10601]
  • Snowflake Improvements:

    • Enhanced view lineage parsing without query-based lineage/usage. [#10905]
    • Added support for more than 10k views in a Snowflake database. [#10718]
    • Implemented parallel schema extraction for improved performance. [#10653]
    • Added snowflake-queries source for lineage, usage, queries, and operational metadata to improve performance and configurability. [#10835]
  • BigQuery Enhancements:

    • Refactored and parallelized dataset metadata extraction for better performance. [#10884]
    • Added support for new data types including BIGNUMERIC, NUMERIC, DECIMAL, BIGDECIMAL, FLOAT64, and RANGE. [#10950]
    • Added support for ingesting View labels during ingestion. [#10648]
  • Looker Updates:

    • Explore tags are now ingested into DataHub. [#10547]
    • Fixed issues related to CLL generation when the view definition language is SQL. [#10542]
    • Added support for including platform instance details in URNs for dashboards and charts. [#10771]
  • Other Improvements:

    • dbt: Enhanced flexibility in lineage generation with the new experimental prefer_sql_parser_lineage flag. [#11039]
    • Airflow: Task ownership info can now be set as a group rather than an individual user. [#10742]
    • Athena: Enhanced profiling capabilities to support column quantiles and medians. [#10723]
    • Fivetran: Improved connector performance for faster ingestion. [#10556]
    • SageMaker: Added stateful ingestion capability to remove deleted assets during ingestion runs. [#10573]
    • Tableau: Support added for ingesting multiple Tableau sites in a single configuration, with sites appearing as containers in DataHub. [#10498]
    • Added support for ingesting schemas from the schema registry in the Kafka module. [#10612]
    • Introduced a TagsToTermMapper transformer for mapping specific tags to glossary terms. [#10758]
    • Enhanced the SQL lineage parser with an optional default_dialect parameter for customized dialect selection. [#10830]
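The TagsToTermMapper transformer (#10758) maps selected tags to glossary terms. A minimal sketch of that mapping follows, assuming each configured tag maps to a glossary term with the same name; tags_to_terms is a hypothetical helper, so consult the transformer documentation for the real configuration keys.

```python
TAG_PREFIX = "urn:li:tag:"
TERM_PREFIX = "urn:li:glossaryTerm:"


def tags_to_terms(tag_urns: list[str], tags_to_map: list[str]) -> list[str]:
    """Return glossary term URNs for the tags selected for mapping.

    Assumption: a tag named X maps to a glossary term named X.
    """
    wanted = set(tags_to_map)
    terms: list[str] = []
    for urn in tag_urns:
        name = urn.removeprefix(TAG_PREFIX)
        term = TERM_PREFIX + name
        if name in wanted and term not in terms:
            terms.append(term)
    return terms
```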

Other Improvements and Fixes

  • Fixed high vulnerabilities related to sensitive information logging. [#11088]
  • Improved error handling and logging across various modules.
  • Enhanced test coverage for new features and existing functionality.

Breaking Changes

  • Protobuf CLI will no longer create binary encoded protoc custom properties by default.
  • Changes to Data flow info and data job info aspects may require a server upgrade.
  • OpenAPI V3 - Creation of aspects now requires wrapping within a value key.
  • Profiling configuration for Glue source has been updated.

For full details on breaking changes, please refer to the updating guide.
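The OpenAPI V3 change means aspect payloads must now be nested under a value key. The sketch below builds a request body in that shape. The surrounding layout (a list of entities, each with a urn and one key per aspect) is an assumption to verify against the OpenAPI V3 documentation, and v3_entity_body is a hypothetical helper.

```python
import json


def v3_entity_body(urn: str, aspects: dict[str, dict]) -> list[dict]:
    """Build a create-entity body where each aspect is wrapped in {"value": ...}."""
    entity: dict = {"urn": urn}
    for aspect_name, payload in aspects.items():
        entity[aspect_name] = {"value": payload}  # wrapping required by the change above
    return [entity]


if __name__ == "__main__":
    body = v3_entity_body(
        "urn:li:dataset:(urn:li:dataPlatform:hive,db.events,PROD)",
        {"status": {"removed": False}},
    )
    print(json.dumps(body, indent=2))
```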

Contributors

Massive shoutout to all of the contributors who made this release possible:

First-Time Contributors

@aabharti-visa, @acrylJonny, @amit-apptware, @AndreasHegerNuritas, @aviv-julienjehannet, @brbrown25, @chardaway, @dragontail, @ipolding-cais, @joelmataKPN, @john-claro-cko, @jordanjeremy, @lima-renan, @nadavgross, @nephtyws, @obaltian, @PeamThom, @pie1nthesky, @pulsar256, @samblackk, @shtephlee, @simaov, @steffengr, @tkdrahn, @TristanHeisler, @wornjs, @xkollar

Repeat Contributors

@ajoymajumdar, @bossenti, @cburroughs, @cccs-eric, @deepgarg-visa, @dushayntAW, @fjmacagno, @githendrik, @haeniya, @jayasimhankv, @k7ragav, @kevin1chun, @ksrinath, @Kunal-kankriya, @looppi, @Masterchen09, @mayurinehate, @ngamanda, @nmbryant, @noggi, @pankajmahato-visa, @PatrickfBraz, @pinakipb2, @Rajasekhar-Vuppala, @rtekal, @sagar-salvi-apptware, @shubhamjagtap639, @siladitya2, @ssilb4, @Sukeerthi31, @sumitappt, @TonyOuyangGit, @walter9388

DataHub Maintainers

@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @ethan-cartwright, @gabe-lyons, @hsheth2, @jayacryl, @jjoyce0510, @maggiehays, @pedro93, @RyanHolstien, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin

What's Changed
