Description
Is there an existing issue for this?
- I have searched the existing issues
Problem statement
There's no easy way to know what still needs to be migrated to UC within a given workspace
Proposed Solution
- [FEATURE] Create `migration-progress` scheduled daily workflow #2578
- [FEATURE] Support rerunning crawlers #2576
- [FEATURE] Update the inventory with the `migration-progress` crawled objects #2574
- [FEATURE] Persist the `migration-progress` crawled objects in the `ucx.history` table #2573
- [FEATURE] Create a `ucx` catalog via a cli-command #2571
- [FEATURE] Create a `ucx.history` table #2572
- [FEATURE] Create a `ucx.workflow_runs` table #2600
- [FEATURE] Create a `ucx.errors` table #2603
- [BUG]: Dashboard and workflow refresh during the `migration-progress-experimental` job do not update the historical log #3237
- [TODO] bring back self.tables_migrator.index() #3153
- [FEATURE] Create a
- [FEATURE] Skip the `migration-progress` run when the `ucx` catalog does not exist #2577
- [FEATURE] Skip the `migration-progress` run when the `assessment` job did not run yet #2816
- [FEATURE] Set `RemoveAfter` property on `ucx` catalogs in integration test #2594
  - Pre-requisite: Change in watchdog
- [FEATURE] Run the workflow static analysis as part of the `migration progress` workflow #2595
- [FEATURE] Visualize migration process in dashboard #2596
- [FEATURE] Encode dataclasses to a history log entry #3064
- [FEATURE]: History log encoder for clusters #3057
- [FEATURE]: History log encoder for grants #3058
- [FEATURE]: History log encoder for jobs #3059
- [FEATURE]: History log encoder for pipelines #3060
- [FEATURE]: History log encoder for tables #3061
- [FEATURE]: History log encoder for udfs #3062
- [FEATURE]: History log encoder for cluster policies #3063
- Dashboards
- `UsedTables` - Add used tables to tables/views as a failure when referencing a non-migrated table
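The history log encoder items above (#3064 and the per-object encoders) can be pictured with a minimal sketch. Everything here is an assumption made for illustration: `ClusterRecord`, `HistoryEntry`, the `encode` helper, and the column names are hypothetical and are not the actual UCX classes or the `ucx.history` schema.

```python
import dataclasses
import datetime as dt
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRecord:
    # Hypothetical stand-in for a crawled inventory object.
    cluster_id: str
    success: int
    failures: str = "[]"


@dataclass(frozen=True)
class HistoryEntry:
    # Hypothetical row layout for a history table; column names are assumptions.
    workspace_id: int
    run_id: int
    snapshot_at: str
    object_type: str
    object_id: str
    data: str


def encode(record, *, workspace_id, run_id, id_field, snapshot_at):
    """Encode any dataclass instance as a single history log entry."""
    if not dataclasses.is_dataclass(record) or isinstance(record, type):
        raise TypeError("expected a dataclass instance")
    payload = dataclasses.asdict(record)
    return HistoryEntry(
        workspace_id=workspace_id,
        run_id=run_id,
        snapshot_at=snapshot_at.isoformat(),
        object_type=type(record).__name__,
        object_id=str(payload[id_field]),
        # Store the full record as JSON so one table can hold every object type.
        data=json.dumps(payload, sort_keys=True),
    )


entry = encode(
    ClusterRecord(cluster_id="abc", success=1),
    workspace_id=1,
    run_id=2,
    id_field="cluster_id",
    snapshot_at=dt.datetime(2024, 9, 1, tzinfo=dt.timezone.utc),
)
```

Keeping the object-specific fields in one JSON column is one possible design; it lets clusters, grants, jobs, pipelines, tables, udfs and cluster policies share a table at the cost of less convenient querying.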
Potential challenges
- [FEATURE] Let crawlers support `append` to tables #2597 - but for migration progress purposes, we need to overwrite the tables, or add another column with a timestamp and modify the "fetch latest" queries to fetch the latest timestamp of the snapshot. Fetching the latest timestamp from the snapshot allows us to build a bar-chart widget to see how fast migration progresses, but we don't really care about it. If we do fetch-latest-timestamp, all our views and dashboards would become a bit more complicated, but that's fine. --> Decision is made to keep the current ucx inventory and store history in a separate table (see proposed solution above)
- let's keep the status of migration progress in HMS (for now), but we can change this decision in a few weeks. --> Decision is made to store the migration process in a ucx catalog.
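To make the append-with-timestamp option concrete, here is a small sketch of what a "fetch latest" query and a per-snapshot progress query could look like. It uses an in-memory SQLite database purely as a stand-in for the real SQL warehouse, and the `history` table, its columns, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical append-only table: one row per object per snapshot.
conn.execute(
    "CREATE TABLE history (snapshot_at TEXT, object_id TEXT, migrated INTEGER)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        ("2024-09-01", "hive_metastore.sales.orders", 0),
        ("2024-09-01", "hive_metastore.sales.items", 0),
        ("2024-09-02", "hive_metastore.sales.orders", 1),
        ("2024-09-02", "hive_metastore.sales.items", 0),
    ],
)


def fetch_latest(conn):
    """Return only the rows that belong to the most recent snapshot."""
    return conn.execute(
        "SELECT object_id, migrated FROM history "
        "WHERE snapshot_at = (SELECT MAX(snapshot_at) FROM history) "
        "ORDER BY object_id"
    ).fetchall()


def progress_per_snapshot(conn):
    """Return (snapshot, migrated count, total count), e.g. for a bar chart."""
    return conn.execute(
        "SELECT snapshot_at, SUM(migrated), COUNT(*) FROM history "
        "GROUP BY snapshot_at ORDER BY snapshot_at"
    ).fetchall()


latest = fetch_latest(conn)
progress = progress_per_snapshot(conn)
```

The extra subquery in `fetch_latest` is the added complexity mentioned above: every view that previously read the whole table would need the same filter.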
Migration process crawlers
Assessment tasks that make sense to re-run on `migration-progress` workflow:
- `crawl_tables`
- `assess_jobs` - potentially harden the code there as well
- `assess_clusters` - potentially harden as well
- `assess_pipelines` - potentially harden as well
- `crawl_cluster_policies`
- `assess_global_init_scripts`

not to be re-run:
- `crawl_mounts` - we already pre-created external locations
- `setup_tacl` - we don't need to crawl grants
- `crawl_grants` - no need to, i think
- `estimate_table_size_for_migration` - most likely not necessary
- `guess_external_locations` - we already migrated external locations by this point
- `assess_incompatible_submit_runs` - not going to be necessary in september
- `workspace_listing` - we are going to analyse only those notebooks that are part of jobs in the scope of static analysis
- `crawl_permissions` - we expect permissions to already be migrated
- `crawl_groups` - we expect groups to already be migrated
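The split above could be expressed as a simple allow-list. This is only a sketch: the task names come from the lists above, while `RERUN_TASKS` and `tasks_to_rerun` are hypothetical names, and treating unknown tasks as "do not re-run" is an assumption, not a decision recorded in this issue.

```python
# Assessment tasks considered worth re-running in the migration-progress workflow.
RERUN_TASKS = frozenset(
    {
        "crawl_tables",
        "assess_jobs",
        "assess_clusters",
        "assess_pipelines",
        "crawl_cluster_policies",
        "assess_global_init_scripts",
    }
)


def tasks_to_rerun(assessment_tasks):
    """Keep only allow-listed tasks, preserving the order they were given in."""
    return [task for task in assessment_tasks if task in RERUN_TASKS]


selected = tasks_to_rerun(
    ["crawl_mounts", "crawl_tables", "crawl_grants", "assess_jobs", "crawl_groups"]
)
```

An allow-list means a newly added assessment task is skipped until someone explicitly opts it in, which is the more conservative default for a daily scheduled job.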
Additional Context