Cube Cloud failing pre-aggregation warm up - requirements.txt not running? #8923

Open
johache opened this issue Nov 7, 2024 · 0 comments

johache commented Nov 7, 2024

Describe the bug
Using Cube Cloud, I think there might be something wrong with the pre-aggregation warm up instances.

  • I have a very simple scheduled_refresh_contexts in my cube.py, which depends on the databricks SDK.
  • This runs fine on my worker and API instances, but not on my pre-aggregation warm up instances
  • It's a little bit hard to debug, because the pre-aggregation warm up instance only seems to exist for a fraction of a second, maybe because it fails immediately. I did manage to get a screenshot
  • I can definitely see that at least in my build job, the databricks.sdk is installed

To Reproduce
Steps to reproduce the behavior:

  1. Define your requirements.txt to install databricks-sdk
databricks-sdk
  2. Define a scheduled_refresh_contexts which depends on databricks in cube.py
import os

from cube import config
from databricks.sdk import WorkspaceClient

# ...

@config('scheduled_refresh_contexts')
def scheduled_refresh_contexts() -> list[object]:
    databricks_workspace_client = WorkspaceClient(
        host  = os.environ.get('DATABRICKS_HOST'),
        token = os.environ.get('CUBEJS_DB_DATABRICKS_TOKEN')
    )

    # Fetch the list of schemas within the environment's catalog
    catalog_name = os.environ.get('CUBEJS_DB_DATABRICKS_CATALOG')
    schemas = databricks_workspace_client.schemas.list(catalog_name=catalog_name)

    # ...
    return security_contexts_array
  3. Enable pre-aggregation warm up in Cube Cloud

Expected behavior

  • Dependencies from requirements.txt are installed before any instance runs, including the pre-aggregation warm up instances
  • After the env vars update on Cube Cloud, all contexts defined by scheduled_refresh_contexts should compile and pre-aggregate, and any query hitting a pre-aggregation should pass

Actual behavior

  • This runs fine on my worker and API instances, but not on my pre-aggregation warm up instances
  • It's a little bit hard to debug, because the pre-aggregation warm up instance only seems to exist for a fraction of a second, but when I do catch it, it says that databricks-sdk is not installed
  • I can definitely see that at least in my build job, the databricks.sdk is installed
  • The result is that NO pre-aggregations get built, unless the refresh_key triggers it, which can take time and leave the instance broken for extended periods of time
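
Since the warm up instance disappears so quickly, one way to capture more detail might be to log what the interpreter can actually see before the databricks import runs. The following is only a sketch: describe_dependency is a hypothetical helper (not part of Cube or the Databricks SDK), and it assumes that anything printed from cube.py ends up in the instance logs.

```python
import importlib.metadata
import importlib.util
import sys


def describe_dependency(module_name: str, dist_name: str) -> dict:
    """Report whether a module is importable in the current interpreter.

    Hypothetical diagnostic helper; uses only the standard library.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "databricks") is itself missing.
        spec = None

    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        version = None

    return {
        "module": module_name,
        "importable": spec is not None,
        "version": version,
        "python": sys.executable,
        "sys_path": list(sys.path),
    }
```

Calling print(describe_dependency("databricks.sdk", "databricks-sdk")) at the top of cube.py, before the from databricks.sdk import line, would then show whether the warm up instance uses a different interpreter or search path than the build job.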

Screenshots
Screenshot 2024-10-18 at 1 12 42 PM
Screenshot 2024-10-18 at 1 14 27 PM
Screenshot 2024-10-18 at 1 15 44 PM

Minimally reproducible Cube Schema
Adding an excerpt from my schema, but I don't think this is schema dependent. The important part is the requirements.txt and cube.py posted above

cubes:
  - name: gold_journal_lines
    sql_table: "{{ COMPILE_CONTEXT.securityContext.company_id | safe }}.gold__journal_lines"

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true
      - name: net_amount
        sql: net_amount
        type: number
      - name: posted_on
        sql: posted_on
        type: time
    measures:
      - name: sum_net_amount
        type: sum
        sql: net_amount

    pre_aggregations:
      # Rollup Pre-aggregation with accounts and counterparties
      - name: journal_line_acc_cpt_rollup
        measures:
          - gold_journal_lines.sum_net_amount
        time_dimension: CUBE.posted_on
        granularity: month
        partition_granularity: year
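
For context on how the schema and cube.py fit together: sql_table reads company_id from COMPILE_CONTEXT.securityContext, so each entry returned by scheduled_refresh_contexts presumably carries that key. The part of cube.py that builds security_contexts_array is elided above, so the following is only an illustrative sketch. build_refresh_contexts is a hypothetical helper, and it assumes one context per schema name (for example, the name attribute of each item returned by schemas.list).

```python
def build_refresh_contexts(schema_names: list[str]) -> list[dict]:
    """Build one refresh context per schema name.

    Hypothetical helper: assumes each schema name maps directly to the
    company_id that the cube's sql_table expects in securityContext.
    """
    return [
        {"securityContext": {"company_id": schema_name}}
        for schema_name in schema_names
    ]
```

With schema names acme and globex (made-up values), this yields two contexts, and gold_journal_lines would compile against acme.gold__journal_lines and globex.gold__journal_lines respectively.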

Version:
Tried with 0.35.55, 1.0.1, 1.1.0

Happy to provide any additional details

@igorlukanin igorlukanin self-assigned this Nov 7, 2024
@igorlukanin igorlukanin added the cube cloud Issues related to Cube Cloud label Nov 12, 2024