Partitions named after URLs prevent graph assets from running. #26421

Open
IanGallacher opened this issue Dec 11, 2024 · 2 comments
Labels
area: asset Related to Software-Defined Assets area: partitions Related to Partitions type: bug Something isn't working

Comments

IanGallacher commented Dec 11, 2024

What's the issue?

Manually launching a run for a partition whose key is a URL, in a graph asset that uses a static partitions definition, throws an exception on the final step of the graph.

Short error message: "ValueError: can't combine incompatible UPath protocols"

See "Additional information" below for the complete error message.

What did you expect to happen?

I would expect the script to complete without errors, since the documentation for @graph_asset lists partitions_def as a supported argument.

How to reproduce?

Example script that reproduces the issue:

from dagster import (
    StaticPartitionsDefinition,
    graph_asset,
    op,
)

partitions = StaticPartitionsDefinition(["https://www.example.com", "2", "3"])


@op
def op_1():
    return 1


@op
def op_2(number: int):
    return number * 2


@graph_asset(
    partitions_def=partitions,
)
def test_script():
    return op_2(op_1())

Dagster version

Reproduced on 1.9.2 and 1.9.4

Deployment type

Docker Compose

Deployment details

My docker-compose file:

services:
  dagster:
    container_name: my-dagster
    build:
      context: .
      dockerfile: Dockerfile.DAG
    image: my-dagster
    restart: unless-stopped
    entrypoint: ["dagster", "dev", "--host", "0.0.0.0"]
    network_mode: "host" # Access host ollama
    ports:
      - "3000:3000" # Web UI
    environment:
      DAGSTER_HOME: /dagster_home
    volumes:
      - ./.data:/data
      - ./.dagster_home:/dagster_home
      - ./dagster:/app

volumes:
  dagster_home:

Dockerfile.DAG (I do not believe that the dependencies are the issue but I'm including them anyway)

FROM python:3.11-slim

WORKDIR /app

COPY ./dagster/setup.py /app/setup.py

RUN pip install -e ".[dev]"
RUN python -m pip install playwright
RUN python -m pip install undetected-playwright
RUN python -m playwright install-deps
RUN python -m playwright install

COPY ./dagster /app

CMD ["dagster-webserver", "-h", "0.0.0.0", "-p", "3000"]

setup.py

from setuptools import find_packages, setup

setup(
    name="my-app",
    packages=find_packages(exclude=["tests"]),
    install_requires=[
        "bs4",
        "ollama",
        "openai",
        "pymongo",
        "dagster",
        "dagster-cloud",
        "scrapy",
        "pandas",
        "tiktoken",
        "sentence-transformers",
    ],
    extras_require={"dev": ["dagster-webserver", "pytest"]},
)

Additional information

The complete error message:

dagster._core.errors.DagsterExecutionHandleOutputError: Error occurred while handling output "result" of step "test_script.op_2":
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_plan.py", line 245, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_step.py", line 506, in core_dagster_event_sequence_for_step
    for evt in _type_check_and_store_output(step_context, user_event):
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_step.py", line 553, in _type_check_and_store_output
    for evt in _store_output(step_context, step_output_handle, output):
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_step.py", line 758, in _store_output
    for elt in iterate_with_context(
  File "/usr/local/lib/python3.11/site-packages/dagster/_utils/__init__.py", line 480, in iterate_with_context
    with context_fn():
  File "/usr/local/lib/python3.11/contextlib.py", line 158, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/execution/plan/utils.py", line 84, in op_execution_error_boundary
    raise error_cls(
The above exception was caused by the following exception:
ValueError: can't combine incompatible UPath protocols
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/execution/plan/utils.py", line 54, in op_execution_error_boundary
    yield
  File "/usr/local/lib/python3.11/site-packages/dagster/_utils/__init__.py", line 482, in iterate_with_context
    next_output = next(iterator)
                  ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/execution/plan/execute_step.py", line 748, in _gen_fn
    gen_output = output_manager.handle_output(output_context, output.value)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/storage/upath_io_manager.py", line 430, in handle_output
    paths = self._get_paths_for_partitions(context)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/storage/upath_io_manager.py", line 245, in _get_paths_for_partitions
    return {
           ^
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/storage/upath_io_manager.py", line 247, in <dictcomp>
    self.get_path_for_partition(context, asset_path, partition)
  File "/usr/local/lib/python3.11/site-packages/dagster/_core/storage/upath_io_manager.py", line 216, in get_path_for_partition
    return path / partition
           ~~~~~^~~~~~~~~~~
  File "/usr/local/lib/python3.11/pathlib.py", line 767, in __truediv__
    return self._make_child((key,))
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/upath/implementations/local.py", line 160, in _make_child
    raise ValueError("can't combine incompatible UPath protocols")
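
The traceback shows the failure at the point where the IO manager appends the partition key to the asset path (`path / partition` in `upath_io_manager.py`), so a key that looks like a URL appears to be interpreted as carrying its own protocol. One possible workaround, sketched below with only the standard library, is to percent-encode URL keys before using them as partition names so that each key is a single path segment with no `:` or `/`. The helper names `encode_partition_key` and `decode_partition_key` are hypothetical and not part of Dagster, `/base` is a placeholder path, and this approach has not been confirmed as a fix by the maintainers.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, unquote


def encode_partition_key(url: str) -> str:
    """Percent-encode a URL so it contains no ':' or '/' characters."""
    # safe="" makes quote() escape '/' as well, which it leaves alone by default.
    return quote(url, safe="")


def decode_partition_key(key: str) -> str:
    """Recover the original URL from an encoded partition key."""
    return unquote(key)


urls = ["https://www.example.com"]
encoded_keys = [encode_partition_key(u) for u in urls]

# The encoded key joins onto a base path as one ordinary segment.
# "/base" is a placeholder, not the path Dagster actually uses.
joined = PurePosixPath("/base") / encoded_keys[0]

print(encoded_keys[0])
print(joined)
print(decode_partition_key(encoded_keys[0]))
```

The encoded strings could then be passed to StaticPartitionsDefinition in place of the raw URLs and decoded inside any op that needs the original URL.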


@IanGallacher IanGallacher added the type: bug Something isn't working label Dec 11, 2024
@garethbrickman garethbrickman added area: asset Related to Software-Defined Assets area: partitions Related to Partitions labels Dec 23, 2024
@OwenKephart
Contributor

Hi @IanGallacher, I was not able to replicate this error with the provided script. On my end, with the latest version of dagster and universal_pathlib==0.2.3, the path is automatically converted to <base path>/https:/www.example.com (note the removal of the second slash). What version of universal_pathlib do you have installed? I'm wondering if that's what's making the difference here.
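
The slash removal described in this comment matches how the standard library's pathlib normalizes joined paths, which can be checked without Dagster or universal_pathlib installed. The snippet below is only a sketch of the stdlib behavior; `/base` is a placeholder for the real base path, and it does not show whether a given universal_pathlib version accepts or rejects the same join.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/base")  # placeholder for the IO manager's base path
partition_key = "https://www.example.com"

# pathlib treats the key as a relative path and drops the empty
# segment between the two slashes that follow "https:".
joined = base / partition_key

print(joined.parts)
print(str(joined))
```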

@IanGallacher
Author

I've created an example repository with the exact project configuration that reproduces the issue for me. The host operating system I'm running on is Manjaro Linux.

From the python console inside the docker image:

>>> import upath
>>> upath.__version__
'0.2.6'

It appears I'm using 0.2.6.
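
The same check can be scripted so that the dagster and universal_pathlib versions are reported together when comparing environments. This is a minimal sketch using `importlib.metadata`; `installed_version` is a hypothetical helper, and the distribution names are assumed to be `dagster` and `universal_pathlib` as written in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Report both packages; a missing package is shown rather than raising.
for name in ("dagster", "universal_pathlib"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```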
