What's the issue?
Hi,
I've encountered a very strange issue. I pushed new code, but for some odd reason there was suddenly an error with the docker-compose down operation (this had never happened before), and the volumes caused an error. I decided to delete the volumes and rerun the CI/CD.
That worked, but I did not realize that my S3 bucket sensor would queue 600+ jobs (I guess one for each file in the bucket). I panicked a bit, terminated and deleted all the runs, and I assume some runs were not canceled properly, because now I get false alerts.
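For context, my S3 sensor follows roughly the standard cursor-based pattern from the Dagster docs (a simplified sketch, not my exact code; the bucket name, job, and import path are placeholders):

```python
from dagster import RunRequest, SensorEvaluationContext, sensor
from dagster_aws.s3.sensor import get_s3_keys

from my_project.jobs import my_job  # placeholder import / job name


@sensor(job=my_job)
def s3_file_sensor(context: SensorEvaluationContext):
    # The cursor (last S3 key seen) lives in Dagster's storage, so wiping
    # the Postgres volume resets it and every existing key in the bucket
    # gets picked up again -- presumably the source of the 600+ queued runs.
    new_keys = get_s3_keys("my-bucket", since_key=context.cursor or None)
    for key in new_keys:
        yield RunRequest(run_key=key)
    if new_keys:
        context.update_cursor(new_keys[-1])
```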
I have an email alert on the failure sensor, and it now sends an alert for most jobs even though the job succeeded (it still works normally for actually failed jobs). The false alerts come with the error "This run has been marked as failed from outside the execution context."
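The failure sensor itself is essentially the stock run_failure_sensor pattern (again a simplified sketch; the addresses and SMTP host are placeholders):

```python
import smtplib
from email.message import EmailMessage

from dagster import RunFailureSensorContext, run_failure_sensor


@run_failure_sensor
def email_on_failure(context: RunFailureSensorContext):
    # For the false alerts, context.failure_event.message is
    # "This run has been marked as failed from outside the execution context."
    msg = EmailMessage()
    msg["Subject"] = f"Dagster run {context.dagster_run.run_id} failed"
    msg["From"] = "alerts@example.com"  # placeholder address
    msg["To"] = "me@example.com"  # placeholder address
    msg.set_content(context.failure_event.message or "")
    with smtplib.SMTP("smtp.example.com") as smtp:  # placeholder SMTP host
        smtp.send_message(msg)
```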
The run IDs from the false alerts do not exist in my Postgres storage - I verified that several times and even purged the volume again. Purging the sensor and schedule ticks also did not work.
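For reference, this is roughly how I verified that a run ID from an alert is missing (run from inside the webserver container, where DAGSTER_HOME points at the Postgres-backed instance):

```python
from dagster import DagsterInstance

instance = DagsterInstance.get()
# Run ID copied from one of the false-alert emails (placeholder value here)
run = instance.get_run_by_id("1234abcd-0000-0000-0000-000000000000")
print(run)  # None -- the run does not exist in run storage
```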
I tried changing the name of the sensor, but that does not solve the issue either.
I'm all out of tricks... please help.
What did you expect to happen?
No response
How to reproduce?
No response
Dagster version
1.8.0
Deployment type
Docker Compose
Deployment details
version: "3.8"
services:
dagster_postgresql:
image: image_url:latest
container_name: dagster_postgres_storage
restart: unless-stopped
ports:
- 5432:5432
volumes:
- dagster_postgresql_data:/var/lib/postgresql/data
networks:
- docker_dagster_network
dagster_webserver:
image: dagster-webserver-service
container_name: dagster_webserver
env_file:
- .env
restart: unless-stopped
ports:
- "3000:3000"
environment:
DAGSTER_CURRENT_IMAGE: "dagster-webserver-service"
PRODUCTION: "Y"
SYSTEM_DEFAULT_STATUS: "RUNNING"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /tmp/io_manager_storage:/tmp/io_manager_storage
networks:
- docker_dagster_network
depends_on:
- dagster_postgresql
dagster_daemon:
image: dagster-webserver-service
container_name: dagster_daemon
env_file:
- .env
command: "dagster-daemon run"
restart: on-failure
environment:
DAGSTER_CURRENT_IMAGE: "dagster-webserver-service"
PRODUCTION: "Y"
SYSTEM_DEFAULT_STATUS: "RUNNING"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /tmp/io_manager_storage:/tmp/io_manager_storage
networks:
- docker_dagster_network
networks:
docker_dagster_network:
driver: bridge
name: docker_dagster_network
volumes:
dagster_postgresql_data:
driver: local
dagster.yaml:
```yaml
scheduler:
  module: dagster.core.scheduler
  class: DagsterDaemonScheduler

run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 5

run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      - env_1
      - env_2
    network: docker_dagster_network
    container_kwargs:
      volumes: # Make docker client accessible to any launched containers as well
        - /var/run/docker.sock:/var/run/docker.sock
        - /tmp/io_manager_storage:/tmp/io_manager_storage
      auto_remove: true

run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      hostname: dagster_postgresql
      username:
        env: POSTGRES_USER
      password:
        env: POSTGRES_PASSWORD
      db_name:
        env: POSTGRES_DB
      port: 5432

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db:
      hostname: dagster_postgresql
      username:
        env: POSTGRES_USER
      password:
        env: POSTGRES_PASSWORD
      db_name:
        env: POSTGRES_DB
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      hostname: dagster_postgresql
      username:
        env: POSTGRES_USER
      password:
        env: POSTGRES_PASSWORD
      db_name:
        env: POSTGRES_DB
      port: 5432

telemetry:
  enabled: false

nux:
  enabled: false

sensors:
  use_threads: true
  num_workers: 8

retention:
  schedule:
    purge_after_days: 2
  sensor:
    purge_after_days:
      skipped: 1
      failure: 1
      success: 1
```
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
By submitting this issue, you agree to follow Dagster's Code of Conduct.