Automations marked as "Still Running" After Upgrade to 2023.7 & 2023.8 #98073
Comments
Hey there @home-assistant/core, mind taking a look at this issue as it has been labeled with an integration (automation)? (message by CodeOwnersMention) |
Might have the same issue here. When mode=restart, the automation is never restarted (no new trace, no updated timestamp), making a Home Assistant restart the only way to fix this. There is no relevant logging. |
Don't know if it is the same issue, but sometimes my automations are not triggered even though the triggering device shows the state change in the logbook. All are Zigbee devices. The automations run without faults when triggered manually. |
Manually running doesn't work here either. |
Upgraded to 2023.8.3 to try things out... Still happening. Providing some trace files: trace automation.sunroom_day_on 2023-08-12T17_28_39.484164+00_00.json.zip. These are two traces that get shown as "still running". The difference this time around is that I can manually go in and disable the affected automation without the automation system becoming unresponsive. I had four automations in total get stuck in "still running" within about 2 hours of my upgrade. Each could be disabled and re-enabled without having to restart HA Core. |
I have also started seeing this issue. I haven't found an obvious cause in the logs, but my automations are now hanging regularly. It's not the same ones each time; it could be any of them, at random times, but for me it also appears to be related to switching Z-Wave devices. |
Seeing the same issue. Commented on #97721. |
@lux4rd0 In your trace examples that aren't finished, is there another trace that happened around the same time? My impression is that the mode behavior applies when an automation is triggered multiple times in parallel, so I'm wondering whether another action somewhere else fired around the same time, or whether it's just that this stuck action is the one preventing others from happening. |
I can get the "still running" state even if the mode is set to queued. I'm seeing other automations fail or partially fail without the "still running" state too (only since 2023.7.x), but I'll have to track that down for a different report. |
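The mode behavior discussed above can be illustrated with a toy model. The sketch below is a hypothetical, simplified Python model (the `ToyAutomation` class, `hung_device_call`, and `demo` are illustrative assumptions, not Home Assistant's implementation). It shows why one run that never finishes causes every later trigger of a `single`-mode automation to be dropped, while in this idealized model a `restart`-mode run would be cancelled and replaced.

```python
from __future__ import annotations

import asyncio


class ToyAutomation:
    """Hypothetical model of the 'single' and 'restart' run modes.

    NOT Home Assistant's implementation; it only illustrates why a run
    that never finishes blocks later triggers in 'single' mode.
    """

    def __init__(self, mode: str) -> None:
        if mode not in ("single", "restart"):
            raise ValueError(f"unsupported mode: {mode}")
        self.mode = mode
        self.current: asyncio.Task | None = None
        self.dropped = 0

    def is_running(self) -> bool:
        return self.current is not None and not self.current.done()

    def trigger(self, action) -> bool:
        """Start action() as a new run; return False if the trigger was dropped."""
        if self.is_running():
            if self.mode == "single":
                # The new trigger is ignored while the old run is "still running".
                self.dropped += 1
                return False
            # 'restart': cancel the in-flight run before starting a new one.
            self.current.cancel()
        self.current = asyncio.create_task(action())
        return True


async def hung_device_call() -> None:
    # Stand-in for a device command whose reply never arrives.
    await asyncio.Event().wait()


async def demo() -> tuple[int, bool]:
    single = ToyAutomation("single")
    single.trigger(hung_device_call)
    await asyncio.sleep(0)  # let the first run start and block
    for _ in range(3):
        single.trigger(hung_device_call)  # all dropped: the first run never ends

    restart = ToyAutomation("restart")
    restart.trigger(hung_device_call)
    first = restart.current
    await asyncio.sleep(0)
    restart.trigger(hung_device_call)  # cancels the first run
    await asyncio.sleep(0)  # let the cancellation be delivered
    result = (single.dropped, first.cancelled())

    # Clean up the runs that are still pending.
    single.current.cancel()
    restart.current.cancel()
    await asyncio.gather(single.current, restart.current, return_exceptions=True)
    return result


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (3, True)
```

Running it prints `(3, True)`: three triggers dropped in `single` mode and the first `restart` run cancelled. Note that one report in this thread says `mode: restart` did not restart the stuck automation, so the real behavior differed from this idealized model.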
If one automation gets stuck, it only affects that one automation. Eventually, I'll get a second, third, or fourth one, but there's no dependency that I can determine. It's also not the same automation, or the same order of stuck automations, every time. When an automation gets stuck, the "already running" message indicates a symptom, not the problem itself. I will say that 2023.8.2 has been nice because I can disable and enable the "stuck" automation without rebooting HA. That was not the case in the previous 2023.7 / 2023.8.0/1 versions. Using this dashboard:
I can click on "running automation" and disable and re-enable it from the panel. I might get an hour or two before something gets stuck again. Trace: automation.sunroom_day_on 2023-08-14T22_34_38.501316+00_00.json.zip. That's the trace for the "Sunroom Day On" automation that's marked as "Still Running". |
So far, device actions for lights with brightness_pct set seem to be present in most reports of this. Perhaps, though, it's just common to have an automation that does that. I might suggest trying variations on the automations, e.g. not using device actions, or using other device actions that don't change a light. |
I've had automations get stuck that only turn off lights, so it's not 100% the case. However, all of my turn-on actions set a brightness percentage because I have both "Day" and "Night" automations (100% and 10% brightness). |
Mine just turns off a group (a helper group of 2 non-dimming Z-Wave light switches), so no explicit percentage is specified, assuming off != 0%. |
Same here. One of my automations that gets stuck in the "still running" state has, as its last action, a service call to turn off a Z-Wave switch, preceded by a service call to turn off many lights. I can also tell you that this switch is often affected by the dead node issue. Before, that automation would just fail and log an error saying the node was unresponsive. Now it hangs forever in the "still running" state until I reboot HA or Z-Wave JS UI. |
I've made a test automation that simply uses a service call to send my phone a notification instead of calling the Z-Wave service. That particular one hasn't hung yet... I'll keep monitoring. |
Perhaps you can confirm whether one of the symptoms is really related to flaky devices. Reading some of the changes, it seems that previously the service call would wait for a timeout and then proceed anyway, even though the call had timed out. I think what we should do instead is time out explicitly, fail, and allow use of |
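The proposal above (time out explicitly and fail instead of hanging forever) can be sketched with `asyncio.wait_for`. This is a minimal sketch under stated assumptions: `call_with_timeout`, `run_sequence`, `DeviceTimeoutError`, and the `keep_going` flag are hypothetical names for illustration, not Home Assistant APIs, and the 0.1 s timeout is arbitrary.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


class DeviceTimeoutError(Exception):
    """Hypothetical error: a device command got no reply in time."""


async def call_with_timeout(command, timeout: float):
    """Await command(), but fail with DeviceTimeoutError instead of hanging."""
    try:
        return await asyncio.wait_for(command(), timeout)
    except asyncio.TimeoutError as err:
        # wait_for has already cancelled the pending command at this point.
        _LOGGER.error("Device did not respond within %.1f s", timeout)
        raise DeviceTimeoutError(f"no reply after {timeout} s") from err


async def run_sequence(commands, timeout: float, keep_going: bool) -> list:
    """Run (name, command) pairs in order; record 'ok' or 'timed out' for each."""
    results = []
    for name, command in commands:
        try:
            await call_with_timeout(command, timeout)
            results.append(f"{name}: ok")
        except DeviceTimeoutError:
            results.append(f"{name}: timed out")
            if not keep_going:
                break  # stop the sequence; the run still ends instead of hanging
    return results


async def unresponsive_switch() -> None:
    # Stand-in for a command to a dead node: the reply never arrives.
    await asyncio.Event().wait()


async def phone_notification() -> None:
    # Stand-in for a command that completes normally.
    await asyncio.sleep(0)


if __name__ == "__main__":
    steps = [("zwave_switch", unresponsive_switch), ("notify_phone", phone_notification)]
    print(asyncio.run(run_sequence(steps, timeout=0.1, keep_going=True)))
    # ['zwave_switch: timed out', 'notify_phone: ok']
```

The design choice here is that a timeout turns a hang into an ordinary, logged failure; whether the rest of the sequence runs is then the caller's decision rather than something left stuck.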
I managed to catch one of the traces that gets stuck running:
The automation it corresponds to is as follows:
The scene in question controls a set of Z-wave devices. It looks like one of them failed to trigger (and got marked as unavailable by Z-wave JS) during the execution. I agree that in cases like this, it would be better for the automation to timeout and log an error rather than run indefinitely. |
I started seeing this sometime in the 2023.7 series as well. I see it in two automations that are simply motion lights, i.e. turn on with motion, turn off after no motion for X minutes. Both use Zigbee motion sensors to trigger Z-Wave dimmer switches. One of them issues turn_on/turn_off, and the other issues turn_on with brightness parameters depending on the time of day. |
I just had two different automations, both using the same Zigbee motion sensor, to trigger a Z-Wave light and an SMTP notification respectively. The Z-Wave action got stuck; the SMTP notification worked fine. Clearly something hung up with the Z-Wave devices. All of my Z-Wave devices are online. |
Given that the linked bug (#98501) has been rejected, does this mean the issue is one for owners of For some context, my case (#98491) seems to be Z-Wave (and probably Zigbee) devices going to a state other than Off, such as Unavailable, after the turn_off command is sent. At least with scripts we can monitor whether one has been running for more than a certain amount of time and then run the |
Thanks for following up, and apologies for the delay. The decision is that we'll only fix the root causes of the integration bugs. (The timeout was meant to be a stopgap, but it doesn't fix the underlying issue, so we'll switch to focusing on the root causes.) @lux4rd0 I think we should split this into one issue for Z-Wave and one issue for Zigbee, and we'll need to get the integration-specific part of each issue spelled out. (It may already be; I have not re-reviewed this issue.) |
@iDontWantAUsername I reopened #98491, updated it, and assigned it to zwave. |
I’m seeing the same issue with the 2023.9 beta that was released today. I didn’t have debug logging on but I’ll try again in the morning; I’ve got a script that turns a ton of zwave devices off that seems to hit this issue most times it’s triggered. Edit: an improvement is that I seem to be able to cancel the script even though it was stuck |
I'm running Home Assistant 2023.8.4 and have a trace. I'm on the beta channel, so I hope I have the latest. zwavejs_2023-08-31.log. I don't see the script being cancelled. |
Are these battery devices or mains-powered? Please do share the logging when you have a chance! |
Hi, not sure if this is the right place, but I have a simple automation that uses an Ikea 2-button remote to turn a light (Ikea Tradfri plug) on and off. What is strange is that the 2-button remote's triggers show up in the ZHA events for the device AND the same triggers in the automation also fire (blink blue), but the automation does not complete its actions, and there is no trace of the triggers firing. It's really strange. If I edit the automation (e.g. add a space and delete it) and save it, it works again. It's not an issue with the devices; they are online and working normally. Similar comment here: #98073 (comment) |
Hey there @home-assistant/z-wave, mind taking a look at this issue as it has been labeled with an integration (zwave_js)? (message by CodeOwnersMention) |
Hey there @dmulcahey, @Adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha)? (message by CodeOwnersMention) |
@Anto79-ops ZHA, right? Just want to make sure I am tagging the right folks. |
Yes, ZHA. It started happening in 2023.8.x. I'll see if I can get a screen video or something. |
Rather than providing videos and screenshots, I would recommend pulling whatever logs you can and indicating where in the logs your automation started. |
OK, thanks. I'm on the beta now and just restarted HA for b1, and now the automation works again. I will post back here when it stops working. Thanks @raman325 |
OK, here are the logs. Home Assistant version 2023.9.0b0 (docker, raspberrypi4-homeassistant), zwave-js 11.13.0. The script is cancellable but has been marked "still running" since last night. Edit: |
Hi, running the latest versions: after turning off a Z-Wave device, the automation got stuck turning off Sonoff devices. Triggered by the state of input_boolean.lightson on 4 September 2023 at 06:56:50. zwavejs_2023-09-04.log |
Those of you on the beta, please update to b5 and see if that resolves the issue. |
Haven't had a chance to grab the beta yet, but I've been building a new instance of HA since I'm migrating to a SONOFF Zigbee stick and a Zooz 800 Z-Wave stick, to see whether that alleviates the issues discussed with my HUSBZB-1. This time I'm only using my own Docker containers for HA (instead of HAOS), Zigbee2MQTT (instead of ZHA), and Z-Wave JS UI (instead of Z-Wave JS). I've moved all of my automations back from "restart" to "single", and I haven't had a single stuck automation the entire time I've been testing as I migrate my 100+ devices. |
I'm also experiencing this issue. |
Is it specifically a Z-Wave device that's the issue? Please set the add-on/server log level to debug, as well as the integration and the library. We will need to see the debug logs from the moment the automation starts to a point where it's clear the automation won't finish. |
Hi, running the latest version and getting more problems than before. See the 4 attached automation traces and the Z-Wave log. It even looks like a scene isn't working anymore. Home Assistant 2023.9.0. Traces: trace automation.bedlamp_avond_aan 2023-09-07T19_30_00.153936+00_00.json.txt, trace automation.buitenlampen_tuin_uit 2023-09-07T18_30_00.396035+00_00..json.txt, trace automation.woonkamer_wakeup_aan 2023-09-07T18_09_31.909739+00_00.json.txt, trace automation.zonsondergang 2023-09-07T18_09_31.909058+00_00.json.txt |
Can you set the add-on log level to debug and re-upload if/when this happens again? |
Yes, debug was on. I only had a filter to log just a few Z-Wave nodes, which I have now removed. |
Thanks, please share with the filter off if/when the issue happens again. I am having a hard time parsing the first logs you uploaded. |
@raman325, it's now been running for 3 days with no problems with: |
We think this is solved for Z-Wave now. I'll close here now to easier track if further work is needed. If you still have a problem please open a new issue and describe what device for what integration is affected, with as much data as possible, debug logs etc, so we can triage the problem appropriately. |
The problem
Automations are getting stuck in versions of HA core 2023.7.0 through 2023.8.1 after upgrading from 2023.6.3.
What version of Home Assistant Core has the issue?
core-2023.7.0, core-2023.7.1, core-2023.7.2, core-2023.7.3, core-2023.8.0, core-2023.8.1
What was the last working version of Home Assistant Core?
core-2023.6.3
What type of installation are you running?
Home Assistant OS
Integration causing the issue
Z-Wave and Zigbee
Link to integration documentation on our website
No response
Diagnostics information
home-assistant.log.2023.6.3.zip
home-assistant.log.2023.7.0.zip
Example YAML snippet
Anything in the logs that might be useful for us?
No response
Additional information
Watching several other issues that have been opened and closed:
#97965
#97768
#97721
#97662
#97581
https://community.home-assistant.io/t/already-running-new-automation-bug/596654/16