Very sporadic behaviour of Zigbee devices #124016

Closed
mj0500 opened this issue Aug 16, 2024 · 21 comments

Comments

@mj0500

mj0500 commented Aug 16, 2024

The problem

Everything had been working fine and was stable until mid-day yesterday. I had not made any changes and noticed that a couple of my Zigbee-based lights (Philips Hue and IKEA Tradfri) that normally turn on at sunset hadn't turned on.

At this time I was running 2024.7.4. I started troubleshooting by trying to turn on the lights through the app, but would get an error saying message failed to send (error 3074) when trying to turn on these few specific lights.

I have 37 devices connected to ZHA. Most of them are still working just fine, however about 7 bulbs are just refusing to work. I can power cycle the bulbs and sometimes they will show up in HA. I can then send one command (eg. turn off) and it will work, but then they will stop responding after that. Once in a while the bulb will flash on and off repeatedly for 5-8 cycles.

After I encountered this issue, I had a pending update from 2024.7.4 to 2024.8.1, so I went ahead with the update, but this didn't fix the problem. I then rolled back to 2024.7.4 and restored a backup from when things were working properly, but that did not help either.

I turned off my HA server for over an hour to try and get the Zigbee network to reroute, however that made no difference.

I checked RF interference using the ZHA diagnostics. I am on channel 20 and it's showing 25% utilization, which is the lowest amongst all channels.

What I am really struggling with is there is no consistency in behaviour other than which devices aren't working properly. It doesn't seem like an interference issue, as one of the problematic devices is less than 10ft from my server and within line of sight. But some other devices which are far away are working perfectly.

I captured some ZHA debug logs. What happened during this logging session is I powered off and on a device and then tried to toggle it from HA.
home-assistant_zha_2024-08-15T23-32-51.573Z.log

What version of Home Assistant Core has the issue?

2024.8.1

What was the last working version of Home Assistant Core?

2024.7.3

What type of installation are you running?

Home Assistant OS

Integration causing the issue

No response

Link to integration documentation on our website

No response

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

No response

Additional information

No response

@home-assistant

Hey there @dmulcahey, @Adminiuga, @puddly, @TheJulianJES, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zha can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Renames the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zha Removes the current integration label and assignees on the issue, add the integration domain after the command.
  • @home-assistant add-label needs-more-information Add a label (needs-more-information, problem in dependency, problem in custom component) to the issue.
  • @home-assistant remove-label needs-more-information Remove a label (needs-more-information, problem in dependency, problem in custom component) on the issue.

(message by CodeOwnersMention)


zha documentation
zha source
(message by IssueLinks)

@Mark612

Mark612 commented Aug 17, 2024

2024.7.4

Same issue.

74 devices, and 5-6 suddenly stopped working. Those devices are about 3 feet from the controller, and the others that work are on other floors of the house hundreds of feet away.

@Mark612

Mark612 commented Aug 17, 2024

> 2024.7.4
>
> Same issue.
>
> 74 devices, and 5-6 suddenly stopped working. Those devices are about 3 feet from the controller, and the others that work are on other floors of the house hundreds of feet away.

It has resolved. At least for now. Very strange.

@apollo40

78 devices, and 4-5 stopped working. Some are working, but very slowly. Sending a bulb the command to turn off takes 3 minutes.

@dmulcahey
Contributor

> 78 devices, and 4-5 stopped working. Some are working, but very slowly. Sending a bulb the command to turn off takes 3 minutes.

Enable debug mode for ZHA, reproduce the issue, disable debug mode and attach the downloaded log.

@apollo40

apollo40 commented Aug 18, 2024

Tried to reproduce. The thing is that, besides the non-working devices, other devices are extremely slow. For example, pressing a light button turns that light on 3 minutes later. It seems that lights are affected, while smart plugs and window sensors, for example, update their status quite quickly.

home-assistant_zha_2024-08-18T10-25-04.167Z.log

@dmulcahey
Contributor

> Tried to reproduce. The thing is that, besides the non-working devices, other devices are extremely slow. For example, pressing a light button turns that light on 3 minutes later. It seems that lights are affected, while smart plugs and window sensors, for example, update their status quite quickly.
>
> home-assistant_zha_2024-08-18T10-25-04.167Z.log

Your mesh is being flooded by Tuya mmWave sensors and metering plugs. Start by removing them and check whether the issue stops. If you can still reproduce the issue WHILE THEY ARE REMOVED, please enable debug mode again, reproduce the issue, disable debug mode and attach the newly downloaded log.

@dmulcahey
Contributor

https://www.tuyaos.com/viewtopic.php?t=3484

Here you can see folks basically begging Tuya to fix this.

@apollo40

apollo40 commented Aug 18, 2024

It seems to work now after I removed the Tuya mmWave sensors. Funny, I had them running for over a year in this config without any problem. It all happened in the last few days after upgrading to 2024.8.x.

The metering plugs are all still running without any lag.

@mj0500
Author

mj0500 commented Aug 18, 2024

Hi @dmulcahey, did you have a chance to look through my debug logs in the original post? I don't have any Tuya devices, but I am still seeing very odd behavior from a bunch of devices and haven't been able to figure out why. I also tried downgrading to 2024.7.3 and updating to 2024.8.2, both with no effect. Thanks

@puddly
Contributor

puddly commented Aug 19, 2024

@mj0500 Have you tried to power cycle all of your bulbs (off for 10 seconds, then back on), including ones that you aren't having problems with? It's possible some unrelated bulb's firmware has crashed and is affecting routing on your network.

RF interference is tricky to detect and the scan that's done is brief and only from the perspective of the coordinator. If there is interference near the bulbs but not the coordinator, this would also be a problem. Are your affected bulbs physically near one another?

@mj0500
Author

mj0500 commented Aug 19, 2024

Totally understand on the RF interference. I had added an additional Wifi access point a few days ago, but the issues with my Zigbee devices didn't start until 3 days after that. I verified that the channel was set properly (I have 2 APs, one on ch1 and one on ch6).

I'm thinking there is a bulb somewhere that is causing a routing issue, but I haven't had any luck figuring out which one. I have tried power cycling most of my bulbs, but not the working ones, as some I have to pull out furniture to get to the outlet.

If you have any suggestions on identifying a misbehaving router that would be hugely appreciated.

Thanks!

@DarthSonic

Same issue. 41 devices. 2 Hue motion sensors stop working. Bringing them back by power cycling or re-pairing only helps temporarily. A Hue light strip also becomes unavailable for some time, comes back, and then becomes unavailable again after a while.

@elianbgr

elianbgr commented Aug 20, 2024

I had a similar problem after upgrading to 2024.8.1 and then 2024.8.2. 60 zigbee2mqtt devices (on 6 coordinators) that worked flawlessly before began to randomly disappear. Not necessarily the same ones. The strange thing is that in the web interface of the coordinators they work normally, but in the MQTT broker they are shown as inactive. Only a cold restart fixes the problem, and only for a very short time. Then the mess begins again. I noticed today that I have several automations marked as unavailable that have been running that way for over a year. There is no way I can get them to work even though everything in them is OK. Also, on several sensors, new duplicate entities are generated after a restart, as the previous ones become inactive but remain in the list. It strongly smells like a database problem to me.

@mj0500
Author

mj0500 commented Aug 20, 2024

Channel change fixed it for me. I moved some of the non-working bulbs to a different location and they started working, so it must have been interference. I don't know what changed to introduce it, as it had been working great for months. I went from ch20 to ch25 and re-paired everything, and it's happy again.
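
For anyone comparing channel plans like the one described in this thread (Zigbee ch20 or ch25, Wi-Fi APs on ch1 and ch6), here is a minimal sketch, not from the thread, that computes the nominal edge-to-edge gap between a Zigbee channel and a 2.4 GHz Wi-Fi channel. The centre-frequency formulas are the standard ones; the channel widths (22 MHz for Wi-Fi, 2 MHz for Zigbee) and the function names are assumptions chosen for illustration. A positive gap only means the nominal bands do not overlap. It says nothing about non-Wi-Fi interference, 40 MHz Wi-Fi, or what actually caused the problem here.

```python
def zigbee_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel, 11-26."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11 through 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Wi-Fi channel, 1-13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1 through 13")
    return 2407 + 5 * channel


def gap_mhz(zigbee_channel: int, wifi_channel: int,
            wifi_width_mhz: float = 22.0,    # assumed nominal Wi-Fi width
            zigbee_width_mhz: float = 2.0    # assumed nominal Zigbee width
            ) -> float:
    """Edge-to-edge separation in MHz; zero or negative means nominal overlap."""
    distance = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return distance - (wifi_width_mhz + zigbee_width_mhz) / 2


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the nominal bands touch or overlap under the assumed widths."""
    return gap_mhz(zigbee_channel, wifi_channel) <= 0


if __name__ == "__main__":
    # Channel combinations mentioned in the thread.
    for zb in (20, 25):
        for wifi in (1, 6):
            print(f"Zigbee ch{zb} vs Wi-Fi ch{wifi}: gap {gap_mhz(zb, wifi):.1f} MHz")
```

Under these assumed widths, Zigbee ch20 sits only about 1 MHz clear of Wi-Fi ch6, while ch25 is far from both ch1 and ch6.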

@puddly
Contributor

puddly commented Aug 20, 2024

@mj0500 if you have a spare EZSP coordinator (SkyConnect, Sonoff-E, etc.) or don't mind disabling your network for half an hour to use your current one, try out zigpy/zigpy-cli#49 and post your JSON in that other linked PR. I'm curious to see if it's possible to detect the conditions you ran into.

@puddly puddly closed this as completed Aug 20, 2024

@DarthSonic

> Channel change fixed it for me. I moved some of the non-working bulbs to a different location and they started working, so it must have been interference. I don't know what changed to introduce it as it was working great for months. Went from ch20 to 25 and re-paired everything and it's happy again

Well, re-pairing fixes the issue even without changing the channel, but only for a limited time: a few days, not more. I doubt that changing the channel will fix it.

@mj0500
Author

mj0500 commented Aug 21, 2024

@puddly unfortunately I don't have a spare, but I can take the network down at some point and will update here when I can.

@DarthSonic before I changed the channel, none of the devices that were having issues would re-pair, I'm assuming due to interference.

@alexruffell

Although this issue has been closed, I just wanted to chime in saying that I too started seeing 3074 errors after a recent update. Before that update things were working so smoothly, now I either have Zigbee devices not working, or doing so very slowly. I have not made any recent changes to my mesh.

@puddly
Contributor

puddly commented Aug 23, 2024

@alexruffell if you're having issues, please open a separate issue and include the ZHA diagnostics JSON and a debug log of one of the errors happening.

@bryanklingner

> Although this issue has been closed, I just wanted to chime in saying that I too started seeing 3074 errors after a recent update. Before that update things were working so smoothly, now I either have Zigbee devices not working, or doing so very slowly. I have not made any recent changes to my mesh.

Me as well. Alex, did you open a new issue? I can post my debug logs there.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 22, 2024