[ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer #150
I'm experiencing something similar with the 114 beta release. I don't know if it is related to your issue, since my devices start to work after a while or after rebooting the ozw docker container. This is what I reported in the HA beta discord channel: When I reboot homeassistant, all z-wave devices are unavailable and unusable until I manually use the device or, in the case of a sensor, it reports some information. When I force a reboot of the (external) OZW docker, everything is available once OZW has finished loading. I noticed this on the dev builds too, but since it is development that can happen. I tried several dev builds again to see when it stopped working for me, and that is 0.114.0-dev20200724. So 0.114.0-dev20200723 does work as expected. My production system, which runs 113.3, keeps working without any problem and keeps receiving all the information. (I also have quite a few devices, 80+) |
Also reporting this issue. The ozwdaemon process in the container goes to 100% CPU and the log is filled with these buffer errors.
Possibly related: OpenZWave/open-zwave#673 |
Add another one to the list with this issue. 80+ device network, Rpi 3b+, HUSBZB-1 stick plugged into the Pi. Worked fine in 1.4 in homeassistant. In 1.6, upon daemon startup, MQTT status shows "offline" and never changes. However, in the logs you can see ozw polling all of the nodes... but then a few minutes go by (maybe 5) and it starts with loads of "valueRemoved" and "Not enough space in stream buffer":
|
My issue seems to be triggered after the network has started, which takes about 15-20 minutes. The error occurs when I try to refresh a node that is receiving packets and has neighbours, but doesn’t resolve with a type. It seems to kill ozw without fail |
Having exactly the same problem (both the "Not enough space in stream buffer" and "delValue: Removing Value ..."), with the network not starting. I'm not even able to view the OZW UI (it stays on the connect screen with ConnectingState 0%). EDIT: Just noticed that before it fails, I have some logs related to the stick (not sure what they mean) and MQTT failures:
|
One month later, B115b0, same error. This time it occurred after removing a node from the network (a working node; I can't seem to get dead nodes out). Can't heal either, it seems. Anyway, the logs: [20200909 16:14:24.314 ACST] [ozw.notifications] [debug]: Notification pvt_valueRemoved: 281475091841041 Thread: 0x7f6fe47b3d48 |
I've been receiving this error in ozwadmin, but changing the Network Object Cache to >1,000 in the ozwadmin preferences pane fixes it. I have no idea in which file it's making that change in the backend, but it works. |
@blhoward2 what do you mean, you have been getting this error in ozwadmin? Do you mean you've been seeing it in the ozw logs, which you are viewing through ozwadmin? And you had a reproducible case, and this fixed it? |
I meant that I get this same error in a pop-up in ozwadmin. At least I think it's the same error... it's been a few weeks since I did it and I haven't received it since. It is possible that I'm wrong and it was a similar error, but it should be easy enough for someone experiencing the error to test. If it is truly the USB stream buffer size in the kernel, has anyone tried raising the allocated USB memory? https://support.pixelink.com/support/solutions/articles/3000054087-image-transfer-fails-to-start-when-image-size-is-bigger-than-2-mb#:~:text=By%20default%2C%20the%20Linux%20kernel,time%20before%20data%20loss%20occurs. |
@blhoward2 I think the error you're talking about is this one home-assistant/addons#1415 (comment) |
This issue remains. I now have two different configs. I can reproduce this 100% of the time |
I tried to increase the USB stream buffer like @blhoward2 mentioned and I haven’t seen this problem since. |
@karl-gustav did you do that on the host machine or in the docker image somewhere? And I think this still leaves all of the people running a Home Assistant supervised install on a Pi out of luck... |
What I posted is a boot-level command line option added to the boot loader. It has nothing to do with docker as the issue is a Linux kernel issue. I’ve never used Supervised but I’d be surprised if there is no way to pass boot loader commands during boot. |
If the issue is a Linux kernel problem, why doesn't it happen with 1.4, and why does it happen with a specific zwave config file? |
I don't have the issue so I can't test it, but I suspect that 1.6 is requesting a dump from the usb device in a different manner or order and so it is receiving a larger dump, which depending on the data being dumped is overflowing the stream buffer. It's not so much a "problem" as the stream buffer is set too small by default in the kernel for the amount of data being retrieved at once in this instance. |
Try this to temporarily change the stream buffer size, and see if it fixes it: sudo modprobe usbcore usbfs_memory_mb=1000 |
I can't find usbcore as a module:
|
If it’s compiled in, then you’d have to pass it as a kernel parameter on the boot command line, I believe. |
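For anyone unsure whether the boot-time option is actually present, here is a minimal Python sketch that looks for it on a kernel command line. It assumes the parameter is spelled `usbcore.usbfs_memory_mb=<MB>` (the usual `module.parameter` form for built-in modules); the function name is made up for illustration, and how you edit the boot command line depends on your bootloader.

```python
from typing import Optional

# Assumed spelling of the option when usbcore is built into the kernel.
PARAM = "usbcore.usbfs_memory_mb"


def usbfs_mb_from_cmdline(cmdline: str) -> Optional[int]:
    """Return the usbfs_memory_mb value found on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == PARAM and sep and raw.isdigit():
            value = int(raw)  # if the option is repeated, report the last one seen
    return value
```

On a running system the input would typically come from `open("/proc/cmdline").read()`.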
Just tried changing usbfs_memory_mb to 500 and it didn't solve the issue |
Just to be sure. Did you check in the /sys folder in the link I posted to confirm that it took after you restarted? |
cat /sys/module/usbcore/parameters/usbfs_memory_mb ? Yep, prints 500. |
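The same check can be scripted. This is a small Python sketch, assuming only the sysfs path quoted above; the function names are hypothetical, and the `path` argument exists so the logic can be tried against a test file.

```python
from pathlib import Path

# Path taken from the thread; readable without root on typical Linux systems.
SYSFS_PARAM = Path("/sys/module/usbcore/parameters/usbfs_memory_mb")


def read_usbfs_memory_mb(path: Path = SYSFS_PARAM) -> int:
    """Read the current usbfs memory limit, in megabytes."""
    return int(path.read_text().strip())


def limit_took_effect(expected_mb: int, path: Path = SYSFS_PARAM) -> bool:
    """True if the kernel reports the value that was requested."""
    return read_usbfs_memory_mb(path) == expected_mb
```

For the case above, `limit_took_effect(500)` would confirm the setting survived the restart.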
Definitely overkill, but I added
This worked and the value stayed on reboot. Hopefully this fixes the issue, I can't reliably reproduce this bug, but it happens every couple of days. If it stops, I will likely lower this value to 512. |
This unfortunately did not solve the problem. The process no longer responds and spams this message. |
I am not very familiar with the code, or why certain decisions have been made. However, at the surface, it looked like this value is set too low: Is the buffer just not being drained fast enough? |
I thought this might be similar to issue #140. Instead of working on the z-wave messages too much and blocking mqtt messages, maybe ozwdaemon is too busy doing mqtt/other things (or is in a deadlock) so it stops processing the z-wave notifications, which would cause the stream buffer to fill up. But I wasn't entirely sure because the logs don't seem to indicate that. I'm not sure you can say the buffer size is too low; maybe it is fine for a normally functioning application? 2MB is quite large compared to typical z-wave message sizes. No statistics are captured for the stream buffer, so we don't really know what would be "normal". If ozwdaemon has such a bug, then the problem is not the buffer size but the application, and if there's a deadlock for example, increasing the size won't help. However, it would be a useful experiment to increase the value and see if it makes a difference. Maybe making it much larger would work around an ozwdaemon bug and let it get back to processing the zwave messages. Or maybe it's like you say, and it's too low for large networks, but the comments about this not being a problem with OZW 1.4 would seem to indicate it's not a buffer size problem... If you know how to build docker images it would be pretty simple to clone your own copy of open-zwave, change the buffer size and reference your repo to build with. qt-openzwave/Docker/Dockerfile Line 48 in 89cc0d8
|
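To illustrate the argument that a stalled reader fills the buffer no matter how big it is, here is a toy Python model. It is not OpenZWave's stream buffer; the class, the function, and all sizes are invented for the example.

```python
class BoundedBuffer:
    """Toy fixed-capacity byte buffer (not the OpenZWave implementation)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = bytearray()

    def put(self, chunk: bytes) -> bool:
        """Append chunk; return False when there is not enough space."""
        if len(self.data) + len(chunk) > self.capacity:
            return False
        self.data.extend(chunk)
        return True

    def get(self, n: int) -> bytes:
        """Remove and return up to n bytes from the front."""
        out = bytes(self.data[:n])
        del self.data[:n]
        return out


def count_rejected(capacity: int, chunk_size: int, writes: int, reads_per_write: int) -> int:
    """Count writes rejected when the reader drains reads_per_write chunks per write."""
    buf = BoundedBuffer(capacity)
    rejected = 0
    for _ in range(writes):
        if not buf.put(b"x" * chunk_size):
            rejected += 1
        for _ in range(reads_per_write):
            buf.get(chunk_size)
    return rejected
```

With a reader that keeps up (`reads_per_write=1`) nothing is rejected; with a stalled reader (`reads_per_write=0`) a larger capacity only postpones the first rejection.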
I updated my boot parameters so that |
Sounds like an application (ozwdaemon) bug to me then. zwave2mqtt uses the same OpenZWave library (but they are switching to an entirely different library).
Can you get a stack trace when it's in that state? |
@kpine I assume you mean logs and not stack trace since it doesn't crash 🙂 Here I haven't let the process run long enough to produce the Not enough space in stream buffer, BUT it does produce 5MB of logs in 2 minutes on startup: ozwdaemon.log.xz Here is one of my older logs (i.e. 1 hour older) with 10k lines of the Not enough space in stream buffer error: ozwdaemon.old.log.xz (I'm hoping there isn't any sensitive information in these that I haven't found ¯\_(ツ)_/¯) |
If ozwdaemon is stuck at 100% for a significant time, maybe it is stuck in some kind of loop. If you captured a stack trace (of all threads) at that time it might provide a clue. It's just a cheap way of doing some performance analysis. |
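A stack trace of all threads can be taken from a live process without it crashing. The Python sketch below just wraps one common way of doing that with gdb; it assumes gdb is installed where ozwdaemon runs (it may not be inside the container) and that you have permission to attach. The function names are hypothetical.

```python
import subprocess
from typing import List


def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Build a gdb command that attaches to pid, dumps all thread backtraces, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtraces(pid: int) -> str:
    """Run gdb against the given process and return whatever it printed."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid), capture_output=True, text=True, check=False
    )
    return result.stdout
```

Without debug symbols the frames will be less readable, but repeated captures can still show whether the same thread stays in the same place.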
@karl-gustav In the two logs you provided it's actually showing issue #140
Because ozwdaemon is not handling the MQTT messages, the broker is disconnecting it. Once disconnected, ozwdaemon will shut down, which is why the "DriverRemoved" message is shown. The stream buffer errors only seem to occur after the driver has been removed, so ozwdaemon has probably stopped handling ozw notifications at this point. It is just taking a long time for it to shut down, as it does cleanup work. This seems to be true for all the other logs posted in this issue: the error happens following all of the "value removed" messages. It's looking more like this is just a duplicate of #140, unless someone has logs of the stream buffer error occurring w/o an MQTT disconnect? |
I’ve had Mosquitto itself crash a few times requiring a restart, particularly upon refreshing nodes. I wonder if this is an upstream error? Has an issue been opened on Mosquitto’s GitHub?
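To test the ordering claim against your own log, a rough Python sketch follows. It assumes the two marker strings that appear in the logs posted in this thread; the function names are made up, and the exact text of the MQTT disconnect line is not shown here, so it is not checked.

```python
from typing import List, Optional

# Marker strings copied from log lines quoted in this thread.
BUFFER_ERROR = "Not enough space in stream buffer"
VALUE_REMOVED = "pvt_valueRemoved"


def first_index(lines: List[str], needle: str) -> Optional[int]:
    """Index of the first line containing needle, or None if absent."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None


def errors_follow_removal(lines: List[str]) -> Optional[bool]:
    """None if there are no buffer errors; otherwise whether the first
    buffer error appears after the first valueRemoved notification."""
    err = first_index(lines, BUFFER_ERROR)
    if err is None:
        return None
    removed = first_index(lines, VALUE_REMOVED)
    return removed is not None and removed < err
```

A result of `False` on a real log would be the counterexample asked for: buffer errors that start before any value removal.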
|
Issue #140 and the logs here point to a bug in ozwdaemon. There is a keepalive timeout value set when connecting to the broker. If ozwdaemon does not respond to a ping request within that time, the broker closes that connection. I can't comment on the mosquitto crashes you're seeing though. If mosquitto crashes, ozwdaemon is going to restart. Either way, ozwdaemon is not working properly. |
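As a rough illustration of the keepalive rule, here is a Python sketch of the broker-side decision. It follows the MQTT 3.1.1 wording (disconnect when nothing arrives within one and a half times the keep-alive, with 0 disabling the check); real brokers may apply extra tolerance, and the 60-second value in the usage note is an assumption, not ozwdaemon's actual setting.

```python
def broker_drops_client(seconds_since_last_packet: float, keepalive_s: int) -> bool:
    """Whether a broker following the MQTT 3.1.1 rule would close the connection."""
    if keepalive_s == 0:
        return False  # keep-alive of 0 turns the mechanism off
    return seconds_since_last_packet > 1.5 * keepalive_s
```

With a hypothetical 60-second keep-alive, a client that goes silent for two minutes gives `broker_drops_client(120, 60) == True`.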
Hmm, V117 and 3 months later and this issue is still a problem... [20201030 19:20:45.268 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer |
The HA version has nothing to do with qt-openzwave. Qt-openzwave hasn’t released any new versions in that time and word is that the maintainer was stepping back for a bit because of hardware failures and some personal things. If you check the issues and PRs nothing has changed really at all since August
On Oct 30, 2020, at 4:53 AM, Madelaide ***@***.***> wrote:
Hmm, V117 and 3 months later and this issue is still a problem...
[20201030 19:20:45.268 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:45.269 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:45.270 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:45.270 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:46.262 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:46.263 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:46.264 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:46.264 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:46.562 ACDT] [ozw.values] [debug]: delValue: Removing Value QVariant(QString, "Active flashing alarm time") QVariant(qulonglong, 10977524202602518) 0
[20201030 19:20:46.564 ACDT] [ozw.notifications] [debug]: Notification pvt_valueRemoved: 11258999179313169 Thread: 0x7f4e7166fd48
[20201030 19:20:47.262 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:47.263 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:47.264 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:47.264 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:48.262 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:48.263 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:48.264 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:48.265 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:49.262 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:49.262 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:49.264 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:49.264 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:49.285 ACDT] [ozw.values] [debug]: delValue: Removing Value QVariant(QString, "Updating the dimming level without the input from the switch") QVariant(qulonglong, 11258999179313169) 0
[20201030 19:20:49.285 ACDT] [ozw.notifications] [debug]: Notification pvt_valueRemoved: 11540474156023828 Thread: 0x7f4e7166fd48
[20201030 19:20:50.263 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:50.263 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:50.263 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:50.263 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:50.704 ACDT] [ozw.mqtt.publisher] [debug]: Removing CommandClass Topic for 6 1 112
[20201030 19:20:51.262 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:51.263 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:51.264 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:51.264 ACDT] [ozw.logging] [debug]: popping Log Mesages
[20201030 19:20:52.065 ACDT] [ozw.values] [debug]: delValue: Removing Value QVariant(QString, "Scene activation functionality") QVariant(qulonglong, 11540474156023828) 0
[20201030 19:20:52.066 ACDT] [ozw.notifications] [debug]: Notification pvt_valueRemoved: 115114003 Thread: 0x7f4e7166fd48
[20201030 19:20:52.262 ACDT] [ozw.library] [critical]: Error - Node: 0 ERROR: Not enough space in stream buffer
[20201030 19:20:52.262 ACDT] [ozw.logging] [debug]: popping Log Mesages
|
It doesn't; however, HA is planning to ditch their native zwave integration (based on v1.4) and pick up the OZW beta that is plagued with this issue. Not a HA issue, but an OZWdaemon issue |
That’s the roadmap but they haven’t announced a timeline. The whole point was to allow time for these issues to be worked through so there isn’t a mad rush to fix them.
|
I am having the same issue, I think. According to the logs, it seems like devices are being polled, then I get an mqtt shutdown message, but things still appear to be polled. I have to keep restarting the add-on and then at some point it will work. This is on an RPi 4b with a Nortek Z-Wave stick. Oddly enough, I didn't have this issue at all when I was running HA on a virtual machine, but I had to switch to the Pi because I needed to move where the Z stick is. And I don't know much on the Pi side of how to even attempt to increase USB allocation. Think I am going back to 1.4 for now. |
I'm testing a switch from HA integrated Zwave over to ozwdaemon and get the same error: I'm on an rpi3 with an Aeotec Z-Stick Gen5 running the latest qt-openzwave docker image. Mosquitto MQTT is running on the same device. I have a zwave network with about 50 nodes. The errors usually come after running for 10-20 minutes. However, I have observed that right before the errors start, ozwdaemon seems to freeze for about 2 minutes. During that time there are no log entries at all in the log file, and at that time the OpenZWave integration in HA starts reporting the OZWDaemon as offline. There are no other warning or error logs in the ozwdaemon log before the freeze. |
I have this problem too. I'm pretty sure that this is actually a hardware corruption of EEPROM on the stick, but I'm not sure what is causing it. |
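A freeze like this can be located in a log by looking for jumps between consecutive timestamps. The Python sketch below assumes the `[YYYYMMDD HH:MM:SS.mmm TZ]` prefix seen in the logs in this thread; it ignores the timezone label, so a daylight-saving change would show up as a false gap. The function names and the 60-second default are made up.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Matches the timestamp prefix, e.g. "[20201030 19:20:45.268 ACDT]".
STAMP = re.compile(r"^\[(\d{8} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def parse_stamp(line: str) -> Optional[datetime]:
    """Parse the leading timestamp of a log line, or return None."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y%m%d %H:%M:%S.%f")


def find_gaps(
    lines: Iterable[str], threshold: timedelta = timedelta(seconds=60)
) -> List[Tuple[datetime, datetime]]:
    """Return (before, after) timestamp pairs separated by at least threshold."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_stamp(line)
        if ts is None:
            continue  # skip lines without a timestamp prefix
        if prev is not None and ts - prev >= threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

Running `find_gaps(open("ozwdaemon.log"))` (file name assumed) lists the silent periods, which can then be compared with the time HA reported the daemon offline.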
It's not the stick. I moved to zwave2jsmqtt a few months ago and have had zero issues since. Restoring the eeprom likely just clears out the queue for the buffer. |
I disagree, or we are experiencing two completely different issues maybe? |
It's unlikely this is due to the stick, given the fact it affects many different brands of sticks and only on qt-openzwave. There are lots of reasons that might happen, including the usb identifier changing so that the kernel sees it as a different device. (I don't know that this happens, it's just an example.) Even if it is corrupted, qt-openzwave is clearly causing the corruption somehow. Also, deleting the cache sometimes helps fix it temporarily. |
Definitely not the stick. I moved to zwavejs - no issues.
It’s that the MQTT broker is not stroked when the zwave stack is busy.
Do yourself a favor and move to zwavejs. It is easy:
https://community.home-assistant.io/t/switching-from-openzwave-beta-to-zwave-js/276723
|
Ozw dead. Closing. |
HA 114b1, ozw 0.52 addon, now locks up with the error above, approximately 100 nodes in the NW - was working okay in my test system NW of 10 nodes.
Now all the ZW devices are marked as "unavailable" and my ZW network is unusable; WAF is very low...