OZW goes to 100% CPU load if the Aeon USB Stick 2 is unplugged (only a restart helps) #111

Open
GoogleCodeExporter opened this issue Mar 14, 2015 · 22 comments

@GoogleCodeExporter

What steps will reproduce the problem?
1. Start OZW, e.g. with MinOZW, and let it discover the devices
2. Unplug the USB stick
3. Check with "top": MinOZW is consuming 100% CPU

What is the expected output? What do you see instead?
Not 100% CPU load

What version of the product are you using? On what operating system?
Ubuntu 10.04
Open Z-Wave r556

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 3 Nov 2012 at 11:37

@dsoulayrol

Hi. It seems this issue was abandoned half-fixed. As reported earlier, there are two problems: first, a 100% CPU spin in SerialImpl, and second, no event is sent to the user of the Manager object.

I need to address both of these problems, so I'd like to know whether anyone has made progress on the proposed patches. Thanks.
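
For readers unfamiliar with this kind of spin, here is a minimal POSIX sketch of how a serial read loop can avoid it. It is not the actual SerialImpl code, and the names `ReadStatus`, `readOnce` and `runReadLoop` are hypothetical. The assumption is that, once the device is gone, `select()` reports the descriptor as readable immediately and `read()` returns 0 or a hard error, so a loop that simply retries never blocks again.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/select.h>
#include <unistd.h>

// Hypothetical names for illustration; not part of OpenZWave.
enum class ReadStatus { Data, Timeout, Gone };

// Waits up to timeoutMs for input and classifies the outcome.
ReadStatus readOnce(int fd, int timeoutMs, std::string& out)
{
    assert(fd >= 0 && fd < FD_SETSIZE);  // select() cannot watch larger descriptors

    fd_set readFds;
    FD_ZERO(&readFds);
    FD_SET(fd, &readFds);

    timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    const int ready = select(fd + 1, &readFds, nullptr, nullptr, &tv);
    if (ready == 0) return ReadStatus::Timeout;
    if (ready < 0) return errno == EINTR ? ReadStatus::Timeout : ReadStatus::Gone;

    char buffer[256];
    const ssize_t n = read(fd, buffer, sizeof(buffer));
    if (n > 0) {
        out.append(buffer, static_cast<std::size_t>(n));
        return ReadStatus::Data;
    }
    if (n < 0 && (errno == EAGAIN || errno == EINTR)) return ReadStatus::Timeout;

    // n == 0 (end of stream) or a hard error such as EIO: treat the device as gone.
    return ReadStatus::Gone;
}

// Returns true when the loop ended because the device went away,
// false when it gave up after maxIdleTimeouts consecutive timeouts.
bool runReadLoop(int fd, std::string& received, int maxIdleTimeouts)
{
    int idle = 0;
    while (idle < maxIdleTimeouts) {
        switch (readOnce(fd, 100, received)) {
            case ReadStatus::Data:    idle = 0; break;
            case ReadStatus::Timeout: ++idle;   break;
            case ReadStatus::Gone:    return true;  // leave the loop instead of retrying
        }
    }
    return false;
}
```

The point of the sketch is only the last branch of `readOnce`: a failed read ends the loop and is reported upwards, rather than being retried at full speed.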

@dsoulayrol

Hello.

Here is a refreshed version of the patch proposed on March 14, 2015. It still cannot be considered complete, because for now I do not use the recovery capabilities of the Driver and the Controller. When the stream is broken, the Driver emits a DriverFailure notification, and my program stops when it receives this event.
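
The application side of this can be sketched roughly as follows. The sketch is self-contained and does not use the OpenZWave headers: `NotificationType`, `AppState`, `onNotification` and `shouldKeepRunning` are hypothetical stand-ins. I believe the real library exposes this event as `Notification::Type_DriverFailed` and delivers it through a watcher registered on the Manager, but the signature below is a simplification.

```cpp
#include <atomic>
#include <cassert>

// Stand-in for the library's notification types (hypothetical, simplified).
enum class NotificationType { DriverReady, ValueChanged, DriverFailed };

// State shared between the watcher callback and the application's main loop.
struct AppState {
    std::atomic<bool> driverFailed{false};
};

// Watcher callback: it may run on a library thread, so it only records
// the failure and leaves the actual shutdown to the main loop.
void onNotification(NotificationType type, void* context)
{
    assert(context != nullptr);
    auto* state = static_cast<AppState*>(context);
    if (type == NotificationType::DriverFailed) {
        state->driverFailed.store(true);
    }
}

// The main loop checks this between iterations and exits once it is false.
bool shouldKeepRunning(const AppState& state)
{
    return !state.driverFailed.load();
}
```

Keeping the callback this small is a deliberate choice in the sketch: tearing things down from inside a notification callback is where lock-ordering problems tend to show up.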

However, I think not much more is needed to get the Driver up again. I no longer experience CPU spins, but I had some problems on reconnection, and I suspect that some purge is missing before commands can be transmitted correctly again.
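
What such a purge might look like is an open question; the following is only one possible reading, under stated assumptions. It assumes that stale state can live in two places: application-side buffers (modelled here by the hypothetical `LinkState`) and the kernel's serial buffers, which POSIX `tcflush()` discards. Whether this is what the Driver actually lacks has not been verified.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <termios.h>

// Hypothetical application-side state that can go stale across a reconnect.
struct LinkState {
    std::deque<std::string> pendingCommands;  // commands queued before the unplug
    std::string partialFrame;                 // bytes of a frame cut off mid-transfer
};

// Drops stale state after the port has been reopened.
// Returns true if the kernel input/output buffers were flushed as well;
// tcflush() fails (for example with ENOTTY) when fd is not a terminal device.
bool purgeAfterReconnect(int fd, LinkState& state)
{
    assert(fd >= 0);
    state.pendingCommands.clear();
    state.partialFrame.clear();
    return tcflush(fd, TCIOFLUSH) == 0;
}
```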

@Fishwaldo
Member

Thanks. I'll look over it soon.

I think trying to recover from a "failed" serial port is too hard. I'll probably just try to tweak it a bit, so that we also unload the Driver...

@julienw
Contributor

julienw commented Jun 6, 2016

I'm very interested in this issue; please tell me if I can help with anything.

@julienw
Contributor

julienw commented Jun 6, 2016

Note: the link to this issue in https://github.com/OpenZWave/open-zwave/wiki/FAQ incorrectly points to Google Code.

@dhylands
Contributor

dhylands commented Jun 6, 2016

The real secret is that the serial port must be closed once it has "gone away". I've been using udev in other programs to detect serial ports being added and removed, and then just close the handle when the removal is detected. This will typically cause any reads/writes which happen after the close to fail, and if timeouts are used, in-progress reads/writes will time out.

Under Linux, if the serial port is still open when the dongle is plugged back in, the device number is bumped (i.e. it goes from /dev/ttyUSB0 to /dev/ttyUSB1).
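
A minimal sketch of the "close it once it is gone" idea follows. It swaps udev for a much simpler check: it polls whether the device node still exists with `stat()`, which is an assumption made to keep the example self-contained, not what the programs described above do. `PortHandle` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical wrapper that owns a serial port descriptor.
class PortHandle {
public:
    explicit PortHandle(const std::string& path)
        : path_(path),
          fd_(open(path.c_str(), O_RDWR | O_NOCTTY | O_NONBLOCK)) {}

    ~PortHandle() { closeNow(); }

    PortHandle(const PortHandle&) = delete;
    PortHandle& operator=(const PortHandle&) = delete;

    bool isOpen() const { return fd_ >= 0; }

    // Call periodically (or from a udev "remove" event handler).
    // Returns true if the device node had vanished and the handle was closed.
    bool closeIfRemoved()
    {
        struct stat info;
        if (!isOpen() || stat(path_.c_str(), &info) == 0) return false;
        closeNow();
        return true;
    }

    // After the handle has been closed, reads fail immediately with -1.
    ssize_t readSome(char* buffer, std::size_t size)
    {
        assert(buffer != nullptr);
        if (!isOpen()) return -1;
        return read(fd_, buffer, size);
    }

private:
    void closeNow()
    {
        if (fd_ >= 0) {
            close(fd_);
            fd_ = -1;
        }
    }

    std::string path_;
    int fd_;
};
```

Because the descriptor is released as soon as the removal is noticed, the kernel is free to reuse the same device name when the dongle is plugged back in.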

@mattw-db

mattw-db commented Jun 6, 2017

Does anyone have a full or work-in-progress solution to this, beyond the Linux-only partial discussed above?

The 100% CPU usage on controller removal appears to still exist on Windows as well. I was able to work around that symptom in a similar manner to what was done in the proposed patch for Linux, but gracefully and reliably removing the driver (so that it can later be re-added by the application) seems like a more substantial task. Specifically, being able to safely call NotifyWatchers and stop the driver thread on the way down would appear to need some restructuring to avoid deadlocks.

I'm new to the code, so not keen on hacking things up too much, but if someone's got a starting point or a clever idea on how it should be done, I'd be interested in playing with it since this is actually a fairly important use-case for me.

Thanks!
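
One common way to make notification safe against this kind of deadlock, shown here only as a sketch and not as how OpenZWave's NotifyWatchers is implemented, is to copy the watcher list while holding the lock and to invoke the callbacks after releasing it. A callback can then call back into the list (for example to unregister itself) without blocking on a lock the notifier still holds. `WatcherList` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical watcher registry; not the OpenZWave implementation.
class WatcherList {
public:
    using Callback = std::function<void(int eventCode)>;

    // Registers a callback and returns an id that can be passed to remove().
    int add(Callback callback)
    {
        assert(callback);
        std::lock_guard<std::mutex> lock(mutex_);
        const int id = nextId_++;
        watchers_[id] = std::move(callback);
        return id;
    }

    void remove(int id)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        watchers_.erase(id);
    }

    // Copies the callbacks under the lock, then calls them with the lock
    // released, so callbacks may safely call add() or remove().
    void notify(int eventCode)
    {
        std::vector<Callback> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (const auto& entry : watchers_) snapshot.push_back(entry.second);
        }
        for (const auto& callback : snapshot) callback(eventCode);
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return watchers_.size();
    }

private:
    std::mutex mutex_;
    std::map<int, Callback> watchers_;
    int nextId_ = 1;
};
```

The trade-off is that a watcher removed by an earlier callback in the same round is still called once, because it is already in the snapshot. Stopping the driver thread safely is a separate question that this sketch does not address.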

@DvdGiessen
Contributor

Just spent some time testing the patch from #111 (comment), and it does properly notify the application of the failure instead of just spinning at 100% CPU and not doing much else.

@Fishwaldo Is there any reason not to merge this patch already? Sure, we want to implement a better solution for the problem, but that's still something we can do in the future. By merging this patch applications can at least be made aware of the issue without any external hacks to monitor the underlying devices; I don't think anybody is happier with 100% CPU, so merging this patch will improve the situation for now until we can implement an even better solution.

@dilruacs

dilruacs commented Apr 6, 2019

I'd like to request that this patch be added, too.

I am in a situation where my Z-Wave device "goes away" briefly every now and then (a totally unrelated problem which will be solved elsewhere), and that leaves the whole system in a mess because it ends up in a spin.
This patch causes the system to handle the problem much better.
Not applying a patch for a known 4-year-old problem is much worse IMHO ;-)
