Node discovery takes 3 minutes for a pairing #1153
Comments
Some further investigations on this, as I think I found the reason why discoveries are started on the startup of ZSS: once we set the network to ONLINE (Lines 1330 to 1337 in 84042f7), then, because of Lines 252 to 254 in 84042f7, we start a new discovery for this node. And we add the task defined in Lines 82 to 89 in 84042f7 here: Line 295 in 84042f7.
Now I am wondering what we should do about this comment:
If we always request the network address per node, we will add one task to the thread pool per device on startup. If there are multiple nodes that are currently turned off (no power, for example), we will try to reach them multiple times, due to our retries, until we eventually give up. During these retries there are some chances for other tasks to run, however, they might take some time, see my comment above. If we have 6 or more unreachable devices, the situation becomes even worse, because then all threads from the pool might be occupied, and if the user wants to start a pairing in that time span, chances are high that it will take a very long time (and the application that initiated the pairing might even have timed out by then). So the questions would be:
The theory was that the NWK address can change. This is especially true for battery devices, where they can change parent, or could leave/rejoin for a number of reasons, and this will result in them having a different address. All that said, I'm happy to discuss any suggestions here @triller-telekom, as I do think we're probably being too conservative. Probably we need a more optimistic approach - e.g. not to rediscover the network so often, and not to assume that the NWK address has changed, and only perform these sorts of checks on exception (e.g. if we have a transaction failure or something). These concepts came originally from the ZigBee4Java project, and possibly the ZigBee4Osgi before that, but I do think there is a better way to manage the discovery of devices than crawling the neighbour tables etc., and reducing some of this "noise" would likely improve things... Ok... So... I guess there are 2 issues here?
I'm open to suggestions / PRs here :)
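To make the "optimistic" idea a bit more concrete, here is a minimal sketch. The names (OptimisticAddressCache, onTransactionFailure, requestNwkAddressOverTheAir) are hypothetical and not part of the actual library API: the cached NWK address is trusted until a transaction to that node fails, and only then is a rediscovery scheduled.

```java
// Minimal sketch of the "optimistic" approach (hypothetical names, not the
// actual com.zsmartsystems.zigbee API): keep using the cached NWK address and
// only schedule a network-address rediscovery when a transaction actually fails.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OptimisticAddressCache {
    private final ConcurrentHashMap<Long, Integer> ieeeToNwk = new ConcurrentHashMap<>();
    private final ExecutorService rediscoveryPool = Executors.newSingleThreadExecutor();

    /** Returns the cached NWK address without triggering any discovery traffic. */
    public Integer getNwkAddress(long ieeeAddress) {
        return ieeeToNwk.get(ieeeAddress);
    }

    /** Called by the transaction layer when a request to a node times out or fails. */
    public void onTransactionFailure(long ieeeAddress) {
        // Only now do we suspect the NWK address changed and pay the cost of rediscovery.
        rediscoveryPool.submit(() -> {
            Integer fresh = requestNwkAddressOverTheAir(ieeeAddress); // hypothetical helper
            if (fresh != null) {
                ieeeToNwk.put(ieeeAddress, fresh);
            }
        });
    }

    // Placeholder for the actual ZDO NWK_addr_req; details depend on the stack in use.
    private Integer requestNwkAddressOverTheAir(long ieeeAddress) {
        return null;
    }
}
```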
I agree with that. But I also think that there is nothing wrong with being conservative, because this way we have a "working" system, even if something changes in the network.
I had something like this in mind while analyzing it; it sounds plausible to me. To your 2 points:
Back to my questions above, as I think they might be a first (and easier) step towards a slimmer discovery: Do you think it's feasible to remove the network address request task from the mesh discovery completely, as we do add that task later anyway in the case where we do not know the NWK address? If not, then we certainly should build in a mechanism to skip this on startup, so as not to flood the network or our transaction manager with too many requests.
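One possible shape for such a skip-on-startup mechanism, as a minimal sketch with hypothetical names (StartupAwareDiscoverer, shouldRequestNetworkAddress) that do not exist in the library: nodes whose NWK address is already known are not re-queried during a grace period after the network goes ONLINE, while nodes with an unknown address are still queried immediately.

```java
// Rough sketch of a "skip on startup" rule (hypothetical class, not library
// code): during a warm-up window after the network comes ONLINE, the per-node
// network address request is skipped unless the address is actually unknown.
import java.time.Duration;
import java.time.Instant;

public class StartupAwareDiscoverer {
    private final Instant networkOnlineSince = Instant.now();
    private final Duration startupGracePeriod = Duration.ofMinutes(2);

    /** Decide whether to enqueue the network address request task for a node. */
    public boolean shouldRequestNetworkAddress(Integer knownNwkAddress) {
        boolean inStartupWindow =
                Duration.between(networkOnlineSince, Instant.now()).compareTo(startupGracePeriod) < 0;
        // Always request if we do not know the address at all; otherwise skip
        // the request while we are still inside the startup window.
        return knownNwkAddress == null || !inStartupWindow;
    }

    public static void main(String[] args) {
        StartupAwareDiscoverer discoverer = new StartupAwareDiscoverer();
        System.out.println("Known address, right after startup: "
                + discoverer.shouldRequestNetworkAddress(0x1234)); // expected: false
        System.out.println("Unknown address, right after startup: "
                + discoverer.shouldRequestNetworkAddress(null));   // expected: true
    }
}
```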
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I am currently investigating a problem where it takes a very long time to discover a ZigBee device (3 minutes and 9 seconds), i.e. reading out its descriptors, endpoints, etc.
What I have found out so far is that there is a huge gap between scheduling a task and when it actually starts:
This continues for basically all of the above tasks, so it adds up to the 3 minutes.
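For reference, a minimal, self-contained way to make such gaps visible (this helper is just an illustration, not part of the library) is to wrap each Runnable so that the time it spent waiting in the queue is logged when it finally starts:

```java
// Standalone helper to log queue wait time per task (illustration only).
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class QueueLatencyLogger {
    public static Runnable timed(String name, Runnable task) {
        final long submittedAt = System.nanoTime();
        return () -> {
            long waitedMs = (System.nanoTime() - submittedAt) / 1_000_000;
            System.out.println(name + " waited " + waitedMs + " ms in the queue");
            task.run();
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        // The second task has to wait for the first one, so a non-zero wait is logged.
        pool.submit(timed("task-1", () -> sleep(500)));
        pool.submit(timed("task-2", () -> sleep(10)));
        pool.shutdown();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```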
The only explanation I have so far: there are only 6 threads available for such tasks to run in parallel.
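To illustrate the effect (a standalone sketch, not code from the library; the pool size of 6 and the retry behaviour are taken from the description above): if 6 long-running tasks, e.g. retries against unreachable nodes, occupy a fixed pool of 6 threads, any newly scheduled task, such as the discovery for a new pairing, has to wait until one of them gives up.

```java
// Standalone illustration of the saturation effect described above (not library
// code): a fixed pool of 6 threads is filled with long-running "retry" tasks,
// so a 7th task - standing in for the discovery of a newly paired device -
// only starts once one of the retry tasks finally gives up.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSaturationDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(6);

        // Six tasks that keep retrying an unreachable node before giving up.
        for (int node = 1; node <= 6; node++) {
            final int id = node;
            pool.submit(() -> {
                for (int attempt = 1; attempt <= 3; attempt++) {
                    System.out.println("node " + id + ": attempt " + attempt + " timed out");
                    sleep(1000); // stands in for the per-attempt timeout
                }
                System.out.println("node " + id + ": giving up");
            });
        }

        long queuedAt = System.currentTimeMillis();
        pool.submit(() -> System.out.println("new pairing discovery started after "
                + (System.currentTimeMillis() - queuedAt) + " ms"));

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```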
On the startup of the ZigbeeDiscoveryExtension, we create a ZigBeeNetworkDiscoverer and tell it to start a node discovery from network node "0". We collect all associated devices and add those to the network manager. I am assuming that the network state must be ONLINE at this point, because I assume the listener for nodeAdded will be triggered and thus we start a discovery (with tasks occupying threads from the pool mentioned above) for all "associated nodes". That is because ZigbeeDiscoveryExtension.startDiscoveryIfNecessary() does not yet have a discoverer for each node and thus it creates them.

The other scenario where we could start a discovery for all nodes is when we load the nodes from the storage; however, I think the network state is not ONLINE at that point in time and thus no discovery will be triggered.

Also: I have identified 4 "broken" ZigBeeNodes in the particular system, which are nodes that exist but only have an IEEE address and no endpoints, descriptors, etc. So they are leftovers from a broken pairing/deletion of a device, whatever. Those 4 devices would take 4 threads (continuously failing because they are not reachable), and I am wondering why the 2 other threads are also occupied. The only explanation I have is what I wrote above: that we start a discovery for ALL nodes on startup.

So, I think we might run into a problem if there are 6 devices in the network that are not reachable at startup. Because if they occupy all threads and run into timeouts -> retries -> timeouts, it will take a long time until we are able to start a discoverer for a pairing of a new device.
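As a rough sketch of the startup behaviour described above, with simplified, hypothetical types (the real ZigbeeDiscoveryExtension and node classes look different): a nodeAdded callback that unconditionally starts a discovery per node will enqueue one task for every known node as soon as the network goes ONLINE, including the broken nodes that only have an IEEE address.

```java
// Simplified sketch of the startup flood described above (hypothetical types,
// not the actual ZigbeeDiscoveryExtension): every nodeAdded callback enqueues
// one discovery task, including for broken nodes with only an IEEE address.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StartupDiscoverySketch {
    // Minimal stand-in for a node: only an IEEE address, endpoints may be missing.
    record Node(long ieeeAddress, boolean hasEndpoints) {}

    private final ExecutorService discoveryPool = Executors.newFixedThreadPool(6);

    // Stand-in for the nodeAdded listener: one discovery task per node, unconditionally.
    void nodeAdded(Node node) {
        discoveryPool.submit(() -> {
            System.out.println("Discovering node " + Long.toHexString(node.ieeeAddress())
                    + (node.hasEndpoints() ? "" : " (broken node, will time out and retry)"));
        });
    }

    public static void main(String[] args) {
        StartupDiscoverySketch extension = new StartupDiscoverySketch();
        List<Node> nodesOnStartup = List.of(
                new Node(0x1111L, true),
                new Node(0x2222L, false),   // broken: IEEE address only
                new Node(0x3333L, false));  // broken: IEEE address only
        // When the network state becomes ONLINE, every known node triggers nodeAdded.
        nodesOnStartup.forEach(extension::nodeAdded);
        extension.discoveryPool.shutdown();
    }
}
```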