Why is table routing the default over source routing in ZHA #342

Open
cemizm opened this issue Dec 28, 2024 · 1 comment
cemizm commented Dec 28, 2024

Context

I’m trying to understand the rationale behind the decision to use table routing instead of source routing (which is used together with many-to-one routing and is often referred to by that name) as the default in the Zigbee integration for Home Assistant.

According to the Zigbee documentation:

"Many-to-one routing is a simple mechanism to allow an entire network to have a path to a central control or monitoring device. Under normal table routing, the central device and the devices immediately surrounding it would need routing table space to store a next hop for each device in the network, as well as an entry to the central device itself. Given the memory-limited devices often used in Zigbee networks, these large tables are undesirable."

Source routing / many-to-one routing seems ideal for a centralized control device like Home Assistant. However, it is not enabled by default in the integration and must be explicitly activated through the configuration file.
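
For reference, a minimal sketch of what that explicit activation might look like in configuration.yaml. This assumes the zigpy `source_routing` option is passed through under `zha: zigpy_config:`; the exact keys are not stated in this issue and may differ by Home Assistant version and radio library, so check them against the current ZHA documentation before relying on it.

```yaml
# configuration.yaml (sketch; key names are an assumption, see note above)
zha:
  zigpy_config:
    # Ask the coordinator to use source routing instead of
    # relying only on per-hop routing tables.
    source_routing: true
```

A Home Assistant restart is typically needed for a change like this to take effect.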

My Experience

In my network of over 100 devices, I struggled with reliability issues for several months. Devices would occasionally stop responding, and in automations targeting multiple devices, only some would execute the commands successfully. The affected devices varied, making the issue unpredictable and harder to diagnose. These disruptions often coincided with errors in the logs, such as:

Failed to deliver message: <sl_Status.ZIGBEE_DELIVERY_FAILED: 3074>
Failed to deliver message: <EmberStatus.Delivery_Failed: 102>

After extensive troubleshooting and the use of a Zigbee sniffer, I identified the root cause: route requests—an essential mechanism of table routing—frequently failed to resolve for certain devices. Interestingly, the coordinator could still receive "Report Attributes" from the affected devices. This, combined with other error patterns, pointed to overflowing routing tables as the likely issue, despite approximately 50% of my devices being capable of routing.

Ultimately, enabling source routing completely resolved these problems. Since making this change, my network has been stable and reliable for several months. The process of diagnosing and fixing this issue taught me a lot about Zigbee networks, and I shared my insights and solution in this blog post, which may be helpful for others facing similar challenges.

It’s worth noting that I also followed the Home Assistant troubleshooting recommendations to add more routers. However, this approach was insufficient in my case, likely due to the elongated layout of my home, which further exacerbated the routing table limitations near the coordinator.

Question

There must be a reason why table routing is the default choice. Is it due to potential challenges with source routing, such as the need to build the source routing table before devices can be addressed by the coordinator? Or are there other reasons?

If table routing is preferred for valid reasons, should the Home Assistant documentation's troubleshooting steps be expanded to include source routing as a potential solution for users experiencing similar issues? While adding more routers is currently the primary suggestion, it might not be effective for all network layouts, as seen in my case.


cemizm commented Jan 12, 2025

Hi everyone,

I wanted to follow up on this issue and provide a bit more detail from my own experience. In my Zigbee network, I encountered recurring issues where some devices intermittently stopped responding. This was particularly noticeable in automations targeting multiple devices, where only a subset would execute commands successfully. The affected devices varied, which made the issue difficult to diagnose.

After troubleshooting, I found that enabling source routing resolved these problems. Since making this change, the devices in my network have been consistently reliable, and the communication failures have stopped.

I’m curious if there are known trade-offs or challenges with source routing that might explain why table routing is the default? Or if other users have faced similar issues with non-responding devices, I’d love to hear how they addressed them.

Happy to provide further details or contribute to testing if it helps. Thanks for your time!
