-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
topic_to_socketcan
communication breaks under high load
#285
Comments
You might increase the socketcan buffer size (#102). 100 Hz should be no problem on a more or less ordered CAN bus. If you keep sending ROS messages at 10k Hz, the ROS callback system and the CANopen multi-threading won't perform properly anymore (at least on non RT-linux systems). Given that the buffer is big enough, the jitter will increase dramtically. |
Thanks for the tips. I've increased buffer size, which seems to lead to improvements. Tomorrow I'll do some further real-world tests. In practice we won't send our messages above 100Hz. The 10k Hz that I ran was only to stress test it (but I have the impression that can actually trigger other types of errors than the ones we had in practice at 100Hz). You wrote:
I agree, but the problems are in the CAN bus of the original vehicle, which is already quite erroneous of its own unfortunately... We don't plan to go into rewiring that whole CAN bus, which would be the only solution I'd say. I tried the autorecover in #249. It works good for recovery (though you'll loose a significant interval of communication) and thus seems useful for us as a fallback. |
I did some real-world tests today. Results:
Maybe it would be nice to make it configurable via a rosparam setting to enable (ROS_ERROR) or disable (ROS_WARN only) the escalation of these errors via the error mask? |
Please don't confuse this bitmask of reported errors with the handling of these errors. |
I fully agree. This was just a quick and dirty hack to confirm that it works for us when removing those two errors. I made it slightly cleaner now: git diff HEAD~2 -- socketcan_interface/include/socketcan_interface/socketcan.h
diff --git a/socketcan_interface/include/socketcan_interface/socketcan.h b/socketcan_interface/include/socketcan_interface/socketcan.h
index 96845db..277bdda 100644
--- a/socketcan_interface/include/socketcan_interface/socketcan.h
+++ b/socketcan_interface/include/socketcan_interface/socketcan.h
@@ -199,11 +199,16 @@ protected:
if(frame_.can_id & CAN_ERR_FLAG){ // error message
input_.id = frame_.can_id & CAN_EFF_MASK;
input_.is_error = 1;
-
- LOG("error: " << input_.id);
setInternalError(input_.id);
- setNotReady();
+ LOG("error: " << input_.id);
+ if (frame_.can_id & CAN_ERR_LOSTARB ||
+ frame_.can_id & CAN_ERR_CRTL)
+ {
+ LOG("warn: ignoring arbitration (2) or controller (4) error");
+ } else {
+ setNotReady();
+ }
}else{
input_.is_extended = (frame_.can_id & CAN_EFF_FLAG) ? 1 :0;
input_.id = frame_.can_id & (input_.is_extended ? CAN_EFF_MASK : CAN_SFF_MASK); I can clearly see now that both happen every now and then, but overall communication now remains working. |
I am facing same problem as described above by @koenlek for controller issue. With suggested changes to socketcan.h, the error disappears and I can see the CAN message being transmitted on to the bus correctly. However, the intended functionality in the vehicle is not realized. My question is
@koenlek and @ipa-mdl, appreciate your inputs on this |
Solution from ros-industrial#285
lost arbitration are not fatal anymore (#362) |
After improving according to the method you provided, the fault has not been eliminated |
Problem
We are sending data from a ros topic to a SocketCAN device via the
socketcan_bridge
. Sometimes, (presumably when the load on the bus and/or our PEAK PCI CAN card is too high) an error triggers which completely breaks allsocketcan_bridge
communication in the direction topic -> socketcan. The only solution is to restart thesocketcan_bridge
node. This affects bothtopic_to_socketcan_node
andsocketcan_bridge_node
. In the case ofsocketcan_bridge_node
, only the communication from topic -> socketcan breaks. The other direction remains functional.We don't always get the same errors. This are the ones I've seen:
The errors will print endlessly (a new error is printed for every msg that we try to send via
topic_to_socketcan
). The communication remains broken (also after lowering the load on the canbus) until we restart thesocketcan_bridge
node.How to reproduce
vcan
(that seems to be able to handle extreme loads without problems).sudo ip link set can0 type can bitrate 500000 triple-sampling on
. I also tried without triple-sampling.rosrun socketcan_bridge topic_to_socketcan_node
/sent_messages
. To enforce triggering it, we send them atrate = rospy.Rate(10000)
, which triggers the error practically immediately. When we send them at 10 hz, all is fine. At 100Hz we have problems in our real system, but not always immediately (seems to be related with how busy it is on the bus, though I'm not sure).Environment:
I am curious if we could increase buffers to fix this issue, or if we could make it at least recover from the error (I guess it all starts with
FAILED system:105
).I'd be willing to contribute a PR to fix it, if people can point me in the right direction.
The text was updated successfully, but these errors were encountered: