socketcan_bridge: unrecoverable error "controller problems;, asio: Success" #267

Open
WasabiFan opened this issue Feb 7, 2018 · 1 comment

@WasabiFan

I'm currently using socketcan_bridge to interface with CAN in my ROS project. While it generally works, it will sometimes spit out a sequence of four lines like this one:

[ERROR] [1517684645.914299608]: Error: controller problems;, asio: Success

After that, any further attempts to send a message result in:

[ERROR] [1517684649.054520857]: Failed to send message: 9e040005#010384fc7c.

I can send that same CAN frame successfully from the command line, so this isn't an underlying driver failure; it seems the bridge just doesn't recover from an intermittent bus error. The ideal solution is of course to resolve whatever causes the error in the first place, but if it ever happens in a production environment, the whole system grinds to a halt.

This seems to be where the latter message is coming from:

bool res = driver_->send(f);
if (!res)
{
  ROS_ERROR("Failed to send message: %s.", can::tostring(f, true).c_str());
}
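
For reference, one way to avoid getting permanently stuck at this point would be to attempt an explicit driver recovery before giving up on the frame. This is only a sketch, and it assumes the driver implements a recover() method as in socketcan_interface's DriverInterface (names may differ between versions):

// Sketch only: retry a failed send after asking the driver to recover.
// Assumes driver_ and f are the same variables as in the excerpt above and
// that the driver exposes can::DriverInterface::recover(), which re-opens the
// underlying SocketCAN socket after a bus/controller error.
bool res = driver_->send(f);
if (!res)
{
  ROS_WARN("Send failed, attempting driver recovery");
  res = driver_->recover() && driver_->send(f);
}
if (!res)
{
  ROS_ERROR("Failed to send message: %s.", can::tostring(f, true).c_str());
}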

I am not sure where the former message originates or what triggers it. In dmesg, I often see the following when I plug in a USB device, for example (my guess is that the USB processing floods interrupts or something of that nature):

[12842.644761] mttcan c310000.mttcan can0: mttcan_poll_ir: some msgs lost on in Q0

That may or may not be related.

I think #249 would fix the problem, but I am not prepared to take on the changes that were requested on that PR myself.

Could anyone suggest a fix or resolution?

@mathias-luedtke
Member

Could anyone suggest a fix or resolution?

Auto-recovery would be an option, but some messages will get lost.
#244 needs to be fixed.
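
As an illustration of what such auto-recovery could look like, here is a rough sketch that watches the driver state and calls recover() whenever an internal error is reported. The createStateListener / can::State / recover() names are assumptions based on socketcan_interface's driver interface, not code from this thread, and the exact signatures vary between releases:

#include <functional>
#include <ros/ros.h>
#include <socketcan_interface/socketcan.h>

// Sketch only: automatically try to recover the CAN driver when it reports an
// error state (e.g. "controller problems"). Frames sent while the driver is
// down are still lost; this only avoids the permanent "Failed to send" state.
class AutoRecover
{
public:
  explicit AutoRecover(boost::shared_ptr<can::DriverInterface> driver)
    : driver_(driver)
  {
    // Assumed API: StateInterface::createStateListener() invokes the callback
    // on every driver state change. Older releases expect a delegate object
    // (can::StateInterface::StateDelegate) instead of a std::function.
    state_listener_ = driver_->createStateListener(
        std::bind(&AutoRecover::onState, this, std::placeholders::_1));
  }

private:
  void onState(const can::State& s)
  {
    if (s.internal_error != 0)
    {
      ROS_WARN("CAN driver reported an error, trying to recover");
      if (!driver_->recover())
        ROS_ERROR("Driver recovery failed");
    }
  }

  boost::shared_ptr<can::DriverInterface> driver_;
  can::StateListenerConstSharedPtr state_listener_;
};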

but if this ever happens in a production environment the whole system grinds to a halt.

If USB interrupts can break your application, I would not consider it a production environment.
A low-latency kernel might help as a workaround.
