[Questions] Generic server terminating - "badmatch,{error,eacces}" error from rabbit_classic_queue_store_v2 #14094
-
Community Support Policy
RabbitMQ version used: 4.1.0
Erlang version used: 27.3.x
Operating system (distribution) used: Windows Server 2016
How is RabbitMQ deployed? Windows installer
rabbitmq-diagnostics status output: See https://www.rabbitmq.com/docs/cli to learn how to use rabbitmq-diagnostics
Logs from node 1 (with sensitive values edited out): See https://www.rabbitmq.com/docs/logging to learn how to collect logs
Logs from node 2 (if applicable, with sensitive values edited out): No response
Logs from node 3 (if applicable, with sensitive values edited out): No response
rabbitmq.conf: I do not have one.
Steps to deploy RabbitMQ cluster:
Steps to reproduce the behavior in question: I am unsure what causes this behavior. It happens intermittently. Other servers with the exact same RabbitMQ/Erlang combination do not have this problem.
advanced.config: See https://www.rabbitmq.com/docs/configure#config-location to learn how to find advanced.config file location
Application code: No response
Kubernetes deployment file: No response
What problem are you trying to solve?
I work at a software company on an on-premise solution that uses RabbitMQ as a message broker between different services on an application server. After updating to RabbitMQ 4.1.0 and Erlang 27.3.3, some queues on one specific customer server crash, drop messages and restart. This can happen to any queue experiencing relatively high traffic. When it does, we get the error message in the RabbitMQ logs that I have shared above. Occasionally this leads to a complete failure of RabbitMQ, where no messages get published at all anymore. When this happens, the RabbitMQ service is still running and we can still reach RabbitMQ's management console on localhost. After restarting the RabbitMQ service, everything runs normally for a while before the whole issue starts over again. I have spent some time trying to solve this, but so far I am unable to find a clear cause or solution. I've tried the following things:
I want to note that this only happens on one specific customer's server. I guess this is probably some kind of permission issue regarding the buffer files (?), but I am unsure how to reproduce the issue or how to attempt to solve it. If anyone could give some advice on what to do, that'd be much appreciated.
Replies: 6 comments 32 replies
-
@Donderstal this is a known Windows-specific error: sometimes Windows returns We will see in the next patch release or two.
-
This error occurred just now while I was running Process Monitor. This is the error in the RabbitMQ logs:
I've attached the Process Monitor log file as a .CSV. I also have a full .PML file available, but I can't attach .PML files to a GitHub post. I could email it if you'd like.
-
@michaelklishin @lhoguin
-
When we try the alpha build, it fails on the index instead.
-
I installed the alpha version on Tuesday morning at 07:30 CET. Since then, we've had the error once on this customer's machine. As far as I know, this has not resulted in a loss of messages. I will keep an eye on the RabbitMQ logs and will give you an update next week on whether the error frequency increases.
-
Thanks to @tomerrt, we have learned that this filesystem API behavior is not only Windows-specific, it is specific to a certain set of Windows versions, and the newest versions should not be affected.
While our current workaround helps, it must cover all file deletions, which is a fair number of places even in the CQ storage layer alone. Adopting a newer Windows version can be a viable option that would help those running RabbitMQ 3.13.x or any 4.x version before 4.1.2, where the first slew of workarounds shipped.
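For readers wondering why such a workaround has to touch every deletion site: the thread does not describe how the RabbitMQ workaround is implemented, so the following is only a rough sketch of one common approach, assuming the access error is transient and a retry-based strategy is acceptable. It is written in Python rather than Erlang purely for illustration, and `delete_with_retry`, its `attempts` and `delay` parameters, and their defaults are hypothetical names and values, not anything taken from RabbitMQ.

```python
import os
import time


def delete_with_retry(path, attempts=5, delay=0.05):
    """Delete `path`, retrying when the OS refuses with a permission error.

    Hypothetical helper for illustration only: the name, the attempt count
    and the delay are assumptions, not RabbitMQ's actual workaround.
    Returns True if the file is gone, False if every attempt was refused.
    """
    for attempt in range(attempts):
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            # Already gone: for a delete, that is the desired end state.
            return True
        except PermissionError:
            # Access denied: possibly transient, so back off and try again.
            if attempt < attempts - 1:
                time.sleep(delay * (attempt + 1))
    return False
```

A helper like this only helps where it is actually called, so every place that deletes a file would need to go through it instead of asserting that a single delete call succeeded. A caller would also still have to decide what to do when it returns `False`.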