Using a Redis cluster: MOVED occurs occasionally, after which the connection fails and commands raise an error #2000

Open
yfengworld opened this issue Nov 25, 2024 · 6 comments

Comments

@yfengworld

Error log:
[:0000009a][ERROR][00:00:45.45][err_handle.lua:7] ../skynet/lualib/skynet/db/redis/cluster.lua:407: Too many Cluster redirections?,maybe node is disconnected (last error: " 15067 172.16.2.67:6379")
stack traceback:
../src/share/libs/err_handle.lua:6: in function 'err_handle.error_handler'
[C]: in function 'error'
../skynet/lualib/skynet/db/redis/cluster.lua:407: in function <../skynet/lualib/skynet/db/redis/cluster.lua:315>
(...tail calls...)

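Also, once the process is in this abnormal state, any connection to debug_console is closed immediately after it connects, so debug_console cannot be used.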
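
For readers unfamiliar with the message: a "too many redirections" error generally means the client kept receiving redirect replies until it ran out of attempts. The following is a minimal, self-contained sketch of such a bounded redirect loop. It is not skynet's actual cluster.lua code; `call_with_redirects`, the `send` callback, the fake node addresses, and the attempt limit are all hypothetical names made up for illustration.

```lua
-- Hypothetical sketch of a bounded redirect loop; NOT skynet's cluster.lua.
-- `send(addr)` is an assumed callback that either returns a reply or raises
-- an error string such as "MOVED 1918 172.16.2.207:6379".
local function call_with_redirects(send, start_addr, max_redirects)
    local addr = start_addr
    local last_err
    for _ = 1, max_redirects do
        local ok, res = pcall(send, addr)
        if ok then
            return res
        end
        last_err = tostring(res)
        -- Follow the redirect only if the error names a new address.
        local host, port = last_err:match("MOVED %d+ ([^:%s]+):(%d+)")
        if not host then
            error(last_err, 0)
        end
        addr = host .. ":" .. port
    end
    error("too many redirections (last error: " .. tostring(last_err) .. ")", 0)
end

-- Fake two-node cluster: the first node redirects, the second one answers.
local function healthy(addr)
    if addr == "10.0.0.1:6379" then
        error("MOVED 1918 10.0.0.2:6379", 0)
    end
    return "OK from " .. addr
end

-- Fake broken cluster: every attempt is redirected, so the loop gives up.
local function always_moved(_)
    error("MOVED 1918 10.0.0.2:6379", 0)
end

print(call_with_redirects(healthy, "10.0.0.1:6379", 5))
print(pcall(call_with_redirects, always_moved, "10.0.0.1:6379", 5))
```

In this sketch the second call fails only because every attempt is redirected again. In a real deployment that pattern can have several causes (a stale slot table, a node that cannot be reached, a reply that is not recognised as a redirect), so the error alone does not identify which one applies here.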

@cloudwu
Owner

cloudwu commented Nov 25, 2024

Read the code to find the problem.

If new connections cannot be established, check the maximum number of open files.

cc @sundream

@yfengworld
Author

yfengworld commented Nov 25, 2024

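> Read the code to find the problem.
>
> If new connections cannot be established, check the maximum number of open files.
>
> cc @sundream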

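Following the log, I traced this to the rediscluster:call(...) function in cluster.lua. A MOVED occurred and returned the correct IP and port, but when the following code runs again: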
local result = {pcall(function ()
    -- TODO: use pipelining to send asking and save a rtt.
    if asking then
        conn:asking()
    end
    asking = false
    local func = conn[cmd]
    return func(conn,table.unpack(argv,2))
end)}
local ok = result[1]
if not ok then
    err = table.unpack(result,2)
    err = tostring(err)
    syslog.error("rediscluster socket error %s", err)

Here err is printed as ../skynet/lualib/skynet/socketchannel.lua:482: MOVED 1918 172.16.2.207:6379, and once the retries are exhausted the error is thrown. I checked, and the key really is on the 172.16.2.207 node.
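
One detail visible in that log line is the `file:line:` prefix in front of `MOVED`. As a hedged illustration only, here is how a redirect message could be parsed by searching for the keyword instead of assuming it is the first token. `parse_redirect` is a hypothetical helper written for this note; it is not part of skynet, and this sketch makes no claim about how cluster.lua itself handles the prefix.

```lua
-- Hypothetical helper: extract redirect details from an error string,
-- tolerating any text (such as a "file:line:" prefix) before the keyword.
local function parse_redirect(err)
    for _, kind in ipairs({ "MOVED", "ASK" }) do
        -- %f[%a] requires the keyword to start at a word boundary.
        local pattern = "%f[%a]" .. kind .. " (%d+) ([^:%s]+):(%d+)"
        local slot, host, port = err:match(pattern)
        if slot then
            return {
                kind = kind,
                slot = tonumber(slot),
                host = host,
                port = tonumber(port),
            }
        end
    end
    return nil -- not a redirect
end

local r = parse_redirect(
    "../skynet/lualib/skynet/socketchannel.lua:482: MOVED 1918 172.16.2.207:6379")
print(r.kind, r.slot, r.host, r.port)
```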

Checking the file descriptor limit:
root@ybxz-obt-center:/data/ybxz-obt# ulimit -n
102400

@firedtoad

firedtoad commented Nov 25, 2024 via email

You need to compute the node ID in advance.

@yfengworld
Author

What do you mean?
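
As background for this exchange: if "node ID" refers to the hash slot that decides which node owns a key (an assumption, since the comment does not elaborate), then the Redis Cluster specification defines that slot as CRC16 of the key modulo 16384, using only the text inside the first non-empty `{...}` hash tag when one is present. A minimal sketch in plain Lua 5.3+ follows; `crc16` and `key_slot` are illustrative names, not skynet functions.

```lua
-- CRC16 (XMODEM variant: polynomial 0x1021, initial value 0), the checksum
-- the Redis Cluster specification uses for key hashing.
local function crc16(s)
    local crc = 0
    for i = 1, #s do
        crc = crc ~ (s:byte(i) << 8)
        for _ = 1, 8 do
            if (crc & 0x8000) ~= 0 then
                crc = ((crc << 1) ~ 0x1021) & 0xFFFF
            else
                crc = (crc << 1) & 0xFFFF
            end
        end
    end
    return crc
end

-- Hash slot for a key, honouring a non-empty {hash tag} if one exists.
local function key_slot(key)
    local open = key:find("{", 1, true)
    if open then
        local close = key:find("}", open + 1, true)
        if close and close > open + 1 then
            key = key:sub(open + 1, close - 1)
        end
    end
    return crc16(key) % 16384
end

print(key_slot("foo"))
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))
```

Computing the slot on the client side lets a request go straight to the node that the current slot table lists as the owner, instead of relying on a MOVED reply to find it.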

@firedtoad

Most likely you are not connected to all of the nodes.

@yfengworld
Author

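We run a number of guild services to handle guilds, and each service opens its own connection to the Redis cluster. I suspect there were too many connections. After reducing the number of guild services, the problem no longer occurs, but I am not sure of the exact cause.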
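
A rough way to sanity-check the "too many connections" suspicion is to estimate the socket count. The sketch below assumes the worst case where every service keeps one socket open to every cluster node. The service and node counts are hypothetical placeholders, 102400 is the `ulimit -n` value reported earlier in this thread, and 10000 is the default `maxclients` of a Redis server (the actual value on these nodes is not known). `check_limits` is an illustrative name.

```lua
-- Worst-case estimate, ASSUMING one socket per (service, node) pair.
local function check_limits(service_count, node_count, fd_limit, node_maxclients)
    local total = service_count * node_count
    return {
        client_sockets = total,            -- sockets held by the skynet process
        per_node_clients = service_count,  -- clients seen by each Redis node
        over_fd_limit = total > fd_limit,
        over_node_maxclients = service_count > node_maxclients,
    }
end

-- 500 services and 6 nodes are placeholder numbers, not values from this issue.
local r = check_limits(500, 6, 102400, 10000)
print(r.client_sockets, r.over_fd_limit, r.over_node_maxclients)
```

If neither limit is exceeded with the real numbers, the cause is more likely to be elsewhere than in raw connection counts.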


3 participants