Following #1498, my team is looking to add one more feature to support our gh-ost workflow. We sometimes run online migrations on databases with a low `max_binlog_cache_size`, which can cause batch inserts to fail with `FATAL Error 1197 (HY000): Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage`. In our current approach with LHM, we mitigated this with retry logic that gradually reduces the batch size inside the retry loop (original PR: Shopify/lhm#165).
I figured we don't need to copy this feature 1:1 and could instead leverage dynamic reconfiguration and hooks:
1. a batch insert fails with error 1197
2. a hook is executed and updates `chunk-size` to a smaller value
3. the batch gets retried until `default-retries` is reached
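To make the intended control flow concrete, here is a minimal Ruby sketch of the retry loop we have in mind. Everything here is illustrative rather than gh-ost code: `copy_batch`, `on_batch_copy_retry`, the failure threshold of 500, and `DEFAULT_RETRIES = 5` are all made-up stand-ins.

```ruby
# Illustrative sketch of the proposed retry flow; NOT gh-ost code.
# copy_batch, on_batch_copy_retry, and the constants are hypothetical.

DEFAULT_RETRIES = 5

def copy_batch(chunk_size)
  # Simulate MySQL error 1197 whenever the batch is too large for the
  # (pretend) binlog cache.
  if chunk_size > 500
    raise "Error 1197 (HY000): Multi-statement transaction required more " \
          "than 'max_binlog_cache_size' bytes of storage"
  end
end

def on_batch_copy_retry(chunk_size)
  # In gh-ost, the hook would shrink chunk-size via the interactive socket;
  # here we just return the reduced value directly.
  [(chunk_size * 0.8).to_i, 10].max
end

chunk_size = 1000
attempts = 0
begin
  copy_batch(chunk_size)
  puts "batch copied with chunk-size=#{chunk_size}"
rescue => e
  attempts += 1
  raise if attempts >= DEFAULT_RETRIES # give up after default-retries
  chunk_size = on_batch_copy_retry(chunk_size)
  retry
end
```

With these toy numbers the copy succeeds on the fifth attempt, after chunk-size has been backed off from 1000 to 409.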
The only issue I found is the lack of a suitable hook for this. For example, `onFailure` is called on the error in question, but by that point the gh-ost process is already about to panic with `Fatale`. So I'd like to propose adding a new hook that gets invoked before each retry inside `iterateChunks` and lets us dynamically reconfigure `chunk-size`.
We have a working prototype in Shopify#2 that solves our issue with `max_binlog_cache_size` and also seems like a sound general addition to gh-ost that provides more flexible error-handling options.
For context, let me also share the draft `gh-ost-on-batch-copy-retry` script that we wrote to handle the binlog cache errors:
```ruby
#!/usr/bin/env ruby
# frozen_string_literal: true

require 'socket'

# gh-ost interactive command socket and backoff parameters.
socket_path = '/tmp/ghost.sock'
backoff_factor = 0.8
min_chunk_size = 10
output = nil

handled_error_re = %r{Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage}
mysql_error_message = ENV.fetch('GH_OST_LAST_BATCH_COPY_ERROR')

unless mysql_error_message.match?(handled_error_re)
  puts "Nothing to do for error: #{mysql_error_message}"
  exit
end

# Query the current chunk-size over the interactive socket.
Socket.unix(socket_path) do |socket|
  socket.puts 'chunk-size=?'
  output = socket.gets
end

chunk_size = output.to_i
new_chunk_size = [(chunk_size * backoff_factor).to_i, min_chunk_size].max

# Nothing left to do once we've hit the floor.
exit if chunk_size == new_chunk_size

# Apply the reduced chunk-size before the next retry.
Socket.unix(socket_path) do |socket|
  socket.puts "chunk-size=#{new_chunk_size}"
end
```
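As a sanity check on the backoff parameters, this short sketch walks through the sequence of chunk sizes the script would produce, assuming a starting `chunk-size` of 1000 (gh-ost's default) and the script's floor of 10:

```ruby
# Sketch: chunk-size sequence from repeated 0.8x backoff, assuming an
# initial chunk-size of 1000 and the script's floor of 10.
backoff_factor = 0.8
min_chunk_size = 10

sizes = [1000]
loop do
  next_size = [(sizes.last * backoff_factor).to_i, min_chunk_size].max
  break if next_size == sizes.last # backoff has bottomed out at the floor
  sizes << next_size
end

puts sizes.inspect
# Starts 1000, 800, 640, 512, 409, ... and bottoms out at 10.
```

So the script degrades gracefully: chunk-size shrinks by 20% per retry and can never drop below 10, at which point the script exits without touching the socket again.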
Please share any suggestions before I open a PR here, thanks!