
Add an onBatchCopyRetry hook to allow more fine-grained error handling #1499

grodowski opened this issue Feb 19, 2025 · 0 comments

grodowski commented Feb 19, 2025

Following #1498, my team is looking to add one more feature to support our gh-ost workflow. We sometimes run online migrations on databases with a low max_binlog_cache_size, which may result in batch inserts failing with FATAL Error 1197 (HY000): Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage. In our current approach with LHM, we mitigated this with retry logic that gradually reduces the batch size inside the retry loop (original PR: Shopify/lhm#165).

I don't think we need to copy this feature 1:1; instead we could leverage dynamic reconfiguration and hooks:

  • a batch insert fails with error 1197
  • a hook is executed and updates chunk-size to a smaller value
  • the batch gets retried until default-retries is reached

The only issue I found was the lack of a suitable hook to use this way. For example, onFailure is called for the error in question, but by that point the gh-ost process is already about to abort with Fatale. So I'd like to propose adding a new hook that gets invoked before each retry inside iterateChunks and lets us dynamically reconfigure the chunk-size.
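
To make the proposed control flow concrete, here is a rough, self-contained sketch of what "invoke a hook before each retry" could look like. This is not gh-ost's actual code: retryWithHook, executeHook and the hook path are purely illustrative, and the real change would hook into the existing retry logic in iterateChunks and reuse gh-ost's hooks executor.

// Illustrative sketch only; names and signatures are hypothetical.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// executeHook runs an external hook script and passes the last batch-copy
// error through an environment variable, which is what the Ruby hook
// below reads as GH_OST_LAST_BATCH_COPY_ERROR.
func executeHook(hookPath string, lastErr error) error {
	cmd := exec.Command(hookPath)
	cmd.Env = append(os.Environ(),
		fmt.Sprintf("GH_OST_LAST_BATCH_COPY_ERROR=%s", lastErr.Error()),
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

// retryWithHook retries a batch-copy operation up to maxRetries times,
// invoking the hook before every retry so an operator script can shrink
// chunk-size (or otherwise reconfigure gh-ost) between attempts.
func retryWithHook(maxRetries int, hookPath string, operation func() error) error {
	var lastErr error
	for attempt := 0; attempt < maxRetries; attempt++ {
		if lastErr != nil {
			if hookErr := executeHook(hookPath, lastErr); hookErr != nil {
				fmt.Fprintf(os.Stderr, "hook failed: %v\n", hookErr)
			}
		}
		if lastErr = operation(); lastErr == nil {
			return nil
		}
	}
	return lastErr
}

func main() {
	// Simulate a batch copy that fails twice with error 1197, then succeeds.
	attempts := 0
	err := retryWithHook(3, "./gh-ost-on-batch-copy-retry", func() error {
		attempts++
		if attempts < 3 {
			return fmt.Errorf("Error 1197 (HY000): Multi-statement transaction " +
				"required more than 'max_binlog_cache_size' bytes of storage")
		}
		return nil
	})
	fmt.Println("final error:", err)
}

The sketch only shows where in the retry cycle the hook would fire; the actual implementation would follow gh-ost's existing hook conventions rather than a standalone loop.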

We have a working prototype in Shopify#2 that solves our issue with max_binlog_cache_size and also seems like a sound general addition to gh-ost, providing more flexible error-handling options.

For context, let me also share the draft gh-ost-on-batch-copy-retry script that we wrote to handle the binlog cache errors:

#!/usr/bin/env ruby
# frozen_string_literal: true

require 'socket'

socket_path = '/tmp/ghost.sock'
backoff_factor = 0.8
min_chunk_size = 10
output = nil
handled_error_re = %r{Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage}

# gh-ost passes the failing batch's error message to the hook via this env var
mysql_error_message = ENV.fetch("GH_OST_LAST_BATCH_COPY_ERROR")

unless mysql_error_message.match?(handled_error_re)
  puts "Nothing to do for error: #{mysql_error_message}"
  exit
end

# Ask the running gh-ost for the current chunk-size over its interactive socket
Socket.unix(socket_path) do |socket|
  socket.puts 'chunk-size=?'
  output = socket.gets
end

chunk_size = output.to_i
# Back off, but never drop below the configured minimum chunk size
new_chunk_size = [(chunk_size * backoff_factor).to_i, min_chunk_size].max

exit if chunk_size == new_chunk_size

Socket.unix(socket_path) do |socket|
  socket.puts "chunk-size=#{new_chunk_size}"
end
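
(For local testing, the script can be exercised by hand by exporting GH_OST_LAST_BATCH_COPY_ERROR with a sample 1197 message while a gh-ost instance is serving its interactive socket at the socket_path above.)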

Please share any suggestions before I open a PR here, thanks!
