Paulo Silva

March 14, 2025

Backfilling Data Safely in Ruby on Rails Migrations

Backfilling data in Ruby on Rails migrations can be tricky, especially when working with large datasets. The strong_migrations gem provides guidelines to prevent performance issues and downtime. When performing backfills, you should be mindful of three key aspects: using find_each for batching, throttling the process, and disabling transactions.


1. Using find_each for Batching

When dealing with large datasets, iterating over records with each loads the entire result set into memory at once. Rails provides find_each instead, which fetches records in batches (default batch size: 1000), ordered by primary key, so only one batch is held in memory at a time. Batching on its own does not prevent database locks; that depends on how the updates are committed, which is covered in section 3.

Example:

User.find_each do |user|
  user.update!(normalized_email: user.email.downcase)
end

This approach prevents loading all records at once while ensuring efficient iteration.
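
To make the batching concrete, here is a minimal sketch in plain Ruby of the primary-key walk that find_each performs. It is a simulation rather than Rails code: the rows array stands in for the users table, and fetch_batch and each_in_batches are hypothetical helpers written only for this illustration.

```ruby
# Stand-in for a users table: an array of hashes, deliberately unsorted.
rows = [5, 1, 4, 2, 3, 7, 6].map { |id| { id: id, email: "User#{id}@Example.com" } }

# Hypothetical helper mimicking the kind of query issued for each batch:
#   SELECT * FROM users WHERE id > last_id ORDER BY id LIMIT batch_size
def fetch_batch(rows, last_id, batch_size)
  rows.select { |row| last_id.nil? || row[:id] > last_id }
      .sort_by { |row| row[:id] }
      .first(batch_size)
end

# Yields every row while holding at most batch_size rows at a time.
def each_in_batches(rows, batch_size: 1000)
  last_id = nil
  loop do
    batch = fetch_batch(rows, last_id, batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    last_id = batch.last[:id] # the next batch starts after this id
  end
end

visited = []
each_in_batches(rows, batch_size: 3) { |row| visited << row[:id] }
```

Because each batch starts after the last id seen, rows are visited in primary-key order regardless of how they are stored.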


2. Throttling the Process

When backfilling large tables, you might put too much strain on the database if updates happen too quickly. To mitigate this, introduce a small sleep interval to reduce load:

User.find_each do |user|
  user.update!(normalized_email: user.email.downcase)
  sleep(0.01) # Prevents overwhelming the database
end

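A short delay (e.g., 0.01 seconds) between updates helps balance performance and database stability, especially in production. Keep in mind that a per-record delay adds up: at 0.01 seconds per row, one million rows spend roughly 2.8 hours sleeping alone, so on very large tables it is often better to sleep once per batch instead.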
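
As an illustration of throttling per batch rather than per record, here is a minimal sketch in plain Ruby. The backfill_throttled helper and its pause parameter are hypothetical names introduced for this example, and the users array stands in for the table; the pause is injectable so the sketch can be exercised without actually sleeping.

```ruby
# Hypothetical helper: processes records in slices and pauses once per slice.
# By default the pause really sleeps; a caller may pass in a substitute.
def backfill_throttled(records, batch_size: 1000, delay: 0.01,
                       pause: ->(seconds) { sleep(seconds) })
  records.each_slice(batch_size) do |batch|
    batch.each { |record| yield record }
    pause.call(delay) # one pause per batch, not per record
  end
end

# In-memory stand-in for 2,500 user rows.
users = (1..2500).map do |id|
  { id: id, email: "User#{id}@Example.com", normalized_email: nil }
end

pauses = [] # records each pause instead of sleeping
backfill_throttled(users, pause: ->(seconds) { pauses << seconds }) do |user|
  user[:normalized_email] = user[:email].downcase
end
```

In a real migration the same shape could be built on find_in_batches or in_batches, with sleep called at the end of each batch block.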


3. Disabling Transactions in Migrations

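By default, on databases that support transactional DDL (such as PostgreSQL), Rails runs each migration inside a single transaction. For large data updates, this can cause issues:

  • Every row the migration updates stays locked until the transaction commits, blocking other writes to those rows for the entire run.

  • If the migration fails partway through, the whole transaction rolls back, discarding thousands or millions of updates.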

To avoid this, disable the wrapping transaction by calling disable_ddl_transaction! in the migration class:

class BackfillNormalizedEmails < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def up
    User.find_each do |user|
      user.update!(normalized_email: user.email.downcase)
      sleep(0.01)
    end
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end

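The disable_ddl_transaction! method prevents the migration from being wrapped in a single transaction. Each update! call then commits on its own, so updates are committed incrementally and row locks are released as the migration progresses. The trade-off is that a failure partway through leaves the earlier updates in place rather than rolling them back.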
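
Since incremental commits mean a failed run is not rolled back, it helps if the backfill can simply be run again. The following is a minimal sketch in plain Ruby, under the assumption that unprocessed rows can be recognized by a nil normalized_email; the backfill helper and its fail_at parameter are hypothetical and exist only to simulate a crash mid-run.

```ruby
# In-memory stand-in for six user rows that have not been backfilled yet.
users = (1..6).map do |id|
  { id: id, email: "User#{id}@Example.com", normalized_email: nil }
end

# Hypothetical backfill step: only touches rows that still lack a value,
# and can be told to fail at a given id to simulate a crash.
def backfill(rows, fail_at: nil)
  processed = 0
  rows.select { |row| row[:normalized_email].nil? }.each do |row|
    raise "simulated failure at id #{row[:id]}" if row[:id] == fail_at

    row[:normalized_email] = row[:email].downcase
    processed += 1
  end
  processed
end

# First run fails at id 4; rows 1 to 3 keep their new values.
first_run_error =
  begin
    backfill(users, fail_at: 4)
    nil
  rescue RuntimeError => e
    e.message
  end

# Second run picks up only the rows that are still missing a value.
second_run_count = backfill(users)
```

In Rails terms, the equivalent idea would be to scope the loop, for example with User.where(normalized_email: nil).find_each, so a rerun skips rows that were already updated.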


Conclusion

Backfilling data in Rails migrations can be dangerous if not done properly. Using find_each to process data in batches, throttling updates, and disabling transactions when necessary helps keep the migration safe and efficient. Following these practices helps maintain database stability while making the necessary data changes.

About Paulo Silva

Software Engineer specialized in product development with Ruby on Rails. I help companies turn bright ideas into amazing digital products — I've worked on InvoiceXpress, ClanHR, Today and currently sheerME.