Using Rails Migrations to Manipulate Data

Ryan Sonnek bio photo By Ryan Sonnek Comment

Rails migrations were originally designed for manipulating database schemas. Adding tables, columns and indexes are all usecases that are incredibly well supported and even supports more advanced features like rolling back to previous schema state. Manipulating production data is also a common occurrence for developers, but how do you do that in Rails?

Importing data from flat files, updating records, and cleaning up records in a bad state are regular events. Many developers would tackle these tasks by popping up the Rails console on a production server and manually running a few commands to take care of business. You can accomplish all of these tasks using Rails migrations which is significantly less risky than giving every developer direct access to your database to run one-off scripts live on your production data. Using migrations also provides easy access to feature branch code reviews and even unit testing.

Here are a few protips I recommend when building your Rails data migrations:

Log all the things

Many schema migrations have a clear path for reverting if anything goes wrong, but data migrations are much more challenging to revert. It is absolutely essential for your migrations to provide as much debug information as possible for if/when anything goes wrong.

Here’s a simple example that highlights using the say_with_time helper for auditing how long a migration step took and how many records were affected. Make sure to return the number of affected rows within the block!

class UpdateInvalidUsers < ActiveRecord::Migration
  def up
    say_with_time "Updating invalid users..." do
      count = 0
      User.find_each do |user|
        say "Updating user: #{user.id}"
        user.update!(status: 'valid')
        count += 1
      end
      count
    end
  end
end

Audit state pre and post execution

Logging during execution of a migration is simple enough, and it’s also very helpful to log information before execution about the current state of the system. For example, logging the number of invalid records before the migration begins is a great data point.

A good data migration should also assert that the operation completed successfully and raise an error if something did not complete as expected.

Taking our previous example, here is an expanded version that adds in simple pre and post execution logging plus assertions.

class UpdateInvalidUsers < ActiveRecord::Migration
  def up
    say "Found #{User.where(status: 'invalid').count} invalid records"

    say_with_time "Updating invalid users..." do
      count = 0
      User.find_each do |user|
        say "Updating user: #{user.id}"
        user.update!(status: 'valid')
        count += 1
      end
      count
    end

    invalid_count = User.where(status: 'invalid').count
    fail "Found #{invalid_count} invalid records" unless invalid_count == 0
  end
end

Re-runnable by default

If a data migration raises an error, what should you do? Data migrations should be designed to be re-runnable and if any errors occur, Rails will automatically rerun your migration the next time you run rake db:migrate.

It’s also a good idea to define inner classes within your migration that shadow your application models and use reset_column_information to ensure your migration has access to the latest version of the model’s schema.

Let’s take one last look at this example and make sure that the migration is re-runnable and doesn’t update records that have already been fixed.

class UpdateInvalidUsers < ActiveRecord::Migration
  def up
    say "Found #{User.where(status: 'invalid').count} invalid records"

    say_with_time "Updating invalid users..." do
      count = 0
      # NOTE: only lookup invalid records that have *not* been migrated
      User.where(status: 'invalid').find_each do |user|
        say "Updating user: #{user.id}"
        user.update!(status: 'valid')
        count += 1
      end
      count
    end

    invalid_count = User.where(status: 'invalid').count
    fail "Found #{invalid_count} invalid records" unless invalid_count == 0
  end
end

With these fundamental techniques under your belt, you can take your data migrations to the next level and even explore additional tools like the migration_data gem that make Rails migrations even easier.