Importing invalid legacy data with Rails

Recently I needed to import some legacy data from a set of CSV files (each file representing one database table of the legacy application) into a brand-new application, but the legacy data did not comply with the new application’s rules.

Let’s consider the following example, simplified and modified for the scope of this article:


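class Tutor < ApplicationRecord
  has_many :phones

  validates :email, presence: true
  # Implied by the rest of the article: a tutor without phones is invalid
  validates :phones, presence: true

  before_save :update_default_rate

  private

  # Derives the advanced rate (default rate + 30%) whenever the default rate changes
  def update_default_rate
    if default_rate.present? && default_rate_changed?
      self.advanced_rate = default_rate * 1.30
    end
  end
end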

And the following TutorImporter class, which imports the legacy data, one line at a time:


class TutorImporter
  def import(line)
    Tutor.create!(
      email:         line.email,
      default_rate:  line.rate,
      advanced_rate: line.advanced_rate
    )
  end
end

With the code above, if any validation fails, the tutor record is not saved and an exception is raised. Of course, there may be (ahem, there should be!) constraints at the RDBMS level as well, but that’s outside the scope of this article.
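Since the importer raises on the first invalid line, whatever drives it has to decide what to do with the failures. Here is a minimal sketch of one way to do that, collecting the rejected lines instead of stopping. The names LegacyLine, import_all and FakeImporter are hypothetical and not part of the original code; FakeImporter only mimics the email validation so the sketch runs without Rails.

```ruby
# Hypothetical stand-in for one parsed CSV row; the article only shows that
# a line responds to email, rate and advanced_rate.
LegacyLine = Struct.new(:email, :rate, :advanced_rate, keyword_init: true)

# Feeds every line to the importer and returns the rejected lines together
# with the error message, instead of stopping at the first failure.
# In a Rails app you would pass rescue_from: ActiveRecord::RecordInvalid,
# which is what create! raises when a validation fails.
def import_all(lines, importer, rescue_from: StandardError)
  rejected = []
  lines.each do |line|
    begin
      importer.import(line)
    rescue rescue_from => e
      rejected << [line, e.message]
    end
  end
  rejected
end

# Stand-in importer (an assumption, so the sketch runs without Rails):
# it mimics the email presence validation by raising on a blank email.
class FakeImporter
  attr_reader :imported

  def initialize
    @imported = []
  end

  def import(line)
    raise ArgumentError, "Email can't be blank" if line.email.to_s.strip.empty?

    @imported << line
  end
end

lines = [
  LegacyLine.new(email: "ada@example.com", rate: 20, advanced_rate: 30),
  LegacyLine.new(email: nil, rate: 25, advanced_rate: 35)
]
importer = FakeImporter.new
rejected = import_all(lines, importer)
rejected.size # => 1
```

The rejected list can then be written to a report, so the invalid legacy rows can be reviewed by hand rather than silently lost.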

As you might expect, the CSV data doesn’t fit the new application’s constraints neatly: for example, the tutors’ phone numbers live in another CSV file, mixed with the phone numbers of other kinds of platform users.

To keep things simple I decided to import those phones later, but of course this decision temporarily leaves all the imported tutors without phones, making them all invalid and preventing them from being saved. The first solution that comes to mind is to change the import method as follows:


class TutorImporter
  def import(line)
    tutor = Tutor.new(
      email:         line.email,
      default_rate:  line.rate,
      advanced_rate: line.advanced_rate
    )
    tutor.save validate: false
  end
end

which works like a charm: it will import all rows, with or without valid phones… but also those without emails!

When you use validate: false you completely lose control over what goes into the database. In my case, a tutor with no email is a real issue. At the end of the day no tutor in the database should be missing an email, so those records should be discarded by the importer as well.

There is another issue besides validations: the before_save callback should not run when importing the legacy tutors, because they already have an advanced_rate value, and it differs from the one calculated in the update_default_rate method. I don’t want the callback to change the advanced rate that was originally decided for the tutor.

We may now be tempted to modify the Tutor model… we could add a flag attribute that lets us skip the phones validation, and the before_save callback as well, when necessary. Something like this:


class Tutor < ApplicationRecord
  attr_accessor :skip_phone_validations_and_callbacks

  validates :phones, presence: true, unless: -> (record) { record.skip_phone_validations_and_callbacks }

  before_save :update_default_rate, unless: -> (record) { record.skip_phone_validations_and_callbacks }
end

This should do it. The email validation was not modified, while the phones validation and the callback are now optional.

There’s a big drawback here: the model was changed to make it more flexible for a one-shot operation (the CSV import). After the import, a good developer should go back to the code and revert those changes, which is not very practical… what if there are 10 patched model files? What if the changes are far more complex than the ones I showed in the example? Will the developer remember all the changes? What if somebody else has to finish the job? Will that other person know exactly what to do with those files and validations? I doubt it.

Eventually I found another solution, which I think has no major drawbacks, and has the extra benefit of communicating to the reader that there are different rules at play when importing tutors from CSV legacy data:


class TutorImporter
  class Tutor < ApplicationRecord
    self.table_name = 'tutors'

    validates :email, presence: true
  end

  def import(line)
    Tutor.create!(
      email:         line.email,
      default_rate:  line.rate,
      advanced_rate: line.advanced_rate
    )
  end
end

Now I am importing the data using a different Tutor class: TutorImporter::Tutor. This class has only the email validation and has no before_save callback, so it will import all the CSV lines using the right criteria.
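The reason the bare Tutor constant inside the importer picks the nested class is Ruby’s constant lookup, which checks the enclosing lexical scope before the top level. A minimal plain-Ruby sketch of that behaviour follows; the rules, rules_in_use and application_rules methods are hypothetical and exist only to make the lookup visible, and the classes here are stand-ins rather than real models.

```ruby
# Top-level class, standing in for the application's real Tutor model.
class Tutor
  def self.rules
    "application rules"
  end
end

class TutorImporter
  # Nested class, standing in for TutorImporter::Tutor from the article.
  class Tutor
    def self.rules
      "import rules"
    end
  end

  def rules_in_use
    # Constant lookup checks the enclosing lexical scope (TutorImporter)
    # first, so this resolves to TutorImporter::Tutor.
    Tutor.rules
  end

  def application_rules
    # A leading :: forces the lookup to start from the top level.
    ::Tutor.rules
  end
end

TutorImporter.new.rules_in_use      # => "import rules"
TutorImporter.new.application_rules # => "application rules"
Tutor.rules                         # => "application rules"
```

Code outside TutorImporter keeps seeing the original Tutor, which is why the import rules stay confined to the importer.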

The main benefit of this solution is that the CSV import logic never leaks outside of its scope, leaving the original Tutor class completely unchanged. You may be worried by the fact that some code got duplicated (the email validation in my example, but it could be much more), but that code can be moved into a module shared by both tutor classes.
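A shared module could look roughly like the sketch below. The module name TutorEmailRules is hypothetical, and StandInRecord is a stand-in for ApplicationRecord that merely records the declared validations so the sketch runs without Rails; in the real application both classes would inherit from ApplicationRecord and the nested class would still set self.table_name = 'tutors'.

```ruby
# Hypothetical module holding the rules both tutor classes must enforce.
# A plain `included` hook is used, so the sketch does not depend on
# ActiveSupport::Concern.
module TutorEmailRules
  def self.included(base)
    base.validates :email, presence: true
  end
end

# Stand-in for ApplicationRecord (an assumption, so the sketch runs
# without Rails). It only records which validations were declared.
class StandInRecord
  def self.validates(*attributes, **options)
    declared_validations << [attributes, options]
  end

  def self.declared_validations
    @declared_validations ||= []
  end
end

# The application model: shared rules plus its own stricter ones.
class Tutor < StandInRecord
  include TutorEmailRules

  validates :phones, presence: true
end

class TutorImporter
  # The import-only model: shared rules and nothing else.
  class Tutor < StandInRecord
    include TutorEmailRules
  end
end

Tutor.declared_validations.size                # => 2
TutorImporter::Tutor.declared_validations.size # => 1
```

With this arrangement the email rule is written once, and each class adds only what is specific to its own context.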
