Migrating data with rake tasks

This article shows how to restructure and migrate your files using a simple rake task.

I’m on a project that started out uploading files from Rails to Google Drive, but each call to the Drive API takes between one and two seconds. As the project grew this became a problem (imagine a request with 50 files), so we decided to migrate to Amazon S3. The project will keep using the paperclip gem, just not the Google Drive API anymore.

One problem is that paperclip stores files on Amazon with the following pattern: class/column/000/000/id/original/file_name, while the previous plugin used a different one (id_filename), so I needed to restructure about a thousand files. We could override the path property (more information here), but as the project grows, keeping all files in the same folder could become a big problem.
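To make the target layout concrete, here is a minimal sketch of how such a path can be built from a record id. It assumes integer ids below one billion, split into three zero-padded groups of three digits, which is how I understand paperclip's default id partitioning to work. The helper names `id_partition` and `s3_style_path`, and the sample table and file names, are hypothetical and only there for illustration.

```ruby
# Hypothetical helper: split a numeric id into three zero-padded groups,
# so 42 becomes "000/000/042" and 1234 becomes "000/001/234".
def id_partition(id)
  format('%09d', id).scan(/\d{3}/).join('/')
end

# Hypothetical helper: build class/column/partition/original/file_name.
def s3_style_path(table_name, column, id, file_name)
  File.join(table_name, column, id_partition(id), 'original', file_name)
end

puts s3_style_path('contract_files', 'documents', 42, 'contract.pdf')
```

Note that a hard-coded 000/000/ prefix gives the same result only while ids stay below 1000.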

My first thought was to use a shell script, and I knew I could do it in one line, maybe two. But then I remembered that several different classes share the same folder in the current project, so I needed to iterate over all the file classes and send each file to the right directory. So I wrote a little script that does the job:

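namespace :files do
  desc "Move files from google drive to amazon s3 pattern"
  task :restructure => :environment do
    require 'fileutils'
    # CLASSES is a comma-separated list, e.g. CLASSES=ContractFile,ClientFile
    classes = ENV.fetch('CLASSES').split(',').map(&:strip)
    classes.each do |object|
      Object.const_get(object).find_each do |file|
        original_file_name = file.id.to_s + '_' + file.document_file_name
        # the gdrive folder should be in your home folder
        original_path = Dir.home + '/gdrive/' + original_file_name
        # note: the hard-coded 000/000/ prefix only fits ids below 1000
        new_path = Dir.home + '/s3/' + file.class.table_name + '/documents/000/000/' + sprintf('%03d', file.id) + '/original/'
        puts original_path + ' => ' + new_path + file.document_file_name

        FileUtils.mkdir_p(new_path)            # create the directory structure
        FileUtils.cp(original_path, new_path)  # copy the file, keeping its old name
        # rename the copy from id_filename to the plain file name
        FileUtils.mv(new_path + original_file_name, new_path + file.document_file_name)
      end
    end
  end
end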

If your classes are ContractFile and ClientFile, all your paperclip fields are called documents, and your Google Drive folder is in your home directory, you can call it like this:

rake files:restructure CLASSES=ContractFile,ClientFile
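Because the class list arrives as one comma-separated string, stray spaces around the commas are easy to introduce, for example when the value is quoted in the shell. A minimal sketch of a more forgiving parser follows; `parse_class_names` is a hypothetical helper, not part of the original script.

```ruby
# Hypothetical helper: turn a raw CLASSES value into a clean list of names.
# Trims whitespace around each name and drops empty entries; nil gives [].
def parse_class_names(raw)
  raw.to_s.split(',').map(&:strip).reject(&:empty?)
end

p parse_class_names('ContractFile, ClientFile')
```

Inside the task you could then write something like parse_class_names(ENV['CLASSES']) before looking up each constant.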

The script iterates over the classes and maps the old file structure to the new one: first it creates the directory structure, then it copies the file, and finally it renames it (you may not need the last step). The commands are similar to their Linux counterparts.
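If you want to try the three steps without touching real data, the following sketch runs them against a temporary directory. It uses only Ruby's standard `fileutils` and `tmpdir` libraries; the helper name `restructure_file` and the sample file names are hypothetical.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: copy one file from the flat layout into the nested one.
def restructure_file(source, target_dir, new_name)
  FileUtils.mkdir_p(target_dir)                         # 1. create the structure (like mkdir -p)
  FileUtils.cp(source, target_dir)                      # 2. copy the file (like cp)
  copied = File.join(target_dir, File.basename(source))
  target = File.join(target_dir, new_name)
  FileUtils.mv(copied, target) unless copied == target  # 3. rename it (like mv)
  target
end

Dir.mktmpdir do |root|
  # Fake "gdrive" file named with the old id_filename pattern.
  source = File.join(root, 'gdrive', '42_contract.pdf')
  FileUtils.mkdir_p(File.dirname(source))
  File.write(source, 'dummy content')

  target_dir = File.join(root, 's3', 'contract_files', 'documents',
                         '000', '000', '042', 'original')
  puts restructure_file(source, target_dir, 'contract.pdf')
end
```

Since the file is copied rather than moved, the original stays in place, so you can check the result before deleting anything.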

You can find the script in a Gist too; feel free to use and edit it.

That’s all folks.

