Import workflow
This document describes the import data workflow in detail, with hooks that enable customization of the import process.
Methods highlighted in yellow in the sequence diagram are public methods that can be overridden.
The import_data() method of Resource is responsible for importing data from a given dataset. Refer to the method documentation for its parameters.
This is what happens when the method is invoked:
1. A new Result instance, which holds errors and other information gathered during the import, is initialized.
2. A BaseInstanceLoader responsible for loading existing instances is initialized. A different BaseInstanceLoader can be specified via the instance_loader_class attribute of ResourceOptions. A CachedInstanceLoader can be used to reduce the number of database queries. See instance_loaders for available implementations.
3. The before_import() hook is called. By implementing this method in your resource, you can customize the import process.
4. Each row of the to-be-imported dataset is processed according to the following steps:
   - The before_import_row() hook is called to allow row data to be modified before it is imported.
   - get_or_init_instance() is called with the current BaseInstanceLoader and the current row of the dataset, returning an object and a Boolean indicating whether the object is newly created. If no object can be found for the current row, init_instance() is invoked to initialize an object. As always, you can override init_instance() to customize how the new object is created (e.g. to set default values).
   - for_delete() is called to determine whether the passed instance should be deleted. If so, the import process for the current row stops at this point.
   - If the instance was not deleted in the previous step, import_row() is called with instance as the current object instance and row as the current row. import_field() is called for each field in Resource, skipping many-to-many fields. Many-to-many fields are skipped because they require instances to have a primary key, so their assignment is postponed until the object has been saved. import_field() in turn calls save() if Field.attribute is set and Field.column_name exists in the given row.
   - It is then determined whether the newly imported object differs from the object already present, and therefore whether the given row should be skipped. This is handled by calling skip_row() with original as the original object and instance as the current object from the dataset. If the current row is to be skipped, row_result.import_type is set to IMPORT_TYPE_SKIP.
   - If the current row is not to be skipped, save_instance() is called and actually saves the instance when dry_run is not set. There are two hook methods (that do nothing by default) giving you the option to customize the import process.
   - save_m2m() is called to save many-to-many fields.
   - RowResult is assigned a diff between the original and the imported object fields, as well as an import_type attribute which states whether the row is new, updated, skipped or deleted.
   - If an exception is raised during row processing and import_row() was invoked with raise_errors=False (which is the default), the traceback is appended to RowResult as well.
   - If either the row was not skipped or the Resource is configured to report skipped rows, the RowResult is appended to the Result.
   - The after_import_row() hook is called.
5. The Result is returned.
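The order of calls described above can be modeled in plain Python without Django. The sketch below is not the library's implementation: ToyResource, process_row, the dict-based store, the report_skipped switch and the simplified hook signatures are all assumptions made for illustration. Only the hook names and the order in which they fire follow the description above.

```python
from dataclasses import dataclass, field

IMPORT_TYPE_NEW, IMPORT_TYPE_UPDATE = "new", "update"
IMPORT_TYPE_SKIP, IMPORT_TYPE_DELETE = "skip", "delete"


@dataclass
class RowResult:
    import_type: str = ""
    diff: dict = field(default_factory=dict)


@dataclass
class Result:
    rows: list = field(default_factory=list)


class ToyResource:
    """Hypothetical stand-in for Resource; rows and instances are dicts."""

    report_skipped = True  # assumed switch for reporting skipped rows

    def __init__(self, store):
        self.store = store  # maps id -> saved instance, plays the database
        self.calls = []     # records the order in which hooks fire

    # Hooks: they only record the call here; override them to customize.
    def before_import(self, dataset, dry_run):
        self.calls.append("before_import")

    def before_import_row(self, row):
        self.calls.append("before_import_row")

    def after_import_row(self, row, row_result):
        self.calls.append("after_import_row")

    def init_instance(self, row):
        return {"id": row["id"]}

    def get_or_init_instance(self, row):
        existing = self.store.get(row["id"])
        if existing is None:
            return self.init_instance(row), True
        return dict(existing), False

    def for_delete(self, row, instance):
        return bool(row.get("delete"))

    def skip_row(self, instance, original):
        return instance == original

    def save_instance(self, instance, dry_run):
        if not dry_run:
            self.store[instance["id"]] = instance

    def process_row(self, row, dry_run):  # hypothetical helper name
        row_result = RowResult()
        self.before_import_row(row)
        instance, new = self.get_or_init_instance(row)
        original = dict(instance)
        if self.for_delete(row, instance):
            # Deleting ends the processing of this row.
            row_result.import_type = IMPORT_TYPE_DELETE
            if not dry_run:
                self.store.pop(instance["id"], None)
            return row_result
        instance.update(row)  # stands in for the per-field import
        if self.skip_row(instance, original):
            row_result.import_type = IMPORT_TYPE_SKIP
            return row_result
        self.save_instance(instance, dry_run)
        row_result.import_type = IMPORT_TYPE_NEW if new else IMPORT_TYPE_UPDATE
        row_result.diff = {
            key: (original.get(key), value)
            for key, value in instance.items()
            if original.get(key) != value
        }
        return row_result

    def import_data(self, dataset, dry_run=False):
        result = Result()
        self.before_import(dataset, dry_run)
        for row in dataset:
            row_result = self.process_row(row, dry_run)
            if row_result.import_type != IMPORT_TYPE_SKIP or self.report_skipped:
                result.rows.append(row_result)
            self.after_import_row(row, row_result)
        return result
```

Because the hooks are ordinary methods, a subclass that overrides before_import_row() in this sketch can change a row before it reaches get_or_init_instance().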
Transaction support
If transaction support is enabled, the whole import process is wrapped in a transaction, which is then rolled back or committed accordingly.
All methods called from inside import_data() (create / delete / update) receive False for the dry_run argument.
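One way to picture this behaviour is with the standard library's sqlite3 module rather than Django's transaction handling. In the sketch below, the function names, the book table and the rule of rolling back on a dry run or on an error are assumptions for illustration, not the library's code. The point it shows is that the inner write step always receives dry_run=False, and the surrounding transaction decides whether the writes persist.

```python
import sqlite3


def save_row(conn, row, dry_run):
    """Inner write step; it skips the write only when dry_run is True."""
    if not dry_run:
        conn.execute(
            "INSERT INTO book (id, name) VALUES (?, ?)",
            (row["id"], row["name"]),
        )


def import_rows(conn, rows, dry_run=False):
    """Write all rows in one transaction; return True if no error occurred."""
    has_errors = False
    try:
        for row in rows:
            # The write really happens: the inner step gets dry_run=False,
            # and the rollback below undoes it when needed.
            save_row(conn, row, dry_run=False)
    except sqlite3.Error:
        has_errors = True
    if dry_run or has_errors:
        conn.rollback()
    else:
        conn.commit()
    return not has_errors
```

With this design a dry run exercises the same write path as a real import, so database-level failures such as constraint violations surface during the dry run as well.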