Import data workflow
This document describes the import data workflow in detail, with hooks that enable
customization of the import process. The central aspect of the import process is a resource’s
import_data() method which is explained below.
import_data(dataset, dry_run=False, raise_errors=False)
The import_data() method of Resource is responsible for importing data from a given dataset.
dataset is required and expected to be a tablib.Dataset with a header row.
dry_run is a Boolean which determines whether changes to the database are made or the import is only simulated. It defaults to False.
raise_errors is a Boolean. If True, the import raises errors. The default is False, which means that any errors and tracebacks are saved in the Result instance.
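To illustrate how the two flags interact, here is a minimal, library-independent sketch. The names toy_import and ToyResult and the dict used as a database are hypothetical stand-ins, not part of django-import-export. The sketch only mirrors the behaviour described above: a dry run leaves the database untouched, and errors are either collected or raised.

```python
import traceback


class ToyResult:
    """Hypothetical stand-in for Result: collects what happened during an import."""

    def __init__(self):
        self.imported = []  # rows processed successfully
        self.errors = []    # (row, traceback text) pairs

    def has_errors(self):
        return bool(self.errors)


def toy_import(rows, database, dry_run=False, raise_errors=False):
    """Import dict rows into `database`, a dict keyed by integer id."""
    result = ToyResult()
    for row in rows:
        try:
            record = {"id": int(row["id"]), "name": row["name"]}
            if not dry_run:
                # Only a real run touches the database.
                database[record["id"]] = record
            result.imported.append(record)
        except Exception:
            if raise_errors:
                raise
            # Default behaviour: keep going and remember the traceback.
            result.errors.append((row, traceback.format_exc()))
    return result


rows = [{"id": "1", "name": "Ada"}, {"id": "oops", "name": "Bob"}]

simulated_db = {}
simulated = toy_import(rows, simulated_db, dry_run=True)

real_db = {}
real = toy_import(rows, real_db)
```

With the default raise_errors=False the bad second row ends up in the errors list. Passing raise_errors=True would let the ValueError propagate to the caller.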
This is what happens when the method is invoked:
First, a new Result instance, which holds errors and other information gathered during the import, is initialized.
Then, an InstanceLoader responsible for loading existing instances is initialized. A different BaseInstanceLoader can be specified via the instance_loader_class attribute of ResourceOptions. A CachedInstanceLoader can be used to reduce the number of database queries. See the source for available implementations.
The before_import() hook is called. By implementing this method in your resource, you can customize the import process.
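The idea behind a caching loader can be sketched without the library. This is an illustration under assumptions, not the actual CachedInstanceLoader code: CountingStore, PerRowLoader and PrefetchingLoader are hypothetical names, and the store simply counts lookups to show why fetching all referenced records up front saves queries.

```python
class CountingStore:
    """Hypothetical data store that counts how many queries it receives."""

    def __init__(self, records):
        self._records = {record["id"]: record for record in records}
        self.queries = 0

    def get(self, pk):
        self.queries += 1
        return self._records.get(pk)

    def filter_in(self, pks):
        self.queries += 1
        return [self._records[pk] for pk in pks if pk in self._records]


class PerRowLoader:
    """Looks up each row individually: one query per row."""

    def __init__(self, store, rows):
        self.store = store

    def get_instance(self, row):
        return self.store.get(row["id"])


class PrefetchingLoader:
    """Fetches all referenced records up front: one query in total."""

    def __init__(self, store, rows):
        found = store.filter_in([row["id"] for row in rows])
        self.cache = {record["id"]: record for record in found}

    def get_instance(self, row):
        return self.cache.get(row["id"])


records = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
rows = [{"id": 1}, {"id": 2}, {"id": 3}]

slow_store = CountingStore(records)
slow = PerRowLoader(slow_store, rows)
slow_hits = [slow.get_instance(row) for row in rows]

fast_store = CountingStore(records)
fast = PrefetchingLoader(fast_store, rows)
fast_hits = [fast.get_instance(row) for row in rows]
```

Both loaders return the same instances, but the per-row loader issues one query per row while the prefetching loader issues a single query for the whole dataset.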
Each row of the to-be-imported dataset is processed according to the following steps:
The before_import_row() hook is called to allow row data to be modified before it is imported.
get_or_init_instance() is called with the current BaseInstanceLoader and the current row of the dataset, returning an object and a Boolean declaring whether the object is newly created.
If no object can be found for the current row, init_instance() is invoked to initialize an object.
As always, you can override the implementation of init_instance() to customize how the new object is created (e.g. to set default values).
for_delete() is called to determine if the passed instance should be deleted. If so, the import process for the current row stops at this point.
If the instance was not deleted in the previous step, import_obj() is called with the instance as current object, row as current row and dry_run.
import_field() is called for each field in Resource, skipping many-to-many fields. Many-to-many fields are skipped because they require instances to have a primary key, so their assignment is postponed until the object has been saved.
import_field() in turn calls the field's save() method if Field.attribute is set and Field.column_name exists in the given row.
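The field-import rule can be sketched as follows. This is a simplified model rather than the library's code: ToyField, Book and the is_m2m flag are hypothetical, and the functions only borrow the names import_field and import_obj to show the condition (attribute set and column present) and the deferral of many-to-many fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyField:
    """Hypothetical stand-in for a resource field."""
    column_name: str
    attribute: Optional[str] = None
    is_m2m: bool = False

    def save(self, obj, row):
        setattr(obj, self.attribute, row[self.column_name])


class Book:
    """Plain object playing the role of a model instance."""


def import_field(field, obj, row):
    # Only write when the field maps to an attribute and the column is present.
    if field.attribute and field.column_name in row:
        field.save(obj, row)


def import_obj(fields, obj, row):
    deferred = []
    for field in fields:
        if field.is_m2m:
            # Needs a saved object with a primary key, so it is handled later.
            deferred.append(field)
            continue
        import_field(field, obj, row)
    return deferred


fields = [
    ToyField("title", attribute="title"),
    ToyField("price", attribute="price"),  # column missing from the row
    ToyField("computed"),                  # no attribute to write to
    ToyField("categories", attribute="categories", is_m2m=True),
]
row = {"title": "Dune", "computed": "x", "categories": "1,2"}

book = Book()
deferred = import_obj(fields, book, row)
```

Only title is written to the object. The price and computed fields fail the condition, and categories is returned in the deferred list so it can be assigned once the object has been saved.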
It is then determined whether the newly imported object differs from the already present object and whether the given row should therefore be skipped. This is handled by calling skip_row() with original as the original object and instance as the current object from the dataset.
If the current row is to be skipped, row_result.import_type is set to IMPORT_TYPE_SKIP.
If the current row is not to be skipped, save_instance() is called and actually saves the instance when dry_run is not set.
There are two hook methods (that by default do nothing) giving you the option to customize the import process:
before_save_instance()
after_save_instance()
Both methods receive instance and dry_run arguments.
save_m2m() is called to save many-to-many fields.
RowResult is assigned a diff between the original and the imported object fields, as well as an import_type attribute which states whether the row is new, updated, skipped or deleted.
If an exception is raised during row processing and import_data() was invoked with raise_errors=False (which is the default), the traceback is appended to RowResult as well.
If either the row was not skipped or the Resource is configured to report skipped rows, the RowResult is appended to the Result.
The after_import_row() hook is called.
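The per-row steps above can be condensed into a small, library-independent sketch. Everything in it is an assumption made for illustration: the function import_rows, the dict-based database, the "delete" column, the report_skipped flag and the string labels for the import type are hypothetical, and the hooks are reduced to inline comments.

```python
def import_rows(rows, database, dry_run=False, report_skipped=False):
    """Toy row loop. `database` maps id -> dict; rows are dicts with an "id" key."""
    results = []
    for row in rows:
        row = dict(row)
        row["name"] = row["name"].strip()        # row cleanup before import

        original = database.get(row["id"])       # load an existing object...
        is_new = original is None
        instance = dict(original) if original else {"id": row["id"]}  # ...or init one

        if row.get("delete") == "1":             # deletion check
            if not is_new and not dry_run:
                del database[row["id"]]
            results.append({"id": row["id"], "import_type": "delete", "diff": {}})
            continue                             # processing of this row stops here

        instance["name"] = row["name"]           # import the field values

        if not is_new and instance == original:  # unchanged rows are skipped
            import_type = "skip"
        else:
            import_type = "new" if is_new else "update"
            if not dry_run:
                database[row["id"]] = instance   # save only on a real run

        before = original or {}
        diff = {
            key: (before.get(key), value)
            for key, value in instance.items()
            if before.get(key) != value
        }
        if import_type != "skip" or report_skipped:
            results.append({"id": row["id"], "import_type": import_type, "diff": diff})
    return results


database = {
    1: {"id": 1, "name": "Ada"},
    2: {"id": 2, "name": "Bob"},
    3: {"id": 3, "name": "Cy"},
}
rows = [
    {"id": 1, "name": "Ada "},               # unchanged after cleanup: skipped
    {"id": 2, "name": "Robert"},             # changed: updated
    {"id": 3, "name": "Cy", "delete": "1"},  # flagged: deleted
    {"id": 4, "name": "Dee"},                # unknown id: new
]
results = import_rows(rows, database)
```

The skipped row does not appear in the results because report_skipped is off, which mirrors the rule that skipped rows are only reported when the resource is configured to do so.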
If transaction support is enabled, the whole import process is wrapped inside a transaction and rolled back or committed respectively.
All methods called from inside of import_data (create / delete / update) receive False for the dry_run argument.
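One way to picture a transaction-wrapped import is the following sketch, which uses the standard library's sqlite3 module instead of Django. The person table and the helper names import_in_transaction and count are hypothetical. The sketch assumes that the batch is rolled back on a dry run or on an error and committed otherwise.

```python
import sqlite3


def import_in_transaction(conn, rows, dry_run=False):
    """Write every row for real, then commit or roll back the whole batch."""
    ok = True
    try:
        for pk, name in rows:
            # Inside the transaction the writes always happen, even on a dry run.
            conn.execute("INSERT INTO person (id, name) VALUES (?, ?)", (pk, name))
    except sqlite3.Error:
        ok = False
    if dry_run or not ok:
        conn.rollback()
    else:
        conn.commit()
    return ok


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.commit()

good = [(1, "Ada"), (2, "Bob")]
bad = [(3, "Cy"), (3, "Duplicate id")]

dry_ok = import_in_transaction(conn, good, dry_run=True)
after_dry_run = count(conn)

bad_ok = import_in_transaction(conn, bad)
after_failure = count(conn)

real_ok = import_in_transaction(conn, good)
after_commit = count(conn)
```

Because the rollback undoes everything, the dry run can execute the same write path as a real import, so database-level errors such as the duplicate key in the second batch surface before anything is committed.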