Batch processing

Compared to the CE, the PE comes with a built-in batch processing feature. It collects the data that has to be added or updated on a stack and writes it with a single multi-value SQL statement, based on a set of configurable triggers.

Usually there is no need to customize or configure this behavior. For some edge cases, or when Pacemaker is used as a framework, you can configure when the batch processing is triggered. There are two triggers. The first is the stack size. The second is a set of dedicated attributes that start the execution of the multi-value SQL statement.

Configure the stack size

By default, the maximum stack size is 1000. Once the stack has reached this size, the SQL statement is executed and the stack is cleared before the next value is added. An additional trigger ensures that the stack is also processed when the import has finished but the stack still contains values.

Unlike most other configuration options, the maximum stack size currently cannot be set in the workflow engine configuration. It has to be configured in the DI configuration instead.

The constructor of the TechDivision\Import\Batch\Actions\Processors\GenericBatchProcessor implementation, which provides the batch functionality, expects four arguments. The fourth argument is the maximum stack size; once it is reached, the stack is cleaned up. For example, to change the maximum stack size of the processor that handles the creation of the product datetime attribute values, override its DI configuration like this:

<service
    id="import_product.action.processor.product.datetime.create"
    class="TechDivision\Import\Batch\Actions\Processors\GenericBatchProcessor">
    <argument type="service" id="connection"/>
    <argument type="service" id="import_batch.repository.sql.statement"/>
    <argument type="collection">
        <argument type="constant">
            TechDivision\Import\Batch\Utils\SqlStatementKeys::CREATE_UPDATE_PRODUCT_DATETIME
        </argument>
    </argument>
    <argument type="integer">2000</argument>
</service>

Configure the dedicated attributes

Besides the url_key attribute, additional dedicated attributes that trigger the stack clean-up can be registered. For example, the stack can also be cleaned up when a value for the url_path attribute has been added to it. To do this, override the DI configuration and add the appropriate attribute to the loader’s collection argument:

<service
    id="import_batch.loader.product.varchar.processor.attribute.id"
    class="TechDivision\Import\Batch\Loaders\GenericAttributeIdLoader">
    <argument type="service" id="configuration"/>
    <argument type="service" id="import.processor.import"/>
    <argument type="collection">
        <argument type="constant">TechDivision\Import\Product\Utils\MemberNames::URL_KEY</argument>
        <argument type="constant">TechDivision\Import\Product\Utils\MemberNames::URL_PATH</argument>
    </argument>
</service>

The loader is passed as the fifth argument to the processor that handles the creation of the product varchar attribute values. That processor is based on the TechDivision\Import\Batch\Actions\Processors\GenericAttributeBatchProcessor implementation.
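The following sketch shows how such a processor could be wired to the loader above, if you need to override it as well. It rests on several assumptions. The service id `import_product.action.processor.product.varchar.create` and the constant `SqlStatementKeys::CREATE_UPDATE_PRODUCT_VARCHAR` are hypothetical and modelled on the datetime example. The first four arguments are assumed to match those of GenericBatchProcessor. Only the loader service id and the fact that the loader is the fifth argument come from the text above.

```xml
<!-- Hypothetical sketch: the service id, the statement constant and the
     order of the first four arguments are assumptions for illustration. -->
<service
    id="import_product.action.processor.product.varchar.create"
    class="TechDivision\Import\Batch\Actions\Processors\GenericAttributeBatchProcessor">
    <!-- assumed: same leading arguments as GenericBatchProcessor -->
    <argument type="service" id="connection"/>
    <argument type="service" id="import_batch.repository.sql.statement"/>
    <argument type="collection">
        <argument type="constant">
            TechDivision\Import\Batch\Utils\SqlStatementKeys::CREATE_UPDATE_PRODUCT_VARCHAR
        </argument>
    </argument>
    <!-- maximum stack size, here the documented default of 1000 -->
    <argument type="integer">1000</argument>
    <!-- fifth argument: the loader with the dedicated trigger attributes -->
    <argument type="service" id="import_batch.loader.product.varchar.processor.attribute.id"/>
</service>
```

Before using a definition like this, compare it with the one actually shipped in your installation's DI configuration, since argument order and ids may differ.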

The url_key attribute triggers the stack clean-up of the processor that handles the creation of the product varchar attributes. This is because URL key management requires the current URL keys to always be present in the database.