class batchcreate.BatchCreator(records, max_record_size=1000000, max_batch_size=5000000, max_batch_num_records=500)

BatchCreator takes a list of records as input and splits it into suitably sized batches, which can be processed further or passed to other systems. Import the BatchCreator iterator class and instantiate it. The parameters below define the limits on the output batches. All of them are optional; any parameter that is not specified takes its default value.

Example:

from batchcreate import BatchCreator

batches = BatchCreator(records,
                       max_record_size=60,
                       max_batch_size=200,
                       max_batch_num_records=4)

The BatchCreator object is iterable and yields batches on demand, so it can be used in a regular ‘for’ loop.

Example:

for batch in batches:
    print(batch)  # batch processing here

Alternatively, obtain an iterator explicitly and pull batches one at a time with next():

Example:

batchItr = iter(batches)
print(next(batchItr))  # batch processing here

Attributes:
records : list

Input list of records to split into batches.

max_record_size : int, default 1000000 (1MB)

The maximum size of a single record. Any record larger than this is skipped and does not appear in any batch.

max_batch_size : int, default 5000000 (5MB)

The maximum size of a batch.

max_batch_num_records : int, default 500

The maximum number of records in a batch. BatchCreator puts at most this many records into each batch, provided the batch size limit is also satisfied.
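
The way the three limits interact can be sketched as a small generator. This is only an illustration of the rules documented above, not BatchCreator's actual implementation: the function name split_into_batches is hypothetical, records are assumed to be strings, and the size of a record is assumed to be its UTF-8 encoded length in bytes.

```python
from typing import Iterable, Iterator, List


def split_into_batches(
    records: Iterable[str],
    max_record_size: int = 1_000_000,
    max_batch_size: int = 5_000_000,
    max_batch_num_records: int = 500,
) -> Iterator[List[str]]:
    """Hypothetical stand-in for the batching rules described above."""
    batch: List[str] = []
    batch_size = 0
    for record in records:
        # Assumption: record size is the UTF-8 byte length of the string.
        size = len(record.encode("utf-8"))
        if size > max_record_size:
            continue  # oversized records are skipped, not batched
        # Close the current batch if adding this record would break a limit.
        if batch and (
            batch_size + size > max_batch_size
            or len(batch) >= max_batch_num_records
        ):
            yield batch
            batch, batch_size = [], 0
        batch.append(record)
        batch_size += size
    if batch:
        yield batch  # emit the final, possibly partial, batch


# Record lengths: 50, 70 (too large), 60, 60, 60, 10, 10, 10, 10
records = ["a" * 50, "b" * 70, "c" * 60, "d" * 60, "e" * 60,
           "f" * 10, "g" * 10, "h" * 10, "i" * 10]
result = list(split_into_batches(records,
                                 max_record_size=60,
                                 max_batch_size=200,
                                 max_batch_num_records=4))
print([len(batch) for batch in result])
```

In this sketch the 70-byte record is dropped, the first batch closes because a fourth record would exceed the 200-byte batch limit, and the second batch closes because it has reached four records.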

Methods:
batches :

Returns a list of all the batches.

batches()

Creates a list of batches by iterating over the BatchCreator iterator.

Returns

List of batches.

Example:

batches = BatchCreator(records).batches()
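
The relationship between iteration and batches() can be illustrated with a small stand-in class. CountBatcher is a hypothetical name used only for this sketch; it batches by record count alone and is not the real BatchCreator. It assumes, as the description above suggests, that batches() simply collects everything the iterator yields.

```python
class CountBatcher:
    """Hypothetical stand-in, not the real BatchCreator: batches by count only."""

    def __init__(self, records, max_batch_num_records=500):
        self.records = list(records)
        self.max_batch_num_records = max_batch_num_records

    def __iter__(self):
        # Yield consecutive slices of at most max_batch_num_records records.
        step = self.max_batch_num_records
        for start in range(0, len(self.records), step):
            yield self.records[start:start + step]

    def batches(self):
        # Materialize every batch by exhausting the iterator.
        return list(self)


batcher = CountBatcher(["r1", "r2", "r3", "r4", "r5"], max_batch_num_records=2)
all_batches = batcher.batches()
print(all_batches)
```

Under that assumption, calling batches() is a convenience when all batches fit in memory, while the ‘for’ loop form processes one batch at a time.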