# Bulk API
Salesforce Toolkit provides an interface to the Salesforce Bulk API 2.0, which is designed for loading, updating, or deleting large sets of data. The Bulk API is ideal for processing many records asynchronously.
## Overview

The Bulk API in Salesforce Toolkit consists of two main components:

- `BulkApiIngestJob` - Represents a Salesforce Bulk API 2.0 job
- `SObjectList` bulk methods - Methods for efficiently processing large collections of records
## When to Use Bulk API

Use the Bulk API when:

- Processing 10,000+ records
- Performing batch operations that would otherwise exceed API limits
- Running operations that can be processed asynchronously
- Needing better performance for large datasets
## Basic Usage

The simplest way to use the Bulk API is through the `SObjectList` bulk methods:
```python
from sf_toolkit.data import SObject, SObjectList
from sf_toolkit.data.fields import IdField, TextField

class Account(SObject):
    Id = IdField()
    Name = TextField()
    Industry = TextField()

# Create a list of accounts
accounts = [
    Account(Name=f"Bulk Account {i}", Industry="Technology")
    for i in range(1, 1001)
]

# Create an SObjectList
account_list = SObjectList(accounts)

# Insert using the bulk API
results = account_list.save_insert_bulk()

print(f"Successfully inserted {results.numberRecordsProcessed} records")
print(f"Failed to insert {results.numberRecordsFailed} records")
```
## Bulk Insert

To insert large sets of records:
```python
# Create an SObjectList with many records
# (assumes a Contact SObject subclass defined like Account above)
contacts = SObjectList([
    Contact(FirstName=f"Contact{i}", LastName=f"Bulk{i}")
    for i in range(1, 50000)
])

# Insert using the bulk API
bulk_job = contacts.save_insert_bulk()

# Check job status
print(f"Job ID: {bulk_job.id}")
print(f"Status: {bulk_job.state}")

# Refresh to get the latest status
updated_job = bulk_job.refresh()
print(f"Updated status: {updated_job.state}")
```
## Bulk Update

To update large sets of records:
```python
from sf_toolkit.data import select

# Get existing records
contacts = select(Contact).where(LastName__like="Bulk%").execute()

# Convert to an SObjectList
contact_list = contacts.to_list()

# Modify every record
for contact in contact_list:
    contact.Title = "Bulk API Example"

# Update using the bulk API
bulk_job = contact_list.save_update_bulk()
bulk_job = bulk_job.monitor_until_complete()
print(f"Records processed: {bulk_job.numberRecordsProcessed}")
```
## Bulk Upsert

To upsert (insert or update) records based on an external ID:
```python
# Create or update records keyed on an external ID
# (assumes Account also defines an ExternalId__c text field)
accounts = SObjectList([
    Account(ExternalId__c=f"EXT-{i}", Name=f"Upsert Account {i}")
    for i in range(1, 10000)
])

# Upsert using the bulk API with an external ID field
bulk_job = accounts.save_upsert_bulk(external_id_field="ExternalId__c")

print(f"Job state: {bulk_job.state}")
print(f"Records processed: {bulk_job.numberRecordsProcessed}")
print(f"Records failed: {bulk_job.numberRecordsFailed}")
```
## Working with BulkApiIngestJob Directly

For more control, you can work with the `BulkApiIngestJob` class directly:
```python
from sf_toolkit.data.bulk import BulkApiIngestJob

# Initialize a new bulk job
bulk_job = BulkApiIngestJob.init_job(
    sobject_type="Account",
    operation="insert",
    column_delimiter="COMMA",
    line_ending="LF",
    connection=client  # your SalesforceClient instance
)

# Create a list of records
accounts = SObjectList([
    Account(Name=f"Direct Bulk Job {i}")
    for i in range(1, 5000)
])

# Upload the data in batches
bulk_job = bulk_job.upload_batches(accounts)

# Monitor job status
print(f"Job ID: {bulk_job.id}")
print(f"Current state: {bulk_job.state}")

# Refresh to get the latest status
updated_job = bulk_job.refresh()

# Check final results
if updated_job.state == "JobComplete":
    print(f"Successfully processed: {updated_job.numberRecordsProcessed}")
    print(f"Failed records: {updated_job.numberRecordsFailed}")
```
## Bulk Job States

A Bulk API job can be in one of these states (a minimal polling loop over them is sketched after the list):

- **Open** - The job has been created and is ready for data upload
- **UploadComplete** - All data has been uploaded and the job is being processed
- **Aborted** - The job was aborted by the user
- **JobComplete** - The job has completed processing
- **Failed** - The job has failed
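As an illustration of how these states drive monitoring, the loop below polls `refresh()` until the job reaches a terminal state. In practice `monitor_until_complete()` (shown earlier) does this for you; the five-second interval is an arbitrary choice for the sketch:

```python
import time

# Terminal states from the list above
TERMINAL_STATES = {"JobComplete", "Failed", "Aborted"}

def wait_for_job(job, poll_seconds=5):
    """Poll a bulk job until it leaves the Open/UploadComplete states."""
    while job.state not in TERMINAL_STATES:
        time.sleep(poll_seconds)
        job = job.refresh()  # fetch the latest state from Salesforce
    return job

finished = wait_for_job(bulk_job)
print(f"Final state: {finished.state}")
```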
## Monitoring Job Status

You can monitor the status of a bulk job:
```python
from sf_toolkit.auth import cli_login
from sf_toolkit.client import SalesforceClient

# Get a job by ID
job_id = "750xx0000000123AAA"  # an 18-character Bulk API job ID
connection = SalesforceClient(login=cli_login())

# Create a job instance with just the ID
job = BulkApiIngestJob(id=job_id, connection=connection)

# Refresh to get the current status
job = job.refresh()

print(f"Job state: {job.state}")
print(f"Records processed: {job.numberRecordsProcessed}")
print(f"Records failed: {job.numberRecordsFailed}")
print(f"Error message: {job.errorMessage}")
```
## Performance Considerations

When using the Bulk API:

- **Batch size** - Data is automatically split into optimal batch sizes (up to 100MB per batch); a rough batch-count estimate is sketched after this list
- **Column delimiter** - The default is COMMA, but you can choose others such as TAB or PIPE
- **Parallel processing** - Salesforce processes batches in parallel
- **API limits** - Bulk API operations are governed by their own Bulk API limits rather than your regular REST API request limits
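As a rough illustration of the 100MB ceiling, you can estimate up front how many batches an upload will produce. The row size below is an assumed figure purely for the sketch; the toolkit performs the actual split for you:

```python
# Illustrative estimate only: actual CSV size depends on your field values
records = 500_000
avg_row_bytes = 1_024                    # assumed ~1 KB per serialized CSV row
batch_limit_bytes = 100 * 1024 * 1024    # 100MB ceiling per batch

total_bytes = records * avg_row_bytes
batches = -(-total_bytes // batch_limit_bytes)  # ceiling division
print(f"~{total_bytes / 2**20:.0f} MB of CSV -> about {batches} batch(es)")
```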
## Error Handling

For bulk operations, errors are tracked at the job level:
```python
bulk_job = accounts.save_insert_bulk()

# Check for errors
if bulk_job.state == "Failed":
    print(f"Job failed: {bulk_job.errorMessage}")
elif bulk_job.numberRecordsFailed > 0:
    print(f"{bulk_job.numberRecordsFailed} records failed to process")

    # For partial failures, some records still processed successfully
    if bulk_job.numberRecordsProcessed > 0:
        print(f"{bulk_job.numberRecordsProcessed} records processed successfully")
```
## Advanced Configuration

You can configure various aspects of the bulk job:
```python
# Custom column delimiter
bulk_job = BulkApiIngestJob.init_job(
    sobject_type="Account",
    operation="insert",
    column_delimiter="TAB",  # use a tab delimiter
    connection=client
)

# Create a job for a hard delete operation
delete_job = BulkApiIngestJob.init_job(
    sobject_type="Account",
    operation="hardDelete",  # permanently delete records (requires the "Bulk API Hard Delete" permission)
    connection=client
)
```
## Limitations

- Bulk API 2.0 supports only the CSV format (not JSON or XML)
- The maximum file size for a single batch is 100MB (up to 150MB base64 encoded)
- Certain SObject types are not supported by the Bulk API
- Some operations, such as merge, are not supported
- Processing is asynchronous; results are not immediately available
For more details on Salesforce Bulk API 2.0, see the Salesforce Bulk API Developer Guide.
# Bulk Query API
Salesforce Toolkit also provides an interface to the Salesforce Bulk API 2.0 Query endpoint for efficiently retrieving very large result sets. Bulk Query runs asynchronously and streams records in pages so you can process millions of rows without loading everything into memory at once.
## When to Use Bulk Query

Use Bulk Query when (a size-based dispatch sketch follows this list):

- You expect more than ~20,000 records (especially 100k+)
- You need to minimize API round-trips (standard REST `/query` pagination returns 2,000 records per page)
- You want asynchronous, resumable retrieval
- You are processing results incrementally (streaming)
- You need CSV-style export semantics
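One way to apply these guidelines is to branch on an expected row count. This is a minimal sketch: `expected_rows` is a caller-supplied hint (e.g., from an earlier COUNT query), not something the toolkit computes, and the threshold is the rough figure quoted above:

```python
def run_query(query_builder, expected_rows: int):
    """Pick the standard or bulk execution path from a size hint."""
    if expected_rows >= 20_000:
        return query_builder.execute_bulk()  # asynchronous, streamed in pages
    return query_builder.execute()           # synchronous REST, 2,000-record pages

# We expect ~150k technology accounts, so this takes the bulk path
results = run_query(select(Account).where(Industry="Technology"), expected_rows=150_000)
```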
## Basic Usage with `select(...).execute_bulk()`

You can request a bulk query directly from the query builder:
```python
from sf_toolkit.data import select, SObject
from sf_toolkit.data.fields import IdField, TextField

class Account(SObject):
    Id = IdField()
    Name = TextField()
    Industry = TextField()

# Build and run a SOQL query through the Bulk API
bulk_result = select(Account).where(Industry="Technology").execute_bulk()

# Iterate over all returned records (pages are streamed internally)
for account in bulk_result:
    print(account.Id, account.Name)

# Convert the entire result to a list (loads all pages)
all_accounts = bulk_result.as_list()
print(f"Total accounts: {len(all_accounts)}")
```
## Asynchronous Usage

For very large datasets, async iteration lets other tasks run while pages are fetched:
```python
import asyncio

from sf_toolkit.auth import cli_login
from sf_toolkit.client import AsyncSalesforceClient

async def main():
    async with AsyncSalesforceClient(login=cli_login("my-org-alias")) as conn:
        bulk_result = (
            select(Account)
            .where(Industry="Technology")
            .execute_bulk_async(connection=conn)
        )
        async for account in bulk_result:
            print(account.Id, account.Name)

        # Load all records into memory (use cautiously for huge result sets)
        all_accounts = await bulk_result.as_list_async()
        print(f"Loaded {len(all_accounts)} accounts")

asyncio.run(main())
```
## Working with BulkApiQueryJob Directly

For full control (custom SOQL string, manual monitoring), use `BulkApiQueryJob`:
```python
from sf_toolkit.data.bulk import BulkApiQueryJob

soql = "SELECT Id, Name, Industry FROM Account WHERE Industry = 'Technology'"

# Initialize (creates the job on Salesforce)
query_job = BulkApiQueryJob.init_job(
    query=soql,
    connection=client  # SalesforceClient or AsyncSalesforceClient
)

# Monitor until completed
query_job = query_job.monitor_until_complete()

if query_job.state == "JobComplete":
    # Iterate through pages / records
    for record in query_job:
        print(record["Id"], record["Name"])
else:
    print(f"Query failed: {query_job.errorMessage}")
```
Async direct usage:
```python
async_query_job = await BulkApiQueryJob.init_job_async(
    query=soql,
    connection=async_client
)

async_query_job = await async_query_job.monitor_until_complete_async()

if async_query_job.state == "JobComplete":
    async for record in async_query_job:
        print(record["Id"], record["Name"])
else:
    print(f"Query failed: {async_query_job.errorMessage}")
```
## Streaming and Pagination

Bulk query results are delivered in pages:

- Iteration (`for` / `async for`) fetches one page at a time
- Each page is parsed into SObject instances (when using `select().execute_bulk()`)
- Use `as_list()`/`as_list_async()` to force retrieval of all pages

If you only need the first N records, break early in the loop to avoid fetching the remaining pages, as in the sketch below.
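For example, in synchronous code `itertools.islice` caps the iteration cleanly; once the loop ends, no further pages are requested:

```python
from itertools import islice

result = select(Account).where(Industry="Technology").execute_bulk()

# Take only the first 500 records; the remaining pages are never fetched
for account in islice(result, 500):
    print(account.Id, account.Name)
```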
## Job States for Query

A bulk query job can be in one of these states:

- **UploadComplete** - The SOQL has been accepted and processing has started
- **InProgress** - Records are being gathered
- **Aborted** - The job was stopped by the user
- **JobComplete** - All result pages are ready
- **Failed** - An error was encountered
## Error Handling

Check `job.state` and `errorMessage` after completion:
```python
result = select(Account).execute_bulk()

# You can inspect the underlying job via result._job (an internal attribute)
job = result._job

if job.state == "Failed":
    print(f"Bulk query failed: {job.errorMessage}")
else:
    print(f"State: {job.state}")
```
Partial failures (e.g., field-level errors) typically manifest as a Failed state for query jobs; records are not partially returned.
## Performance Tips

- **Narrow fields**: Select only the columns you need (avoid `SELECT *`-style queries)
- **Use selective WHERE clauses**: They reduce scan time
- **Avoid overly complex formula fields**: They can slow processing
- **Process incrementally**: Stream pages instead of materializing large lists
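Putting the narrow-fields, selective-filter, and incremental-processing tips together might look like the sketch below. Passing multiple column names to `.fields()` is an assumption extrapolated from the single-argument `.fields("Id")` call shown later in this section, and `handle` is a placeholder for your own processing:

```python
lean_result = (
    select(Account)
    .fields("Id", "Name")          # narrow fields (assumed multi-argument form)
    .where(Industry="Technology")  # selective filter reduces scan time
    .execute_bulk()
)

for account in lean_result:        # stream pages; nothing is accumulated in memory
    handle(account)                # placeholder for per-record processing
```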
## Limitations

- Bulk Query is read-only (it cannot modify data)
- ORDER BY and OFFSET are not supported in Bulk API 2.0 queries
- Real-time freshness is not guaranteed for very large result sets (eventual completion)
- The result format is CSV internally (the Toolkit parses it into objects/dicts)
- Relationship traversals (e.g., `Account.Owner.Name`) may be limited compared to REST query performance for huge datasets
## Example: Filtering and Streaming
```python
tech_accounts = (
    select(Account)
    .where(Industry="Technology")
    .and_where(Name__like="Bulk%")
    .execute_bulk()
)

for acct in tech_accounts:
    # Process each record on the fly without accumulating them;
    # do_something is a placeholder for your own logic
    do_something(acct)
```
## Async Example with Early Break
```python
async def first_100_account_ids(async_client):
    result = (
        select(Account)
        .fields("Id")  # limit the query to the Id field
        .execute_bulk_async(connection=async_client)
    )
    collected = []
    async for acct in result:
        collected.append(acct.Id)
        if len(collected) >= 100:
            break
    return collected
```
## Comparing Standard vs Bulk Query

**Standard query (REST):**

- Immediate response, limited page size (2,000 records per batch)
- Better for small, interactive queries

**Bulk query:**

- Asynchronous job creation and processing
- Efficient for very large datasets
- Stream or download the full result set

Choose based on dataset size and latency requirements.
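For comparison, here is the same query expressed both ways; only the terminal call differs (this assumes the builder can be executed through either method, as the examples above suggest):

```python
# Standard REST query: synchronous, paged at 2,000 records
small_result = select(Account).where(Industry="Technology").execute()

# Bulk query: asynchronous job with streamed pages, suited to very large result sets
large_result = select(Account).where(Industry="Technology").execute_bulk()
```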