
Batch

thoth.ingestion.flows.batch

Batch processing workflow.

Handles the /ingest-batch endpoint, which processes a specific batch of files. Batches run in parallel, and each batch writes to its own isolated LanceDB table.
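Why per-batch isolation makes parallel writes safe can be shown with a minimal local sketch. It rests on several assumptions. In the real service each batch arrives as a separate Cloud Tasks HTTP request. Here `asyncio.gather` stands in for that fan-out, and an in-memory dict stands in for LanceDB. The names `make_batches`, `ingest_batch`, `ingest_all`, and the `batch_<n>` table naming are hypothetical and are not the module's API.

```python
import asyncio

# Hypothetical stand-in for LanceDB: maps a table name to its stored rows.
TABLES: dict[str, list[str]] = {}


def make_batches(files: list[str], batch_size: int) -> list[list[str]]:
    """Split a file list into consecutive batches of at most batch_size files."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [files[i : i + batch_size] for i in range(0, len(files), batch_size)]


async def ingest_batch(batch_id: int, files: list[str]) -> str:
    """Process one batch and write it to a table that no other task touches."""
    table_name = f"batch_{batch_id}"  # hypothetical per-batch naming scheme
    rows = []
    for path in files:
        await asyncio.sleep(0)  # placeholder for real parsing/embedding work
        rows.append(path)
    TABLES[table_name] = rows  # disjoint keys, so no locking is needed
    return table_name


async def ingest_all(files: list[str], batch_size: int) -> list[str]:
    """Run every batch concurrently; gather preserves batch order in its result."""
    batches = make_batches(files, batch_size)
    return await asyncio.gather(
        *(ingest_batch(i, batch) for i, batch in enumerate(batches))
    )
```

For example, `asyncio.run(ingest_all(["a.md", "b.md", "c.md"], 2))` yields two tables, `batch_0` and `batch_1`. Because no two batches share a destination, a separate consolidation step is needed afterward.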

logger = setup_logger(__name__) module-attribute

BATCH_PREFIX_PATTERN = 'lancedb_batch_' module-attribute
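`BATCH_PREFIX_PATTERN` looks intended to build, and later recognize, each batch's unique storage prefix. The sketch below assumes a prefix is simply the pattern followed by a batch id. The helpers `batch_prefix` and `parse_batch_id` are hypothetical, and the module's actual naming may differ.

```python
from typing import Optional

BATCH_PREFIX_PATTERN = "lancedb_batch_"


def batch_prefix(batch_id: str) -> str:
    """Hypothetical helper: build the storage prefix for one batch."""
    if not batch_id or "/" in batch_id:
        raise ValueError(f"invalid batch id: {batch_id!r}")
    return f"{BATCH_PREFIX_PATTERN}{batch_id}"


def parse_batch_id(prefix: str) -> Optional[str]:
    """Hypothetical helper: recover the batch id from a prefix or path, else None."""
    # Take the last path segment, ignoring a trailing slash.
    name = prefix.rstrip("/").rsplit("/", 1)[-1]
    if name.startswith(BATCH_PREFIX_PATTERN) and len(name) > len(BATCH_PREFIX_PATTERN):
        return name[len(BATCH_PREFIX_PATTERN):]
    return None
```

Keeping both directions next to the shared constant means a writer and a merger cannot drift apart on the naming convention.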

process_batch(request: Request) -> JSONResponse async

Process a specific batch of files (called by Cloud Tasks).

Each batch is stored under a unique GCS prefix to avoid conflicts during parallel processing. Call the /merge-batches endpoint afterward to consolidate the per-batch results.
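Before consolidation, a merge step has to discover which batch prefixes exist. The sketch below assumes that batch prefixes sit at the top level of the bucket. It takes a list of object names as input, such as the `name` of each blob returned by google-cloud-storage's `Client.list_blobs`. The function `find_batch_prefixes` is hypothetical and does not describe how /merge-batches is actually implemented.

```python
from typing import Iterable

BATCH_PREFIX_PATTERN = "lancedb_batch_"


def find_batch_prefixes(object_names: Iterable[str]) -> list[str]:
    """Hypothetical helper: collect distinct top-level batch prefixes."""
    prefixes = set()
    for name in object_names:
        # Assumes layout "<batch prefix>/<table files...>" at the bucket root.
        top = name.split("/", 1)[0]
        if top.startswith(BATCH_PREFIX_PATTERN):
            prefixes.add(top + "/")
    return sorted(prefixes)
```

The result is sorted lexicographically, so `lancedb_batch_10/` sorts before `lancedb_batch_2/`. Zero-padded batch ids would keep that order numeric.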