Batch
thoth.ingestion.flows.batch
Batch processing workflow.
Handles the /ingest-batch endpoint, which processes a specific batch of files in parallel. Each batch writes to its own isolated LanceDB table.
logger = setup_logger(__name__)
module-attribute
BATCH_PREFIX_PATTERN = 'lancedb_batch_'
module-attribute
process_batch(request: Request) -> JSONResponse
async
Process a specific batch of files (called by Cloud Tasks).
Each batch is stored under a unique GCS prefix to avoid conflicts during parallel processing. Use /merge-batches to consolidate the batch outputs afterwards.
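As a rough illustration of the isolation idea, the sketch below derives one prefix per batch from BATCH_PREFIX_PATTERN and fans the files out concurrently. It is not the module's actual implementation: batch_prefix, ingest_file, and process_files are hypothetical names, the batch ID format is assumed, and the per-file work is a placeholder rather than a real LanceDB or GCS write.

```python
import asyncio

# Value taken from the documented module attribute.
BATCH_PREFIX_PATTERN = "lancedb_batch_"


def batch_prefix(batch_id: str) -> str:
    """Hypothetical helper: build the storage prefix for one batch."""
    return f"{BATCH_PREFIX_PATTERN}{batch_id}"


async def ingest_file(path: str, prefix: str) -> str:
    """Placeholder for the real per-file work (assumed, not the real code).

    Returns the destination the file would be written to.
    """
    await asyncio.sleep(0)  # yield control, standing in for real I/O
    return f"{prefix}/{path}"


async def process_files(batch_id: str, files: list[str]) -> list[str]:
    """Process every file of one batch concurrently under a single prefix."""
    prefix = batch_prefix(batch_id)
    # gather() runs the coroutines concurrently and preserves input order.
    return await asyncio.gather(*(ingest_file(f, prefix) for f in files))


if __name__ == "__main__":
    print(asyncio.run(process_files("0001", ["a.md", "b.md"])))
```

Because each batch ID maps to a different prefix, two batches running at the same time never write to the same location, which is what makes a later merge step necessary.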