Merge

thoth.ingestion.flows.merge

Merge batches workflow.

Handles the /merge-batches endpoint which consolidates all batch LanceDB tables into the main store after parallel processing completes.

logger = setup_logger(__name__) module-attribute

BATCH_PREFIX_PATTERN = 'lancedb_batch_' module-attribute
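
As an illustration of how a prefix constant like this might be used, the sketch below picks batch table directories out of a list of object names. The helper `find_batch_prefixes` and the assumed storage layout (one top-level directory per batch) are hypothetical; the source does not describe how the module actually discovers batches.

```python
BATCH_PREFIX_PATTERN = "lancedb_batch_"


def find_batch_prefixes(object_names):
    """Return sorted, de-duplicated top-level names that start with the batch prefix.

    Hypothetical helper: assumes each batch lives under its own top-level
    directory, e.g. "lancedb_batch_0/...".
    """
    prefixes = set()
    for name in object_names:
        # Keep only the first path component of the object name.
        top_level = name.split("/", 1)[0]
        if top_level.startswith(BATCH_PREFIX_PATTERN):
            prefixes.add(top_level)
    return sorted(prefixes)
```

A plain `str.startswith` check is enough here because the constant is a literal prefix rather than a regular expression.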

merge_batches(request: Request) -> JSONResponse async

Merge all batch LanceDB tables from GCS into the main store.

Expects JSON body:

collection_name: Collection to merge (optional, default: handbook_documents)
cleanup: Delete batches after merge (optional, default: True)
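
The documented defaults can be sketched as a small body parser. The helper `parse_merge_request` is hypothetical and only mirrors the two fields and defaults listed above; the real handler's parsing and validation may differ.

```python
import json

# Default taken from the documented request body.
DEFAULT_COLLECTION = "handbook_documents"


def parse_merge_request(raw_body: bytes):
    """Return (collection_name, cleanup), applying the documented defaults.

    Hypothetical helper: an empty body is treated as an empty JSON object.
    """
    body = json.loads(raw_body) if raw_body else {}
    collection_name = body.get("collection_name", DEFAULT_COLLECTION)
    cleanup = bool(body.get("cleanup", True))
    return collection_name, cleanup
```

For example, `parse_merge_request(b'{"cleanup": false}')` keeps the default collection and disables batch cleanup.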

Returns:

Type          Description
JSONResponse  JSONResponse with status, merged_count, batches_merged, batches_cleaned
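
A caller might check that a decoded response carries the four documented fields. The sketch below is a hypothetical client-side helper; the field names come from the description above, while their types and values are not specified in the source and are not assumed here.

```python
# Field names as listed in the Returns description.
EXPECTED_FIELDS = ("status", "merged_count", "batches_merged", "batches_cleaned")


def missing_response_fields(payload: dict) -> list:
    """Return the documented field names that are absent from a decoded response."""
    return [field for field in EXPECTED_FIELDS if field not in payload]
```

An empty list means every documented field is present.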