Merge
thoth.ingestion.flows.merge
Merge batches workflow.
Handles the /merge-batches endpoint, which consolidates all batch LanceDB tables into the main store after parallel processing completes.
logger = setup_logger(__name__)
module-attribute
BATCH_PREFIX_PATTERN = 'lancedb_batch_'
module-attribute
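As a rough illustration of how a prefix constant like this could be used, the sketch below filters a list of names down to those whose last path component starts with the prefix. The helper `select_batch_names` and the example paths are hypothetical; the actual storage layout and matching logic in the module are not documented here.

```python
BATCH_PREFIX_PATTERN = "lancedb_batch_"


def select_batch_names(names: list[str]) -> list[str]:
    """Keep names whose last path component starts with the batch prefix.

    Hypothetical helper: the real module may match batches differently.
    """
    selected = []
    for name in names:
        # Ignore a trailing slash and any parent directories.
        leaf = name.rstrip("/").rsplit("/", 1)[-1]
        if leaf.startswith(BATCH_PREFIX_PATTERN):
            selected.append(name)
    return selected
```

Matching on the last path component is a design assumption made here so that nested paths are handled the same way as bare names.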
merge_batches(request: Request) -> JSONResponse
async
Merge all batch LanceDB tables from GCS into the main store.
Expects JSON body:

- collection_name: Collection to merge (optional, default: handbook_documents)
- cleanup: Delete batches after merge (optional, default: True)
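The documented defaults can be sketched as a small parsing step. This is only an illustration: `parse_merge_options` and the two `DEFAULT_*` constants are hypothetical names, and only the field names and default values come from the description above.

```python
from typing import Any, Optional

# Defaults taken from the endpoint description; constant names are assumptions.
DEFAULT_COLLECTION = "handbook_documents"
DEFAULT_CLEANUP = True


def parse_merge_options(body: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Apply the documented defaults to a decoded JSON body (hypothetical helper)."""
    body = body or {}
    return {
        "collection_name": body.get("collection_name", DEFAULT_COLLECTION),
        "cleanup": body.get("cleanup", DEFAULT_CLEANUP),
    }
```

With this sketch, an empty body merges `handbook_documents` and deletes the batches afterwards, while `{"cleanup": False}` keeps them.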
Returns:
| Type | Description |
|---|---|
| JSONResponse | JSONResponse with status, merged_count, batches_merged, batches_cleaned |
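A caller might want to confirm that a decoded response carries the four documented fields before relying on them. The sketch below does only that; `missing_merge_fields` is a hypothetical helper, and it makes no assumption about the type of each field, since the types are not stated above.

```python
# Field names taken from the Returns description; their types are not documented.
EXPECTED_FIELDS = ("status", "merged_count", "batches_merged", "batches_cleaned")


def missing_merge_fields(payload: dict) -> list[str]:
    """Return the documented response fields absent from a decoded JSON payload."""
    return [field for field in EXPECTED_FIELDS if field not in payload]
```

An empty result means every documented field is present in the payload.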