thoth.ingestion.flows.merge
Merge batches workflow.
Handles the /merge-batches endpoint, which consolidates all batch LanceDB tables into the main store after parallel processing completes.
Functions
Merge all batch LanceDB tables from GCS into the main store.
Create and configure a logger with structured JSON output.
Classes
Special type indicating an unconstrained type.
Manages sync of local vector DB directories to/from Google Cloud Storage.
Vector store for document embeddings using LanceDB.
- async thoth.ingestion.flows.merge.merge_batches(request: Request) -> JSONResponse
Merge all batch LanceDB tables from GCS into the main store.
- Expects JSON body:
  collection_name: Collection to merge (optional, default: handbook_documents)
  cleanup: Delete batches after merge (optional, default: True)
- Returns:
JSONResponse with status, merged_count, batches_merged, batches_cleaned
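The request body and response fields described above can be illustrated with a small client-side sketch. This is an assumption-laden example, not part of the thoth codebase: the base URL, the helper names `build_merge_request` and `summarize_merge_response`, and the use of POST are all hypothetical, since the source names only the `/merge-batches` path, the two body keys, and the four response fields.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address of your deployment.
BASE_URL = "http://localhost:8080"


def build_merge_request(collection_name="handbook_documents", cleanup=True,
                        base_url=BASE_URL):
    """Build (but do not send) a request for the /merge-batches endpoint.

    The defaults mirror the documented ones: handbook_documents and True.
    """
    payload = {"collection_name": collection_name, "cleanup": cleanup}
    return urllib.request.Request(
        url=f"{base_url}/merge-batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the source does not state the HTTP method
    )


def summarize_merge_response(body):
    """Pick out the documented response fields; missing keys map to None."""
    fields = ("status", "merged_count", "batches_merged", "batches_cleaned")
    return {name: body.get(name) for name in fields}


if __name__ == "__main__":
    req = build_merge_request(cleanup=False)
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (requires a running service):
    # with urllib.request.urlopen(req) as resp:
    #     print(summarize_merge_response(json.load(resp)))
```

Building the request separately from sending it keeps the payload easy to inspect before any batches are deleted, which matters because cleanup defaults to True.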