thoth.ingestion.flows.merge

Merge batches workflow.

Handles the /merge-batches endpoint, which consolidates all batch LanceDB tables into the main store after parallel processing completes.

Functions

merge_batches(request)

Merge all batch LanceDB tables from GCS into the main store.

setup_logger(name[, level, simple, json_output])

Create and configure a logger with structured JSON output.

Classes

Any(*args, **kwargs)

Special type indicating an unconstrained type.

GCSSync(bucket_name[, project_id, ...])

Manages sync of local vector DB directories to/from Google Cloud Storage.

JSONResponse(content[, status_code, ...])

Request(scope, receive, ...)

VectorStore([persist_directory, ...])

Vector store for document embeddings using LanceDB.

async thoth.ingestion.flows.merge.merge_batches(request: Request) → JSONResponse

Merge all batch LanceDB tables from GCS into the main store.

Expects JSON body:

collection_name: Collection to merge (optional, default: handbook_documents)

cleanup: Delete batches after merge (optional, default: True)

Returns:

JSONResponse with status, merged_count, batches_merged, batches_cleaned
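
Example (client-side sketch):

The following is a minimal sketch of how a caller might build the JSON body described above and read the documented response fields. It is not taken from the thoth codebase: the base URL, the use of POST, and the helper names build_merge_request and read_merge_result are assumptions made for illustration. Only the endpoint path, the two body keys with their defaults, and the four response keys come from this page.

```python
import json
import urllib.request

# Hypothetical base URL; the real service address is not given on this page.
BASE_URL = "http://localhost:8080"


def build_merge_request(collection_name="handbook_documents", cleanup=True,
                        base_url=BASE_URL):
    """Build (but do not send) a request for the /merge-batches endpoint.

    The defaults mirror the documented ones. POST is an assumption,
    chosen because the endpoint expects a JSON body.
    """
    payload = {"collection_name": collection_name, "cleanup": cleanup}
    return urllib.request.Request(
        url=f"{base_url}/merge-batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def read_merge_result(body):
    """Pick the four documented fields out of a JSON response body.

    Fields missing from the body come back as None, since their exact
    types and presence are not specified on this page.
    """
    parsed = json.loads(body)
    keys = ("status", "merged_count", "batches_merged", "batches_cleaned")
    return {key: parsed.get(key) for key in keys}
```

Passing the built request to urllib.request.urlopen would send it, and the raw response bytes could then be handed to read_merge_result. Setting cleanup=False in build_merge_request asks the service to keep the batch tables after the merge.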