thoth.cli

Command-line interface for the Thoth ingestion pipeline.

This module provides a Click-based CLI for running the ingestion pipeline, checking status, and managing the vector store.

Functions

setup_logger(name[, level, simple])

Creates and configures a secure logger with automatic sensitive data redaction.

setup_pipeline(repo_url, clone_path, ...)

Set up the ingestion pipeline with given configuration.

Classes

BarColumn([bar_width, style, ...])

Renders a visual progress bar.

Console(*[, color_system, ...])

A high level console interface.

Embedder([model_name, device, batch_size])

Generate embeddings from text using sentence-transformers.

HandbookRepoManager([repo_url, clone_path, ...])

Manages the GitLab handbook repository.

IngestionPipeline([repo_manager, chunker, ...])

Orchestrates the complete ingestion pipeline.

MarkdownChunker([min_chunk_size, ...])

Intelligent markdown-aware chunking.

Panel(renderable[, box, title, title_align, ...])

A console renderable that draws a border around its contents.

Path(*args, **kwargs)

PurePath subclass that can make system calls.

Progress(*columns[, console, auto_refresh, ...])

Renders one or more auto-updating progress bars.

SpinnerColumn([spinner_name, style, speed, ...])

A column with a 'spinner' animation.

Table(*headers[, title, caption, width, ...])

A console renderable to draw a table.

TaskProgressColumn([text_format, ...])

Show task progress as a percentage.

TextColumn(text_format[, style, justify, ...])

A column containing text.

TimeRemainingColumn([compact, ...])

Renders estimated time remaining.

VectorStore([persist_directory, ...])

Vector store for managing document embeddings using ChromaDB.

thoth.cli.setup_pipeline(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None) → IngestionPipeline

Set up the ingestion pipeline with given configuration.

Parameters:
  • repo_url – Repository URL (None for default)

  • clone_path – Local clone path (None for default)

  • db_path – Database path (None for default)

  • collection – Collection name (None for default)

Returns:

Configured IngestionPipeline instance
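
Each parameter above accepts None to mean "use the default". The sketch below illustrates that convention in isolation. It is not Thoth's implementation: the PipelineConfig class, the resolve_config helper, and every default value shown are hypothetical placeholders chosen for the example, and the real defaults are defined inside Thoth itself.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical defaults, for illustration only (not Thoth's real values).
DEFAULT_REPO_URL = "https://example.com/handbook.git"
DEFAULT_CLONE_PATH = "./handbook_repo"
DEFAULT_DB_PATH = "./vector_db"
DEFAULT_COLLECTION = "handbook"


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container for the four resolved settings."""

    repo_url: str
    clone_path: str
    db_path: str
    collection: str


def resolve_config(
    repo_url: str | None = None,
    clone_path: str | None = None,
    db_path: str | None = None,
    collection: str | None = None,
) -> PipelineConfig:
    """Replace each None argument with its (hypothetical) default.

    The check is `is None` rather than truthiness, so an explicitly
    passed empty string is kept instead of being silently replaced.
    """
    return PipelineConfig(
        repo_url=DEFAULT_REPO_URL if repo_url is None else repo_url,
        clone_path=DEFAULT_CLONE_PATH if clone_path is None else clone_path,
        db_path=DEFAULT_DB_PATH if db_path is None else db_path,
        collection=DEFAULT_COLLECTION if collection is None else collection,
    )


if __name__ == "__main__":
    # Override only the database path; the other settings fall back to defaults.
    config = resolve_config(db_path="/tmp/thoth_db")
    print(config)
```

With the real function, the equivalent call would presumably be setup_pipeline(None, None, "/tmp/thoth_db", None), which returns a configured IngestionPipeline with only the database path overridden.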