
CLI

thoth.shared.cli

Command-line interface for Thoth ingestion pipeline.

This module provides a Click-based CLI for running the ingestion pipeline, checking status, and managing the vector store.

console = Console() (module attribute)

setup_pipeline(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None) -> IngestionPipeline

Set up the ingestion pipeline with given configuration.

Parameters:

repo_url (str | None, required): Repository URL (None for default).
clone_path (str | None, required): Local clone path (None for default).
db_path (str | None, required): Database path (None for default).
collection (str | None, required): Collection name (None for default).

Returns:

IngestionPipeline: Configured IngestionPipeline instance.
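The "None for default" convention above can be sketched as a small resolver that substitutes a default only where the caller passed None. This is a minimal sketch, not Thoth's code: `PipelineConfig`, `resolve_config`, and the values in `DEFAULTS` are hypothetical placeholders. The real `setup_pipeline` builds an `IngestionPipeline` from its own configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder defaults; Thoth's real defaults are defined elsewhere.
DEFAULTS = {
    "repo_url": "https://example.com/handbook.git",
    "clone_path": "./handbook",
    "db_path": "./vector_db",
    "collection": "handbook",
}


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical resolved settings for one pipeline run."""

    repo_url: str
    clone_path: str
    db_path: str
    collection: str


def resolve_config(
    repo_url: Optional[str] = None,
    clone_path: Optional[str] = None,
    db_path: Optional[str] = None,
    collection: Optional[str] = None,
) -> PipelineConfig:
    """Use each explicit value, falling back to DEFAULTS only when it is None."""
    given = {
        "repo_url": repo_url,
        "clone_path": clone_path,
        "db_path": db_path,
        "collection": collection,
    }
    # Testing `is None` (not truthiness) keeps explicit empty strings as given.
    resolved = {key: DEFAULTS[key] if value is None else value for key, value in given.items()}
    return PipelineConfig(**resolved)
```

For example, `resolve_config(collection="handbook_test")` overrides only the collection and keeps the other three defaults.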

cli() -> None

Thoth - GitLab Handbook Ingestion Pipeline.

Ingest, process, and index the GitLab handbook for semantic search.

ingest(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None, force: bool, full: bool, batch_size: int) -> None

Run the ingestion pipeline to index the GitLab handbook.

This command will:

1. Clone or update the GitLab handbook repository.
2. Discover and process markdown files.
3. Generate chunks and embeddings.
4. Store them in the vector database.

By default, the command runs in incremental mode and processes only changed files. Use --full to process all files regardless of changes.
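One way to implement the incremental/full distinction is to fingerprint each markdown file and compare it against the fingerprints saved by the previous run. The sketch below assumes SHA-256 content hashes kept in a plain dict; Thoth may instead rely on Git history or file metadata, and `select_files` is a hypothetical helper, not part of its API.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def select_files(
    root: Path, previous: Dict[str, str], full: bool = False
) -> Tuple[List[Path], List[str], Dict[str, str]]:
    """Pick markdown files to (re)process under a hypothetical hash-based scheme.

    Returns (files to process, relative paths deleted since the last run,
    the new digest map to save for the next run).
    """
    current: Dict[str, str] = {}
    to_process: List[Path] = []
    for path in sorted(root.rglob("*.md")):
        key = path.relative_to(root).as_posix()
        digest = file_digest(path)
        current[key] = digest
        # Full mode ignores history; incremental mode keeps new or changed files only.
        if full or previous.get(key) != digest:
            to_process.append(path)
    # Files that disappeared would need their chunks dropped from the vector store.
    deleted = sorted(set(previous) - set(current))
    return to_process, deleted, current
```

Tracking deletions matters in incremental mode: a file removed from the handbook would otherwise leave stale chunks searchable.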

status(clone_path: str | None, db_path: str | None, collection: str | None) -> None

Show current pipeline status and statistics.

reset(clone_path: str | None, db_path: str | None, collection: str | None, keep_repo: bool) -> None

Reset the pipeline state and vector database.

This clears all processed data so that the next ingestion starts from scratch. Use --keep-repo to preserve the cloned repository.
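A sketch of the --keep-repo behavior, assuming the vector database and the clone each live at their own path (`db_path` and `clone_path`). `reset_paths` is hypothetical, and the real command may also clear other pipeline state that this sketch does not model.

```python
import shutil
from pathlib import Path
from typing import List


def reset_paths(clone_path: Path, db_path: Path, keep_repo: bool = False) -> List[Path]:
    """Delete the vector database and, unless keep_repo is set, the cloned repository.

    Returns the paths that were actually removed; missing paths are skipped.
    """
    targets = [db_path] if keep_repo else [db_path, clone_path]
    removed: List[Path] = []
    for target in targets:
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        else:
            continue  # Nothing to delete at this path.
        removed.append(target)
    return removed
```

Keeping the clone avoids a full re-download of the repository on the next ingestion, which is the likely motivation for the flag.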

search(db_path: str | None, collection: str | None, query: str, limit: int) -> None

Search the indexed handbook for relevant content.

Example:

thoth search -q "How to contribute to GitLab?" -n 3
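Semantic search of this kind ranks stored chunks by the similarity of their embeddings to the query's embedding and keeps the best `limit` results (presumably what -n sets). The pure-Python sketch below uses cosine similarity over toy two-dimensional vectors. The real command delegates to the vector store and an embedding model, and `top_k` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    limit: int,
) -> List[Tuple[float, str]]:
    """Score every (text, embedding) chunk against the query and keep the best `limit`."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

A brute-force scan like this is fine for illustration; a vector database typically uses an index to avoid scoring every chunk.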

schedule(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None, interval: int, cron_hour: int | None, cron_minute: int, start_immediately: bool) -> None

Start the scheduler for automated syncs.

By default, syncs run every 60 minutes. Use --interval to change the frequency, or use --cron-hour and --cron-minute for cron-style scheduling.

Examples:

Run every 30 minutes

thoth schedule --interval 30

Run daily at 2:30 AM

thoth schedule --cron-hour 2 --cron-minute 30

Run every hour, starting immediately

thoth schedule --start-immediately

Press Ctrl+C to stop the scheduler.
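The two scheduling modes differ mainly in how the next run time is computed. Interval mode adds a fixed number of minutes, while cron-style mode picks the next wall-clock occurrence of the given hour and minute. The `next_run` helper below is a hypothetical sketch. It assumes interval mode applies whenever no cron hour is given and that the minute defaults to 0. The scheduler implementation Thoth actually uses is not documented here.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_run(
    now: datetime,
    interval: int = 60,
    cron_hour: Optional[int] = None,
    cron_minute: int = 0,  # Assumed default; not stated in the docs above.
) -> datetime:
    """Return when the next sync should start (hypothetical scheduling rule)."""
    if cron_hour is None:
        # Interval mode: run `interval` minutes from now (60 by default, as documented).
        return now + timedelta(minutes=interval)
    # Cron-style mode: today's HH:MM, or tomorrow's if that time has already passed.
    # Naive datetimes are used, so DST transitions are ignored in this sketch.
    candidate = now.replace(hour=cron_hour, minute=cron_minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For example, at 03:00 with `--cron-hour 2 --cron-minute 30`, the next run falls at 02:30 the following day.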

health(clone_path: str | None, db_path: str | None) -> None

Check system health status.

Runs health checks on key components and displays the results.
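Health output of this kind can be modeled as a set of named checks, each reporting pass or fail with a detail message. The overall status passes only if every check does. Which components Thoth actually checks is not documented here. The two path checks below are assumptions, chosen because `health` takes `clone_path` and `db_path`, and `run_health_checks` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Tuple

CheckResult = Tuple[bool, str]


def check_path(label: str, path: Path) -> CheckResult:
    """Pass if the path exists; the detail message says what was found."""
    if path.exists():
        return True, f"{label} found at {path}"
    return False, f"{label} missing at {path}"


def run_health_checks(clone_path: Path, db_path: Path) -> Dict[str, CheckResult]:
    """Run each (hypothetical) check and collect results by component name."""
    return {
        "repository": check_path("repository clone", clone_path),
        "vector_db": check_path("vector database", db_path),
    }


def is_healthy(results: Dict[str, CheckResult]) -> bool:
    """Overall status passes only when every individual check passes."""
    return all(ok for ok, _ in results.values())
```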

sync(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None) -> None

Manually trigger a sync operation.

This is useful for testing the scheduler or running a one-off sync.
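Because sync serves both for testing the scheduler and for one-off runs, one natural design is for both paths to call the same job function. That function is wrapped so a failure is reported rather than crashing the process. The following is a minimal sketch under that assumption. `run_sync` and `SyncResult` are hypothetical, and `job` stands in for whatever the pipeline's sync step really is.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SyncResult:
    """Outcome of one sync attempt (hypothetical structure)."""

    success: bool
    duration_s: float
    error: Optional[str] = None


def run_sync(job: Callable[[], None]) -> SyncResult:
    """Run one sync job, timing it and capturing any exception as an error message."""
    start = time.monotonic()
    try:
        job()
    except Exception as exc:  # Report the failure instead of propagating it.
        return SyncResult(False, time.monotonic() - start, f"{type(exc).__name__}: {exc}")
    return SyncResult(True, time.monotonic() - start)
```

Under this design, a scheduler would call `run_sync(job)` on every tick while a manual sync calls it exactly once, so both exercise identical code.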