CLI
thoth.shared.cli
Command-line interface for the Thoth ingestion pipeline.
This module provides a Click-based CLI for running the ingestion pipeline, checking status, and managing the vector store.
console = Console()
module-attribute
setup_pipeline(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None) -> IngestionPipeline
Set up the ingestion pipeline with given configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repo_url | str \| None | Repository URL (None for default) | required |
| clone_path | str \| None | Local clone path (None for default) | required |
| db_path | str \| None | Database path (None for default) | required |
| collection | str \| None | Collection name (None for default) | required |
Returns:
| Type | Description |
|---|---|
| IngestionPipeline | Configured IngestionPipeline instance |
cli() -> None
Thoth - GitLab Handbook Ingestion Pipeline.
Ingest, process, and index the GitLab handbook for semantic search.
ingest(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None, force: bool, full: bool, batch_size: int) -> None
Run the ingestion pipeline to index the GitLab handbook.
This command will:
1. Clone or update the GitLab handbook repository
2. Discover and process markdown files
3. Generate chunks and embeddings
4. Store them in the vector database
By default, runs in incremental mode (only processes changed files). Use --full to process all files regardless of changes.
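The difference between incremental and full mode can be illustrated with a small sketch. This reference does not say how Thoth detects changed files; content hashing is assumed here purely for illustration, and `content_hash` and `select_files` are hypothetical names.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint a file's content (assumed change-detection mechanism)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_files(
    current: dict[str, str],
    previous_hashes: dict[str, str],
    full: bool = False,
) -> list[str]:
    """Return the paths to process.

    current maps path -> file content; previous_hashes maps path -> the hash
    recorded on the last run. With full=True every file is selected.
    """
    if full:
        return sorted(current)
    return sorted(
        path
        for path, text in current.items()
        if previous_hashes.get(path) != content_hash(text)
    )


files = {"a.md": "alpha", "b.md": "beta v2", "c.md": "gamma"}
seen = {"a.md": content_hash("alpha"), "b.md": content_hash("beta v1")}

print(select_files(files, seen))             # ['b.md', 'c.md']
print(select_files(files, seen, full=True))  # ['a.md', 'b.md', 'c.md']
```

In this sketch a file with no recorded hash counts as changed, so newly added files are picked up in incremental mode.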
status(clone_path: str | None, db_path: str | None, collection: str | None) -> None
Show current pipeline status and statistics.
reset(clone_path: str | None, db_path: str | None, collection: str | None, keep_repo: bool) -> None
Reset the pipeline state and vector database.
This will clear all processed data and start fresh. Use --keep-repo to preserve the cloned repository.
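A rough sketch of what the `--keep-repo` switch implies, assuming the vector database and the clone are both plain directories on disk (the source does not confirm this). `reset_state` is a hypothetical helper, not part of Thoth.

```python
import shutil
import tempfile
from pathlib import Path


def reset_state(clone_path: Path, db_path: Path, keep_repo: bool = False) -> list[Path]:
    """Delete the database directory and, unless keep_repo is set, the clone.

    Returns the directories that were actually removed.
    """
    targets = [db_path] if keep_repo else [db_path, clone_path]
    removed = []
    for target in targets:
        if target.exists():
            shutil.rmtree(target)
            removed.append(target)
    return removed


# Demonstrate against throwaway directories so nothing real is deleted.
with tempfile.TemporaryDirectory() as tmp:
    clone = Path(tmp) / "repo"
    db = Path(tmp) / "db"
    clone.mkdir()
    db.mkdir()

    reset_state(clone, db, keep_repo=True)
    print(clone.exists(), db.exists())  # True False
```

Keeping the clone avoids fetching the whole repository again before the next ingestion run.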
search(db_path: str | None, collection: str | None, query: str, limit: int) -> None
Search the indexed handbook for relevant content.
Example: thoth search -q "How to contribute to GitLab?" -n 3
schedule(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None, interval: int, cron_hour: int | None, cron_minute: int, start_immediately: bool) -> None
Start the scheduler for automated syncs.
By default, syncs run every 60 minutes. Use --interval to change frequency, or use --cron-hour and --cron-minute for cron-style scheduling.
Examples:
Run every 30 minutes:
thoth schedule --interval 30
Run daily at 2:30 AM:
thoth schedule --cron-hour 2 --cron-minute 30
Run every hour, starting immediately:
thoth schedule --start-immediately
Press Ctrl+C to stop the scheduler.
health(clone_path: str | None, db_path: str | None) -> None
Check system health status.
Runs health checks on key components and displays the results.
sync(repo_url: str | None, clone_path: str | None, db_path: str | None, collection: str | None) -> None
Manually trigger a sync operation.
This is useful for testing the scheduler or running a one-off sync.