thoth.shared.sources
Source configuration module for multi-source ingestion.
Classes
SourceConfig | Configuration for a single data source (handbook, D&D, personal, etc.).
SourceRegistry | Registry for managing data source configurations.
- class thoth.shared.sources.SourceConfig(name: str, collection_name: str, gcs_prefix: str, supported_formats: list[str] = <factory>, description: str = '')
Bases: object
Configuration for a single data source (handbook, D&D, personal, etc.).
Each source has a unique name, a LanceDB table (collection) name, a GCS prefix for stored files, and a list of supported file extensions. Used by the ingestion pipeline and MCP server to route and filter documents.
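As an illustration of the fields described above, here is a minimal sketch that mirrors the documented signature with a stand-in dataclass. It is not the real thoth.shared.sources.SourceConfig: the `<factory>` default is assumed to produce an empty list, and the table name, prefix, and extension values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    """Stand-in mirroring the documented signature; not the real class."""

    name: str
    collection_name: str
    gcs_prefix: str
    # Assumption: the real default factory is not shown in the docs;
    # an empty list is used here.
    supported_formats: list[str] = field(default_factory=list)
    description: str = ""


# All values below are hypothetical and chosen only for illustration.
handbook = SourceConfig(
    name="handbook",
    collection_name="handbook_documents",
    gcs_prefix="handbook",
    supported_formats=[".md"],
    description="Team handbook",
)
```

Using a default factory rather than a literal list default gives each instance its own list, so appending a format to one source does not affect another.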
- class thoth.shared.sources.SourceRegistry
Bases: object
Registry for managing data source configurations.
The registry loads default configurations and supports environment variable overrides for each source's GCS prefix and collection name.
- Environment variables:
THOTH_SOURCE_{NAME}_GCS_PREFIX: Override GCS prefix for a source
THOTH_SOURCE_{NAME}_COLLECTION: Override collection name for a source
Example
THOTH_SOURCE_HANDBOOK_GCS_PREFIX=custom_handbook
THOTH_SOURCE_DND_COLLECTION=my_dnd_collection
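The override naming scheme above could be resolved roughly as follows. This is a sketch, not the registry's actual code: `resolve_override` is a hypothetical helper, and it assumes `{NAME}` is the upper-cased source name, which is consistent with the `HANDBOOK` and `DND` examples.

```python
import os


def resolve_override(source_name: str, field_suffix: str, default: str) -> str:
    """Return the THOTH_SOURCE_{NAME}_{SUFFIX} value if set, else the default.

    Hypothetical helper; assumes {NAME} is the upper-cased source name.
    """
    env_key = f"THOTH_SOURCE_{source_name.upper()}_{field_suffix}"
    return os.environ.get(env_key, default)


# Simulate the documented example override for the handbook source.
os.environ["THOTH_SOURCE_HANDBOOK_GCS_PREFIX"] = "custom_handbook"

# The second argument selects which field is being overridden;
# the third is the value to fall back to when no variable is set.
gcs_prefix = resolve_override("handbook", "GCS_PREFIX", "handbook")
```

When the variable is unset, the helper falls back to whatever default the caller passes in, so a source with no overrides keeps its built-in configuration.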
- get(name: str) → SourceConfig | None
Get source configuration by name.
- Parameters:
name – Source identifier (e.g., ‘handbook’, ‘dnd’, ‘personal’)
- Returns:
SourceConfig if found, None otherwise
- list_configs() → list[SourceConfig]
List all source configurations.
- Returns:
List of SourceConfig instances
- register(config: SourceConfig) → None
Register a new source configuration.
- Parameters:
config – SourceConfig to register
- Raises:
ValueError – If source with same name already exists
- update(config: SourceConfig) → None
Update an existing source configuration.
- Parameters:
config – SourceConfig with updated values
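The documented method contracts (get, list_configs, register, update) can be illustrated with a small self-contained stand-in. This is a sketch under stated assumptions, not the real implementation: the stand-in starts empty instead of loading default configurations, it keeps configs in a plain dict, and raising KeyError from update for an unknown name is an assumption, since the reference does not say what update does in that case. All source names and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceConfig:
    """Stand-in mirroring the documented signature; not the real class."""

    name: str
    collection_name: str
    gcs_prefix: str
    supported_formats: list[str] = field(default_factory=list)
    description: str = ""


class SourceRegistry:
    """Stand-in with the documented method names; not the real class."""

    def __init__(self) -> None:
        # Assumption: configs are keyed by their unique name.
        self._configs: dict[str, SourceConfig] = {}

    def get(self, name: str) -> Optional[SourceConfig]:
        # Documented: SourceConfig if found, None otherwise.
        return self._configs.get(name)

    def list_configs(self) -> list[SourceConfig]:
        return list(self._configs.values())

    def register(self, config: SourceConfig) -> None:
        # Documented: ValueError if a source with the same name exists.
        if config.name in self._configs:
            raise ValueError(f"Source '{config.name}' already exists")
        self._configs[config.name] = config

    def update(self, config: SourceConfig) -> None:
        # Assumption: the docs do not describe the unknown-name case.
        if config.name not in self._configs:
            raise KeyError(config.name)
        self._configs[config.name] = config


registry = SourceRegistry()
registry.register(SourceConfig("dnd", "dnd_documents", "dnd"))
registry.update(SourceConfig("dnd", "dnd_documents", "dnd_v2"))
current = registry.get("dnd")
```

Because register rejects duplicate names, changing an existing source goes through update, while get lets callers probe for a source without handling an exception.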
Modules
Source configuration for multi-source ingestion.