thoth.shared.sources

Source configuration module for multi-source ingestion.

Classes

SourceConfig(name, collection_name, ...)

Configuration for a single data source (handbook, D&D, personal, etc.).

SourceRegistry()

Registry for managing data source configurations.

class thoth.shared.sources.SourceConfig(name: str, collection_name: str, gcs_prefix: str, supported_formats: list[str] = <factory>, description: str = '')

Bases: object

Configuration for a single data source (handbook, D&D, personal, etc.).

Each source has a unique name, a LanceDB table (collection) name, a GCS prefix under which its files are stored, and a list of supported file extensions. The ingestion pipeline and the MCP server use these configurations to route and filter documents.

name

Unique identifier for the source (e.g., ‘handbook’, ‘dnd’, ‘personal’).

Type:

str

collection_name

LanceDB table name for this source (e.g., ‘handbook_documents’).

Type:

str

gcs_prefix

GCS path prefix where source files are stored in the bucket.

Type:

str

supported_formats

File extensions supported for ingestion (e.g., [‘.md’, ‘.pdf’]).

Type:

list[str]

description

Human-readable description of the source for logging and UI.

Type:

str

name: str
collection_name: str
gcs_prefix: str
supported_formats: list[str]
description: str = ''
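The field declarations above match a standard `dataclasses.dataclass`. The following is a minimal stand-alone mirror for experimenting without the `thoth` package. It is a sketch, not the module's actual source. The empty-list default for `supported_formats` is an assumption, because the documentation only shows that a default factory exists (`<factory>`). The `gcs_prefix` and `description` values are hypothetical, while the name and collection name come from the attribute descriptions.

```python
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    """Stand-alone mirror of thoth.shared.sources.SourceConfig (sketch only)."""

    name: str
    collection_name: str
    gcs_prefix: str
    # Assumption: the real default factory is undocumented; an empty list is used here.
    supported_formats: list[str] = field(default_factory=list)
    description: str = ""


handbook = SourceConfig(
    name="handbook",
    collection_name="handbook_documents",
    gcs_prefix="handbook",  # hypothetical prefix value
    supported_formats=[".md", ".pdf"],
    description="Company handbook",  # hypothetical description
)
print(handbook.collection_name)  # handbook_documents
```

Using `default_factory` rather than a literal `[]` default gives every instance its own list, so mutating one config's `supported_formats` never affects another.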
supports_format(extension: str) -> bool

Check if this source supports a file format.

Parameters:

extension – File extension including dot (e.g., ‘.md’)

Returns:

True if the format is supported, False otherwise.
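One plausible reading of `supports_format` is a membership test against `supported_formats`. The sketch below is written as a free function over a plain list so it runs on its own. It compares extensions exactly. Whether the real method normalizes case (treating `.MD` like `.md`, for example) is not documented, so treat that detail as an assumption. The file path used here is hypothetical.

```python
from pathlib import Path


def supports_format(supported_formats: list[str], extension: str) -> bool:
    """Sketch of SourceConfig.supports_format as a free function.

    `extension` must include the leading dot, e.g. ".md".
    Assumption: comparison is exact (no case folding); the real method may differ.
    """
    return extension in supported_formats


formats = [".md", ".pdf"]
# Path.suffix returns the extension with its leading dot, matching the expected input.
ext = Path("onboarding/guide.md").suffix  # hypothetical file path
print(ext, supports_format(formats, ext))  # .md True
print(supports_format(formats, ".docx"))   # False
```

Deriving the argument with `pathlib.Path.suffix` keeps the leading dot that the parameter description requires. Passing a bare `"md"` would not match.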

__init__(name: str, collection_name: str, gcs_prefix: str, supported_formats: list[str] = <factory>, description: str = '') -> None
class thoth.shared.sources.SourceRegistry

Bases: object

Registry for managing data source configurations.

The registry loads the default source configurations and supports environment-variable overrides for GCS prefixes and collection names.

Environment variables:

THOTH_SOURCE_{NAME}_GCS_PREFIX: Override the GCS prefix for a source.
THOTH_SOURCE_{NAME}_COLLECTION: Override the collection name for a source.

Example

THOTH_SOURCE_HANDBOOK_GCS_PREFIX=custom_handbook
THOTH_SOURCE_DND_COLLECTION=my_dnd_collection
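The override mechanism can be sketched as a function that looks up the two documented variables for one source and returns an updated copy. Several details are assumptions rather than documented behaviour. `{NAME}` is taken to be the source name upper-cased, which is what the `HANDBOOK` and `DND` examples suggest. Empty values are ignored. `apply_env_overrides` is a hypothetical helper name. The environment is passed in as a mapping so the example is deterministic.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field, replace


@dataclass
class SourceConfig:
    # Minimal stand-in for thoth.shared.sources.SourceConfig (sketch only).
    name: str
    collection_name: str
    gcs_prefix: str
    supported_formats: list[str] = field(default_factory=list)
    description: str = ""


def apply_env_overrides(
    config: SourceConfig, env: Mapping[str, str] = os.environ
) -> SourceConfig:
    """Hypothetical helper: return a copy of `config` with THOTH_SOURCE_{NAME}_* applied.

    Assumption: {NAME} is config.name upper-cased; empty values are ignored.
    """
    key = config.name.upper()
    changes: dict[str, str] = {}
    prefix = env.get(f"THOTH_SOURCE_{key}_GCS_PREFIX")
    if prefix:
        changes["gcs_prefix"] = prefix
    collection = env.get(f"THOTH_SOURCE_{key}_COLLECTION")
    if collection:
        changes["collection_name"] = collection
    # dataclasses.replace builds a new instance; the input config is left unchanged.
    return replace(config, **changes)


env = {"THOTH_SOURCE_DND_COLLECTION": "my_dnd_collection"}
dnd = SourceConfig("dnd", "dnd_documents", "dnd")  # hypothetical default values
print(apply_env_overrides(dnd, env).collection_name)  # my_dnd_collection
```

Returning a new instance instead of mutating in place keeps the defaults intact, which makes it easy to compare the configured and overridden values.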

__init__() -> None

Initialize the registry with the default source configurations, then apply any environment-variable overrides.

get(name: str) -> SourceConfig | None

Get source configuration by name.

Parameters:

name – Source identifier (e.g., ‘handbook’, ‘dnd’, ‘personal’)

Returns:

SourceConfig if found, None otherwise

list_sources() -> list[str]

List all registered source names.

Returns:

List of source names

list_configs() -> list[SourceConfig]

List all source configurations.

Returns:

List of SourceConfig instances

register(config: SourceConfig) -> None

Register a new source configuration.

Parameters:

config – SourceConfig to register

Raises:

ValueError – If source with same name already exists
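The lookup and registration methods documented above can be sketched with a dict keyed by source name. This is an illustration, not the real class. The built-in defaults and environment overrides that the real `__init__` loads are not documented in detail, so this sketch starts empty. The exact `ValueError` message and the example config values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    # Minimal stand-in for thoth.shared.sources.SourceConfig (sketch only).
    name: str
    collection_name: str
    gcs_prefix: str
    supported_formats: list[str] = field(default_factory=list)
    description: str = ""


class SourceRegistry:
    """Dict-backed sketch of get / list_sources / list_configs / register.

    Assumption: sources are keyed by SourceConfig.name; defaults and
    environment overrides loaded by the real __init__ are omitted.
    """

    def __init__(self) -> None:
        self._sources: dict[str, SourceConfig] = {}

    def get(self, name: str) -> SourceConfig | None:
        # dict.get returns None for unknown names, matching the documented contract.
        return self._sources.get(name)

    def list_sources(self) -> list[str]:
        return list(self._sources)

    def list_configs(self) -> list[SourceConfig]:
        return list(self._sources.values())

    def register(self, config: SourceConfig) -> None:
        # Documented behaviour: registering a duplicate name raises ValueError.
        if config.name in self._sources:
            raise ValueError(f"Source '{config.name}' already registered")
        self._sources[config.name] = config


registry = SourceRegistry()
registry.register(SourceConfig("dnd", "dnd_documents", "dnd"))  # hypothetical values
print(registry.list_sources())  # ['dnd']
print(registry.get("unknown"))  # None
```

Keying on the name makes `get` a constant-time lookup and gives `register` a direct way to detect duplicates.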

update(config: SourceConfig) -> None

Update an existing source configuration.

Parameters:

config – SourceConfig with updated values

get_all_collections() -> list[str]

Get the collection names of all registered sources.

Returns:

List of collection names from all sources
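`get_all_collections` amounts to projecting `collection_name` out of every registered config. The sketch below shows that projection over a plain list, using a reduced stand-in that carries only the two fields it needs. The `dnd` collection name reuses the override value from the environment-variable example and stands in for a registry where that override was applied.

```python
from dataclasses import dataclass


@dataclass
class SourceConfig:
    # Reduced stand-in: only the fields this sketch needs.
    name: str
    collection_name: str


def get_all_collections(configs: list[SourceConfig]) -> list[str]:
    """Sketch of SourceRegistry.get_all_collections over a list of configs."""
    return [config.collection_name for config in configs]


configs = [
    SourceConfig("handbook", "handbook_documents"),
    SourceConfig("dnd", "my_dnd_collection"),  # as if THOTH_SOURCE_DND_COLLECTION were set
]
print(get_all_collections(configs))  # ['handbook_documents', 'my_dnd_collection']
```

Because overrides are applied before lookup, a caller that iterates over this list sees the effective table names rather than the built-in defaults.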

Modules

config

Source configuration for multi-source ingestion.