thoth.shared.utils.logger¶
Structured Logging Utilities for GCP Cloud Logging and Grafana Loki.
This module provides a structured JSON logging framework that is compatible with:
- Google Cloud Logging (Cloud Run, GKE, Cloud Functions)
- Grafana Loki
- Any JSON-aware log aggregation system

Key Features:
- Structured JSON output with consistent field schema
- GCP Cloud Logging special fields (sourceLocation, trace, labels)
- Job/request correlation via JobLoggerAdapter
- Automatic sensitive data redaction
- Verbose source location (file, line, function)
- Metrics-ready numeric fields for dashboards
Example
>>> from thoth.shared.utils.logger import setup_logger, get_job_logger
>>>
>>> # Basic usage
>>> logger = setup_logger("myapp")
>>> logger.info("Server started", extra={"port": 8080})
>>>
>>> # Job-scoped logging
>>> job_logger = get_job_logger(logger, job_id="job_123", source="handbook")
>>> job_logger.info("Processing file", extra={"file_path": "docs/readme.md"})
Functions
- set_trace_context — Set the trace context for the current request/task.
- extract_trace_id_from_header — Extract trace ID from X-Cloud-Trace-Context header.
- get_trace_context — Get the current trace context.
- get_job_logger — Create a job-scoped logger adapter.
- setup_logger — Create and configure a logger with structured JSON output.
- configure_root_logger — Configure the root logger for the application.
Classes
- GCPStructuredFormatter — JSON formatter compatible with GCP Cloud Logging and Grafana Loki.
- SimpleFormatter — Simple text formatter for local development/debugging.
- JobLoggerAdapter — Logger adapter that automatically includes job context in all log messages.
- SecureLogger — Legacy SecureLogger class for backward compatibility.
- thoth.shared.utils.logger.set_trace_context(trace_id: str | None, project_id: str | None = None) → None[source]¶
Set the trace context for the current request/task.
Call this at the start of each request handler to enable log correlation.
- Parameters:
trace_id – The trace ID from X-Cloud-Trace-Context header
project_id – GCP project ID for constructing full trace URL
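Per-request trace state like this is typically held in a `contextvars.ContextVar`, which keeps it isolated between concurrent requests and tasks. A minimal stand-alone sketch of that pattern (the variable name and the `projects/<id>/traces/<id>` resource format are assumptions about how the module works, not its actual internals):

```python
import contextvars

# Hypothetical context variable; the real module keeps its own.
_trace_context = contextvars.ContextVar("trace_context", default=None)


def set_trace_context(trace_id, project_id=None):
    """Store the trace for the current request/task context."""
    if trace_id and project_id:
        # GCP Cloud Logging correlates logs via the fully qualified
        # trace resource name: projects/PROJECT_ID/traces/TRACE_ID.
        _trace_context.set(f"projects/{project_id}/traces/{trace_id}")
    else:
        _trace_context.set(trace_id)


def get_trace_context():
    """Read the trace stored for the current context, if any."""
    return _trace_context.get()
```

Because the value lives in a `ContextVar`, each asyncio task or thread sees only the trace it set for itself.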
- thoth.shared.utils.logger.extract_trace_id_from_header(header_value: str | None) → str | None[source]¶
Extract trace ID from X-Cloud-Trace-Context header.
The header format is: TRACE_ID/SPAN_ID;o=TRACE_TRUE
- Parameters:
header_value – The X-Cloud-Trace-Context header value
- Returns:
The trace ID portion, or None if header is missing/invalid
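Given the documented header format, the extraction amounts to taking everything before the first `/`. A self-contained sketch of that contract (not necessarily the module's exact implementation):

```python
def extract_trace_id_from_header(header_value):
    """Return the TRACE_ID portion of 'TRACE_ID/SPAN_ID;o=TRACE_TRUE'."""
    if not header_value:
        return None
    # Everything before the first "/" is the trace ID.
    trace_id = header_value.split("/", 1)[0].strip()
    return trace_id or None
```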
- class thoth.shared.utils.logger.GCPStructuredFormatter(*args: Any, **kwargs: Any)[source]¶
Bases: JsonFormatter

JSON formatter compatible with GCP Cloud Logging and Grafana Loki.

This formatter produces structured JSON logs with:
- Standard fields (timestamp, severity, message, logger)
- Verbose source location (pathname, filename, lineno, funcName)
- GCP special fields (sourceLocation, trace, labels)
- Custom context fields (job_id, source, operation, etc.)
- Automatic sensitive data redaction

The output is compatible with:
- GCP Cloud Logging (jsonPayload with special field recognition)
- Grafana Loki (JSON parsing and label extraction)
- Any JSON-aware log aggregation system
- Example output:
{
    "timestamp": "2026-01-30T10:15:30.123456Z",
    "severity": "INFO",
    "message": "Processing file",
    "logger": "thoth.ingestion.pipeline",
    "pathname": "/app/thoth/ingestion/pipeline.py",
    "filename": "pipeline.py",
    "lineno": 456,
    "funcName": "_process_file",
    "module": "pipeline",
    "logging.googleapis.com/sourceLocation": {
        "file": "thoth/ingestion/pipeline.py",
        "line": "456",
        "function": "_process_file"
    },
    "job_id": "job_xyz789",
    "source": "handbook"
}
- SENSITIVE_KEYWORDS: ClassVar[list[str]] = ['password', 'passwd', 'pwd', 'secret', 'token', 'apikey', 'api_key', 'auth', 'authorization', 'credential', 'key', 'private', 'session', 'cookie', 'jwt', 'bearer', 'oauth']¶
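Redaction here is keyword-based: any field whose name contains one of the `SENSITIVE_KEYWORDS` substrings gets its value masked before the record is serialized. A self-contained sketch of that idea (the `[REDACTED]` placeholder and the exact matching rules are assumptions, not the formatter's verbatim code):

```python
SENSITIVE_KEYWORDS = [
    "password", "passwd", "pwd", "secret", "token", "apikey", "api_key",
    "auth", "authorization", "credential", "key", "private", "session",
    "cookie", "jwt", "bearer", "oauth",
]


def redact_sensitive(fields):
    """Mask the value of any field whose name matches a sensitive keyword."""
    return {
        name: "[REDACTED]"
        if any(kw in name.lower() for kw in SENSITIVE_KEYWORDS)
        else value
        for name, value in fields.items()
    }
```

Substring matching is deliberately broad: it catches `api_key`, `x_api_key`, and `refresh_token` alike, at the cost of occasionally masking a harmless field whose name happens to contain a keyword.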
- class thoth.shared.utils.logger.SimpleFormatter(fmt: str | None = None, **kwargs: Any)[source]¶
Bases: Formatter

Simple text formatter for local development/debugging.
Uses a human-readable format without JSON structure. Still includes sensitive data redaction.
- SENSITIVE_KEYWORDS: ClassVar[list[str]] = ['password', 'passwd', 'pwd', 'secret', 'token', 'apikey', 'api_key', 'auth', 'authorization', 'credential', 'key', 'private', 'session', 'cookie', 'jwt', 'bearer', 'oauth']¶
- class thoth.shared.utils.logger.JobLoggerAdapter(logger: Logger, job_id: str, source: str | None = None, collection: str | None = None, **extra_context: Any)[source]¶
Bases: LoggerAdapter

Logger adapter that automatically includes job context in all log messages.
This adapter enriches log messages with job-specific context like job_id, source, and collection. Use this when processing a specific job to ensure all logs can be correlated.
Example
>>> base_logger = setup_logger("thoth.worker")
>>> job_logger = JobLoggerAdapter(base_logger, job_id="job_123", source="handbook")
>>> job_logger.info("Processing started")
>>> job_logger.info("File processed", extra={"file_path": "readme.md"})
- __init__(logger: Logger, job_id: str, source: str | None = None, collection: str | None = None, **extra_context: Any) → None[source]¶
Initialize the job logger adapter.
- Parameters:
logger – The base logger to wrap
job_id – Unique identifier for the job/run
source – Source being processed (e.g., “handbook”, “dnd”)
collection – Collection name being used
**extra_context – Additional context to include in all logs
- process(msg: str, kwargs: MutableMapping[str, Any]) → tuple[str, MutableMapping[str, Any]][source]¶
Process the log message to include job context.
- Parameters:
msg – The log message
kwargs – Keyword arguments for the log call
- Returns:
Tuple of (message, kwargs) with context added to extra
- with_operation(operation: str) → JobLoggerAdapter[source]¶
Create a child logger for a specific operation.
- Parameters:
operation – The operation name (e.g., “chunking”, “embedding”, “storing”)
- Returns:
A new JobLoggerAdapter with the operation context added
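The `with_operation` pattern can be illustrated with a plain `logging.LoggerAdapter`: the child copies the parent's context and adds an `operation` field, so nested operations never mutate the parent. A stdlib-only sketch of the pattern (`ContextAdapter` is a hypothetical stand-in, not the module's actual class):

```python
import logging


class ContextAdapter(logging.LoggerAdapter):
    """Minimal adapter that merges its context into every record's extra."""

    def process(self, msg, kwargs):
        # Start from the adapter's own context, then layer on any
        # per-call extra so call sites can still add fields.
        merged = dict(self.extra)
        merged.update(kwargs.get("extra") or {})
        kwargs["extra"] = merged
        return msg, kwargs

    def with_operation(self, operation):
        # Child adapter: parent context plus the operation name.
        return ContextAdapter(self.logger, {**self.extra, "operation": operation})


base = logging.getLogger("thoth.worker")
job = ContextAdapter(base, {"job_id": "job_123"})
chunking = job.with_operation("chunking")
```

Because `with_operation` builds a new adapter from a copied dict, `job` and `chunking` can be used concurrently without interfering with each other's context.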
- class thoth.shared.utils.logger.SecureLogger(name: str, level: int = 0)[source]¶
Bases: Logger

Legacy SecureLogger class for backward compatibility.
New code should use setup_logger() which returns a standard Logger with GCPStructuredFormatter attached.
This class is maintained for backward compatibility with existing code that checks isinstance(logger, SecureLogger).
- SENSITIVE_KEYWORDS: ClassVar[list[str]] = ['password', 'passwd', 'pwd', 'secret', 'token', 'apikey', 'api_key', 'auth', 'authorization', 'credential', 'key', 'private', 'session', 'cookie', 'jwt', 'bearer', 'oauth']¶
- debug(msg: Any, *args: Any, **kwargs: Any) → None[source]¶
Log a debug message with safe formatting.
- warning(msg: Any, *args: Any, **kwargs: Any) → None[source]¶
Log a warning message with safe formatting.
- thoth.shared.utils.logger.SensitiveDataFormatter¶
alias of SimpleFormatter
- thoth.shared.utils.logger.setup_logger(name: str, level: int = 20, simple: bool = False, json_output: bool | None = None) → Logger[source]¶
Create and configure a logger with structured JSON output.
This function creates a logger that outputs structured JSON logs compatible with GCP Cloud Logging and Grafana Loki. By default, it auto-detects whether to use JSON output based on the environment.
- Parameters:
name – Name of the logger (typically __name__)
level – Logging level (default: INFO)
simple – If True, use simple text format instead of JSON (for local dev)
json_output – Explicit control over JSON output. If None, auto-detects:
- True in Cloud Run (GCS_BUCKET_NAME set)
- True if LOG_FORMAT=json
- False otherwise (local development)
- Returns:
Configured logger instance
Example
>>> logger = setup_logger(__name__)
>>> logger.info("Server started", extra={"port": 8080})
>>> # With job context
>>> logger.info("Processing", extra={"job_id": "abc123", "source": "handbook"})
- thoth.shared.utils.logger.get_job_logger(base_logger: Logger, job_id: str, source: str | None = None, collection: str | None = None, **extra_context: Any) → JobLoggerAdapter[source]¶
Create a job-scoped logger adapter.
This is the recommended way to create loggers for job processing. All log messages will automatically include the job context.
- Parameters:
base_logger – The base logger (from setup_logger)
job_id – Unique identifier for the job
source – Source being processed (e.g., “handbook”)
collection – Collection name
**extra_context – Additional context fields
- Returns:
JobLoggerAdapter with job context
Example
>>> logger = setup_logger("thoth.worker")
>>> job_logger = get_job_logger(logger, job_id="job_123", source="handbook")
>>> job_logger.info("Starting ingestion")
>>> job_logger.info("Processed file", extra={"file_path": "readme.md", "chunks_created": 15})
- thoth.shared.utils.logger.configure_root_logger(level: int = 20, json_output: bool | None = None) → None[source]¶
Configure the root logger for the application.
Call this once at application startup to configure global logging behavior.
- Parameters:
level – Root logging level
json_output – Whether to use JSON output (auto-detects if None)