thoth.shared.monitoring
Monitoring and health check system for Thoth.
This module provides metrics tracking, health status monitoring, and alerting hooks for the ingestion pipeline and scheduled operations.
Functions
- create_default_health_checks
Create default health check functions for common components.
- dataclass
Add dunder methods based on the fields defined in the class.
- field
Return an object to identify dataclass fields.
- Create and configure a logger with structured JSON output.
Classes
- Any
Special type indicating an unconstrained type.
- Enum
Create a collection of name/value pairs.
- HealthCheck
Represents a health check result.
- HealthStatus
Enumeration of possible health statuses.
- Metrics
Tracks operational metrics.
- Monitor
Monitoring system for tracking metrics and health status.
- Path
PurePath subclass that can make system calls.
- datetime
The year, month and day arguments are required.
- class thoth.shared.monitoring.HealthCheck(name: str, status: HealthStatus, message: str, timestamp: datetime = <factory>, metadata: dict[str, Any] = <factory>)
Bases: object
Represents a health check result.
- status
Health status result
- timestamp
When the check was performed
- Type:
datetime.datetime
- status: HealthStatus
- class thoth.shared.monitoring.HealthStatus(*values)
Bases: Enum
Enumeration of possible health statuses.
- HEALTHY = 'healthy'
- DEGRADED = 'degraded'
- UNHEALTHY = 'unhealthy'
- UNKNOWN = 'unknown'
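As a rough illustration of how these two types fit together, the sketch below defines local stand-ins that mirror the documented signatures. It does not import thoth, and the real implementation may differ. The `<factory>` defaults are assumed to be "current time" and "empty dict", and the check name and message are made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any


class HealthStatus(Enum):
    """Stand-in mirroring the documented enum members."""

    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"


@dataclass
class HealthCheck:
    """Stand-in mirroring the documented HealthCheck signature."""

    name: str
    status: HealthStatus
    message: str
    # Assumption: the <factory> defaults are "now" and an empty dict.
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical check result, for illustration only.
result = HealthCheck(
    name="vector_store",
    status=HealthStatus.DEGRADED,
    message="index is stale",
)

# Enum members can be recovered from their string values,
# for example when parsing a stored report.
parsed = HealthStatus("degraded")
```

Because the enum values are plain strings, a status can be written to logs or JSON as `result.status.value` and turned back into a member later.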
- class thoth.shared.monitoring.Metrics(sync_count: int = 0, sync_success_count: int = 0, sync_failure_count: int = 0, last_sync_time: datetime | None = None, last_sync_duration: float = 0.0, total_files_processed: int = 0, total_chunks_created: int = 0, errors: list[dict[str, str]] = <factory>)
Bases: object
Tracks operational metrics.
- last_sync_time
Timestamp of last sync attempt
- Type:
datetime.datetime | None
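The field list above can be read as a plain dataclass. The following is a minimal sketch under that assumption: `Metrics` here is a local stand-in, `success_rate` is a hypothetical helper that is not part of the documented API, and the `"error"` key is illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Metrics:
    """Stand-in mirroring the documented Metrics fields."""

    sync_count: int = 0
    sync_success_count: int = 0
    sync_failure_count: int = 0
    last_sync_time: Optional[datetime] = None
    last_sync_duration: float = 0.0
    total_files_processed: int = 0
    total_chunks_created: int = 0
    # default_factory gives every instance its own list.
    errors: list[dict[str, str]] = field(default_factory=list)


def success_rate(m: Metrics) -> Optional[float]:
    """Hypothetical helper: fraction of successful syncs, or None before any sync."""
    if m.sync_count == 0:
        return None
    return m.sync_success_count / m.sync_count


a, b = Metrics(), Metrics()
a.errors.append({"error": "boom"})  # key name is illustrative
```

Using a factory for `errors` matters: a shared mutable default would leak error records from one `Metrics` instance into another.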
- class thoth.shared.monitoring.Monitor(logger_instance: Logger | None = None, max_errors: int = 100)
Bases: object
Monitoring system for tracking metrics and health status.
This class provides centralized monitoring with thread-safe metric collection, health checks, and alerting capabilities.
- metrics
Current operational metrics
- health_checks
Dictionary of registered health checks
- alert_callbacks
List of functions to call on alerts
- logger
Logger instance
- __init__(logger_instance: Logger | None = None, max_errors: int = 100)
Initialize the monitoring system.
- Parameters:
logger_instance – Optional logger instance
max_errors – Maximum number of errors to retain
- health_checks: dict[str, Callable[[], HealthCheck]]
- record_sync_success(files_processed: int, chunks_created: int, duration: float) → None
Record a successful sync operation.
- Parameters:
files_processed – Number of files processed
chunks_created – Number of chunks created
duration – Duration in seconds
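The class description mentions thread-safe metric collection. One plausible way to get that, sketched here with hypothetical `SyncCounters` and `SketchMonitor` classes, is to update every counter under a single lock. Which fields the real method touches is an assumption based on the `Metrics` field names.

```python
import threading
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SyncCounters:
    """Hypothetical subset of the documented Metrics fields."""

    sync_count: int = 0
    sync_success_count: int = 0
    last_sync_time: Optional[datetime] = None
    last_sync_duration: float = 0.0
    total_files_processed: int = 0
    total_chunks_created: int = 0


class SketchMonitor:
    """Illustrative only; not the Thoth Monitor class."""

    def __init__(self) -> None:
        self.metrics = SyncCounters()
        # Assumption: one lock guards all metric updates.
        self._lock = threading.Lock()

    def record_sync_success(
        self, files_processed: int, chunks_created: int, duration: float
    ) -> None:
        # All fields change together, so readers never see a half-updated state.
        with self._lock:
            m = self.metrics
            m.sync_count += 1
            m.sync_success_count += 1
            m.last_sync_time = datetime.now()
            m.last_sync_duration = duration
            m.total_files_processed += files_processed
            m.total_chunks_created += chunks_created


monitor = SketchMonitor()
threads = [
    threading.Thread(target=monitor.record_sync_success, args=(2, 10, 0.5))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```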
- record_sync_failure(error: Exception) → None
Record a failed sync operation.
- Parameters:
error – Exception that caused the failure
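The `max_errors` constructor argument suggests that recorded failures are kept in a bounded list. A minimal sketch of that idea follows; `append_error` is a hypothetical helper, and the record keys (`timestamp`, `type`, `message`) are assumptions rather than the documented format.

```python
from datetime import datetime, timezone


def append_error(
    errors: list[dict[str, str]], error: Exception, max_errors: int = 100
) -> None:
    """Append an error record and drop the oldest entries beyond max_errors."""
    errors.append(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "type": type(error).__name__,
            "message": str(error),
        }
    )
    # Trim in place so other references to the list see the same contents.
    if len(errors) > max_errors:
        del errors[: len(errors) - max_errors]


errors: list[dict[str, str]] = []
for i in range(5):
    append_error(errors, ValueError(f"failure {i}"), max_errors=3)
```

After the loop only the three most recent failures remain, which keeps memory use flat during a long outage.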
- register_health_check(name: str, check_function: Callable[[], HealthCheck]) → None
Register a health check function.
- Parameters:
name – Unique name for the health check
check_function – Function that returns a HealthCheck
- run_health_checks() → dict[str, HealthCheck]
Run all registered health checks.
- Returns:
Dictionary mapping check names to results
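Registered checks are zero-argument callables stored by name, so running them amounts to iterating over a dictionary. The sketch below uses simplified stand-in types and a hypothetical `run_all` function. Reporting a raising check as UNHEALTHY is an assumed policy, not confirmed behavior of the library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class HealthStatus(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


@dataclass
class HealthCheck:
    """Simplified stand-in with only the required fields."""

    name: str
    status: HealthStatus
    message: str


def run_all(checks: dict[str, Callable[[], HealthCheck]]) -> dict[str, HealthCheck]:
    """Run each check; a check that raises is reported as UNHEALTHY (assumed policy)."""
    results: dict[str, HealthCheck] = {}
    for name, check in checks.items():
        try:
            results[name] = check()
        except Exception as exc:
            results[name] = HealthCheck(
                name, HealthStatus.UNHEALTHY, f"check raised: {exc}"
            )
    return results


def ok_check() -> HealthCheck:
    return HealthCheck("ok", HealthStatus.HEALTHY, "all good")


def broken_check() -> HealthCheck:
    raise RuntimeError("disk unavailable")


results = run_all({"ok": ok_check, "broken": broken_check})
```

Containing exceptions per check means one faulty probe cannot hide the results of the others.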
- get_overall_health() → HealthStatus
Determine overall system health based on all checks.
- Returns:
Overall HealthStatus
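The documentation does not state how individual results are combined. A common choice, shown here purely as an assumption, is "worst status wins", with UNKNOWN returned when no checks have run. The `overall` function and its precedence order are hypothetical.

```python
from enum import Enum
from typing import Iterable


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"


def overall(statuses: Iterable[HealthStatus]) -> HealthStatus:
    """Combine statuses with an assumed precedence: UNHEALTHY, DEGRADED, UNKNOWN, HEALTHY."""
    seen = set(statuses)
    if not seen:
        # No checks registered: nothing is known about the system.
        return HealthStatus.UNKNOWN
    for candidate in (
        HealthStatus.UNHEALTHY,
        HealthStatus.DEGRADED,
        HealthStatus.UNKNOWN,
    ):
        if candidate in seen:
            return candidate
    return HealthStatus.HEALTHY


mixed = overall([HealthStatus.HEALTHY, HealthStatus.DEGRADED, HealthStatus.HEALTHY])
```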
- get_health_report() → dict[str, Any]
Generate a comprehensive health report.
- Returns:
Dictionary containing overall health and individual checks
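A report built from HealthStatus members and datetime timestamps is not directly JSON-serializable. The sketch below shows one way to encode such a dictionary with the standard `json` module. The report layout and key names are assumptions for illustration, not the documented structure, and `to_jsonable` is a hypothetical helper.

```python
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class HealthStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"


def to_jsonable(value: Any) -> Any:
    """Convert enum members and datetimes into JSON-friendly values."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


# Hypothetical report shape; the real keys may differ.
report = {
    "overall": HealthStatus.DEGRADED,
    "checks": {
        "repository": {
            "status": HealthStatus.DEGRADED,
            "message": "working tree is behind remote",
            "timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
        }
    },
}

# json.dumps calls `default` only for values it cannot encode itself.
encoded = json.dumps(report, default=to_jsonable)
decoded = json.loads(encoded)
```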
- get_metrics() → dict[str, Any]
Get current metrics snapshot.
- Returns:
Dictionary containing current metrics
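A snapshot is most useful when it is detached from the live counters. Assuming the metrics are held in a dataclass, `dataclasses.asdict` produces such a copy, because it recursively copies nested lists and dicts. `Counters` and `snapshot` below are hypothetical names.

```python
import threading
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Counters:
    """Hypothetical subset of the documented Metrics fields."""

    sync_count: int = 0
    errors: list[dict[str, str]] = field(default_factory=list)


_lock = threading.Lock()
live = Counters(sync_count=2, errors=[{"message": "timeout"}])


def snapshot() -> dict[str, Any]:
    """Return a detached copy of the live counters."""
    with _lock:
        # asdict recursively copies nested lists and dicts.
        return asdict(live)


snap = snapshot()
# Mutating the snapshot leaves the live counters untouched.
snap["errors"].append({"message": "added to the copy only"})
snap["sync_count"] = 99
```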
- add_alert_callback(callback: Callable[[str, dict[str, Any]], None]) → None
Add a callback function for alerts.
The callback is invoked as callback(alert_type, data) whenever an alert is triggered.
- Parameters:
callback – Function to call on alerts
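To make the callback contract concrete, here is a small dispatcher sketch. `AlertHub` and its `trigger` method are hypothetical, the `"sync_failure"` alert type is made up, and swallowing callback exceptions is an assumed policy rather than documented behavior.

```python
from typing import Any, Callable

AlertCallback = Callable[[str, dict[str, Any]], None]


class AlertHub:
    """Illustrative dispatcher; not the Thoth Monitor class."""

    def __init__(self) -> None:
        self.alert_callbacks: list[AlertCallback] = []

    def add_alert_callback(self, callback: AlertCallback) -> None:
        self.alert_callbacks.append(callback)

    def trigger(self, alert_type: str, data: dict[str, Any]) -> None:
        """Hypothetical: call every callback, ignoring individual failures."""
        for callback in self.alert_callbacks:
            try:
                callback(alert_type, data)
            except Exception:
                # Assumed policy: a broken callback must not block the others.
                continue


received: list[tuple[str, dict[str, Any]]] = []


def failing(alert_type: str, data: dict[str, Any]) -> None:
    raise RuntimeError("pager offline")


def collect(alert_type: str, data: dict[str, Any]) -> None:
    received.append((alert_type, data))


hub = AlertHub()
hub.add_alert_callback(failing)
hub.add_alert_callback(collect)
hub.trigger("sync_failure", {"error": "timeout"})
```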
- thoth.shared.monitoring.create_default_health_checks(vector_store_path: Path, repo_path: Path) → dict[str, Callable[[], HealthCheck]]
Create default health check functions for common components.
- Parameters:
vector_store_path – Path to vector store database
repo_path – Path to repository
- Returns:
Dictionary of health check functions
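What the default checks actually verify is not described here. A plausible minimal version, sketched under the assumption that each check only tests whether its path exists, builds one closure per path. The function names and the dictionary keys `"vector_store"` and `"repository"` are hypothetical.

```python
import tempfile
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Callable


class HealthStatus(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


@dataclass
class HealthCheck:
    """Simplified stand-in with only the required fields."""

    name: str
    status: HealthStatus
    message: str


def make_path_check(name: str, path: Path) -> Callable[[], HealthCheck]:
    """Return a zero-argument check that captures `name` and `path`."""

    def check() -> HealthCheck:
        if path.exists():
            return HealthCheck(name, HealthStatus.HEALTHY, f"{path} exists")
        return HealthCheck(name, HealthStatus.UNHEALTHY, f"{path} is missing")

    return check


def sketch_default_checks(
    vector_store_path: Path, repo_path: Path
) -> dict[str, Callable[[], HealthCheck]]:
    # Assumption: one existence check per configured path.
    return {
        "vector_store": make_path_check("vector_store", vector_store_path),
        "repository": make_path_check("repository", repo_path),
    }


with tempfile.TemporaryDirectory() as tmp:
    checks = sketch_default_checks(Path(tmp), Path(tmp) / "missing-repo")
    results = {name: check() for name, check in checks.items()}
```

Returning closures keeps the check signature at `Callable[[], HealthCheck]`, so the result can be passed to `register_health_check` one entry at a time.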