thoth.ingestion.repo_manager
Repository manager for cloning and tracking the GitLab handbook.
Functions
Create and configure a logger with structured JSON output.
Classes
Any – Special type indicating an unconstrained type.
CloneProgress – Progress handler for git clone operations.
HandbookRepoManager – Manages the GitLab handbook repository.
Path – PurePath subclass that can make system calls.
RemoteProgress – Handler providing an interface to parse progress information emitted by git-push(1) and git-fetch(1) and to dispatch callbacks allowing subclasses to react to the progress.
Repo – Represents a git repository and allows you to query references, create commit information, generate diffs, create and clone repositories, and query the log.
Exceptions
GitCommandError – Thrown if execution of the git command fails with non-zero status code.
InvalidGitRepositoryError – Thrown if the given repository appears to have an invalid format.
- class thoth.ingestion.repo_manager.CloneProgress(logger: Logger | LoggerAdapter)
Bases: RemoteProgress
Progress handler for git clone operations.
Logs progress updates during clone/fetch operations to provide visibility into long-running git operations.
- OP_NAMES: ClassVar[dict[int, str]] = {4: 'Counting objects', 8: 'Compressing objects', 16: 'Writing objects', 32: 'Receiving objects', 64: 'Resolving deltas', 128: 'Finding sources', 256: 'Checking out files'}
- __init__(logger: Logger | LoggerAdapter) → None
Initialize the progress handler.
- Parameters:
logger – Logger instance for progress messages
- update(op_code: int, cur_count: str | float, max_count: str | float | None = None, message: str = '') → None
Called for each progress update from git.
- Parameters:
op_code – Operation code indicating the current stage
cur_count – Current progress count
max_count – Maximum count (if known)
message – Optional message from git
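As an illustration of how an update could be turned into a log line, here is a minimal sketch that uses only the standard library. `describe_progress` is a hypothetical helper, not part of this module. It assumes that `op_code` may carry GitPython's stage bits (BEGIN = 1, END = 2) alongside the operation bit, and that these are masked out before the `OP_NAMES` lookup. The message layout is also an assumption.

```python
# Operation names copied from the OP_NAMES table documented above.
OP_NAMES = {
    4: "Counting objects",
    8: "Compressing objects",
    16: "Writing objects",
    32: "Receiving objects",
    64: "Resolving deltas",
    128: "Finding sources",
    256: "Checking out files",
}

# Assumption: stage bits follow GitPython's RemoteProgress (BEGIN = 1, END = 2).
STAGE_BITS = 1 | 2


def describe_progress(op_code, cur_count, max_count=None, message=""):
    """Hypothetical helper: format one progress update as a log line."""
    # Drop the BEGIN/END stage bits so only the operation bit remains.
    name = OP_NAMES.get(op_code & ~STAGE_BITS, "Unknown operation")
    current = float(cur_count)  # counts may arrive as str or float
    if max_count:
        total = float(max_count)
        percent = int(current / total * 100)
        text = f"{name}: {percent}% ({int(current)}/{int(total)})"
    else:
        # No known maximum: report the raw count only.
        text = f"{name}: {int(current)}"
    return f"{text}, {message}" if message else text
```

A subclass of RemoteProgress could call such a helper from `update()` and pass the result to its logger.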
- class thoth.ingestion.repo_manager.HandbookRepoManager(repo_url: str = 'https://gitlab.com/gitlab-com/content-sites/handbook.git', clone_path: Path | None = None, logger: Logger | LoggerAdapter | None = None)
Bases: object
Manages the GitLab handbook repository.
- __init__(repo_url: str = 'https://gitlab.com/gitlab-com/content-sites/handbook.git', clone_path: Path | None = None, logger: Logger | LoggerAdapter | None = None)
Initialize the repository manager.
- Parameters:
repo_url – URL of the GitLab handbook repository
clone_path – Local path to clone/store the repository
logger – Logger instance for logging messages
- logger: Logger | LoggerAdapter
- is_valid_repo() → bool
Check if clone_path contains a valid git repository.
- Returns:
True if a valid repository exists, False otherwise
- clone_handbook(force: bool = False, max_retries: int = 3, retry_delay: int = 5, shallow: bool = True) → Path
Clone the GitLab handbook repository.
- Parameters:
force – If True, remove existing repository and re-clone
max_retries – Maximum number of clone attempts
retry_delay – Delay in seconds between retries
shallow – If True, perform a shallow clone (depth=1), which fetches only the latest commit and significantly reduces clone time for large repositories
- Returns:
Path to the cloned repository
- Raises:
RuntimeError – If the repository already exists and force=False
GitCommandError – If cloning fails after all retries
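The interplay of `max_retries` and `retry_delay` can be sketched as a generic retry loop. This is not the module's actual implementation. `clone_with_retries` and its `clone_once` and `sleep` parameters are hypothetical, and `OSError` stands in for `GitCommandError` so the example runs without GitPython installed.

```python
import time


def clone_with_retries(clone_once, max_retries=3, retry_delay=5, sleep=time.sleep):
    """Call clone_once() until it succeeds or max_retries attempts are used.

    Hypothetical sketch: OSError stands in for GitCommandError.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return clone_once()
        except OSError as exc:
            last_error = exc
            # Wait between attempts, but not after the final failure.
            if attempt < max_retries:
                sleep(retry_delay)
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

Passing `sleep` as a parameter keeps the loop testable without real delays.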
- update_repository() → bool
Update the repository by pulling latest changes.
For shallow clones, this fetches only the latest changes while maintaining the shallow history.
- Returns:
True if update successful, False otherwise
- Raises:
RuntimeError – If repository doesn’t exist
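Because `update_repository()` raises RuntimeError when no repository exists, callers may want to check `is_valid_repo()` first and clone if needed. The sketch below shows one way to combine the three documented methods. `ensure_repo` is a hypothetical helper, and the returned status strings are assumptions chosen for illustration.

```python
def ensure_repo(manager):
    """Hypothetical helper: clone on first use, otherwise pull updates.

    `manager` is any object exposing is_valid_repo(), clone_handbook()
    and update_repository() as documented for HandbookRepoManager.
    """
    if not manager.is_valid_repo():
        # No usable clone yet, so updating would raise RuntimeError.
        manager.clone_handbook()
        return "cloned"
    # update_repository() reports failure through its return value.
    return "updated" if manager.update_repository() else "update failed"
```

Taking the manager as a parameter lets the same function be exercised with a stub object in tests.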
- get_current_commit() → str | None
Get the current commit SHA of the repository.
- Returns:
Commit SHA as a string, or None if an error occurs
- Raises:
RuntimeError – If repository doesn’t exist
- save_metadata(commit_sha: str) → bool
Save repository metadata to a JSON file.
- Parameters:
commit_sha – Current commit SHA to save
- Returns:
True if save successful, False otherwise
- load_metadata() → dict[str, Any] | None
Load repository metadata from JSON file.
- Returns:
Metadata dictionary with the keys commit_sha, clone_path, and repo_url, or None if an error occurs
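The save/load pair can be pictured as a JSON round trip. The sketch below is an assumption about the general shape, not the module's code. `write_metadata` and `read_metadata` are hypothetical stand-ins, and the file location is supplied by the caller, since the page does not state where the metadata file lives. Only the three keys named above are taken from the documentation.

```python
import json
from pathlib import Path


def write_metadata(path, commit_sha, clone_path, repo_url):
    """Hypothetical stand-in for save_metadata(): True on success."""
    record = {
        "commit_sha": commit_sha,
        "clone_path": str(clone_path),  # Path objects are not JSON-serializable
        "repo_url": repo_url,
    }
    try:
        Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")
    except OSError:
        return False
    return True


def read_metadata(path):
    """Hypothetical stand-in for load_metadata(): dict, or None on error."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # A missing or corrupt file is reported as None, not raised.
        return None
```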
- get_changed_files(since_commit: str) → list[str] | None
Get list of files changed since a specific commit.
Note: For shallow clones, this may fail if the comparison commit is not in the shallow history. In this case, None is returned and callers should fall back to full processing.
- Parameters:
since_commit – Commit SHA to compare against
- Returns:
List of changed file paths, or None if an error occurs
- Raises:
RuntimeError – If repository doesn’t exist
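The fallback described in the note above depends on telling None (the change set is unknown) apart from an empty list (nothing changed). Here is a minimal sketch of that distinction. `files_to_process` and its `all_files` argument are hypothetical names introduced for illustration.

```python
def files_to_process(changed, all_files):
    """Hypothetical helper: choose between incremental and full processing.

    `changed` is the value returned by get_changed_files();
    `all_files` is the caller's full list of candidate paths.
    """
    if changed is None:
        # The diff failed (for example, the commit is outside the shallow
        # history), so fall back to processing everything.
        return list(all_files)
    # An empty list is a valid answer meaning "nothing changed".
    return list(changed)
```

Testing with `if not changed:` would treat "nothing changed" as a failure and trigger a full run needlessly, which is why the check is `is None`.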
- get_file_changes(since_commit: str) → dict[str, list[str]] | None
Get categorized file changes since a specific commit.
Note: For shallow clones, this may fail if the comparison commit is not in the shallow history. In this case, None is returned and callers should fall back to full processing.
- Parameters:
since_commit – Commit SHA to compare against
- Returns:
Dictionary with the keys ‘added’, ‘modified’, and ‘deleted’, each containing a list of file paths, or None if an error occurs
- Raises:
RuntimeError – If repository doesn’t exist
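To show what the categorized result might look like, the following sketch builds the same three-key dictionary from text in the tab-separated format printed by `git diff --name-status`. This page does not say how the method obtains its data, so the parsing approach is an assumption. `categorize_changes` is a hypothetical helper, and treating a rename as a deletion plus an addition is a choice made here, not documented behavior.

```python
def categorize_changes(name_status_output):
    """Hypothetical helper: group `git diff --name-status` lines by change type."""
    changes = {"added": [], "modified": [], "deleted": []}
    for line in name_status_output.splitlines():
        status, *paths = line.split("\t")
        if not paths:
            continue  # blank or malformed line
        if status == "A":
            changes["added"].append(paths[0])
        elif status == "M":
            changes["modified"].append(paths[0])
        elif status == "D":
            changes["deleted"].append(paths[0])
        elif status.startswith("R") and len(paths) == 2:
            # Assumption: a rename (e.g. "R100") counts as delete + add.
            changes["deleted"].append(paths[0])
            changes["added"].append(paths[1])
    return changes
```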