thoth.ingestion.repo_manager

Repository manager for cloning and tracking the GitLab handbook.

Classes

Any(*args, **kwargs)

Special type indicating an unconstrained type.

HandbookRepoManager([repo_url, clone_path, ...])

Manages the GitLab handbook repository.

Path(*args, **kwargs)

PurePath subclass that can make system calls.

Repo(path, odbt, search_parent_directories, ...)

Represents a git repository and allows you to query references, create commit information, generate diffs, create and clone repositories, and query the log.

Exceptions

GitCommandError(command[, status, stderr, ...])

Thrown if execution of the git command fails with non-zero status code.

InvalidGitRepositoryError

Thrown if the given repository appears to have an invalid format.

class thoth.ingestion.repo_manager.HandbookRepoManager(repo_url: str = 'https://gitlab.com/gitlab-com/content-sites/handbook.git', clone_path: Path | None = None, logger: Logger | None = None)

Bases: object

Manages the GitLab handbook repository.

__init__(repo_url: str = 'https://gitlab.com/gitlab-com/content-sites/handbook.git', clone_path: Path | None = None, logger: Logger | None = None)

Initialize the repository manager.

Parameters:
  • repo_url – URL of the GitLab handbook repository

  • clone_path – Local path to clone/store the repository

  • logger – Logger instance for logging messages

clone_handbook(force: bool = False, max_retries: int = 3, retry_delay: int = 5) → Path

Clone the GitLab handbook repository.

Parameters:
  • force – If True, remove existing repository and re-clone

  • max_retries – Maximum number of clone attempts

  • retry_delay – Delay in seconds between retries

Returns:

Path to the cloned repository

Raises:
  • RuntimeError – If repository exists and force=False

  • GitCommandError – If cloning fails after all retries
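The retry behaviour implied by max_retries and retry_delay could look roughly like the sketch below. The helper name run_with_retries and its sleep parameter are hypothetical and not part of thoth. The real method may differ, for example in which exceptions it retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    action: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 5,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call `action` up to `max_retries` times, pausing between failed attempts."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_retries:
                raise
            sleep(retry_delay)
    raise AssertionError("unreachable")
```

With the defaults shown in the signature above, a clone that keeps failing would be attempted three times with two pauses of five seconds before the error reaches the caller.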

update_repository() → bool

Update the repository by pulling latest changes.

Returns:

True if the update succeeded, False otherwise

Raises:

RuntimeError – If repository doesn’t exist

get_current_commit() → str | None

Get the current commit SHA of the repository.

Returns:

Commit SHA as a string, or None if an error occurs

Raises:

RuntimeError – If repository doesn’t exist

save_metadata(commit_sha: str) → bool

Save repository metadata to a JSON file.

Parameters:

commit_sha – Current commit SHA to save

Returns:

True if the save succeeded, False otherwise

load_metadata() → dict[str, Any] | None

Load repository metadata from JSON file.

Returns:

Metadata dictionary with the keys commit_sha, clone_path, and repo_url, or None if an error occurs
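As a rough illustration of the save/load round trip, the sketch below writes and reads a JSON file holding the three documented keys. The helper names write_metadata and read_metadata, the explicit path argument, and the choice to store clone_path as a string are assumptions made for this example. The actual file name and location used by HandbookRepoManager are not stated here.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def write_metadata(path: Path, commit_sha: str, clone_path: Path, repo_url: str) -> bool:
    """Write the three documented keys to `path` as JSON; return False on I/O errors."""
    payload = {
        "commit_sha": commit_sha,
        "clone_path": str(clone_path),  # Path objects are not JSON-serializable
        "repo_url": repo_url,
    }
    try:
        path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    except OSError:
        return False
    return True


def read_metadata(path: Path) -> dict[str, Any] | None:
    """Read metadata back; return None if the file is missing, unreadable, or malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None
```

Returning a bool or None instead of raising mirrors the return contracts documented for save_metadata and load_metadata, so callers can treat a missing metadata file as "no previous run".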

get_changed_files(since_commit: str) → list[str] | None

Get list of files changed since a specific commit.

Parameters:

since_commit – Commit SHA to compare against

Returns:

List of changed file paths, or None if an error occurs

Raises:

RuntimeError – If repository doesn’t exist
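The methods above can be combined into an incremental update step. The sketch below is one possible arrangement, not the project's own pipeline: RepoManagerLike and files_to_reingest are hypothetical names, and the Protocol only mirrors the documented signatures so the function can be exercised without a real clone.

```python
from __future__ import annotations

from typing import Any, Protocol


class RepoManagerLike(Protocol):
    """Structural type mirroring the documented methods (hypothetical name)."""

    def update_repository(self) -> bool: ...
    def get_current_commit(self) -> str | None: ...
    def load_metadata(self) -> dict[str, Any] | None: ...
    def get_changed_files(self, since_commit: str) -> list[str] | None: ...


def files_to_reingest(manager: RepoManagerLike) -> list[str] | None:
    """Pull, then list files changed since the last saved commit.

    Returns None when no baseline exists or any step fails, signalling
    that the caller should fall back to a full ingest.
    """
    previous = manager.load_metadata()
    if previous is None or "commit_sha" not in previous:
        return None  # no baseline to diff against
    if not manager.update_repository():
        return None
    current = manager.get_current_commit()
    if current is None:
        return None
    if current == previous["commit_sha"]:
        return []  # nothing new since the last run
    return manager.get_changed_files(previous["commit_sha"])
```

A caller would typically invoke save_metadata with the new commit SHA only after the returned files have been processed, so that a failed run is retried from the old baseline. The RuntimeError documented for a missing repository is deliberately left uncaught here.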