Repo manager
thoth.ingestion.repo_manager
Repository manager for cloning and tracking the GitLab handbook.
Module attributes:
| Name | Value |
|---|---|
| `DEFAULT_REPO_URL` | `'https://gitlab.com/gitlab-com/content-sites/handbook.git'` |
| `DEFAULT_CLONE_PATH` | `Path.home() / '.thoth' / 'handbook'` |
| `METADATA_FILE` | `'repo_metadata.json'` |
| `MSG_REPO_EXISTS` | `'Repository already exists at {path}. Use force=True to re-clone.'` |
| `MSG_CLONE_FAILED` | `'Failed to clone repository after {attempts} attempts'` |
| `MSG_UPDATE_FAILED` | `'Failed to update repository'` |
| `MSG_NO_REPO` | `'No repository found at {path}. Clone the repository first.'` |
| `MSG_METADATA_SAVE_FAILED` | `'Failed to save metadata'` |
| `MSG_METADATA_LOAD_FAILED` | `'Failed to load metadata'` |
| `MSG_DIFF_FAILED` | `'Failed to get changed files'` |
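The message constants that contain `{path}` or `{attempts}` are `str.format` templates. A minimal sketch of how such a template can be filled in, assuming the module formats them with keyword arguments; the constant values are copied from above and the example path is hypothetical:

```python
from pathlib import PurePosixPath

# Values copied from the module attributes documented above.
MSG_REPO_EXISTS = "Repository already exists at {path}. Use force=True to re-clone."
MSG_CLONE_FAILED = "Failed to clone repository after {attempts} attempts"

# Hypothetical clone location, used only for illustration.
example_path = PurePosixPath("/tmp/handbook")

exists_msg = MSG_REPO_EXISTS.format(path=example_path)
failed_msg = MSG_CLONE_FAILED.format(attempts=3)

print(exists_msg)
print(failed_msg)
```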
CloneProgress
Progress handler for git clone operations.
Logs progress updates during clone/fetch operations to provide visibility into long-running git operations.
Attributes:
| Name | Kind | Value |
|---|---|---|
| `OP_NAMES: dict[int, str]` | class attribute | `{RemoteProgress.COUNTING: 'Counting objects', RemoteProgress.COMPRESSING: 'Compressing objects', RemoteProgress.WRITING: 'Writing objects', RemoteProgress.RECEIVING: 'Receiving objects', RemoteProgress.RESOLVING: 'Resolving deltas', RemoteProgress.FINDING_SOURCES: 'Finding sources', RemoteProgress.CHECKING_OUT: 'Checking out files'}` |
| `logger` | instance attribute | `logger` |
__init__(logger: logging.Logger | logging.LoggerAdapter) -> None
Initialize the progress handler.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `logger` | `Logger \| LoggerAdapter` | Logger instance for progress messages | required |
update(op_code: int, cur_count: str | float, max_count: str | float | None = None, message: str = '') -> None
Called for each progress update from git.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `op_code` | `int` | Operation code indicating the current stage | required |
| `cur_count` | `str \| float` | Current progress count | required |
| `max_count` | `str \| float \| None` | Maximum count (if known) | `None` |
| `message` | `str` | Optional message from git | `''` |
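To illustrate how the arguments of `update` could be turned into a log line, here is a minimal sketch that does not import GitPython. The numeric op codes, the `OP_MASK` handling, and the helper name `describe_progress` are assumptions made for illustration; the real handler reads its codes from `RemoteProgress` and may format its messages differently.

```python
# Stand-in op codes (assumed values); the real ones live on git.RemoteProgress.
BEGIN, END = 1, 2
COUNTING, RECEIVING = 4, 32
OP_MASK = ~(BEGIN | END)  # strips the stage flags, leaving the operation

# Reduced copy of the OP_NAMES mapping documented above.
OP_NAMES = {COUNTING: "Counting objects", RECEIVING: "Receiving objects"}


def describe_progress(op_code, cur_count, max_count=None, message=""):
    """Build a human-readable progress line (hypothetical helper)."""
    op_name = OP_NAMES.get(op_code & OP_MASK, "Unknown operation")
    if max_count:
        # Counts may arrive as strings or floats, so convert before dividing.
        percent = float(cur_count) / float(max_count) * 100
        text = f"{op_name}: {percent:.0f}% ({cur_count}/{max_count})"
    else:
        text = f"{op_name}: {cur_count}"
    return f"{text} {message}" if message else text


print(describe_progress(RECEIVING | BEGIN, 50, 100))
print(describe_progress(COUNTING, 7))
```

A handler built this way would pass the resulting string to `logger.info` or a similar call inside `update`.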
HandbookRepoManager
Manages the GitLab handbook repository.
Attributes:
| Name | Kind | Value |
|---|---|---|
| `repo_url` | instance attribute | `repo_url` |
| `clone_path` | instance attribute | `clone_path or DEFAULT_CLONE_PATH` |
| `metadata_path` | instance attribute | `self.clone_path.parent / METADATA_FILE` |
| `logger: logging.Logger \| logging.LoggerAdapter` | instance attribute | `logger or setup_logger(__name__)` |
__init__(repo_url: str = DEFAULT_REPO_URL, clone_path: Path | None = None, logger: logging.Logger | logging.LoggerAdapter | None = None)
Initialize the repository manager.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `repo_url` | `str` | URL of the GitLab handbook repository | `DEFAULT_REPO_URL` |
| `clone_path` | `Path \| None` | Local path to clone/store the repository | `None` |
| `logger` | `Logger \| LoggerAdapter \| None` | Logger instance for logging messages | `None` |
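The metadata file is stored next to the clone directory rather than inside it, because `metadata_path` is derived from the parent of `clone_path`. A small sketch of that derivation; the helper name `metadata_path_for` and the example path are hypothetical:

```python
from pathlib import PurePosixPath

METADATA_FILE = "repo_metadata.json"  # value copied from the module attribute


def metadata_path_for(clone_path):
    """Mirror the documented expression: clone_path.parent / METADATA_FILE."""
    return clone_path.parent / METADATA_FILE


# Hypothetical clone location, used only for illustration.
clone_path = PurePosixPath("/home/user/.thoth/handbook")
metadata_path = metadata_path_for(clone_path)
print(metadata_path)  # /home/user/.thoth/repo_metadata.json
```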
is_valid_repo() -> bool
Check whether clone_path contains a valid git repository.
Returns:
| Type | Description |
|---|---|
| `bool` | True if a valid repository exists, False otherwise |
clone_handbook(force: bool = False, max_retries: int = 3, retry_delay: int = 5, shallow: bool = True) -> Path
Clone the GitLab handbook repository.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `force` | `bool` | If True, remove existing repository and re-clone | `False` |
| `max_retries` | `int` | Maximum number of clone attempts | `3` |
| `retry_delay` | `int` | Delay in seconds between retries | `5` |
| `shallow` | `bool` | If True, perform shallow clone (depth=1) for faster cloning. Shallow clones only fetch the latest commit, significantly reducing clone time for large repositories. | `True` |

Returns:
| Type | Description |
|---|---|
| `Path` | Path to the cloned repository |

Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If repository exists and force=False |
| `GitCommandError` | If cloning fails after all retries |
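The `max_retries` and `retry_delay` parameters describe a bounded retry loop. The following is a generic sketch of that pattern, not the method's actual code: the helper `retry`, its `retry_on` parameter, and the use of `OSError` as a stand-in for `GitCommandError` are all assumptions for illustration.

```python
import time


def retry(operation, max_retries=3, retry_delay=5, retry_on=(OSError,)):
    """Call operation() up to max_retries times, pausing between failures."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_retries:
                raise  # out of attempts: let the last error propagate
            time.sleep(retry_delay)


# Hypothetical flaky operation that succeeds on its third call.
attempts = []


def flaky_clone():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError("temporary network failure")
    return "cloned"


result = retry(flaky_clone, max_retries=3, retry_delay=0)
print(result, len(attempts))
```

Re-raising the last error after the final attempt matches the documented behavior of surfacing the underlying failure once all retries are exhausted.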
update_repository() -> bool
Update the repository by pulling the latest changes.
For shallow clones, this fetches only the latest changes while maintaining the shallow history.
Returns:
| Type | Description |
|---|---|
| `bool` | True if the update succeeded, False otherwise |

Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If the repository doesn't exist |
get_current_commit() -> str | None
Get the current commit SHA of the repository.
Returns:
| Type | Description |
|---|---|
| `str \| None` | Commit SHA as a string, or None if an error occurs |

Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If the repository doesn't exist |
save_metadata(commit_sha: str) -> bool
Save repository metadata to a JSON file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `commit_sha` | `str` | Current commit SHA to save | required |

Returns:
| Type | Description |
|---|---|
| `bool` | True if the save succeeded, False otherwise |
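One plausible way to combine `is_valid_repo`, `clone_handbook`, `update_repository`, `get_current_commit`, and `save_metadata` is a sync routine that clones on the first run, updates afterwards, and then records the current commit. This workflow is an assumption, not taken from the thoth source; `sync_repo` and `FakeManager` are hypothetical names, and the fake exists only so the sketch runs without thoth or git installed.

```python
def sync_repo(manager):
    """Clone or update, then record the commit SHA. Returns the SHA or None."""
    if manager.is_valid_repo():
        if not manager.update_repository():
            return None  # update failed; leave stored metadata untouched
    else:
        manager.clone_handbook()
    sha = manager.get_current_commit()
    if sha is not None:
        manager.save_metadata(sha)
    return sha


class FakeManager:
    """Hypothetical stand-in exposing the documented method names."""

    def __init__(self, has_repo):
        self.has_repo = has_repo
        self.calls = []

    def is_valid_repo(self):
        return self.has_repo

    def clone_handbook(self):
        self.calls.append("clone")
        self.has_repo = True

    def update_repository(self):
        self.calls.append("update")
        return True

    def get_current_commit(self):
        return "abc123"

    def save_metadata(self, commit_sha):
        self.calls.append(f"save:{commit_sha}")
        return True


fresh = FakeManager(has_repo=False)
existing = FakeManager(has_repo=True)
print(sync_repo(fresh), fresh.calls)
print(sync_repo(existing), existing.calls)
```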
load_metadata() -> dict[str, Any] | None
Load repository metadata from the JSON file.
Returns:
| Type | Description |
|---|---|
| `dict[str, Any] \| None` | Metadata dictionary with commit_sha, clone_path, repo_url, or None if an error occurs |
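The save/load pair amounts to a JSON round trip that reports failure through its return value instead of raising. A minimal sketch under that assumption, using only the three keys named above; the function names, the exact file layout, and the example values are hypothetical, and the real file may contain additional fields.

```python
import json
import tempfile
from pathlib import Path


def write_metadata_sketch(metadata_path, commit_sha, clone_path, repo_url):
    """Write the metadata keys as JSON; return True on success, False on I/O error."""
    payload = {
        "commit_sha": commit_sha,
        "clone_path": str(clone_path),
        "repo_url": repo_url,
    }
    try:
        metadata_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    except OSError:
        return False
    return True


def read_metadata_sketch(metadata_path):
    """Return the stored dictionary, or None if the file is missing or invalid."""
    try:
        return json.loads(metadata_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "repo_metadata.json"
    missing = read_metadata_sketch(path)  # nothing saved yet
    saved = write_metadata_sketch(
        path, "abc123", Path(tmp) / "handbook", "https://example.invalid/repo.git"
    )
    loaded = read_metadata_sketch(path)

print(missing, saved, loaded["commit_sha"])
```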
get_changed_files(since_commit: str) -> list[str] | None
Get the list of files changed since a specific commit.
Note: For shallow clones, this may fail if the comparison commit is not in the shallow history. In this case, None is returned and callers should fall back to full processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `since_commit` | `str` | Commit SHA to compare against | required |

Returns:
| Type | Description |
|---|---|
| `list[str] \| None` | List of changed file paths, or None if an error occurs |

Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If the repository doesn't exist |
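The fallback described in the note can be sketched as follows. The helper `plan_ingestion` and the file names are hypothetical; the point is that `None` (diff unavailable) and an empty list (nothing changed) must be treated differently.

```python
def plan_ingestion(changed_files, all_files):
    """Choose between incremental and full processing (hypothetical helper)."""
    if changed_files is None:
        # The diff failed, e.g. the commit is outside the shallow history.
        return "full", sorted(all_files)
    # An empty list is a valid result: nothing changed, nothing to process.
    return "incremental", sorted(changed_files)


all_files = ["b.md", "a.md", "c.md"]
print(plan_ingestion(None, all_files))
print(plan_ingestion(["c.md"], all_files))
print(plan_ingestion([], all_files))
```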
get_file_changes(since_commit: str) -> dict[str, list[str]] | None
Get categorized file changes since a specific commit.
Note: For shallow clones, this may fail if the comparison commit is not in the shallow history. In this case, None is returned and callers should fall back to full processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `since_commit` | `str` | Commit SHA to compare against | required |

Returns:
| Type | Description |
|---|---|
| `dict[str, list[str]] \| None` | Dictionary with keys 'added', 'modified', 'deleted' containing lists of file paths, or None if error occurs |

Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If repository doesn't exist |
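A caller might consume the categorized result by reindexing added and modified files and dropping deleted ones. This is a sketch of one such consumer, assuming only the three documented keys; `split_work` and the sample paths are hypothetical.

```python
def split_work(changes):
    """Map categorized changes to (files_to_index, files_to_remove).

    Returns None when changes is None, signalling that the caller
    should fall back to full processing.
    """
    if changes is None:
        return None
    to_index = sorted(set(changes["added"]) | set(changes["modified"]))
    to_remove = sorted(changes["deleted"])
    return to_index, to_remove


# Hypothetical result shaped like the documented return value.
sample = {
    "added": ["content/new.md"],
    "modified": ["content/intro.md"],
    "deleted": ["content/old.md"],
}
print(split_work(sample))
print(split_work(None))
```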