Markdown
thoth.ingestion.parsers.markdown
Markdown document parser.
This module provides parsing for Markdown files with support for YAML frontmatter extraction.
logger = setup_logger(__name__)
module-attribute
MarkdownParser
Parser for Markdown files.
Supports:

- Standard Markdown (.md, .markdown, .mdown)
- YAML frontmatter extraction
- UTF-8 encoding
supported_extensions: list[str]
property
Return supported Markdown extensions.
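A caller might use such a list to decide whether a file should be routed to this parser. The sketch below is illustrative only: `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, and the assumption that the extensions carry a leading dot and are matched case-insensitively comes from the class description, not from the library's source.

```python
from pathlib import Path

# Assumed value: the extensions listed in the MarkdownParser description.
SUPPORTED_EXTENSIONS = [".md", ".markdown", ".mdown"]


def is_supported(file_path: Path, extensions: list[str] = SUPPORTED_EXTENSIONS) -> bool:
    """Hypothetical helper: case-insensitive match on the file suffix."""
    return file_path.suffix.lower() in extensions


print(is_supported(Path("docs/README.MD")))
print(is_supported(Path("docs/notes.txt")))
```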
parse(file_path: Path) -> ParsedDocument
Parse a Markdown file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_path` | `Path` | Path to the Markdown file | required |
Returns:
| Type | Description |
|---|---|
| `ParsedDocument` | ParsedDocument with content and metadata |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If file doesn't exist |
| `UnicodeDecodeError` | If file isn't valid UTF-8 |
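Both exceptions arise naturally when a file is read and decoded as strict UTF-8. The following is a minimal sketch of that behavior using only the standard library; `read_markdown_text` is a hypothetical helper, not part of `thoth`, and it may differ from how `parse` reads files internally.

```python
from pathlib import Path


def read_markdown_text(file_path: Path) -> str:
    """Hypothetical helper: read a file and decode it as strict UTF-8."""
    # read_bytes() raises FileNotFoundError when the path does not exist.
    raw = file_path.read_bytes()
    # decode() with the default "strict" handler raises UnicodeDecodeError
    # on byte sequences that are not valid UTF-8.
    return raw.decode("utf-8")
```

A caller that wants to skip unreadable files can catch `FileNotFoundError` and `UnicodeDecodeError` around the call instead of letting them propagate.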
parse_content(content: bytes, source_path: str) -> ParsedDocument
Parse Markdown content from bytes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `content` | `bytes` | Raw file content as bytes | required |
| `source_path` | `str` | Original source path for metadata | required |
Returns:
| Type | Description |
|---|---|
| `ParsedDocument` | ParsedDocument with content and extracted metadata |
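To illustrate what extracting frontmatter from raw bytes can involve, here is a rough standard-library sketch. It is not the library's implementation: `split_frontmatter` is a hypothetical name, and it assumes the frontmatter is a leading block delimited by `---` lines containing only flat `key: value` pairs. Real YAML (nested mappings, lists, quoting) needs a proper YAML parser.

```python
def split_frontmatter(content: bytes) -> tuple[dict[str, str], str]:
    """Hypothetical helper: split a leading '---' block from the Markdown body."""
    text = content.decode("utf-8")
    lines = text.splitlines()

    # No opening delimiter on the first line: the whole text is body.
    if not lines or lines[0].strip() != "---":
        return {}, text

    # Find the closing delimiter; without one, treat everything as body.
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, text

    # Assumption: only flat "key: value" pairs, not full YAML.
    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return metadata, body


raw = b"---\ntitle: Hello\nauthor: Ada\n---\n# Heading\n\nBody text.\n"
metadata, body = split_frontmatter(raw)
print(metadata)
print(body)
```

In this sketch a document without a closing delimiter is returned unchanged as body text, which avoids silently discarding content when the frontmatter is malformed.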