Markdown

thoth.ingestion.parsers.markdown

Markdown document parser.

This module provides parsing for Markdown files with support for YAML frontmatter extraction.

logger = setup_logger(__name__) module-attribute

MarkdownParser

Parser for Markdown files.

Supports:

- Standard Markdown (.md, .markdown, .mdown)
- YAML frontmatter extraction
- UTF-8 encoding

supported_extensions: list[str] property

Return supported Markdown extensions.
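As a rough illustration of how a caller might use the extension list to filter files before parsing, here is a minimal sketch. It does not import the library: `MARKDOWN_EXTENSIONS` and `is_markdown_path` are hypothetical names, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extensions listed in the class description above.
MARKDOWN_EXTENSIONS = [".md", ".markdown", ".mdown"]


def is_markdown_path(path: Path) -> bool:
    """Return True if the path ends in one of the listed Markdown extensions.

    Lower-casing the suffix is an assumption; the real parser may be
    case-sensitive.
    """
    return path.suffix.lower() in MARKDOWN_EXTENSIONS


print(is_markdown_path(Path("docs/README.md")))
print(is_markdown_path(Path("notes/todo.txt")))
```

With the real class, the same check would presumably read the list from the `supported_extensions` property instead of a module-level constant.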

parse(file_path: Path) -> ParsedDocument

Parse a Markdown file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_path | Path | Path to the Markdown file | required |

Returns:

| Type | Description |
| --- | --- |
| ParsedDocument | ParsedDocument with content and metadata |

Raises:

| Type | Description |
| --- | --- |
| FileNotFoundError | If file doesn't exist |
| UnicodeDecodeError | If file isn't valid UTF-8 |
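The two documented exceptions suggest a simple error-handling pattern around `parse`. The sketch below is one possible shape, written against any callable that follows the documented contract so that it runs without the library installed. `try_parse` and `read_utf8` are hypothetical helpers, not part of `thoth`.

```python
import logging
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def try_parse(parse: Callable[[Path], T], file_path: Path) -> Optional[T]:
    """Call parse(file_path); return None on either documented error."""
    try:
        return parse(file_path)
    except FileNotFoundError:
        logger.warning("Markdown file not found: %s", file_path)
    except UnicodeDecodeError as exc:
        logger.warning("%s is not valid UTF-8: %s", file_path, exc)
    return None


def read_utf8(file_path: Path) -> str:
    """Stand-in for a parser: reads the file strictly as UTF-8 text."""
    return file_path.read_text(encoding="utf-8")


# A missing file is logged and reported as None instead of raising.
print(try_parse(read_utf8, Path("no_such_file_for_sketch.md")))
```

Assuming `MarkdownParser` can be constructed without arguments (not confirmed by this page), the bound method `MarkdownParser().parse` could be passed in place of `read_utf8`.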

parse_content(content: bytes, source_path: str) -> ParsedDocument

Parse Markdown content from bytes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| content | bytes | Raw file content as bytes | required |
| source_path | str | Original source path for metadata | required |

Returns:

| Type | Description |
| --- | --- |
| ParsedDocument | ParsedDocument with content and extracted metadata |
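To make the bytes-in, document-out flow more concrete, the following sketch decodes the input as UTF-8, separates a leading `---` frontmatter block, and records the source path in the metadata. It is not the library's implementation: `SketchDocument`, `sketch_parse_content`, and the `source_path` and `frontmatter_raw` metadata keys are all hypothetical, and the frontmatter is kept as raw text because parsing it would need a YAML library.

```python
from dataclasses import dataclass, field


@dataclass
class SketchDocument:
    """Hypothetical stand-in for ParsedDocument; real fields may differ."""

    content: str
    metadata: dict = field(default_factory=dict)


def sketch_parse_content(content: bytes, source_path: str) -> SketchDocument:
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
    text = content.decode("utf-8")
    metadata = {"source_path": source_path}

    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        # Look for the closing delimiter of the frontmatter block.
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                # Kept as raw text; a YAML library would be needed to parse it.
                metadata["frontmatter_raw"] = "\n".join(lines[1:i])
                text = "\n".join(lines[i + 1:])
                break
    return SketchDocument(content=text, metadata=metadata)


doc = sketch_parse_content(b"---\ntitle: Demo\n---\n# Heading\n", "docs/demo.md")
print(doc.metadata)
print(doc.content)
```

Input without a leading `---` line, or with an unclosed frontmatter block, passes through unchanged in this sketch. How the real parser treats an unclosed block is not stated on this page.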