thoth.ingestion.parsers.markdown
Markdown document parser.
This module provides parsing for Markdown files with support for YAML frontmatter extraction.
Functions
Create and configure a logger with structured JSON output.
Classes
- DocumentParser: Abstract base class for document parsers.
- MarkdownParser: Parser for Markdown files.
- ParsedDocument: Result of parsing a document.
- Path: PurePath subclass that can make system calls.
- class thoth.ingestion.parsers.markdown.MarkdownParser
Bases: DocumentParser
Parser for Markdown files.
Supports:
- Standard Markdown (.md, .markdown, .mdown)
- YAML frontmatter extraction
- UTF-8 encoding
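To illustrate what YAML frontmatter extraction involves, here is a minimal sketch that separates a leading `---` delimited block from the Markdown body. It is not thoth's implementation: `split_frontmatter` and `_FRONTMATTER` are hypothetical names, and the sketch only handles flat `key: value` pairs, where a real parser would presumably hand the block to a YAML library.

```python
import re

# Hypothetical helper, not part of thoth: matches a frontmatter block that
# starts on the first line and is closed by a second '---' line.
_FRONTMATTER = re.compile(
    r"\A---[ \t]*\r?\n(.*?)\r?\n---[ \t]*(?:\r?\n|\Z)", re.DOTALL
)


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body); metadata is empty when no frontmatter exists."""
    match = _FRONTMATTER.match(text)
    if match is None:
        return {}, text
    metadata: dict[str, str] = {}
    for line in match.group(1).splitlines():
        # Simplification (assumption): only flat "key: value" pairs are read.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    # Everything after the closing delimiter is the Markdown body.
    return metadata, text[match.end():]


sample = "---\ntitle: Setup Guide\ntags: docs\n---\n# Setup\n\nInstall the package.\n"
metadata, body = split_frontmatter(sample)
```

With this sample, `metadata` holds the two frontmatter keys and `body` starts at the `# Setup` heading. Text without a leading `---` line is returned unchanged with empty metadata.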
- parse(file_path: Path) -> ParsedDocument
Parse a Markdown file.
- Parameters:
file_path – Path to the Markdown file
- Returns:
ParsedDocument with content and metadata
- Raises:
FileNotFoundError – If file doesn’t exist
UnicodeDecodeError – If file isn’t valid UTF-8
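The two documented exceptions can be handled separately by a caller. The sketch below does not import thoth. Instead, `read_markdown_text` is a hypothetical stand-in that reads a file as strict UTF-8, which raises the same two exception types, and `describe` shows one way to react to each.

```python
import tempfile
from pathlib import Path


def read_markdown_text(file_path: Path) -> str:
    """Hypothetical stand-in for the reading step: strict UTF-8, no fallback."""
    # Path.read_text raises FileNotFoundError for a missing file and
    # UnicodeDecodeError when the bytes are not valid UTF-8.
    return file_path.read_text(encoding="utf-8")


def describe(file_path: Path) -> str:
    """Show how a caller might handle the two documented exceptions."""
    try:
        text = read_markdown_text(file_path)
    except FileNotFoundError:
        return "missing"
    except UnicodeDecodeError:
        return "not utf-8"
    return f"ok ({len(text)} chars)"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.md"
    good.write_text("# Title\n", encoding="utf-8")
    bad = Path(tmp) / "bad.md"
    bad.write_bytes(b"\xff\xfe not utf-8")  # 0xff is never valid in UTF-8
    results = [describe(good), describe(bad), describe(Path(tmp) / "absent.md")]
```

Catching the two exceptions in separate clauses lets an ingestion loop skip files with the wrong encoding while still reporting missing paths as a distinct problem.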
- parse_content(content: bytes, source_path: str) -> ParsedDocument
Parse Markdown content from bytes.
- Parameters:
content – Raw file content as bytes
source_path – Original source path for metadata
- Returns:
ParsedDocument with content and extracted metadata
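A bytes-based entry point is useful when the content does not come from the local filesystem, since the caller supplies the original path only as metadata. The following is a rough sketch of that shape, not the library's code: `ParsedDocumentSketch` and `parse_content_sketch` are hypothetical names, the `content` and `metadata` fields are assumptions about what a parse result might carry, and frontmatter handling is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedDocumentSketch:
    """Hypothetical stand-in; the real ParsedDocument fields may differ."""

    content: str
    metadata: dict[str, str] = field(default_factory=dict)


def parse_content_sketch(content: bytes, source_path: str) -> ParsedDocumentSketch:
    """Decode raw bytes as UTF-8 and record where they came from."""
    # bytes.decode raises UnicodeDecodeError if the input is not valid UTF-8.
    text = content.decode("utf-8")
    # Assumption: the source path is stored under a "source_path" key.
    return ParsedDocumentSketch(content=text, metadata={"source_path": source_path})


raw = "# Notes\n\nBody text.\n".encode("utf-8")
doc = parse_content_sketch(raw, "docs/notes.md")
```

Here `source_path` is never opened; it only labels the result so downstream steps can trace a document back to its origin.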