thoth.ingestion.parsers.markdown

Markdown document parser.

This module provides parsing for Markdown files with support for YAML frontmatter extraction.

Functions

setup_logger(name[, level, simple, json_output])

Create and configure a logger with structured JSON output.

Classes

DocumentParser()

Abstract base class for document parsers.

MarkdownParser()

Parser for Markdown files.

ParsedDocument(content, metadata, ...)

Result of parsing a document.

Path(*args, **kwargs)

PurePath subclass that can make system calls.

class thoth.ingestion.parsers.markdown.MarkdownParser

Bases: DocumentParser

Parser for Markdown files.

Supports:
  • Standard Markdown (.md, .markdown, .mdown)

  • YAML frontmatter extraction

  • UTF-8 encoding
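To make "frontmatter extraction" concrete, here is a minimal standard-library sketch of splitting a leading `---` block from the Markdown body. It is not MarkdownParser's actual implementation: the helper name `split_frontmatter` is hypothetical, and it only understands flat `key: value` lines, whereas a real YAML loader handles nesting, lists, and typed values.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a leading '---' delimited block from the Markdown body.

    Hypothetical helper for illustration; handles flat 'key: value' pairs only.
    """
    lines = text.splitlines()
    # Frontmatter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}, text

    # Find the closing delimiter; without one, treat the text as plain Markdown.
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text

    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:])
    return metadata, body


sample = "---\ntitle: Notes\nauthor: Ada\n---\n# Heading\n\nBody text.\n"
meta, body = split_frontmatter(sample)
print(meta)
print(body)
```

Input without a frontmatter block, or with an unterminated one, is returned unchanged with empty metadata in this sketch; how the real parser treats those cases is not stated here.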

property supported_extensions: list[str]

Return supported Markdown extensions.
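A caller might use such a list to decide whether a file should be routed to this parser. The sketch below is an assumption-laden illustration: `MARKDOWN_EXTENSIONS` and `looks_like_markdown` are hypothetical names, and whether the real property returns extensions with a leading dot or matches case-insensitively is not documented here.

```python
from pathlib import Path

# Assumed shape of the list: lowercase extensions with a leading dot.
MARKDOWN_EXTENSIONS = [".md", ".markdown", ".mdown"]


def looks_like_markdown(file_path: Path) -> bool:
    """Return True if the file suffix matches one of the assumed extensions."""
    # Lowercasing is a choice made for this sketch, not documented behavior.
    return file_path.suffix.lower() in MARKDOWN_EXTENSIONS


print(looks_like_markdown(Path("docs/guide.markdown")))
print(looks_like_markdown(Path("notes.txt")))
```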

parse(file_path: Path) → ParsedDocument

Parse a Markdown file.

Parameters:

file_path – Path to the Markdown file

Returns:

ParsedDocument with content and metadata

Raises:

parse_content(content: bytes, source_path: str) → ParsedDocument

Parse Markdown content from bytes.

Parameters:
  • content – Raw file content as bytes

  • source_path – Original source path for metadata

Returns:

ParsedDocument with content and extracted metadata
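As a rough picture of the bytes-in, document-out shape described above, the following sketch decodes UTF-8 content and records the source path. Everything in it is a stand-in: `SketchDocument` models only two of ParsedDocument's fields, `sketch_parse_content` is a hypothetical function, and the `source_path` metadata key is an assumption rather than the library's documented schema.

```python
from dataclasses import dataclass, field


@dataclass
class SketchDocument:
    """Hypothetical stand-in for ParsedDocument with only two fields."""

    content: str
    metadata: dict[str, str] = field(default_factory=dict)


def sketch_parse_content(content: bytes, source_path: str) -> SketchDocument:
    """Decode raw bytes as UTF-8 and attach the originating path."""
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError in this sketch.
    text = content.decode("utf-8")
    # The metadata key name is an assumption made for illustration.
    return SketchDocument(content=text, metadata={"source_path": source_path})


doc = sketch_parse_content("# Café\n".encode("utf-8"), "docs/menu.md")
print(doc.content)
print(doc.metadata)
```

Accepting bytes plus a separate source path is convenient when content arrives from somewhere other than the local filesystem, since the path then serves only as provenance metadata.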