thoth.ingestion.parsers.text
Plain text document parser.
This module provides parsing for plain text files.
Functions
Create and configure a logger with structured JSON output.
Classes
DocumentParser – Abstract base class for document parsers.
ParsedDocument – Result of parsing a document.
Path – PurePath subclass that can make system calls.
TextParser – Parser for plain text files.
- class thoth.ingestion.parsers.text.TextParser
Bases: DocumentParser
Parser for plain text files.
Supports:
- Plain text files (.txt, .text)
- UTF-8 encoding with fallback to latin-1
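The encoding fallback listed above can be illustrated with a minimal sketch. The helper name `decode_text` is hypothetical and is not part of this module; the actual implementation in TextParser may differ.

```python
def decode_text(content: bytes) -> str:
    """Hypothetical helper: decode as UTF-8, falling back to latin-1."""
    try:
        # Strict decoding raises UnicodeDecodeError on invalid UTF-8.
        return content.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte value to a code point, so this cannot fail,
        # though the result may be wrong if the file used another encoding.
        return content.decode("latin-1")
```

Because latin-1 accepts any byte sequence, a fallback of this kind always produces a string, at the cost of possibly mis-rendering files written in a different legacy encoding.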
- parse(file_path: Path) → ParsedDocument
Parse a plain text file.
- Parameters:
file_path – Path to the text file
- Returns:
ParsedDocument with content
- Raises:
FileNotFoundError – If the file does not exist
- parse_content(content: bytes, source_path: str) → ParsedDocument
Parse text content from bytes.
- Parameters:
content – Raw file content as bytes
source_path – Original source path for metadata
- Returns:
ParsedDocument with content
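As a rough illustration of how the two methods could relate, the sketch below uses stand-in classes. `SketchDocument` and `SketchTextParser` are hypothetical names, the `content` and `metadata` fields are assumptions (the real ParsedDocument fields are not documented on this page), and the delegation from `parse` to `parse_content` is one plausible design, not the confirmed implementation.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SketchDocument:
    """Hypothetical stand-in for ParsedDocument; the real fields may differ."""
    content: str
    metadata: dict = field(default_factory=dict)


class SketchTextParser:
    """Illustrative stand-in, not the actual TextParser."""

    def parse(self, file_path: Path) -> SketchDocument:
        # Fail early with a clear message if the file is missing.
        if not file_path.exists():
            raise FileNotFoundError(f"File not found: {file_path}")
        # Read raw bytes and reuse the bytes-based entry point.
        return self.parse_content(file_path.read_bytes(), str(file_path))

    def parse_content(self, content: bytes, source_path: str) -> SketchDocument:
        # UTF-8 first, then latin-1, which accepts any byte sequence.
        try:
            text = content.decode("utf-8")
        except UnicodeDecodeError:
            text = content.decode("latin-1")
        # Keep the original path so downstream code can trace the source.
        return SketchDocument(content=text, metadata={"source_path": source_path})
```

Under these assumptions, `parse_content` is the entry point to use when the bytes come from somewhere other than the local filesystem, with `source_path` carried along only as metadata.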