GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown.
MarkItDown
The MarkItDown library is a utility for converting various file types to Markdown (e.g., for indexing or text analysis).
It presently supports:
PDF (.pdf)
PowerPoint (.pptx)
Word (.docx)
Excel (.xlsx)
Images (EXIF metadata, and OCR)
Audio (EXIF metadata, and speech transcription)
HTML (special handling of Wikipedia, etc.)
Various other text-based formats (csv, json, xml, etc.)
The API is simple:
from markitdown import MarkItDown
markitdown = MarkItDown()
result = markitdown.convert...
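Because the snippet above is cut off, the following is a minimal sketch of how a conversion call might be wrapped, not the project's own example. It assumes that `convert` accepts a file path and that the returned object exposes the Markdown as `text_content`. The helper names `is_supported` and `to_markdown`, the `converter` parameter, and the extension set are hypothetical, introduced here only for illustration. The extensions are taken from the format list above, with `.html` assumed for the HTML entry.

```python
from pathlib import Path

# Extensions drawn from the supported-formats list above (illustrative, not
# exhaustive). Image and audio extensions are omitted because the list does
# not name them; ".html" is an assumption for the HTML entry.
KNOWN_EXTENSIONS = {".pdf", ".pptx", ".docx", ".xlsx", ".html", ".csv", ".json", ".xml"}


def is_supported(path):
    """Return True if the file extension is in the illustrative set above."""
    return Path(path).suffix.lower() in KNOWN_EXTENSIONS


def to_markdown(path, converter=None):
    """Convert a file to Markdown text.

    `converter` is any object with a `convert(path)` method whose result has
    a `text_content` attribute. If none is given, a MarkItDown instance is
    created (requires the markitdown package to be installed).
    """
    if converter is None:
        # Imported lazily so the helper can be used with a stand-in converter.
        from markitdown import MarkItDown
        converter = MarkItDown()
    result = converter.convert(str(path))
    # Assumption: the Markdown output is exposed as `text_content`.
    return result.text_content
```

With the package installed, `to_markdown("report.xlsx")` would be a typical call, where `report.xlsx` is a placeholder file name. Passing a custom `converter` keeps the helper usable in tests without touching real files.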