Releases: microsoft/markitdown
v0.0.1a5
What's Changed
- Fixed compatibility with markdownify v1.0.0
New Contributors
Full Changelog: v0.0.1a4...v0.0.1a5
MarkItDown version v0.0.1a4
Some of What's Changed
- feat: Add RSSConverter by @Soulter in #97
- feat: Add IpynbConverter by @AumGupta in #71
- feat(devcontainer): Add DevContainer Configuration for Easier Contribution Setup by @l-lumin in #64
- feat: add support for conversion via Document Intelligence by @KennyZhang1 in #303
- feat: add version option to markitdown CLI by @l-lumin in #172
- feat: enable Git support in devcontainer by @numekudi in #136
- feat: outlook ".msg" file converter by @muratcankurtulus in #196
- feat: Add xls support by @yeungadrian in #169
- feat: support image description with LLM for pptx files by @masquare in #306
- fix: Safeguard against path traversal for ZipConverter by @finchy in #129
- fix: support -o param to avoid encoding issues by @Soulter in #116
- fix(transcription): TRANSCRIPTION_CAPABLE should be initialized by @absadiki in #194
- fix: added a test for leading spaces. by @afourney in #258
- fix: If puremagic has no guesses, try again after ltrim. by @afourney in #260
- fix: Recognize json as plain text (if no other handlers are present). by @afourney in #261
- fix: Set exiftool path explicitly. by @afourney in #267
- fix: remove leading and trailing \n for HtmlConverter by @ZeyuTeng96 in #262
- fix: argparse CLI option ordering, fixes #268 by @slhck in #290
- fix: for mimetype issue with csv files on windows. by @wunde005 in #273
- docs: update README.md by @eltociear in #182
- docs: Add documentation for docintel by @KennyZhang1 in #312
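The ZipConverter path-traversal safeguard (#129) protects against archive entries such as "../evil.txt" or "/etc/passwd", which would resolve outside the intended extraction directory. Below is a minimal sketch of that general check using only the standard library. The function names is_safe_member and safe_extract are hypothetical, and this is not the code merged in #129.

```python
import os
import zipfile


def is_safe_member(dest_dir: str, member_name: str) -> bool:
    """Return True if member_name resolves to a path inside dest_dir (hypothetical helper)."""
    dest_root = os.path.realpath(dest_dir)
    # Resolve "..", absolute paths, and symlinks before comparing
    target = os.path.realpath(os.path.join(dest_root, member_name))
    try:
        return os.path.commonpath([dest_root, target]) == dest_root
    except ValueError:
        # On Windows, paths on different drives have no common path, so the entry is outside
        return False


def safe_extract(zip_path: str, dest_dir: str) -> list[str]:
    """Extract every member of zip_path into dest_dir, refusing unsafe names (hypothetical helper)."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not is_safe_member(dest_dir, name):
                raise ValueError(f"Refusing to extract unsafe path: {name!r}")
            zf.extract(name, dest_dir)
            extracted.append(name)
    return extracted
```

The standard library's ZipFile.extract already strips ".." components and leading slashes from member names. An explicit check like this matters most when code builds output paths from member names on its own.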
New Contributors
- @AumGupta made their first contribution in #71
- @diya155 made their first contribution in #80
- @l-lumin made their first contribution in #64
- @waterimp made their first contribution in #98
- @finchy made their first contribution in #129
- @sugatoray made their first contribution in #130
- @PetrAPConsulting made their first contribution in #91
- @SigireddyBalasai made their first contribution in #93
- @dependabot made their first contribution in #177
- @numekudi made their first contribution in #136
- @eltociear made their first contribution in #182
- @absadiki made their first contribution in #194
- @muratcankurtulus made their first contribution in #196
- @yeungadrian made their first contribution in #169
- @KennyZhang1 made their first contribution in #303
- @ZeyuTeng96 made their first contribution in #262
- @jamesmh made their first contribution in #270
- @masquare made their first contribution in #306
- @slhck made their first contribution in #290
- @wunde005 made their first contribution in #273
Full Changelog: v0.0.1a3...v0.0.1a4
v0.0.1a3
New Features and Formats
- Add zip handling by @Josh-XT in #22
- Add PPTX chart support by @nyosegawa in #33
Breaking Changes
Renamed the mlm_client and mlm_model arguments to llm_client and llm_model, and added appropriate deprecation warnings.
See:
- Fix LLM terminology in code by @CharlesCNorton in #73
- Fix LLM terms by @CharlesCNorton in #72
- Added deprecation warnings for mlm_* arguments. by @afourney in #101
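A rename like this usually follows a common pattern: the old keyword is still accepted, a DeprecationWarning is emitted, and the value is forwarded to the new name. Here is a minimal sketch of that pattern. The class Converter and the _RENAMED_KWARGS mapping are hypothetical, and this is not MarkItDown's actual constructor.

```python
import warnings
from typing import Any, Optional

# Hypothetical mapping of deprecated keyword names to their replacements
_RENAMED_KWARGS = {"mlm_client": "llm_client", "mlm_model": "llm_model"}


class Converter:
    """Hypothetical class illustrating a renamed-keyword deprecation shim."""

    def __init__(
        self,
        llm_client: Optional[Any] = None,
        llm_model: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        values = {"llm_client": llm_client, "llm_model": llm_model}
        for old, new in _RENAMED_KWARGS.items():
            if old in kwargs:
                # stacklevel=2 attributes the warning to the caller's line
                warnings.warn(
                    f"'{old}' is deprecated; use '{new}' instead.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                old_value = kwargs.pop(old)
                # The new keyword wins if both were supplied
                if values[new] is None:
                    values[new] = old_value
        if kwargs:
            raise TypeError(f"Unexpected keyword arguments: {sorted(kwargs)}")
        self.llm_client = values["llm_client"]
        self.llm_model = values["llm_model"]
```

With this shim, Converter(mlm_model="gpt-4o") still works but warns. New code would pass llm_model directly.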
Bug fixes and enhancements
- Remove invalid classifiers by @simonw in #10
- Add installation instructions from haesleinhuepf:patch-1 by @gagb in #27
- Update README.md by @gagb in #28
- Improve the readme with contributing guidelines by @gagb in #7
- Add installation instructions by @haesleinhuepf in #24
- Update README.md by @pawarbi in #26
- Update README.md by @gagb in #29
- CLI usage instructions by @simonw in #11
- Fix character decoding issues with text-like files by @brc-dd in #19
- Catching pydub's warning of ffmpeg or avconv missing by @SH4DOW4RE in #39
- Exclude test files from language statistics using linguist-vendored by @Y-Kim-64 in #44
- Support specifying YouTube transcript language by @narumiruna in #50
- Add passing style_map kwarg to Mammoth when converting docx to allow keeping comments by @VillePuuska in #38
- Fix: pass the kwargs to _convert method when converting an url file by @Soulter in #48
- Added Dockerfile by @madduci in #60
- fix issue #65 by @DIMAX99 in #67
- Cybernobie/main by @gagb in #75
- Ensure hatch is installed before running tests by @cybernobie in #63
- Kevinclb/main by @gagb in #77
- feature: add argument parsing for cli tool capability by @kevinclb in #46
- Added llm tests to the local test set. by @afourney in #100
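Character decoding problems with text-like files (#19) generally come from assuming bytes are UTF-8 when they were written in another encoding. One common remedy is an ordered fallback over candidate encodings. The sketch below assumes the candidate list utf-8-sig, cp1252, latin-1 and a hypothetical decode_text helper. It is not necessarily the approach taken in #19.

```python
from typing import Sequence

# Assumed candidate order: UTF-8 (BOM stripped if present), then Windows-1252.
# latin-1 maps every byte, so it always succeeds as a final fallback.
DEFAULT_ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")


def decode_text(data: bytes, encodings: Sequence[str] = DEFAULT_ENCODINGS) -> str:
    """Decode data with the first candidate encoding that succeeds (hypothetical helper)."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if a custom list is exhausted: degrade with replacement characters
    return data.decode("utf-8", errors="replace")
```

Order matters here. A permissive encoding such as latin-1 never fails, so it has to come last, or it would shadow the more specific candidates.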
New Contributors
- @simonw made their first contribution in #10
- @gagb made their first contribution in #27
- @haesleinhuepf made their first contribution in #24
- @pawarbi made their first contribution in #26
- @brc-dd made their first contribution in #19
- @Josh-XT made their first contribution in #22
- @nyosegawa made their first contribution in #33
- @VillePuuska made their first contribution in #38
- @SH4DOW4RE made their first contribution in #39
- @Y-Kim-64 made their first contribution in #44
- @Soulter made their first contribution in #48
- @narumiruna made their first contribution in #50
- @madduci made their first contribution in #60
- @CharlesCNorton made their first contribution in #73
- @DIMAX99 made their first contribution in #67
- @cybernobie made their first contribution in #63
- @kevinclb made their first contribution in #46
Full Changelog: v0.0.1a2...v0.0.1a3
v0.0.1a2
Initial Release of markitdown
The MarkItDown library is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc.).
It presently supports:
- PDF (.pdf)
- PowerPoint (.pptx)
- Word (.docx)
- Excel (.xlsx)
- Images (EXIF metadata and OCR)
- Audio (EXIF metadata and speech transcription)
- HTML (special handling of Wikipedia, etc.)
- Various other text-based formats (csv, json, xml, etc.)
The API is simple:
```python
from markitdown import MarkItDown

# Convert a local file to Markdown and print the resulting text
markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")
print(result.text_content)
```
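CSV is among the text-based formats listed above. To give a rough idea of what converting such a file to Markdown involves, here is a minimal sketch that uses Python's standard csv module. The function name csv_to_markdown and the pipe-table layout are assumptions for illustration. They are not MarkItDown's actual converter or its exact output.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Render CSV text as a Markdown pipe table (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    width = len(header)

    def fmt(cells: list[str]) -> str:
        # Pad or truncate each row to the header width and escape literal pipes
        cells = (list(cells) + [""] * width)[:width]
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)
```

For example, csv_to_markdown("name,qty\napple,3\n") yields a header row, a separator row, and one data row.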