Releases · microsoft/markitdown

feat: Add RSSConverter by @Soulter in #97
feat: Add IpynbConverter by @AumGupta in #71
feat(devcontainer): Add DevContainer Configuration for Easier Contribution Setup by @l-lumin in #64
feat: add support for conversion via Document Intelligence by @KennyZhang1 in #303
feat: add version option to markitdown CLI by @l-lumin in #172
feat: enable Git support in devcontainer by @numekudi in #136
feat: outlook ".msg" file converter by @muratcankurtulus in #196
feat: Add xls support by @yeungadrian in #169
feat: support image description with LLM for pptx files by @masquare in #306
fix: Safeguard against path traversal for ZipConverter by @finchy in #129
fix: support -o param to avoid encoding issues by @Soulter in #116
fix(transcription): TRANSCRIPTION_CAPABLE should be initialized by @absadiki in #194
fix: added a test for leading spaces. by @afourney in #258
fix: If puremagic has no guesses, try again after ltrim. by @afourney in #260
fix: Recognize json as plain text (if no other handlers are present). by @afourney in #261
fix: Set exiftool path explicitly. by @afourney in #267
fix: remove leading and trailing \n for HtmlConverter by @ZeyuTeng96 in #262
fix: argparse CLI option ordering, fixes #268 by @slhck in #290
fix: for mimetype issue with csv files on windows. by @wunde005 in #273
docs: update README.md by @eltociear in #182
docs: Add documentation for docintel by @KennyZhang1 in #312
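
The path traversal safeguard listed above (#129) addresses a general hazard: a zip member named something like "../evil.txt" can resolve outside the extraction directory. The following is a minimal sketch of that kind of check using only the standard library. It is not the code from the pull request; the function name safe_extract and its signature are assumptions made for illustration.

```python
import os
import zipfile


def safe_extract(zip_path: str, dest_dir: str) -> list[str]:
    """Extract an archive, refusing members that resolve outside dest_dir.

    Hypothetical helper for illustration; not markitdown's implementation.
    """
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        members = archive.infolist()
        targets = []
        # First pass: validate every member before writing anything to disk.
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            # commonpath equals dest_root only when target lies inside it.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.filename!r}")
            targets.append(target)
        # Second pass: extract now that all member paths are known to be safe.
        for member in members:
            archive.extract(member, dest_root)
    return targets
```

Validating all members before extracting any of them means a malicious archive is rejected without leaving partial output behind.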

New Contributors

@AumGupta made their first contribution in #71
@diya155 made their first contribution in #80
@l-lumin made their first contribution in #64
@waterimp made their first contribution in #98
@finchy made their first contribution in #129
@sugatoray made their first contribution in #130
@PetrAPConsulting made their first contribution in #91
@SigireddyBalasai made their first contribution in #93
@dependabot made their first contribution in #177
@numekudi made their first contribution in #136
@eltociear made their first contribution in #182
@absadiki made their first contribution in #194
@muratcankurtulus made their first contribution in #196
@yeungadrian made their first contribution in #169
@KennyZhang1 made their first contribution in #303
@ZeyuTeng96 made their first contribution in #262
@jamesmh made their first contribution in #270
@masquare made their first contribution in #306
@slhck made their first contribution in #290
@wunde005 made their first contribution in #273

Full Changelog: v0.0.1a3...v0.0.1a4

Contributors

slhck, finchy, and 20 other contributors

17 Dec 22:31

afourney

v0.0.1a3

3ce21a4

v0.0.1a3 Pre-release

New Features and Formats

Add zip handling by @Josh-XT in #22
Add PPTX chart support by @nyosegawa in #33

Breaking Changes

Renamed mlm_client and mlm_model arguments to llm_client and llm_model, and added appropriate deprecation warnings.

See:

Fix LLM terminology in code by @CharlesCNorton in #73
Fix LLM terms by @CharlesCNorton in #72
Added deprecation warnings for mlm_* arguments. by @afourney in #101
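
For readers unfamiliar with this kind of rename, the pattern is usually to keep accepting the old keyword for a while, emit a DeprecationWarning, and forward its value to the new name. The sketch below illustrates that pattern in isolation. ExampleConverter and _resolve_renamed are hypothetical names, and the behavior when both the old and new names are passed is an assumption, not something the release notes specify.

```python
import warnings


def _resolve_renamed(new_value, old_value, new_name, old_name):
    """Return the effective value, warning when the deprecated name is used."""
    if old_value is None:
        return new_value
    if new_value is not None:
        # Assumption for this sketch: passing both names is treated as an error.
        raise ValueError(f"pass either {new_name!r} or {old_name!r}, not both")
    warnings.warn(
        f"{old_name!r} is deprecated; use {new_name!r} instead",
        DeprecationWarning,
        stacklevel=3,  # attribute the warning to the caller of __init__
    )
    return old_value


class ExampleConverter:
    """Hypothetical stand-in class; not the real MarkItDown constructor."""

    def __init__(self, llm_client=None, llm_model=None,
                 mlm_client=None, mlm_model=None):
        self.llm_client = _resolve_renamed(
            llm_client, mlm_client, "llm_client", "mlm_client")
        self.llm_model = _resolve_renamed(
            llm_model, mlm_model, "llm_model", "mlm_model")
```

With this shape, existing callers that still pass mlm_model keep working but see a warning, while new code uses llm_model directly.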

Bug fixes and enhancements

Remove invalid classifiers by @simonw in #10
Add installation instructions from haesleinhuepf:patch-1 by @gagb in #27
Update README.md by @gagb in #28
Improve the readme with contributing guidelines by @gagb in #7
Add installation instructions by @haesleinhuepf in #24
Update README.md by @pawarbi in #26
Update README.md by @gagb in #29
CLI usage instructions by @simonw in #11
Fix character decoding issues with text-like files by @brc-dd in #19
Catching pydub's warning of ffmpeg or avconv missing by @SH4DOW4RE in #39
Exclude test files from language statistics using linguist-vendored by @Y-Kim-64 in #44
Support specifying YouTube transcript language by @narumiruna in #50
Add passing style_map kwarg to Mammoth when converting docx to allow keeping comments by @VillePuuska in #38
Fix: pass the kwargs to the _convert method when converting a URL file by @Soulter in #48
Added Dockerfile by @madduci in #60
fix issue #65 by @DIMAX99 in #67
Cybernobie/main by @gagb in #75
Ensure hatch is installed before running tests by @cybernobie in #63
Kevinclb/main by @gagb in #77
feature: add argument parsing for cli tool capability by @kevinclb in #46
Added llm tests to the local test set. by @afourney in #100

New Contributors

@simonw made their first contribution in #10
@gagb made their first contribution in #27
@haesleinhuepf made their first contribution in #24
@pawarbi made their first contribution in #26
@brc-dd made their first contribution in #19
@Josh-XT made their first contribution in #22
@nyosegawa made their first contribution in #33
@VillePuuska made their first contribution in #38
@SH4DOW4RE made their first contribution in #39
@Y-Kim-64 made their first contribution in #44
@Soulter made their first contribution in #48
@narumiruna made their first contribution in #50
@madduci made their first contribution in #60
@CharlesCNorton made their first contribution in #73
@DIMAX99 made their first contribution in #67
@cybernobie made their first contribution in #63
@kevinclb made their first contribution in #46

Full Changelog: v0.0.1a2...v0.0.1a3

Contributors

simonw, madduci, and 16 other contributors

17 Dec 22:17

afourney

v0.0.1a2

b401396

v0.0.1a2 Pre-release

Initial Release of markitdown

MarkItDown is a utility library for converting various file formats to Markdown, for purposes such as indexing and text analysis.

It presently supports:

PDF (.pdf)
PowerPoint (.pptx)
Word (.docx)
Excel (.xlsx)
Images (EXIF metadata and OCR)
Audio (EXIF metadata and speech transcription)
HTML (special handling of Wikipedia, etc.)
Various other text-based formats (csv, json, xml, etc.)

The API is simple:

from markitdown import MarkItDown

# Create a converter, convert a file, and print the resulting Markdown text.
markitdown = MarkItDown()
result = markitdown.convert("test.xlsx")
print(result.text_content)
