Reading Word documents

As you might be aware, Microsoft Office introduced a new file extension for Word documents, .docx, starting with Office 2007. With this change, documents moved to an XML-based file format (Office Open XML) packaged with ZIP compression. Microsoft made this change after the business community asked for an open file format that would make it easier to transfer data across applications. So, let's begin our journey with DOCX files!
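To make the "XML inside a ZIP archive" idea concrete, here is a minimal sketch that uses only Python's standard zipfile module. The helper names (list_docx_parts, read_main_document_xml, build_sample_package) are hypothetical and chosen for this illustration, and the sample archive built in memory is a stand-in, not a complete Word document. It assumes the main body is stored at word/document.xml, which is where Word typically places it.

```python
import io
import zipfile


def list_docx_parts(source):
    """Return the names of the parts stored inside a .docx package.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return package.namelist()


def read_main_document_xml(source):
    """Return the raw XML of the main document part as a string.

    Assumes the part lives at word/document.xml (the usual location).
    """
    with zipfile.ZipFile(source) as package:
        return package.read("word/document.xml").decode("utf-8")


def build_sample_package():
    """Build a tiny stand-in archive in memory.

    This is NOT a complete, valid Word document; it only mimics the layout
    so the helpers above can be tried without a real file.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr(
            "word/document.xml",
            "<w:document><w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
            "</w:body></w:document>",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    sample = build_sample_package()
    print(list_docx_parts(sample))
    print(read_main_document_xml(sample))
```

Passing the path of a real .docx file to list_docx_parts should show a similar set of XML parts, along with any styles, relationships, and media the document contains.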

Getting ready

In this recipe, we will use the python-docx module to read Word documents. python-docx is a comprehensive module that performs both read and write operations on Word documents. Let's install it with our favorite tool, pip:

pip install python-docx
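After installing, it can be handy to confirm that the package is visible to the interpreter you plan to use. The following is a small sketch using the standard importlib.metadata module (Python 3.8 or later); the helper name installed_version is hypothetical and introduced only for this check.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("python-docx")
    if found is None:
        print("python-docx is not installed in this environment")
    else:
        print("python-docx version:", found)
```

Note that the package is installed under the name python-docx but imported in code as docx.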

How to do it...

  1. We start by creating ...
