As you might be aware, Microsoft Office started providing a new extension to Word documents, which is
.docx, from Office 2007 onwards. With this change, documents moved to XML-based file formats (Office Open XML) with ZIP compression. Microsoft made this change when the business community asked for an open file format that could help with transferring data across applications. So, let's begin our journey with DOCX files!
In this recipe, we will use the
python-docx module to read Word documents. The
python-docx is a comprehensive module that performs both read and write operations on Word documents. Let's install this module with our favorite tool,
pip install python-docx