Automate it! - Recipes to upskill your business by Chetan Giridhar

Reading Word documents

As you may know, Microsoft introduced a new extension for Word documents, .docx, with Office 2007. With this change, documents moved to an XML-based file format (Office Open XML) packaged with ZIP compression. Microsoft made the change after the business community asked for an open file format that would make it easier to transfer data across applications. So, let's begin our journey with DOCX files!
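Because a DOCX file is a ZIP package of XML parts, its structure can be inspected with nothing but the Python standard library. The following is a minimal sketch, not part of the recipe itself: the helper names (list_parts, paragraph_texts, make_sample_package) are hypothetical, and the sample package is a stripped-down stand-in that contains only word/document.xml, so Word itself would not open it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical sample content: two paragraphs, the first split across two runs
SAMPLE_DOCUMENT_XML = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body>'
    "<w:p><w:r><w:t>Hello </w:t></w:r><w:r><w:t>DOCX</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def make_sample_package():
    """Build a tiny in-memory ZIP that mimics the main part of a DOCX file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", SAMPLE_DOCUMENT_XML)
    return buffer


def list_parts(docx_source):
    """Return the names of the files stored inside the ZIP package."""
    with zipfile.ZipFile(docx_source) as package:
        return package.namelist()


def paragraph_texts(docx_source):
    """Return the text of every <w:p> paragraph in word/document.xml."""
    with zipfile.ZipFile(docx_source) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return [
        "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        for paragraph in root.iter(W_NS + "p")
    ]


if __name__ == "__main__":
    sample = make_sample_package()
    print(list_parts(sample))
    print(paragraph_texts(sample))
```

Both helpers also accept a file path, so they can be pointed at a real .docx file to see the XML parts that Word writes. For anything beyond inspection, a dedicated module is the more practical choice.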

Getting ready

In this recipe, we will use the python-docx module to read Word documents. python-docx is a comprehensive module that supports both reading and writing Word documents. Let's install it with our favorite tool, pip:

pip install python-docx

How to do it...

  1. We start by creating ...