Denoising documents with autoencoders

So far, we have applied our denoising autoencoder to the MNIST dataset, which is a relatively simple dataset. Let's now look at a more complicated dataset that better represents the real-life challenges of denoising documents.

The dataset we will be using, NoisyOffice, is provided free of charge by the University of California, Irvine (UCI). For more information on the dataset, visit UCI's website at https://archive.ics.uci.edu/ml/datasets/NoisyOffice.

The dataset can be found in the accompanying GitHub repository for this book. For more information on downloading the code and dataset for this chapter from the GitHub repository, please refer to the Technical requirements section earlier in the chapter. ...
