Let's return to our dataset and train an MLP in Apache Spark to recognize and classify letters of the English alphabet. If you open ocr-data/letter-recognition.data in any text editor (the file is available from both the GitHub repository accompanying this book and the UCI Machine Learning Repository), you will find 20,000 rows of data, described by the following schema:
| Column name | Data type | Description |
|---|---|---|
| lettr | String | English letter (one of 26 values, from A to Z) |
| x-box | Integer | Horizontal position of box |
| y-box | Integer | Vertical position of box |
| width | Integer | Width of box |
| high | Integer | Height of box |
| onpix | Integer | Total number of on pixels |
| x-bar | Integer | Mean x of on pixels in the ... |
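To make the schema concrete, the following is a minimal pure-Python sketch of how a single row could be turned into a numeric class label plus an integer feature vector, which is the shape of input an MLP classifier expects. It assumes each row is comma-separated with the letter first and integer attributes after it. Because the table above is truncated, only the columns it names are labeled; any further attributes are kept positionally. The helpers `parse_row` and `describe` and the alphabetical letter-to-label mapping are hypothetical illustrations, not part of Spark's API. In a Spark pipeline this work would typically be done by `StringIndexer` and `VectorAssembler` instead.

```python
import string
from typing import Dict, List, Tuple

# Columns named in the schema table above; the table is truncated, so any
# further integer attributes are kept positionally in the feature list.
NAMED_COLUMNS = ["lettr", "x-box", "y-box", "width", "high", "onpix", "x-bar"]

# Map each letter A..Z to a class label 0..25. Alphabetical order is an
# illustrative choice (Spark's StringIndexer orders by frequency by default).
LETTER_TO_LABEL: Dict[str, int] = {
    letter: index for index, letter in enumerate(string.ascii_uppercase)
}


def parse_row(line: str) -> Tuple[int, List[int]]:
    """Split one comma-separated row into (class label, integer features)."""
    fields = line.strip().split(",")
    letter = fields[0]
    if letter not in LETTER_TO_LABEL:
        raise ValueError(f"unexpected letter: {letter!r}")
    features = [int(value) for value in fields[1:]]
    return LETTER_TO_LABEL[letter], features


def describe(line: str) -> Dict[str, str]:
    """Pair the named schema columns with their raw string values for one row."""
    fields = line.strip().split(",")
    return dict(zip(NAMED_COLUMNS, fields))


if __name__ == "__main__":
    # Illustrative row in the file's format: letter first, then integers.
    sample = "T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8"
    label, features = parse_row(sample)
    print(label, features[:6])  # 19 [2, 8, 3, 5, 1, 8]
    print(describe(sample)["onpix"])  # 1
```

Mapping the string label to an integer up front matters because Spark's `MultilayerPerceptronClassifier` expects a numeric label column, with one output-layer neuron per class (26 here).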