After we have collected a set of tweets (our dataset), we need labels to perform classification. We are going to label the dataset by setting up a form in a Jupyter Notebook to allow us to enter the labels. We do this by loading the tweets we collected in the previous section, iterating over them and providing (manually) a classification on whether they refer to Python the programming language or not.
The dataset we have stored is nearly, but not quite, in a JSON format. JSON is a format for data that doesn't impose much structure on the contents, just on the syntax. The idea behind JSON is that the data is in a format directly readable in JavaScript (hence the name, JavaScript Object Notation). JSON defines ...