Garbage in, garbage out (or GIGO) is an adage of computer science which is even more important when dealing with machine learning and possibly even more so when dealing with textual data. Garbage in, garbage out means that if we have poorly formatted data, it is likely we will have poor results.
![](/api/v2/epubs/9781788838535/files/assets/2f80aca9-1de1-420d-8633-eb6f7e987398.jpg)
While more data usually leads to a better prediction, it isn't always the same case with text analysis, where more data can result in nonsense results or results which we don't always want. An intuitive example: the part of speech, articles, such as ...