
10 Mining Unstructured Data
As seen in Chapter 5, software repositories can be mined to assess the data stored over
a long period of time. Most of the previous chapters focused on techniques that can be
applied to structured data. However, in addition to structured data, these repositories
contain a large amount of unstructured data, such as natural language
text in the form of bug reports, mailing list archives, requirements documents, source code
comments, and a number of identifier names. Manually analyzing such a large amount of
data is very time-consuming and practically impossible. Hence, text mining techniques are
requ ...