Structure your unstructured data first!

The case of summarizing unstructured data with tag clouds

A. Bacchelli, Delft University of Technology, Delft, The Netherlands


Unstructured software data, such as emails and discussions in technical forums, is a rich source of information about software systems. Nevertheless, mining this data is hard because it mixes different languages, such as natural language and source code, that cannot be processed with the same techniques.

In this chapter, we show how we can summarize unstructured software data by first giving it the structure it needs.
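As a rough illustration of the "structure first, summarize second" idea, the sketch below separates the lines of a message into prose and code, then counts terms in the prose to obtain tag-cloud weights. It is not the chapter's approach: the regular expression `CODE_HINT`, the stopword list, and the function names `split_by_language` and `tag_weights` are all hypothetical, and the heuristic will misclassify some lines.

```python
import re
from collections import Counter

# Hypothetical heuristic (an assumption, not the chapter's classifier):
# a line looks like code if it ends with ';', '{' or '}', or if it
# contains a call-like pattern such as name(...).
CODE_HINT = re.compile(r"[;{}]\s*$|\w+\(.*\)")

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "when", "can", "and", "for", "with", "that", "this"}


def split_by_language(text):
    """Split a message into (prose_lines, code_lines), skipping blank lines."""
    prose, code = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        (code if CODE_HINT.search(line) else prose).append(line)
    return prose, code


def tag_weights(lines, top=5):
    """Return the `top` most frequent non-stopword terms with their counts."""
    words = re.findall(r"[A-Za-z]{3,}", " ".join(lines).lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top)
```

Calling `split_by_language` on an email body and passing only the prose lines to `tag_weights` keeps identifiers and punctuation-heavy code from dominating the tag cloud; the code lines could be summarized separately with a technique suited to source code.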


Unstructured software data; Natural language; Source code; Unstructured software data summarization; Tag cloud; Classification; Source code detection approach; Parse
