Text mining

The N-gram and the orthogonal sparse bigram (OSB) transformations are the main text-mining transformations available in Amazon ML.

In text mining, the classic approach is the bag-of-words approach. It discards the order of the words in a given text and considers only the relative frequency of each word in the documents. This may seem overly simplistic, since the order of the words is essential to understanding a message, yet the approach has given satisfactory results across a wide range of natural language processing problems. A key step of the bag-of-words method is extracting the words from a given text. However, instead of considering single words as the only elements holding ...
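To make the bag-of-words idea concrete, here is a minimal Python sketch. It is an illustration only and not how Amazon ML implements its transformations: the function names (tokenize, bag_of_words, ngrams) and the naive regex tokenizer are assumptions made for this example. It shows that two sentences with opposite meanings share the same bag-of-words representation, and how grouping consecutive words keeps some of the order.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase the text
    # and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    # Map each word to its relative frequency; word order is discarded.
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def ngrams(tokens, n):
    # Slide a window of n consecutive tokens over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


first = bag_of_words("the dog bites the man")
second = bag_of_words("the man bites the dog")

# Both sentences contain the same words with the same frequencies,
# so their bag-of-words representations are identical.
print(first == second)

# Bigrams retain local word order that single words lose.
print(ngrams(tokenize("the dog bites the man"), 2))
```

The first print shows the main limitation of single-word features: the two sentences cannot be told apart. The bigram list differs between them, which is why word groupings are worth considering as features.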

