3 Text Generation & Classification in NLP: A Review

Kuldeep Vayadande¹*, Dattatray Raghunath Kale², Jagannath Nalavade², R. Kumar³ and Hanmant D. Magar⁴

¹Vishwakarma Institute of Technology, Pune, Maharashtra, India

²MIT Art Design and Technology University, Maharashtra, Pune, India

³VIT-AP University, Inavolu, Beside AP Secretariat, Amaravati AP, India

⁴Vishwakarma Institute of Information Technology, Kondhawa, Pune, Maharashtra, India

Abstract

The first stage in natural language processing is to break the text into separate tokens. When the text corpus is large, covering every word is inefficient because the vocabulary becomes too large. The effectiveness of a tokenization method depends on several factors, such as the size of the dataset, the nature of the task, and the morphological complexity of the dataset. A comparison of the algorithms shows that no single tokenization technique is the best choice in every setting. This survey reviews various applications and compares tokenization algorithms by evaluating them on classification tasks such as sentiment analysis. The question answering and translation applications use the available datasets. The survey also examines tokenization of noisy text data, compares how various tokenization algorithms perform on such data, and discusses the average number of segmented subwords and the accuracy. Sentiment analysis studies the information in an expression and classifies it ...
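To make the vocabulary trade-off concrete, the sketch below contrasts whitespace word tokenization with a small subword segmenter. It uses a simplified byte-pair-encoding (BPE) style merge procedure, chosen here purely for illustration; the abstract does not say which algorithms the chapter compares. The toy corpus and the function names (`word_tokenize`, `learn_merges`, `merge_pair`, `segment`) are assumptions, not taken from the chapter.

```python
from collections import Counter


def word_tokenize(text):
    """Whitespace word tokenization: every distinct word needs a vocabulary entry."""
    return text.lower().split()


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merges by repeatedly joining the most frequent adjacent pair."""
    vocab = Counter(tuple(w) for w in words)  # each word starts as a tuple of characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged_vocab = Counter()
        for symbols, freq in vocab.items():
            merged_vocab[merge_pair(symbols, best)] += freq
        vocab = merged_vocab
    return merges


def segment(word, merges):
    """Segment a word, seen or unseen, by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (illustrative only).
corpus = "low lower lowest new newer newest"
words = word_tokenize(corpus)
merges = learn_merges(words, num_merges=4)

print("word-level vocabulary size:", len(set(words)))
print("subword segmentation of an unseen word:", segment("newish", merges))
```

A word-level vocabulary has no entry for an unseen word such as "newish", whereas the subword segmenter can still split it into smaller known pieces. That is one reason vocabulary size and segmentation granularity have to be weighed against the dataset and the task.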

How Machine Learning is Innovating Today's World by Arindam Dey, Sukanta Nayak, Ranjan Kumar, Sachi Nandan Mohanty
