Chapter 14. LDA and BERTopic

Since the Transformer architecture arrived on the NLP stage in 2017 with the seminal paper Attention Is All You Need [1], Transformer-based large language models (LLMs) such as BERT (Bidirectional Encoder Representations from Transformers) [3], ChatGPT, and GPT-4 [4] have seized the technology headlines. The word embeddings produced by these LLMs can discover more latent semantic relationships between words and documents than those produced by pre-LLM techniques such as BoW, TF-IDF, or Word2Vec.
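To make this claim concrete, the following is a minimal sketch, assuming the sentence-transformers package and the pretrained "all-MiniLM-L6-v2" checkpoint (both illustrative choices, not prescribed by this chapter). It shows how contextual embeddings separate word senses that a static BoW, TF-IDF, or Word2Vec representation would conflate:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Three sentences that all contain the word "bank".
finance = model.encode("The bank raised interest rates.")
river = model.encode("The river bank was muddy after the rain.")
policy = model.encode("The central bank tightened monetary policy.")

# The finance sentence should be closer to the policy sentence than
# to the river sentence, because the embeddings reflect context.
print(util.cos_sim(finance, policy))
print(util.cos_sim(finance, river))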

These semantic relationships between words and documents extend naturally to document grouping, which is the aim of topic modeling: clustering documents into homogeneous groups. Can we take advantage of the word embeddings of LLMs for topic modeling? ...
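BERTopic answers this question by clustering LLM document embeddings into topics. The sketch below is a minimal illustration of that workflow, assuming the bertopic and scikit-learn packages are installed; the 20 Newsgroups corpus is used only as a convenient public dataset:

from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Raw documents; BERTopic embeds them with a sentence-transformer
# model, reduces the embeddings, and clusters them into topics.
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

# Topic -1 collects outlier documents that fit no cluster.
print(topic_model.get_topic_info().head())
print(topic_model.get_topic(0))  # top words and weights for topic 0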
