图解大模型 : 生成式AI 原理与实战

by Jay Alammar, Maarten Grootendorst
May 2025
Intermediate to advanced
382 pages
10h 33m
Chinese
Posts & Telecom Press
Text Clustering and Topic Modeling
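You can think of this modularity as Lego bricks: every part of the pipeline can be swapped out entirely for another, similar algorithm. Because of this modularity, newly released models can be integrated into its architecture. As the field of Language AI grows, so does BERTopic!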
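The Lego-brick idea can be illustrated with a toy pipeline in plain Python. This is only a sketch of the design principle, not BERTopic's implementation: the stage names (`length_embedder`, `vowel_embedder`, `threshold_clusterer`, `run_pipeline`) are hypothetical and the "embeddings" are deliberately trivial. The point is that one stage can be replaced while the rest of the pipeline stays untouched.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical stage signatures for a toy pipeline (not BERTopic's real API).
Embedder = Callable[[Sequence[str]], List[List[float]]]
Clusterer = Callable[[List[List[float]]], List[int]]


def length_embedder(docs: Sequence[str]) -> List[List[float]]:
    """Toy embedding: represent each document by its word count."""
    return [[float(len(doc.split()))] for doc in docs]


def vowel_embedder(docs: Sequence[str]) -> List[List[float]]:
    """Alternative toy embedding: count vowels instead of words."""
    return [[float(sum(ch in "aeiou" for ch in doc.lower()))] for doc in docs]


def threshold_clusterer(threshold: float) -> Clusterer:
    """Build a toy clusterer that splits points on their first coordinate."""
    def cluster(vectors: List[List[float]]) -> List[int]:
        return [0 if v[0] < threshold else 1 for v in vectors]
    return cluster


def run_pipeline(docs: Sequence[str], embed: Embedder,
                 cluster: Clusterer) -> Dict[int, List[str]]:
    """Embed the documents, cluster them, then group documents by label."""
    labels = cluster(embed(docs))
    groups: Dict[int, List[str]] = defaultdict(list)
    for doc, label in zip(docs, labels):
        groups[label].append(doc)
    return dict(groups)


docs = ["short text", "a much longer piece of text about topic models"]

# Same pipeline, same clusterer; only the embedding brick is swapped.
by_words = run_pipeline(docs, length_embedder, threshold_clusterer(5))
by_vowels = run_pipeline(docs, vowel_embedder, threshold_clusterer(5))
```

Because `run_pipeline` only depends on the call signature of each stage, any function with the same shape can be dropped in, which is the property the Lego analogy describes.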
The Modularity of BERTopic
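BERTopic's modular design has another advantage: it can be adapted flexibly to different use cases while still using the same base model. For example, BERTopic supports a wide range of algorithmic variants: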
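Guided topic modeling
(Semi-)supervised topic modeling
Hierarchical topic modeling
Dynamic topic modeling
Multimodal topic modeling
Multi-aspect topic modeling
Online and incremental topic modeling
Zero-shot topic modeling
...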
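As a rough illustration of two of these variants, the sketch below configures guided and zero-shot topic modeling through constructor arguments. It assumes a recent BERTopic release in which `seed_topic_list`, `zeroshot_topic_list`, and `zeroshot_min_similarity` are available. The helper names, the seed keywords, the candidate topic labels, and the similarity value are hypothetical and chosen only for illustration.

```python
def build_guided_model(seed_topic_list):
    """Guided topic modeling: nudge topics toward the given seed keywords."""
    # Imported lazily so this file loads without BERTopic installed.
    from bertopic import BERTopic
    return BERTopic(seed_topic_list=seed_topic_list)


def build_zero_shot_model(candidate_topics, min_similarity=0.85):
    """Zero-shot topic modeling: match documents to predefined topic labels."""
    from bertopic import BERTopic
    return BERTopic(
        zeroshot_topic_list=candidate_topics,
        # Assumed threshold; documents below it are left to regular clustering.
        zeroshot_min_similarity=min_similarity,
    )


# Hypothetical seed keywords, one inner list per topic we want to encourage.
seed_topic_list = [
    ["speech", "audio", "recognition"],
    ["translation", "multilingual", "languages"],
]

# Hypothetical candidate labels for the zero-shot variant.
candidate_topics = ["speech recognition", "machine translation"]
```

Calling `build_guided_model(seed_topic_list).fit(abstracts)` would then train a guided model on the same documents. Both variants reuse the same underlying pipeline and differ only in configuration.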
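Modularity and algorithmic flexibility are the foundation on which the author builds BERTopic into a one-stop solution for topic modeling. You can find a full overview of its capabilities in the official BERTopic documentation or the GitHub repository.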
To run BERTopic on our ArXiv dataset, we can use the models and embeddings we defined earlier (although this is not required):
from bertopic import BERTopic

# Train our model using the previously defined models
topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    verbose=True
).fit(abstracts, embeddings)
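For readers without the earlier definitions at hand, the sketch below shows one plausible way the three components could be built, and what the "not required" path looks like. The embedding model name and every hyperparameter value here are assumptions for illustration, not settings confirmed by this excerpt, and the helper names `build_components` and `fit_with_defaults` are hypothetical.

```python
def build_components():
    """Return (embedding_model, umap_model, hdbscan_model) with assumed settings."""
    # Imported lazily; requires sentence-transformers, umap-learn, and hdbscan.
    from sentence_transformers import SentenceTransformer
    from umap import UMAP
    from hdbscan import HDBSCAN

    # Assumed embedding model; any sentence-transformers model could be used.
    embedding_model = SentenceTransformer("thenlper/gte-small")
    # Assumed dimensionality reduction settings.
    umap_model = UMAP(n_components=5, min_dist=0.0, metric="cosine",
                      random_state=42)
    # Assumed clustering settings.
    hdbscan_model = HDBSCAN(min_cluster_size=50, metric="euclidean",
                            cluster_selection_method="eom")
    return embedding_model, umap_model, hdbscan_model


def fit_with_defaults(docs):
    """Fit BERTopic without passing components, so its defaults are used."""
    from bertopic import BERTopic
    return BERTopic(verbose=True).fit(docs)
```

Passing precomputed `embeddings` to `.fit` avoids re-embedding the documents on every run, which is the main practical reason to reuse the earlier objects instead of relying on the defaults.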
Let's start by exploring the topics that were created. The get_topic_info() method helps us get a quick overview of the topics that were found ...

Publisher Resources

ISBN: 9787115670830