大型语言模型的隐私与安全 (Chinese Edition)

by Baihan Lin
January 2026
Beginner to intermediate
318 pages
3h 38m
Chinese
O'Reilly Media, Inc.

Chapter 4. Privacy-Preserving Training Techniques

This work was translated using artificial intelligence. Your feedback and comments are welcome: translation-feedback@oreilly.com

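So far, you have learned how LLMs are built and how to assess their health from a privacy and security perspective. Next, you will learn how to keep these AI companions healthy by building protections directly into the model. This chapter examines a family of techniques that let AI work with sensitive information while keeping that information confidential.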

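Privacy-preserving methods are a critical frontier in AI development, especially as LLMs increasingly process personal, medical, financial, and other sensitive information. These methods let models extract valuable patterns and insights from data without compromising the confidentiality of individual records or examples. They work by establishing mathematical guarantees and cryptographic protections that limit what can be extracted or inferred from a trained model.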
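As a reference point for what a "mathematical guarantee" can look like, differential privacy is commonly stated in the following standard form. The symbols used here (a randomized training mechanism M, neighboring datasets D and D' that differ in a single record, and a set of possible outputs S) are notation introduced for this sketch, not taken from the text above.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S
```

Smaller values of ε and δ mean that adding or removing any one record changes the distribution of trained models only slightly, which in turn bounds what an observer can infer about that record.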

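This chapter examines several core techniques that let AI systems learn from sensitive information under strong privacy protections. These methods combine machine learning, cryptography, and privacy theory to build systems that can analyze raw data without ever fully "seeing" it in its complete form.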

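We will cover five broad categories of privacy-preserving techniques: differential privacy, federated learning, homomorphic encryption, multi-party computation, and privacy-preserving data transformations. You will also explore modern parameter-efficient fine-tuning methods, which reduce privacy risk by limiting the number of trainable parameters.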
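To make the first of these categories concrete, here is a minimal sketch of the clip-and-add-noise step that differentially private gradient descent is usually built around. It is an illustration under stated assumptions, not the chapter's implementation: the function name `dp_gradient_step`, its parameters, and the toy gradients are all hypothetical, and no privacy accounting (tracking of ε) is performed.

```python
import numpy as np

def dp_gradient_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Hypothetical helper: clip each example's gradient, sum, add noise, average."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale is 1 for gradients within the bound and clip_norm / norm otherwise.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise with standard deviation proportional to the clipping bound.
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 3))   # toy per-example gradients (assumed data)
grads[0] *= 100.0                 # one record with an unusually large gradient
noisy_mean = dp_gradient_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_mean)
```

Clipping bounds how much any single record can move the update, and the noise masks whatever influence remains. The values of `clip_norm` and `noise_multiplier` here are arbitrary placeholders.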

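Real-World Examples of Privacy Leakage During Training

Before diving into solutions, let's first understand why privacy-preserving training techniques matter. Suppose you are a doctor training an AI to help diagnose rare diseases. After you feed it thousands of medical records, a remarkable medical AI is born. But wait: what if someone could extract individual patient information from that AI? That would be more than embarrassing; it would be a serious violation of medical ethics and privacy regulations (see Chapter 8).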

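This section first uses a logistic regression model as a baseline case to introduce model-based privacy leakage. We then introduce a more realistic Transformer architecture to simulate a complex model setting that better matches how modern LLMs are used.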

Let's walk through a simplified example to see how such a leak can occur:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Simulated patient data (age, blood pressure, cholesterol, diagnosis)
np.random.seed(42)
data = np.random.rand(1000, 3)
labels = (data[:,0] + data[:,1] + data[:,2] > 1.5).astype(int)

# Add some "unique" patients
unique_patients = np.array([
    [0.1, 0.1, 0.1, 0],  # Alice
    [0.9, 0.9, 0.9, 1],  # Bob
])
data = np.vstack([data, unique_patients[:,:3]])
labels = np.concatenate([labels, unique_patients[:,3]])

# Train a simple model
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2)
model = LogisticRegression()
model.fit(X_train ...
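The listing above is cut off in this preview. Independently of how the original continues, one common way to probe a trained classifier for leakage is a loss-based membership inference test: if the model's loss is systematically lower on records it was trained on than on unseen records, an attacker can guess who was in the training set. The following self-contained sketch illustrates that idea on similar simulated data. The helper `per_example_loss` and all variable names are hypothetical, and this is not presented as the author's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulated data in the same spirit as the listing above (assumed, not the original).
rng = np.random.default_rng(42)
X = rng.random((1000, 3))
y = (X.sum(axis=1) > 1.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)

def per_example_loss(model, X, y):
    """Cross-entropy of each true label under the model's predicted probabilities."""
    proba = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(np.clip(proba, 1e-12, 1.0))

train_loss = per_example_loss(model, X_train, y_train)
test_loss = per_example_loss(model, X_test, y_test)

# Attack score: a lower loss is taken as evidence of training-set membership.
scores = -np.concatenate([train_loss, test_loss])
is_member = np.concatenate([np.ones(len(train_loss)), np.zeros(len(test_loss))])
attack_auc = roc_auc_score(is_member, scores)
print(f"membership-inference AUC: {attack_auc:.3f}")
```

An AUC of 0.5 means the attack does no better than guessing. A low-capacity model such as this one would be expected to land close to that value, while models that memorize individual records tend to score noticeably higher.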

ISBN: 0642572313869