8
Deploying DeepSeek Models
In the previous chapter, we distilled and fine-tuned smaller, domain-specific models that you could run on modest hardware and within strict privacy boundaries. That work is optimized for efficiency and control at a smaller scale. This chapter takes the complementary step of deploying full-parameter DeepSeek models (V3 and R1) as dependable production services.
Deployment is the bridge from research to production. It forces concrete choices about memory footprint, throughput, and operational risk, and DeepSeek’s architectures magnify these trade-offs: V3’s Mixture-of-Experts (MoE) design stresses VRAM placement, while R1’s extended reasoning inflates token counts and time-to-first-token. The right path depends on your constraints.
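To make the memory-footprint pressure concrete, here is a rough back-of-envelope sketch of the VRAM needed just to hold full-parameter MoE weights. The parameter counts reflect DeepSeek-V3's published figures (671B total parameters, roughly 37B activated per token); the 80 GB accelerator size and the precision choices are illustrative assumptions, and the estimate ignores KV cache, activations, and framework overhead.

```python
# Back-of-envelope VRAM estimate for serving full-parameter MoE weights.
# Parameter counts are DeepSeek-V3's published figures; all other numbers
# (precisions, 80 GB accelerators) are illustrative assumptions.

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """GiB needed to hold the weights alone (no KV cache or activations)."""
    return num_params * bytes_per_param / 1024**3

TOTAL_PARAMS = 671e9   # every expert must be resident to serve requests
ACTIVE_PARAMS = 37e9   # activated per token: shapes compute cost, not VRAM

for label, bytes_pp in [("FP8", 1.0), ("BF16", 2.0)]:
    gib = weight_memory_gib(TOTAL_PARAMS, bytes_pp)
    gpus = gib / 80  # assuming 80 GB accelerators (e.g. H100-class)
    print(f"{label}: ~{gib:,.0f} GiB of weights -> at least {gpus:.0f} x 80 GB GPUs")
```

The key point the sketch surfaces: because MoE routing activates only a fraction of the experts per token, compute scales with the ~37B active parameters, but VRAM must still accommodate all 671B, which is why expert placement across GPUs dominates V3 deployment planning.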