Applied AI for Enterprise Java Development

by Alex Soto Bueno, Markus Eisele, Natale Vinto

November 2025

Intermediate to advanced

430 pages

10h 29m

English

O'Reilly Media, Inc.

Preface
Beyond Prototypes: Building Resilient AI-Infused Applications with Java; Who Should Read This Book; How the Book Is Organized; Prerequisites and Software; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments; Alex; Markus; Natale
1. The Enterprise AI Conundrum
The AI Landscape: A Technical Perspective All the Way to GenAI; Machine Learning: The Foundation of Today’s AI; Deep Learning: A Powerful Tool in the AI Arsenal; Generative AI: The Future of Content Generation; Open Source Models and Training Data; Why Open Source Is an Important Driver for GenAI; The Hidden Cost of Bad Data: Understanding Model Behavior Through Training Inputs; Adding Company-Specific Data to LLMs; Explainable and Transparent AI Decisions; Ethical and Sustainability Considerations; The Lifecycle of LLMs and Ways to Influence Their Behavior; MLOps Versus DevOps (and the Rise of AIOps and GenAIOps); Conclusion
2. The New Types of Applications
Understanding Large Language Models; Key Elements of a Large Language Model; Deployment of Models; Choosing the Right LLM for Your Application; Model Type; Model Size and Efficiency; Deployment Approaches; Supported Precision and Hardware Optimization; Ethical Considerations and Bias; Community and Documentation Support; Closed Versus Open Source; Example Categorization; Foundation Models or Expert Models: Where Are We Headed?; Using Supporting Technologies; Embedding Models and Vector Databases; Caching and Performance Optimization; AI Agent Frameworks; Model Context Protocol; API Integration; Model Security, Compliance, and Access Control; Conclusion
3. Prompts for Developers: Why Prompts Matter in AI-Infused Applications
Types of Prompts; User Prompts: Direct Input from the User; System Prompts: Instructions That Guide Model Behavior; Contextual Prompts: Prepopulated or Dynamically Generated Inputs; Principles of Writing Effective Prompts; Prompting Techniques; Zero-Shot Prompting: Asking Without Context; Few-Shot Prompting: Providing Examples to Guide Responses; Chain-of-Thought Prompting: Encouraging Step-by-Step Reasoning; Self-Consistency: Improving Accuracy by Generating Multiple Responses; Instruction Prompting: Directing the Model Explicitly; Retrieval-Augmented Generation: Enhancing Prompts with External Data; Advanced Strategies; Constructing Dynamic Prompts: Combining Static and Generated Inputs; Using Prompt Chaining to Maintain Context; Using Guardrails and Validations for Safer Outputs; Leveraging APIs for Prompt Customization; Optimizing for Performance Versus Cost; Debugging Prompts: Troubleshooting Poor Responses; Tool Use and Function Calling; Context Engineering as the New Prompt Engineering; Designing Memory and Storage for Context; Fast Access with In-Memory Caches; Hot Memory for Short-Term Context; Vector Databases for Long-Term Semantic Memory; Cold Storage for Archival Data and Large Repositories; Combining Storage Tiers for Effective Context Delivery; Conclusion
4. AI Architectures for Applications
Beyond Traditional Architectures: Why AI-Infused Systems Require a New Approach; Overview of Core Architectural Pillars: A Roadmap for the Chapter; Application Components; Queries and Data: Managing Application Inputs; The AI Gateway: Managing Inputs and Outputs; Context and Memory; Interaction and Transport: Using Tools and Agents; Discovery and Access Control; Model Serving; The Data Preparation Pipeline; Observability and Monitoring: The End-to-End AI Stack; Conclusion
5. Embedding Vectors, Vector Stores, and Running Models Locally
Embedding Vectors and Their Role; Why Are Embeddings Needed?; Structure of an Embedding Vector; Measuring Similarity: Cosine Similarity and Distance; Common Embedding Models; How Are Embeddings Used in AI Applications?; Other Similarity Methods; Uncommon Uses of Embedding Vectors; Vector Stores and Querying Mechanisms; How Vector Databases Store and Retrieve Embeddings; Examples of Common Vector Stores; Retrieval-Augmented Generation; Indexing or Generating Vector Embeddings at Scale; Why Run Models Locally?; Ollama: Local Inferencing with a Simple Interface; Podman Desktop: Using Containerized Environments for AI Workloads; Jlama: Java-Native Model Inferencing for JVM-Based Applications; Comparing Local Inferencing Methods; Using OpenAI’s REST API; Overview of OpenAI’s Models and Endpoints; Generating Embeddings with OpenAI’s API; Conclusion
6. Inference APIs
What Is an Inference API?; Benefits of an Inference API; Examples of Inference APIs; Deploying Inference Models in Java; Inferencing Models with DJL; Looking Under the Hood; Inferencing Models with gRPC; Conclusion
7. Accessing the Inference Model with Java
Connecting to an Inference API with Quarkus; The Architecture; The Fraud Inference API; The Quarkus Project; The REST Client Interface; The REST Resource; Testing the Example; Connecting to an Inference API with Spring Boot WebClient; Adding WebClient Dependency; Using the WebClient; Connecting to the Inference API with the Quarkus gRPC Client; Adding gRPC Dependencies; Implementing the gRPC Client; Conclusion
8. LangChain4j
What Is LangChain4j?; Unified APIs; Prompt Templates; Structured Outputs; Memory; Data Augmentation; Tools; High-Level API; LangChain4j with Plain Java; Extracting Information from Unstructured Text; Performing Text Classification; Generating Images and Descriptions; Spring Boot Integration; Adding Spring Boot Dependencies; Defining the AI Service; Creating a REST Controller; Quarkus Integration; Quarkus Dependencies; Frontend; The AI Service; WebSocket; Optical Character Recognition; Tools; Dependencies; Rides Persistence; Waiting Times Service; AI Service; REST Endpoint; Dynamic Tooling; Final Notes About Tooling; Memory; Dependencies; Changes to Code; Conclusion
9. Vector Embeddings and Stores
Calculating Vector Embeddings; Vector Embeddings Using DJL; Vector Embeddings Using In-Process LangChain4j; Vector Embeddings Using Remote Models with LangChain4j; Text Classifier; Embedding Text-Classification Dependencies; Providing Examples and Categorizing Inputs; Text Clustering; Adding Text Clustering Dependencies; Reading Headline News; Calculating the Vector Embedding; Clustering News; Summarizing News Headlines; Semantic Search; Adding Semantic Search Dependencies; Importing Movies; Querying for Similarities; Semantic Cache; RAG; Ingestion; Retrieval; Reranking; Query Router; Ingestion Splitting Window; Filtering Results; Conclusion
10. LangGraph4j
Understanding Graphs in LangGraph4j; Nodes; Edges; State; Using LangGraph4j; Defining a State; Defining a Node; Defining a Graph; Adding Conditional Edges; Appending Values; Using LangChain4j with LangGraph4j; Routing Agents; Human Interaction with LangGraph4j; Advanced RAG Schema with Self-Reflection; Exploring Additional Features; Subgraphs; Parallel Execution; Time Travel; Conclusion
11. Image Processing
OpenCV; Initializing the Library; Loading and Saving Images; Performing Basic Transformations; Overlaying Elements; Image Processing; Reading Barcodes and QR Codes; Stream Processing; Processing Videos; Processing Webcam Images; OpenCV and Java; OCR; Conclusion
12. Advanced Topics in AI Java Development
Streaming; Streaming with a Low-Level API; Streaming with AI Services; Using LangChain4j and Streaming Integrations; Guardrails; Input Guardrail; Output Guardrail; Guardrail Use Cases; Model Context Protocol; MCP Architecture; MCP Client with Java; MCP Client with Quarkus; MCP Server with Quarkus; Key Benefits of MCP; Next Steps
Index
About the Authors

Content preview from Applied AI for Enterprise Java Development

Chapter 6. Inference APIs

You’ve already expanded your knowledge of AI and the many types of models. You have also deployed these models locally (where possible) and tested them with queries. But when it is time to put models to use, you need to expose them properly, follow your organization’s best practices, and give developers an easy way to consume them.

An inference API helps solve these problems, making models accessible to all developers. This chapter explores how to expose an AI/ML model by using an inference API in Java.

What Is an Inference API?

An inference API allows developers to send data (over a protocol such as HTTP, gRPC, or Kafka) to a server where an ML model is deployed and receive predictions or classifications in return. In practice, every time you access cloud-hosted models such as those from OpenAI or Gemini, or locally deployed models through Ollama, you do so through their inference API.
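To make the request/response shape concrete, here is a minimal sketch in plain Java that posts a feature vector to an HTTP inference endpoint and reads the prediction back. Everything specific in it is an assumption for illustration: the `/predict` path, the `inputs` and `prediction` JSON fields, and the in-process stub server that stands in for a real model server are hypothetical, not the API of any particular product or of the book's examples.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class InferenceApiSketch {

    // Serializes the features into a hypothetical request body: {"inputs":[...]}.
    public static String toJson(double[] features) {
        StringBuilder sb = new StringBuilder("{\"inputs\":[");
        for (int i = 0; i < features.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(features[i]);
        }
        return sb.append("]}").toString();
    }

    // Starts a local stub that stands in for a model server (assumption:
    // a real server would run the model instead of returning a fixed value).
    public static HttpServer startStubServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/predict", exchange -> {
            exchange.getRequestBody().readAllBytes(); // consume the request payload
            byte[] body = "{\"prediction\":0.5}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (var out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Sends the features to the endpoint and returns the raw JSON response.
    public static String predict(URI endpoint, double[] features)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(features)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Inference call failed with HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startStubServer();
        try {
            int port = server.getAddress().getPort();
            URI endpoint = URI.create("http://127.0.0.1:" + port + "/predict");
            // Prints the stub's fixed reply: {"prediction":0.5}
            System.out.println(predict(endpoint, new double[] {1.0, 2.5}));
        } finally {
            server.stop(0);
        }
    }
}
```

The client side is the part that carries over to a real deployment: the caller only needs the endpoint URI and the agreed payload format, and never touches the model file or the runtime that executes it.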

Even though it is common these days to use large models trained by big corporations like Google, IBM, or Meta, mostly as LLMs, you might need small custom-trained models that solve one specific problem for your business. Usually, these models are developed by your organization’s data scientists, and you must write the code that runs inference against them.

For example, suppose you are working for a bank, and data scientists have trained a custom model to detect whether a credit card transaction can be considered fraud. The model is a predictive AI model in ONNX format with six input parameters ...

Publisher Resources

ISBN: 9781098174491
