Gen AI Foundations – Architecture, Inference, and Optimization Essentials
Published by Pearson
Build a solid foundation for deploying gen AI and LLMs
- Demystify how ChatGPT-style models actually work.
- Learn the LLM mechanics that matter in real-world agentic systems.
- Learn abstract AI concepts through concrete, interactive demonstrations.
Gen AI Foundations is the course you need to quickly understand how AI and LLMs work so you can start using them today. This 4-hour course will help you build a clear mental model of large language models from first principles. You will get a solid foundation in how tokenization, attention mechanisms, transformer architectures, and training objectives work together to produce fluent, capable AI systems.
This course removes the “black box” and replaces it with architectural insight. This training has been designed to go beyond conceptual overviews and focus on the practical levers used in production.
Learn inference-time hyperparameters, model behavior trade-offs, fine-tuning strategies, and optimization considerations that directly impact cost, latency, and output quality. Complex ideas become intuitive through live demonstrations that visualize tokenization, attention behavior, and inference dynamics, allowing you to observe how small configuration changes materially alter model outputs and system performance.
What you’ll learn and how you can apply it
- Explain how any modern LLM processes text from input to output, including tokenization, attention, and inference behavior.
- Tune inference-time hyperparameters to deliberately shape model responses, control variability, and improve output quality.
- Evaluate and compare LLMs using professional benchmarking concepts, understanding strengths, limitations, and trade-offs across model families.
- Apply core optimization principles to balance quality, latency, and cost when using LLMs in real-world and production-oriented scenarios.
This live event is for you because...
This live event is for you because you are working with large language models and want a clear, practical understanding of how they actually function. This course is ideal for technically minded professionals who need to move beyond surface-level usage and gain the confidence to select models intelligently, tune them effectively, and explain their behavior to peers, stakeholders, or customers. No academic AI or machine learning background is required—just technical curiosity and a desire to understand LLMs at a systems level.
Prerequisites
- Basic understanding of Generative AI systems, such as ChatGPT
- Coding experience not required but beneficial
Course Set-up
- No specific setup required
- Course files available here
Recommended Preparation
- Attend: Mastering AI and ML Fundamentals with Robert Barton and Jerome Henry
- Read: Demystifying Generative AI: A Practical and Intuitive Introduction by Robert Barton and Jerome Henry
Recommended Follow-up
- Watch: AI & ML Foundations by Robert Barton and Jerome Henry
- Attend: Build Your Own AI Lab with Omar Santos
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Segment 1: Welcome and Overview (10 min)
- Course objectives and learning outcomes
- LLMs’ place in today’s AI landscape
- How the course connects to AI deployment and agentic systems
- Key takeaways
Segment 2: Language Modeling Foundations (45 min)
- Next-token prediction defines language modeling
- Language modeling powers LLMs
- Evolution: statistical to neural models
- Tokenization strategies and impact
- Word embeddings: tokens to vectors
- Early neural models (RNNs, LSTMs) and limits
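The core idea previewed above, that language modeling is next-token prediction, can be sketched in miniature. The toy bigram model below (illustrative only, not course material) predicts the most likely next word from word-pair counts, which is the statistical precursor to the neural models the segment covers:

```python
from collections import Counter, defaultdict

# Toy corpus; a real LLM trains on trillions of tokens, not one sentence.
corpus = "the cat sat on the mat and the cat slept".split()

# Count bigrams: how often each token follows each other token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent next token under the bigram counts."""
    counts = follows[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

Neural language models replace these raw counts with learned probabilities over embeddings, but the objective, predict the next token, is the same.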
Break (5 min)
Segment 3: Attention Mechanisms and Transformer Architectures (55 min)
- Limits of sequential models at scale
- Attention: Queries, Keys, Values
- Self-attention for sequence modeling
- Multi-head attention for diverse patterns
- Positional encoding (absolute, RoPE)
- Transformer types: encoder, decoder, hybrid
- End-to-end text processing in transformers
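The Query/Key/Value mechanism listed above can be written down in a few lines. This is a minimal pure-Python sketch of scaled dot-product attention (single head, no learned projections, toy numbers chosen for illustration):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors.
    Each output is a weighted average of the V rows, weighted by
    softmax(q . k / sqrt(d)) -- the core operation inside a transformer."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Two toy "tokens" with 2-dimensional Q/K/V.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Multi-head attention simply runs several such computations in parallel with different learned projections and concatenates the results.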
Break (5 min)
Segment 4: Inference and Hyperparameters (25 min)
- The inference lifecycle: from prompt to generated output
- Temperature and randomness control
- Top-k and top-p (nucleus) sampling
- Output length controls: max tokens and stop sequences
- Frequency and presence penalties for repetition management
- Demo: how parameter choices affect model behavior
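Two of the hyperparameters above, temperature and top-p (nucleus) sampling, can be sketched directly on a toy logit vector. The numbers and the four-token vocabulary below are illustrative assumptions, not course code:

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by T, then softmax; low T sharpens the
    distribution (more deterministic), high T flattens it (more random)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches p, then renormalize before sampling."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, total = [], 0.0
    for i in order:
        keep.append(i)
        total += probs[i]
        if total >= p:
            break
    return {i: probs[i] / total for i in keep}

logits = [2.0, 1.0, 0.1, -1.0]           # toy scores for a 4-token vocabulary
print(apply_temperature(logits, 0.5))     # sharper than temperature 1.0
print(top_p_filter(apply_temperature(logits, 1.0), 0.9))
```

Top-k works the same way but keeps a fixed number of candidates instead of a probability mass; the course demo shows how these choices visibly change generated text.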
Segment 5: Classes and Families of LLMs (30 min)
- Overview of major LLM architecture families and design trade-offs
- Encoder-only models (e.g., BERT-style) and representation learning
- Decoder-only models (e.g., GPT-style) and autoregressive generation
- Instruction-tuned and task-specialized models
- Open vs. closed models: capability, control, and deployment considerations
- Aligning model architectures with real-world use cases
- Model Benchmarking
Break (5 min)
Segment 6: Fine-Tuning LLMs for Agentic and Task-Oriented Systems (30 min)
- Why and when fine-tuning is appropriate
- Fine-tuning vs. prompt engineering vs. RAG
- Instruction tuning and task adaptation
- Parameter-efficient methods: LoRA
- Risks, limitations, and failure modes of fine-tuning
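The LoRA idea listed above is simple enough to sketch numerically: instead of updating a full weight matrix W, you train two small low-rank matrices A and B and merge W' = W + (alpha / r) * B @ A. The tiny dimensions and values below are illustrative assumptions:

```python
import random

def matmul(A, B):
    """Plain nested-loop matrix multiply for small lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_merge(W, A, B, alpha, r):
    """Merge a LoRA adapter into the base weight: W' = W + (alpha/r) * B @ A.
    Only A (r x d_in) and B (d_out x r) are trained -- far fewer parameters
    than the frozen base matrix W (d_out x d_in)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

random.seed(0)
d, r = 4, 1                                   # tiny dims for illustration
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
A = [[0.1] * d]                               # 1 x 4 rank-1 down-projection
B = [[0.2] for _ in range(d)]                 # 4 x 1 rank-1 up-projection
W_merged = lora_merge(W, A, B, alpha=2.0, r=r)
```

Here the adapter has 8 trainable values against 16 frozen ones; at realistic model sizes the ratio is often well under 1%, which is what makes LoRA practical on modest hardware.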
Segment 7: Optimizing and Scaling LLM Inference in Production (20 min)
- Model quantization and reduced-precision inference
- Pruning and distillation for smaller, faster models
- Distributed inference concepts and batching strategies
- Mixture-of-Experts (MoE) models and scaling efficiency
- Hardware considerations: GPUs, memory, and throughput constraints
- Balancing quality, latency, and cost in real-world systems
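The first optimization above, quantization, has a compact core. This is a minimal sketch of symmetric per-tensor int8 quantization (one scale factor for the whole tensor; real inference stacks add per-channel scales, calibration, and fused kernels):

```python
def quantize_int8(weights):
    """Map floats into [-127, 127] with a single scale factor --
    the basic idea behind reduced-precision (int8) inference."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [qi * scale for qi in q]

weights = [0.82, -1.27, 0.003, 0.51]      # illustrative weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(error, 5))  # ints fit in 1 byte each vs. 4-byte floats
```

The trade-off the segment explores is exactly this one: int8 storage is 4x smaller and faster to move through memory, at the cost of a bounded rounding error per weight.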
Course wrap-up and next steps (5 minutes)
Your Instructors
Rob Barton
Rob Barton is a Distinguished Engineer with Cisco. Rob has worked in the IT industry for over 27 years, the last 25 of which have been with Cisco. Rob graduated from the University of British Columbia with a degree in Engineering Physics. Rob is a published author, with titles on the subjects of Generative AI, Quality of Service (QoS), Wireless Communications, and IoT. Additionally, he has co-authored many peer-reviewed research papers and leads Cisco’s academic research partnership program. Rob holds numerous patents in the areas of AI, wireless communications, network security, cloud networking, and IoT. His current areas of work include network automation and agentic models for IT management.
Jerome Henry
Jerome Henry is a Distinguished Engineer in the Office of the Wireless CTO at Cisco Systems. His main field of research is performance optimization in unlicensed wireless networks, which includes aspects of QoS, IoT, privacy, and indoor location, as well as AI/machine learning and LLMs centered on network languages. Jerome has more than 25 years of experience teaching technical courses in more than 15 different countries and 4 different languages, to audiences ranging from graduate students to networking professionals and technical support engineers. Jerome joined Cisco in 2012. Before that time, he was consulting and teaching heterogeneous networks and wireless integration with the European Airespace team, which was later acquired by Cisco to become their main wireless solution.