Gen AI Foundations – Architecture, Inference, and Optimization Essentials
Published by Pearson
Build a solid foundation for deploying gen AI and LLMs
- Demystify how ChatGPT-style models actually work.
- Learn the LLM mechanics that matter in real-world agentic systems.
- Learn abstract AI concepts through concrete, interactive demonstrations.
Gen AI Foundations is the course you need to quickly understand how AI and LLMs work so you can start using them today. This 4-hour course will help you build a clear mental model of large language models from first principles. You will get a solid foundation in how tokenization, attention mechanisms, transformer architectures, and training objectives work together to produce fluent, capable AI systems.
This course removes the “black box” and replaces it with architectural insight. This training has been designed to go beyond conceptual overviews and focus on the practical levers used in production.
Learn inference-time hyperparameters, model behavior trade-offs, fine-tuning strategies, and optimization considerations that directly impact cost, latency, and output quality. Complex ideas become intuitive through live demonstrations that visualize tokenization, attention behavior, and inference dynamics, allowing you to observe how small configuration changes materially alter model outputs and system performance.
What you’ll learn and how you can apply it
- Explain how any modern LLM processes text from input to output, including tokenization, attention, and inference behavior.
- Tune inference-time hyperparameters to deliberately shape model responses, control variability, and improve output quality.
- Evaluate and compare LLMs using professional benchmarking concepts, understanding strengths, limitations, and trade-offs across model families.
- Apply core optimization principles to balance quality, latency, and cost when using LLMs in real-world and production-oriented scenarios.
This live event is for you because...
This live event is for you because you are working with large language models and want a clear, practical understanding of how they actually function. This course is ideal for technically minded professionals who need to move beyond surface-level usage and gain the confidence to select models intelligently, tune them effectively, and explain their behavior to peers, stakeholders, or customers. No academic AI or machine learning background is required—just technical curiosity and a desire to understand LLMs at a systems level.
Prerequisites
- Basic understanding of Generative AI systems, such as ChatGPT
- Coding experience not required but beneficial
Course Set-up
- No specific setup required
- Course files available here
Recommended Preparation
- Attend: Mastering AI and ML Fundamentals with Robert Barton and Jerome Henry
- Read: Demystifying Generative AI: A Practical and Intuitive Introduction by Robert Barton and Jerome Henry
Recommended Follow-up
- Watch: AI & ML Foundations by Robert Barton and Jerome Henry
- Attend: Build Your Own AI Lab with Omar Santos
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Segment 1: Welcome and Overview (10 min)
- Course objectives and learning outcomes
- LLMs’ place in today’s AI landscape
- How the course connects to AI deployment and agentic systems
- Key takeaways
Segment 2: Language Modeling Foundations (45 min)
- Next-token prediction defines language modeling
- Language modeling powers LLMs
- Evolution: statistical to neural models
- Tokenization strategies and impact
- Word embeddings: tokens to vectors
- Early neural models (RNNs, LSTMs) and limits
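The core idea previewed above, that language modeling is next-token prediction, can be sketched in miniature. The toy bigram model below (illustrative only, not course material) predicts the most likely next word from word-pair counts, which is the statistical precursor to the neural models the segment covers:

```python
from collections import Counter, defaultdict

# Toy corpus; a real LLM trains on trillions of tokens, not one sentence.
corpus = "the cat sat on the mat and the cat slept".split()

# Count bigrams: how often each token follows each other token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent next token under the bigram counts."""
    counts = follows[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" follows "the" twice, "mat" once -> "cat"
```

Neural language models replace these raw counts with learned probabilities over embeddings, but the objective, predict the next token, is the same.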
Break (5 min)
Segment 3: Attention Mechanisms and Transformer Architectures (55 min)
- Limits of sequential models at scale
- Attention: Queries, Keys, Values
- Self-attention for sequence modeling
- Multi-head attention for diverse patterns
- Positional encoding (absolute, RoPE)
- Transformer types: encoder, decoder, hybrid
- End-to-end text processing in transformers
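The Query/Key/Value mechanism listed above can be written down in a few lines. This is a minimal pure-Python sketch of scaled dot-product attention (single head, no learned projections, toy numbers chosen for illustration):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention over lists of vectors.
    Each output is a weighted average of the V rows, weighted by
    softmax(q . k / sqrt(d)) -- the core operation inside a transformer."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Two toy "tokens" with 2-dimensional Q/K/V.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

Multi-head attention simply runs several such computations in parallel with different learned projections and concatenates the results.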
Break (5 min)
Segment 4: Inference and Hyperparameters (25 min)
- The inference lifecycle: from prompt to generated output
- Temperature and randomness control
- Top-k and top-p (nucleus) sampling
- Output length controls: max tokens and stop sequences
- Frequency and presence penalties for repetition management
- Demo: how parameter choices affect model behavior
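Two of the hyperparameters above, temperature and top-p (nucleus) sampling, can be sketched directly on a toy logit vector. The numbers and the four-token vocabulary below are illustrative assumptions, not course code:

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by T, then softmax; low T sharpens the
    distribution (more deterministic), high T flattens it (more random)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def top_p_filter(probs, p):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches p, then renormalize before sampling."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    keep, total = [], 0.0
    for i in order:
        keep.append(i)
        total += probs[i]
        if total >= p:
            break
    return {i: probs[i] / total for i in keep}

logits = [2.0, 1.0, 0.1, -1.0]           # toy scores for a 4-token vocabulary
print(apply_temperature(logits, 0.5))     # sharper than temperature 1.0
print(top_p_filter(apply_temperature(logits, 1.0), 0.9))
```

Top-k works the same way but keeps a fixed number of candidates instead of a probability mass; the course demo shows how these choices visibly change generated text.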
Segment 5: Classes and Families of LLMs (30 min)
- Overview of major LLM architecture families and design trade-offs
- Encoder-only models (e.g., BERT-style) and representation learning
- Decoder-only models (e.g., GPT-style) and autoregressive generation
- Instruction-tuned and task-specialized models
- Open vs. closed models: capability, control, and deployment considerations
- Aligning model architectures with real-world use cases
- Model Benchmarking
Break (5 min)
Segment 6: Fine-Tuning LLMs for Agentic and Task-Oriented Systems (30 min)
- Why and when fine-tuning is appropriate
- Fine-tuning vs. prompt engineering vs. RAG
- Instruction tuning and task adaptation
- Parameter-efficient methods: LoRA
- Risks, limitations, and failure modes of fine-tuning
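The LoRA idea listed above is simple enough to sketch numerically: instead of updating a full weight matrix W, you train two small low-rank matrices A and B and merge W' = W + (alpha / r) * B @ A. The tiny dimensions and values below are illustrative assumptions:

```python
import random

def matmul(A, B):
    """Plain nested-loop matrix multiply for small lists-of-lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_merge(W, A, B, alpha, r):
    """Merge a LoRA adapter into the base weight: W' = W + (alpha/r) * B @ A.
    Only A (r x d_in) and B (d_out x r) are trained -- far fewer parameters
    than the frozen base matrix W (d_out x d_in)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

random.seed(0)
d, r = 4, 1                                   # tiny dims for illustration
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
A = [[0.1] * d]                               # 1 x 4 rank-1 down-projection
B = [[0.2] for _ in range(d)]                 # 4 x 1 rank-1 up-projection
W_merged = lora_merge(W, A, B, alpha=2.0, r=r)
```

Here the adapter has 8 trainable values against 16 frozen ones; at realistic model sizes the ratio is often well under 1%, which is what makes LoRA practical on modest hardware.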
Segment 7: Optimizing and Scaling LLM Inference in Production (20 min)
- Model quantization and reduced-precision inference
- Pruning and distillation for smaller, faster models
- Distributed inference concepts and batching strategies
- Mixture-of-Experts (MoE) models and scaling efficiency
- Hardware considerations: GPUs, memory, and throughput constraints
- Balancing quality, latency, and cost in real-world systems
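The first optimization above, quantization, has a compact core. This is a minimal sketch of symmetric per-tensor int8 quantization (one scale factor for the whole tensor; real inference stacks add per-channel scales, calibration, and fused kernels):

```python
def quantize_int8(weights):
    """Map floats into [-127, 127] with a single scale factor --
    the basic idea behind reduced-precision (int8) inference."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [qi * scale for qi in q]

weights = [0.82, -1.27, 0.003, 0.51]      # illustrative weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(error, 5))  # ints fit in 1 byte each vs. 4-byte floats
```

The trade-off the segment explores is exactly this one: int8 storage is 4x smaller and faster to move through memory, at the cost of a bounded rounding error per weight.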
Course wrap-up and next steps (5 minutes)
Your Instructors
Rob Barton
Rob Barton is a Distinguished Engineer with Cisco. Rob has worked in the IT industry for over 27 years, the last 25 of which have been with Cisco. Rob graduated from the University of British Columbia with a degree in Engineering Physics. Rob is a published author, with titles on the subjects of Generative AI, Quality of Service (QoS), Wireless Communications, and IoT. Additionally, he has co-authored many peer-reviewed research papers and leads Cisco’s academic research partnership program. Rob holds numerous patents in the areas of AI, wireless communications, network security, cloud networking, and IoT. His current areas of work include network automation and agentic models for IT management.
Jerome Henry
Jerome Henry is a Distinguished Engineer in the Office of the Wireless CTO at Cisco Systems. His main field of research is performance optimization in unlicensed wireless networks, which includes aspects of QoS, IoT, privacy, and indoor location, as well as AI/machine learning and LLMs centered on network languages. Jerome has more than 25 years of experience teaching technical courses in more than 15 different countries and 4 different languages, to audiences ranging from graduate students to networking professionals and technical support engineers. Jerome joined Cisco in 2012. Before that time, he was consulting and teaching heterogeneous networks and wireless integration with the European Airespace team, which was later acquired by Cisco to become their main wireless solution.