Attention Is All You Need: The Original Transformer Architecture
Positional Encodings
Multi-Head Attention
Building the Rest of the Transformer
Building an English-to-Spanish Transformer
Encoder-Only Transformers for Natural Language Understanding
BERT’s Architecture
BERT Pretraining
BERT Fine-Tuning
Other Encoder-Only Models
Decoder-Only Transformers
GPT-1 Architecture and Generative Pretraining
GPT-2 and Zero-Shot Learning
GPT-3, In-Context Learning, One-Shot Learning, and Few-Shot Learning
Using GPT-2 to Generate Text
Using GPT-2 for Question Answering
Downloading and Running an Even Larger Model: Mistral-7B
Turning a Large Language Model into a Chatbot
Fine-Tuning a Model for Chatting and Following Instructions Using SFT and RLHF
Direct Preference Optimization (DPO)
Fine-Tuning a Model Using the TRL Library
From a Chatbot Model to a Full Chatbot System
Model Context Protocol
Libraries and Tools
Encoder-Decoder Models
Exercises