Formulation of Temporal-Difference Learning
Q-Learning
SARSA
Q-Learning Versus SARSA
Case Study: Automatically Scaling Application Containers to Reduce Cost
Industrial Example: Real-Time Bidding in Advertising
Defining the MDP
Results of the Real-Time Bidding Environments
Further Improvements
Extensions to Q-Learning
Double Q-Learning
Delayed Q-Learning
Comparing Standard, Double, and Delayed Q-learning
Opposition Learning
n-Step Algorithms
n-Step Algorithms on Grid Environments
Eligibility Traces
Extensions to Eligibility Traces
Watkins’s Q(λ)
Fuzzy Wipes in Watkins’s Q(λ)
Speedy Q-Learning
Accumulating Versus Replacing Eligibility Traces
Summary
Further Reading