Preliminaries
Markov Decision Process
Terminology
Different Settings
Model-Free
Observation Setting
Single-Player and Adversarial Games
Q-Learning
From Policy to Neural Networks
Policy Iteration
Exploration Versus Exploitation
Bellman Equation
Initial State Sampling
Q-Learning Implementation
Modeling Q(s,a)
Experience Replay
Convolutional Layers and Image Preprocessing
History Processing
Double Q-Learning
Clipping
Scaling Rewards
Prioritized Replay
Graph, Visualization, and Mean-Q
RL4J
Conclusion