Generative Deep Learning

by David Foster

June 2019

Intermediate to advanced

327 pages

7h 36m

English

O'Reilly Media, Inc.

Objective and Approach; Prerequisites; Other Resources; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
What Is Generative Modeling?; Generative Versus Discriminative Modeling; Advances in Machine Learning; The Rise of Generative Modeling; The Generative Modeling Framework; Probabilistic Generative Models; Hello Wrodl!; Your First Probabilistic Generative Model; Naive Bayes; Hello Wrodl! Continued; The Challenges of Generative Modeling; Representation Learning; Setting Up Your Environment; Summary
Structured and Unstructured Data; Deep Neural Networks; Keras and TensorFlow; Your First Deep Neural Network; Loading the Data; Building the Model; Compiling the Model; Training the Model; Evaluating the Model; Improving the Model; Convolutional Layers; Batch Normalization; Dropout Layers; Putting It All Together; Summary
The Art Exhibition; Autoencoders; Your First Autoencoder; The Encoder; The Decoder; Joining the Encoder to the Decoder; Analysis of the Autoencoder; The Variational Art Exhibition; Building a Variational Autoencoder; The Encoder; The Loss Function; Analysis of the Variational Autoencoder; Using VAEs to Generate Faces; Training the VAE; Analysis of the VAE; Generating New Faces; Latent Space Arithmetic; Morphing Between Faces; Summary
Ganimals; Introduction to GANs; Your First GAN; The Discriminator; The Generator; Training the GAN; GAN Challenges; Oscillating Loss; Mode Collapse; Uninformative Loss; Hyperparameters; Tackling the GAN Challenges; Wasserstein GAN; Wasserstein Loss; The Lipschitz Constraint; Weight Clipping; Training the WGAN; Analysis of the WGAN; WGAN-GP; The Gradient Penalty Loss; Analysis of WGAN-GP; Summary
Apples and Organges; CycleGAN; Your First CycleGAN; Overview; The Generators (U-Net); The Discriminators; Compiling the CycleGAN; Training the CycleGAN; Analysis of the CycleGAN; Creating a CycleGAN to Paint Like Monet; The Generators (ResNet); Analysis of the CycleGAN; Neural Style Transfer; Content Loss; Style Loss; Total Variance Loss; Running the Neural Style Transfer; Analysis of the Neural Style Transfer Model; Summary
The Literary Society for Troublesome Miscreants; Long Short-Term Memory Networks; Your First LSTM Network; Tokenization; Building the Dataset; The LSTM Architecture; The Embedding Layer; The LSTM Layer; The LSTM Cell; Generating New Text; RNN Extensions; Stacked Recurrent Networks; Gated Recurrent Units; Bidirectional Cells; Encoder–Decoder Models; A Question and Answer Generator; A Question-Answer Dataset; Model Architecture; Inference; Model Results; Summary
Preliminaries; Musical Notation; Your First Music-Generating RNN; Attention; Building an Attention Mechanism in Keras; Analysis of the RNN with Attention; Attention in Encoder–Decoder Networks; Generating Polyphonic Music; The Musical Organ; Your First MuseGAN; The MuseGAN Generator; Chords, Style, Melody, and Groove; The Bar Generator; Putting It All Together; The Critic; Analysis of the MuseGAN; Summary

Reinforcement Learning; OpenAI Gym; World Model Architecture; The Variational Autoencoder; The MDN-RNN; The Controller; Setup; Training Process Overview; Collecting Random Rollout Data; Training the VAE; The VAE Architecture; Exploring the VAE; Collecting Data to Train the RNN; Training the MDN-RNN; The MDN-RNN Architecture; Sampling the Next z and Reward from the MDN-RNN; The MDN-RNN Loss Function; Training the Controller; The Controller Architecture; CMA-ES; Parallelizing CMA-ES; Output from the Controller Training; In-Dream Training; In-Dream Training the Controller; Challenges of In-Dream Training; Summary
Five Years of Progress; The Transformer; Positional Encoding; Multihead Attention; The Decoder; Analysis of the Transformer; BERT; GPT-2; MuseNet; Advances in Image Generation; ProGAN; Self-Attention GAN (SAGAN); BigGAN; StyleGAN; Applications of Generative Modeling; AI Art; AI Music

Content preview from Generative Deep Learning

Chapter 7. Compose

Alongside visual art and creative writing, musical composition is another core act of creativity that we consider to be uniquely human.

For a machine to compose music that is pleasing to our ear, it must master many of the same technical challenges that we saw in the previous chapter in relation to text. In particular, our model must be able to learn from and re-create the sequential structure of music and must also be able to choose from a discrete set of possibilities for subsequent notes.

However, music generation presents additional challenges that text generation does not, namely pitch and rhythm. Music is often polyphonic: several streams of notes are played simultaneously on different instruments, and these combine to create harmonies that are either dissonant (clashing) or consonant (harmonious). Text generation requires us to handle only a single stream of text, rather than the parallel streams of chords that are present in music.
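To make the contrast between a single stream and parallel streams concrete, here is a minimal sketch in plain Python. It is an illustration only, not the representation used later in the chapter: the track names, the example sentence, the chord progression, and the helper `chord_at` are all hypothetical. Pitches are written as MIDI note numbers, where 60 is middle C.

```python
# Text: a single stream, one token per step.
text_stream = ["the", "cat", "sat", "down"]

# Music: several parallel streams. Each row is one time step and each
# column is one track (hypothetical names), holding a MIDI pitch number.
TRACKS = ["soprano", "alto", "tenor", "bass"]
score = [
    [72, 67, 64, 48],  # C major: C5, G4, E4, C3
    [74, 67, 62, 47],  # G major: D5, G4, D4, B2
    [72, 69, 65, 53],  # F major: C5, A4, F4, F3
    [72, 67, 64, 48],  # C major again
]


def chord_at(score, step):
    """Return the sorted pitch classes (0-11) sounding at one time step."""
    return sorted({pitch % 12 for pitch in score[step]})


# One step of text is a single symbol; one step of music is a whole chord.
print(text_stream[0])      # the
print(chord_at(score, 0))  # [0, 4, 7]
print(chord_at(score, 1))  # [2, 7, 11]
```

In this toy layout a text model predicts one symbol per step, whereas a music model has to predict one pitch per track at every step, and the harmony only emerges from the combination across tracks.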

Also, text generation can be handled one word at a time. We must consider carefully if this is an appropriate way to process musical data, as much of the interest that stems from listening to music is in the interplay between different rhythms across the ensemble. A guitarist might play a flurry of quicker notes while the pianist holds a longer sustained chord, for example. Therefore, generating music note by note is complex, because we often do not want all the instruments to change note simultaneously ...