Chapter 6. A Journey into Sound
One of the most successful applications of deep learning is something that we carry around with us every day. Whether it's Siri or Google Now, the engines that power those assistants, as well as Amazon's Alexa, are neural networks. In this chapter, we'll take a look at PyTorch's torchaudio library. You'll learn how to use it to construct a pipeline for classifying audio data with a convolution-based model. After that, I'll suggest a different approach that lets you apply some of the tricks you learned for images and obtain good accuracy on the ESC-50 audio dataset.
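Before we get into the details, here is a minimal sketch of what loading a clip with torchaudio looks like. The file path is hypothetical; point it at any WAV file you have locally, such as a clip from the ESC-50 dataset.

import torchaudio

# Hypothetical path; substitute any WAV file you have on disk,
# for example a clip from the ESC-50 dataset.
waveform, sample_rate = torchaudio.load("esc50/audio/1-100032-A-0.wav")

# torchaudio.load returns the audio as a (channels, samples) tensor
# along with the sampling rate of the file.
print(waveform.shape)   # e.g., torch.Size([1, 220500]) for a 5-second mono clip
print(sample_rate)      # e.g., 44100

Notice that the waveform arrives as an ordinary PyTorch tensor, which is what lets us feed it into the same kinds of models and training loops we've used in earlier chapters.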
But first, let’s take a look at sound itself. What is it? How is it often represented in data form, and does that provide us with any clues as to what type of neural net we should use to gain insight from our data?
Sound
Sound is created by the vibration of air. All the sounds we hear are combinations of high and low pressure that we often represent as a waveform, like the one in Figure 6-1. In this image, the part of the wave above the origin represents high pressure, and the part below represents low pressure.
Figure 6-2 shows a more complex waveform of a complete song.
In digital sound, we sample this waveform many times per second, traditionally 44,100 times per second for CD-quality audio.
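To make sampling concrete, the following sketch generates one second of a pure tone at the CD sampling rate. The frequency and duration are illustrative choices, not anything prescribed by the chapter.

import math
import torch

SAMPLE_RATE = 44_100   # CD quality: 44,100 samples per second
DURATION = 1.0         # seconds of audio to generate
FREQ = 440.0           # 440 Hz (concert A), an illustrative choice

# The points in time at which we "measure" the pressure wave
num_samples = int(SAMPLE_RATE * DURATION)
t = torch.arange(num_samples) / SAMPLE_RATE

# An idealized pure tone: values above zero correspond to high
# pressure, values below zero to low pressure
waveform = torch.sin(2 * math.pi * FREQ * t)

print(waveform.shape)  # torch.Size([44100]): one second of samples

Each entry of the resulting tensor is one sampled amplitude value, so a higher sampling rate means more numbers per second of audio and a more faithful digital representation of the original wave.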