Data Labeling in Machine Learning with Python

Name: Data Labeling in Machine Learning with Python
Author: Vijaya Kumar Suda
ISBN: 9781804610541

January 2024

Intermediate to advanced

398 pages

9h 32m

English

Packt Publishing

Data Labeling in Machine Learning with Python
Acknowledgments; Contributors; About the author; About the reviewers
Preface
Who this book is for; What this book covers; To get the most out of this book; Download the example code files; Conventions used; Get in touch; Share your thoughts; Download a free PDF copy of this book
Part 1: Labeling Tabular Data
Chapter 1: Exploring Data for Machine Learning
Technical requirements; EDA and data labeling; Understanding the ML project life cycle; Defining the business problem; Data discovery and data collection; Data exploration; Data labeling; Model training; Model evaluation; Model deployment; Introducing Pandas DataFrames; Summary statistics and data aggregates; Summary statistics; Data aggregates of the feature for each target class; Creating visualizations using Seaborn for univariate and bivariate analysis; Univariate analysis; Bivariate analysis; Profiling data using the ydata-profiling library; Variables section; Interactions section; Correlations; Missing values; Sample data; Unlocking insights from data with OpenAI and LangChain; Summary
Chapter 2: Labeling Data for Classification
Technical requirements; Predicting labels with LLMs for tabular data; Data labeling using Snorkel; What is Snorkel?; Why is Snorkel popular?; Loading unlabeled data; Creating the labeling functions; Labeling rules; Constants; Labeling functions; Creating a label model; Predicting labels; Labeling data using the Compose library; Labeling data using semi-supervised learning; What is semi-supervised learning?; What is pseudo-labeling?; Labeling data using K-means clustering; What is unsupervised learning?; K-means clustering; Inertia; Dunn's index; Summary
Chapter 3: Labeling Data for Regression
Technical requirements; Using summary statistics to generate housing price labels; Finding the closest labeled observation to match the label; Using semi-supervised learning to label regression data; Pseudo-labeling; Using data augmentation to label regression data; Using k-means clustering to label regression data; Summary
Part 2: Labeling Image Data
Chapter 4: Exploring Image Data
Technical requirements; Visualizing image data using Matplotlib in Python; Loading the data; Checking the dimensions; Visualizing the data; Checking for outliers; Performing data preprocessing; Checking for class imbalance; Identifying patterns and relationships; Evaluating the impact of preprocessing; Practice example of visualizing data; Practice example for adding annotations to an image; Practice example of image segmentation; Practice example for feature extraction; Analyzing image size and aspect ratio; Impact of aspect ratios on model performance; Image resizing; Image normalization; Performing transformations on images – image augmentation; Summary
Chapter 5: Labeling Image Data Using Rules
Technical requirements; Labeling rules based on image visualization; Image labeling using rules with Snorkel; Weak supervision; Rules based on the manual visualization of an image’s object color; Real-world applications; A practical example of plant disease detection; Labeling images using rules based on properties; Bounding boxes; Example 1 – image classification – a bicycle with and without a person; Example 2 – image classification – dog and cat images; Labeling images using transfer learning; Example – digit classification using a pre-trained classifier; Example – person image detection using the YOLO V3 pre-trained classifier; Example – bicycle image detection using the YOLO V3 pre-trained classifier; Labeling images using transformations; Summary
Chapter 6: Labeling Image Data Using Data Augmentation
Technical requirements; Training support vector machines with augmented image data; Kernel trick; Data augmentation; Image data augmentation; Implementing an SVM with data augmentation in Python; Introducing the CIFAR-10 dataset; Loading the CIFAR-10 dataset in Python; Preprocessing the data for SVM training; Implementing an SVM with the default hyperparameters; Evaluating SVM on the original dataset; Implementing an SVM with an augmented dataset; Training the SVM on augmented data; Evaluating the SVM’s performance on the augmented dataset; Image classification using the SVM with data augmentation on the MNIST dataset; Convolutional neural networks using augmented image data; How CNNs work; Practical example of a CNN using data augmentation; CNN using image data augmentation with the CIFAR-10 dataset; Summary

Part 3: Labeling Text, Audio, and Video Data
Chapter 7: Labeling Text Data
Technical requirements; Real-world applications of text data labeling; Tools and frameworks for text data labeling; Exploratory data analysis of text; Loading the data; Understanding the data; Cleaning and preprocessing the data; Exploring the text’s content; Analyzing relationships between text and other variables; Visualizing the results; Exploratory data analysis of sample text data set; Exploring Generative AI and OpenAI for labeling text data; GPT models by OpenAI; Zero-shot learning capabilities; Text classification with OpenAI models; Data labeling assistance; OpenAI API overview; Use case 1 – summarizing the text; Use case 2 – topic generation for news articles; Use case 3 – classification of customer queries using the user-defined categories and sub-categories; Use case 4 – information retrieval using entity extraction; Use case 5 – aspect-based sentiment analysis; Hands-on labeling of text data using the Snorkel API; Hands-on text labeling using Logistic Regression; Hands-on label prediction using K-means clustering; Generating labels for customer reviews (sentiment analysis); Summary
Chapter 8: Exploring Video Data
Technical requirements; Loading video data using cv2; Extracting frames from video data for analysis; Extracting features from video frames; Color histogram; Optical flow features; Motion vectors; Deep learning features; Appearance and shape descriptors; Visualizing video data using Matplotlib; Frame visualization; Temporal visualization; Motion visualization; Labeling video data using k-means clustering; Overview of data labeling using k-means clustering; Example of video data labeling using k-means clustering with a color histogram; Advanced concepts in video data analysis; Motion analysis in videos; Object tracking in videos; Facial recognition in videos; Video compression techniques; Real-time video processing; Video data formats and quality in machine learning; Common issues in handling video data for ML models; Troubleshooting steps; Summary
Chapter 9: Labeling Video Data
Technical requirements; Capturing real-time video; Key components and features; A hands-on example to capture real-time video using a webcam; Building a CNN model for labeling video data; Using autoencoders for video data labeling; A hands-on example to label video data using autoencoders; Transfer learning; Using the Watershed algorithm for video data labeling; A hands-on example to label video data segmentation using the Watershed algorithm; Computational complexity; Performance metrics; Real-world examples for video data labeling; Advances in video data labeling and classification; Summary
Chapter 10: Exploring Audio Data
Technical requirements; Real-life applications for labeling audio data; Audio data fundamentals; Hands-on with analyzing audio data; Example code for loading and analyzing sample audio file; Best practices for audio format conversion; Example code for audio data cleaning; Extracting properties from audio data; Tempo; Chroma features; Mel-frequency cepstral coefficients (MFCCs); Zero-crossing rate; Spectral contrast; Considerations for extracting properties; Visualizing audio data with matplotlib and Librosa; Waveform visualization; Loudness visualization; Spectrogram visualization; Mel spectrogram visualization; Considerations for visualizations; Ethical implications of audio data; Recent advances in audio data analysis; Troubleshooting common issues during data analysis; Troubleshooting common installation issues for audio libraries; Summary
Chapter 11: Labeling Audio Data
Technical requirements; Downloading FFmpeg; Azure Machine Learning; Real-time voice classification with Random Forest; Transcribing audio using the OpenAI Whisper model; Step 1 – importing the Whisper model; Step 2 – loading the base Whisper model; Step 3 – setting up FFmpeg; Step 4 – transcribing the YouTube audio using the Whisper model; Classifying a transcription using Hugging Face transformers; Hands-on – labeling audio data using a CNN; Exploring audio data augmentation; Introducing Azure Cognitive Services – the speech service; Creating an Azure Speech service; Speech to text; Speech translation; Summary
Chapter 12: Hands-On Exploring Data Labeling Tools
Technical requirements; Azure Machine Learning data labeling; Label Studio; pyOpenAnnotate; Data labeling using Azure Machine Learning; Benefits of data labeling with Azure Machine Learning; Data labeling steps using Azure Machine Learning; Image data labeling with Azure Machine Learning; Text data labeling with Azure Machine Learning; Audio data labeling using Azure Machine Learning; Integration of the Azure Machine Learning pipeline with the labeled dataset; Exploring Label Studio; Labeling the image data; Labeling the text data; Labeling the video data; pyOpenAnnotate; Computer Vision Annotation Tool; Comparison of data labeling tools; Advanced methods in data labeling; Active learning; Semi-automated labeling; Summary
Index
Why subscribe?
Other Books You May Enjoy; Packt is searching for authors like you; Share your thoughts; Download a free PDF copy of this book

Content preview from Data Labeling in Machine Learning with Python

11 Labeling Audio Data

In this chapter, we will cover real-time audio capture, transcription with the Whisper model, and audio classification using a convolutional neural network (CNN), with a focus on spectrograms. We’ll also explore audio augmentation techniques. Together, these topics give you the tools and techniques needed for comprehensive audio data labeling.

Welcome to a journey through the intricate world of audio data labeling! In this chapter, we embark on an exploration of cutting-edge techniques ...
