Data Science with .NET and Polyglot Notebooks

Expand your skillset by learning how to perform data science, machine learning, and generative AI experiments in .NET Interactive notebooks using a variety of languages, including C#, F#, SQL, and PowerShell

Key Features

  • Conduct a full range of data science experiments with clear explanations from start to finish
  • Learn key concepts in data analytics, machine learning, and AI and apply them to solve real-world problems
  • Access all of the code online as notebooks and in an interactive GitHub Codespace
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

As the fields of data science, machine learning, and artificial intelligence rapidly evolve, .NET developers are eager to leverage their expertise to dive into these exciting domains but are often unsure of how to do so. Data Science in .NET with Polyglot Notebooks is the practical guide you need to seamlessly bring your .NET skills into the world of analytics and AI.

With Microsoft’s .NET platform now robustly supporting machine learning and AI tasks, the introduction of tools such as .NET Interactive kernels and Polyglot Notebooks has opened up a world of possibilities for .NET developers. This book empowers you to harness the full potential of these cutting-edge technologies, guiding you through hands-on experiments that illustrate key concepts and principles. Through a series of interactive notebooks, you’ll not only master technical processes but also discover how to integrate these new skills into your current role or pivot to exciting opportunities in the data science field.

By the end of the book, you’ll have acquired the necessary knowledge and confidence to apply cutting-edge data science techniques and deliver impactful solutions within the .NET ecosystem.

What you will learn

  • Load, analyze, and transform data using DataFrames, data visualization, and descriptive statistics
  • Train machine learning models with ML.NET for classification and regression tasks
  • Customize ML.NET model training pipelines with AutoML, transforms, and model trainers
  • Apply best practices for deploying models and monitoring their performance
  • Connect to generative AI models using Polyglot Notebooks
  • Chain together complex AI tasks with AI orchestration, RAG, and Semantic Kernel
  • Create interactive online documentation with Mermaid charts and GitHub Codespaces

Who this book is for

This book is for experienced C# or F# developers who want to transition into data science and machine learning while leveraging their .NET expertise. It’s ideal for those looking to learn ML.NET and Semantic Kernel and extend their .NET skills to data science, machine learning, and generative AI workflows.

Table of contents

  1. Data Science with .NET and Polyglot Notebooks
  2. Contributors
  3. About the author
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Conventions used
    6. Get in touch
    7. Share your thoughts
    8. Download a free PDF copy of this book
  6. Part 1: Data Analysis in Polyglot Notebooks
  7. Chapter 1: Data Science, Notebooks, and Kernels
    1. Exploring the field of data science
      1. The rise of big data
      2. Data analytics
      3. Machine learning
      4. Artificial intelligence
    2. Data science notebooks and Project Jupyter
    3. Extending notebooks with kernels
    4. Polyglot Notebooks and .NET Interactive
    5. Summary
    6. Further reading
  8. Chapter 2: Exploring Polyglot Notebooks
    1. Technical requirements
    2. Installing Polyglot Notebooks
    3. Creating your first notebook
    4. Executing notebook cells
      1. Adding code cells
      2. Working with variables
      3. The Variables view
      4. Markdown cells
    5. Declaring classes and methods
      1. Declaring methods
      2. Declaring classes
    6. Working with other languages
    7. Sharing variables between languages
      1. Exporting variables
    8. Troubleshooting notebook execution
      1. Resolving compiler errors
      2. Problems with notebook execution
      3. Diagnostic output for Polyglot Notebooks errors
      4. Issues and the Polyglot Notebooks repository
    9. Summary
    10. Further reading
  9. Chapter 3: Getting Data and Code into Your Notebooks
    1. Technical requirements
    2. Importing code and NuGet packages
      1. Importing code files
      2. Importing NuGet packages
      3. Importing project files
    3. Reading CSV data
      1. Understanding CSV data
      2. Reading CSV data into a DataFrame
      3. Specialized CSV loading scenarios
      4. Troubleshooting CSV loading errors
      5. Loading TSV and other delimited file formats
    4. Getting JSON data with PowerShell
    5. Building DataFrames from objects
    6. Connecting to databases with SQL
      1. Connecting to a SQL database
      2. Executing SQL from SQL kernels
      3. Sharing SQL results with other kernels
      4. Alternative ways of connecting to the Database
    7. Querying Kusto clusters with KQL
    8. Summary
    9. Further reading
  10. Chapter 4: Working with Tabular Data and DataFrames
    1. Technical requirements
    2. Understanding data cleaning and data wrangling
      1. Where unclean data comes from
      2. The impact of unclean data
      3. Data cleaning and data wrangling
    3. Working with DataFrames in C#
      1. Viewing and sampling data
      2. Rows
      3. Getting and setting cell values
      4. Iterating over rows
    4. Working with columns
      1. Columns
      2. Analyzing columns
      3. Removing columns
      4. Renaming columns
      5. Adding a new column
    5. Handling missing values
    6. Sorting, filtering, grouping, and merging data
      1. Sorting DataFrames
      2. Grouping and aggregating DataFrames
      3. Merging DataFrames
      4. Filtering DataFrames
    7. DataFrames in other languages
    8. Summary
    9. Further reading
  11. Chapter 5: Visualizing Data
    1. Technical requirements
    2. Understanding exploratory data analysis
      1. Data visualization’s role in exploratory data analysis
      2. Descriptive statistics for EDA
    3. Extracting insights with descriptive statistics
      1. Using DataFrame.Description to generate descriptive statistics
      2. Descriptive statistics with MathNet.Numerics
    4. Creating a box plot with ScottPlot
    5. Performing univariate analysis with Plotly.NET
      1. Plotly and Plotly.NET
      2. Box plots in Plotly.NET
      3. Violin plots with Plotly.NET
      4. Histograms with Plotly.NET
    6. Summary
    7. Further reading
  12. Chapter 6: Variable Correlations
    1. Technical requirements
    2. Performing multivariate analysis with Plotly.NET
      1. Loading data and dependencies
      2. Multivariate analysis with box and violin plots
      3. Plotting multiple values with scatter plots
      4. Adding color to a scatter plot
      5. 3D scatter plots with Plotly.NET
    3. Identifying variable correlations
      1. Calculating variable correlations
      2. Building feature correlation matrixes
    4. Summary
    5. Further reading
  13. Part 2: Machine Learning with Polyglot Notebooks and ML.NET
  14. Chapter 7: Classification Experiments with ML.NET AutoML
    1. Technical requirements
    2. Understanding machine learning
      1. Supervised learning
      2. Classification and regression
    3. Introducing ML.NET and AutoML
      1. Understanding AutoML
      2. AutoML and data pre-processing
    4. Creating training and testing datasets
    5. Training a classification model with ML.NET AutoML
    6. Evaluating binary classification models
      1. Evaluating our model
      2. Calculating feature importance
    7. Predicting values with binary classification models
    8. Summary
    9. Further reading
  15. Chapter 8: Regression Experiments with ML.NET AutoML
    1. Technical requirements
    2. Understanding regression
      1. Our regression task
      2. Regression as a numerical formula
      3. Our regression dataset
    3. Performing a regression experiment
      1. Understanding cross-validation
    4. Interpreting cross-validation results
    5. Evaluating regression metrics
      1. Predicting values for outliers
      2. Applying PFI to regression models
    6. Applying a regression model
    7. Summary
    8. Further reading
  16. Chapter 9: Beyond AutoML: Pipelines, Trainers, and Transforms
    1. Technical requirements
    2. Performing regression without AutoML
      1. Features and pipelines
    3. Creating an AutoML pipeline
    4. Controlling AutoML pipelines
      1. Customizing the Featurizer
      2. Customizing the model trainer selector
    5. Customizing hyperparameter tuning
      1. Understanding the search space
      2. Customizing the search space
      3. Customizing the hyperparameter tuner
    6. Scaling numeric columns
    7. Selecting regression algorithms
    8. Selecting binary classification algorithms
    9. Summary
    10. Further reading
  17. Chapter 10: Deploying Machine Learning Models
    1. Technical requirements
    2. Introducing our multi-class classification model
      1. Training our model
      2. Evaluating multi-class classification models
      3. Generating test predictions
    3. Exporting ML.NET models
    4. Hosting ML.NET models in ASP.NET web applications
      1. Configuring a PredictionEnginePool
      2. Using the PredictionEnginePool
    5. Understanding model performance, data drift, and MLOps
      1. Detecting model drift
      2. MLOps and updating models
    6. Surveying additional ML.NET capabilities
      1. ONNX and TensorFlow models in ML.NET
    7. Summary
    8. Further reading
  18. Part 3: Exploring Generative AI with Polyglot Notebooks
  19. Chapter 11: Generative AI in Polyglot Notebooks
    1. Technical requirements
    2. Understanding Generative AI
    3. Deploying generative AI models on Azure
      1. Creating an Azure OpenAI Service
      2. Deploying models on Azure OpenAI Service
      3. Getting access credentials for Azure OpenAI
    4. Connecting to an Azure OpenAI Service
    5. Chatting with a deployed model
    6. Customizing model behavior with prompt engineering
      1. Zero-shot, one-shot, and few-shot inferencing
    7. Using text embeddings
    8. Generating images with DALL-E
    9. Summary
    10. Further reading
  20. Chapter 12: AI Orchestration with Semantic Kernel
    1. Technical requirements
    2. Understanding RAG and AI orchestration
    3. Introducing Semantic Kernel
    4. Chatting with Semantic Kernel functions
      1. Building the Kernel
      2. Creating a prompt function
    5. Adding memory to Semantic Kernel
    6. Defining complex functions
      1. Creating functions from methods
      2. Accepting KernelFunction parameters
      3. Defining a memory function
    7. Calling multiple functions using plugins
      1. Examining FunctionResult objects
      2. Azure OpenAI content filtering
    8. Handling complex requests with planners
    9. Knowing where to go from here
    10. Summary
    11. Further reading
  21. Part 4: Polyglot Notebooks in the Enterprise
  22. Chapter 13: Enriching Documentation with Mermaid Diagrams
    1. Technical requirements
    2. Introducing Mermaid diagrams
    3. Communicating logic with flowcharts
    4. Communicating structure with class diagrams
    5. Communicating data with Entity Relationship Diagrams
    6. Communicating behavior with state diagrams
    7. Communicating flow with sequence diagrams
    8. Communicating workflow with Git graphs
    9. Summary
    10. Further reading
  23. Chapter 14: Extending Polyglot Notebooks
    1. Technical requirements
    2. Understanding default formatting behavior
      1. Default object formatting
      2. Default collection formatting
    3. Styling output with custom formatters
    4. Exploring magic commands
    5. Creating a Polyglot Notebook extension
    6. Working with parameters
    7. Invoking code on kernels
    8. Summary
    9. Further reading
  24. Chapter 15: Adopting and Deploying Polyglot Notebooks
    1. Technical requirements
    2. Integrating Polyglot Notebooks into your day job
      1. Enabling rapid experimentation
      2. Supporting AI and analytics workloads
      3. Assisting testing workloads
      4. Training new team members with Polyglot Notebooks
    3. Sharing Polyglot Notebooks with your team
      1. Integrating Polyglot Notebooks into Jupyter or JupyterLab
      2. Storing Notebooks in source control
    4. Deploying Polyglot Notebooks to GitHub Codespaces
      1. Configuring GitHub codespaces
      2. Creating a codespace on GitHub
    5. Advancing into machine learning and AI
      1. Adding data science to your day job
      2. Getting into data science
      3. Succeeding in data science
    6. Summary
    7. Further reading
  25. Index
    1. Why subscribe?
  26. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share your thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Data Science with .NET and Polyglot Notebooks
  • Author(s): Matt Eland
  • Release date: August 2024
  • Publisher(s): Packt Publishing
  • ISBN: 9781835882962