Extending Power BI with Python and R - Second Edition

Name: Extending Power BI with Python and R - Second Edition
Author: Luca Zavarella
ISBN: 9781837639533

March 2024

Intermediate to advanced

814 pages

22h 10m

English

Packt Publishing

Preface
Who this book is for; What this book covers; Software used in this book; To get the most out of this book; Get in touch
Where and How to Use R and Python Scripts in Power BI
Technical requirements; Injecting R or Python scripts into Power BI; Data loading; Data transformation; Data visualization; Using R and Python to interact with your data; Python and R compatibility across Power BI products; Summary; Test your knowledge
Configuring R with Power BI
Technical requirements; The available R engines; The CRAN R distribution; The Microsoft R Open distribution and MRAN; Multi-threading in MRO; Choosing an R engine to install; The R engines used by Power BI; Installing the suggested R engines; The R engine for data transformation; The R engine for R script visuals on the Power BI service; What to do when the Power BI service upgrades the R engine; Installing an IDE for R development; Installing RStudio; Installing RTools; Linking Intel’s MKL to R; Configuring Power BI Desktop to work with R; Debugging an R script visual; Configuring the Power BI service to work with R; Installing the on-premises data gateway in personal mode; Sharing reports that use R scripts in the Power BI service; R script visuals limitations; Summary; Test your knowledge
Configuring Python with Power BI
Technical requirements; The available Python engines; Choosing a Python engine to install; The Python engines used by Power BI; Installing the suggested Python engines; The Python engine for data transformation; Creating an environment for data transformations using pip; Creating an optimized environment for data transformations using conda; Creating an environment for Python script visuals on the Power BI service; What to do when the Power BI service upgrades the Python engine; Installing an IDE for Python development; Configuring Python with RStudio; Configuring Python with Visual Studio Code; Working with the Python Interactive window in Visual Studio Code; Configuring Power BI Desktop to work with Python; Configuring the Power BI service to work with Python; Sharing reports that use Python scripts in the Power BI service; Limitations of Python visuals; Summary; Test your knowledge
Solving Common Issues When Using Python and R in Power BI
Technical requirements; Avoiding the ADO.NET error when running a Python script in Power BI; The real cause of the problem; A practical solution to the problem; Avoiding the Formula.Firewall error; Incompatible privacy levels; Indirect access to a data source; The easy way; Combining queries and/or transformations; Encapsulating queries into functions; Using multiple datasets in Python and R script steps; Applying a full join with Merge; Using arguments of the Python.Execute function; Dealing with dates/times in Python and R script steps; Summary; Test your knowledge
Importing Unhandled Data Objects
Technical requirements; Importing RDS files in R; A brief introduction to Tidyverse; Creating a serialized R object; Configuring the environment and installing Tidyverse; Creating the RDS files; Using an RDS file in Power BI; Importing an RDS file into the Power Query Editor; Importing an RDS file in an R script visual; Importing PKL files in Python; A very short introduction to the PyData world; Creating a serialized Python object; Configuring the environment and installing the PyData packages; Creating the PKL files; Using a PKL file in Power BI; Importing a PKL file into the Power Query Editor; Importing a PKL file in a Python script visual; Summary; References; Test your knowledge
Using Regular Expressions in Power BI
Technical requirements; A brief introduction to regexes; The basics of regexes; Literal characters; Special characters in regex; The ^ and $ anchors; OR operators; Negated character classes; Shorthand character classes; Quantifiers; The dot; Greedy and lazy matches; Checking the validity of email addresses; Checking the validity of dates; Validating data using regex in Power BI; Using regex in Power BI to validate emails with Python; Using regex in Power BI to validate emails with R; Using regex in Power BI to validate dates with Python; Using regex in Power BI to validate dates with R; Loading complex log files using regex in Power BI; Apache access logs; Importing Apache access logs in Power BI with Python; Importing Apache access logs in Power BI with R; Extracting values from text using regex in Power BI; One regex to rule them all; Using regex in Power BI to extract values with Python; Using regex in Power BI to extract values with R; Summary; References; Test your knowledge
Anonymizing and Pseudonymizing Your Data in Power BI
Technical requirements; De-identifying data; De-identification techniques; Information removal; Data masking; Data swapping; Generalization; Data perturbation; Tokenization; Hashing; Encryption; Understanding pseudonymization; What is anonymization?; Anonymizing data in Power BI; Anonymizing data using Python; Anonymizing data using R; Pseudonymizing data in Power BI; Pseudonymizing data using Python; Pseudonymizing data using R; Summary; References; Test your knowledge
Logging Data from Power BI to External Sources
Technical requirements; Logging to CSV files; Logging to CSV files with Python; Using the pandas module; Logging emails to CSV files in Power BI with Python; Logging to CSV files with R; Using Tidyverse functions; Logging dates to CSV files in Power BI with R; Logging to Excel files; Logging to Excel files with Python; Using the pandas module; Logging emails and dates to Excel files in Power BI with Python; Logging to Excel files with R; Using the readxl and openxlsx packages; Logging emails and dates to Excel in Power BI with R; Logging to (Azure) SQL Server; Installing SQL Server Express; Creating an Azure SQL Database; Logging to an (Azure) SQL server with Python; Using the pyodbc module; Logging emails and dates to an Azure SQL Database in Power BI with Python; Logging to an (Azure) SQL Server with R; Using the DBI and odbc packages; Logging emails and dates to an Azure SQL Database in Power BI with R; Managing credentials in the code; Creating environment variables; Using environment variables in Python; Using environment variables in R; Summary; References; Test your knowledge
Loading Large Datasets Beyond the Available RAM in Power BI
Technical requirements; A typical analytic scenario using large datasets; Importing large datasets with Python; Installing Dask on your laptop; Creating a Dask DataFrame; Extracting information from a Dask DataFrame; Importing a large dataset in Power BI with Python; Importing large datasets with R; Introducing Apache Arrow; Installing arrow on your laptop; Creating and extracting information from an Arrow Dataset object; Importing a large dataset in Power BI with R; Summary; References; Test your knowledge

Boosting Data Loading Speed in Power BI with Parquet Format
Technical requirements; From CSV to the Parquet file format; Limitations of using Parquet files natively in Power BI; Using Parquet files with Python; Analyzing Parquet data with Dask; Analyzing Parquet data with PyArrow; Performance differences between Dask and PyArrow; Using Parquet files with R; Analyzing Parquet data with Arrow for R; Using the Parquet format to speed up a Power BI report; Transforming historical data in Parquet; Appending new data to and analyzing the Parquet dataset; Analyzing Parquet data in Power BI with Python; Analyzing Parquet data in Power BI with R; Summary; References; Test your knowledge
Calling External APIs to Enrich Your Data
Technical requirements; What is a web service?; Registering for Bing Maps web services; Geocoding addresses using Python; Using an explicit GET request; Using an explicit GET request in parallel; Using the Geocoder library in parallel; Geocoding addresses using R; Using an explicit GET request; Using an explicit GET request in parallel; Using the tidygeocoder package in parallel; Accessing web services using Power BI; Geocoding addresses in Power BI with Python; Geocoding addresses in Power BI with R; Summary; References; Test your knowledge
Calculating Columns Using Complex Algorithms: Distances
Technical requirements; What is a distance?; The distance between two geographic locations; Some theory first; Spherical trigonometry; The law of Cosines distance; The law of Haversines distance; Vincenty’s distance; What kind of distance to use and when; Implementing distances using Python; Calculating distances with Python; Calculating distances in Power BI with Python; Implementing distances using R; Calculating distances with R; Calculating distances in Power BI with R; The distance between two strings; Some theory first; The Hamming distance; The Levenshtein distance; The Jaro-Winkler distance; The Jaccard distance; What kind of distance to use and when; Deduplicating strings using Python and R; Deduplicating emails with Python; Deduplicating emails with R; Deduplicating emails in Power BI; Summary; References; Test your knowledge
Calculating Columns Using Complex Algorithms: Fuzzy Matching
Technical requirements; Exploring default fuzzy matching in Power BI; Using Power Query’s fuzzy merge; Introducing probabilistic record linkage algorithms; Applying probabilistic record linkage algorithms; Applying probabilistic record linkage in Python; Applying probabilistic record linkage in R; Applying probabilistic record linkage in Power BI; Summary; References; Test your knowledge
Calculating Columns Using Complex Algorithms: Optimization Problems
Technical requirements; The basics of linear programming; Linear equations and inequalities; Formulating a linear optimization problem; Definition of the LP problem to solve; Formulating the LP problem; Handling optimization problems with Python and R; Solving the LP problem in Python; Solving the LP problem in Power BI with Python; Solving the LP problem in R; Solving the LP problem in Power BI with R; Summary; References; Test your knowledge
Adding Statistical Insights: Associations
Technical requirements; Exploring associations between variables; Correlation between numeric variables; Pearson’s correlation coefficient; Charles Spearman’s correlation coefficient; Maurice Kendall’s correlation coefficient; Description of a real case; Implementing correlation coefficients in Python; Implementing correlation coefficients in R; Implementing correlation coefficients in Power BI with Python and R; Correlation between non-numeric variables; Harald Cramér’s correlation coefficient; Henri Theil’s uncertainty coefficient; Correlation between non-numeric and numeric variables; Pearson’s correlation ratio; Implementing correlation coefficients in Python; Implementing correlation coefficients in R; Implementing correlation coefficients in Power BI with Python and R; Summary; References; Test your knowledge
Adding Statistical Insights: Outliers and Missing Values
Technical requirements; What outliers are; The causes of outliers; Identifying outliers; Univariate outliers; Multivariate outliers; Numeric variables and categorical variables; All numeric variables; Dealing with outliers; Implementing outlier detection algorithms; Implementing outlier detection in Python; Implementing outlier detection in R; Implementing outlier detection in Power BI; What missing values are and how to deal with them; The causes of missing values; Handling missing values; Discarding data; Mean, median, and mode imputation; Easy imputation by hand; Multiple imputation; Univariate time-series imputation; Multivariate time series imputation; Diagnosing missing values in R and Python; Implementing missing value imputation algorithms; Removing missing values; Imputing tabular data; Imputing time series data; Imputing missing values in Power BI; Summary; References; Test your knowledge
Using Machine Learning without Premium or Embedded Capacity
Technical requirements; Interacting with ML in Power BI with dataflows; Using AutoML solutions; PyCaret; FLAML; Azure AutoML; AutoQuant for R; Embedding training code in Power Query; Training and using ML models with PyCaret; Using PyCaret in Power BI; Training and using ML models with FLAML; Using FLAML in Power BI; Using trained models in Power Query; Scoring observations in Power Query using a trained PyCaret model; Scoring observations in Power Query using a trained FLAML model; Using trained models in script visuals; Scoring observations in a script visual using a trained model; Calling web services in Power Query; Using Azure AutoML models in Power Query; Training a model using the Azure AutoML UI; Consuming an Azure ML-deployed model in Power BI; Using Azure AI Language in Power Query; Configuring a new Language service; Configuring your Python environment and Windows; Consuming the Text Analytics API in Power BI; Summary; References; Test your knowledge
Using SQL Server External Languages for Advanced Analytics and ML Integration in Power BI
Technical requirements; Introducing SQL Server Machine Learning Services; The Extensibility Framework to run Python and R scripts; Installing Python and R custom runtimes for SQL Server; Updating SQL Server; Installing Machine Learning Services and Language Extensions; Installing the Python runtime; Installing the R runtime; Configuring the SQL Server Language Extensions; Granting access to Launchpad; Configuring the Language Extensions; A closer look at sp_execute_external_script; Understanding input and output parameters; Managing loopback requests; Managing different Python environments or R installations; The need for external languages with Power BI; Architectural and security policy constraints; Running analytical scripts on data stored in SQL Server; Missing libraries in the Power BI service; Using external languages with Power BI; Converting an EXEC to SELECT ... FROM; Implementing the predictive stored procedure; Calling a stored procedure in DirectQuery; Publishing the report to the Power BI service; Summary; References; Acknowledgements; Test your knowledge
Exploratory Data Analysis
Technical requirements; What is the goal of EDA?; Understanding your data; Cleaning your data; Discovering associations between variables; EDA with Python and R; EDA in Power BI; Dataset summary page; Missing values exploration; Univariate exploration; Multivariate exploration; Variable associations; Summary; References; Test your knowledge
Using the Grammar of Graphics in Python with plotnine
Technical requirements; What is plotnine?; plotnine core concepts; Analyzing Titanic data with plotnine; Using plotnine in Power BI; Working with plotnine and getting an image; Working with plotnine and getting a Matplotlib figure; Working with plotnine in the Python script visual; Summary; References; Acknowledgment; Test your knowledge
Advanced Visualizations
Technical requirements; Choosing a circular barplot; Implementing a circular barplot in R; Implementing a circular barplot in Power BI; Summary; References; Test your knowledge
Interactive R Custom Visuals
Technical requirements; Why interactive R custom visuals?; Adding a dash of interactivity with Plotly; Exploiting the interactivity provided by HTML widgets; Packaging it all into a Power BI custom visual; Installing the pbiviz package; Developing your first R HTML custom visual; Importing the custom visual package into Power BI; Summary; References; Test your knowledge
Appendix 1: Answers
Appendix 2: Glossary
Other Books You May Enjoy
Index

Content preview from Extending Power BI with Python and R - Second Edition

13 Calculating Columns Using Complex Algorithms: Fuzzy Matching

In the previous chapter, we discussed the importance of distance measures in estimating the dissimilarity between two distinct strings. Continuing our exploration of data analysis techniques, this chapter delves into the world of fuzzy matching, a technique used to determine logical similarities and identity mismatches in duplicates. Unfortunately, finding a dissimilarity metric in string values can be challenging. However, Power BI comes with a complex, reliable, and scalable fuzzy matching algorithm implemented by the Microsoft Research team based on the Jaccard distance. Although this algorithm performs well enough for typical fuzzy matching problems, it’s worth noting that there ...
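To make the Jaccard distance mentioned in the preview concrete, here is a minimal Python sketch. It is not Power BI's implementation, whose details the preview does not give. It assumes strings are compared as sets of lowercase character bigrams; the function names `char_ngrams` and `jaccard_distance` are hypothetical and chosen only for illustration.

```python
def char_ngrams(text, n=2):
    """Return the set of lowercase character n-grams in text.

    Assumption: bigrams (n=2) over the trimmed, lowercased string.
    """
    normalized = text.lower().strip()
    if len(normalized) < n:
        # Strings shorter than n are kept as a single token (or no tokens if empty).
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_distance(a, b, n=2):
    """Jaccard distance: 1 - |A & B| / |A | B| over character n-gram sets."""
    grams_a = char_ngrams(a, n)
    grams_b = char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    print(jaccard_distance("night", "nacht"))
    print(jaccard_distance("Power BI", "power bi"))
```

A distance of 0 means the two n-gram sets are identical, and 1 means they share nothing. A fuzzy match would then be declared when the distance falls below a chosen threshold, and that threshold is a tuning decision this sketch does not prescribe.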
