Chapter 1. Introducing Polars
In 2022, we found ourselves in the middle of a challenging project for a client. Their data pipeline was growing out of control. The codebase was a mix of Python and R, with the Python side relying heavily on the Pandas package for data wrangling. Over time, three major issues emerged: the code was becoming increasingly difficult to maintain, performance had slowed to a crawl, and memory consumption had skyrocketed to over 500 GB. These problems were stifling productivity and pushing the limits of the infrastructure.
Back then, Polars was still relatively unknown, but we had experimented with it and seen some promising results. Convincing the rest of the team to migrate both the pandas and R code to Polars wasn’t easy, but once the switch was made, the impact was immediate. The new data pipeline was much faster, and the memory footprint shrank to just 40 GB—a fraction of what it had been.
Thanks to this success, we’re fully convinced of the power of Polars. It’s why we wrote this book, Python Polars: The Definitive Guide, to share with you what we’ve learned and help you unlock the same potential in your data workflows.
In this chapter, you’ll learn:
- The main features of Polars
- Why Polars is fast and popular
- How Polars compares to other data processing packages
- Why you should use Polars
- How we have organized this book
- Why we focus on Python Polars
In addition, we’ll demonstrate Polars’ capabilities through a use case: transforming, analyzing, and visualizing data related to bike trips in New York City.
What Is This Thing Called Polars?
Polars is a high-performance data processing package designed for efficient handling of large-scale datasets. What started as a side project by Ritchie Vink to learn Rust and to better understand data processing has now grown into a popular package. Data scientists, data engineers, and software developers use it to perform data analysis, create data visualizations, and build data-intensive applications used in production.
Features
Here are some key features of Polars:
- Fast and efficient: Written in Rust and leveraging decades of database research, Polars is engineered for speed and performance. Thanks to parallel processing and memory optimization techniques, it can process large datasets significantly faster than other data processing packages, often 10 to 100 times faster for common operations.
- DataFrame structure: Polars uses DataFrames as its core data structure. A DataFrame is a two-dimensional data structure composed of rows and columns, similar to a spreadsheet or database table. Polars DataFrames are immutable, promoting functional-style operations and ensuring thread safety. (A minimal sketch follows this list.)
- Expressive API: Polars provides an intuitive and concise syntax for data processing tasks, making it easy to learn, use, and maintain.
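To make the DataFrame structure concrete, here is a minimal sketch (the column names and values are made up for illustration):

import polars as pl

# A tiny DataFrame: three named, typed columns and three rows (hypothetical data).
df = pl.DataFrame(
    {
        "station": ["W 21 St", "1 Ave & E 16 St", "Broadway"],
        "trips": [120, 85, 240],
        "electric": [True, False, True],
    }
)
print(df)         # shows the shape, the column names, and their data types
print(df.schema)  # every column has a fixed data type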
Key Concepts
Key concepts you’ll become familiar with in this book include:
- Lazy evaluation: Polars employs “lazy” evaluation, where computations are built into an optimized execution plan and executed only when needed. This approach minimizes unnecessary work and can lead to substantial performance gains.
- Expressions: Polars uses expressions to define operations on DataFrames. These expressions are composable, allowing users to create complex data pipelines without intermediate results.
- Query optimization: The package automatically optimizes the execution plan for efficient resource use, based on expressions and data characteristics. (The sketch after this list illustrates all three concepts.)
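As a first taste of these three concepts, here is a minimal sketch. It assumes a hypothetical CSV file sales.csv with the columns store and amount; nothing is read or computed until collect() is called:

import polars as pl

# An expression: a description of a computation, not a computed result.
high_amount = pl.col("amount") > 100

# Lazy evaluation: scan_csv() builds a query plan instead of reading the file.
lazy_query = (
    pl.scan_csv("sales.csv")  # hypothetical file with columns store and amount
    .filter(high_amount)      # expressions compose into a pipeline
    .group_by("store")
    .agg(pl.col("amount").sum())
)

# Query optimization: Polars rewrites the plan before executing it.
print(lazy_query.explain())    # show the optimized execution plan
result = lazy_query.collect()  # only now is the work actually done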
Advantages
Here is a quick rundown of Polars’ main advantages:
- Performance: Thanks to its efficient algorithms, parallel execution engine, and use of vectorization with Single Instruction, Multiple Data (SIMD), Polars is designed to take full advantage of modern hardware. It can optionally leverage NVIDIA GPUs to further improve performance (we benchmark the difference the GPU makes in the Appendix).
- Memory efficiency: Polars requires less memory for operations than other data processing packages.
- Interoperability: Built on Apache Arrow, a standardized columnar memory format for flat and hierarchical data, Polars offers excellent interoperability with other data processing tools and packages. It can be used directly in Rust, and has language bindings for Python, R, SQL, JavaScript, and Julia. In a moment we’ll explain why this book focuses on Python specifically. (A short sketch of this interoperability follows this list.)
- Streaming capabilities: Polars can process data in chunks, allowing for out-of-core computations on datasets that don’t fit in memory.
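The interoperability advantage is easy to see in practice. Here is a minimal sketch, assuming Pandas and PyArrow are installed alongside Polars:

import pandas as pd
import polars as pl

# A small Polars DataFrame (hypothetical values).
df = pl.DataFrame({"borough": ["Manhattan", "Brooklyn"], "trips": [100, 80]})

# Because Polars stores its data in Arrow memory, handing it to other tools is cheap.
arrow_table = df.to_arrow()             # a pyarrow.Table sharing the columnar data
pandas_df = df.to_pandas()              # convert to Pandas when a library requires it
back_again = pl.from_pandas(pandas_df)  # and back into Polars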
In summary, Polars is a powerful package for data analysis tasks, particularly suited for large-scale operations where performance and efficiency are crucial.
Why You Should Use Polars
“Come for the speed, stay for the API” is a popular saying within the Polars community. It nicely captures the two main reasons for choosing Polars: performance and usability. Let’s dive into those two reasons. After that, we’ll also address the popularity and sustainability of Polars.
Performance
First and foremost, you should use Polars for its outstanding performance. Figure 1-1 shows, for a number of data processing packages, the duration (in seconds) for running a variety of queries. These queries come from a standardized set of benchmarks and include reading the data from disk.
Polars consistently outperforms the other packages: Apache Spark, Pandas, Dask, DuckDB, and Vaex.
Usability
While performance may be the initial draw, many users find themselves staying with Polars due to its well-designed API. The Polars API is characterized by:
- Consistency: Operations behave predictably across different data types and structures, and are based on the data processing grammar that you already know.
- Expressiveness: Polars offers its own expression system that allows you to create complex data transformations in a concise and readable way.
- Functional approach: The API encourages a functional programming style, which fits well with data processing and makes your code easy to read, write, and maintain.
- Eager and lazy APIs: You can choose and easily switch between eager execution for quick, ad-hoc results and lazy evaluation for optimized performance, depending on your needs. A minimal sketch of this switching follows this list, and you’ll get a preview of both APIs in the showcase at the end of this chapter.
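To give you a flavor of that switching before the showcase, here is a minimal sketch with a small, hypothetical DataFrame:

import polars as pl

df = pl.DataFrame({"rider_type": ["member", "casual", "member"], "minutes": [12, 35, 7]})

# Eager API: every method call executes immediately and returns a DataFrame.
eager_result = df.group_by("rider_type").agg(pl.col("minutes").mean())

# Lazy API: df.lazy() turns the DataFrame into a LazyFrame; nothing runs until collect().
lazy_result = (
    df.lazy()
    .group_by("rider_type")
    .agg(pl.col("minutes").mean())
    .collect()
)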
The combination of performance and usability has led to Polars gaining popularity at a fast pace, as we’ll discuss next.
Popularity
You should never choose a particular piece of software just because it’s popular. It can cause you to miss out on options that might better fit your needs, as popularity doesn’t always mean it’s the best choice. On the other hand, picking something that isn’t well-known or maintained can lead to issues such as limited community support, security risks, and lack of updates, making it a less reliable option in the long run.
Luckily, Polars is very much actively maintained (with a new release nearly every week on GitHub1) and its community is growing rapidly (with a Discord server2 that currently has over 4500 members). From our own experience we can say that bugs are fixed quickly and that questions are addressed kindly and swiftly.
There’s no perfect measure for the popularity of an open source project. The number of GitHub stars is, however, a good indicator of a project’s visibility and community interest. It reflects how many people find it noteworthy or potentially useful. Figure 1-2 shows the number of GitHub stars for a variety of Python packages for processing data.
Pandas and Apache Spark, two projects that have been around for over 10 years, have the highest number of GitHub stars. Polars, which is one of the youngest projects, comes in at third place. If these three projects maintain their current trajectory, then Polars is set to overtake both Pandas and Apache Spark within the next few years. In short, Polars is here to stay, and it’s worth investing time in learning it.
Sustainability
Because of the way Polars is designed, it computes queries efficiently. According to research by Felix Nahrstedt et al. (2024), Polars consumes 63% of the energy needed by Pandas on the TPC-H benchmark, and an eighth of the energy that Pandas needs on synthetic data. In an age where there is more and more data to process, doing so sustainably becomes increasingly important. Polars sets an example for processing data with a low carbon footprint.
Polars Compared to Other Data Processing Packages
Polars is of course not the only package for processing data. In this section we provide an overview of how Polars compares to other popular data processing packages in Python. We highlight the strengths and weaknesses of each, helping you understand where Polars fits in the landscape of data processing tools.
- Pandas
Pandas is the most widely used data processing package for Python. It provides data structures like DataFrames and Series, along with a rich set of functions for data analysis, cleaning, and transformation.
Compared to Pandas, Polars offers significantly better performance, especially for large datasets. Polars is built in Rust and uses Apache Arrow for memory management, allowing it to process data much faster than Pandas. While Pandas uses eager execution by default, Polars provides both eager and lazy execution options, enabling query optimization. However, Pandas still has a larger ecosystem and better integration with other data science packages. (A short side-by-side sketch follows this list.)
- Dask
Dask is a flexible package for parallel computing in Python. It extends the functionality of NumPy, Pandas, and Scikit-learn to distributed computing systems. Dask is particularly useful for processing datasets that are too large to fit in memory.
Like Dask, Polars supports parallel processing and can handle large datasets. However, Polars is designed for single-machine use, while Dask focuses on distributed computing. Polars generally offers better performance for operations that fit in memory, while Dask excels at processing truly massive datasets across multiple machines.
- DuckDB
DuckDB is an in-process SQL OLAP database management system. It’s designed to be fast and efficient for analytical queries on structured data. DuckDB can be embedded directly in applications and supports SQL queries.
Both Polars and DuckDB are optimized for analytical workloads and offer excellent performance. Polars provides a more Pythonic API, while DuckDB uses SQL for querying.
- PySpark
PySpark is the Python API for Apache Spark, a distributed computing system designed for big data processing. It provides a wide range of functionalities, including SQL queries, machine learning, and graph processing. PySpark is particularly useful for processing very large datasets across clusters of computers.
While PySpark is designed for distributed computing, Polars focuses on single-machine performance. Polars generally offers faster performance for datasets that can fit on a single machine. However, PySpark is more suitable for truly massive datasets that require distributed processing across multiple nodes. Polars is also easier to set up and use than the more complex PySpark ecosystem.
- Vaex
Vaex is a high-performance Python package for lazy, out-of-core DataFrames. It’s designed to handle datasets larger than memory efficiently. Vaex uses memory mapping and lazy evaluation to process large datasets quickly.
Compared to Vaex, Polars offers a more comprehensive set of operations and better integration with the Python ecosystem. While both packages are optimized for large datasets, Polars generally provides faster in-memory processing. Vaex may offer an advantage when working with datasets that are significantly larger than the available RAM.
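To make the Pandas comparison concrete, here is the same aggregation written in both packages. This is a minimal sketch with hypothetical data; the Polars version can also be run lazily to benefit from query optimization:

import pandas as pd
import polars as pl

data = {"borough": ["Manhattan", "Brooklyn", "Manhattan"], "minutes": [12, 35, 7]}

# Pandas: eager, index-based API.
pandas_result = pd.DataFrame(data).groupby("borough")["minutes"].mean()

# Polars: expression-based API; swap pl.DataFrame for a LazyFrame (e.g., pl.scan_csv) to go lazy.
polars_result = (
    pl.DataFrame(data)
    .group_by("borough")
    .agg(pl.col("minutes").mean())
)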
Why We Focus on Python Polars
Since Polars is built in Rust, and has language bindings for Python, R, SQL, JavaScript, and Julia, you might be wondering why we focus on its Python API.
According to the 2024 Stack Overflow Developer Survey3, Python is the most popular programming language among respondents who are learning to code and the fourth most popular among professional developers. This is not surprising, since Python is known for its simplicity, readability, and versatility and is widely used in data science, machine learning, web development, and more.
This popularity is reflected in the Polars community, where the Python API is the most complete, most used, and most updated API. Furthermore, Python is widely regarded as the language of choice for data analysis and data processing, and most data scientists and data engineers are familiar with it.
How This Book Is Organized
This book contains 18 chapters, spread over five parts. Each chapter starts with a short introduction of what we’ll discuss and concludes with key takeaways.
- Part I, “Begin”
This first part, Begin, contains the first three chapters of the book. These chapters are meant to introduce you to Polars, to get you up and running, and to help you start using it yourself.
Chapter 1, this chapter, discusses what Polars is, explains why you should use it, and demonstrates its capabilities through a showcase. Chapter 2 covers everything you need to get started with Polars yourself, including instructions on how to install Polars and how to get the code and data used in this book. If you have any experience using Pandas, then Chapter 3 will help you transition to Polars by explaining and showing the differences between the two.
- Part II, “Form”
The name of the second part, Form, has two meanings: it’s about the form of data structures and data types, and about forming DataFrames from some source. In other words, you’ll learn how to read and write data, and how this data is stored and handled in Polars.
Chapter 4 provides an overview of the data structures and data types that Polars supports and how missing data is handled. Chapter 5 explains the difference between the eager API, which is used for quick results, and the lazy API, which is used for optimized performance. Chapter 6 shows how to read and write data from and to various file formats, such as CSV, Parquet, and Arrow.
- Part III, “Express”
Expressions play a central role within Polars, so it’s only fitting that this third part, Express, is in the middle of the book.
Chapter 7 starts with examples of where expressions are used, provides a formal definition of an expression, and explains how you can create them. Chapter 8 enumerates the many methods for continuing expressions, including mathematical operations, working with missing values, applying smoothing, and summarizing. Chapter 9 shows how to combine multiple expressions using, for example, arithmetic and Boolean logic.
- Part IV, “Transform”
Once you understand expressions, you can incorporate them into functions and methods to transform your data, which is what this fourth part, Transform, is all about.
Chapter 10 explains how to select and create columns and work with column names and selectors. Chapter 11 shows the different ways of filtering and sorting rows. Chapter 12 covers how to work with textual, temporal, and nested data types. Chapter 13 goes into grouping, aggregating, and summarizing data. Chapter 14 explains how to combine different DataFrames using joins and concatenations. Chapter 15 shows how to reshape data through (un)pivoting, stacking, and extending.
- Part V, “Advance”
The last part of this book, Advance, contains a variety of more advanced topics.
Chapter 16 explains how to visualize data using a variety of visualization packages, including Altair, hvplot, and Plotnine. Chapter 17 shows how you can extend Polars with custom Python functions and your own Rust plugins. Chapter 18 looks behind the curtains of Polars, explaining how it’s built, how it works under the hood, and why it is so fast.
The book concludes with an appendix that covers how to leverage the power of GPUs to accelerate Polars, offering insights into maximizing performance.
An ETL Showcase
Now that you’ve learned where Polars comes from and how it will benefit you, it’s time to see it in action. We’ve prepared an extract-transform-load (ETL) showcase, in which we’re going to demonstrate the capabilities of Polars by transforming, analyzing, and visualizing data related to bike trips in New York City.
Strictly speaking, ETL is about extracting, transforming, and loading data, but we have added two data-visualization bonuses.
The outline of this showcase is as follows. First, we import the required packages. Second, we download the raw data. Third, we clean this raw data and enrich it with new columns. Finally, we’re going to write the data to Parquet files so that we can reuse it in later chapters.
Don’t Worry About the Syntax
The purpose of this showcase is to give you a taste of what Polars looks like. There will be lots of new syntax and concepts that you’re not yet familiar with. Don’t worry about this; everything will be explained throughout the course of this book. You don’t have to run these code snippets yourself. Instead, just read and enjoy the ride.
Let’s get started.
Extract
The first step of this ETL showcase is to extract the data. We are going to use two different sources: one is about the bike trips themselves, and the other is about New York City’s neighborhoods and boroughs. However, we first need to import the packages that we’re going to use for this showcase.
Import Packages
Obviously, we will need Polars itself. For the geographical operations in this showcase, we’ve made a custom plugin that you can import as polars_geo. We’ll explain how to compile and install this plugin in Chapter 17. We also need the Plotnine package, which we’re going to use to create a couple of data visualizations.

We’ll start by compiling the polars_geo plugin. Be warned, this can take a while.
!cd plugins/polars_geo && maturin develop --release

# And reset the kernel to make the new plug-in available
from IPython.display import display, Javascript

display(Javascript("Jupyter.notebook.kernel.restart()"))
Now that it’s compiled and installed, we’re ready to import the packages we need:
import polars as pl
import polars_geo
from plotnine import *
It’s customary to import Polars as the alias pl. In Python scripts, it’s not recommended to import all the functions of a package into the global namespace. In ad-hoc notebooks, and in this showcase, it’s OK because it allows us to use the functions of the Plotnine package without having to type the package name, which is much more convenient.
Let’s move on to the next step, which is to download and extract the bike trips.
Download and Extract Citi Bike Trips
The data that we’re going to use comes from Citi Bike, New York City’s public bike rental system. This system offers bikes that you can hire for short trips up to 30 or 45 minutes, depending on whether you’re a member. The data is freely available from their website4.
The following commands download the ZIP file, extract the CSV file, and remove the ZIP file (as it’s no longer needed).
!curl -sO https://s3.amazonaws.com/tripdata/202403-citibike-tripdata.csv.zip
!unzip -o 202403-citibike-tripdata.csv.zip "*.csv" -x "*/*" -d data/citibike/
!rm -f 202403-citibike-tripdata.csv.zip
Shell Commands
These shell commands are not Python code.
In Jupyter, the exclamation mark (!) causes these commands to be executed by a shell rather than the Python interpreter.
If you are on Windows, or if you’re not comfortable running commands like this, you can download and extract the data manually:
- Visit the Citi Bike website
- Click on the link “downloadable files of Citi Bike trip data”
- Download the ZIP file 202403-citibike-tripdata.csv.zip
- Extract the ZIP file
- Move the CSV file to the data/citibike subdirectory
Let’s continue to the next step, which is to load this CSV file into a Polars DataFrame.
Read Citi Bike Trips into a Polars DataFrame
Before we read any raw data into a Polars DataFrame, we always like to inspect it first.
We’ll count the number of lines in this CSV file using wc and print the first six lines using head:
!wc -l data/citibike/202403-citibike-tripdata.csv
!head -n 6 data/citibike/202403-citibike-tripdata.csv
2663296 data/citibike/202403-citibike-tripdata.csv
"ride_id","rideable_type","started_at","ended_at","start_station_name","start_s…
"62021B31AF42943E","electric_bike","2024-03-13 15:57:41.800","2024-03-13 16:07:…
"EC7BE9D296FFD072","electric_bike","2024-03-16 10:25:46.114","2024-03-16 10:30:…
"EC85C0EEC95157BB","classic_bike","2024-03-20 19:20:49.818","2024-03-20 19:28:0…
"9DDE9AF5606B4E0F","classic_bike","2024-03-13 20:31:12.599","2024-03-13 20:40:3…
"E4446F457328C5FE","electric_bike","2024-03-16 10:50:11.535","2024-03-16 10:53:…
It appears that we have over 2.6 million rows, where each row is one bike trip. The CSV file seems to be well-formatted, with a header, and a comma as the separator.
When we first tried to read this CSV file into Polars, we found two problematic columns.
The values stored in the columns start_station_id and end_station_id are in fact strings, but Polars assumes that they are numbers, because in the first few rows they look like numbers. Specifying the types manually for these two columns solves this. Let’s read the CSV file into a DataFrame called trips and print the number of rows:
trips = pl.read_csv(
    "data/citibike/202403-citibike-tripdata.csv",
    try_parse_dates=True,
    schema_overrides={
        "start_station_id": pl.String,
        "end_station_id": pl.String,
    },
).sort("started_at")

trips.height
2663295
You’ll learn about reading data in Chapter 6.
You’ll learn about sorting rows in Chapter 11.
Here’s what the DataFrame looks like. Because it’s too wide to comfortably show on the page, we use print() three times to show all the columns:
print(trips[:, :4])
print(trips[:, 4:8])
print(trips[:, 8:])
shape: (2_663_295, 4) ┌──────────────────┬───────────────┬───────────────────┬───────────────────┐ │ ride_id │ rideable_type │ started_at │ ended_at │ │ --- │ --- │ --- │ --- │ │ str │ str │ datetime[μs] │ datetime[μs] │ ╞══════════════════╪═══════════════╪═══════════════════╪═══════════════════╡ │ 9EC2AD5F3F8C8B57 │ classic_bike │ 2024-02-29 00:20… │ 2024-03-01 01:20… │ │ C76D82D96516BDC2 │ classic_bike │ 2024-02-29 07:54… │ 2024-03-01 08:54… │ │ … │ … │ … │ … │ │ D8B20517A4AB7D60 │ classic_bike │ 2024-03-31 23:56… │ 2024-03-31 23:57… │ │ 6BC5FAFEAC948FB1 │ electric_bike │ 2024-03-31 23:57… │ 2024-03-31 23:59… │ └──────────────────┴───────────────┴───────────────────┴───────────────────┘ shape: (2_663_295, 4) ┌───────────────────┬──────────────────┬───────────────────┬────────────────┐ │ start_station_na… │ start_station_id │ end_station_name │ end_station_id │ │ --- │ --- │ --- │ --- │ │ str │ str │ str │ str │ ╞═══════════════════╪══════════════════╪═══════════════════╪════════════════╡ │ 61 St & 39 Ave │ 6307.07 │ null │ null │ │ E 54 St & 1 Ave │ 6608.09 │ null │ null │ │ … │ … │ … │ … │ │ Division St & Bo… │ 5270.08 │ Division St & Bo… │ 5270.08 │ │ Montrose Ave & B… │ 5068.02 │ Humboldt St & Va… │ 4956.02 │ └───────────────────┴──────────────────┴───────────────────┴────────────────┘ shape: (2_663_295, 5) ┌───────────┬────────────┬───────────┬────────────┬───────────────┐ │ start_lat │ start_lng │ end_lat │ end_lng │ member_casual │ │ --- │ --- │ --- │ --- │ --- │ │ f64 │ f64 │ f64 │ f64 │ str │ ╞═══════════╪════════════╪═══════════╪════════════╪═══════════════╡ │ 40.7471 │ -73.9028 │ null │ null │ member │ │ 40.756265 │ -73.964179 │ null │ null │ member │ │ … │ … │ … │ … │ … │ │ 40.714193 │ -73.996732 │ 40.714193 │ -73.996732 │ member │ │ 40.707678 │ -73.940297 │ 40.703172 │ -73.940636 │ member │ └───────────┴────────────┴───────────┴────────────┴───────────────┘
Not a bad start. The DataFrame trips has a variety of columns, including timestamps, categories, names, and coordinates. This will allow us to produce plenty of interesting analyses and data visualizations.
Read in Neighborhoods from GeoJSON
New York City is a large place, with many neighborhoods spread over five boroughs: The Bronx, Brooklyn, Manhattan, Staten Island, and Queens.
Our trips DataFrame lacks this information. If we were to add the neighborhood and borough where each trip starts and ends, we would be able to compare boroughs with each other or answer questions such as “What is the busiest neighborhood in Manhattan?”
To add this information, we are going to read a GeoJSON file that contains all the boroughs and neighborhoods of New York City. The raw data, which is on GitHub, looks like this5:
!python -m json.tool data/citibike/nyc-neighborhoods.geojson
{
    "type": "FeatureCollection",
    "crs": {
        "type": "name",
        "properties": {
            "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
        }
    },
    "features": [
        {
            "type": "Feature",
            "properties": {
                "neighborhood": "Allerton",
                "boroughCode": "2",
                "borough": "Bronx",
                "X.id": "http://nyc.pediacities.com/Resource/Neighborhood/Aller…
            },
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [
                            -73.84859700000018,
                            40.871670000000115
                        ],
                        [
                            -73.84582253683678,
                            40.870239076236174
                        ],
… with 134239 more lines
This deeply nested structure contains all the information we need. The areas are stored as polygons, which are sequences of coordinates. We transform this deeply nested structure into a rectangular form: that is, a DataFrame.
neighborhoods = (
    pl.read_json("data/citibike/nyc-neighborhoods.geojson")
    .select("features")
    .explode("features")
    .unnest("features")
    .unnest("properties")
    .select("neighborhood", "borough", "geometry")
    .unnest("geometry")
    .with_columns(polygon=pl.col("coordinates").list.first())
    .select("neighborhood", "borough", "polygon")
    .filter(pl.col("borough") != "Staten Island")
    .sort("neighborhood")
)
neighborhoods
You’ll learn about reshaping nested data structures in Chapter 15.
Staten Island doesn’t have any Citi Bike stations.
shape: (258, 3) ┌─────────────────┬──────────┬─────────────────────────────────────────────────┐ │ neighborhood │ borough │ polygon │ │ --- │ --- │ --- │ │ str │ str │ list[list[f64]] │ ╞═════════════════╪══════════╪═════════════════════════════════════════════════╡ │ Allerton │ Bronx │ [[-73.848597, 40.87167], [-73.845823, 40.87023… │ │ Alley Pond Park │ Queens │ [[-73.743333, 40.738883], [-73.743714, 40.7394… │ │ Arverne │ Queens │ [[-73.789535, 40.599972], [-73.789541, 40.5999… │ │ Astoria │ Queens │ [[-73.901603, 40.76777], [-73.902696, 40.76688… │ │ Bath Beach │ Brooklyn │ [[-73.99381, 40.60195], [-73.99962, 40.596469]… │ │ … │ … │ … │ │ Williamsburg │ Brooklyn │ [[-73.957572, 40.725097], [-73.952998, 40.7222… │ │ Windsor Terrace │ Brooklyn │ [[-73.980061, 40.660753], [-73.979878, 40.6607… │ │ Woodhaven │ Queens │ [[-73.86233, 40.695962], [-73.856544, 40.69707… │ │ Woodlawn │ Bronx │ [[-73.859468, 40.900517], [-73.85926, 40.90033… │ │ Woodside │ Queens │ [[-73.900866, 40.757674], [-73.90014, 40.75615… │ └─────────────────┴──────────┴─────────────────────────────────────────────────┘
We now have a clean DataFrame with 258 neighborhoods, the boroughs in which they are located, and their polygons.
If a neighborhood consists of multiple separate areas (that is, multiple polygons), it will appear multiple times in this DataFrame.
Before we use this neighborhoods DataFrame to add information to the trips DataFrame, we first want to visualize it so that we have some context.
Bonus: Visualizing Neighborhoods and Stations
To visualize the neighborhoods of New York City and all the Citi Bike stations, we are going to use the Plotnine package. Plotnine expects the DataFrame in a long format—that is, one row per coordinate—so we have some wrangling to do:
neighborhoods_coords = (
    neighborhoods
    .with_row_index("id")
    .explode("polygon")
    .with_columns(
        lon=pl.col("polygon").list.first(),
        lat=pl.col("polygon").list.last(),
    )
    .drop("polygon")
)
neighborhoods_coords
shape: (27_569, 5) ┌─────┬──────────────┬─────────┬────────────┬───────────┐ │ id │ neighborhood │ borough │ lon │ lat │ │ --- │ --- │ --- │ --- │ --- │ │ u32 │ str │ str │ f64 │ f64 │ ╞═════╪══════════════╪═════════╪════════════╪═══════════╡ │ 0 │ Allerton │ Bronx │ -73.848597 │ 40.87167 │ │ 0 │ Allerton │ Bronx │ -73.845823 │ 40.870239 │ │ 0 │ Allerton │ Bronx │ -73.854559 │ 40.859954 │ │ 0 │ Allerton │ Bronx │ -73.854665 │ 40.859586 │ │ 0 │ Allerton │ Bronx │ -73.856389 │ 40.857594 │ │ … │ … │ … │ … │ … │ │ 257 │ Woodside │ Queens │ -73.910618 │ 40.755476 │ │ 257 │ Woodside │ Queens │ -73.90907 │ 40.757565 │ │ 257 │ Woodside │ Queens │ -73.907828 │ 40.756999 │ │ 257 │ Woodside │ Queens │ -73.90737 │ 40.756988 │ │ 257 │ Woodside │ Queens │ -73.900866 │ 40.757674 │ └─────┴──────────────┴─────────┴────────────┴───────────┘
To get the coordinates of the stations, we calculate, per station, the median coordinates of the start location of each bike trip:
stations = (
    trips
    .group_by(station=pl.col("start_station_name"))
    .agg(
        lon=pl.col("start_lng").median(),
        lat=pl.col("start_lat").median(),
    )
    .sort("station")
    .drop_nulls()
)
stations
You’ll learn about aggregation in Chapter 13.
shape: (2_143, 3) ┌──────────────────────────────┬────────────┬───────────┐ │ station │ lon │ lat │ │ --- │ --- │ --- │ │ str │ f64 │ f64 │ ╞══════════════════════════════╪════════════╪═══════════╡ │ 1 Ave & E 110 St │ -73.938203 │ 40.792327 │ │ 1 Ave & E 16 St │ -73.981656 │ 40.732219 │ │ 1 Ave & E 18 St │ -73.980544 │ 40.733876 │ │ 1 Ave & E 30 St │ -73.975361 │ 40.741457 │ │ 1 Ave & E 38 St │ -73.971822 │ 40.746202 │ │ … │ … │ … │ │ Wyckoff Ave & Stanhope St │ -73.917914 │ 40.703545 │ │ Wyckoff St & 3 Ave │ -73.982586 │ 40.682755 │ │ Wythe Ave & Metropolitan Ave │ -73.963198 │ 40.716887 │ │ Wythe Ave & N 13 St │ -73.957099 │ 40.722741 │ │ Yankee Ferry Terminal │ -74.016756 │ 40.687066 │ └──────────────────────────────┴────────────┴───────────┘
The following code snippet contains the Plotnine code to produce Figure 1-3. Each dot is a bike station. The four colors indicate the boroughs. The shading of the neighborhoods is only used to make them visually more separate; it has no meaning.
(
    ggplot(neighborhoods_coords, aes(x="lon", y="lat", group="id"))
    + geom_polygon(aes(alpha="neighborhood", fill="borough"), color="white")
    + geom_point(stations, size=0.1)
    + scale_x_continuous(expand=(0, 0))
    + scale_y_continuous(expand=(0, 0, 0, 0.01))
    + scale_alpha_ordinal(range=(0.3, 1))
    + scale_fill_brewer(type="qual", palette=2)
    + guides(alpha=False)
    + labs(
        title="New York City Neighborhoods and Citi Bike Stations",
        subtitle="2143 stations across 106 neighborhoods",
        caption="Source: https://citibikenyc.com/system-data",
        fill="Borough",
    )
    + theme_void(base_family="Guardian Sans", base_size=14)
    + theme(
        dpi=200,
        figure_size=(7, 9),
        plot_background=element_rect(fill="white", color="white"),
        plot_caption=element_text(style="italic"),
        plot_title=element_text(ha="left"),
    )
)
Isn’t New York City beautiful?
Transform
No dataset is perfect and neither is ours. That’s why the second step of this ETL showcase is to transform the data. We’ll start with the columns, and subsequently clean up the rows. We will also be adding some new columns along the way.
Clean Up Columns
The snippet below cleans up the columns of our trips DataFrame in the following ways:
- It gets rid of the columns ride_id, start_station_id, and end_station_id, because we don’t need them.
- It shortens the column names so that they’re easier to work with.
- It turns bike_type and rider_type into categories, which better reflects the data types of these columns.
- It adds a new column called duration, which is based on the start and end times of the bike trip.
trips = trips.select(
    bike_type=pl.col("rideable_type").str.split("_").list.get(0).cast(pl.Categorical),
    rider_type=pl.col("member_casual").cast(pl.Categorical),
    datetime_start=pl.col("started_at"),
    datetime_end=pl.col("ended_at"),
    station_start=pl.col("start_station_name"),
    station_end=pl.col("end_station_name"),
    lon_start=pl.col("start_lng"),
    lat_start=pl.col("start_lat"),
    lon_end=pl.col("end_lng"),
    lat_end=pl.col("end_lat"),
).with_columns(duration=(pl.col("datetime_end") - pl.col("datetime_start")))

trips.columns
You’ll learn about expressions in Chapter 7.
You’ll learn about selecting and creating columns in Chapter 10.
['bike_type', 'rider_type', 'datetime_start', 'datetime_end', 'station_start', 'station_end', 'lon_start', 'lat_start', 'lon_end', 'lat_end', 'duration']
Let’s continue with the rows of the trips DataFrame.
Clean Up Rows
You may have noticed that some of the rows are missing values. Because we have plenty of data anyway, it doesn’t hurt to remove those rows. If you have very little data, then you may want to use a different strategy, such as imputing the missing values with, say, the average value or the most common value.
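If you did want to impute instead of drop, a minimal sketch of such a strategy could look like this (we don’t actually run it in this showcase):

# Hypothetical alternative to drop_nulls(): fill missing values instead.
trips_imputed = trips.with_columns(
    # numeric columns: replace nulls with the column mean
    pl.col("lat_end").fill_null(strategy="mean"),
    pl.col("lon_end").fill_null(strategy="mean"),
    # string column: replace nulls with the most common value
    pl.col("station_end").fill_null(pl.col("station_end").mode().first()),
)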
There are a few bike trips that started in February and ended in March. It’ll make our analyses and visualizations cleaner if we remove those trips as well. Finally, let’s also remove all bike rides that started and ended at the same bike station and had a duration of less than five minutes, as those are not actually trips:
from datetime import date

trips = (
    trips
    .drop_nulls()
    .filter(
        (pl.col("datetime_start") >= date(2024, 3, 1))
        & (pl.col("datetime_end") < date(2024, 4, 1))
    )
    .filter(
        ~(
            (pl.col("station_start") == pl.col("station_end"))
            & (pl.col("duration").dt.total_seconds() < 5 * 60)
        )
    )
)
trips.height
You’ll learn about filtering rows in Chapter 11.
2639170
Once we’ve done that, the DataFrame trips still has more than 2.6 million rows, which is plenty.
Add Trip Distance
The distance of a bike trip would be interesting to have because we could then correlate it with, say, the duration. We don’t have the actual bike routes available to us, so the best that we can do is take the start and end coordinates and then calculate what is known as the Haversine distance.6
The Haversine distance can be calculated using the methods that Polars provides, but we would like to use an existing package called geo. There’s just one thing: this package is created in Rust, not in Python. So we have created a custom plugin, specifically for this book, that turns the geo package into a Polars plugin. This allows us to calculate the Haversine distance as if it were a Polars method. The method Expr.geo.haversine_distance() expects a coordinate, meaning a longitude-latitude pair:
trips = trips.with_columns(
    distance=pl.concat_list("lon_start", "lat_start").geo.haversine_distance(
        pl.concat_list("lon_end", "lat_end")
    )
    / 1000
)

trips.select(
    "lon_start",
    "lon_end",
    "lat_start",
    "lat_end",
    "distance",
    "duration",
)
The result of the geo Haversine method is reported in meters. Then we divide by a thousand to get kilometers. You’ll learn more about our custom plugin in Chapter 17.
shape: (2_639_170, 6) ┌────────────┬────────────┬───────────┬───────────┬──────────┬───────────────┐ │ lon_start │ lon_end │ lat_start │ lat_end │ distance │ duration │ │ --- │ --- │ --- │ --- │ --- │ --- │ │ f64 │ f64 │ f64 │ f64 │ f64 │ duration[μs] │ ╞════════════╪════════════╪═══════════╪═══════════╪══════════╪═══════════════╡ │ -73.995071 │ -74.007319 │ 40.749614 │ 40.707065 │ 4.842569 │ 27m 36s 805ms │ │ -73.896576 │ -73.927311 │ 40.816459 │ 40.810893 │ 2.659582 │ 9m 25s 264ms │ │ -73.988559 │ -73.989186 │ 40.746424 │ 40.742869 │ 0.398795 │ 3m 29s 483ms │ │ -73.995208 │ -74.013219 │ 40.749653 │ 40.705945 │ 5.09153 │ 30m 56s 960ms │ │ -73.957559 │ -73.979881 │ 40.69067 │ 40.668663 │ 3.08728 │ 11m 32s 483ms │ │ … │ … │ … │ … │ … │ … │ │ -73.974552 │ -73.977724 │ 40.729848 │ 40.729387 │ 0.272175 │ 1m 41s 374ms │ │ -73.971092 │ -73.965269 │ 40.763505 │ 40.763126 │ 0.492269 │ 3m 30s 363ms │ │ -73.959621 │ -73.955151 │ 40.808625 │ 40.81 │ 0.406138 │ 1m 46s 248ms │ │ -73.965971 │ -73.962644 │ 40.712996 │ 40.712605 │ 0.283781 │ 1m 43s 906ms │ │ -73.940297 │ -73.940636 │ 40.707678 │ 40.703172 │ 0.501835 │ 2m 6s 109ms │ └────────────┴────────────┴───────────┴───────────┴──────────┴───────────────┘
Keep in mind that the Haversine distance is “as the crow flies”, not as the biker rides. Still, this gives us a decent approximation of the trip distance.
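As mentioned, the same distance can also be computed with native Polars expressions, without the plugin. Here is a minimal sketch of that alternative, using the column names from our trips DataFrame:

import polars as pl

# The haversine formula expressed with native Polars expressions (no plugin required).
def haversine_km(lat1: pl.Expr, lon1: pl.Expr, lat2: pl.Expr, lon2: pl.Expr) -> pl.Expr:
    r = 6371.0  # mean Earth radius in kilometers
    dlat = (lat2 - lat1).radians()
    dlon = (lon2 - lon1).radians()
    a = (dlat / 2).sin() ** 2 + lat1.radians().cos() * lat2.radians().cos() * (dlon / 2).sin() ** 2
    return 2 * r * a.sqrt().arcsin()

trips.with_columns(
    distance_native=haversine_km(
        pl.col("lat_start"), pl.col("lon_start"), pl.col("lat_end"), pl.col("lon_end")
    )
)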
Add Borough and Neighborhood
Previously, we obtained the coordinates of the stations and the polygons of the neighborhoods. To determine in which neighborhood each station lies, we test each station’s coordinates against each neighborhood’s polygon. This method is not perfect: some stations do not match any polygon, and some match more than one, because they lie on or near neighborhood borders.
Again, we can use our custom plugin for this, because it also has a method called Expr.geo.point_in_polygon():
stations = (
    stations
    .with_columns(point=pl.concat_list("lon", "lat"))
    .join(neighborhoods, how="cross")
    .with_columns(
        in_neighborhood=pl.col("point").geo.point_in_polygon(pl.col("polygon"))
    )
    .filter(pl.col("in_neighborhood"))
    .unique("station")
    .select(
        "station",
        "borough",
        "neighborhood",
    )
)
stations
shape: (2_133, 3) ┌──────────────────────────────┬───────────┬──────────────────┐ │ station │ borough │ neighborhood │ │ --- │ --- │ --- │ │ str │ str │ str │ ╞══════════════════════════════╪═══════════╪══════════════════╡ │ 1 Ave & E 110 St │ Manhattan │ East Harlem │ │ 1 Ave & E 16 St │ Manhattan │ Stuyvesant Town │ │ 1 Ave & E 18 St │ Manhattan │ Stuyvesant Town │ │ 1 Ave & E 30 St │ Manhattan │ Kips Bay │ │ 1 Ave & E 38 St │ Manhattan │ Murray Hill │ │ … │ … │ … │ │ Wyckoff Ave & Stanhope St │ Brooklyn │ Bushwick │ │ Wyckoff St & 3 Ave │ Brooklyn │ Gowanus │ │ Wythe Ave & Metropolitan Ave │ Brooklyn │ Williamsburg │ │ Wythe Ave & N 13 St │ Brooklyn │ Williamsburg │ │ Yankee Ferry Terminal │ Manhattan │ Governors Island │ └──────────────────────────────┴───────────┴──────────────────┘
We can add this information to the trips DataFrame by joining on the station column twice: once with station_start and once with station_end:
trips = (
    trips
    .join(stations.select(pl.all().name.suffix("_start")), on="station_start")
    .join(stations.select(pl.all().name.suffix("_end")), on="station_end")
    .select(
        "bike_type",
        "rider_type",
        "datetime_start",
        "datetime_end",
        "duration",
        "station_start",
        "station_end",
        "neighborhood_start",
        "neighborhood_end",
        "borough_start",
        "borough_end",
        "lat_start",
        "lon_start",
        "lat_end",
        "lon_end",
        "distance",
    )
)
Here’s what the final DataFrame looks like:
print(trips[:, :4])
print(trips[:, 4:7])
print(trips[:, 7:11])
print(trips[:, 11:])
shape: (2_638_971, 4) ┌───────────┬────────────┬─────────────────────────┬─────────────────────────┐ │ bike_type │ rider_type │ datetime_start │ datetime_end │ │ --- │ --- │ --- │ --- │ │ cat │ cat │ datetime[μs] │ datetime[μs] │ ╞═══════════╪════════════╪═════════════════════════╪═════════════════════════╡ │ electric │ member │ 2024-03-01 00:00:02.490 │ 2024-03-01 00:27:39.295 │ │ electric │ member │ 2024-03-01 00:00:04.120 │ 2024-03-01 00:09:29.384 │ │ … │ … │ … │ … │ │ electric │ member │ 2024-03-31 23:55:41.173 │ 2024-03-31 23:57:25.079 │ │ electric │ member │ 2024-03-31 23:57:16.025 │ 2024-03-31 23:59:22.134 │ └───────────┴────────────┴─────────────────────────┴─────────────────────────┘ shape: (2_638_971, 3) ┌───────────────┬──────────────────────────────┬────────────────────────┐ │ duration │ station_start │ station_end │ │ --- │ --- │ --- │ │ duration[μs] │ str │ str │ ╞═══════════════╪══════════════════════════════╪════════════════════════╡ │ 27m 36s 805ms │ W 30 St & 8 Ave │ Maiden Ln & Pearl St │ │ 9m 25s 264ms │ Longwood Ave & Southern Blvd │ Lincoln Ave & E 138 St │ │ … │ … │ … │ │ 1m 43s 906ms │ S 4 St & Wythe Ave │ S 3 St & Bedford Ave │ │ 2m 6s 109ms │ Montrose Ave & Bushwick Ave │ Humboldt St & Varet St │ └───────────────┴──────────────────────────────┴────────────────────────┘ shape: (2_638_971, 4) ┌────────────────────┬────────────────────┬───────────────┬─────────────┐ │ neighborhood_start │ neighborhood_end │ borough_start │ borough_end │ │ --- │ --- │ --- │ --- │ │ str │ str │ str │ str │ ╞════════════════════╪════════════════════╪═══════════════╪═════════════╡ │ Chelsea │ Financial District │ Manhattan │ Manhattan │ │ Longwood │ Mott Haven │ Bronx │ Bronx │ │ … │ … │ … │ … │ │ Williamsburg │ Williamsburg │ Brooklyn │ Brooklyn │ │ Williamsburg │ Williamsburg │ Brooklyn │ Brooklyn │ └────────────────────┴────────────────────┴───────────────┴─────────────┘ shape: (2_638_971, 5) ┌───────────┬────────────┬───────────┬────────────┬──────────┐ │ lat_start │ lon_start │ lat_end │ lon_end │ distance │ │ --- │ --- │ --- │ --- │ --- │ │ f64 │ f64 │ f64 │ f64 │ f64 │ ╞═══════════╪════════════╪═══════════╪════════════╪══════════╡ │ 40.749614 │ -73.995071 │ 40.707065 │ -74.007319 │ 4.842569 │ │ 40.816459 │ -73.896576 │ 40.810893 │ -73.927311 │ 2.659582 │ │ … │ … │ … │ … │ … │ │ 40.712996 │ -73.965971 │ 40.712605 │ -73.962644 │ 0.283781 │ │ 40.707678 │ -73.940297 │ 40.703172 │ -73.940636 │ 0.501835 │ └───────────┴────────────┴───────────┴────────────┴──────────┘
Before we continue with the third and final step of the ETL showcase, we would like to share one more data visualization.
Bonus: Visualizing Daily Trips per Borough
Now that we have this information, we can analyze and visualize all sorts of interesting things, such as the number of trips per day per borough:
trips_per_hour = trips.group_by_dynamic(
    "datetime_start", group_by="borough_start", every="1d"
).agg(num_trips=pl.len())
trips_per_hour
shape: (124, 3) ┌───────────────┬─────────────────────┬───────────┐ │ borough_start │ datetime_start │ num_trips │ │ --- │ --- │ --- │ │ str │ datetime[μs] │ u32 │ ╞═══════════════╪═════════════════════╪═══════════╡ │ Manhattan │ 2024-03-01 00:00:00 │ 56434 │ │ Manhattan │ 2024-03-02 00:00:00 │ 17450 │ │ Manhattan │ 2024-03-03 00:00:00 │ 69195 │ │ Manhattan │ 2024-03-04 00:00:00 │ 63734 │ │ Manhattan │ 2024-03-05 00:00:00 │ 33309 │ │ … │ … │ … │ │ Queens │ 2024-03-27 00:00:00 │ 6232 │ │ Queens │ 2024-03-28 00:00:00 │ 3770 │ │ Queens │ 2024-03-29 00:00:00 │ 6637 │ │ Queens │ 2024-03-30 00:00:00 │ 6583 │ │ Queens │ 2024-03-31 00:00:00 │ 6237 │ └───────────────┴─────────────────────┴───────────┘
Again, we will be using Plotnine to create the visualization (see Figure 1-4).
(
    ggplot(
        trips_per_hour,
        aes(x="datetime_start", y="num_trips", fill="borough_start"),
    )
    + geom_area()
    + scale_fill_brewer(type="qual", palette=2)
    + scale_x_datetime(date_labels="%-d", date_breaks="1 day", expand=(0, 0))
    + scale_y_continuous(expand=(0, 0))
    + labs(
        x="March 2024",
        fill="Borough",
        y="Trips per day",
        title="Citi Bike Trips Per Day In March 2024",
        subtitle="On March 23, nearly 10cm of rain fell in NYC",
    )
    + theme_tufte(base_family="Guardian Sans", base_size=14)
    + theme(
        axis_ticks_major=element_line(color="white"),
        figure_size=(8, 5),
        legend_position="top",
        plot_background=element_rect(fill="white", color="white"),
        plot_caption=element_text(style="italic"),
        plot_title=element_text(ha="left"),
    )
)
There will be many more data visualizations made with many other packages in Chapter 16.
Load
The third and final step of this ETL showcase is to load the data. In other words, we are going to write the data back to disk.
Write Partitions
Instead of writing back a CSV file, we use the Parquet file format. Parquet provides several advantages over CSV:
- It includes the data type for each column, known as the schema.
- It uses columnar storage instead of row-based, enabling faster, optimized reads.
- Data is organized into chunks with embedded statistics, allowing for efficient skipping of unnecessary data. (A short sketch after this list shows what that buys you.)
- It applies compression, reducing the overall storage footprint.
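These properties pay off as soon as you query the files. Here is a minimal sketch (using the daily files we’re about to write below) in which Polars only reads the two columns it needs and can skip chunks that cannot match the filter:

# Lazy scan over the daily Parquet files we create below:
# projection pushdown reads only two columns, predicate pushdown skips chunks.
long_trips = (
    pl.scan_parquet("data/citibike/trips-*.parquet")
    .filter(pl.col("distance") > 10)
    .select("datetime_start", "distance")
    .collect()
)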
Instead of writing a single Parquet file, we are going to write one file for each day. This way, each file is small enough to be hosted on GitHub, which is necessary to share the data easily with you.
Each file name starts with the string trips, followed by a dash (-) and the date.
trips_parts = (
    trips
    .sort("datetime_start")
    .with_columns(date=pl.col("datetime_start").dt.date().cast(pl.String))
    .partition_by(["date"], as_dict=True, include_key=False)
)

for key, df in trips_parts.items():
    df.write_parquet(f"data/citibike/trips-{key[0]}.parquet")
Verify
Let’s verify that the previous code snippet produced 31 Parquet files using ls:
!ls -1 data/citibike/*.parquet
data/citibike/trips-2024-03-01.parquet
data/citibike/trips-2024-03-02.parquet
data/citibike/trips-2024-03-03.parquet
data/citibike/trips-2024-03-04.parquet
data/citibike/trips-2024-03-05.parquet
… with 26 more lines
Using globbing,7 we can easily read all the Parquet files into a single DataFrame:
pl.read_parquet("data/citibike/*.parquet").height
2638971
Excellent. We’ll use this data later, in Chapter 16, to create many exciting data visualizations. Now that the data has been loaded, we could conclude this ETL showcase. However, there is one more bonus that we would like to share, which enables you to make the entire ETL showcase faster.
Bonus: Becoming Faster by Being Lazy
Up till now, we have been using Polars’ so-called eager API. Eager in this context means that commands are executed straightaway. Polars is already fast relative to its competitors. If we use Polars’ lazy API, our calculations can sometimes be even faster.
With the lazy API, we’re not operating directly on a DataFrame. Instead, we’re building a recipe of instructions. When we are ready, we tell Polars to execute it. Before Polars actually does so, it will first optimize this recipe.
As for our showcase, being completely lazy is slightly less efficient than being eager. That’s because certain parts would need to be computed twice, and Polars doesn’t know how to cache those results. The code snippet below fixes this by using lazy execution in some parts and then turning the intermediate results into DataFrames so that they are properly cached. It’s all of the code from above, with just a few minor changes:
trips = (
    pl.scan_csv(
        "data/citibike/202403-citibike-tripdata.csv",
        try_parse_dates=True,
        schema_overrides={
            "start_station_id": pl.String,
            "end_station_id": pl.String,
        },
    )
    .select(
        bike_type=pl.col("rideable_type").str.split("_").list.get(0),
        rider_type=pl.col("member_casual"),
        datetime_start=pl.col("started_at"),
        datetime_end=pl.col("ended_at"),
        station_start=pl.col("start_station_name"),
        station_end=pl.col("end_station_name"),
        lon_start=pl.col("start_lng"),
        lat_start=pl.col("start_lat"),
        lon_end=pl.col("end_lng"),
        lat_end=pl.col("end_lat"),
    )
    .with_columns(duration=(pl.col("datetime_end") - pl.col("datetime_start")))
    .drop_nulls()
    .filter(
        ~(
            (pl.col("station_start") == pl.col("station_end"))
            & (pl.col("duration").dt.total_seconds() < 5 * 60)
        )
    )
    .with_columns(
        distance=pl.concat_list("lon_start", "lat_start").geo.haversine_distance(
            pl.concat_list("lon_end", "lat_end")
        )
        / 1000
    )
).collect()

neighborhoods = (
    pl.read_json("data/citibike/nyc-neighborhoods.geojson")
    .lazy()
    .select("features")
    .explode("features")
    .unnest("features")
    .unnest("properties")
    .select("neighborhood", "borough", "geometry")
    .unnest("geometry")
    .with_columns(polygon=pl.col("coordinates").list.first())
    .select("neighborhood", "borough", "polygon")
    .sort("neighborhood")
    .filter(pl.col("borough") != "Staten Island")
)

stations = (
    trips
    .lazy()
    .group_by(station=pl.col("station_start"))
    .agg(
        lat=pl.col("lat_start").median(),
        lon=pl.col("lon_start").median(),
    )
    .with_columns(point=pl.concat_list("lon", "lat"))
    .drop_nulls()
    .join(neighborhoods, how="cross")
    .with_columns(
        in_neighborhood=pl.col("point").geo.point_in_polygon(pl.col("polygon"))
    )
    .filter(pl.col("in_neighborhood"))
    .unique("station")
    .select(
        pl.col("station"),
        pl.col("borough"),
        pl.col("neighborhood"),
    )
).collect()

trips = (
    trips
    .join(stations.select(pl.all().name.suffix("_start")), on="station_start")
    .join(stations.select(pl.all().name.suffix("_end")), on="station_end")
    .select(
        "bike_type",
        "rider_type",
        "datetime_start",
        "datetime_end",
        "duration",
        "station_start",
        "station_end",
        "neighborhood_start",
        "neighborhood_end",
        "borough_start",
        "borough_end",
        "lat_start",
        "lon_start",
        "lat_end",
        "lon_end",
        "distance",
    )
)
trips.height
2639179
The function pl.scan_csv() returns a LazyFrame, making all the subsequent methods lazy. The method lf.collect() turns a LazyFrame into a DataFrame. The method df.lazy() turns a DataFrame into a LazyFrame.
For a single month of bike trips, Polars doesn’t speed up much, because the point-in-polygon test dominates the timing. However, when we take a year’s worth of bike trips, the lazy approach is 33% faster than the eager approach. That’s a substantial speedup for just a couple of code changes.
You’ll learn more about eager and lazy APIs in Chapter 5.
Takeaways
- Polars is a blazingly fast DataFrame package with a focus on performance and ease of use through an intuitive API.
- Polars is written in Rust and has bindings for Python, R, JavaScript, and Julia.
- The Python version is the most mature and most used version of Polars.
- Polars is a very popular Python package, as measured by the number of GitHub stars.
- Polars is, in many cases, faster than its competitors.
- When using the lazy API, Polars can be even faster.
- Polars is great for transforming, analyzing, and visualizing data.
In the next chapter, we will show how to install Polars and how to get started with it. Additionally, we’ll explain how you can follow along with the code examples in this book.
1 See https://github.com/pola-rs/polars/releases.
2 You can join the Discord server at https://discord.gg/4qf7UVDZmd.
3 See https://survey.stackoverflow.co/2024/ for the full report.
4 https://citibikenyc.com/system-data
5 The original filename is custom-pedia-cities-nyc-Mar2018.geojson.
6 The haversine distance is the shortest distance between two points on a sphere.
7 Globbing is a pattern-matching technique used to select file names based on wildcard characters like * and ?.