
Python High Performance - Second Edition

Book Description

Learn how to use Python to create efficient applications

About This Book

  • Identify the bottlenecks in your applications and solve them using the best profiling techniques
  • Write efficient numerical code in NumPy, Cython, and Pandas
  • Adapt your programs to run on multiple processors and machines with parallel programming

Who This Book Is For

The book is aimed at Python developers who want to improve the performance of their applications. Basic knowledge of Python is expected.

What You Will Learn

  • Write efficient numerical code with the NumPy and Pandas libraries
  • Use Cython and Numba to achieve native performance
  • Find bottlenecks in your Python code using profilers
  • Write asynchronous code using Asyncio and RxPy
  • Use Tensorflow and Theano for automatic parallelism in Python
  • Set up and run distributed algorithms on a cluster using Dask and PySpark
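
As a flavor of the profiling workflow listed above, here is a minimal sketch (an illustration for this description, not code taken from the book) that profiles a deliberately inefficient function with the standard-library cProfile and pstats modules:

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately wasteful: materializes a full list instead of
    # feeding sum() a generator expression
    return sum([i * i for i in range(n)])

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)  # show the top 5 entries by cumulative time
print(stream.getvalue())
```

The report points at the list comprehension as the hot spot, which is the kind of evidence-driven optimization the profiling chapter is about.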

In Detail

Python is a versatile language that has found applications in many industries. The clean syntax, rich standard library, and vast selection of third-party libraries make Python a wildly popular language.

Python High Performance is a practical guide that shows how to leverage the power of both native and third-party Python libraries to build robust applications.

The book explains how to use various profilers to find performance bottlenecks and how to apply the correct algorithms to fix them. Readers will learn how to use NumPy and Cython effectively to speed up numerical code. The book covers the concepts of concurrent programming and shows how to implement robust and responsive applications using reactive programming. Readers will also learn how to write code for parallel architectures using Tensorflow and Theano, and how to use a cluster of computers for large-scale computations with technologies such as Dask and PySpark.

By the end of the book, readers will be able to build Python applications that are both fast and scalable.
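
To give a flavor of the NumPy material (see "Calculating the norm" in the table of contents below), here is a minimal sketch, again an illustration for this description rather than code from the book, contrasting a pure Python loop with its vectorized NumPy equivalent:

```python
import numpy as np

def python_norm(vec):
    # Pure Python: one interpreted iteration per element
    total = 0.0
    for x in vec:
        total += x * x
    return total ** 0.5

def numpy_norm(arr):
    # Vectorized: the loop runs in compiled code inside NumPy
    return float(np.sqrt((arr * arr).sum()))

data = [3.0, 4.0]
print(python_norm(data))           # 5.0
print(numpy_norm(np.array(data)))  # 5.0
```

On large arrays the vectorized version is typically orders of magnitude faster, which is the central lesson of the NumPy chapter.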

Style and approach

A step-by-step practical guide filled with real-world use cases and examples

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files sent to you.

Table of Contents

  1. Customer Feedback
  2. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  3. Benchmarking and Profiling
    1. Designing your application
    2. Writing tests and benchmarks
      1. Timing your benchmark
    3. Better tests and benchmarks with pytest-benchmark
    4. Finding bottlenecks with cProfile
    5. Profile line by line with line_profiler
    6. Optimizing our code
    7. The dis module
    8. Profiling memory usage with memory_profiler
    9. Summary
  4. Pure Python Optimizations
    1. Useful algorithms and data structures
      1. Lists and deques
      2. Dictionaries
        1. Building an in-memory search index using a hash map
      3. Sets
      4. Heaps
      5. Tries
    2. Caching and memoization
      1. Joblib
    3. Comprehensions and generators
    4. Summary
  5. Fast Array Operations with NumPy and Pandas
    1. Getting started with NumPy
      1. Creating arrays
      2. Accessing arrays
      3. Broadcasting
      4. Mathematical operations
      5. Calculating the norm
    2. Rewriting the particle simulator in NumPy
    3. Reaching optimal performance with numexpr
    4. Pandas
      1. Pandas fundamentals
        1. Indexing Series and DataFrame objects
      2. Database-style operations with Pandas
        1. Mapping
        2. Grouping, aggregations, and transforms
        3. Joining
    5. Summary
  6. C Performance with Cython
    1. Compiling Cython extensions
    2. Adding static types
      1. Variables
      2. Functions
      3. Classes
    3. Sharing declarations
    4. Working with arrays
      1. C arrays and pointers
      2. NumPy arrays
      3. Typed memoryviews
    5. Particle simulator in Cython
    6. Profiling Cython
    7. Using Cython with Jupyter
    8. Summary
  7. Exploring Compilers
    1. Numba
      1. First steps with Numba
      2. Type specializations
      3. Object mode versus native mode
      4. Numba and NumPy
        1. Universal functions with Numba
        2. Generalized universal functions
      5. JIT classes
      6. Limitations in Numba
    2. The PyPy project
      1. Setting up PyPy
      2. Running a particle simulator in PyPy
    3. Other interesting projects
    4. Summary
  8. Implementing Concurrency
    1. Asynchronous programming
      1. Waiting for I/O
      2. Concurrency
      3. Callbacks
      4. Futures
      5. Event loops
    2. The asyncio framework
      1. Coroutines
      2. Converting blocking code into non-blocking code
    3. Reactive programming
      1. Observables
      2. Useful operators
      3. Hot and cold observables
      4. Building a CPU monitor
    4. Summary
  9. Parallel Processing
    1. Introduction to parallel programming
      1. Graphic processing units
    2. Using multiple processes
      1. The Process and Pool classes
      2. The Executor interface
      3. Monte Carlo approximation of pi
      4. Synchronization and locks
    3. Parallel Cython with OpenMP
    4. Automatic parallelism
      1. Getting started with Theano
        1. Profiling Theano
      2. Tensorflow
      3. Running code on a GPU
    5. Summary
  10. Distributed Processing
    1. Introduction to distributed computing
      1. An introduction to MapReduce
    2. Dask
      1. Directed Acyclic Graphs
      2. Dask arrays
      3. Dask Bag and DataFrame
      4. Dask distributed
        1. Manual cluster setup
    3. Using PySpark
      1. Setting up Spark and PySpark
      2. Spark architecture
      3. Resilient Distributed Datasets
      4. Spark DataFrame
    4. Scientific computing with mpi4py
    5. Summary
  11. Designing for High Performance
    1. Choosing a suitable strategy
      1. Generic applications
      2. Numerical code
      3. Big data
    2. Organizing your source code
    3. Isolation, virtual environments, and containers
      1. Using conda environments
      2. Virtualization and Containers
        1. Creating docker images
    4. Continuous integration
    5. Summary