Foreword
Preface
1. Understanding Performant Python
The Fundamental Computer System
Computing Units
Memory Units
Communications Layers
Putting the Fundamental Elements Together
Idealized Computing Versus the Python Virtual Machine
So Why Use Python?
How to Be a Highly Performant Programmer
Good Working Practices
Some Thoughts on Good Notebook Practice
Getting the Joy Back into Your Work
2. Profiling to Find Bottlenecks
Profiling Efficiently
Introducing the Julia Set
Calculating the Full Julia Set
Simple Approaches to Timing—print and a Decorator
Simple Timing Using the Unix time Command
Using the cProfile Module
Visualizing cProfile Output with SnakeViz
Using line_profiler for Line-by-Line Measurements
Using memory_profiler to Diagnose Memory Usage
Introspecting an Existing Process with PySpy
Bytecode: Under the Hood
Using the dis Module to Examine CPython Bytecode
Different Approaches, Different Complexity
Unit Testing During Optimization to Maintain Correctness
No-op @profile Decorator
Strategies to Profile Your Code Successfully
Wrap-Up
3. Lists and Tuples
A More Efficient Search
Lists Versus Tuples
Lists as Dynamic Arrays
Tuples as Static Arrays
Wrap-Up
4. Dictionaries and Sets
How Do Dictionaries and Sets Work?
Inserting and Retrieving
Deletion
Resizing
Hash Functions and Entropy
Dictionaries and Namespaces
Wrap-Up
5. Iterators and Generators
Iterators for Infinite Series
Lazy Generator Evaluation
Wrap-Up
6. Matrix and Vector Computation
Introduction to the Problem
Aren't Python Lists Good Enough?
Problems with Allocating Too Much
Memory Fragmentation
Understanding perf
Making Decisions with perf's Output
Enter numpy
Applying numpy to the Diffusion Problem
Memory Allocations and In-Place Operations
Selective Optimizations: Finding What Needs to Be Fixed
numexpr: Making In-Place Operations Faster and Easier
A Cautionary Tale: Verify “Optimizations” (scipy)
Lessons from Matrix Optimizations
Pandas
Pandas's Internal Model
Applying a Function to Many Rows of Data
Building DataFrames and Series from Partial Results Rather than Concatenating
There's More Than One (and Possibly a Faster) Way to Do a Job
Advice for Effective Pandas Development
Wrap-Up
7. Compiling to C
What Sort of Speed Gains Are Possible?
JIT Versus AOT Compilers
Why Does Type Information Help the Code Run Faster?
Using a C Compiler
Reviewing the Julia Set Example
Cython
Compiling a Pure Python Version Using Cython
pyximport
Cython Annotations to Analyze a Block of Code
Adding Some Type Annotations
Cython and numpy
Parallelizing the Solution with OpenMP on One Machine
Numba
Numba to Compile NumPy for Pandas
PyPy
Garbage Collection Differences
Running PyPy and Installing Modules
A Summary of Speed Improvements
When to Use Each Technology
Other Upcoming Projects
Graphics Processing Units (GPUs)
Dynamic Graphs: PyTorch
Basic GPU Profiling
Performance Considerations of GPUs
When to Use GPUs
Foreign Function Interfaces
ctypes
cffi
f2py
CPython Module
Wrap-Up
8. Asynchronous I/O
9. The multiprocessing Module
10. Clusters and Job Queues
11. Using Less RAM
12. Lessons from the Field
Index