Summary
In this workshop, we covered:
1. Understand how to profile Python code and identify bottlenecks
- Measure the time of cells, functions, and programs to find bottlenecks, e.g., using `timeit` and `line_profiler` (see the sketch after this list).
- Visualise the profiled code, e.g., using `SnakeViz` and `pyinstrument`.
- Log profiling information, e.g., using `Eliot`.
- Consider how fast the code could go, e.g., using Big O notation.
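For example, a minimal sketch of timing a function with `timeit` from the standard library (the function and call counts are illustrative, not taken from the workshop material):

```python
import timeit

def sum_of_squares(n):
    """Toy function to time: sum of the squares of the first n integers."""
    return sum(i * i for i in range(n))

# Call the statement repeatedly and report the total elapsed seconds;
# `number` controls how many calls are made in the measurement.
elapsed = timeit.timeit("sum_of_squares(10_000)", globals=globals(), number=1_000)
print(f"1,000 calls took {elapsed:.3f} s")
```

In a notebook, the `%timeit` magic wraps the same machinery, and `line_profiler` adds per-line timings via its `%lprun` magic or `@profile` decorator.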
2. Understand how to choose the most appropriate data structures, algorithms, and libraries for a problem
- Make use of the built-in functions, e.g., use `len` rather than counting the items in an object in a loop.
- Use appropriate data structures, e.g., append to lists rather than concatenating, use dictionaries as fast look-ups, and cache results in dictionaries to reduce repeated calculations (see the sketch after this list).
- Make use of the standard library (optimised in C), e.g., the `math` module.
- See whether there is an algorithm or library that already solves your problem optimally, e.g., faster sorting algorithms.
3. Improve the execution time of Python code using:
- Vectorisation
  - Take advantage of broadcasting for arrays of different shapes.
  - Use vectorised functions where you can, e.g., NumPy ufuncs.
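A minimal NumPy sketch of both points (assuming `numpy` is installed; the arrays are illustrative):

```python
import numpy as np

# Vectorised ufunc: one call works on the whole array in compiled code,
# replacing an explicit Python loop over the elements.
x = np.linspace(0.0, np.pi, 1_000)
y = np.sin(x)                       # element-wise, no Python-level loop

# Broadcasting: arrays of different shapes combine without copying;
# a (3, 1) column against a (1, 4) row gives a (3, 4) result.
col = np.arange(3).reshape(3, 1)
row = np.arange(4).reshape(1, 4)
table = col * row
print(table.shape)                  # (3, 4)
```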
- Compilers
  - Speed up numerical functions with the Numba `@njit` (nopython) compiler.
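A small sketch of `@njit` (assuming `numba` and `numpy` are installed; the loop is illustrative):

```python
import numpy as np
from numba import njit

@njit  # compile to machine code in nopython mode; the first call pays the compilation cost
def sum_of_squares(a):
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * a[i]
    return total

data = np.random.rand(1_000_000)
print(sum_of_squares(data))  # later calls reuse the compiled version
```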
- Parallelisation
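The workshop's own parallelisation examples are not repeated here; as one common standard-library approach (not necessarily the library used in the workshop), a process pool can spread an independent, CPU-bound function across cores:

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    """Independent, CPU-bound work that can run in a separate process."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [2_000_000] * 8
    # Each input is handled by a worker process, side-stepping the GIL for CPU-bound work.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(cpu_bound, inputs))
    print(results[:2])
```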
- GPUs
  - Use CUDA/Numba, RAPIDS, and JAX to write custom data science code for CUDA GPUs.
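Real GPU runs need the matching hardware and drivers, so only a small JAX sketch is shown here; the same code runs on the CPU and dispatches to a CUDA GPU automatically when JAX is installed with GPU support (the function is illustrative):

```python
import jax
import jax.numpy as jnp

@jax.jit  # JIT-compiled by XLA for whichever backend is available (CPU, GPU, or TPU)
def normalise(x):
    return (x - x.mean()) / x.std()

x = jnp.arange(1_000_000, dtype=jnp.float32)
print(normalise(x)[:3])
print(jax.devices())  # reports the backend the computation ran on
```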
4. Understand when to use each technique
- Try out, explore, and practise with these options for yourself.
- Read the documentation for more information on things you’re interested in.