Summary

In this workshop, we covered how to:

1. Profile Python code and identify bottlenecks

  • Measure the time of cells, functions, and programs to find bottlenecks, e.g., using timeit and line_profiler (see the profiling sketch at the end of this summary).

  • Visualise the profiled code, e.g., using SnakeViz and pyinstrument.

  • Log profiling information, e.g., using Eliot.

  • Consider how fast the code could go, e.g., using Big O notation.

2. Choose the most appropriate data structures, algorithms, and libraries for a problem

  • Make use of built-in functions, e.g., use len rather than counting the items in an object with a loop.

  • Use appropriate data structures, e.g., append to lists rather than concatenating them, use dictionaries for fast look-ups, and cache results in dictionaries to avoid repeated calculations (see the data-structure sketch at the end of this summary).

  • Make use of the standard library (optimised in C), e.g., the math module.

  • See whether an existing algorithm or library already solves your problem optimally, e.g., faster sorting algorithms.

3. Improve the execution time of Python code using:

  • Vectorisation

    • Take advantage of broadcasting for differently shaped arrays.

    • Use vectorised functions where you can, e.g., NumPy ufuncs (see the vectorisation and Numba sketch at the end of this summary).

  • Compilers

    • Speed up numerical functions with the Numba @njit (nopython mode) compiler (see the sketch at the end of this summary).

  • Parallelisation

    • Use Dask or Ray to parallelise your numerical work (see the Dask sketch at the end of this summary).

    • Test locally (on a single machine) first, checking that the code parallelises correctly, before moving to a distributed system such as a high-performance computer.

    • Use diagnostics to understand your parallel code (e.g., Dask’s dashboard, qacct).

  • GPUs

4. Decide when to use each technique

  • Try out, explore, and practise with these options for yourself.

  • Read the documentation for more information on things you’re interested in.
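
The sketches below illustrate some of these points in miniature; the function names and data in them are made up for illustration. First, profiling: a minimal sketch using the standard-library timeit and cProfile modules (line_profiler gives a similar breakdown per line, via its kernprof script or the %lprun IPython magic).

```python
# Minimal profiling sketch. `simulate` is a hypothetical, deliberately slow
# workload used only to have something to measure.
import cProfile
import pstats
import timeit


def simulate(n=10_000):
    """Toy workload: sum of square roots computed the slow way."""
    total = 0.0
    for i in range(n):
        total += i ** 0.5
    return total


# timeit: run the call many times and report the fastest repeat.
best = min(timeit.repeat("simulate()", globals=globals(), number=100, repeat=5))
print(f"fastest of 5 repeats (100 calls each): {best:.3f} s")

# cProfile: break the time down by function to locate the bottleneck.
profiler = cProfile.Profile()
profiler.runcall(simulate)
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```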
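
Next, the data-structure advice: built-in len, appending to lists, dictionary and set look-ups, and caching results so each input is only computed once. The names list and the expensive function are invented; functools.lru_cache does the same dictionary-style bookkeeping automatically.

```python
import functools

names = ["alice", "bob", "carol"] * 1_000

# Built-ins: len() is O(1); counting the items with a loop is O(n).
n_items = len(names)

# Growing a list: append inside a loop rather than concatenating with +,
# which copies the whole list on every iteration.
squares = []
for i in range(10):
    squares.append(i * i)

# Fast look-ups: a set or dict gives O(1) membership tests; a list gives O(n).
name_set = set(names)
print("dave" in name_set)   # fast
print("dave" in names)      # slow for long lists

# Caching: store results of an expensive call in a dictionary so repeated
# inputs are not recomputed; functools.lru_cache does the same bookkeeping.
_cache = {}

def expensive(x):
    if x not in _cache:
        _cache[x] = x ** 10 % 1_000_003   # stand-in for a costly calculation
    return _cache[x]

@functools.lru_cache(maxsize=None)
def expensive_cached(x):
    return x ** 10 % 1_000_003
```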
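
For vectorisation and compilation, the sketch below contrasts a NumPy vectorised calculation (broadcasting plus ufuncs) with the same calculation written as explicit loops and compiled with Numba's @njit. The point data and the distance calculation are illustrative only, and the first @njit call includes compilation time.

```python
import numpy as np
from numba import njit

rng = np.random.default_rng(0)
points = rng.random((10_000, 3))    # 10,000 made-up points in 3D
origin = np.array([0.5, 0.5, 0.5])

# Vectorisation: origin (shape (3,)) is broadcast against points
# (shape (10000, 3)); subtraction, squaring, sum, and sqrt are NumPy
# ufuncs/reductions that run in compiled code rather than a Python loop.
dists_vec = np.sqrt(((points - origin) ** 2).sum(axis=1))


# Numba: the equivalent loop-based code, compiled to machine code by
# @njit (nopython mode).
@njit
def distances(points, origin):
    n = points.shape[0]
    out = np.empty(n)
    for i in range(n):
        d = 0.0
        for j in range(points.shape[1]):
            diff = points[i, j] - origin[j]
            d += diff * diff
        out[i] = d ** 0.5
    return out


dists_numba = distances(points, origin)
assert np.allclose(dists_vec, dists_numba)
```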
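
Finally, a sketch of local parallelisation with dask.delayed, assuming a set of independent, CPU-bound tasks (the score function is made up). Running it on the local "processes" scheduler first is a way to check that the work parallelises correctly before moving to a distributed scheduler on a high-performance computer, where Dask's dashboard (available through dask.distributed.Client) provides diagnostics.

```python
from dask import delayed


def score(x):
    """Stand-in for an expensive, independent computation."""
    return sum(i * x for i in range(1_000_000))


# Build a task graph lazily: nothing runs until .compute() is called.
tasks = [delayed(score)(x) for x in range(8)]
total = delayed(sum)(tasks)

# Run on local processes first to check the code parallelises correctly.
result = total.compute(scheduler="processes")
print(result)

# For Dask's diagnostic dashboard, run on a local cluster instead:
#   from dask.distributed import Client
#   client = Client()          # also serves a dashboard link
#   result = total.compute()   # now runs on the local cluster
```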