Solutions
Profiling
Exercises
Exercise 1
What is profiling and why is it useful?
Solution 1
Profiling analyses your code in terms of speed or memory.
This helps identify where the bottlenecks are (why and where is it slow?) and how much potential there is for improvement.
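For example, here is a minimal sketch using Python's built-in cProfile module (slow_sum is a hypothetical function, made up here for illustration):

import cProfile

def slow_sum(n):
    total = 0.0
    for i in range(n):
        total += i ** 0.5  # arbitrary per-element work
    return total

cProfile.run("slow_sum(1_000_000)")  # prints call counts and times per function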
Exercise 2
What profiling tool times the execution of a cell in a Jupyter Notebook?
Solution 2
%%timeit
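For example, placed on the first line of a cell (a minimal sketch; the summed range is arbitrary), it times the whole cell body:

%%timeit
total = sum(range(1_000))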
Exercise 3
Below are two approaches for filling up an empty NumPy array.
Which approach is faster and why?
import numpy as np

def fill_array_approach_1(n):
    array = np.empty(1)
    for index in range(n):
        new_point = np.random.rand()
        array = np.append(array, new_point)
    return array

def fill_array_approach_2(n):
    array = np.empty(n)
    for index in range(len(array)):
        new_point = np.random.rand()
        array[index] = new_point
    return array
Solution 3
Approach 2 is about 10x faster.
For example, over 1,000 data points:
%timeit fill_array_approach_1(1_000)
2.7 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit fill_array_approach_2(1_000)
251 µs ± 4.75 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
This is because approach 2 allocates one array of the required size up front, then fills it in place.
In contrast, approach 1 calls np.append on every iteration, which copies the entire array into a new, larger one each time, making the loop O(n^2) overall.
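If the final size isn't known in advance, a common middle ground (a sketch, not part of the original solution) is to append to a Python list, which grows in amortised constant time, and convert to an array once at the end:

import numpy as np

def fill_array_with_list(n):
    points = []
    for _ in range(n):
        points.append(np.random.rand())  # amortised O(1) appends
    return np.asarray(points)  # single conversion to a NumPy array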
Exercise 4
Below are two methods that find two numbers from an array of unique integers that add up to a target sum.
If the target can’t be made, then an empty list is returned.
Each element in the array can only be used once.
Which method is faster and why?
def two_sum_brute_force(array, target):
    """
    A brute force approach to find two numbers from an array that add up to a target.

    Steps
    1. Loop through the array twice, adding up pairs of array elements.
    2. Compare each of these sums to the target.
    3. Return the pair that sums to the target, if one exists.
    """
    for index_one in range(len(array)):
        for index_two in range(index_one + 1, len(array)):
            if (
                array[index_one] + array[index_two] == target  # check sum of pair
                and index_one != index_two  # can't use the same element twice
            ):
                return [index_one, index_two]  # return the target pair
    return []  # return an empty list if the target pair isn't found
def two_sum_cache(array, target):
    """
    Use caching to find two numbers from an array that add up to a target.

    Steps
    1. Create a dictionary of cached differences relative to the target sum.
    2. Loop through the array once, adding each index and difference to the cache.
    3. If the required difference of a new array element is already in the cache,
       then you've found a matching pair, which you can return.
    """
    cache_differences = {}
    for index, element in enumerate(array):
        difference = target - element  # calculate the target difference for this element
        if difference in cache_differences:  # if we have the matching pair
            return [index, cache_differences[difference]]  # return the target pair
        cache_differences[element] = index  # if we don't have a match, add to the cache
    return []  # return an empty list if the target pair isn't found
import numpy as np
array = np.random.choice(1_000, 500, replace=False)
target = 250
Solution 4
The cached version is faster, because it only loops through the array once.
The brute force method uses a nested loop, so its time complexity is O(n^2), while its space complexity is O(1).
The cached method has time complexity O(n) and space complexity O(n), because the cache itself (i.e., the dictionary) can grow up to the size of the array.
For example, using the array and target above:
%timeit two_sum_brute_force(array, target)
272 µs ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit two_sum_cache(array, target)
31.8 µs ± 8.57 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
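As a quick sanity check (a minimal sketch using the array and target defined above), you can verify that a returned pair really does sum to the target:

pair = two_sum_cache(array, target)
if pair:  # an empty list means no pair was found
    assert array[pair[0]] + array[pair[1]] == target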
Data Structures, Algorithms, and Libraries
Questions
Question 1
Which of the following uses the least memory, and how can you check?
np.float16
np.float32
np.float64
Solution 1
Single precision floats have 32 bits (i.e., np.single or np.float32).
This is equal to 4 bytes, as 1 byte is 8 bits.
You can check this using:
np.float32(1.0).nbytes
4
Half precision floats have (you guessed it) half of this (i.e., np.half or np.float16):
np.float16(1.0).nbytes
2
And double precision floats (i.e., np.double or np.float64) have double:
np.float64(1.0).nbytes
8
So np.float16 uses the least memory.
In some situations, you may be able to significantly reduce your memory usage by using half or single precision arrays.
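For example, a minimal sketch (the array size is arbitrary) showing how casting an array to single precision halves its memory footprint:

import numpy as np

big = np.ones(1_000_000, dtype=np.float64)
small = big.astype(np.float32)  # cast to single precision
print(big.nbytes, small.nbytes)  # 8000000 4000000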
Exercises
Exercise 1
What data structure would be suitable for finding or removing duplicate values?
a. List
b. Dictionary
c. Queue
d. Set
Test out your answer on the following array:
array = np.random.choice(100, 80)
Are there any other ways of doing it?
Solution 1
A suitable data structure for finding or removing duplicate values is:
d. Set
This is because sets are unordered collections of distinct hashable objects.
You can call it via set(array).
Another approach would be to use NumPy features. For example, when creating a random array, you can specify whether values are sampled with or without replacement via:
array = np.random.choice(100, 80, replace=False)
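Yet another NumPy option (not in the original solution) is np.unique, which removes duplicates from an existing array:

import numpy as np

array = np.random.choice(100, 80)  # sampled with replacement, so duplicates are likely
deduplicated = np.unique(array)    # sorted array of the distinct values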
Exercise 2
In the exercise from the profiling lesson, we saw an example of two_sum, i.e., finding two numbers from an array of unique integers that add up to a target sum.
What would be a good approach for generalising this sum of two numbers to three, four, or n numbers?
Solution 2
Many algorithms are already implemented for you.
For example, the algorithms library has an n_sum method.
This generalises the function to any value of n.
For example, here is a “six_sum”:
from algorithms.arrays import n_sum
array = np.random.choice(100, 80, replace=False)
target = 23
n_sum(6, array, target)
[[0, 1, 3, 4, 6, 9],
[0, 1, 3, 4, 7, 8],
[0, 1, 3, 5, 6, 8],
[0, 1, 4, 5, 6, 7]]
Try it for yourself, or any one of the many other examples.
Vectorisation
Questions
Question 1
If something doesn’t vary for a given loop, should it be inside or outside of that loop?
Solution 1
Loop-invariants should be outside of loops.
This is because loops are slow in Python (CPython, the default interpreter).
Loops are slow because Python type-checks and dispatches functions on every iteration.
Try to avoid them where you can (e.g., by using NumPy ufuncs, aggregations, broadcasting, etc.).
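For example, a minimal sketch (the computation is arbitrary) of hoisting a loop-invariant expression out of a loop:

import math

n = 100_000

# Invariant recomputed on every iteration:
results = [math.sqrt(2.0) * i for i in range(n)]

# Invariant hoisted out and computed once:
scale = math.sqrt(2.0)
results = [scale * i for i in range(n)]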
Question 2
Can you run the unvectorised my_function directly on the same inputs (i.e., all of x)?
Solution 2
No, a TypeError is returned.
As the function has not yet been vectorised, it cannot operate across arrays with more than 1 element.
To check that it does work for 1 element, you could try:
my_function(x[0], mean, sigma)
Exercises
Exercise 1
What is broadcasting?
Solution 1
Broadcasting allows operations between arrays of different shapes, by implicitly stretching the smaller array to match the larger one (without copying data).
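For example, a minimal sketch adding a row vector to a column vector, which broadcasts to a 2D grid:

import numpy as np

row = np.arange(3)                    # shape (3,)
column = np.arange(4)[:, np.newaxis]  # shape (4, 1)
grid = row + column                   # broadcast to shape (4, 3)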
Exercise 2
What is vectorisation?
Solution 2
Vectorisation allows the code to operate on multiple array elements at once, rather than looping through them one at a time.
Exercise 3
How would you replace the compute_reciprocals function below with a vectorised version?
def compute_reciprocals(array):
    """
    Divides 1 by an array of values.
    """
    output = np.empty(len(array))
    for i in range(len(array)):
        output[i] = 1.0 / array[i]
    return output
big_array = np.random.randint(1, 100, size=1_000_000)
%timeit compute_reciprocals(big_array)
1.43 s ± 8.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Solution 3
You can either use the np.divide ufunc as follows:
%timeit np.divide(1.0, big_array)
1.29 ms ± 124 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Or you could just use /, which works on every element by overloading the operator (similar to broadcasting):
%timeit (1.0 / big_array)
1.19 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Exercise 4
Create your own vectorised ufunc that calculates the cube root of an array over all elements.
Hint
Here is an unvectorised version (i.e., it has a for loop):
def my_cube_root(array):
    output = np.empty(len(array))
    for i in range(len(array)):
        output[i] = array[i] ** (1 / 3)
    return output
%timeit my_cube_root(big_array)
1.77 s ± 152 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Solution 4
Option 1:
You could use the np.vectorize decorator:
import math
from numpy import vectorize

@vectorize
def my_cube_root(array):
    return math.pow(array, 1 / 3)
%timeit my_cube_root(big_array)
213 ms ± 33.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Option 2:
You could use the overloaded ** and / operators:
def my_cube_root(array):
    return array ** (1 / 3)
%timeit my_cube_root(big_array)
36 ms ± 864 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Option 3:
You could use the optimised ufuncs from NumPy or SciPy (recommended):
%timeit np.cbrt(big_array)
13.9 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
from scipy.special import cbrt
%timeit cbrt(big_array)
18.9 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Compilers
Questions
Question 1
For the function below (fast_add):
from numba import njit

@njit
def fast_add(x, y):
    return x + y
What will happen when it is called with:
fast_add(1, (2,))
Solution 1
A TypingError is returned.
This is because Numba is trying to compile a function that adds an integer to a tuple.
Take care with types.
Ensure that the function works first, before adding Numba to make it faster.
Exercises
Exercise 1
What is the default Python distribution?
Cython
PyPy
CPython
Solution 1
CPython
Exercise 2
Which Numba compilation mode has higher performance?
object
nopython
Solution 2
nopython
Exercise 3
How do I compile a function in Numba using nopython mode?
Solution 3
Wrap the function in the @njit (or @jit(nopython=True)) decorator.
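For example, a minimal sketch using the fast_add function from the question above:

from numba import njit

@njit
def fast_add(x, y):
    return x + y

fast_add(1.0, 2.0)  # the first call compiles; subsequent calls reuse the machine code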
Exercise 4
What is the keyword argument that enables Numba compiled functions to run over multiple CPUs?
Solution 4
parallel=True
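For example, a minimal sketch (the reduction is arbitrary) combining parallel=True with prange, Numba's parallel range:

from numba import njit, prange

@njit(parallel=True)
def parallel_sum(array):
    total = 0.0
    for i in prange(len(array)):  # iterations are distributed across CPU cores
        total += array[i]  # Numba recognises this pattern as a parallel reduction
    return total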
Exercise 5
Create your own Numba vectorised function that calculates the cube root of an array over all elements.
Hint
Have a look at the similar exercise from the Vectorisation lesson.
Solution 5
Using the Numba @vectorize decorator and the math library:
import math
from numba import vectorize  # I'm the only change from the NumPy ufunc exercise

@vectorize
def my_cube_root(array):
    return math.pow(array, 1 / 3)
%timeit my_cube_root(big_array)
27.7 ms ± 643 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba is a nice way to get performant numerical code.
Parallelisation
Questions
Question 1
What did our Dask Dashboard show?
Hint
Look at the rectangles in the Task Stream.
Solution 1
Our specific results show us that:
All workers were kept busy with numerical tasks (i.e., no white/red blocks), which utilises the CPU cores well.
Also, the CPU percentage for the workers increased to ~100 % during the computation.
Question 2
How can we check that the job used the CPU cores efficiently?
Hint
Use qacct -j JOBID on the job.
Then look at the time spent computing things (cpu) compared to the total time taken up (ru_wallclock * slots).
slots 8
ru_wallclock 1102s
cpu 6623.054s
What does this comparison tell us?
Solution 2
After the job has finished, we can find information on the job using qacct -j JOBID, where JOBID is the unique ID for your job.
is the unique ID for your job.
The result is a large table of information. The important variables here are:
slots 8
ru_wallclock 1102s
cpu 6623.054s
We can calculate the efficiency of the job by dividing how long the cores were computing things (cpu) by the total amount of time they used up (ru_wallclock * slots). This efficiency should be as close to 100 % as possible.
For example:
Efficiency = 100 * cpu / (ru_wallclock * slots)
Efficiency = 100 * 6623 / (1102 * 8)
Efficiency = 75 %
The efficiency here of 75 % is pretty good, though why is this not higher? The Task Stream was full of computation the whole time, right?
Well, the reason is that Dask’s scheduler and client are each pinned to their own process. This left only 6 of the 8 slots for workers. If we recalculate the efficiency over these 6 worker processes only, then the difference is Dask’s overhead:
Efficiency = 100 * 6623 / (1102 * 6)
Efficiency = 100 %
Hence, the worker efficiency was approximately 100 %, while the overall efficiency including Dask’s overheads was 75 %. So, although Dask’s scheduler and client were each assigned a slot, they didn’t use much of those resources (i.e., they were only managing the workers).
It would be better if Dask’s scheduler and client ran elsewhere (e.g., on the login nodes), so that the compute nodes handled only the efficient worker computation.
This alternative does exist: it’s Dask-Jobqueue, which we’re exploring next.
Question 3
How well did this job use the resources (use the output from qacct below, which is from one of the workers)?
$ qacct -j 3526684
==============================================================
qname feps-cpu.q
hostname d13s0b1.arc4.leeds.ac.uk
group EAR
owner earlacoa
project feps-cpu
department defaultdepartment
jobname dask-worker
jobnumber 3526684
taskid undefined
account sge
priority 0
qsub_time Fri Feb 25 16:25:14 2022
start_time Fri Feb 25 16:25:35 2022
end_time Fri Feb 25 16:38:21 2022
granted_pe smp
slots 1
failed 100 : assumedly after job
exit_status 137 (Killed)
ru_wallclock 766s
ru_utime 0.040s
ru_stime 0.045s
ru_maxrss 4.879KB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 15248
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 16
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 208
ru_nivcsw 38
cpu 728.840s
mem 6.774TBs
io 43.651MB
iow 0.000s
maxvmem 18.891GB
arid undefined
ar_sub_time undefined
category -U admiralty,feps-cpu,feps-gpu -l disk=48G,env=centos7,h_rt=3600,h_vmem=48G,node_type=40core-192G,project=feps-cpu -pe smp 1
Solution 3
Instead of one big job with all the workers (as for Dask-MPI), Dask-Jobqueue submits individual workers.
Similar to before (i.e., using qacct -j JOBID), we can check the efficiency of each of these individually.
The results for each worker are:
slots 1
ru_wallclock 766s
cpu 728.840s
Hence, the efficiency of each worker is:
Efficiency = 100 * cpu / (ru_wallclock * slots)
Efficiency = 100 * 729 / (766 * 1)
Efficiency = 95 %
This is a nice use of HPC resources.
Exercises
Exercise 1
Why does parallelisation speed up code?
Solution 1
Parallelisation divides a large problem into many smaller ones and solves them simultaneously.
This spreads the time and memory costs over multiple workers, reducing the overall wall-clock time.
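For example, a minimal sketch using Python's built-in multiprocessing module to split work over several processes (the function and process count are arbitrary):

from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # four worker processes
        results = pool.map(square, range(10))  # work split across the pool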
Exercise 2
What are there multiple of to split the work over?
Solution 2
Multiple processes
Multiple threads
Exercise 3
If you need to share memory, would you use MPI or OpenMP?
Solution 3
OpenMP is often suitable for shared memory problems.
Exercise 4
Which Dask library can be tailored to a variety of resource managers (e.g., SGE, SLURM)?
Solution 4
Dask-Jobqueue
Exercise 5
Which of the 3 examples below is most efficient, and why?
Note, the chunks keyword argument is the size of each chunk.
Example 1: Many, small chunks.
import dask.array as da

x = da.random.random(10_000_000, chunks=(1_000,))
y = x.sum().compute()
Example 2: Fewer, large chunks.
x = da.random.random(10_000_000, chunks=(100_000,))
y = x.sum().compute()
Example 3: Use NumPy.
x = np.random.random(10_000_000)
y = x.sum()
Hint
If you’re not in Colab, then you could profile them with %%timeit.
Otherwise, you could guess, thinking about overheads and chunks.
Note, if you’ve closed the Dask client, you may need to restart it:
from dask.distributed import Client
client = Client()
Solution 5
Dask introduces overhead to parallelise work (i.e., the scheduler and client).
Hence, Dask is inefficient (i.e., the workers are under-utilised) when doing lots of very small computations.
Consider using fewer, larger chunks to reduce this overhead.
Also, consider if parallelisation is really needed.
For this example:
Example 1: Many, small chunks.
%%timeit
x = da.random.random(10_000_000, chunks=(1_000,))
y = x.sum().compute()
12.4 s ± 2.04 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
Example 2: Fewer, large chunks.
%%timeit
x = da.random.random(10_000_000, chunks=(100_000,))
y = x.sum().compute()
129 ms ± 4.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So, for this example, fewer big chunks (each of size 100,000) improved things.
There is a nice feature that lets Dask automatically decide on the chunk size.
This improves things further:
%%timeit
x = da.random.random(10_000_000, chunks='auto')
y = x.sum().compute()
71.6 ms ± 868 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
However, compared to NumPy, Dask only makes things slower for this problem (due to its overheads).
Example 3: Use NumPy.
%%timeit
x = np.random.random(10_000_000)
y = x.sum()
60 ms ± 6.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
This demonstrates that NumPy is already performant for some tasks, and that parallelisation is sometimes not needed.
Exercise 6
How well did the job below use the HPC resources?
$ qacct -j 3524073
==============================================================
qname feps-cpu.q
hostname d9s9b4.arc4.leeds.ac.uk
group EAR
owner earlacoa
project feps-cpu
department defaultdepartment
jobname example_bodo_mpi_sge.bash
jobnumber 3524073
taskid undefined
account sge
priority 0
qsub_time Thu Feb 24 12:48:24 2022
start_time Thu Feb 24 12:48:34 2022
end_time Thu Feb 24 12:48:55 2022
granted_pe smp
slots 8
failed 0
exit_status 0
ru_wallclock 21s
ru_utime 139.030s
ru_stime 7.635s
ru_maxrss 1.687MB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 764941
ru_majflt 2
ru_nswap 0
ru_inblock 0
ru_oublock 80
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 38884
ru_nivcsw 665
cpu 146.665s
mem 112.727GBs
io 166.518MB
iow 0.000s
maxvmem 12.973GB
arid undefined
ar_sub_time undefined
category -U admiralty,feps-cpu,feps-gpu -l env=centos7,h_rt=600,h_vmem=24G,node_type=40core-192G,project=feps-cpu -pe smp 8
Solution 6
qacct -j JOBID tells us that:
slots 8
ru_wallclock 21s
cpu 146.665s
Hence, the efficiency is:
Efficiency = 100 * cpu / (ru_wallclock * slots)
Efficiency = 100 * 147 / (21 * 8)
Efficiency = 88 %
That’s good.
Exercise 7
How well did this job use the HPC resources?
If it wasn’t ideal, what went wrong and what might fix it?
$ qacct -j 3524046
==============================================================
qname feps-cpu.q
hostname d9s9b4.arc4.leeds.ac.uk
group EAR
owner earlacoa
project feps-cpu
department defaultdepartment
jobname example_bodo_mpi_sge.bash
jobnumber 3524046
taskid undefined
account sge
priority 0
qsub_time Thu Feb 24 12:34:54 2022
start_time Thu Feb 24 12:35:08 2022
end_time Thu Feb 24 12:37:14 2022
granted_pe smp
slots 8
failed 0
exit_status 0
ru_wallclock 126s
ru_utime 125.250s
ru_stime 6.542s
ru_maxrss 1.689MB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 758663
ru_majflt 2
ru_nswap 0
ru_inblock 0
ru_oublock 80
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 35039
ru_nivcsw 30366
cpu 131.792s
mem 102.207GBs
io 166.212MB
iow 0.000s
maxvmem 13.432GB
arid undefined
ar_sub_time undefined
category -U admiralty,feps-cpu,feps-gpu -l env=centos7,h_rt=600,h_vmem=24G,node_type=40core-192G,project=feps-cpu -pe smp 8
Solution 7
qacct -j JOBID tells us that:
slots 8
ru_wallclock 126s
cpu 131.792s
Hence, the efficiency is:
Efficiency = 100 * cpu / (ru_wallclock * slots)
Efficiency = 100 * 131 / (126 * 8)
Efficiency = 13 %
That’s very low.
The likely cause of this low efficiency is that all of the work is being pinned to one process, instead of being distributed across all 8.
(There are other ways to check this, e.g., using top or ps on the compute nodes.)
One solution to try is to set an environment variable that stops processes being pinned (remember, we saw this earlier). To do this, add the following command to the submission bash script: unset GOMP_CPU_AFFINITY.
GPUs
Exercises
Exercise 1
In general, what kind of tasks are GPUs faster than CPUs for, and why?
Solution 1
Generally, GPUs are faster than CPUs for numerical operations, because they are optimised for data parallelism: the same computation is applied to many elements at once.
Exercise 2
Which Numba decorators can you use to offload a function to GPUs?
Solution 2
@vectorize
@cuda.jit
Exercise 3
How would you vectorise the following function for GPUs?
import math

def my_serial_function_for_gpu(x):
    return math.cos(x) ** 2 + math.sin(x) ** 2
Solution 3
You could add the @vectorize decorator with a signature to target CUDA.
In the example below, the input and output types are specified as float32:
import math
from numba import vectorize

@vectorize(['float32(float32)'], target='cuda')
def my_serial_function_for_gpu(x):
    return math.cos(x) ** 2 + math.sin(x) ** 2
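To run it, you can pass a NumPy array directly (a minimal usage sketch, assuming a CUDA-capable GPU is available):

import numpy as np

x = np.linspace(0, 1, 1_000_000, dtype=np.float32)
result = my_serial_function_for_gpu(x)  # executed element-wise on the GPU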
Exercise 4
What are ways you can check if your Python environment has access to a GPU?
Solution 4
You could use Numba via:
from numba import cuda
print(cuda.gpus)
Or in an IPython cell, you could use:
!nvidia-smi
There are other ways too. This is an important step to check.
Exercise 5
If you wanted to do NumPy-style work on GPUs, could you use:
CuPy
JAX
Solution 5
You could use either CuPy or JAX for NumPy-style work on GPUs.
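For example, a minimal CuPy sketch (assuming a CUDA-capable GPU and the cupy package are available), mirroring the NumPy API:

import cupy as cp

x = cp.arange(1_000_000)  # array allocated on the GPU
y = (x ** 2).sum()        # computed on the GPU
y_host = cp.asnumpy(y)    # copy the result back to host memory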