Solutions#

Profiling#

Exercises#

Exercise 1

What is profiling and why is it useful?

Exercise 2

What profiling tool times the execution of a cell in a Jupyter Notebook?

Exercise 3

Below are two approaches for filling up an empty NumPy array.

Which approach is faster and why?

def fill_array_approach_1(n):
    array = np.empty(1)

    for index in range(n):
        new_point = np.random.rand()
        array = np.append(array, new_point)

    return array
def fill_array_approach_2(n):
    array = np.empty(n)

    for index in range(len(array)):
        new_point = np.random.rand()
        array[index] = new_point

    return array

Exercise 4

Below are two methods that find two numbers from an array of unique integers that add up to a target sum.

If the target can’t be made, then an empty list is returned.

Each element in the array can only be used once.

Which method is faster and why?

def two_sum_brute_force(array, target):
    """
    A brute force approach to find two numbers from an array that add up to a target.

    Steps
    1. Loop through the array twice, adding up pairs of array elements.
    2. Compare each of these sums to the target.
    3. Return the pair that sums to the target, if one exists.
    """
    for index_one in range(len(array)):
        for index_two in range(index_one + 1, len(array)):
            if (
                array[index_one] + array[index_two] == target  # check sum of pair
                and index_one != index_two  # can't use the same element twice
            ):
                return [index_one, index_two]  # return the target pair

    return []  # return an empty list if the target pair isn't found
def two_sum_cache(array, target):
    """
    Use caching to find two numbers from an array that add up to a target.

    Steps
    1. Create a dictionary of cached differences relative to the target sum.
    2. Loop through the array once, adding each index and difference to the cache.
    3. If the required difference of a new array element is already in the cache,
       then you've found a matching pair, which you can return.
    """
    cache_differences = {}
    for index, element in enumerate(array):
        difference = (
            target - element
        )  # calculate the target difference for this element
        if difference in cache_differences:  # if we have the matching pair
            return [index, cache_differences[difference]]  # return the target pair
        cache_differences[element] = index  # if we don't have a match, add to the cache

    return []  # return an empty list if the target pair isn't found
import numpy as np

array = np.random.choice(1_000, 500, replace=False)
target = 250

Data Structures, Algorithms, and Libraries#

Questions#

Question 1

Which of the following uses less memory and how can you check?

  • np.float16

  • np.float32

  • np.float64

Exercises#

Exercise 1

What data structure would be suitable for finding or removing duplicate values?

a. List
b. Dictionary
c. Queue
d. Set

Test out your answer on the following array:

array = np.random.choice(100, 80)

Are there any other ways of doing it?

Exercise 2

In the exercise from the profiling lesson, we saw an example of two_sum i.e., finding two numbers from an array of unique integers that add up to a target sum.

What would be a good approach for generalising this sum of two numbers to three, four, or n numbers?

Vectorisation#

Questions#

Question 1

If something doesn’t vary for a given loop, should it be inside or outside of that loop?

Question 2

Is it possible to run the unvectorised my_function directly on the same inputs (i.e., all of x)?

Exercises#

Exercise 1

What is broadcasting?

Exercise 2

What is vectorisation?

Exercise 3

How would you replace the compute_reciprocals function below with a vectorised version?

def compute_reciprocals(array):
    """
    Divides 1 by an array of values.
    """
    output = np.empty(len(array))
    for i in range(len(array)):
        output[i] = 1.0 / array[i]

    return output
big_array = np.random.randint(1, 100, size=1_000_000)
%timeit compute_reciprocals(big_array)
1.33 s ± 7.26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Exercise 4

Create your own vectorised ufunc that calculates the cube root of an array over all elements.

Compilers#

Questions#

Question 1

For the function below (fast_add):

@njit
def fast_add(x, y):
    return x + y

What will happen when it is called with:
fast_add(1, (2,))

Exercises#

Exercise 1

What is the default Python distribution?

  • Cython

  • PyPy

  • CPython

Exercise 2

Which Numba compilation mode has higher performance?

  • object

  • nopython

Exercise 3

How do I compile a function in Numba using no-python mode?

Exercise 4

What is the keyword argument that enables Numba compiled functions to run over multiple CPUs?

Exercise 5

Create your own Numba vectorised function that calculates the cube root of an array over all elements.