Arrays and numpy
Answers#
Part 1#
import numpy as np
Creating and Inspecting arrays#
Convert the following list into a numpy array
chlor_a_list = [0.3, 1.2, 0.8, 0.8, 1.1, 0.2, 0.4]
chlor_a = np.array(chlor_a_list)
Get the following values from the chlor_a array you made in the last problem:
The first value
The last value
chlor_a[0]
0.3
chlor_a[6]
# or
chlor_a[-1]
0.4
What is the data type of the chlor_a array you made?
# float (specifically a 64-bit float)
chlor_a.dtype
dtype('float64')
Use code to figure out how many items are in your array
chlor_a.shape
(7,)
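Reading the count out of the shape tuple works here because the array is one-dimensional. As a small sketch of two alternatives: `.size` returns the total number of elements directly, and `len()` returns the length of the first axis. For a 1D array all three agree.

```python
import numpy as np

chlor_a = np.array([0.3, 1.2, 0.8, 0.8, 1.1, 0.2, 0.4])

# .size counts every element, regardless of the number of dimensions
n_items = chlor_a.size

# len() returns the length of axis 0, which for a 1D array is the same number
n_first_axis = len(chlor_a)

print(n_items, n_first_axis)  # 7 7
```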
Multiple dimensions#
What is the shape of the following array? Use the shape to determine how many elements are in the array
population_sparrows = np.array([[43, 24, 53, 24], [21, 32, 42, 32], [76, 23, 14, 12]])
population_sparrows.shape
(3, 4)
# Total elements is the size of each axis multiplied together
3*4
12
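The multiplication can also be done from the shape itself instead of typing the numbers by hand. A minimal sketch using `np.prod` on the shape tuple, compared against the `.size` property:

```python
import numpy as np

population_sparrows = np.array([[43, 24, 53, 24], [21, 32, 42, 32], [76, 23, 14, 12]])

# Multiply the axis lengths together (3 * 4)
n_from_shape = int(np.prod(population_sparrows.shape))

# .size reports the same total directly
n_from_size = population_sparrows.size

print(n_from_shape, n_from_size)  # 12 12
```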
Use the len() function and the .shape property to calculate the number of dimensions of the population_sparrows array
len(population_sparrows.shape)
2
Return the same result as problem 6 in a different way, using the .ndim property
population_sparrows.ndim
2
Get the value for the item in the last row and the last column of the population_sparrows array
population_sparrows[-1, -1]
12
Get a 4-number array that is a subset of numbers from the population_sparrows array using the slice operator:
population_sparrows[0:2, 1:3]
array([[24, 53],
[32, 42]])
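The slice above is only one of many valid answers. As a sketch of how the indices map to the result: the start index of each slice is included and the stop index is excluded, so `0:2` picks rows 0 and 1, and `1:3` picks columns 1 and 2. A whole row is another 4-number subset.

```python
import numpy as np

population_sparrows = np.array([[43, 24, 53, 24], [21, 32, 42, 32], [76, 23, 14, 12]])

# Rows 0-1 and columns 1-2 (the stop index is not included)
subset = population_sparrows[0:2, 1:3]

# The whole first row is also a 4-number subset
first_row = population_sparrows[0, :]

print(subset.shape)  # (2, 2)
print(first_row)     # [43 24 53 24]
```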
Math and aggregations#
For the next few problems, consider that the population_sparrows array represents the populations of sparrows at 12 different research locations.
Let’s say our sparrow population grew, and the population of every location doubled. Multiply all the values in the array by 2. Make sure the array is updated with the new values.
population_sparrows = population_sparrows*2
Later in the season a group of biologists adds a few sparrows to each population. The number of sparrows they added to each location is represented by the following array:
indiviuals_added = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])
Calculate the updated population values with the new additions. Update the variable value.
population_sparrows = population_sparrows + indiviuals_added
Calculate the sum of the sparrows at all the locations
population_sparrows.sum()
832
Calculate the sum of sparrows over axis 1
population_sparrows.sum(axis=1)
array([305, 262, 265])
Consider the following array. If you ran an aggregation (e.g. .max()) and specified an axis, over what axis would you get 3 numbers as a result?
example_array = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])
# axis 1 (aggregating over axis 1 collapses the 4 columns, leaving one value for each of the 3 rows)
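To check this answer with code, you can run the aggregation over each axis and compare the results. A short sketch using `.max()` on the same array:

```python
import numpy as np

example_array = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])

# axis=0 collapses the 3 rows, leaving one value per column (4 numbers)
max_axis0 = example_array.max(axis=0)

# axis=1 collapses the 4 columns, leaving one value per row (3 numbers)
max_axis1 = example_array.max(axis=1)

print(max_axis0)  # [6 4 5 4]
print(max_axis1)  # [5 3 6]
```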
Part 2#
Note that the exact answers to the practice problems may not be the same on my sheet as on yours, because the random function used to generate the values won’t create the same array on your computer as on mine. Check that your code does the same thing, or run my code in your notebook to check the answer.
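If you want the same random values every time you rerun your own notebook, one option is to seed a random generator. This is a minimal sketch, assuming the newer `np.random.default_rng` interface. The seed 42 is arbitrary, and seeding will still not reproduce the values printed in these answers, which were generated without a known seed.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Generator.integers is the Generator counterpart of np.random.randint
first = rng_a.integers(0, high=100, size=(5, 30, 40))
second = rng_b.integers(0, high=100, size=(5, 30, 40))

# Same seed, same sequence of values
print(np.array_equal(first, second))  # True
```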
Question 1#
reflectances = np.random.randint(0, high=100, size=(5, 30, 40))
A) Get a subset of at least 15 values from the middle of the reflectances array.
reflectances[2:5, 10:20, 20]
array([[19, 50, 46, 70, 17, 37, 86, 60, 86, 26],
[43, 59, 60, 68, 90, 33, 81, 37, 19, 3],
[96, 30, 84, 68, 72, 1, 62, 40, 45, 11]])
B) Write a chunk of code to get the value in the center of the array. Make sure the code works for arrays of different sizes, so calculate the center index values using the properties of the array.
# Find the center index for each axis
axis0_indx = reflectances.shape[0] // 2 # floor division to ensure a whole number
axis1_indx = reflectances.shape[1] // 2
axis2_indx = reflectances.shape[2] // 2
# Get the value
print(reflectances[axis0_indx, axis1_indx, axis2_indx])
37
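The three index lines above assume a 3D array. As a sketch of one way to generalize, the center index can be built as a tuple from the shape, which works for any number of dimensions. The helper name `center_value` is made up for this illustration.

```python
import numpy as np

def center_value(arr):
    """Return the element at the middle index of every axis (hypothetical helper)."""
    # One floor-divided index per axis, packed into a tuple
    center_index = tuple(length // 2 for length in arr.shape)
    return arr[center_index]

# 3D check: the center index of a (3, 3, 3) array is (1, 1, 1)
small = np.arange(27).reshape(3, 3, 3)
print(center_value(small))  # 13

# 2D check: the center index of a (3, 4) array is (1, 2)
flat = np.arange(12).reshape(3, 4)
print(center_value(flat))  # 6
```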
Question 2#
A) Add 10 to every value in the reflectances array which has an index of 2 in the axis=0 position.
reflectances[2] + 10
array([[ 91, 81, 13, ..., 20, 45, 44],
[ 69, 48, 39, ..., 13, 102, 33],
[ 11, 19, 72, ..., 87, 75, 57],
...,
[ 79, 30, 14, ..., 100, 71, 19],
[ 49, 51, 100, ..., 47, 12, 36],
[ 55, 73, 32, ..., 54, 79, 20]])
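Note that `reflectances[2] + 10` returns a new array and leaves `reflectances` itself unchanged. If you wanted the original array updated, one option is the in-place `+=` operator. A small sketch on a stand-in array (`demo` is not part of the problem set) so the effect is easy to see:

```python
import numpy as np

# Small stand-in for reflectances, filled with zeros
demo = np.zeros((3, 2, 2), dtype=int)

# This creates a new array; demo is not modified
shifted = demo[2] + 10
unchanged_total = demo.sum()

# This writes the result back into demo
demo[2] += 10
updated_total = demo.sum()

print(unchanged_total, updated_total)  # 0 40
```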
Question 3#
A) What will be the shapes of the following arrays? First take a guess, then run the code.
array1 = np.array([1, 2, 3])
array2 = np.array([[1], [2], [3]])
# array1 -> (3,)
# array2 -> (3, 1)
B) Starting with an array of all zeros, how will the output look different when adding starting_array + array1 vs. starting_array + array2? Make your guess first, then run the code to compare to your expectation.
starting_array = np.zeros((3,3))
array1 = np.array([1, 2, 3])
array2 = np.array([[1], [2], [3]])
# Array is broadcast along axis 0
starting_array + array1
array([[1., 2., 3.],
[1., 2., 3.],
[1., 2., 3.]])
# Array is broadcast along axis 1
starting_array + array2
array([[1., 1., 1.],
[2., 2., 2.],
[3., 3., 3.]])
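If the two results are hard to tell apart, it can help to look at the stretched arrays on their own. A short sketch using `np.broadcast_to`, which shows what each array looks like once it has been expanded to shape (3, 3), without doing any addition:

```python
import numpy as np

array1 = np.array([1, 2, 3])        # shape (3,)
array2 = np.array([[1], [2], [3]])  # shape (3, 1)

# array1 is repeated down the rows; array2 is repeated across the columns
stretched1 = np.broadcast_to(array1, (3, 3))
stretched2 = np.broadcast_to(array2, (3, 3))

print(stretched1)
print(stretched2)

# Adding the two directly broadcasts both: (3,) with (3, 1) gives (3, 3)
combined = array1 + array2
print(combined.shape)  # (3, 3)
```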
Question 4#
A) Going back to our reflectances array, find the mean and standard deviation of all the values in the array.
reflectances = np.random.randint(0, high=100, size=(5, 30, 40))
reflectances.mean()
49.865833333333335
reflectances.std()
28.979409344778272
B) What is the maximum value of the 2D array at index 1 of axis 0?
example = np.array([[23, 43, 10], [3, 10, 8], [13, 16, 0]])
example[1].max()
10
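The answer above uses a small example array. Assuming the question is meant to apply to the reflectances array, the same indexing pattern gives a 2D array of shape (30, 40) to take the max of. This is a sketch only; the printed values depend on your random array.

```python
import numpy as np

reflectances = np.random.randint(0, high=100, size=(5, 30, 40))

# Index 1 along axis 0 selects one 2D array of shape (30, 40)
layer = reflectances[1]
layer_max = layer.max()

# The same number, found by reducing over axes 1 and 2 for every layer at once
all_layer_maxes = reflectances.max(axis=(1, 2))

print(layer.shape, layer_max, all_layer_maxes[1])
```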
Question 5#
A) The .astype() method returns a copy of the array with its values converted to a new data type. It takes one argument: the new data type (which you type in without quotations).
Use the .astype() method on the reflectances array to change the data type to float.
# Built-in type names such as int and float don't need quotation marks; they are names Python already knows
int
int
reflectances.astype(float)
array([[[ 1., 87., 39., ..., 72., 40., 32.],
[76., 79., 83., ..., 27., 7., 67.],
[92., 78., 15., ..., 1., 63., 61.],
...,
[92., 66., 48., ..., 47., 81., 92.],
[26., 49., 81., ..., 2., 91., 85.],
[61., 68., 0., ..., 35., 70., 14.]],
[[29., 81., 91., ..., 47., 91., 93.],
[ 8., 44., 57., ..., 82., 95., 62.],
[93., 56., 56., ..., 14., 51., 92.],
...,
[19., 23., 72., ..., 90., 88., 73.],
[67., 17., 78., ..., 0., 15., 93.],
[15., 2., 4., ..., 62., 1., 82.]],
[[71., 2., 88., ..., 48., 42., 44.],
[24., 43., 99., ..., 38., 19., 17.],
[33., 3., 0., ..., 30., 95., 67.],
...,
[68., 18., 58., ..., 11., 68., 5.],
[82., 16., 91., ..., 21., 92., 76.],
[55., 75., 13., ..., 79., 87., 72.]],
[[55., 60., 53., ..., 91., 33., 84.],
[75., 44., 17., ..., 82., 53., 66.],
[71., 74., 20., ..., 48., 54., 19.],
...,
[15., 99., 37., ..., 93., 1., 16.],
[37., 91., 23., ..., 73., 3., 18.],
[55., 39., 10., ..., 18., 81., 53.]],
[[67., 60., 67., ..., 83., 10., 2.],
[50., 1., 77., ..., 20., 62., 65.],
[58., 87., 1., ..., 62., 78., 77.],
...,
[44., 1., 11., ..., 90., 75., 51.],
[65., 81., 19., ..., 7., 9., 80.],
[79., 27., 69., ..., 22., 52., 60.]]])
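One thing to watch for: `.astype()` does not modify the array it is called on. A brief sketch showing that the original keeps its integer dtype until you reassign the variable:

```python
import numpy as np

reflectances = np.random.randint(0, high=100, size=(5, 30, 40))

# The converted copy is a separate array
as_float = reflectances.astype(float)

print(reflectances.dtype)  # still an integer dtype (int64 or int32, depending on platform)
print(as_float.dtype)      # float64

# Reassign to keep the converted version under the original name
original_was_integer = np.issubdtype(reflectances.dtype, np.integer)
reflectances = reflectances.astype(float)
```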
B) We have seen the use of axis as a kwarg in the .max() function. If you need to, you can use multiple kwargs, which you separate with commas.
The keepdims kwarg keeps the axis that was aggregated over in the output as an axis of length 1, so the result has the same number of dimensions as the input. So if you took the sum of a 2D array over axis 1, you would receive the output as a vertically stacked column rather than a flat 1D array. keepdims takes a boolean input: True if you would like to keep the dimensions and False if not.
Take the max of the reflectances array, using the axis kwarg with value 1 and the keepdims kwarg set to True. Try it again with the kwarg set to False. Take the shape of both outputs and notice how they change.
reflectances.max(axis=1, keepdims=True)
array([[[97, 94, 98, 99, 99, 96, 91, 98, 99, 93, 99, 91, 98, 97, 99, 99,
96, 97, 99, 99, 99, 98, 93, 97, 99, 97, 98, 96, 98, 97, 98, 96,
95, 98, 89, 98, 99, 97, 91, 96]],
[[96, 99, 91, 99, 96, 96, 97, 98, 96, 96, 97, 98, 96, 99, 98, 97,
97, 95, 91, 99, 98, 97, 98, 98, 94, 99, 99, 99, 94, 93, 99, 99,
94, 98, 99, 97, 96, 90, 96, 93]],
[[99, 96, 99, 99, 93, 96, 98, 98, 99, 89, 94, 99, 94, 99, 94, 99,
90, 92, 99, 93, 93, 98, 99, 97, 99, 97, 99, 96, 92, 91, 96, 94,
99, 97, 85, 93, 97, 99, 97, 97]],
[[93, 99, 96, 98, 86, 99, 99, 99, 96, 99, 99, 99, 99, 98, 94, 98,
99, 99, 88, 97, 97, 95, 96, 99, 92, 99, 97, 92, 87, 99, 98, 95,
97, 97, 96, 97, 99, 99, 98, 89]],
[[99, 93, 97, 99, 97, 99, 99, 99, 99, 93, 98, 99, 99, 99, 98, 97,
97, 99, 93, 98, 95, 98, 94, 99, 96, 91, 99, 99, 98, 95, 98, 99,
94, 98, 99, 99, 99, 99, 97, 97]]])
reflectances.max(axis=1, keepdims=True).shape
(5, 1, 40)
reflectances.max(axis=1, keepdims=False).shape
(5, 40)
Note: The “keepdims” concept is a bit conceptually abstract and I couldn’t find a good visual illustration for it right now. If you leave this problem feeling a little iffy on “keepdims” that is totally fine and normal. The more important concept to be comfortable with is using multiple kwargs in a function/method.
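In place of a picture, a tiny 2D array may help. This is a sketch, not part of the original problem: it sums a (2, 3) array over axis 1 with and without keepdims, then shows one reason the kept dimension can be useful (dividing each row by its own total).

```python
import numpy as np

small = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

dropped = small.sum(axis=1)               # shape (2,): a flat array
kept = small.sum(axis=1, keepdims=True)   # shape (2, 1): a column

print(dropped)  # [ 6 15]
print(kept)
# [[ 6]
#  [15]]

# Because kept is still 2D, it lines up with the rows of small when dividing
row_fractions = small / kept
print(row_fractions.sum(axis=1))  # each row sums to 1 (up to rounding)
```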
Question 6#
example = np.array([[6, 10, 5, 9], [6, 9, 9, 11], [12, 14, 6, 3]])
A) Get a list of unique items in the example array.
np.unique(example)
array([ 3, 5, 6, 9, 10, 11, 12, 14])
Google help: “numpy unique values in an array”, or this stackoverflow.
B) Pick one of the values in the array and determine how many times that value occurs in the array.
np.unique(example, return_counts=True)
# Then read the output to see that (for example) the value 6 occurred 3 times
(array([ 3, 5, 6, 9, 10, 11, 12, 14]), array([1, 1, 3, 3, 1, 1, 1, 1]))
# OR
np.count_nonzero(example == 6)
3
Google help: “numpy number of occurrences of a value” or this stackoverflow.
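Instead of reading the count off the printed output, the two arrays returned by `np.unique` can be paired up for direct lookup. A minimal sketch using a plain dictionary (the name `count_by_value` is just for illustration):

```python
import numpy as np

example = np.array([[6, 10, 5, 9], [6, 9, 9, 11], [12, 14, 6, 3]])

# return_counts=True gives the unique values and how often each one occurs
values, counts = np.unique(example, return_counts=True)

# Pair each unique value with its count
count_by_value = {int(v): int(c) for v, c in zip(values, counts)}

print(count_by_value[6])  # 3
print(count_by_value[9])  # 3
```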
Question 7#
NaNs are an important data point when working with real data - rarely do you have a totally complete dataset.
You can make individual NaNs with np.nan (the older alias np.NaN was removed in NumPy 2.0):
np.nan
nan
Look at the docs for the function np.full() and create a new array of shape (4, 5, 6) filled with nan values.
np.full((4, 5, 6), np.nan)
array([[[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan]],
[[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan]],
[[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan]],
[[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan]]])
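Once an array contains NaNs, you usually need a way to find them. A short sketch of two things worth knowing: NaN is not equal to itself, so `==` cannot detect it, and `np.isnan` tests each element instead. The `readings` array and the use of `np.nanmean` are additions for illustration, not part of the problem.

```python
import numpy as np

nan_array = np.full((4, 5, 6), np.nan)

# NaN is not equal to itself, so == cannot be used to find it
print(np.nan == np.nan)  # False

# np.isnan checks every element and returns a boolean array
all_missing = np.isnan(nan_array).all()
print(all_missing)  # True

# With one missing value, the regular mean is nan but the nan-aware mean skips it
readings = np.array([0.3, np.nan, 0.9])
plain_mean = readings.mean()
skipping_mean = np.nanmean(readings)
print(plain_mean, skipping_mean)
```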