# Arrays and `numpy` Answers

## Part 1

In [1]:
import numpy as np

### Creating and Inspecting arrays

1. Convert the following list into a numpy array

In [2]:
chlor_a_list = [0.3, 1.2, 0.8, 0.8, 1.1, 0.2, 0.4]

In [3]:
chlor_a = np.array(chlor_a_list)

2. Get the following values from the `chlor_a` array you made in the last problem:
* The first value
* The last value

In [4]:
chlor_a[0]

0.3

In [5]:
chlor_a[6]
# or
chlor_a[-1]

0.4

3. What is the data type of the `chlor_a` array you made?

In [6]:
# float (specifically a 64-bit float)
chlor_a.dtype

dtype('float64')

4. Use code to figure out how many items are in your array

In [7]:
chlor_a.shape

(7,)
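`.shape` returns a tuple, so the item count still has to be read out of it. As an alternative sketch (not the only way to answer this), `.size` and `len()` return the count directly as an integer:

```python
import numpy as np

chlor_a = np.array([0.3, 1.2, 0.8, 0.8, 1.1, 0.2, 0.4])

# .size is the total number of elements in the array
n_items = chlor_a.size

# len() is the length of the first axis, which is the item count for a 1D array
n_first_axis = len(chlor_a)

print(n_items, n_first_axis)  # 7 7
```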

### Multiple dimensions

5. What is the shape of the following array?  Use the shape to determine how many elements are in the array

In [8]:
population_sparrows = np.array([[43, 24, 53, 24], [21, 32, 42, 32], [76, 23, 14, 12]])

In [9]:
population_sparrows.shape

(3, 4)

In [10]:
# Total elements is the size of each axis multiplied together
3*4

12
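The multiplication above can also be done by code rather than by hand. A small sketch, assuming you want something that works for any number of axes: `np.prod()` multiplies the entries of the shape tuple, and `.size` gives the same total directly.

```python
import numpy as np

population_sparrows = np.array([[43, 24, 53, 24], [21, 32, 42, 32], [76, 23, 14, 12]])

# Multiply the length of every axis together
total_from_shape = np.prod(population_sparrows.shape)

# .size reports the same total without any arithmetic
total_from_size = population_sparrows.size

print(total_from_shape, total_from_size)  # 12 12
```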

6. Use the `len()` function and the `.shape` property to calculate the number of dimensions of the `population_sparrows` array

In [11]:
len(population_sparrows.shape)

2

7. Return the same result as problem 6 in a different way, using the `.ndim` property

In [12]:
population_sparrows.ndim

2

8. Get the value for the item in the last row and the last column of the `population_sparrows` array

In [13]:
population_sparrows[-1, -1]

12

9. Get a 4-number array that is a subset of numbers from the `population_sparrows` array using the slice operator `:`

In [14]:
population_sparrows[0:2, 1:3]

array([[24, 53],
       [32, 42]])

### Math and aggregations

For the next few problems consider that the `population_sparrows` array represents the populations of sparrows at 12 different research locations.

10. Let's say our sparrow population grew, and the population of every location doubled.  Multiply all the values in the array by 2.  Make sure the array is updated with the new values.

In [15]:
population_sparrows = population_sparrows*2

10. Later in the season a group of biologists adds a few sparrows to each population.  The number of sparrows they added to each location is represented by the following array:

In [16]:
indiviuals_added = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])

Calculate the updated population values with the new additions.  Update the variable value.

In [17]:
population_sparrows = population_sparrows + indiviuals_added

11. Calculate the sum of the sparrows at all the locations

In [18]:
population_sparrows.sum()

832

12. Calculate the sum of sparrows over axis 1

In [19]:
population_sparrows.sum(axis=1)

array([305, 262, 265])

13. Consider the following array. If you ran an aggregation (e.g. `.max()`) and specified an axis, over what axis would you get 3 numbers as a result?

In [20]:
example_array = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])

In [21]:
# axis 1
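To check the answer with code, here is a short sketch that runs `.max()` over each axis. Aggregating over an axis collapses that axis, so the output keeps the length of the other one.

```python
import numpy as np

example_array = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])

# axis=0 collapses the 3 rows, leaving one value per column (4 numbers)
max_axis0 = example_array.max(axis=0)

# axis=1 collapses the 4 columns, leaving one value per row (3 numbers)
max_axis1 = example_array.max(axis=1)

print(max_axis0)  # [6 4 5 4]
print(max_axis1)  # [5 3 6]
```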

## Part 2

**Note that the exact answers to the practice problems may not be the same on my sheet as on yours, because the random function used to generate the values won't create the same array on your computer as on mine. Check that the code does the same thing, or run my code in your notebook to check the answer.**
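If you ever want random values that are repeatable, one option is to seed the random number generator. This is a sketch using NumPy's `default_rng` generator rather than the `np.random.randint` call used in this notebook, and the seed value 42 is an arbitrary choice. It will not reproduce the numbers printed on this sheet; it only shows that the same seed gives the same array.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Same range and shape as the reflectances array used below
reflectances_a = rng_a.integers(0, high=100, size=(5, 30, 40))
reflectances_b = rng_b.integers(0, high=100, size=(5, 30, 40))

# Identical seeds produce identical arrays
print(np.array_equal(reflectances_a, reflectances_b))  # True
```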

### Question 1

In [22]:
reflectances = np.random.randint(0, high=100, size=(5, 30, 40))

**A)** Get a subset of at least 15 values from the middle of the `reflectances` array.

In [23]:
reflectances[2:5, 10:20, 20]

array([[67, 83, 54, 68, 13, 69, 69, 84, 40, 94],
       [92, 37, 77, 37, 54,  2, 22, 93, 13, 67],
       [99,  0, 37, 63, 83, 26, 94, 21, 48, 68]])

**B)** Write a chunk of code to get the value in the center of the array.  Make sure the code works for arrays of different sizes, so calculate the center index values using the properties of the array.

In [24]:
# Find the center index for each axis
axis0_indx = reflectances.shape[0] // 2 # floor division to ensure a whole number
axis1_indx = reflectances.shape[1] // 2 
axis2_indx = reflectances.shape[2] // 2
# Get the value
print(reflectances[axis0_indx, axis1_indx, axis2_indx])

69
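The same idea can be written so that it also works for arrays with a different number of axes. This is one possible sketch; `center_value` and `small` are names made up for the illustration.

```python
import numpy as np

def center_value(arr):
    """Return the element at the center index of an array with any number of axes."""
    # Floor-divide the length of each axis by 2 to get a whole-number index
    center_index = tuple(size // 2 for size in arr.shape)
    # Indexing with a tuple is the same as arr[i, j, k, ...]
    return arr[center_index]

small = np.arange(27).reshape(3, 3, 3)
print(center_value(small))  # 13
```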


### Question 2

**A)** Add 10 to every value in the `reflectances` array which has an index of 2 in the axis=0 position.

In [25]:
reflectances[2] + 10

array([[ 45,  61,  57, ..., 107,  97,  15],
       [105,  79,  34, ...,  18,  66,  13],
       [100,  72,  75, ...,  54, 100,  62],
       ...,
       [ 34,  92,  17, ...,  85,  44,  39],
       [ 93,  65,  84, ...,  16,  17,  66],
       [ 80,  23,  34, ..., 104,  75,  52]])
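Note that `reflectances[2] + 10` returns a new array and leaves `reflectances` itself unchanged. If you wanted to store the result back into the array, one way is the `+=` operator. The sketch below uses a small stand-in array (`demo`, a hypothetical name) so the effect is easy to see.

```python
import numpy as np

# Small stand-in for reflectances: 5 layers of 2x3 zeros
demo = np.zeros((5, 2, 3), dtype=int)

# Update only the layer at index 2 of axis 0, in place
demo[2] += 10

print(demo[2])        # every value in this layer is now 10
print(demo[1].sum())  # 0, the other layers are untouched
```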

### Question 3
**A)** What will be the shapes of the following arrays?  First take a guess, then run the code.

```
array1 = np.array([1, 2, 3])
array2 = np.array([[1], [2], [3]])
```

In [26]:
# array1 -> (3,)
# array2 -> (3, 1)

**B)** Starting with an array of all zeros, how will the output look different when adding together `starting_array` + `array1` vs. `starting_array` + `array2`?  Make your guess first, then run the code to compare to your expectation.

In [27]:
starting_array = np.zeros((3,3))

In [28]:
array1 = np.array([1, 2, 3])
array2 = np.array([[1], [2], [3]])

In [29]:
# Array is broadcast along axis 0
starting_array + array1

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [30]:
# Array is broadcast along axis 1
starting_array + array2

array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.]])
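As a further illustration of the same rule, here is a sketch that adds `array1` and `array2` to each other. Both arrays get stretched: `array1` is repeated down the rows and `array2` is repeated across the columns, so the result is a (3, 3) array.

```python
import numpy as np

array1 = np.array([1, 2, 3])        # shape (3,)
array2 = np.array([[1], [2], [3]])  # shape (3, 1)

# (3,) and (3, 1) broadcast together to (3, 3)
combined = array1 + array2

print(combined.shape)  # (3, 3)
print(combined)
# [[2 3 4]
#  [3 4 5]
#  [4 5 6]]
```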

### Question 4

**A)** Going back to our `reflectances` array, find the mean and standard deviation of all the values in the array.

In [31]:
reflectances = np.random.randint(0, high=100, size=(5, 30, 40))

In [32]:
reflectances.mean()

49.513333333333335

In [33]:
reflectances.std()

28.948019314319627

**B)** What is the maximum value of the 2D array at index 1 of axis 0?

In [34]:
example = np.array([[23, 43, 10], [3, 10, 8], [13, 16, 0]])

In [35]:
example[1].max()

10

### Question 5

**A)** The `.astype()` method returns a copy of the array with its values converted to a new data type.  It takes one argument: the new data type (which you type in without quotation marks).

Use the `.astype()` method on the reflectances array to change the data type to float.

In [36]:
# Data types don't need quotation marks; they are built-in names you can type directly
int

int

In [37]:
reflectances.astype(float)

array([[[76., 67., 58., ..., 21., 17., 95.],
        [22., 33., 55., ..., 43., 71., 38.],
        [ 3., 15., 63., ..., 33., 40., 54.],
        ...,
        [53., 67., 58., ..., 58., 87., 91.],
        [20., 10., 43., ..., 56., 75., 50.],
        [86., 31.,  8., ..., 96., 68., 92.]],

       [[66., 41., 28., ..., 69., 60., 70.],
        [36.,  8., 28., ..., 66., 40., 28.],
        [25., 90., 36., ..., 62., 68., 92.],
        ...,
        [74., 81., 72., ..., 45., 30., 51.],
        [89., 42., 66., ...,  1., 38., 21.],
        [76., 58., 57., ..., 68., 24.,  3.]],

       [[ 7., 33., 95., ..., 41., 32., 37.],
        [49., 18.,  3., ...,  2., 29., 23.],
        [75., 45., 19., ..., 15., 79., 49.],
        ...,
        [55., 81., 86., ...,  1.,  9., 95.],
        [40., 71., 69., ..., 63., 32.,  0.],
        [30., 25., 95., ..., 53., 39., 76.]],

       [[10., 84., 19., ..., 40., 57., 45.],
        [86., 56., 46., ..., 70., 67., 74.],
        [19., 73., 88., ..., 56., 92., 48.],
        ..
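One thing to watch for: `.astype()` gives back a new array and does not change the original, so you need to assign the result to a variable if you want to keep it. A short sketch with a small stand-in array (`counts`, a hypothetical name):

```python
import numpy as np

counts = np.array([1, 2, 3])

# astype returns a converted copy; counts itself is not modified
as_float = counts.astype(float)

print(counts.dtype.kind)  # i (still an integer type)
print(as_float.dtype)     # float64
print(as_float)           # [1. 2. 3.]
```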

**B)** We have seen the use of `axis` as a kwarg in the `.max()` method.  If you need to, you can use multiple kwargs, which you separate with commas.

The `keepdims` kwarg keeps the axis you aggregated over in the output as an axis of length 1, so the result has the same number of dimensions as the original array.  So if you took the sum of a 2D array over axis 1 with `keepdims=True`, you would receive the sums stacked vertically in a single column.  `keepdims` takes a boolean input: True if you would like to keep the dimensions and False if not.

Take the max of the `reflectances` array, using the `axis` kwarg with value 1 and the `keepdims` kwarg set to True.  Try it again with the kwarg set to False.  Take the shape of both outputs and notice how they change.

In [38]:
reflectances.max(axis=1, keepdims=True)

array([[[96, 98, 99, 92, 99, 96, 94, 97, 99, 94, 96, 96, 96, 99, 97, 95,
         98, 98, 91, 97, 96, 96, 89, 92, 95, 97, 92, 99, 96, 95, 98, 98,
         97, 90, 99, 95, 93, 99, 96, 95]],

       [[98, 97, 98, 99, 93, 92, 92, 96, 95, 93, 99, 98, 99, 96, 98, 98,
         93, 98, 99, 99, 98, 99, 97, 96, 99, 97, 94, 97, 98, 99, 88, 99,
         92, 97, 99, 90, 93, 95, 95, 98]],

       [[99, 95, 97, 99, 97, 95, 98, 98, 98, 99, 97, 98, 98, 98, 99, 97,
         84, 99, 98, 90, 93, 97, 95, 92, 99, 85, 99, 94, 90, 92, 97, 93,
         98, 99, 99, 99, 95, 98, 97, 95]],

       [[99, 97, 88, 93, 96, 98, 97, 99, 93, 92, 96, 98, 99, 95, 97, 97,
         90, 97, 94, 95, 97, 98, 99, 97, 94, 94, 94, 97, 98, 99, 94, 94,
         99, 98, 90, 99, 93, 98, 92, 98]],

       [[89, 90, 98, 98, 96, 97, 97, 99, 97, 98, 96, 96, 96, 98, 99, 96,
         98, 97, 93, 99, 94, 97, 99, 99, 98, 96, 99, 95, 99, 97, 96, 99,
         94, 93, 99, 96, 95, 96, 82, 98]]])

In [39]:
reflectances.max(axis=1, keepdims=True).shape

(5, 1, 40)

In [40]:
reflectances.max(axis=1, keepdims=False).shape

(5, 40)

_Note: The "keepdims" concept is a bit conceptually abstract and I couldn't find a good visual illustration for it right now.  If you leave this problem feeling a little iffy on "keepdims" that is totally fine and normal.  The more important concept to be comfortable with is using multiple kwargs in a function/method._
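In case a small example helps, here is a sketch of one situation where `keepdims` is handy: dividing every row of an array by that row's total. The array `small` is made up for the illustration.

```python
import numpy as np

small = np.array([[1, 2, 3], [4, 5, 6]])

row_sums_keep = small.sum(axis=1, keepdims=True)  # a column of row totals
row_sums_drop = small.sum(axis=1)                 # a flat array of row totals

print(row_sums_keep.shape, row_sums_drop.shape)  # (2, 1) (2,)

# Shape (2, 3) divided by shape (2, 1) broadcasts row by row.
# Dividing by row_sums_drop, shape (2,), would raise an error instead.
fractions = small / row_sums_keep

# Each row of fractions now adds up to (approximately) 1
print(fractions.sum(axis=1))
```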

### Question 6

In [41]:
example = np.array([[6, 10, 5, 9], [6, 9, 9, 11], [12, 14, 6, 3]])

**A)** Get a list of unique items in the `example` array.

In [42]:
np.unique(example)

array([ 3,  5,  6,  9, 10, 11, 12, 14])

_**Google help:** "numpy unique values in an array", or [this stackoverflow](https://stackoverflow.com/questions/16970982/find-unique-rows-in-numpy-array)._

**B)** Pick one of the values in the array and determine how many times that value occurs in the array.

In [43]:
np.unique(example, return_counts=True)
# Then read the output to see that (for example) the value 6 occurred 3 times

(array([ 3,  5,  6,  9, 10, 11, 12, 14]),
 array([1, 1, 3, 3, 1, 1, 1, 1], dtype=int64))

In [44]:
# OR

In [45]:
np.count_nonzero(example == 6)

3

_**Google help:** "numpy number of occurrences of a value" or [this stackoverflow](https://stackoverflow.com/questions/28663856/how-to-count-the-occurrence-of-certain-item-in-an-ndarray)._
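If reading the two output arrays side by side feels error-prone, one option is to pair them up in a dictionary. This is just a sketch; `count_by_value` is a name chosen for the illustration.

```python
import numpy as np

example = np.array([[6, 10, 5, 9], [6, 9, 9, 11], [12, 14, 6, 3]])

# return_counts=True gives back two arrays: the unique values and their counts
values, counts = np.unique(example, return_counts=True)

# Pair each value with its count so a count can be looked up by value
count_by_value = {int(v): int(c) for v, c in zip(values, counts)}

print(count_by_value[6])  # 3
print(count_by_value[9])  # 3
```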

### Question 7

NaN ("not a number") values are an important consideration when working with real data; rarely do you have a totally complete dataset.

You can make individual NaNs with `np.nan` (the `np.NaN` alias was removed in NumPy 2.0):

In [46]:
np.nan

nan

Look at [the docs](https://numpy.org/devdocs/reference/generated/numpy.full.html) for the function `np.full()` and create a new array of shape (4, 5, 6) filled with nan values.

In [47]:
np.full((4, 5, 6), np.nan)

array([[[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]],

       [[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]],

       [[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]],

       [[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]]])
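Once an array contains NaNs, regular aggregations return `nan`. The sketch below goes a little beyond the question: it fills in two values (chosen arbitrarily), counts the remaining NaNs with `np.isnan()`, and uses `np.nanmean()`, which ignores NaNs when averaging.

```python
import numpy as np

filled = np.full((4, 5, 6), np.nan)

# Put two real values into the otherwise empty array
filled[0, 0, 0] = 2.0
filled[0, 0, 1] = 4.0

# np.isnan gives a boolean array; summing it counts the True values
print(np.isnan(filled).sum())  # 118

# A regular mean is nan as soon as any value is nan
print(filled.mean())           # nan

# np.nanmean skips the nan values
print(np.nanmean(filled))      # 3.0
```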