NumPy and 1D Arrays

Notes:


import numpy as np

Notes:

Although we have not formally introduced you to NumPy, the name may sound familiar since we’ve been subtly hinting at its existence for a little while now.

In the last module, we had you import this library for practice.

So what is NumPy?


What is NumPy?

NumPy -> “Numerical Python extensions”.

NumPy offers:

  • Arrays

  • Mathematical Constants

  • Mathematical Functions

Notes:

The name NumPy is derived from “Numerical Python extensions”.

NumPy is a Python library used primarily for numerical computing. It is especially useful because it provides a multidimensional array object (the ndarray), usually just called an array.

NumPy also offers many mathematical functions used in linear algebra and calculus.


So What is an Array?

my_list = [1, 2, 3, 4, 5]
my_list
[1, 2, 3, 4, 5]
my_array = np.array((1, 2, 3, 4, 5))
my_array
array([1, 2, 3, 4, 5])
type(my_array)
<class 'numpy.ndarray'>

Notes:

A NumPy array is somewhat like a list.

Arrays are their own data type, numpy.ndarray.

We can see this by calling type() on an array.


my_list = [1,"hi"]
my_array = np.array((1, "hi"))
my_array
array(['1', 'hi'], dtype='<U21')

Notes:

Soon, we’ll start to see that although lists and arrays appear quite similar, they have some key differences.

A list can contain multiple data types, while all elements of an array must share a single data type.

In this case, 1 was converted to '1' in quotation marks, which signifies that it is now a string.
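
The same conversion can be checked through the array's dtype attribute. The following is only a small sketch; the variable names mixed_numbers and mixed_text are made up for illustration and do not come from the slides.

```python
import numpy as np

# Integers mixed with a float are all stored as floats.
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers.dtype)     # float64

# Integers mixed with a string are all stored as strings.
mixed_text = np.array([1, "hi"])
print(mixed_text.dtype.kind)   # U  (a Unicode string type)
print(mixed_text[0] == "1")    # True: the integer 1 became the string '1'
```

In both cases NumPy picks one type that can represent every element, rather than keeping each element's original type as a list would.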


Creating 1D Arrays

my_array = np.array([1, 2, 3, 4])
my_array
array([1, 2, 3, 4])
np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
np.ones(4)
array([1., 1., 1., 1.])

Notes:

We can make arrays from lists as well as tuples.

There are also several built-in NumPy functions that create arrays following particular patterns.

np.zeros() will create an array containing 0 for each element, and the input argument specifies the size.

Here we specified 10, so our array has 10 elements.

Similarly, np.ones() creates an array in which every element is 1.

Now we’ve specified 4 as the input, so this array has 4 elements.
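
As a rough sketch of some variations: both functions return floats by default, and an optional dtype argument changes that. np.full() is not covered in these slides; it is included here only as one possible way to fill an array with a different constant.

```python
import numpy as np

zeros_int = np.zeros(3, dtype=int)  # integer zeros instead of the default floats
ones_float = np.ones(3)             # default dtype is float64
sevens = np.full(3, 7)              # every element set to 7

print(zeros_int)    # [0 0 0]
print(ones_float)   # [1. 1. 1.]
print(sevens)       # [7 7 7]
```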


np.arange(5)
array([0, 1, 2, 3, 4])
np.arange(0, 10, 2) 
array([0, 2, 4, 6, 8])

Notes:

Like range(), np.arange() can take 1, 2 or 3 input arguments, and it produces an array in much the same way that range() produces a sequence.

If there are 3 input arguments, the first 2 give the start and stop of the interval respectively, and the third gives the step size between values. As with range(), the stop value itself is excluded.
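
To make the comparison with range() concrete, here is a short sketch covering the 1, 2 and 3 argument forms. The variable names are illustrative only.

```python
import numpy as np

one_arg = np.arange(5)             # stop only: starts at 0, stops before 5
two_args = np.arange(2, 6)         # start and stop: stops before 6
three_args = np.arange(0, 10, 2)   # start, stop and step

print(one_arg)              # [0 1 2 3 4]
print(two_args)             # [2 3 4 5]
print(list(range(2, 6)))    # [2, 3, 4, 5]
print(three_args)           # [0 2 4 6 8]
```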


np.linspace(1,5,10)
array([1.        , 1.44444444, 1.88888889, 2.33333333, 2.77777778, 3.22222222, 3.66666667, 4.11111111, 4.55555556, 5.        ])
np.random.rand(5) 
array([0.51939439, 0.84917657, 0.04400428, 0.68715007, 0.23389854])

Notes:

np.linspace() produces an array of evenly spaced values between its first 2 arguments, with the number of elements given by the 3rd argument.

For example, this code produces 10 equally spaced values from 1 to 5, with both endpoints included.

Notice that the elements of np.linspace() arrays default to type float.

We can also produce an array with random values using np.random.rand().

Here, we have random numbers uniformly distributed between 0 (inclusive) and 1 (exclusive).
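
The sketch below checks these properties. np.random.seed() is not used in the slides; it is added here as an assumption so that the random values repeat from run to run, and the seed value 0 is arbitrary.

```python
import numpy as np

points = np.linspace(1, 5, 10)
print(len(points))               # 10
print(points[0], points[-1])     # 1.0 5.0
# The gap between neighbours is (stop - start) / (number of points - 1).
print(np.isclose(points[1] - points[0], (5 - 1) / (10 - 1)))   # True

# Seeding the generator makes the "random" values reproducible.
np.random.seed(0)
first_draw = np.random.rand(5)
np.random.seed(0)
second_draw = np.random.rand(5)
print(np.array_equal(first_draw, second_draw))                 # True
print(first_draw.min() >= 0 and first_draw.max() < 1)          # True
```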


Elementwise Operations

array1 = np.ones(4)
array1
array([1., 1., 1., 1.])
array2 = array1 + 1
array2
array([2., 2., 2., 2.])
array1 + array2
array([3., 3., 3., 3.])
array1 * array2
array([2., 2., 2., 2.])

Notes:

Let’s talk about how operations are calculated with arrays.

We discussed that arrays and lists are similar but not quite the same.

Arrays are designed for mathematical convenience, so they operate in an element-wise manner.

When we apply an operation, it is applied to each element in the array.

If we add 1 to our array, 1 is added to each element in the array.

If we add two arrays together, the elements at identical index positions are added.

Similarly, if we multiply 2 arrays together, the elements at each index position are multiplied together.
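
The same rule applies to the other arithmetic operators. A brief sketch, using two small arrays a and b that are made up for illustration:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print(b - a)     # [ 9 18 27 36]
print(b / a)     # [10. 10. 10. 10.]
print(a ** 2)    # [ 1  4  9 16]
print(a * 3)     # [ 3  6  9 12]
```

Note that division returns floats even when both arrays hold integers.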


list_1 = [ 1, 1, 1, 1]
list_1 + 1
TypeError: can only concatenate list (not "int") to list

Detailed traceback: 
  File "<string>", line 1, in <module>

Notes:

This is much more convenient than using a list.

We can’t simply add 1 to a list. Instead, we get a TypeError.


list_1 = [ 1, 1, 1, 1]

list_2 =  [elem + 1 for elem in list_1]
list_2
[2, 2, 2, 2]
list_3 = []

for index in range(len(list_1)):
  list_3.append(list_1[index] + list_2[index])
  
list_3
[3, 3, 3, 3]

Notes:

If we wanted the same operations done with lists, we would have to use a loop or list comprehension to obtain the same results.
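
For a side-by-side view, the sketch below repeats the list version (using zip() instead of index positions, which is one possible alternative) next to the array version. It also shows what + does between two lists, which is easy to confuse with element-wise addition.

```python
import numpy as np

list_1 = [1, 1, 1, 1]
list_2 = [elem + 1 for elem in list_1]

# With lists, + concatenates instead of adding element by element.
print(list_1 + list_2)      # [1, 1, 1, 1, 2, 2, 2, 2]

# Element-wise addition of lists needs a loop or comprehension.
list_sum = [x + y for x, y in zip(list_1, list_2)]
print(list_sum)             # [3, 3, 3, 3]

# With arrays, the same steps need no loop.
array_1 = np.array(list_1)
array_2 = array_1 + 1
array_sum = array_1 + array_2
print(array_sum)            # [3 3 3 3]
```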


Slicing and Indexing 1D Arrays

arr = np.arange(10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr[7]
7
arr[2:6]
array([2, 3, 4, 5])
arr[-1]
9

Notes:

When it comes to slicing, 1D arrays are sliced in the same manner that lists are.

We can obtain an individual element by putting its index position in square brackets.

And just like slicing dataframes with .iloc[], when we want an interval of values, the first value in the brackets is included, and the last value is excluded.

To count positions from the end of the array instead, we use negative integers, so arr[-1] is the last element.
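
A few more slices in the same style as list slicing, offered as a short sketch rather than a complete reference:

```python
import numpy as np

arr = np.arange(10)

print(arr[:3])     # [0 1 2]                first three elements
print(arr[-3:])    # [7 8 9]                last three elements
print(arr[::2])    # [0 2 4 6 8]            every second element
print(arr[::-1])   # [9 8 7 6 5 4 3 2 1 0]  reversed
```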


Boolean Indexing

grade_array = np.array([98, 87, 103, 92, 67, 107, 78, 104, 85, 105])
grade_array
array([ 98,  87, 103,  92,  67, 107,  78, 104,  85, 105])
threshold = grade_array > 100
threshold
array([False, False,  True, False, False,  True, False,  True, False,  True])
grade_array[threshold] = 100
grade_array
array([ 98,  87, 100,  92,  67, 100,  78, 100,  85, 100])

Notes:

Let’s now explore Boolean indexing.

Let’s take a 1D array that consists of 10 elements.

Remember that most operations occur in an element-wise manner.

Perhaps we are grading exams that contain bonus marks.

The maximum allowed mark on the exam is 100%, so we must cap the grades: any mark greater than 100 is set to 100. First, we check which values are greater than 100.

This produces an array containing Boolean values, which we store in the object threshold.

The first and second elements are False since both 98 and 87 are not larger than 100. However, the 3rd element is True since 103 is larger than 100.

We can now replace all the values whose Boolean is True with a new value; in this case, let’s assign them a value of 100, the maximum possible allowed grade.


new_grade_array = np.array([98,87,103, 92,67, 107, 78, 104, 85, 105])
new_grade_array
array([ 98,  87, 103,  92,  67, 107,  78, 104,  85, 105])
new_grade_array[new_grade_array > 100] = 100
new_grade_array
array([ 98,  87, 100,  92,  67, 100,  78, 100,  85, 100])

Notes:

We could also shorten the whole process and skip creating threshold by putting the condition directly inside the square brackets.

You’ll notice that this is the same square bracket filtering notation that we used with pandas!
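
A Boolean array can also be used to count or select the matching values. The sketch below shows this, along with np.minimum(), which is not part of these slides; it is included as one possible way to cap values without modifying the original array. The names grades, over_limit and capped are illustrative.

```python
import numpy as np

grades = np.array([98, 87, 103, 92, 67, 107, 78, 104, 85, 105])
over_limit = grades > 100

print(over_limit.sum())      # 4  (each True counts as 1)
print(grades[over_limit])    # [103 107 104 105]

# np.minimum returns a new array and leaves grades unchanged.
capped = np.minimum(grades, 100)
print(capped)                # [ 98  87 100  92  67 100  78 100  85 100]
print(grades[2])             # 103
```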


Why NumPy?

cereal.head()
                        name mfr  type  calories  protein  fat  sodium  fiber  carbo  sugars  potass  vitamins  shelf  weight  cups     rating
0                  100% Bran   N  Cold        70        4    1     130   10.0    5.0       6     280        25      3     1.0  0.33  68.402973
1          100% Natural Bran   Q  Cold       120        3    5      15    2.0    8.0       8     135         0      3     1.0  1.00  33.983679
2                   All-Bran   K  Cold        70        4    1     260    9.0    7.0       5     320        25      3     1.0  0.33  59.425505
3  All-Bran with Extra Fiber   K  Cold        50        4    0     140   14.0    8.0       0     330        25      3     1.0  0.50  93.704912
4             Almond Delight   R  Cold       110        2    2     200    1.0   14.0       8       1        25      3     1.0  0.75  34.384843
type(cereal.loc[3,'calories'])
<class 'numpy.int64'>
cereal['calories'].to_numpy()
array([ 70, 120,  70,  50, 110, 110, 110, 130,  90,  90, 120, 110, 120, 110, 110, 110, 100, 110, 110, 110, 100, 110, 100, 100, 110, 110, 100, 120, 120, 110, 100, 110, 100, 110, 120, 120, 110, 110, 110, 140, 110, 100, 110, 100, 150, 150, 160, 100, 120, 140,  90, 130, 120, 100,  50,  50, 100, 100, 120, 100,  90, 110, 110,  80,  90,  90, 110, 110,  90, 110, 140, 100, 110, 110, 100, 100, 110])

Notes:

So why NumPy?

Lists are often used for the same purposes as arrays, but they are slow to process.

Because arrays are much faster, many other data structures are built on top of NumPy.

In fact, let’s refresh ourselves on certain values in a dataframe.

Remember when we obtained the data type of a specific value in a dataframe?

We obtained <class 'numpy.int64'>, which we originally ignored.

This is because a pandas dataframe is built on top of a multidimensional (specifically 2D) array!

We will explain more about multidimensional arrays in the next set of slides.

We can actually convert an entire pandas column into an array quite easily using the .to_numpy() method.
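
The sketch below repeats the conversion on a small stand-in dataframe. cereal_sample is hypothetical: it holds only the name and calories values of the five rows shown by cereal.head(), not the full dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cereal dataframe (first five rows only).
cereal_sample = pd.DataFrame({
    "name": ["100% Bran", "100% Natural Bran", "All-Bran",
             "All-Bran with Extra Fiber", "Almond Delight"],
    "calories": [70, 120, 70, 50, 110],
})

calories = cereal_sample["calories"].to_numpy()
print(type(calories))     # <class 'numpy.ndarray'>
print(calories)           # [ 70 120  70  50 110]

# Once we have an array, the element-wise operations from earlier apply.
print(calories / 2)       # [35. 60. 35. 25. 55.]
```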


NumPy Constants and Functions

np.pi
3.141592653589793
np.inf
inf
np.e
2.718281828459045

Notes:

NumPy also offers an assortment of handy mathematical constants and functions.
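
As a small sketch of how these constants might be used in a calculation (the radius value is made up for illustration):

```python
import numpy as np

# Area of a circle with a hypothetical radius of 2.
radius = 2.0
area = np.pi * radius ** 2
print(round(area, 2))                  # 12.57

# The natural logarithm of e is 1.
print(np.isclose(np.log(np.e), 1.0))   # True

# np.inf compares as larger than any finite number.
print(np.inf > 1e308)                  # True
```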


NumPy Functions

np.prod([2, 3, 1])
6
np.diff([2, 5, 20])
array([ 3, 15])
np.log10(100)
2.0

The full list of mathematical functions is available on the NumPy website.

Notes:

NumPy’s functions include but are not limited to:

  • np.prod(), which calculates the product of the values in an array

  • np.diff(), which calculates the difference between consecutive elements (left element subtracted from the right element)

  • And other functions such as np.log() or trigonometric ones like np.sin()
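
The functions listed above can be tried on one small array. This is only a sketch; the array values is made up for illustration, and np.isclose() is used for the comparison because floating point results are not always exact.

```python
import numpy as np

values = np.array([2, 5, 20])

print(np.prod(values))    # 200  (2 * 5 * 20)
print(np.diff(values))    # [ 3 15]  (5 - 2 and 20 - 5)

# These functions also work element-wise on whole arrays.
logs = np.log10([1, 10, 100])
print(logs)               # [0. 1. 2.]

# The sine of pi / 2 is 1.
print(np.isclose(np.sin(np.pi / 2), 1.0))   # True
```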


Let’s apply what we learned!

Notes: