DRY revisited and function fundamentals¶
Notes:
numbers = [2, 3, 5]
squared = list()
for number in numbers:
    squared.append(number ** 2)
squared
[4, 9, 25]
def squares_a_list(numerical_list):
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
squares_a_list(numbers)
[4, 9, 25]
Notes:
In the last module, we were introduced to the DRY principle and how creating functions helps comply with it.
Let’s do a little bit of a recap.
DRY stands for Don’t Repeat Yourself.
We can avoid writing repetitive code by creating a function that takes in arguments, performs some operations, and returns the results.
The example in Module 5 converted code that creates a list of squared elements from an existing list of numbers into a function.
larger_numbers = [5, 44, 55, 23, 11]
promoted_numbers = [73, 84, 95]
executive_numbers = [100, 121, 250, 103, 183, 222, 214]
squares_a_list(larger_numbers)
[25, 1936, 3025, 529, 121]
squares_a_list(promoted_numbers)
[5329, 7056, 9025]
squares_a_list(executive_numbers)
[10000, 14641, 62500, 10609, 33489, 49284, 45796]
Notes:
This function lets us perform the same operation on multiple lists without rewriting any code, simply by calling the function.
Scoping¶
def squares_a_list(numerical_list):
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
        print(new_squared_list)
    return new_squared_list
squares_a_list(numbers)
[4]
[4, 9]
[4, 9, 25]
[4, 9, 25]
new_squared_list
NameError: name 'new_squared_list' is not defined
Detailed traceback:
File "<string>", line 1, in <module>
Notes:
It’s important to know what exactly is going on inside and outside of a function.
In our function squares_a_list(), we created a variable named new_squared_list.
We can print this variable and watch all the elements be appended to it as we loop through the input list.
But what happens if we try and print this variable outside of the function?
Yikes! Where did new_squared_list go?
It doesn’t seem to exist! That’s not entirely true.
In Python, new_squared_list is something we call a local variable.
Local variables are any objects that have been created within a function and only exist inside the function where they are made.
Code within a function is described as a local environment.
Since we referred to new_squared_list outside of the function’s body, Python fails to recognize it.
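The same idea can be sketched as a small script. The name new_squared_list exists only while the function runs, so the way to get the list out is to capture the return value. The try/except is added here only so the script keeps running past the error, and result and lookup_failed are illustrative names that are not part of the original example.

```python
def squares_a_list(numerical_list):
    new_squared_list = list()  # local: exists only inside this call
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list


# The returned list can be bound to a name in the global environment.
result = squares_a_list([2, 3, 5])
print(result)

# The local name itself is not visible out here.
try:
    new_squared_list
    lookup_failed = False
except NameError as error:
    lookup_failed = True
    print("NameError:", error)
```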
def squares_a_list(numerical_list):
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
        print(new_squared_list)
    return new_squared_list
a_new_variable = "Peek-a-boo"
a_new_variable
'Peek-a-boo'
Notes:
Let’s compare that with the variable a_new_variable.
a_new_variable is created outside of a function in what we call our global environment, and therefore Python recognizes it as a global variable.
Global and Local Variables¶
def squares_a_list(numerical_list):
    print(a_new_variable)
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
squares_a_list([12, 5, 7, 99999])
Peek-a-boo
[144, 25, 49, 9999800001]
Notes:
Global variables differ from local variables as they are not only recognized outside of any function but also recognized inside functions.
Let’s take a look at what happens when we refer to a_new_variable, which is a global variable, inside the squares_a_list function.
The function recognizes the global variable!
It’s important to note that, although functions recognize global variables, it’s not good practice to have functions reference objects defined outside of them.
We will learn more about this later in the module.
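One way to avoid relying on a global variable, sketched here under the assumption that the function genuinely needs the value, is to pass it in as an argument. The names greeting, greet_from_global, and greet_from_argument are hypothetical and used only for illustration.

```python
greeting = "Peek-a-boo"  # global variable


def greet_from_global(name):
    # Works, but silently depends on a name defined outside the function.
    return greeting + ", " + name


def greet_from_argument(name, greeting_text):
    # Everything the function needs arrives through its arguments.
    return greeting_text + ", " + name


print(greet_from_global("Ada"))
print(greet_from_argument("Ada", greeting))
```

Both calls produce the same text, but only the second function can be understood and reused without knowing what happens to exist in the global environment.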

Attribution - 49th and Parallel
Notes:
I’m going to make an analogy comparing coffee stores to variables.
Starbucks Coffee is a globally recognized brand, available in 70 different countries.
I can purchase a coffee from Starbucks in Vancouver (my local city), and if I were to travel across the world to Sydney, Australia, I would still be able to purchase a coffee from Starbucks there.
Starbucks Coffee is similar to a global variable as it is accessible and recognized in both its local (Vancouver) and global environments.
49th Parallel is a local Vancouver coffee store.
Many people from Vancouver recognize it; however, purchasing a coffee from 49th Parallel outside of Vancouver would be impossible as it is not accessible past the City of Vancouver.
Just like Starbucks Coffee, global variables are recognized and accessible in both their global and local environments, whereas local variables, like the coffee store 49th Parallel, are only recognized and accessible in the local environment they were created in.
When things get tricky¶
a_new_variable = "Peek-a-boo"
def squares_a_list(numerical_list):
    a_new_variable = "Ta-Da!"
    print(a_new_variable)
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
squares_a_list([1, 2])
Ta-Da!
[1, 4]
Notes:
Things can get unclear when we have variables that are named the same way but come from two different environments.
What happens when 2 different objects share the same name, where one was defined inside the function and the other in the global environment?
For instance, let’s say we defined a variable a_new_variable in our global environment, and we’ve made a variable with the same name a_new_variable, but a different value, in the local environment of our squares_a_list function.
We can see that the locally created a_new_variable was printed instead of the global object with the same name.
def squares_a_list(numerical_list):
    a_new_variable = "Ta-Da!"
    print(a_new_variable)
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
squares_a_list([1, 2])
a_new_variable
Ta-Da!
[1, 4]
'Peek-a-boo'
Notes:
What if we output a_new_variable right after calling the function?
Our function prints the locally defined a_new_variable, and the global environment outputs the globally defined a_new_variable.
def squares_a_list(numerical_list, a_new_variable):
    print(a_new_variable)
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
a_new_variable = "Peek-a-boo"
squares_a_list([1,2], "BAM!")
BAM!
[1, 4]
Notes:
What if a_new_variable was an argument?
Given a global variable a_new_variable = "Peek-a-boo", what value will the function print if we assign a value of "BAM!" to the input argument a_new_variable?
Here we can see that the function uses the input argument value instead of the global variable value.
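The three cases in this section can be put side by side in one short sketch. The variable label and the three function names are hypothetical; the point is only that Python looks for a name in the function’s local environment first (including its arguments) and falls back to the global environment if it is not found there.

```python
label = "Peek-a-boo"  # global variable


def uses_global():
    return label  # no local `label`, so Python falls back to the global


def uses_local():
    label = "Ta-Da!"  # a new local name; the global is untouched
    return label


def uses_argument(label):
    return label  # the parameter is local too, and takes precedence


print(uses_global())
print(uses_local())
print(uses_argument("BAM!"))
print(label)  # the global still holds its original value
```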
Modifying global variables¶
global_list = [50, 51, 52]
def squares_a_list(numerical_list):
    global_list.append(99)
    print("print global_list:", global_list)
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
squares_a_list([1, 2])
print global_list: [50, 51, 52, 99]
[1, 4]
global_list
[50, 51, 52, 99]
Notes:
So global variables are accessible inside functions - but what about modifying them?
Let’s take a list that we define in our global environment called global_list and add 99 to the list in the local environment.
The list that we defined globally was modified inside the function, and the changes are reflected back in the global environment!
What is going on?
Modifying objects like this within a function without returning them is called a function side effect.
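As a rough contrast, here is a sketch of two hypothetical helper functions: one modifies the global list directly, and the other receives the list as an argument and returns a new list instead. The function names and the snapshot variable are assumptions made for illustration.

```python
global_list = [50, 51, 52]


def append_with_side_effect(value):
    # Mutates the list defined in the global environment.
    global_list.append(value)


def append_without_side_effect(original_list, value):
    # Builds and returns a new list; the input list is left alone.
    return original_list + [value]


new_list = append_without_side_effect(global_list, 99)
snapshot = list(global_list)  # copy taken before any side effect happens
print(snapshot)
print(new_list)

append_with_side_effect(99)
print(global_list)  # now modified, although nothing was returned
```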
Function Side Effects¶
import pandas as pd

cereal = pd.read_csv('cereal.csv')
cereal.head()
                        name mfr  type  calories  protein  fat  sodium  fiber  carbo  sugars  potass  vitamins  shelf  weight  cups     rating
0                  100% Bran   N  Cold        70        4    1     130   10.0    5.0       6     280        25      3     1.0  0.33  68.402973
1          100% Natural Bran   Q  Cold       120        3    5      15    2.0    8.0       8     135         0      3     1.0  1.00  33.983679
2                   All-Bran   K  Cold        70        4    1     260    9.0    7.0       5     320        25      3     1.0  0.33  59.425505
3  All-Bran with Extra Fiber   K  Cold        50        4    0     140   14.0    8.0       0     330        25      3     1.0  0.50  93.704912
4             Almond Delight   R  Cold       110        2    2     200    1.0   14.0       8       1        25      3     1.0  0.75  34.384843
.drop()
.assign()
.sort_values()
.rename()
Notes:
For this next concept, we are going to bring back our trusty cereal dataframe.
Since the beginning of this course, we have been using verbs such as:
.drop()
.assign()
.sort_values()
.rename()
With these verbs, we modify a dataframe and save the modification as a new dataframe object.
cereal_dropped = cereal.drop(columns = ['sugars','potass','vitamins', 'shelf', 'weight', 'cups'])
cereal_dropped.head(2)
                name mfr  type  calories  protein  fat  sodium  fiber  carbo     rating
0          100% Bran   N  Cold        70        4    1     130   10.0    5.0  68.402973
1  100% Natural Bran   Q  Cold       120        3    5      15    2.0    8.0  33.983679
cereal.head(2)
                name mfr  type  calories  protein  fat  sodium  fiber  carbo  sugars  potass  vitamins  shelf  weight  cups     rating
0          100% Bran   N  Cold        70        4    1     130   10.0    5.0       6     280        25      3     1.0  0.33  68.402973
1  100% Natural Bran   Q  Cold       120        3    5      15    2.0    8.0       8     135         0      3     1.0  1.00  33.983679
cereal.drop(columns = ['sugars','potass','vitamins', 'shelf', 'weight', 'cups'], inplace=True)
cereal.head(2)
                name mfr  type  calories  protein  fat  sodium  fiber  carbo     rating
0          100% Bran   N  Cold        70        4    1     130   10.0    5.0  68.402973
1  100% Natural Bran   Q  Cold       120        3    5      15    2.0    8.0  33.983679
Notes:
For example, when we have been dropping columns from a dataframe, we have been saving the changes with the assignment operator to a new object.
In this example, we drop the columns from sugars to cups and assign this modified dataframe to the object named cereal_dropped.
If we look at the original cereal dataframe, we can see it was unaffected by this transformation.
Many of the verbs that we use for our transformations, such as the ones we mentioned on the previous slide, have an argument called inplace.
The inplace argument accepts a Boolean value. When it is set to True, the dataframe object is modified directly, without the need to save the changes to an object with the assignment operator. That means we can skip the part of making a new object with the = sign.
Let’s try and drop the same columns as before but now using inplace=True.
This time, nothing is returned when we execute this code; however, if we look at the cereal dataframe now, we can see that it’s been altered, and the columns have been dropped.
This transformation of the dataframe object is a side effect of the function.
A side effect occurs when a function changes something outside its own local environment, such as a global variable. This means the function has an observable effect besides its return value.
It’s important to mention that although inplace exists, there is a reason we haven’t taught it: we don’t recommend using it. Overwriting the object by assigning the result back to the same object name is the preferred coding technique.
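The difference can be checked on a tiny stand-in dataframe. This is only a sketch: the dataframe df and its values are made up for illustration and are not the cereal data, and it assumes pandas is installed.

```python
import pandas as pd

# A tiny stand-in dataframe (made-up values, not the cereal data).
df = pd.DataFrame({"name": ["A", "B"], "sugars": [6, 8], "rating": [68.4, 34.0]})

# Without inplace: a new dataframe is returned and df is untouched.
dropped = df.drop(columns=["sugars"])
print(list(dropped.columns))
print(list(df.columns))

# With inplace=True: nothing is returned and df itself is changed.
returned = df.drop(columns=["sugars"], inplace=True)
print(returned)
print(list(df.columns))

# The recommended pattern from the text: assign back to the same name.
df = df.drop(columns=["rating"])
print(list(df.columns))
```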
cereal.to_csv('cereal.csv')
regular_list = [7, 8, 2020]
regular_list
[7, 8, 2020]
regular_list.append(3)
regular_list
[7, 8, 2020, 3]
Notes:
Although this appears to be new vocabulary, side effects have been present since the beginning of this course, starting with .to_csv().
.to_csv() is a dataframe verb that we saw in Module 1. It didn’t return anything after we executed it but still produced a side effect: a newly saved CSV file on our computer.
Another example that we’ve seen when working with lists is the verb .append().
When we execute .append(3) on our object regular_list, nothing is returned from the function, and we have not used the assignment operator to save any transformation to the list. However, when we inspect regular_list, we can see that it has been modified and now includes the new element 3.
This would be another example of a function with a side effect.
The list was created in the global environment but modified in .append()’s local environment.
Side effects seem like fun, but they can be extremely problematic when trying to debug (fix) your code.
When writing functions, it’s usually a good idea to avoid side effects.
If objects need to be modified, best practice is to modify them in the environment they originated in.
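A short sketch of why this matters when debugging: because .append() works through a side effect and returns nothing, capturing its result gives None rather than the updated list. The names returned and extended_list are illustrative.

```python
regular_list = [7, 8, 2020]

# .append() modifies the list in place and returns None.
returned = regular_list.append(3)
print(returned)
print(regular_list)

# Pitfall: writing `regular_list = regular_list.append(3)` would replace
# the list with None, because None is what .append() returns.

# A non-mutating alternative: build a new list and keep the original.
extended_list = regular_list + [4]
print(regular_list)
print(extended_list)
```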
Side Effect Documentation¶
If your functions have side effects, they should be documented.
Notes:
Although side effects are not recommended, there are cases where we must have side effects in our functions or there is no way to avoid them. In these cases, it is extremely important that we document them.
This leads to the next question: how? Good news - the answer is coming later on in this module!
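As a preview only, one common Python convention is to describe the side effect in the function’s docstring. The sketch below assumes that convention; the function record_square, its arguments, and the "Side effects" heading are hypothetical and may differ from the style this module introduces later.

```python
def record_square(number, log):
    """Return the square of a number.

    Side effects
    ------------
    Appends the squared value to `log`, modifying the list passed in.
    """
    squared = number ** 2
    log.append(squared)
    return squared


history = []
print(record_square(4, history))
print(history)
print(record_square.__doc__)
```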
The deal with print()¶
print('A regular string')
A regular string
a_number_variable = 54.346
print(a_number_variable)
54.346
Notes:
What is print()?
We have not talked about this function in much detail, but we do know that print() will print whatever variable or item you pass to it. It can be especially handy when debugging.
We can use it to print a value directly or from a variable, as we see here.
It’s important that we address using print() versus using return in a function, as they are quite different.
Let’s see why.
def squares_a_list(numerical_list):
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
def squares_a_list_print(numerical_list):
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    print(new_squared_list)
Notes:
Here we have our squares_a_list function. Let’s create a new function called squares_a_list_print where, instead of returning the new variable new_squared_list, we print it.
The only difference here is that in squares_a_list we return new_squared_list, and in squares_a_list_print we print new_squared_list.
numbers = [2, 3, 5]
squares_a_list(numbers)
[4, 9, 25]
squares_a_list_print(numbers)
[4, 9, 25]
Notes:
Let’s see what happens when we call these functions now.
If we call them both without assigning them to an object, it looks like these functions do identical things.
Both output the new list.
return_func_var = squares_a_list(numbers)
print(return_func_var)
[4, 9, 25]
print_func_var = squares_a_list_print(numbers)
[4, 9, 25]
print(print_func_var)
None
Notes:
This time let’s instead save them to objects.
When we call squares_a_list(numbers) and save the result, you’ll see that nothing is printed or outputted.
But if we print what our variable return_func_var contains, you’ll see the list of squared numbers.
In contrast, when we do the same thing with squares_a_list_print(numbers), the new list is outputted while we are assigning it to a variable, since the print function is called within our squares_a_list_print() function.
If we print what’s in our variable print_func_var, you’ll see that there is nothing stored in it: it contains None.
That’s because the print() function, when used in a function, is a side effect, and our squares_a_list_print() function is not returning anything to store; it’s only displaying it.
In order for us to save the output of our functions to a variable, we must use return in our function; otherwise, we are only producing a side effect instead of returning an actual value.
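A compact way to see the practical consequence, using two hypothetical single-number functions: only a returned value can be reused in later code.

```python
def square_and_return(number):
    return number ** 2


def square_and_print(number):
    print(number ** 2)  # displayed, but not handed back to the caller


returned_value = square_and_return(4)
printed_value = square_and_print(4)

print(returned_value + 1)     # the returned value can be reused
print(printed_value is None)  # nothing was stored
```

Trying printed_value + 1 instead would raise a TypeError, since None cannot be added to a number.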