Frequency Tables and Writing CSVs¶
Watch it
See the accompanying YouTube video at the link here.
What is Frequency?¶
Before we explain what a frequency table is, we first need to know what frequency means.
Frequency is, simply put, the number of times a value occurs in the data. Let’s look at an example using our candybars dataset.
candybars_mini
 | name | weight | available_canada_america |
---|---|---|---|
0 | Coffee Crisp | 50 | Canada |
1 | Butterfinger | 184 | America |
2 | Skor | 39 | Both |
3 | Smarties | 45 | Canada |
4 | Twix | 58 | Both |
5 | Reeses Peanutbutter Cups | 43 | Both |
6 | 3 Musketeers | 54 | America |
If we count the number of times the value Both appears in the available_canada_america column, we get 3. This is the frequency of the value Both.
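The count above can be reproduced with a short sketch. It rebuilds the candybars_mini table by hand from the rows shown (the real dataset is presumably loaded from a file, which is not shown here), and the names is_both and both_frequency are only illustrative.

```python
import pandas as pd

# Rebuild the small table shown above by hand (illustrative stand-in for the real data).
candybars_mini = pd.DataFrame({
    'name': ['Coffee Crisp', 'Butterfinger', 'Skor', 'Smarties',
             'Twix', 'Reeses Peanutbutter Cups', '3 Musketeers'],
    'weight': [50, 184, 39, 45, 58, 43, 54],
    'available_canada_america': ['Canada', 'America', 'Both', 'Canada',
                                 'Both', 'Both', 'America'],
})

# Compare every value in the column with 'Both'; this gives True/False per row.
is_both = candybars_mini['available_canada_america'] == 'Both'

# True counts as 1 and False as 0, so the sum is the frequency of 'Both'.
both_frequency = int(is_both.sum())
print(both_frequency)  # 3
```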
What is a Frequency Table?¶
A frequency table is a way of displaying all the possible values of a column in our dataframe and the number of occurrences (frequencies) of each value.
For our sample data, a frequency table for the available_canada_america column would look like this:
Both 3
Canada 2
America 2
Name: available_canada_america, dtype: int64
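To make the idea concrete before any pandas tooling is involved, here is a minimal sketch that tallies the same column with Python's standard-library Counter. The list availability is typed in by hand from the table above, and the variable names are only illustrative.

```python
from collections import Counter

# The values of the available_canada_america column, copied from the table above.
availability = ['Canada', 'America', 'Both', 'Canada', 'Both', 'Both', 'America']

# Counter tallies how many times each distinct value appears.
frequency_table = Counter(availability)

# most_common() lists the values from most to least frequent.
for value, count in frequency_table.most_common():
    print(value, count)
```

Each distinct value appears once in the result, paired with its frequency, which is exactly what a frequency table records.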
If we want to get a frequency table of a categorical column, there are a few steps that need to be followed.
Up until now, we have selected a single column from a dataframe using double square brackets: df[['column name']].
For frequency tables, however, we use single square brackets to obtain the column values.
mfr_column = cereal['mfr']
mfr_column
0 N
1 Q
2 K
3 K
4 R
..
72 G
73 G
74 R
75 G
76 G
Name: mfr, Length: 77, dtype: object
We saved the column to an object named mfr_column in the same way that we have saved objects before.
Now we can call .value_counts() on mfr_column to obtain the frequency of each category in that column.
mfr_freq = mfr_column.value_counts()
mfr_freq
K 23
G 22
P 9
Q 8
R 8
N 6
A 1
Name: mfr, dtype: int64
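The two steps can be tried without the cereal file using a small stand-in. In this sketch, cereal_demo and its six rows are made up for illustration and are not the real cereal data. It also shows how to look up the frequency of a single category in the result.

```python
import pandas as pd

# A made-up stand-in for the cereal dataframe (not the real data).
cereal_demo = pd.DataFrame({
    'name': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6'],
    'mfr': ['K', 'G', 'K', 'P', 'G', 'K'],
})

# Step 1: single square brackets give the column values.
mfr_column = cereal_demo['mfr']

# Step 2: value_counts() builds the frequency table, most frequent first.
mfr_freq = mfr_column.value_counts()
print(mfr_freq)

# The frequency of one category can be looked up by its label.
print(mfr_freq['K'])  # 3

# The frequencies add up to the number of rows in the column.
print(mfr_freq.sum() == len(mfr_column))  # True
```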
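If we instead use double square brackets, we get a dataframe with one column rather than the column's values. In versions of pandas before 1.1.0, calling .value_counts() on a dataframe raises an error. Newer versions count the unique rows of the dataframe instead, so the call below runs, but its result is indexed by rows rather than by the plain column values. So take care and remember that when you are using .value_counts() for a frequency table of one column, you only use one set of square brackets.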
mfr_col_wrong = cereal[['mfr']]
mfr_col_wrong
 | mfr |
---|---|
0 | N |
1 | Q |
2 | K |
3 | K |
4 | R |
... | ... |
72 | G |
73 | G |
74 | R |
75 | G |
76 | G |
77 rows × 1 columns
mfr_col_wrong.value_counts()
mfr
K 23
G 22
P 9
Q 8
R 8
N 6
A 1
dtype: int64
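The difference between the two kinds of brackets is easiest to see by checking the type of what comes back. This is a sketch on a made-up cereal_demo dataframe (not the real cereal data), and it assumes pandas 1.1.0 or newer, where dataframes have a .value_counts() method.

```python
import pandas as pd

# A made-up stand-in for the cereal dataframe (not the real data).
cereal_demo = pd.DataFrame({'mfr': ['K', 'G', 'K', 'P', 'G', 'K']})

single = cereal_demo['mfr']    # single brackets: a Series (the column values)
double = cereal_demo[['mfr']]  # double brackets: a DataFrame with one column

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame

# Counting values in the Series gives the frequency table.
series_counts = single.value_counts()

# Assumes pandas >= 1.1.0: on a DataFrame, value_counts() counts unique rows.
frame_counts = double.value_counts()

# The counts agree here, but the two results are indexed differently.
print(series_counts.tolist())
print(frame_counts.tolist())
```

With a single column the numbers match, so the main practical difference is the index of the result. Using single brackets keeps the frequency table indexed by the category labels themselves.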
Saving a dataframe¶
Sometimes it is useful to save a new dataframe to a file, such as a CSV file, for future use by you or somebody else.
We can do this using a method called .to_csv().
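mfr_freq.to_csv('mfr_frequency.csv')
We put our desired csv file name in quotations within the parentheses. When a dataframe's index is just a column of row numbers, we can follow the file name with the argument index=False so the index is not exported. Here the index of mfr_freq holds the manufacturer codes, so we leave that argument out; otherwise the file would contain the counts without their labels.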
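One way to check what ends up in the file is to write a frequency table and read it straight back. The sketch below uses made-up data and a temporary folder so that nothing is left on disk; the file name mfr_frequency_demo.csv and the variable names are only illustrative. It keeps the index when writing because, for a frequency table, the index holds the category labels. index=False is the better choice when the index is only row numbers.

```python
import os
import tempfile

import pandas as pd

# A made-up column of manufacturer codes (not the real cereal data).
mfr_demo = pd.Series(['K', 'G', 'K', 'P', 'G', 'K'], name='mfr')
mfr_freq_demo = mfr_demo.value_counts()

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'mfr_frequency_demo.csv')

    # Keep the index: it holds the category labels K, G and P.
    mfr_freq_demo.to_csv(csv_path)

    # Read the file back, using the first column as the index again,
    # and take the single remaining column as a Series.
    reloaded = pd.read_csv(csv_path, index_col=0).iloc[:, 0]

print(reloaded)
```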