You can use the following functions to calculate the mean, median, and mode of each numeric column in a pandas DataFrame:
print(df.mean(numeric_only=True))
print(df.median(numeric_only=True))
print(df.mode(numeric_only=True))
The following example shows how to use these functions in practice.
Suppose we have the following pandas DataFrame that contains information about points scored by various basketball players in four different games:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})

#view DataFrame
print(df)

  player  game1  game2  game3  game4
0      A     18      5     11      9
1      B     22      7      8      8
2      C     19      7     10     10
3      D     14      9      6      9
4      E     14     12      6     14
5      F     11      9      5     15
6      G     20      9      9     10
7      H     28      4     12     11
We can use the following syntax to calculate the mean value of each numeric column:
#calculate mean of each numeric column
print(df.mean(numeric_only=True))

game1    18.250
game2     7.750
game3     8.375
game4    10.750
dtype: float64
From the output we can see:
- The mean value in the game1 column is 18.25.
- The mean value in the game2 column is 7.75.
- The mean value in the game3 column is 8.375.
- The mean value in the game4 column is 10.75.
We can then use the following syntax to calculate the median value of each numeric column:
#calculate median of each numeric column
print(df.median(numeric_only=True))

game1    18.5
game2     8.0
game3     8.5
game4    10.0
dtype: float64
From the output we can see:
- The median value in the game1 column is 18.5.
- The median value in the game2 column is 8.
- The median value in the game3 column is 8.5.
- The median value in the game4 column is 10.
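As a minimal sketch (not part of the original example), the mean and median can also be computed together in one call with agg(). The variable name stats is an assumption used only for illustration, and select_dtypes() is used here in place of numeric_only=True to drop the player column:

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})

# Keep only the numeric columns, then compute both statistics at once.
# The result has one row per statistic and one column per game.
stats = df.select_dtypes(include='number').agg(['mean', 'median'])

print(stats)
```

The mode is left out of this call on purpose: a column can have several modes, so it does not reduce to a single value per column the way the mean and median do.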
We can then use the following syntax to calculate the mode of each numeric column:
#calculate mode of each numeric column
print(df.mode(numeric_only=True))

   game1  game2  game3  game4
0   14.0    9.0    6.0      9
1    NaN    NaN    NaN     10
From the output we can see:
- The mode in the game1 column is 14.
- The mode in the game2 column is 9.
- The mode in the game3 column is 6.
- The modes in the game4 column are 9 and 10.
Note that the game4 column had two modes since there were two values that occurred most frequently in that column.
Note: You can also use the describe() function in pandas to generate more descriptive statistics for each column.
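For illustration, here is a short sketch of what describe() returns for the same DataFrame. The variable name summary is an assumption; the row labels 'mean' and '50%' are the ones describe() produces by default:

```python
import pandas as pd

# Same DataFrame as in the example above
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})

# describe() summarizes only the numeric columns by default
summary = df.describe()

# The 'mean' row matches df.mean(numeric_only=True) and the
# '50%' row (the 50th percentile) matches df.median(numeric_only=True)
print(summary.loc[['mean', '50%']])
```

Note that describe() does not report the mode, so mode() is still needed for that statistic.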
To find the modes of the columns in a DataFrame, or the mode value of a Series in pandas, the easiest way is to use the pandas mode() function.
df.mode()
When working with data, we often want to calculate summary statistics to understand it better. One such statistic is the mode: the value that occurs most often for a given variable.
The pandas mode() function finds the mode of a single column, or of every column or row in a DataFrame.
The pandas mode() function works for both numeric and object dtypes.
Let’s say we have the following DataFrame.
import pandas as pd

df = pd.DataFrame({'Age': [43, 23, 43, 49, 71, 37],
                   'Test_Score': [90, 87, 96, 96, 87, 79]})
print(df)
# Output:
   Age  Test_Score
0   43          90
1   23          87
2   43          96
3   49          96
4   71          87
5   37          79
To get the modes for all columns, we can call the pandas mode() function.
print(df.mode())
# Output:
Age Test_Score
0 43.0 87
1 NaN 96
There is one mode for “Age” and two modes for “Test_Score”.
If we only want the mode of one column, we can call mode() on that column:
print(df["Test_Score"].mode())
# Output:
0 87
1 96
dtype: int64
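Because mode() always returns a Series, even when there is a single mode, a common follow-up is to pull out one scalar value. The sketch below is one possible approach, not something the original article prescribes; the variable names are assumptions:

```python
import pandas as pd

df = pd.DataFrame({'Age': [43, 23, 43, 49, 71, 37],
                   'Test_Score': [90, 87, 96, 96, 87, 79]})

# mode() returns the modes as a Series sorted in ascending order
age_modes = df['Age'].mode()

# Take the first entry to get a scalar.
# When there are ties, this picks the smallest of the tied modes.
age_mode = age_modes.iloc[0]
score_mode = df['Test_Score'].mode().iloc[0]

print(age_mode)    # 43
print(score_mode)  # 87
```

Taking the first entry silently discards the other modes, so it is only appropriate when any one of the tied values is acceptable.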
Find the Mode of a Column with Object dtype in pandas
The mode() function works for both numeric and object dtypes.
Let’s say we have the following pandas DataFrame:
Name Weight_Change Month
0 Jim -16.20 1
1 Sally 12.81 1
2 Bob -20.45 1
3 Sue 15.35 1
4 Jill -12.43 1
5 Larry -18.52 1
6 Pam -6.10 2
7 Sally -2.81 2
8 Rose 12.45 2
9 Pat -0.32 2
10 Jill -1.23 2
11 Larry -8.52 2
12 Jim 5.20 3
13 Rob 12.81 3
14 Bob -2.45 3
15 Herman 5.35 3
16 Jill -2.43 3
17 Billy -1.85 3
We can use the mode() function to see who appears in our DataFrame the most by calling it on the “Name” column.
print(df["Name"].mode())
#Output:
0 Jill
dtype: object
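To see the counts behind that result, value_counts() can be used alongside mode(). This is a small sketch that rebuilds only the “Name” column from the table above; the variable names are assumptions:

```python
import pandas as pd

# Only the "Name" column from the DataFrame shown above
names = pd.Series(['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry',
                   'Pam', 'Sally', 'Rose', 'Pat', 'Jill', 'Larry',
                   'Jim', 'Rob', 'Bob', 'Herman', 'Jill', 'Billy'],
                  name='Name')

# Count how often each name appears
counts = names.value_counts()

# The name with the highest count is the mode
top_name = counts.idxmax()
top_count = counts.max()

print(top_name, top_count)  # Jill 3
```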
Hopefully this article has been helpful for you to understand how to find the mode of a Series or DataFrame in pandas.
Contents
- 1 Introduction
- 1.0.1 Importing Pandas Library
- 1.1 Pandas Mean : mean()
- 1.1.1 Syntax
- 1.1.2 Example 1: Simple example of Pandas Mean() function
- 1.1.3 Example 2: Using skipna parameter of Pandas Mean() function
- 1.2 Pandas Median : median()
- 1.2.1 Syntax
- 1.2.2 Example 1: Finding median using pandas median() function
- 1.2.3 Example 2: Finding median over column axis and using skipna parameter
- 1.3 Pandas Mode : mode()
- 1.3.1 Syntax
- 1.3.2 Example 1: Finding mode using pandas mode() function
- 1.3.3 Example 2: Using dropna parameter of pandas mode()
- 1.4 Conclusion
Introduction
In this article, we will cover the pandas functions for three basic descriptive statistics: mean(), median(), and mode(). Statistical analysis is one of the most important topics in data science, and these measures help us understand the details of our data. Let’s look at each function in turn.
Importing Pandas Library
First we will import the pandas library.
In [1]:
import pandas as pd
import numpy as np
Pandas Mean : mean()
The mean function of pandas helps us in finding the mean of the values on the specified axis.
Syntax
pandas.DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
- axis : {index (0), columns (1)} – The axis along which the function is applied.
- skipna : bool, default True – Whether to exclude NA/null values when computing the result.
- level : int or level name, default None – Used when working with a MultiIndex dataframe. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby() instead.
- numeric_only : bool, default None – Whether to include only float, int, and boolean columns.
- **kwargs – Additional keyword arguments passed to the function.
Example 1: Simple example of Pandas Mean() function
In this example, the mean can be calculated over columns or rows.
In [2]:
df = pd.DataFrame({"P":[35, 9, 1, 78, 19], "Q":[51, 45, 54, 30, 12], "R":[24, 6, 75, 13, 83], "S":[14, 41, 7, 25, 67]})
In [3]:
df
Out[3]:
| | P | Q | R | S |
---|---|---|---|---|
0 | 35 | 51 | 24 | 14 |
1 | 9 | 45 | 6 | 41 |
2 | 1 | 54 | 75 | 7 |
3 | 78 | 30 | 13 | 25 |
4 | 19 | 12 | 83 | 67 |
By default, the mean is calculated for each column (axis=0).
In [4]:
df.mean()
Out[4]:
P    28.4
Q    38.4
R    40.2
S    30.8
dtype: float64
In the examples below, the axis parameter is passed to the mean function explicitly: axis=0 returns one mean per column, while axis=1 returns one mean per row.
In [5]:
df.mean(axis=0)
Out[5]:
P    28.4
Q    38.4
R    40.2
S    30.8
dtype: float64
In [6]:
df.mean(axis=1)
Out[6]:
0    31.00
1    25.25
2    34.25
3    36.50
4    45.25
dtype: float64
Example 2: Using skipna parameter of Pandas Mean() function
When a dataframe contains null/NaN values, the skipna parameter lets us skip those values when computing the mean.
In [7]:
df = pd.DataFrame({"P":[35, 9, 1, 78, None], "Q":[51, None, 54, 30, 12], "R":[24, 6, None, 13, 83], "S":[14, 41, 7, 25, None]})
In [8]:
df
Out[8]:
| | P | Q | R | S |
---|---|---|---|---|
0 | 35.0 | 51.0 | 24.0 | 14.0 |
1 | 9.0 | NaN | 6.0 | 41.0 |
2 | 1.0 | 54.0 | NaN | 7.0 |
3 | 78.0 | 30.0 | 13.0 | 25.0 |
4 | NaN | 12.0 | 83.0 | NaN |
In [9]:
df.mean(axis = 1, skipna = True)
Out[9]:
0    31.000000
1    18.666667
2    20.666667
3    36.500000
4    47.500000
dtype: float64
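To make the effect of skipna more visible, the following sketch contrasts skipna=True with skipna=False on the same dataframe. The variable names row_means_skip and row_means_strict are assumptions used only for this illustration:

```python
import pandas as pd

df = pd.DataFrame({"P": [35, 9, 1, 78, None],
                   "Q": [51, None, 54, 30, 12],
                   "R": [24, 6, None, 13, 83],
                   "S": [14, 41, 7, 25, None]})

# skipna=True (the default): NaN values are ignored,
# so each row mean is taken over the values that are present
row_means_skip = df.mean(axis=1, skipna=True)

# skipna=False: any row containing a NaN produces NaN as its mean
row_means_strict = df.mean(axis=1, skipna=False)

print(row_means_strict)
```

With skipna=False, only rows 0 and 3 (the rows without missing values) get a numeric mean.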
Pandas Median : median()
The median function of pandas helps us in finding the median of the values on the specified axis.
Syntax
pandas.DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
- axis : {index (0), columns (1)} – The axis along which the function is applied.
- skipna : bool, default True – Whether to exclude NA/null values when computing the result.
- level : int or level name, default None – Used when working with a MultiIndex dataframe. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0; use groupby() instead.
- numeric_only : bool, default None – Whether to include only float, int, and boolean columns.
- **kwargs – Additional keyword arguments passed to the function.
Example 1: Finding median using pandas median() function
We reuse the dataframe from the previous example to find the median of each column.
In [10]:
df
Out[10]:
| | P | Q | R | S |
---|---|---|---|---|
0 | 35.0 | 51.0 | 24.0 | 14.0 |
1 | 9.0 | NaN | 6.0 | 41.0 |
2 | 1.0 | 54.0 | NaN | 7.0 |
3 | 78.0 | 30.0 | 13.0 | 25.0 |
4 | NaN | 12.0 | 83.0 | NaN |
In [11]:
df.median()
Out[11]:
P    22.0
Q    40.5
R    18.5
S    19.5
dtype: float64
Example 2: Finding median over column axis and using skipna parameter
In this example, the median is calculated along the column axis (axis=1), giving one median per row, and the skipna parameter is used to exclude the null values.
In [12]:
df.median(axis = 1, skipna = True)
Out[12]:
0    29.5
1     9.0
2     7.0
3    27.5
4    47.5
dtype: float64
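As a cross-check on the row medians above, the same values can be reproduced with NumPy's nanmedian(), which also ignores NaN values. This is a sketch for verification only; the variable names are assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"P": [35, 9, 1, 78, None],
                   "Q": [51, None, 54, 30, 12],
                   "R": [24, 6, None, 13, 83],
                   "S": [14, 41, 7, 25, None]})

# Row-wise median with pandas, skipping NaN values
row_medians = df.median(axis=1, skipna=True)

# The same computation on the underlying array with NumPy
manual = np.nanmedian(df.to_numpy(dtype=float), axis=1)

print(row_medians.tolist())  # [29.5, 9.0, 7.0, 27.5, 47.5]
```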
Pandas Mode : mode()
The mode function of pandas helps us in finding the mode of the values on the specified axis.
Syntax
pandas.DataFrame.mode(axis=0, numeric_only=False, dropna=True)
- axis : {index (0), columns (1)} – The axis to iterate over while searching for the mode.
- numeric_only : bool, default False – If True, only apply to numeric columns.
- dropna : bool, default True – Whether to ignore NaN/NaT values when counting.
Example 1: Finding mode using pandas mode() function
With the help of pandas mode function, we will find the mode of the dataframe.
In [13]:
df = pd.DataFrame([('Sedan', 80, 250), ('Hatchback', 90, 200), ('SUV', 80, 250), ('Sedan', 75, 150)], index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'), columns=('car_name', 'speed', 'weight'))
In [14]:
df.mode()
Out[14]:
| | car_name | speed | weight |
---|---|---|---|
0 | Sedan | 80 | 250 |
Example 2: Using dropna parameter of pandas mode()
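The dropna parameter controls whether NaN values are counted when searching for the mode. With dropna=False, NaN can itself be returned as the mode; with dropna=True (the default), NaN values are ignored.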
In [15]:
df_drop = pd.DataFrame([('Sedan', 80, np.nan), ('Hatchback', 90, 200), ('SUV', 80, np.nan), ('Sedan', 75, 150)], index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'), columns=('car_name', 'speed', 'weight'))
In [16]:
df_drop.mode(dropna=False)
Out[16]:
| | car_name | speed | weight |
---|---|---|---|
0 | Sedan | 80 | NaN |
In [17]:
df_drop.mode(dropna=True)
Out[17]:
| | car_name | speed | weight |
---|---|---|---|
0 | Sedan | 80.0 | 150.0 |
1 | NaN | NaN | 200.0 |
Conclusion
We have reached the end of this article, in which we covered three pandas statistical functions: mean(), median() and mode(). These functions help us understand the details of our data. We looked at the syntax of each function along with examples of how to use it.
Reference – https://pandas.pydata.org/docs/
You can use the following syntax to calculate the mode in a GroupBy object in pandas:
df.groupby(['group_var'])['value_var'].agg(pd.Series.mode)
The following example shows how to use this syntax in practice.
Example: Calculate Mode in a GroupBy Object
Suppose we have the following pandas DataFrame that shows the points scored by basketball players on various teams:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
'points': [10, 10, 12, 15, 19, 23, 20, 20, 26]})
#view DataFrame
print(df)
team points
0 A 10
1 A 10
2 A 12
3 A 15
4 B 19
5 B 23
6 C 20
7 C 20
8 C 26
We can use the following syntax to calculate the mode points value for each team:
#calculate mode points value for each team
df.groupby(['team'])['points'].agg(pd.Series.mode)
team
A 10
B [19, 23]
C 20
Name: points, dtype: object
Here’s how to interpret the output:
- The mode points value for team A is 10.
- The mode points values for team B are 19 and 23.
- The mode points value for team C is 20.
If one group happens to have multiple modes then you can use the following syntax to display each mode on a different row:
#calculate mode points value for each team
df.groupby(['team'])['points'].apply(pd.Series.mode)
team
A 0 10
B 0 19
1 23
C 0 20
Name: points, dtype: int64
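If a single value per group is preferred over a list or extra rows, one option is to keep only the first mode of each group. The sketch below is an assumption about what a reader might want, not part of the original example; first_mode is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [10, 10, 12, 15, 19, 23, 20, 20, 26]})

# mode() returns the modes of each group sorted in ascending order,
# so .iloc[0] keeps the smallest mode and yields one scalar per group
first_mode = df.groupby('team')['points'].agg(lambda s: s.mode().iloc[0])

print(first_mode)
```

For team B, which has the two modes 19 and 23, this keeps 19 and drops 23, so it should only be used when discarding tied modes is acceptable.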
Note: You can find the complete documentation for the GroupBy operation in the official pandas documentation.
Last update on August 19 2022 21:50:33 (UTC/GMT +8 hours)
DataFrame – mode() function
The mode() function is used to get the mode(s) of each element along the selected axis.
The mode of a set of values is the value that appears most often. It can be multiple values.
Syntax:
DataFrame.mode(axis=0, numeric_only=False, dropna=True)
Parameters:
Name | Description | Type/Default Value | Required / Optional |
---|---|---|---|
axis | The axis to iterate over while searching for the mode: 0 or ‘index’ gets the mode of each column; 1 or ‘columns’ gets the mode of each row. | {0 or ‘index’, 1 or ‘columns’} Default Value: 0 | Optional |
numeric_only | If True, only apply to numeric columns. | bool Default Value: False | Optional |
dropna | Don’t consider counts of NaN/NaT. | bool Default Value: True | Optional |
Returns: DataFrame
The modes of each column or row.
Example:
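The original example is not included here, so the following is a minimal sketch with hypothetical data (the column names x, y, z and their values are assumptions) showing how the axis parameter changes the result:

```python
import pandas as pd

# Hypothetical data, chosen so that each row has a single mode
df = pd.DataFrame({'x': [1, 1, 2],
                   'y': [1, 3, 2],
                   'z': [5, 3, 2]})

# axis=0 (default): modes of each column.
# Columns with fewer modes than others are padded with NaN.
col_modes = df.mode()

# axis=1: modes of each row
row_modes = df.mode(axis=1)

print(col_modes)
print(row_modes)
```

Here column x has the single mode 1, while y and z have three modes each because all of their values are distinct. Each row has exactly one mode, so row_modes has a single column.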