How to Find the Mode in pandas


You can use the following functions to calculate the mean, median, and mode of each numeric column in a pandas DataFrame:

print(df.mean(numeric_only=True))
print(df.median(numeric_only=True))
print(df.mode(numeric_only=True))

The following example shows how to use these functions in practice.

Suppose we have the following pandas DataFrame that contains information about points scored by various basketball players in four different games:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})
                   
#view DataFrame
print(df)

  player  game1  game2  game3  game4
0      A     18      5     11      9
1      B     22      7      8      8
2      C     19      7     10     10
3      D     14      9      6      9
4      E     14     12      6     14
5      F     11      9      5     15
6      G     20      9      9     10
7      H     28      4     12     11

We can use the following syntax to calculate the mean value of each numeric column:

#calculate mean of each numeric column
print(df.mean(numeric_only=True))

game1    18.250
game2     7.750
game3     8.375
game4    10.750
dtype: float64

From the output we can see:

  • The mean value in the game1 column is 18.25.
  • The mean value in the game2 column is 7.75.
  • The mean value in the game3 column is 8.375.
  • The mean value in the game4 column is 10.75.

We can then use the following syntax to calculate the median value of each numeric column:

#calculate median of each numeric column
print(df.median(numeric_only=True))

game1    18.5
game2     8.0
game3     8.5
game4    10.0
dtype: float64

From the output we can see:

  • The median value in the game1 column is 18.5.
  • The median value in the game2 column is 8.
  • The median value in the game3 column is 8.5.
  • The median value in the game4 column is 10.

We can then use the following syntax to calculate the mode of each numeric column:

#calculate mode of each numeric column
print(df.mode(numeric_only=True))

   game1  game2  game3  game4
0   14.0    9.0    6.0      9
1    NaN    NaN    NaN     10

From the output we can see:

  • The mode in the game1 column is 14.
  • The mode in the game2 column is 9.
  • The mode in the game3 column is 6.
  • The modes in the game4 column are 9 and 10.

Note that the game4 column had two modes since there were two values that occurred most frequently in that column.

Note: You can also use the describe() function in pandas to generate more descriptive statistics for each column.
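
As a minimal sketch of the describe() note above, the following rebuilds the same basketball DataFrame and summarizes its numeric columns. The variable name summary is only illustrative. Note that describe() reports the mean and the median (the 50% row) but not the mode for numeric columns.

```python
import pandas as pd

#create the same DataFrame used in the example above
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'game1': [18, 22, 19, 14, 14, 11, 20, 28],
                   'game2': [5, 7, 7, 9, 12, 9, 9, 4],
                   'game3': [11, 8, 10, 6, 6, 5, 9, 12],
                   'game4': [9, 8, 10, 9, 14, 15, 10, 11]})

#summarize the numeric columns (count, mean, std, min, quartiles, max)
summary = df.describe()

#the 'mean' row matches df.mean() and the '50%' row matches df.median()
print(summary.loc[['mean', '50%']])
```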

To find the modes of the columns in a DataFrame, or the mode value of a Series in pandas, the easiest way is to use the pandas mode() function.

df.mode()

When working with data, we often want to calculate summary statistics to understand our data better. One such statistic is the mode, the value that occurs most often for a given variable.

Finding the mode in a column, or the mode for all columns or rows in a DataFrame using pandas is easy. We can use the pandas mode() function to find the mode value of columns in a DataFrame.

The pandas mode() function works for both numeric and object dtypes.

Let’s say we have the following DataFrame.

df = pd.DataFrame({'Age': [43,23,43,49,71,37], 
      'Test_Score':[90,87,96,96,87,79]})

print(df)
# Output: 
   Age  Test_Score
0   43          90
1   23          87
2   43          96
3   49          96
4   71          87
5   37          79

To get the modes for all columns, we can call the pandas mode() function.

print(df.mode())

# Output:
    Age  Test_Score
0  43.0          87
1   NaN          96

There is one mode for “Age” and two modes for “Test_Score”.
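
The NaN in the "Age" column is only padding: df.mode() returns as many rows as the column with the most modes, and shorter columns are filled with NaN. One possible way to get the modes without the padding is to call mode() on each column separately. This is a sketch, and the dictionary name modes is an assumption used for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [43, 23, 43, 49, 71, 37],
                   'Test_Score': [90, 87, 96, 96, 87, 79]})

#collect the modes of each column as a plain list, without NaN padding
modes = {col: df[col].mode().tolist() for col in df.columns}

print(modes)
# Output:
# {'Age': [43], 'Test_Score': [87, 96]}
```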

If we only want to get the mode of one column, we can do this using the pandas mode() function in the following Python code:

print(df["Test_Score"].mode())

# Output:
0    87
1    96
dtype: int64

Find the Mode of a Column with Object dtype in pandas

The mode() function works for both numeric and object dtypes.

Let’s say I have the following pandas DataFrame:

     Name  Weight_Change Month
0     Jim         -16.20     1
1   Sally          12.81     1
2     Bob         -20.45     1
3     Sue          15.35     1
4    Jill         -12.43     1
5   Larry         -18.52     1
6     Pam          -6.10     2   
7   Sally          -2.81     2  
8    Rose          12.45     2
9     Pat          -0.32     2
10   Jill          -1.23     2
11  Larry          -8.52     2
12    Jim           5.20     3 
13    Rob          12.81     3  
14    Bob          -2.45     3
15 Herman           5.35     3
16   Jill          -2.43     3
17  Billy          -1.85     3

We can use the mode() function to see who appears in our DataFrame the most by calling it on the “Name” column.

print(df["Name"].mode())

#Output:
0    Jill
dtype: object
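
The code that builds this DataFrame is not shown above, so the sketch below recreates only the "Name" column as a Series (an assumption made for illustration) and uses value_counts() to confirm why "Jill" is the mode.

```python
import pandas as pd

#recreate only the Name column from the table above
names = pd.Series(['Jim', 'Sally', 'Bob', 'Sue', 'Jill', 'Larry',
                   'Pam', 'Sally', 'Rose', 'Pat', 'Jill', 'Larry',
                   'Jim', 'Rob', 'Bob', 'Herman', 'Jill', 'Billy'],
                  name='Name')

#count how often each name appears
counts = names.value_counts()
print(counts['Jill'])

#the mode is the name with the highest count
print(names.mode().tolist())
```

Here "Jill" appears three times, while every other name appears at most twice.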

Hopefully this article has been helpful for you to understand how to find the mode of a Series or DataFrame in pandas.

Pandas Statistical Functions Part-1 - mean(), median(), and mode()

Contents

  • 1 Introduction
      • 1.0.1 Importing Pandas Library
    • 1.1 Pandas Mean : mean()
      • 1.1.1 Syntax
      • 1.1.2 Example 1: Simple example of Pandas Mean() function
      • 1.1.3 Example 2: Using skipna parameter of Pandas Mean() function
    • 1.2 Pandas Median : median()
      • 1.2.1 Syntax
      • 1.2.2 Example 1: Finding median using pandas median() function
      • 1.2.3 Example 2: Finding median over column axis and using skipna parameter
    • 1.3 Pandas Mode : mode()
      • 1.3.1 Syntax
      • 1.3.2 Example 1: Finding mode using pandas mode() function
      • 1.3.3 Example 2: Using dropna parameter of pandas mode()
    • 1.4 Conclusion

Introduction

In this article, we cover pandas functions for statistical analysis, one of the most important topics in data science. These statistics help us understand the details of our data. The pandas functions covered here are mean(), median(), and mode().

Importing Pandas Library

First we import the pandas library, along with NumPy, which is used later for np.nan.

In [1]:

import pandas as pd
import numpy as np

Pandas Mean : mean()

The mean function of pandas helps us in finding the mean of the values on the specified axis.

Syntax

pandas.DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

  • axis : {index (0), columns (1)} – This is the axis where the function is applied.
  • skipna : bool, default True – This is used for deciding whether to exclude NA/Null values or not.
  • level : int or level name, default None – This parameter is generally used when we are dealing with a MultiIndex dataframe. It was deprecated in pandas 1.3 and removed in pandas 2.0.
  • numeric_only : bool, default None – This is used for deciding whether to include only float, int, boolean columns.
  • **kwargs – Additional keyword arguments passed to the function.

Example 1: Simple example of Pandas Mean() function

In this example, the mean can be calculated over columns or rows.

In [2]:

df = pd.DataFrame({"P":[35, 9, 1, 78, 19], 
                   "Q":[51, 45, 54, 30, 12],  
                   "R":[24, 6, 75, 13, 83], 
                   "S":[14, 41, 7, 25, 67]})   

Out[3]:

    P   Q   R   S
0  35  51  24  14
1   9  45   6  41
2   1  54  75   7
3  78  30  13  25
4  19  12  83  67

By default (axis=0), the mean is computed for each column, so calling df.mean() with no arguments returns one value per column.

Out[4]:

P    28.4
Q    38.4
R    40.2
S    30.8
dtype: float64

In the instances below, the axis parameter is passed to the mean function explicitly: axis=0 reproduces the per-column means, while axis=1 returns the mean of each row.

Out[5]:

P    28.4
Q    38.4
R    40.2
S    30.8
dtype: float64

Out[6]:

0    31.00
1    25.25
2    34.25
3    36.50
4    45.25
dtype: float64
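
The cells that produced the outputs above are not shown. The following is a sketch of calls that should reproduce them, assuming the same DataFrame as in cell In [2]. The names col_means and row_means are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"P": [35, 9, 1, 78, 19],
                   "Q": [51, 45, 54, 30, 12],
                   "R": [24, 6, 75, 13, 83],
                   "S": [14, 41, 7, 25, 67]})

# default axis=0: one mean per column (P, Q, R, S)
col_means = df.mean()

# axis=1: one mean per row (index 0 to 4)
row_means = df.mean(axis=1)

print(col_means)
print(row_means)
```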

Example 2: Using skipna parameter of Pandas Mean() function

When a dataframe contains null/NaN values, the skipna parameter controls how they are handled. With skipna=True (the default), those values are skipped and the mean is computed from the remaining values.

In [7]:

df = pd.DataFrame({"P":[35, 9, 1, 78, None], 
                   "Q":[51, None, 54, 30, 12],  
                   "R":[24, 6, None, 13, 83], 
                   "S":[14, 41, 7, 25, None]})   

Out[8]:

      P     Q     R     S
0  35.0  51.0  24.0  14.0
1   9.0   NaN   6.0  41.0
2   1.0  54.0   NaN   7.0
3  78.0  30.0  13.0  25.0
4   NaN  12.0  83.0   NaN

In [9]:

df.mean(axis = 1, skipna = True) 

Out[9]:

0    31.000000
1    18.666667
2    20.666667
3    36.500000
4    47.500000
dtype: float64
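
For contrast, here is a small sketch of what happens with skipna=False on the same data: any row that contains a NaN gets NaN as its mean. The variable name with_nan is an assumption used for illustration.

```python
import pandas as pd

df = pd.DataFrame({"P": [35, 9, 1, 78, None],
                   "Q": [51, None, 54, 30, 12],
                   "R": [24, 6, None, 13, 83],
                   "S": [14, 41, 7, 25, None]})

# skipna=False: a single NaN in a row makes that row's mean NaN
with_nan = df.mean(axis=1, skipna=False)

print(with_nan)
```

Only rows 0 and 3 have no missing values, so only they keep a numeric mean (31.0 and 36.5).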

Pandas Median : median()

The median function of pandas helps us in finding the median of the values on the specified axis.

Syntax

pandas.DataFrame.median(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

  • axis : {index (0), columns (1)} – This is the axis where the function is applied.
  • skipna : bool, default True – This is used for deciding whether to exclude NA/Null values or not.
  • level : int or level name, default None – This parameter is generally used when we are dealing with a MultiIndex dataframe. It was deprecated in pandas 1.3 and removed in pandas 2.0.
  • numeric_only : bool, default None – This is used for deciding whether to include only float, int, boolean columns.
  • **kwargs – Additional keyword arguments passed to the function.

Example 1: Finding median using pandas median() function

We reuse the dataframe from the previous example. Calling df.median() with no arguments returns the median of each column, skipping NaN values by default.

Out[10]:

      P     Q     R     S
0  35.0  51.0  24.0  14.0
1   9.0   NaN   6.0  41.0
2   1.0  54.0   NaN   7.0
3  78.0  30.0  13.0  25.0
4   NaN  12.0  83.0   NaN

Out[11]:

P    22.0
Q    40.5
R    18.5
S    19.5
dtype: float64

Example 2: Finding median over column axis and using skipna parameter

In this example, the median is calculated across the columns (axis=1), which gives one median per row, and the skipna parameter is used to exclude the NaN values.

In [12]:

df.median(axis = 1, skipna = True) 

Out[12]:

0    29.5
1     9.0
2     7.0
3    27.5
4    47.5
dtype: float64

Pandas Mode : mode()

The mode function of pandas helps us in finding the mode of the values on the specified axis.

Syntax

pandas.DataFrame.mode(axis=0, numeric_only=False, dropna=True)

  • axis : {index (0), columns (1)}, default 0 – This is the axis to iterate over while searching for the mode.
  • numeric_only : bool, default False – If True, only numeric columns are considered.
  • dropna : bool, default True – This is used for deciding whether to ignore NaN/NaT values when counting.

Example 1: Finding mode using pandas mode() function

With the help of the pandas mode function, we will find the mode of each column of the dataframe by calling df.mode().

In [13]:

df = pd.DataFrame([('Sedan', 80, 250),
                    ('Hatchback', 90, 200),
                    ('SUV', 80, 250),
                    ('Sedan', 75, 150)],
                   index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'),
                   columns=('car_name', 'speed', 'weight'))

Out[14]:

  car_name  speed  weight
0    Sedan     80     250
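
The cell that produced Out[14] is not shown. A sketch that should reproduce it, assuming the DataFrame from cell In [13], looks like this (the name modes is illustrative):

```python
import pandas as pd

df = pd.DataFrame([('Sedan', 80, 250),
                   ('Hatchback', 90, 200),
                   ('SUV', 80, 250),
                   ('Sedan', 75, 150)],
                  index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'),
                  columns=('car_name', 'speed', 'weight'))

# each column has exactly one most frequent value, so the result has one row
modes = df.mode()

print(modes)
```

The result is indexed from 0 rather than by the original car labels, because the rows of the result are modes, not the original rows.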

Example 2: Using dropna parameter of pandas mode()

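This example uses the dropna parameter of the pandas mode() function. With dropna=False, NaN is counted like any other value and can itself be the mode. With dropna=True (the default), NaN values are ignored.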

In [15]:

df_drop = pd.DataFrame([('Sedan', 80, np.nan),
                    ('Hatchback', 90, 200),
                    ('SUV', 80, np.nan),
                    ('Sedan', 75, 150)],
                   index=('BMW', 'Mercedes', 'Jaguar', 'Bentley'),
                   columns=('car_name', 'speed', 'weight'))

In [16]:

df_drop.mode(dropna=False)

Out[16]:

  car_name  speed  weight
0    Sedan     80     NaN

In [17]:

df_drop.mode(dropna=True)

Out[17]:

  car_name  speed  weight
0    Sedan   80.0   150.0
1      NaN    NaN   200.0

Conclusion

We have reached the end of this article, in which we covered the pandas statistical functions mean(), median(), and mode(). These functions help in understanding the details of our data. We looked at the syntax of each function and worked through examples of their usage.

Reference – https://pandas.pydata.org/docs/


You can use the following syntax to calculate the mode in a GroupBy object in pandas:

df.groupby(['group_var'])['value_var'].agg(pd.Series.mode)

The following example shows how to use this syntax in practice.

Example: Calculate Mode in a GroupBy Object

Suppose we have the following pandas DataFrame that shows the points scored by basketball players on various teams:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'points': [10, 10, 12, 15, 19, 23, 20, 20, 26]})

#view DataFrame
print(df)

  team  points
0    A      10
1    A      10
2    A      12
3    A      15
4    B      19
5    B      23
6    C      20
7    C      20
8    C      26

We can use the following syntax to calculate the mode points value for each team:

#calculate mode points value for each team
df.groupby(['team'])['points'].agg(pd.Series.mode)

team
A          10
B    [19, 23]
C          20
Name: points, dtype: object

Here’s how to interpret the output:

  • The mode points value for team A is 10.
  • The mode points values for team B are 19 and 23.
  • The mode points value for team C is 20.

If one group happens to have multiple modes then you can use the following syntax to display each mode on a different row:

#calculate mode points value for each team
df.groupby(['team'])['points'].apply(pd.Series.mode)

team   
A     0    10
B     0    19
      1    23
C     0    20
Name: points, dtype: int64

Note: You can find the complete documentation for the GroupBy operation in the official pandas documentation.

Last update on August 19 2022 21:50:33 (UTC/GMT +8 hours)

DataFrame – mode() function

The mode() function is used to get the mode(s) of each element along the selected axis.

The mode of a set of values is the value that appears most often. It can be multiple values.

Syntax:

DataFrame.mode(self, axis=0, numeric_only=False, dropna=True)

Parameters:

  • axis : {0 or ‘index’, 1 or ‘columns’}, default 0, optional – The axis to iterate over while searching for the mode:
      • 0 or ‘index’ : get mode of each column
      • 1 or ‘columns’ : get mode of each row
  • numeric_only : bool, default False, optional – If True, only apply to numeric columns.
  • dropna : bool, default True, optional – Don’t consider counts of NaN/NaT.

Returns: DataFrame
The modes of each column or row.

Example:
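
The original example is not included here, so the following is a small sketch of the three parameters in use. The DataFrame and its column names (x, y, label) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 2, 3],
                   'y': [2, 2, 5, 3],
                   'label': ['a', 'b', 'b', 'c']})

# axis=0 (default): modes of each column, including the object column
col_modes = df.mode()

# numeric_only=True: the 'label' column is left out
num_modes = df.mode(numeric_only=True)

# axis=1: modes of each row, computed here on the numeric columns only
row_modes = df[['x', 'y']].mode(axis=1)

print(col_modes)
print(num_modes)
print(row_modes)
```

With axis=1 the result has one row per original row and as many columns as the largest number of modes found in any row. Rows with fewer modes are padded with NaN.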
