How to Find the Mode in NumPy

    In this article, we will discuss how to calculate the mode of a NumPy array.

    Mode refers to the most frequently occurring element in the array. We can find the mode of a NumPy array by using the following methods.

    Method 1: Using scipy.stats package

    Let us see the syntax of the mode() function 

    Syntax :

    variable = stats.mode(array_variable)

    Note : To apply mode() we first need an array. In Python, we can create one using the numpy package and then apply the mode() function to it. Let us see some examples for better understanding.

    Example 1:

    Applying on 1-D array

    Python3

    from scipy import stats as st

    import numpy as np

    abc = np.array([1, 1, 2, 2, 2, 3, 4, 5])

    print(st.mode(abc))

    Output :

    ModeResult(mode=array([2]), count=array([3]))
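The printed form of the result depends on the installed SciPy version: releases before 1.11 return one-element arrays as shown above, while 1.11 and later return plain scalars for a 1-D input by default. A minimal sketch of reading the result in a way that should work with either behavior, assuming SciPy is installed (the names mode_value and mode_count are illustrative, not part of the original example):

```python
import numpy as np
from scipy import stats

abc = np.array([1, 1, 2, 2, 2, 3, 4, 5])
result = stats.mode(abc)

# np.atleast_1d accepts both a scalar and a one-element array,
# so the same lines run on older and newer SciPy releases.
mode_value = int(np.atleast_1d(result.mode)[0])
mode_count = int(np.atleast_1d(result.count)[0])
print(mode_value, mode_count)  # 2 3
```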

    Example 2:

    Applying on a 2-D array

    Python3

    import numpy as np

    from scipy import stats as st

    arr = np.array([[1, 2, 3, 4, 5],

                    [1, 2, 2, 2, 2],

                    [4, 5, 7, 9, 4],

                    [6, 7, 8, 9, 2],

                    [2, 3, 4, 8, 6]])

    print(st.mode(arr))

    Output :

    ModeResult(mode=array([[1, 2, 2, 9, 2]]), count=array([[2, 2, 1, 2, 2]]))

    Method 2: Using Statistics module

    Like the NumPy module, the statistics module also contains statistical functions such as mean, median and mode. So let us see an example of finding the mode using the statistics module.

    Example :

    Python3

    import statistics as st

    import numpy as np

    arr1 = np.array([9, 8, 7, 6, 6, 6, 6, 5, 5, 4,

                     3, 2, 1, 1, 1, 1, 1, 1])

    print(st.mode(arr1))

    Output :

    1
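When several values share the highest count, statistics.mode() reports only one of them (in Python 3.8 and later, the first one encountered). A small sketch of how the standard library's statistics.multimode() can list all of them instead; it assumes Python 3.8 or newer, and the sample data is made up for illustration:

```python
import statistics

data = [9, 8, 7, 7, 6, 6, 5]

# mode() returns a single value; on a tie it is the first mode encountered
first_mode = statistics.mode(data)

# multimode() returns every value that shares the highest count
all_modes = statistics.multimode(data)
print(first_mode, all_modes)  # 7 [7, 6]
```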

    Method 3: Using user-defined Function

    Here we are not using any predefined functions to get the mode of a series. Let us see an example that demonstrates how to calculate the mode without predefined functions.

    Example :

    Python3

    lst = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 5, 5, 5]

    def mode(lst):

        freq = {}

        for i in lst:

            freq.setdefault(i, 0)

            freq[i] += 1

        hf = max(freq.values())

        hflst = []

        for i, j in freq.items():

            if j == hf:

                hflst.append(i)

        return hflst

    print(mode(lst))

    Output :

    [5]
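The same frequency-counting idea can be written more compactly with collections.Counter from the standard library. This is only an alternative sketch, not the article's method; the helper name mode_with_counter is hypothetical:

```python
from collections import Counter


def mode_with_counter(values):
    """Return all values tied for the highest frequency (hypothetical helper)."""
    freq = Counter(values)          # maps each value to its number of occurrences
    if not freq:
        return []                   # an empty input has no mode
    highest = max(freq.values())
    return [value for value, count in freq.items() if count == highest]


lst = [1, 2, 3, 4, 5, 6, 2, 3, 4, 5, 5, 5, 5]
print(mode_with_counter(lst))  # [5]
```

Like the function above, it returns a list so that ties are not silently dropped.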

    Last Updated :
    12 Jan, 2022

    Update

    The scipy.stats.mode function has been significantly optimized since this post, and would be the recommended method.

    Old answer

    This is a tricky problem, since there is not much out there to calculate mode along an axis. The solution is straightforward for 1-D arrays, where numpy.bincount is handy, along with numpy.unique with the return_counts arg as True. The most common n-dimensional function I see is scipy.stats.mode, although it is prohibitively slow, especially for large arrays with many unique values. As a solution, I've developed this function, and use it heavily:

    import numpy
    
    def mode(ndarray, axis=0):
        # Check inputs
        ndarray = numpy.asarray(ndarray)
        ndim = ndarray.ndim
        if ndarray.size == 1:
            return (ndarray[0], 1)
        elif ndarray.size == 0:
            raise Exception('Cannot compute mode on empty array')
        try:
            axis = range(ndarray.ndim)[axis]
        except:
            raise Exception('Axis "{}" incompatible with the {}-dimension array'.format(axis, ndim))
    
        # If array is 1-D and numpy version is >= 1.9 numpy.unique will suffice
        # (compare (major, minor) as a tuple so that versions such as 2.0 also pass)
        version = tuple(int(part) for part in numpy.__version__.split('.')[:2])
        if ndim == 1 and version >= (1, 9):
            modals, counts = numpy.unique(ndarray, return_counts=True)
            index = numpy.argmax(counts)
            return modals[index], counts[index]
    
        # Sort array
        sort = numpy.sort(ndarray, axis=axis)
        # Create array to transpose along the axis and get padding shape
        transpose = numpy.roll(numpy.arange(ndim)[::-1], axis)
        shape = list(sort.shape)
        shape[axis] = 1
        # Create a boolean array along strides of unique values
        strides = numpy.concatenate([numpy.zeros(shape=shape, dtype='bool'),
                                     numpy.diff(sort, axis=axis) == 0,
                                     numpy.zeros(shape=shape, dtype='bool')],
                                    axis=axis).transpose(transpose).ravel()
        # Count the stride lengths
        counts = numpy.cumsum(strides)
        counts[~strides] = numpy.concatenate([[0], numpy.diff(counts[~strides])])
        counts[strides] = 0
        # Get shape of padded counts and slice to return to the original shape
        shape = numpy.array(sort.shape)
        shape[axis] += 1
        shape = shape[transpose]
        slices = [slice(None)] * ndim
        slices[axis] = slice(1, None)
        # Reshape and compute final counts
        # (index with a tuple: recent NumPy releases reject a list of slices)
        counts = counts.reshape(shape).transpose(transpose)[tuple(slices)] + 1
    
        # Find maximum counts and return modals/counts
        slices = [slice(None, i) for i in sort.shape]
        del slices[axis]
        index = list(numpy.ogrid[tuple(slices)])
        index.insert(axis, numpy.argmax(counts, axis=axis))
        index = tuple(index)
        return sort[index], counts[index]
    

    Result:

    In [2]: a = numpy.array([[1, 3, 4, 2, 2, 7],
                             [5, 2, 2, 1, 4, 1],
                             [3, 3, 2, 2, 1, 1]])
    
    In [3]: mode(a)
    Out[3]: (array([1, 3, 2, 2, 1, 1]), array([1, 2, 2, 2, 1, 2]))
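For the 1-D case mentioned in the answer, numpy.bincount gives a compact route when the values are small non-negative integers. A minimal sketch under that assumption (the array and variable names are illustrative):

```python
import numpy

values = numpy.array([1, 1, 2, 2, 2, 3, 4, 5])

# bincount returns how often each integer 0..max(values) occurs
counts = numpy.bincount(values)
mode_value = int(counts.argmax())       # index of the largest count is the mode
mode_count = int(counts[mode_value])
print(mode_value, mode_count)  # 2 3
```

Because bincount allocates one slot per integer up to the maximum, it is a poor fit for negative, floating-point, or very large values; numpy.unique with return_counts=True covers those cases.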
    

    Some benchmarks:

    In [4]: import scipy.stats
    
    In [5]: a = numpy.random.randint(1,10,(1000,1000))
    
    In [6]: %timeit scipy.stats.mode(a)
    10 loops, best of 3: 41.6 ms per loop
    
    In [7]: %timeit mode(a)
    10 loops, best of 3: 46.7 ms per loop
    
    In [8]: a = numpy.random.randint(1,500,(1000,1000))
    
    In [9]: %timeit scipy.stats.mode(a)
    1 loops, best of 3: 1.01 s per loop
    
    In [10]: %timeit mode(a)
    10 loops, best of 3: 80 ms per loop
    
    In [11]: a = numpy.random.random((200,200))
    
    In [12]: %timeit scipy.stats.mode(a)
    1 loops, best of 3: 3.26 s per loop
    
    In [13]: %timeit mode(a)
    1000 loops, best of 3: 1.75 ms per loop
    

    EDIT: Provided more of a background and modified the approach to be more memory-efficient

    Numpy Mean, Numpy Median, Numpy Mode, Numpy Standard Deviation in Python

    Contents

    • 1 Introduction
      • 1.1 Importing Numpy Library
    • 2 Numpy Mean : np.mean()
      • 2.1 Syntax
      • 2.2 Example 1 : Basic example of np.mean() function
      • 2.3 Example 2 : Using ‘axis’ parameter of np.mean() function as ‘0’
      • 2.4 Example 3 : Using ‘axis’ parameter of np.mean() function as ‘1’
      • 2.5 Example 4: Striving for more accurate results
    • 3 Numpy Mode
      • 3.1 Syntax
      • 3.2 Example 1: Basic example of finding mode of numpy array
      • 3.3 Example 2 : Putting axis=None in scipy mode function
    • 4 Numpy Median : np.median()
      • 4.1 Syntax
      • 4.2 Example 1 : Basic example of np.median() function
      • 4.3 Example 2 : Using ‘axis’ parameter value as ‘0’
      • 4.4 Example 3 : Using ‘axis’ parameter value as ‘1’
    • 5 Numpy Standard Deviation : np.std()
      • 5.1 Syntax
      • 5.2 Example 1 : Basic example of np.std() function
      • 5.3 Example 2: Using axis parameter value as ‘0’
      • 5.4 Example 3: Using axis parameter value as ‘1’
    • 6 Conclusion

    Introduction

    While doing your data science or machine learning projects, you will often need to carry out some statistical operations. In this tutorial, we will cover the numpy statistical functions numpy mean, numpy mode, numpy median and numpy standard deviation. All of these statistical functions help in better understanding the data and in deciding what further actions should be taken on it.

    Importing Numpy Library

    We will start by importing the NumPy library under its conventional alias np (import numpy as np).

    Numpy Mean : np.mean()

    We begin this tutorial with the mean function.

    The numpy mean function is used for computing the arithmetic mean of the input values. Arithmetic mean is the sum of the elements along the axis divided by the number of elements.

    We will now look at the syntax of numpy.mean() or np.mean().

    Syntax

    numpy.mean(a, axis=some_value, dtype=some_value, out=some_value, keepdims=some_value)

    a : array-like – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.

    axis : None or int or tuple of ints (optional) – This consists of the axis or axes along which the means are computed.

    dtype : data-type (optional) – It is the type used in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.

    out : ndarray (optional) – This is the alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output

    keepdims : bool (optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array. If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray

    The numpy mean function returns the mean values: if out=None, a new array containing them is returned; otherwise a reference to the output array is returned.

    Example 1 : Basic example of np.mean() function

    Here we have used a multi-dimensional array to find the mean.

    In [2]:

    a = np.array([[7, 2], [5, 4]])
    a
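The call itself is not shown above. A minimal sketch of it, reusing the array from this example (the variable name overall_mean is illustrative):

```python
import numpy as np

a = np.array([[7, 2], [5, 4]])

# With no axis given, the mean is taken over all four elements: (7 + 2 + 5 + 4) / 4
overall_mean = np.mean(a)
print(overall_mean)  # 4.5
```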
    

    Example 2 : Using ‘axis’ parameter of np.mean() function as ‘0’

    In this example, we can see that when the axis value is ‘0’, then mean of 7 and 5 and then mean of 2 and 4 is calculated.
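A short sketch of the column-wise call described here, assuming the same array as in Example 1:

```python
import numpy as np

a = np.array([[7, 2], [5, 4]])

# axis=0 averages down each column: (7 + 5) / 2 and (2 + 4) / 2
column_means = np.mean(a, axis=0)
print(column_means)  # [6. 3.]
```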

    Example 3 : Using ‘axis’ parameter of np.mean() function as ‘1’

    When axis value is ‘1’, then mean of 7 and 2 and then mean of 5 and 4 is calculated.
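The row-wise counterpart can be sketched the same way, again assuming the array from Example 1:

```python
import numpy as np

a = np.array([[7, 2], [5, 4]])

# axis=1 averages across each row: (7 + 2) / 2 and (5 + 4) / 2
row_means = np.mean(a, axis=1)
print(row_means)  # [4.5 4.5]
```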

    Example 4: Striving for more accurate results

    Here we will look at how altering the dtype value helps in achieving more precise results.

    Finding the mean in single precision, i.e. using float32 as the dtype, is less accurate.

    First we create a 2-D array of zeros with shape (2, 512*512).

    In [6]:

    a = np.zeros((2, 512*512), dtype=np.float32)
    a
    

    Out[6]:

    array([[0., 0., 0., ..., 0., 0., 0.],
           [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

    We have used slicing to fill the values in the array in first row and all columns

    Out[7]:

    array([[1., 1., 1., ..., 1., 1., 1.],
           [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

    Again slicing is used to fill the values in the second row and all the columns onwards

    Out[8]:

    array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
           [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)

    Finding the mean with the dtype value set to float64 gives a more accurate answer.

    In [10]:

    np.mean(a, dtype=np.float64)
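The individual steps of this example can be put together as follows. This is a sketch that reconstructs the slicing assignments from the outputs shown above; the exact float32 result may vary slightly between NumPy versions, so no expected value is given for it:

```python
import numpy as np

a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0   # fill the first row, all columns
a[1, :] = 0.1   # fill the second row, all columns

mean32 = np.mean(a)                    # accumulates in float32 (the input dtype)
mean64 = np.mean(a, dtype=np.float64)  # accumulates in float64
print(mean32, mean64)
```

The float64 result lands very close to the exact value 0.55, while the float32 result is limited by single-precision accumulation.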
    

    The next statistical function which we’ll learn is mode for numpy array.

    Numpy Mode

    One thing to note is that NumPy has no built-in function for finding the mode. For this, we will use the scipy library. First we will create a numpy array and then we'll apply the scipy function to it.

    Syntax

    Now we will go over scipy mode function syntax and understand how it operates over a numpy array.

    scipy.stats.mode(a, axis=0, nan_policy='propagate')

    a : array-like – This consists of n-dimensional array of which we have to find mode(s).

    axis – int or None (optional) – This is the axis along which to operate. Default is 0. If None, the mode is computed over the whole array a.

    nan_policy – {'propagate', 'raise', 'omit'} (optional) – This defines how to handle input that contains nan. The default is 'propagate', which returns nan; 'raise' throws an error; and 'omit' performs the calculations ignoring nan values.

    As output, two different types of values are produced. First is the mode which is of ndarray type and it consists of array of modal values. The second is count which is again of ndarray type consisting of array of counts for each mode.

    Example 1: Basic example of finding mode of numpy array

    Here we are using default axis value as ‘0’.

    In [11]:

    a = np.array([[7, 1, 1, 7],
                 [9, 4, 3, 8],
                 [6, 1, 9, 7],
                 [9, 7, 2, 5],
                 [5, 1, 5, 9]])
    a
    

    Out[11]:

    array([[7, 1, 1, 7],
           [9, 4, 3, 8],
           [6, 1, 9, 7],
           [9, 7, 2, 5],
           [5, 1, 5, 9]])

    In this example, the mode is calculated over columns, which is why we get 4 values, one for each column. In the first column, 9 appears 2 times and is therefore the mode. Similarly, 1 is the mode of the second column and 7 is the mode of the last, i.e. fourth, column.

    In the third column every value occurs exactly once, so there is no single most frequent value. In that case the smallest value is reported as the mode, which is why 1 is the mode of the third column.

    In [12]:

    from scipy import stats
    stats.mode(a)
    

    Out[12]:

    ModeResult(mode=array([[9, 1, 1, 7]]), count=array([[2, 3, 1, 2]]))

    Example 2 : Putting axis=None in scipy mode function

    When axis is set to None in the scipy mode function, the mode is calculated over the complete array. Here 1 is reported as the mode with a count of 4; 7 and 9 also occur 4 times, and the smallest of the tied values is the one returned.

    Out[13]:

    ModeResult(mode=array([1]), count=array([4]))
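The whole-array result can be cross-checked without SciPy by counting values with np.unique. A minimal sketch, assuming the array from Example 1 (the names highest and tied are illustrative):

```python
import numpy as np

a = np.array([[7, 1, 1, 7],
              [9, 4, 3, 8],
              [6, 1, 9, 7],
              [9, 7, 2, 5],
              [5, 1, 5, 9]])

# np.unique flattens its input and returns the sorted unique values with their counts
vals, counts = np.unique(a, return_counts=True)
highest = int(counts.max())
tied = vals[counts == highest].tolist()
print(tied, highest)  # [1, 7, 9] 4
```

Three values share the highest count of 4, and the smallest of them, 1, matches the mode reported above.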

    Continuing our statistical operations tutorial, we will now look at the numpy median function.

    Numpy Median : np.median()

    The numpy median function helps in finding the middle value of a sorted array.

    Syntax

    numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)

    a : array-like – Input array or object that can be converted to an array, values of this array will be used for finding the median.

    axis : int or sequence of int or None (optional) – Axis or axes along which the medians are computed. The default is to compute the median along a flattened version of the array.

    out : ndarray (optional) – This is the alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output

    overwrite_input : bool (optional) – If True, then allow use of memory of input array a for calculations. The default value is false.

    keepdims – bool (optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

    Numpy median function returns a new array holding the result. If the input contains integers or floats smaller than float64, then the output data-type is np.float64. Otherwise, the data-type of the output is the same as that of the input.

    Example 1 : Basic example of np.median() function

    When we use the default axis value for the numpy median function, the median is computed over the flattened array. The array below is flattened to 1-D and sorted, so it looks like this: [1,5,6,7,8,9]. The final result is therefore 6.5, the average of the two middle values 6 and 7.

    In [14]:

    a = np.array([[5, 8, 1], [7, 9, 6]])
    a
    

    Out[14]:

    array([[5, 8, 1],
           [7, 9, 6]])
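The median call for this example is not shown above. A minimal sketch of it (the variable name overall_median is illustrative):

```python
import numpy as np

a = np.array([[5, 8, 1], [7, 9, 6]])

# With the default axis=None the array is flattened first: sorted it is [1, 5, 6, 7, 8, 9]
overall_median = np.median(a)
print(overall_median)  # 6.5
```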

    Example 2 : Using ‘axis’ parameter value as ‘0’

    Here, with axis = 0 the median results are of pairs 5 and 7, 8 and 9 and 1 and 6.
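A short sketch of the column-wise median, assuming the array from Example 1:

```python
import numpy as np

a = np.array([[5, 8, 1], [7, 9, 6]])

# axis=0 takes the median of each column: (5, 7), (8, 9) and (1, 6)
column_medians = np.median(a, axis=0)
print(column_medians)  # [6.  8.5 3.5]
```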

    Example 3 : Using ‘axis’ parameter value as ‘1’

    For axis=1, the median values are obtained through 2 different arrays i.e. [1,5,8] and [6,7,9].
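The row-wise median can be sketched in the same way, again assuming the array from Example 1:

```python
import numpy as np

a = np.array([[5, 8, 1], [7, 9, 6]])

# axis=1 takes the median of each row: sorted rows are [1, 5, 8] and [6, 7, 9]
row_medians = np.median(a, axis=1)
print(row_medians)  # [5. 7.]
```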

    The last statistical function which we’ll cover in this tutorial is standard deviation.

    Numpy Standard Deviation : np.std()

    Numpy standard deviation function is useful in finding the spread of a distribution of array values. Let's look at the syntax of numpy.std() to understand its parameters.

    Syntax

    numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=some_value)

    a : array-like – Input array or object that can be converted to an array; the values of this array are used to compute the standard deviation.

    axis : int or sequence of int or None (optional) – Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.

    out : ndarray (optional) – Alternative output array in which to place the result. It must have the same shape as the expected output.

    ddof : int (optional) – This means delta degrees of freedom. The divisor used in calculations is N – ddof, where N represents the number of elements. By default ddof is zero.

    keepdims – bool (optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.

    np.std() returns the standard deviation as a new array if the out parameter is None; otherwise it returns a reference to the output array.

    Example 1 : Basic example of np.std() function

    In this example, we are using 2-dimensional arrays for finding standard deviation. Here the default value of axis is used, due to this the multidimensional array is converted to flattened array.

    In [19]:

    a = np.array([[7, 9], [8, 4]])
    a
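The np.std() call for this example is not shown above. A minimal sketch of it, which also illustrates the ddof parameter described in the syntax section (variable names are illustrative):

```python
import numpy as np

a = np.array([[7, 9], [8, 4]])

# Default: flattened array, divisor N (ddof=0). Mean is 7, squared deviations sum to 14.
population_std = np.std(a)      # sqrt(14 / 4)

# ddof=1 switches the divisor to N - 1
sample_std = np.std(a, ddof=1)  # sqrt(14 / 3)

print(round(float(population_std), 4), round(float(sample_std), 4))  # 1.8708 2.1602
```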
    

    Example 2: Using axis parameter value as ‘0’

    Here the standard deviation is calculated column-wise. So the pairs created are 7 and 8 and 9 and 4.
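A short sketch of the column-wise call, assuming the array from Example 1:

```python
import numpy as np

a = np.array([[7, 9], [8, 4]])

# axis=0 computes the standard deviation of each column: (7, 8) and (9, 4)
column_std = np.std(a, axis=0)
print(column_std)  # [0.5 2.5]
```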

    Example 3: Using axis parameter value as ‘1’

    Here the standard deviation is calculated row-wise. So the pairs created are 7 and 9 and 8 and 4.
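The row-wise call can be sketched the same way, again assuming the array from Example 1:

```python
import numpy as np

a = np.array([[7, 9], [8, 4]])

# axis=1 computes the standard deviation of each row: (7, 9) and (8, 4)
row_std = np.std(a, axis=1)
print(row_std)  # [1. 2.]
```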

    NumPy is a widely used Python package for numerical computation. It has many functions for array creation and manipulation. In this tutorial, you will learn how to find the mode of a NumPy array in Python using various examples.

    What is a Mode?

    The mode is the value that occurs most often in a dataset. A dataset can have one mode, more than one mode, or no mode at all.

    In this section, you will see various examples of how to find the mode of an array. You can use your own numeric datasets, but for simplicity I am finding the mode of a sample NumPy array. Make sure NumPy is properly installed on your system.

    Example 1: Find mode on 1 D Numpy array

    In this example, I will find mode on a single-dimensional NumPy array. First I will create a Single dimension NumPy array and then import the mode() function from scipy. Execute the below lines of code to calculate the mode of 1d array.

    import numpy as np
    from scipy import stats
    array_1d = np.array([1,2,3,2,4,5,5,5])
    print(stats.mode(array_1d))

    Output

    [Output image: Mode of a Single Dimensional Numpy array]

    Here you can see the occurrence of 5 is more than any other elements. That’s why this array has mode  5.

    Example 2: Finding mode on 2 D Numpy array

    In the next example, I will create a two-dimensional NumPy array and use the stats.mode() method on that array. There are two ways you can find the mode of a 2D NumPy array. One is finding the mode of each column and the other is finding the mode of the entire array. Let's explore each of them.

    Finding mode column-wise

    To find the mode of each column, set the axis to zero, which is also the default. The result is an array of modes, one per column. Run the below lines of code and see the output.

    import numpy as np
    from scipy import stats
    array_2d = np.array([[1,2,3],[1,5,6],[1,2,10]])
    print(stats.mode(array_2d))

    Output

    [Output image: Mode of a Multi Dimensional Numpy array]

    Finding Overall Mode

    In the same way, you can find mode for the entire array. To do so you have to set the axis value as None. Just execute the below lines of code and see the output.

    import numpy as np
    from scipy import stats
    array_2d = np.array([[1,2,3],[1,5,6],[1,2,10]])
    print(stats.mode(array_2d,axis=None))

    Output

    [Output image: Finding overall mode of a Multi Dimensional Numpy array]

    Conclusion

    The mode is a useful measure of central tendency, alongside the mean and the median. These are basic examples of finding the mode of an array in Python.

    Source:

    Scipy Documentation


    You can use the following basic syntax to find the mode of a NumPy array:

    #find unique values in array along with their counts
    vals, counts = np.unique(array_name, return_counts=True)
    
    #find mode
    mode_value = np.argwhere(counts == np.max(counts))

    Recall that the mode is the value that occurs most often in an array.

    Note that it’s possible for an array to have one mode or multiple modes.

    The following examples show how to use this syntax in practice.

    Example 1: Calculating Mode of NumPy Array with Only One Mode

    The following code shows how to find the mode of a NumPy array in which there is only one mode:

    import numpy as np
    
    #create NumPy array of values with only one mode
    x = np.array([2, 2, 2, 3, 4, 4, 5, 5, 5, 5, 7])
    
    #find unique values in array along with their counts
    vals, counts = np.unique(x, return_counts=True)
    
    #find mode
    mode_value = np.argwhere(counts == np.max(counts))
    
    #print list of modes
    print(vals[mode_value].flatten().tolist())
    
    [5]
    
    #find how often mode occurs
    print(np.max(counts))
    
    4

    From the output we can see that the mode is 5 and it occurs 4 times in the NumPy array.

    Example 2: Calculating Mode of NumPy Array with Multiple Modes

    The following code shows how to find the mode of a NumPy array in which there are multiple modes:

    import numpy as np
    
    #create NumPy array of values with multiple modes
    x = np.array([2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 7])
    
    #find unique values in array along with their counts
    vals, counts = np.unique(x, return_counts=True)
    
    #find mode
    mode_value = np.argwhere(counts == np.max(counts))
    
    #print list of modes
    print(vals[mode_value].flatten().tolist())
    
    [2, 4, 5]
    
    #find how often mode occurs
    print(np.max(counts))
    
    3

    From the output we can see that this NumPy array has three modes: 2, 4, and 5.

    We can also see that each of these values occurs 3 times in the array.
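The two examples above follow the same steps, so they can be wrapped in a small reusable function. This is only a sketch; the helper name find_modes is hypothetical and not part of NumPy:

```python
import numpy as np


def find_modes(values):
    """Return (list of modes, count) for a non-empty array-like (hypothetical helper)."""
    vals, counts = np.unique(np.asarray(values), return_counts=True)
    highest = counts.max()                      # raises ValueError on empty input
    return vals[counts == highest].tolist(), int(highest)


print(find_modes([2, 2, 2, 3, 4, 4, 5, 5, 5, 5, 7]))  # ([5], 4)
print(find_modes([2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 7]))  # ([2, 4, 5], 3)
```

A boolean mask (counts == highest) is used here in place of np.argwhere; both select the same values.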
