How to find the median of a list in Python

Sometimes, while working with a Python list, we need to find its median. This problem is quite common in mathematical domains and general calculations. Let’s discuss several ways to perform this task.
 Method #1 : Using sort() + the “~” operator. This task can be performed in a brute-force manner by combining these two features. We sort the list and then use the “~” operator (bitwise NOT, where ~i equals -i - 1) to index the list from the front and from the rear, averaging the two elements to obtain the median.

Python3

test_list = [4, 5, 8, 9, 10, 17]

print("The original list : " + str(test_list))

test_list.sort()

mid = len(test_list) // 2

res = (test_list[mid] + test_list[~mid]) / 2

print("Median of list is : " + str(res))

Output

The original list : [4, 5, 8, 9, 10, 17]
Median of list is : 8.5

Time Complexity: O(n log n), where n is the number of elements in the list “test_list”. The sort dominates; the index arithmetic with the “~” operator is O(1).
Auxiliary Space: no new list is created, because sort() works in place (the sort itself may use temporary space internally).
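To see why `~mid` picks the right partner element, recall that `~i` equals `-i - 1`, so `data[~i]` is the element that mirrors `data[i]` from the other end of the list. A minimal sketch; the helper name `median_tilde` is ours and is not part of the article:

```python
def median_tilde(values):
    """Median via the ~ trick; works on a sorted copy of the input."""
    data = sorted(values)
    mid = len(data) // 2
    # ~mid == -mid - 1, so data[~mid] mirrors data[mid] from the other end.
    # Odd length: both indices hit the same middle element.
    # Even length: they hit the two middle elements.
    return (data[mid] + data[~mid]) / 2


print(~2)                                  # -3
print(median_tilde([4, 5, 8, 9, 10, 17]))  # 8.5
print(median_tilde([3, 1, 2]))             # 2.0
```

Because of the final division, this variant always returns a float, and it raises IndexError on an empty list.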

  Method #2 : Using statistics.median(). This is the most general way to perform this task: we call the standard-library function directly to compute the median of the list. 

Python3

import statistics

test_list = [4, 5, 8, 9, 10, 17]

print("The original list : " + str(test_list))

res = statistics.median(test_list)

print("Median of list is : " + str(res))

Output

The original list : [4, 5, 8, 9, 10, 17]
Median of list is : 8.5

Method #3 : Using heapq.nlargest() or heapq.nsmallest()

Explanation: Using Python’s heapq module, we can use the nlargest() and nsmallest() functions to find the median of a list of numbers without sorting the whole list. Note that each call still builds a list of about half the elements, so the memory saving over a full sort is limited.

Python3

import heapq

test_list = [4, 5, 8, 9, 10, 17]

print("The original list : " + str(test_list))

mid = len(test_list) // 2

if len(test_list) % 2 == 0:

    res = (heapq.nlargest(mid, test_list)[-1] + heapq.nsmallest(mid, test_list)[-1]) / 2

else:

    res = heapq.nlargest(mid+1, test_list)[-1]

print("Median of list is : " + str(res))

Output

The original list : [4, 5, 8, 9, 10, 17]
Median of list is : 8.5

Time complexity: O(n log(k)) where k = len(test_list)/2
Auxiliary Space: O(k) where k = len(test_list)/2
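The branching above can be wrapped in a function and checked against statistics.median(). A minimal sketch; the name `median_heapq` and the ValueError for empty input are our own choices, not part of the article:

```python
import heapq
import statistics


def median_heapq(values):
    """Median via heapq.nlargest / heapq.nsmallest (hypothetical helper)."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 0:
        upper = heapq.nlargest(mid, values)[-1]   # smallest of the top half
        lower = heapq.nsmallest(mid, values)[-1]  # largest of the bottom half
        return (upper + lower) / 2
    return heapq.nlargest(mid + 1, values)[-1]


for sample in ([4, 5, 8, 9, 10, 17], [7, 1, 3], [2, 2, 2, 2]):
    assert median_heapq(sample) == statistics.median(sample)
```

Unlike the sort-based methods, this version never reorders the input list.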

Method #4 : Sorting the list and indexing the middle

Python3

test_list = [4, 5, 8, 9, 10, 17]

print("The original list : " + str(test_list))

test_list.sort()

n = len(test_list)

if n % 2 == 0:

    median = (test_list[n//2 - 1] + test_list[n//2]) / 2

else:

    median = test_list[n//2]

print("Median of list is : " + str(median))

Output

The original list : [4, 5, 8, 9, 10, 17]
Median of list is : 8.5

Time complexity: O(n log n)
Auxiliary Space: O(n) 

Last Updated :
12 Apr, 2023


How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.

If the list contains an even number of elements, the function should return the average of the middle two.

Here are some examples (sorted for display purposes):

median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2
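These examples can double as a small test harness for any candidate `median` function. A minimal sketch, assuming Python 3.4+ so that statistics.median is available as the function under test:

```python
from statistics import median

# (input, expected) pairs copied from the examples above
cases = [
    ([1], 1),
    ([1, 1], 1),
    ([1, 1, 2, 4], 1.5),
    ([0, 2, 5, 6, 8, 9, 9], 6),
    ([0, 0, 0, 0, 4, 4, 6, 8], 2),
]

for data, expected in cases:
    assert median(data) == expected, (data, expected)
print("all examples pass")
```

Swapping the import for your own function lets you run the same cases against it.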

asked Jun 7, 2014 at 21:04 by ChucksPlace

Python 3.4 has statistics.median:

Return the median (middle value) of numeric data.

When the number of data points is odd, return the middle data point.
When the number of data points is even, the median is interpolated by taking the average of the two middle values:

>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0

Usage:

import statistics

items = [6, 1, 8, 2, 3]

statistics.median(items)
#>>> 3

It’s pretty careful with types, too:

statistics.median(map(float, items))
#>>> 3.0

from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')
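The same type preservation can be observed with fractions, including the even-length case where two values are averaged. A small sketch, assuming only the standard library:

```python
from fractions import Fraction
import statistics

thirds = [Fraction(1, 3), Fraction(2, 3)]
result = statistics.median(thirds)

print(result)                 # 1/2
print(type(result).__name__)  # Fraction
```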

answered Jun 8, 2014 at 0:08 by Veedrac

(Works with python-2.x):

def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (s[n//2-1]/2.0+s[n//2]/2.0, s[n//2])[n % 2] if n else None

>>> median([-5, -5, -3, -4, 0, -1])
-3.5

numpy.median():

>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0

For python-3.x, use statistics.median:

>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0

answered Jun 7, 2014 at 23:33 by A.J. Uppal

The sorted() function is very helpful for this. Use the sorted function
to order the list, then simply return the middle value (or average the two middle
values if the list contains an even amount of elements).

def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2
   
    if (lstLen % 2):
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0

answered Jun 7, 2014 at 22:09 by swolfe

Of course, in Python 3 you can use the built-in functions, but if you are using Python 2 or would just like to write your own, you can do something like this. The trick is the ~ operator (bitwise NOT), which maps a non-negative number i to -i - 1; for instance, ~2 == -3. Negative indices in Python count from the end of the list, so if mid == 2, data[mid] is the third element from the beginning and data[~mid] is the third element from the end.

def median(data):
    data.sort()
    mid = len(data) // 2
    return (data[mid] + data[~mid]) / 2.0

answered Jan 21, 2018 at 17:22 by Vlad Bezden

Here’s a cleaner solution:

def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.

Note: Answer changed to incorporate suggestion in comments.

answered Apr 25, 2015 at 20:18 by Batuhan Ulug

You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can end up O(n²) on a bad day.

Here’s an implementation with a randomly chosen pivot:

import random

def select_nth(n, items):
    pivot = random.choice(items)

    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)

    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal

    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)

You can trivially turn this into a method to find medians:

def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)

    else:
        left  = select_nth((len(items)-1) // 2, items)
        right = select_nth((len(items)+1) // 2, items)

        return (left + right) / 2

This is very unoptimised, but it’s not likely that even an optimised version will outperform Tim Sort (CPython’s built-in sort) because that’s really fast. I’ve tried before and I lost.
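The same partitioning step can also be written as a loop instead of recursion. A sketch under the same assumptions as the code above (random pivot, three-way split); the name `select_nth_iterative` is ours, and the final lines are only a sanity check against sorted():

```python
import random


def select_nth_iterative(n, items):
    """Return the n-th smallest item (0-based) without recursion."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        lesser = [x for x in items if x < pivot]
        if n < len(lesser):
            items = lesser          # answer lies among the smaller items
            continue
        n -= len(lesser)
        numequal = items.count(pivot)
        if n < numequal:
            return pivot            # answer is the pivot itself
        n -= numequal
        items = [x for x in items if x > pivot]


data = [random.randint(0, 50) for _ in range(101)]
assert select_nth_iterative(50, data) == sorted(data)[50]
```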

answered Jun 8, 2014 at 0:49 by Veedrac

You can use list.sort() to sort the list in place and avoid creating a new list with sorted().

Also, you should not use list as a variable name, as it shadows Python’s built-in list type.

def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]
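One consequence of sorting in place is that the caller's list is reordered. The following sketch contrasts the two choices; the function names `median_in_place` and `median_copy` are ours, introduced only for this comparison:

```python
def median_in_place(l):
    half = len(l) // 2
    l.sort()                      # reorders the caller's list
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]


def median_copy(l):
    s = sorted(l)                 # new list; the argument is untouched
    half = len(s) // 2
    if not len(s) % 2:
        return (s[half - 1] + s[half]) / 2.0
    return s[half]


readings = [5, 1, 4]
median_copy(readings)
print(readings)   # [5, 1, 4]
median_in_place(readings)
print(readings)   # [1, 4, 5]
```

Which one is preferable depends on whether the original order still matters to the caller.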

answered Jun 7, 2014 at 22:48 by Padraic Cunningham

def median(x):
    x = sorted(x)
    listlength = len(x) 
    num = listlength//2
    if listlength%2==0:
        middlenum = (x[num]+x[num-1])/2
    else:
        middlenum = x[num]
    return middlenum

answered Sep 25, 2018 at 18:22 by Bulent

def median(array):
    """Calculate median of the given list.
    """
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0

answered Mar 4, 2016 at 11:50 by warvariuc

A simple function to return the median of the given list:

def median(lst):
    lst = sorted(lst)  # Sort the list first
    if len(lst) % 2 == 0:  # Checking if the length is even
        # Applying formula which is sum of middle two divided by 2
        return (lst[len(lst) // 2] + lst[(len(lst) - 1) // 2]) / 2
    else:
        # If length is odd then get middle value
        return lst[len(lst) // 2]

Some examples with the median function:

>>> median([9, 12, 20, 21, 34, 80])  # Even
20.5
>>> median([9, 12, 80, 21, 34])  # Odd
21

If you want to use library you can just simply do:

>>> import statistics
>>> statistics.median([9, 12, 20, 21, 34, 80])  # Even
20.5
>>> statistics.median([9, 12, 80, 21, 34])  # Odd
21

answered Jul 5, 2020 at 23:16 by The AG

I posted my solution at “Python implementation of ‘median of medians’ algorithm”, which is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed of ~5N, which is faster than the ~10N obtained with 5 numbers per column. I believe the optimal speed is ~4N, but I could be wrong about it.

Per Tom’s request in his comment, I added my code here, for reference. I believe the critical part for speed is using 15 numbers per column, instead of 5.

#!/bin/pypy
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random


items_per_column = 15


def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with less than items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 5.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        # Floor division keeps the index an integer on both Python 2 and 3.
        B = [ find_i_th_smallest(k, (len(k) - 1) // 2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]

        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1) // 2)


        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))


# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])


# create a list of random positive integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]


# Show the original list
#
# print L


# This is for validation
#
# print sorted(L)[int((len(L) - 1)/2)]


# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print(find_i_th_smallest( L, (len(L) - 1) // 2))

answered Jan 21, 2016 at 0:00 by user5818263

In case you need additional information on the distribution of your list, the percentile method will probably be useful. And a median value corresponds to the 50th percentile of a list:

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
median_value = np.percentile(a, 50) # return 50th percentile
print(median_value)

answered Apr 22, 2020 at 12:07 by Gabriel123

Here is what I came up with during this exercise in Codecademy:

def median(data):
    new_list = sorted(data)
    if len(new_list)%2 > 0:
        # floor division keeps the index an integer in Python 3
        return new_list[len(new_list)//2]
    elif len(new_list)%2 == 0:
        return (new_list[(len(new_list)//2)] + new_list[(len(new_list)//2)-1]) /2.0

print(median([1,2,3,4,5,9]))

answered May 27, 2016 at 8:52 by BynderRox

Just two lines are enough.

def get_median(arr):
    '''
    Calculate the median of a sequence.
    :param arr: list
    :return: int or float
    '''
    arr = sorted(arr)
    return arr[len(arr)//2] if len(arr) % 2 else (arr[len(arr)//2] + arr[len(arr)//2-1])/2

answered Sep 17, 2020 at 2:32 by Rt.Tong

median Function

def median(midlist):
    midlist.sort()
    lens = len(midlist)
    if lens % 2 != 0: 
        midl = lens // 2
        res = midlist[midl]
    else:
        odd = (lens // 2) - 1
        ev = lens // 2
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res

answered May 21, 2015 at 13:55 by Юрий Мойдом Киев

I had some problems with lists of float values. I ended up using a code snippet adapted from Python 3’s statistics.median, which works perfectly with float values and needs no imports.

def calculateMedian(values):
    # "values" instead of "list", so the built-in list type is not shadowed
    data = sorted(values)
    n = len(data)
    if n == 0:
        return None
    if n % 2 == 1:
        return data[n // 2]
    else:
        i = n // 2
        return (data[i - 1] + data[i]) / 2

answered May 3, 2017 at 16:54 by Dan

def midme(list1):

    list1.sort()
    if len(list1)%2>0:
            x = list1[int((len(list1)/2))]
    else:
            x = ((list1[int((len(list1)/2))-1])+(list1[int(((len(list1)/2)))]))/2
    return x


midme([4,5,1,7,2])

answered Feb 18, 2018 at 18:00 by vk123

def median(array):
    if len(array) < 1:
        return(None)
    array = sorted(array)  # the middle elements are only meaningful on sorted data
    if len(array) % 2 == 0:
        median = (array[len(array)//2-1: len(array)//2+1])
        return sum(median) / len(median)
    else:
        return(array[len(array)//2])

answered Apr 6, 2018 at 21:55 by Luke Willey

I defined a median function for a list of numbers as

def median(numbers):
    s = sorted(numbers)
    n = len(s)
    # (n - 1) // 2 and n // 2 coincide for odd n and are the two middle indices for even n
    return (s[(n - 1) // 2] + s[n // 2]) / 2.0

answered Oct 14, 2014 at 14:12 by Fred Beck

import numpy as np
def get_median(xs):
        mid = len(xs) // 2  # Take the mid of the list
        if len(xs) % 2 == 1: # check if the len of list is odd
            return sorted(xs)[mid] #if true then mid will be median after sorting
        else:
            #return 0.5 * sum(sorted(xs)[mid - 1:mid + 1])
            return 0.5 * np.sum(sorted(xs)[mid - 1:mid + 1]) #if false take the avg of mid
print(get_median([7, 7, 3, 1, 4, 5]))
print(get_median([1,2,3, 4,5]))

answered Aug 26, 2019 at 7:12

A more generalized approach for median (and percentiles) would be:

def get_percentile(data, percentile):
    # Get the number of observations
    cnt=len(data)
    # Sort the list
    data=sorted(data)
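    # Determine the split point
    i=(cnt-1)*percentile
    # Fractional part of the split point (its distance above the floor)
    diff=i-int(i)
    # Clamp the upper index so that percentile=1.0 does not run past the end
    upper=min(int(i)+1, cnt-1)
    # Return the weighted average of the value above and below the split point
    return data[int(i)]*(1-diff)+data[upper]*(diff)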

# Data
data=[1,2,3,4,5]
# For the median
print(get_percentile(data=data, percentile=.50))
# > 3.0
print(get_percentile(data=data, percentile=.75))
# > 4.0

# Note the weighted average difference when an int is not returned by the percentile
print(get_percentile(data=data, percentile=.51))
# > 3.04

answered May 7, 2020 at 19:46 by conmak

Try This

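def find_median(arr):
    arr = sorted(arr)  # the median is defined on sorted data
    mid = len(arr) // 2
    if len(arr) % 2 == 1:
        return arr[mid]
    # even length: average the two middle values
    return (arr[mid - 1] + arr[mid]) / 2
print(find_median([1,2,3,4,5,6,7,8]))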

answered Dec 20, 2021 at 13:32 by 0xN1nja

Implement it:

def median(numbers):
    """
    Calculate median of a list numbers.
    :param numbers: the numbers to be calculated.
    :return: median value of numbers.

    >>> median([1, 3, 3, 6, 7, 8, 9])
    6
    >>> median([1, 2, 3, 4, 5, 6, 8, 9])
    4.5
    >>> import statistics
    >>> import random
    >>> numbers = random.sample(range(-50, 50), k=100)
    >>> statistics.median(numbers) == median(numbers)
    True
    """
    numbers = sorted(numbers)
    mid_index = len(numbers) // 2
    return (
        (numbers[mid_index] + numbers[mid_index - 1]) / 2 if len(numbers) % 2 == 0
        else numbers[mid_index]
    )


if __name__ == "__main__":
    from doctest import testmod

    testmod()

answered Oct 4, 2020 at 16:36 by duyuanchao

Function median:

import numpy as np

def median(d):
    d=np.sort(d)
    n=len(d)
    n2=n//2
    if (n%2==1):
        # odd length: single middle value
        med=d[n2]
    else:
        # even length: average the two middle values
        med=(d[n2-1] + d[n2]) / 2
    return med

answered Feb 15, 2020 at 11:03 by fati

Simply create a median function that takes a list of numbers as its argument, then call it.

def median(l):
    l = sorted(l)
    lent = len(l)
    m = lent // 2
    if (lent % 2) == 0:
        # even length: average the two middle values
        result = (l[m - 1] + l[m]) / 2
    else:
        result = l[m]
    return result

answered Apr 27, 2021 at 5:17 by Romesh Borawake

What I did was this:

def median(a):
    a = sorted(a)
    if len(a) % 2 != 0:
        return a[len(a) // 2]
    else:
        return (a[len(a) // 2] + a[(len(a) // 2) - 1]) / 2

Explanation: If the number of items in the list is odd, return the middle item. Otherwise, len(a) // 2 is the index of the upper of the two middle items; because the list is sorted, the item just before it is the lower one, and the median is the average of the two.

answered Nov 6, 2020 at 6:31 by CodingCuber

Here’s the tedious way to find median without using the median function:

def median(*arg):
    arg = order(arg)  # work on a sorted copy of the arguments
    numArg = len(arg)
    half = numArg // 2
    if numArg % 2 == 0:
        print((arg[half-1]+arg[half])/2)
    else:
        print(arg[half])

def order(tup):
    ordered = list(tup)
    # keep making bubble-sort passes until a pass swaps nothing
    while test(ordered):
        pass
    return ordered


def test(ordered):
    whileloop = 0
    for i in range(len(ordered)-1):
        if (ordered[i]>ordered[i+1]):
            # swap the out-of-order neighbours
            original = ordered[i+1]
            ordered[i+1]=ordered[i]
            ordered[i]=original
            whileloop = 1 #run the loop again if you had to switch values
    return whileloop

answered Jan 24, 2017 at 19:05 by I Like

It is very simple;

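def median(alist):
    #to find median you will have to sort the list first
    sList = sorted(alist)
    first = 0
    last = len(sList)-1
    midpoint = (first + last)//2
    if len(sList) % 2:
        return sList[midpoint]
    # even length: average the two middle values
    return (sList[midpoint] + sList[midpoint + 1]) / 2

And you can use the return value like this: med = median(anyList)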

answered Dec 7, 2018 at 16:11 by Farhan

In this tutorial, we will look at how to get the median value of a list of values in Python. We will walk you through the usage of the different methods with the help of examples.

Median of a list in Python.

What is median?

The median is a descriptive statistic that is used as a measure of the central tendency of a distribution. It is equal to the middle value of the distribution: an equal number of values lie below and above it. Unlike the mean (another measure of central tendency), it is not very sensitive to outliers in the data.

To calculate the median of a list of values –

  1. Sort the values in ascending or descending order (either works).
  2. If the number of values, n, is odd, then the median is the value in the (n+1)/2 position in the sorted list(or array) of values.
    If the number of values, n, is even, then the median is the average of the values in n/2 and n/2 + 1 position in the sorted list(or array) of values.
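The positions in the steps above are 1-based, while Python lists are 0-based, so each position needs 1 subtracted before indexing. A minimal sketch of that translation; the function name `median_by_position` is ours:

```python
def median_by_position(values):
    """Apply the (n+1)/2 and n/2, n/2 + 1 position rule (positions are 1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2           # 1-based position of the middle value
        return ordered[position - 1]      # subtract 1 for 0-based indexing
    left, right = n // 2, n // 2 + 1      # 1-based positions of the two middle values
    return (ordered[left - 1] + ordered[right - 1]) / 2


print(median_by_position([7, 1, 5]))     # 5
print(median_by_position([4, 1, 3, 2]))  # 2.5
```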

For example, calculate the median of the following values –

3, 1, 4, 9, 2, 5, 3, 6

First, let’s sort these numbers in ascending order.

1, 2, 3, 3, 4, 5, 6, 9

Now, since the total number of values is even (8), the median is the average of the 4th and the 5th value.

(3 + 4) / 2 = 3.5

Thus, the median comes out to be 3.5

Now that we have seen how the median is calculated mathematically, let’s look at how to compute it in Python.

To compute the median of a list of values in Python, you can write your own function, or use methods available in libraries like numpy, statistics, etc. Let’s look at these methods with the help of examples.

1. From scratch implementation of median in Python

You can write your own function in Python to compute the median of a list.

def get_median(ls):
    # sort a copy of the list; list.sort() sorts in place and returns None
    ls_sorted = list(ls)
    ls_sorted.sort()
    # find the median
    if len(ls_sorted) % 2 != 0:
        # total number of values are odd
        # subtract 1 since indexing starts at 0
        m = (len(ls_sorted) + 1) // 2 - 1
        return ls_sorted[m]
    else:
        m1 = len(ls_sorted) // 2 - 1
        m2 = len(ls_sorted) // 2
        return (ls_sorted[m1] + ls_sorted[m2]) / 2

# create a list
ls = [3, 1, 4, 9, 2, 5, 3, 6]
# get the median
print(get_median(ls))

Output:

3.5

Here, we use the list sort() function to sort the list, and then depending upon the length of the list return the median. We get 3.5 as the median, the same we manually calculated above.

Note that, compared to the above function, the libraries you’ll see next are better optimized to compute the median of a list of values.

2. Using statistics library

You can also use the statistics standard library in Python to get the median of a list. Pass the list as argument to the statistics.median() function.

import statistics

# create a list
ls = [3, 1, 4, 9, 2, 5, 3, 6]
# get the median
print(statistics.median(ls))

Output:

3.5

We get the same results as above.

For more on the statistics library in Python, refer to its documentation.

3. Using numpy library

The numpy library’s median() function is generally used to calculate the median of a numpy array. You can also use this function on a Python list.

import numpy as np

# create a list
ls = [3, 1, 4, 9, 2, 5, 3, 6]
print(np.median(ls))

Output:

3.5

You can see that we get the same result.

Author: Piyush Raj

To calculate the median value in Python:

  1. Import the statistics module.
  2. Call the statistics.median() function on a list of numbers.

For example, let’s calculate the median of a list of numbers:

import statistics

numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)

print(med)

Output:

4

The median value is a common way to measure the “centrality” of a dataset.

If you are looking for a quick answer, the example above will do. But to learn what the median really is, why it is useful, and how to find it, read along.

This is a comprehensive guide to finding the median in Python.

What Is the Median Value in Maths

The Median is the middle value of a given dataset.

If you have a list of 3 numbers, the median is the second number as it is in the middle.

But in case you have a list of 4 values, there is no “middle value”. When calculating the median, of an even-sized dataset, the average of the two middle values is used.

[Figure: median for an odd and an even number of items]

Why and When Is Median Value Useful

When dealing with statistics, you usually want to have a single number that describes the nature of a dataset.

Think about your school grades for example. Instead of seeing the dozens of grades, you want to know the average (the mean).

Usually, measuring the “centrality” of a dataset means calculating the mean value. But if you have a skewed distribution, the mean value can be unintuitive.

Let’s say you drive to your nearby shopping mall 7 times. Usually, the drive takes around 10 minutes. But one day the traffic jam makes it last 2 hours.

Here is a list of driving times to the mall:

[9, 120, 10, 9, 10, 10, 10]

Now if you take the average of this list, you get ~25 minutes. But how well does this number really describe your trip?

Pretty badly.

As you can see, most of the time the trip takes around 10 minutes.

To better describe the driving time, you should use a median value instead. To calculate the median value, you need to sort the driving times first:

[9, 9, 10, 10, 10, 10, 120]

Then you can choose the middle value, which in this case is 10 minutes. 10 minutes describes your typical trip length way better than 25, right?

The usefulness of calculating the median, in this case, is that the unusually high value of 120 does not matter.

In short, you can calculate the median value when measuring centrality with average is unintuitive.
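The driving-time comparison can be reproduced with the standard library. A small sketch using the numbers from the example above:

```python
import statistics

driving_times = [9, 120, 10, 9, 10, 10, 10]  # minutes, from the example above

print(round(statistics.mean(driving_times), 1))  # 25.4
print(statistics.median(driving_times))          # 10
```

The single 120-minute trip pulls the mean up to about 25 minutes, while the median stays at 10.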

In Python, you can either create a function that calculates the median or use existing functionality.

How to Implement Median Function in Python

If you want to implement the median function, you need to understand the procedure of finding the median.

The median function works such that it:

  1. Takes a dataset as input.
  2. Sorts the dataset.
  3. Checks if the dataset is odd/even in length.
  4. If the dataset is odd in length, the function picks the mid-value and returns it.
  5. If the dataset is even, the function picks the two mid values, calculates the average, and returns the result.

Here is how it looks in the code:

def median(data):
    sorted_data = sorted(data)
    data_len = len(sorted_data)

    middle = (data_len - 1) // 2

    if data_len % 2:
        return sorted_data[middle]
    else:
        return (sorted_data[middle] + sorted_data[middle + 1]) / 2.0

Example usage:

numbers = [1, 2, 3, 4, 5, 6, 7]
med = median(numbers)

print(med)

Output:

4

Now, this is a valid approach if you need to write the median function yourself. But with common maths operations, you should use a built-in function to save time and headaches.
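A hand-written median is easy to get subtly wrong, for instance by testing the parity of the middle index instead of the length, so a cross-check against statistics.median() is cheap insurance. A minimal sketch; the name `my_median`, the fixed seed, and the sample sizes are our own choices:

```python
import random
import statistics


def my_median(data):
    sorted_data = sorted(data)
    data_len = len(sorted_data)
    middle = (data_len - 1) // 2
    if data_len % 2:                      # odd length: single middle value
        return sorted_data[middle]
    return (sorted_data[middle] + sorted_data[middle + 1]) / 2.0


random.seed(0)  # fixed seed so the check is repeatable
for length in range(1, 30):
    sample = [random.randint(-100, 100) for _ in range(length)]
    assert my_median(sample) == statistics.median(sample), sample
print("hand-written median agrees with statistics.median")
```

Testing every length from 1 to 29 covers both odd and even cases, which a single seven-element example does not.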

Let’s next take a look at how to calculate the median with a built-in function in Python.

How to Use a Built-In Median Function in Python

In Python, there is a module called statistics. This module contains useful mathematical tools for data science and statistics.

One of the great methods of this module is the median() function.

As the name suggests, this function calculates the median of a given dataset.

To use the median function from the statistics module, remember to import it into your project.

Here is an example of calculating the median for a bunch of numbers:

import statistics

numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)

print(med)

Result:

4

Conclusion

Today you learned how to calculate the median value in Python.

To recap, the median value is a way to measure the centrality of a dataset. The median is useful when the average doesn’t properly describe the dataset and gives misleading results.

To calculate the median in Python, use the built-in median() function from the statistics module.

import statistics

numbers = [1, 2, 3, 4, 5, 6, 7]
med = statistics.median(numbers)

Thanks for reading. Happy coding!


Problem Formulation

Given a Python list of integer or float numbers.

How to calculate the median of a Python list?

Formally, the median is “the value separating the higher half from the lower half of a data sample” (wiki).

Note that the median is different from the mean (average).

If there are an even number of elements in the list (i.e., len(list)%2==0), there is no middle element. In this case, the median can be the average of the two middle elements.

Method 1: statistics.median()

The most straightforward way to get the median of a Python list your_list is to import the statistics library and call statistics.median(your_list). The statistics library is included in the Python standard libraries, so it doesn’t have to be manually installed.

Here’s a simple example:

import statistics


def get_median(lst):
    return statistics.median(lst)


odd = [3, 2, 4, 7, 1]
print(get_median(odd))
# 3


even = [3, 2, 4, 7, 1, 1]
print(get_median(even))
# 2.5

We create two lists:

  • 3 is the median of the list [3, 2, 4, 7, 1] as can be seen in the sorted representation [1, 2, 3, 4, 7].
  • 2.5 is the median of the list [3, 2, 4, 7, 1, 1] as can be seen in the sorted representation [1, 1, 2, 3, 4, 7] and (2+3)/2 is 2.5.

Method 2: No Library Approach

To get the median of a Python list without library support, perform the following three steps:

  • Sort the list.
  • Get the index of the mid element (the upper of the two middle elements if the length is even).
  • Average the left and right mid elements.

This is done in the three Python lines:

  • tmp = sorted(lst)
  • mid = len(tmp) // 2
  • res = (tmp[mid] + tmp[-mid-1]) / 2

The third line contains the median of the Python list. This works for lists both with an even and an odd number of elements. Because of the division, the result is always a float.

We use negative list indexing, tmp[-mid-1], to access the other (lower) mid element. If the list has an odd number of elements, the two mid indices are actually the same, in which case the value of the single mid element is returned.

Here’s an example:

def get_median(lst):
    tmp = sorted(lst)
    mid = len(tmp) // 2
    return (tmp[mid] + tmp[-mid-1]) / 2


odd = [3, 2, 4, 7, 1]
print(get_median(odd))
# 3.0


even = [3, 2, 4, 7, 1, 1]
print(get_median(even))
# 2.5

It should be noted that the naive approach of not averaging the two mid elements in the case of a list with an even number of elements is often sufficient too:

Method 3: Naive No-Library Approach

If you’re okay with returning the upper of the two mid elements when searching for the median of a list with an even number of elements, you can use the following approach:

  • Sort the list.
  • Get the index of the upper (right) mid element (in case the list length is even) or the index of the single mid element (in case the length of the list is odd).
  • Return the median by accessing the mid element in the sorted list.

In particular, the three lines in Python do the job:

  • tmp = sorted(lst)
  • mid = len(tmp) // 2
  • res = tmp[mid]

The variable res contains the median of the list.

Here’s an example:

def get_median(lst):
    tmp = sorted(lst)
    mid = len(tmp) // 2
    return tmp[mid]


odd = [3, 2, 4, 7, 1]
print(get_median(odd))
# 3


even = [3, 2, 4, 7, 1, 1]
print(get_median(even))
# 3

Please note that this is not necessarily the statistically sound way of calculating the median for a list with an even number of elements.
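If picking one of the two mid elements instead of averaging them is what you want, the standard library already offers this: statistics.median_low() returns the lower of the two mid elements and statistics.median_high() returns the upper one. A short sketch comparing them with the averaged median:

```python
import statistics

even = [3, 2, 4, 7, 1, 1]

print(statistics.median_low(even))
# 2

print(statistics.median_high(even))
# 3

print(statistics.median(even))
# 2.5
```

For this list, statistics.median_high() gives the same value as the sorted(lst)[len(lst) // 2] approach shown above.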

Method 4: np.median()

You can get the median of a Python list your_list by importing the numpy library and calling numpy.median(your_list).

Here’s a simple example of how we use NumPy to calculate the median of a Python list:

import numpy as np


def get_median(lst):
    return np.median(lst)


odd = [3, 2, 4, 7, 1]
print(get_median(odd))
# 3.0


even = [3, 2, 4, 7, 1, 1]
print(get_median(even))
# 2.5

We create two lists:

  • 3 is the median of the list [3, 2, 4, 7, 1] as can be seen in the sorted representation [1, 2, 3, 4, 7]. For integer input, np.median() returns a float, which is why the output is 3.0.
  • 2.5 is the median of the list [3, 2, 4, 7, 1, 1] as can be seen in the sorted representation [1, 1, 2, 3, 4, 7] and (2+3)/2 is 2.5.

What’s the difference between numpy.median() and statistics.median()?

Unlike the statistics module, the numpy library is not part of the Python standard library, so it must be installed separately if you haven’t done so already.

That’s why I recommend using statistics.median() rather than numpy.median() if all you want to do is calculate the median of a Python list.

Also, statistics.median() returns an integer for integer lists with an odd number of elements, whereas numpy.median() returns a float for such lists. Beyond that, numpy.median() also accepts NumPy arrays and an axis argument for multi-dimensional data. For a flat list of numbers, both functions give the same value.
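The return-type behaviour of statistics.median() described above is easy to verify. The following sketch uses only the standard library and prints the type name next to each result:

```python
import statistics

odd_result = statistics.median([3, 2, 4, 7, 1])
even_result = statistics.median([3, 2, 4, 7, 1, 1])

print(type(odd_result).__name__, odd_result)
# int 3

print(type(even_result).__name__, even_result)
# float 2.5
```

If downstream code needs a consistent type, wrapping the call in float() is one simple option.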

Method 5: np.percentile()

A generalized approach to calculating the median of a list my_list of numbers is to use the np.percentile(my_list, 50) function that returns the exact 50th percentile of the list. The 50th percentile is the median.

Definition: 50th Percentile – Also known as the Median. The median cuts the data set in half. Half of the answers lie below the median and half lie above the median. (source)

Here’s the code example:

import numpy as np


def get_median(lst):
    return np.percentile(lst, 50)


odd = [3, 2, 4, 7, 1]
print(get_median(odd))
# 3.0


even = [3, 2, 4, 7, 1, 1]
print(get_median(even))
# 2.5
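If you would rather not depend on NumPy, a comparable idea can be sketched with the standard library. This assumes Python 3.8 or newer, where statistics.quantiles() is available. With n=2 it returns a single cut point that splits the data into two halves, which is the median. Note that on older versions of this function the input needs at least two data points.

```python
import statistics


def get_median(lst):
    # n=2 asks for the one cut point that divides the data into two halves.
    return statistics.quantiles(lst, n=2)[0]


odd = [3, 2, 4, 7, 1]
print(get_median(odd))
# 3.0

even = [3, 2, 4, 7, 1, 1]
print(get_median(even))
# 2.5
```

This is offered as an alternative sketch rather than a replacement for np.percentile(), which supports arbitrary percentiles and NumPy arrays.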

Method 6: Basic Python in Multiple Lines

A simple approach to finding the median of a Python list is to sort the list and then handle lists with an even and an odd number of elements separately:

  • If the list has an odd number of elements, return the median right away by using len(l)//2 to get the index of the mid element.
  • Otherwise, average the two elements in the middle of the sorted list.

Here’s the code snippet that implements this approach, with comments explaining the relevant parts:

def get_median(lst):
    l = sorted(lst)
    mid = len(l) // 2
    if len(l) % 2:
        # list is odd-sized:
        # single median exists
        return l[mid]
    else:
        # list is even-sized:
        # average two mid values
        return (l[mid - 1] + l[mid]) / 2


odd = [3, 2, 4, 7, 1]
print(get_median(odd))
# 3


even = [3, 2, 4, 7, 1, 1]
print(get_median(even))
# 2.5
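One way to gain confidence in a hand-written median function like this is to compare it against statistics.median() on many random lists. The sketch below repeats the function so that it runs on its own; the seed, value range, and list sizes are arbitrary choices for illustration:

```python
import random
import statistics


def get_median(lst):
    # Same logic as the function above, repeated so the snippet is self-contained.
    l = sorted(lst)
    mid = len(l) // 2
    if len(l) % 2:
        return l[mid]
    return (l[mid - 1] + l[mid]) / 2


random.seed(0)  # fixed seed so the check is repeatable
for _ in range(1000):
    size = random.randint(1, 20)
    data = [random.randint(-50, 50) for _ in range(size)]
    assert get_median(data) == statistics.median(data)

print("all checks passed")
```

Empty lists are left out on purpose, because both functions raise an exception for them (IndexError here, StatisticsError in the library).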

While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.
