How to find a row's index by value in pandas

I have a table (a DataFrame). I want to get the index of the row that contains the value I am looking for.

Начало      Конец       Продолжительность  Сумма
2020-01-01  2020-03-15  75                 -961.4
2020-03-16  2020-03-16  1                  0.2
2020-03-17  2020-03-29  13                 -86.1
2020-03-30  2020-03-30  1                  1.0
2020-03-31  2020-04-01  2                  -6.8
2020-04-02  2020-10-06  188                2287.6
2020-10-07  2020-10-13  7                  -18.9
2020-10-14  2020-10-18  5                  4.2
2020-10-19  2020-10-21  3                  -3.8
2020-10-22  2020-10-22  1                  1.9
2020-10-23  2020-11-07  16                 -114.3
2020-11-08  2020-11-10  3                  3.5
2020-11-11  2020-12-31  51                 -962.4

I query the table to get the index value. For example, I want to know the index of the row where the “Сумма” column equals 2287.6. It should return 5.

ind = df[df['Сумма'] == 2287.6].index.values.astype(int)

But I just get an empty list: [].

What could be the problem? Also, how can I get the value from another column in the same row?

asked Dec 22, 2021 at 8:22

Дмитрий Емельянов

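The likely cause is that the “Сумма” column is stored as strings (dtype object), so comparing it with the float 2287.6 matches nothing. You can first convert the column to the appropriate data type and then search: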

df["Сумма"] = pd.to_numeric(df["Сумма"], errors="coerce")
idx = df[df['Сумма'] == 2287.6].index
print(idx.to_list())
# [5]

To get the value from another column by the found index (this returns a pandas.Series object):

In [89]: print(df.loc[idx, "Начало"])
5    2020-04-02
Name: Начало, dtype: object

To return a scalar value:

In [91]: print(df.at[idx[0], "Начало"])
# 2020-04-02
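
A minimal sketch of the likely cause, assuming the “Сумма” column was loaded as strings. The three-row frame below is a hypothetical excerpt, not the asker's actual data:

```python
import pandas as pd

# Hypothetical excerpt of the table; "Сумма" is deliberately stored as strings
df = pd.DataFrame({
    "Начало": ["2020-03-31", "2020-04-02", "2020-10-07"],
    "Сумма": pd.Series(["-6.8", "2287.6", "-18.9"], dtype=object),
})

# Comparing strings with a float never matches, so the result is empty
before = df[df["Сумма"] == 2287.6].index.to_list()

# After conversion the comparison is numeric and the row is found
df["Сумма"] = pd.to_numeric(df["Сумма"], errors="coerce")
after = df[df["Сумма"] == 2287.6].index.to_list()

print(before, after)
print(df.at[after[0], "Начало"])
```

Checking df.dtypes before filtering shows whether such a conversion is needed.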

P.S. See the official pandas documentation on indexing and selecting data.

answered Dec 22, 2021 at 20:59

MaxU - stand with Ukraine

By the Кодкамп editorial team

Aug 17, 2022


You can use the following syntax to get the index of the rows in a pandas DataFrame whose column matches a specific value:

df.index[df['column_name'] == value].tolist()

The following examples show how to use this syntax in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
 'points': [5, 7, 7, 9, 12, 9, 9, 4],
 'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
df

  team  points  rebounds
0    A       5        11
1    A       7         8
2    A       7        10
3    B       9         6
4    B      12         6
5    C       9         5
6    C       9         9
7    D       4        12

Example 1: Get the index of rows whose column matches a value

The following code shows how to get the index of the rows in which one column is equal to a specific value:

#get index of rows where 'points' column is equal to 7
df.index[df['points'] == 7].tolist()

[1, 2]

This tells us that the rows with index values 1 and 2 have the value 7 in the points column.

Note that we can also use the less-than and greater-than operators to find the index of the rows in which one column is less than or greater than a specific value:

#get index of rows where 'points' column is greater than 7
df.index[df['points'] > 7].tolist()

[3, 4, 5, 6]

This tells us that the rows with index values 3, 4, 5 and 6 have a value greater than 7 in the points column.

Example 2: Get the index of rows whose column matches a string

The following code shows how to get the index of the rows in which one column is equal to a specific string:

#get index of rows where 'team' column is equal to 'B'
df.index[df['team'] == 'B'].tolist()

[3, 4]

This tells us that the rows with index values 3 and 4 have the value 'B' in the team column.

Example 3: Get the index of rows with multiple conditions

The following code shows how to get the index of the rows in which the values in several columns meet specific conditions:

#get index of rows where 'points' is equal to 7 *or* 12
df.index[(df['points'] == 7) | (df['points'] == 12)].tolist()

[1, 2, 4]

#get index of rows where 'points' is equal to 9 *and* 'team' is equal to 'B'
df.index[(df['points'] == 9) & (df['team'] == 'B')].tolist()

[3]
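
When one column is compared against several values, the chain of == comparisons joined with | can also be written with Series.isin. This is an alternative that the article itself does not use; a minimal sketch on the same DataFrame (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D'],
                   'points': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# Rows where 'points' is any of the listed values
idx_isin = df.index[df['points'].isin([7, 12])].tolist()

# Same rows, written as explicit comparisons joined with |
idx_or = df.index[(df['points'] == 7) | (df['points'] == 12)].tolist()

print(idx_isin)
```

Both expressions select the same rows; isin tends to stay readable as the list of values grows.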

How can I get the number of the row in a dataframe that contains a certain value in a certain column using Pandas? For example, I have the following dataframe:

     ClientID  LastName
0    34        Johnson
1    67        Smith
2    53        Brows  

How can I find the number of the row that has ‘Smith’ in ‘LastName’ column?

asked Apr 3, 2017 at 20:42

sprogissd

Note that a dataframe’s index could be out of order, or not even numerical at all. If you don’t want to use the current index and instead renumber the rows sequentially, then you can use df.reset_index() together with the suggestions below.

To get all indices that match ‘Smith’

>>> df[df['LastName'] == 'Smith'].index
Int64Index([1], dtype='int64')

or as a numpy array

>>> df[df['LastName'] == 'Smith'].index.to_numpy()  # .values on older versions
array([1])

or if there is only one and you want the integer, you can subset

>>> df[df['LastName'] == 'Smith'].index[0]
1

You could use the same boolean expressions with .loc, but it is not needed unless you also want to select a certain column, which is redundant when you only want the row number/index.
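
A minimal sketch of the reset_index() suggestion, assuming a hypothetical out-of-order index (the labels 10, 30, 20 are made up for illustration):

```python
import pandas as pd

# Hypothetical out-of-order labels chosen only for illustration
df = pd.DataFrame(
    {"ClientID": [34, 67, 53], "LastName": ["Johnson", "Smith", "Brows"]},
    index=[10, 30, 20],
)

# Index label of the matching row in the original frame
label = df[df["LastName"] == "Smith"].index[0]

# After renumbering, the label equals the 0-based row position
renumbered = df.reset_index(drop=True)
position = renumbered[renumbered["LastName"] == "Smith"].index[0]

print(label, position)
```

Passing drop=True discards the old labels instead of keeping them as a new column.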

answered Apr 3, 2017 at 20:48

joelostblom

df.index[df.LastName == 'Smith']

Or

df.query('LastName == "Smith"').index

Will return all row indices where LastName is Smith

Int64Index([1], dtype='int64')

answered Apr 3, 2017 at 20:48

piRSquared

df.loc[df.LastName == 'Smith']

will return the row

    ClientID    LastName
1   67          Smith

and

df.loc[df.LastName == 'Smith'].index

will return the index

Int64Index([1], dtype='int64')

NOTE: Column names ‘LastName’ and ‘Last Name’ or even ‘lastname’ are three unique names. The best practice would be to first check the exact name using df.columns. If you really need to strip the column names of all the white spaces, you can first do

df.columns = [x.strip().replace(' ', '') for x in df.columns]

answered Apr 3, 2017 at 20:49

Vaishali

len(df[df["LastName"] == "Smith"].values)  # number of matching rows, not their index

answered Sep 5, 2018 at 13:13

Veera Samantula

If the index of the dataframe and the ordinal number of the rows differ, most solutions posted here won’t work anymore. Given your dataframe with an alphabetical index:

In [2]: df = pd.DataFrame({"ClientID": {"A": 34, "B": 67, "C": 53}, "LastName": {"A": "Johnson", "B": "Smith", "C": "Brows"}})

In [3]: df
Out[3]: 
   ClientID LastName
A        34  Johnson
B        67    Smith
C        53    Brows

You have to use get_loc to access the ordinal row number:

In [4]: df.index.get_loc(df.query('LastName == "Smith"').index[0])
Out[4]: 1

If there may exist multiple rows where the condition holds, e.g. find the ordinal row numbers that have ‘Smith’ or ‘Brows’ in LastName column, you can use list comprehensions:

In [5]: [df.index.get_loc(idx) for idx in df.query('LastName == "Smith" | LastName == "Brows"').index]
Out[5]: [1, 2]
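
As an alternative to the list comprehension, the ordinal positions can be read directly off the boolean mask with numpy.flatnonzero. This is a swapped-in technique, not part of the answer above; a minimal sketch on the same alphabetically indexed dataframe:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ClientID": {"A": 34, "B": 67, "C": 53},
                   "LastName": {"A": "Johnson", "B": "Smith", "C": "Brows"}})

# Boolean mask over the rows, independent of the index labels
mask = df["LastName"].isin(["Smith", "Brows"])

# Positions of the True entries in the mask
positions = np.flatnonzero(mask.to_numpy()).tolist()

print(positions)
```

Because the mask is positional, this also works when index labels are duplicated, a case in which get_loc no longer returns a single integer.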

answered Jun 4, 2022 at 7:43

rachwa

count_smiths = (df['LastName'] == 'Smith').sum()

answered Apr 3, 2017 at 20:49

Scott Boston

You can simply use the shape attribute:
df[df['LastName'] == 'Smith'].shape

Output
(1, 2)

This indicates 1 matching row and 2 columns, so it tells you how many rows match rather than where they are.

The general pattern is:
DataframeName[DataframeName['Column_name'] == 'Value to match in column']

answered Apr 30, 2020 at 1:46

rogercake

I know it’s many years later, but don’t try the above solutions without resetting your dataframe’s index first. As many have pointed out already, the number you see to the left of the dataframe (0, 1, 2 in the initial question) is the index INSIDE that dataframe. When you extract a subset of it with a condition, you might end up with 0,2 or 2,1 or 2,1,0, depending on your condition. So by using that number (called the “index”) you will not get the position of the row in the subset. You will get the position of that row inside the main dataframe.

use:

np.where([df['LastName'] == 'Smith'])[1][0]

and play with the string ‘Smith’ to see the various outcomes. Because the condition is wrapped in a list, where() sees a 2-D array and returns two arrays. The second one (index 1) holds the positions you care about.

NOTE:
When the value you search for does not exist, [1] is an empty array, so [1][0] raises an IndexError. When the value is in the first row, [1][0] is 0. Make sure you validate the existence first.

NOTE #2:
If the value in your condition occurs multiple times in the subset, [1] holds the positions of all occurrences. You can use the length of [1] for further processing if needed.
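
A minimal sketch of the existence check, using np.where on the plain 1-D mask rather than the list-wrapped form. The helper name first_position is hypothetical and introduced only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ClientID": [34, 67, 53],
                   "LastName": ["Johnson", "Smith", "Brows"]})

def first_position(frame, column, value):
    """Return the 0-based position of the first match, or None if there is none."""
    # On a 1-D boolean array np.where returns a one-element tuple of positions
    positions = np.where((frame[column] == value).to_numpy())[0]
    return int(positions[0]) if positions.size else None

print(first_position(df, "LastName", "Smith"))
print(first_position(df, "LastName", "Nobody"))
```

Returning None for a missing value keeps "not found" distinct from position 0.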

answered Jan 6, 2022 at 18:56

Gabriel Cliseru

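If “row number” in the question means the actual row number/position (rather than the index label), then
pandas.Index.get_loc(key, method=None, tolerance=None)
seems to be the answer (this is the signature from older pandas releases; method and tolerance were removed in pandas 2.0), i.e. something like: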

row_number = df.index.get_loc(df.query(f'numbers == {m}').index[0])  

The current answers, except one, explain how to get the index label rather than the row number.
Trivial code with index labels not corresponding to row numbers:

import pandas as pd

n = 3; m = n-1

df = pd.DataFrame({'numbers' : range(n) },
    index = range(n-1,-1,-1))
print(df,"\n")

label =      df[df['numbers'] == m].index[0]
row_number = df.index.get_loc(df.query(f'numbers == {m}').index[0])

print(f'index label: {label}\nrow number:  {row_number}',"\n")
print(f"df.loc[{label},'numbers']: {df.loc[label, 'numbers']}")
print(f"df.iloc[{row_number}, 0]:       {df.iloc[row_number, 0]}")

Output:

   numbers
2        0
1        1
0        2 

index label: 0
row number:  2 

df.loc[0,'numbers']: 2
df.iloc[2, 0]:       2

answered Aug 9, 2022 at 20:17

user778806

  1. To get the exact row number of a single occurrence

row_number = df[df["LastName"] == 'Smith'].index[0]

  2. To get the row numbers of multiple occurrences of ‘Smith’

row_numbers = df[df["LastName"] == 'Smith'].index.tolist()

answered Nov 30, 2022 at 15:57

dataninsight

In this post, we look at how to find the index of a value in a pandas DataFrame in Python, including the index of all rows and of the rows in which a column matches specific conditions.

DataFrame.index property


To find the index of rows in a pandas DataFrame, the index property is used. dataframe.index returns the row labels of the DataFrame as an Index object. The individual labels can be accessed by using a loop.

Syntax

dataframe.index

Let us understand by using index property with loop and without loop.

Python Program to find indexes of all rows

#Python Program  to find indexes  of all rows
import pandas as pd
  
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max', 'David'],
    'Marks':[99,98, 100,100],
    'Subject': ['Math', 'Math', 'Math', 'Physic']
}
  
df = pd.DataFrame(Student_dict)

indexes = df.index
print('index object of Dataframe:\n',indexes)

print('\n Access each index by using loop:')

for index in indexes:
    print(index)

Output

index object of Dataframe:
 RangeIndex(start=0, stop=4, step=1)

Access each index by using loop:
0
1
2
3

1. df.loc[] to find the rows in which a column matches a value


The pandas DataFrame.loc indexer accesses rows and columns by index label and column name. Passing a boolean condition on a column, such as df.loc[df['Marks'] == 100], returns the rows that satisfy the condition.

Python Program Example

#python program to find Row index which column match value
import pandas as pd
  
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max', 'David'],
    'Marks':[99,98, 100,100],
    'Subject': ['Math', 'Math', 'Music', 'Physic']
}
  
df = pd.DataFrame(Student_dict)

print('indexes of Dataframe:')  
print(df.loc[df['Marks'] == 100])
print(df.loc[(df['Marks'] == 100) & (df['Subject'] == 'Physic')])

Output

indexes of Dataframe:
    Name  Marks Subject
2    Max    100   Music
3  David    100  Physic
    Name  Marks Subject
3  David    100  Physic

2. df.index.values to find the index of a specific value


To find the index of the rows that match a given condition in a pandas DataFrame, we filter with a condition on df['Subject'] and read index.values on the result.

The result shows us that rows 0, 1, 2 have the value 'Math' in the Subject column.

Python Program Example

import pandas as pd
  
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max', 'David'],
    'Marks':[99,98, 100,100],
    'Subject': ['Math', 'Math', 'Math', 'Physic']
}
  
df = pd.DataFrame(Student_dict)

print('indexes of Dataframe:')  
print(df[df['Subject'] == 'Math'].index.values)





Output

indexes of Dataframe:
[0 1 2]

3. Find the index of a row in which a column matches a value


The code example below finds the index of the rows in which a column matches a single condition, and then several conditions at once. Row 0 has Name == 'Jack', and rows 0 and 1 match the multiple conditions.

Python Program Example

#python program to find Row index which column match value
import pandas as pd
  
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max', 'David'],
    'Marks':[99,98, 100,100],
    'Subject': ['Math', 'Math', 'Music', 'Physic']
}
  
df = pd.DataFrame(Student_dict)

print('index of Dataframe row which column match value:')

print('single condition:',df.index[df["Name"]=='Jack'].tolist())

print('Multiple conditions:', df.index[(df["Subject"]=='Math')& (df['Marks']<=100)].tolist())

Output

index of Dataframe row which column match value:
single condition: [0]
Multiple conditions: [0, 1]

4. index.tolist() to find the index of a specific value in a pandas DataFrame


df.Marks[df.Marks == 100].index gives the index of the matched values, and tolist() converts that index to a list. In this example, rows 2 and 3 have Marks == 100.

Python Program Example

import pandas as pd
  
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max', 'David'],
    'Marks':[99,98, 100,100],
    'Subject': ['Math', 'Math', 'Music', 'Physic']
}
  
df = pd.DataFrame(Student_dict)

print('indexes of Dataframe:')  
print(df.Marks[df.Marks == 100].index.tolist())


Output

indexes of Dataframe:
[2, 3]

5. Find the index of a column in a pandas DataFrame


The get_loc() method of df.columns returns the integer position of a column in a pandas DataFrame. We simply pass the column name to get_loc().

Python Program Example

import pandas as pd
  
Student_dict = {
    'Name': ['Jack', 'Rack', 'Max', 'David'],
    'Marks':[99,98, 100,100],
    'Subject': ['Math', 'Math', 'Music', 'Physic']
}
  
df = pd.DataFrame(Student_dict)

print('index of Dataframe column:')  
print(df.columns.get_loc("Marks"))

Output

index of Dataframe column:
1
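
The column position from get_loc() can be combined with a row position to read a single cell with iloc. A minimal sketch on the same data, looking up the row for 'Max' (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jack', 'Rack', 'Max', 'David'],
    'Marks': [99, 98, 100, 100],
    'Subject': ['Math', 'Math', 'Music', 'Physic'],
})

# Integer position of the column
col_pos = df.columns.get_loc("Marks")

# Integer position of the first row whose Name is 'Max'
row_pos = df.index.get_loc(df.index[df['Name'] == 'Max'][0])

# iloc takes integer positions for both axes
value = df.iloc[row_pos, col_pos]
print(col_pos, row_pos, value)
```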

Summary

In this post, we have learned several ways to find the index of a value in a pandas DataFrame, with Python code examples covering the index of all rows and of rows in which a column matches a specific condition.

  1. Get Indices of Rows Containing Integers/Floats in Pandas
  2. Get Indices of Rows Containing Strings in Pandas

Get Index of Rows Whose Column Matches Specific Value in Pandas

This article demonstrates how to get the index of rows that match certain criteria in Pandas.

Finding the indices of rows is often necessary in feature engineering. These skills can be useful for removing outliers or abnormal values in a DataFrame. The indices, also known as the row labels, can be found in Pandas in several ways. In the following examples, we will be working on the DataFrame created using the following snippet.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df)

Output:

     A   B   C   D
0   13  16   1   4
1    4   8  10  19
2    5   7  13   2
3    7   8  15  18
4    6  14   9  10
5   17   6  16  16
6    1  19   4  18
7   15   8   1   2
8   10   1  11   4
9   12  19   3   1
10   1   5   6   7
11   9  18  16   5
12  10  11   2   2
13   8  10   4   7
14  12  15  19   1
15  15   4  13  11
16  12   5   7   5
17  16   4  13   5
18   9  15  16   4
19  16  14  17  18

Get Indices of Rows Containing Integers/Floats in Pandas

The pandas.DataFrame.loc indexer can access rows and columns by their labels/names. It is straightforward to return the rows matching a boolean condition passed in place of a label. Notice the square brackets next to df.loc in the snippet.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df.loc[df['B'] == 19])

The rows matching the boolean condition are returned as a DataFrame.

Output:

    A   B  C   D
6   1  19  4  18
9  12  19  3   1

Multiple conditions can be chained and applied together to the function, as shown below. This helps in isolating the rows based on specific conditions.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df.loc[(df['B'] == 19) | (df['C'] == 19)])

Output:

     A   B   C   D
6    1  19   4  18
9   12  19   3   1
14  12  15  19   1

Get Index of Rows With pandas.DataFrame.index

If you would like to find just the indices of the rows that satisfy a boolean condition, indexing pandas.DataFrame.index with that condition is the easiest way to achieve it.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df.index[df['B'] == 19].tolist())

In the above snippet, the indices of the rows whose column B satisfies the boolean condition == 19 are returned as output, as shown below.

Output:

[6, 9]

The reason why we put tolist() after the index expression is to convert the Index to a list; otherwise, the result is of the Int64Index data type.

Int64Index([6, 9], dtype='int64')
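
The introduction mentions removing outliers or abnormal values. A minimal sketch of that use, on a small hypothetical DataFrame in which 19 is treated as the abnormal value purely for illustration:

```python
import pandas as pd

# Hypothetical data; 19 is treated as the abnormal value for illustration
df = pd.DataFrame({"B": [16, 8, 19, 14], "C": [1, 19, 4, 9]})

# Labels of the rows to discard
bad_idx = df.index[(df["B"] == 19) | (df["C"] == 19)]

# drop() removes rows by label and returns a new DataFrame
cleaned = df.drop(index=bad_idx)

print(bad_idx.tolist())
print(cleaned.index.tolist())
```

The Index can be passed to drop() as is, so no tolist() conversion is needed here.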

Retrieving just the indices can be done based on multiple conditions too. The snippet can be written as follows:

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randint(1,20,size=(20, 4)), columns=list('ABCD'))

print (df.index[(df['B'] == 19) | (df['C'] == 19)].tolist())

Output:

[6, 9, 14]

Get Indices of Rows Containing Strings in Pandas

String values can be matched in two ways: exactly or partially. Both approaches shown in the previous section still work; only the condition changes.

In the following examples, we will use the following snippet.

import pandas as pd

df = pd.DataFrame({"Name": ["blue", 
                            "delta", 
                            "echo", 
                            "charlie", 
                            "alpha"], 
                    "Type": ["Raptors",
                            "Raptors",
                            "Raptors",
                            "Raptors",
                            "Tyrannosaurus rex"]
                    })

print (df)

Output:

      Name               Type
0     blue            Raptors
1    delta            Raptors
2     echo            Raptors
3  charlie            Raptors
4    alpha  Tyrannosaurus rex

Get Index of Rows With the Exact String Match

The equality condition used in the previous section can be used in finding the exact string match in the Dataframe. We will look for the two strings.

import pandas as pd

df = pd.DataFrame({"Name": ["blue", 
                            "delta", 
                            "echo", 
                            "charlie", 
                            "alpha"], 
                    "Type": ["Raptors",
                            "Raptors",
                            "Raptors",
                            "Raptors",
                            "Tyrannosaurus rex"]
                    })

print (df.index[(df['Name'] == 'blue')].tolist())
print ('\n')
print (df.loc[df['Name'] == 'blue'])
print ('\n')
print (df.loc[(df['Name'] == 'charlie') & (df['Type'] =='Raptors')])

Output:

[0]

   Name     Type
0  blue  Raptors

      Name     Type
3  charlie  Raptors

As seen above, both the index and the rows matching the condition can be received.

Get Index of Rows With the Partial String Match

String values can be partially matched by calling str.contains on a column. In the following example, we will be looking for the string ha, which occurs in charlie and alpha.

import pandas as pd

df = pd.DataFrame({"Name": ["blue", 
                            "delta", 
                            "echo", 
                            "charlie", 
                            "alpha"], 
                    "Type": ["Raptors",
                            "Raptors",
                            "Raptors",
                            "Raptors",
                            "Tyrannosaurus rex"]
                    })

print (df.index[df['Name'].str.contains('ha')].tolist())
print ('\n')
print (df.loc[df['Name'].str.contains('ha')])
print ('\n')
print (df.loc[(df['Name'].str.contains('ha')) & (df['Type'].str.contains('rex'))])

Output:

[3, 4]

      Name               Type
3  charlie            Raptors
4    alpha  Tyrannosaurus rex


    Name               Type
4  alpha  Tyrannosaurus rex

This function can be very useful in performing a partial string matching across multiple columns of the dataframe.
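
Note that str.contains is case-sensitive by default. A minimal sketch, on the same data, of how the case parameter changes which rows match (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["blue", "delta", "echo", "charlie", "alpha"],
                   "Type": ["Raptors", "Raptors", "Raptors", "Raptors",
                            "Tyrannosaurus rex"]})

# Case-sensitive by default: 'Rex' does not match 'rex'
strict = df.index[df["Type"].str.contains("Rex")].tolist()

# case=False ignores letter case
relaxed = df.index[df["Type"].str.contains("Rex", case=False)].tolist()

print(strict, relaxed)
```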
