How to find all occurrences of a substring in Python

There is no simple built-in string function that does what you’re looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you’re only iterating through the results once.
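One caveat with the regex approach: the search string is interpreted as a pattern, so characters such as `.` or `+` act as metacharacters. A minimal sketch, assuming you want plain literal matching, is to pass the substring through re.escape first. The helper name find_literal and its overlapping parameter are hypothetical, introduced only for illustration.

```python
import re

def find_literal(sub, text, overlapping=False):
    """Return the start indices of the literal substring `sub` in `text`."""
    pattern = re.escape(sub)              # treat sub as plain text, not as a regex
    if overlapping:
        pattern = '(?=%s)' % pattern      # zero-width lookahead lets matches overlap
    return [m.start() for m in re.finditer(pattern, text)]

print(find_literal('a.b', 'a.b axb a.b'))            # [0, 8]
print(find_literal('tt', 'ttt', overlapping=True))   # [0, 1]
```

Without re.escape, the pattern 'a.b' would also match 'axb' at index 4.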

answered Jan 12, 2011 at 2:43

moinudin

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
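One edge case worth guarding against: with an empty `sub`, str.find always succeeds and the position never advances, so the generator above would yield forever. A sketch of a guarded variant follows; the function name find_all_positions and the overlapping flag are assumptions made for this example, not part of the answer above.

```python
def find_all_positions(text, sub, overlapping=False):
    """Yield each index at which `sub` occurs in `text`."""
    if not sub:
        # str.find('') always succeeds, so an empty pattern would never terminate.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = text.find(sub)
    while start != -1:
        yield start
        start = text.find(sub, start + step)

print(list(find_all_positions('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all_positions('ttt', 'tt', overlapping=True)))  # [0, 1]
```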

answered Jan 12, 2011 at 3:13

Karl Knechtel

Here’s a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

answered Jan 12, 2011 at 2:48

thkala

Use re.finditer:

import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(re.escape(word), sentence):  # escape so the word is matched literally
    print((match.start(), match.end()))

For word = "this" and sentence = "this is a sentence this this" this will yield the output:

(0, 4)
(19, 23)
(24, 28)

answered Feb 3, 2016 at 19:01

Idos

Again, old thread, but here’s my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

answered Dec 23, 2015 at 23:09

AkiRoss

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

but it won’t find overlapping matches:

In [1]: aString = "ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
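If overlapping matches are wanted for this input, one option is a zero-width lookahead, as mentioned in an earlier answer. A small sketch for the 'ababa' case (the variable names are illustrative):

```python
import re

a_string = "ababa"
# The lookahead matches an empty string wherever 'aba' begins, so matches may overlap.
starts = [m.start() for m in re.finditer('(?=aba)', a_string)]
spans = [(i, i + len('aba')) for i in starts]
print(spans)  # [(0, 3), (2, 5)]
```

Because the lookahead consumes no characters, m.end() equals m.start(), which is why the end positions are computed from the substring length.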

answered Jan 12, 2011 at 2:55

Chinmay Kanchi

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

answered Nov 1, 2013 at 3:16

Cody Piersall

If you’re just looking for a single character, this would work:

from functools import reduce  # reduce is not a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.

answered Sep 24, 2014 at 21:12

jstaab

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found.
Please comment if you see an error or room for improvement.

answered Apr 1, 2015 at 9:23

Thurines

This does the trick for me using re.finditer

import re

text = ('This is sample text to test if this pythonic '
        'program can serve as an indexing platform for '
        'finding words in a paragraph. It can give '
        'values as to where the word is located with the '
        'different examples as stated')

# find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print("start {}, end {}, search string '{}'".
          format(match.start(), match.end(), match.group()))

answered Jul 6, 2018 at 9:34

Bruno Vermeulen

This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        marker = numberString.index(testString, marker)
        print(marker)
        marker += 1
    except ValueError:
        # index() raises ValueError once there are no further matches
        print("No more matches found")
        marker = len(numberString)

answered Sep 1, 2014 at 12:48

Andrew H

You can try:

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+len("test")] == "test":
...         print(index)
...
0
5
10
15

answered Feb 27, 2018 at 6:44

Harsha Biyani

You can try:

import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]

answered Oct 25, 2021 at 10:13

Mohammad Amin Eskandari

When looking for a large number of keywords in a document, use flashtext

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on a large list of search words.

answered Sep 28, 2018 at 17:29

Uri Goren

This function does not examine every position in the string, so it does not waste compute resources. My attempt:

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')

answered Jan 13, 2020 at 12:39

Valentin Goikhman

src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

answered May 16, 2020 at 17:05

mascai

The solutions provided by others are all based on the built-in find() method or other existing methods.

What is the core basic algorithm to find all the occurrences of a
substring in a string?

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also subclass str and add this function as a method:

class newstr(str):
    def find_all(string,substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c=0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c=c+1
        return indexes

Calling the method

newstr.find_all('Do you find this answer helpful? then upvote this!', 'this')

answered Feb 15, 2018 at 20:02

naveen raja

This is a solution to a similar question from HackerRank. I hope it helps you.

import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
    # collect the start index of every match, overlapping ones included
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))

Output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

answered Jan 20, 2020 at 22:47

Ruman Khan

If you want to do it without re (regex):

find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]

string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]

answered Nov 5, 2021 at 8:38

WangSung

Here’s a solution that I came up with, using assignment expression (new feature since Python 3.8):

string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]

Output:

[0, 5, 10, 15]
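Note that string.count() counts non-overlapping occurrences while find(phrase, start + 1) also finds overlapping ones, so the two can disagree on inputs such as "aaaa" and "aa". A sketch of a variant that avoids count() by looping until find() returns -1 (the function name find_all_walrus is hypothetical; Python 3.8+ is assumed for the assignment expression):

```python
def find_all_walrus(string, phrase):
    """Collect every (possibly overlapping) start index of phrase in string."""
    result = []
    start = -1
    # Keep searching one position past the previous hit until find() returns -1.
    while (start := string.find(phrase, start + 1)) != -1:
        result.append(start)
    return result

print(find_all_walrus("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all_walrus("aaaa", "aa"))                   # [0, 1, 2]
```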

answered Apr 8, 2022 at 10:06

Mike

I think the cleanest solution is one without libraries or yield:

def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)

find_all_occurrences(string, substr)

Note: find() method returns -1 when it can’t find anything

answered Oct 13, 2022 at 20:06

ulas.kesik

For a single character, a pythonic way would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s is the single character to search for
# c is the string to search in

find_all(mystring,'o')    # will return all positions of 'o'

[4, 7, 20, 26]

answered Apr 10, 2018 at 19:40

Harvey

If you only want to use numpy, here is a solution:

import numpy as np

S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)

answered Jun 10, 2021 at 16:46

Phillip Maire

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

answered Mar 16, 2017 at 1:14

黄哥Python培训

def find_index(string, let):
    enumerated = [place  for place, letter in enumerate(string) if letter == let]
    return enumerated

For example:

find_index("hey doode find d", "d") 

returns:

[4, 7, 13, 15]

answered Nov 8, 2020 at 13:49

Elli

Not exactly what OP asked but you could also use the split function to get a list of where all the substrings don’t occur. OP didn’t specify the end goal of the code but if your goal is to remove the substrings anyways then this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case

# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']

# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'

Did a brief skim of other answers so apologies if this is already up there.

answered May 19, 2021 at 13:43

als0052

def count_substring(string, sub_string):
    c=0
    for i in range(len(string) - len(sub_string) + 1):  # every start index where sub_string still fits
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c

if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    
    count = count_substring(string, sub_string)
    print(count)

answered Jun 2, 2021 at 3:24

CHANDANA SAMINENI

I ran into the same problem and did this:

hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []

while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break

I’m pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).

All in all, it works as intended for what I was doing.

Edit: Please note that this is for single characters only, and it changes your variable, so you have to copy the string into a new variable first to preserve it. I didn’t put that in the code because it’s easy and the code is only meant to show how I made it work.

answered Jun 25, 2021 at 20:18

Lucas LP

By slicing, we build every possible substring, append each one to a list, and then use the count function to find how many times the target occurs.

s=input()
n=len(s)
l=[]
f=input()
print(s[0])
for i in range(0,n):
    for j in range(1,n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))

answered Jul 30, 2019 at 11:44

BONTHA SREEVIDHYA

To find all the occurrences of each character in a given string and return them as a dictionary.
e.g. for hello the result is:
{'h': 1, 'e': 1, 'l': 2, 'o': 1}

def count(string):
   result = {}
   if(string):
     for i in string:
       result[i] = string.count(i)
     return result
   return {}

or else you can do it like this:

from collections import Counter

def count(string):
    return Counter(string)

answered Apr 30, 2022 at 8:00

Aminu Aminaldo

Many times while working with strings, we have problems dealing with substrings. This may include the problem of finding all positions of a particular substring in a string. Let’s discuss certain ways in which this task can be performed.

Method #1 : Using list comprehension + startswith()

This task can be performed by combining the two features. startswith() checks whether the substring begins at a given index, and the list comprehension iterates over every index of the target string.

test_str = "GeeksforGeeks is best for Geeks"
test_sub = "Geeks"

print("The original string is : " + test_str)
print("The substring to find : " + test_sub)

res = [i for i in range(len(test_str)) if test_str.startswith(test_sub, i)]

print("The start indices of the substrings are : " + str(res))

Output : 

The original string is : GeeksforGeeks is best for Geeks
The substring to find : Geeks
The start indices of the substrings are : [0, 8, 26]

Time Complexity: O(n*m), where n is the length of the original string and m is the length of the substring to find
Auxiliary Space: O(k), where k is the number of occurrences of the substring in the string

Method #2 : Using re.finditer()

The finditer function of the regex library finds the occurrences of the substring in the target string, and the start() method of each match returns its index.

import re

test_str = "GeeksforGeeks is best for Geeks"
test_sub = "Geeks"

print("The original string is : " + test_str)
print("The substring to find : " + test_sub)

res = [i.start() for i in re.finditer(test_sub, test_str)]

print("The start indices of the substrings are : " + str(res))

Output : 

The original string is : GeeksforGeeks is best for Geeks
The substring to find : Geeks
The start indices of the substrings are : [0, 8, 26]

Method #3 : Using find() and replace() methods

test_str = "GeeksforGeeks is best for Geeks"
test_sub = "Geeks"

print("The original string is : " + test_str)
print("The substring to find : " + test_sub)

res=[]
while(test_str.find(test_sub)!=-1):
    res.append(test_str.find(test_sub))
    test_str=test_str.replace(test_sub,"*"*len(test_sub),1)

print("The start indices of the substrings are : " + str(res))

Output

The original string is : GeeksforGeeks is best for Geeks
The substring to find : Geeks
The start indices of the substrings are : [0, 8, 26]

Time Complexity: O(n*m), where n is the length of the original string and m is the length of the substring to find.
Auxiliary Space: O(k), where k is the number of occurrences of the substring in the string.

Method #4 : Using find()

The find() method is used to find the index of the first occurrence of the substring in the string. We start searching for the substring from the beginning of the string and continue searching until the substring is not found in the remaining part of the string. If the substring is found, we add its start index to the list of indices and update the start index to start searching for the next occurrence of the substring.

def find_substring_indices(string, substring):
    indices = []
    start_index = 0
    while True:
        index = string.find(substring, start_index)
        if index == -1:
            break
        else:
            indices.append(index)
            start_index = index + 1
    return indices

string = "GeeksforGeeks is best for Geeks"
substring = "Geeks"

indices = find_substring_indices(string, substring)

print("The original string is:", string)
print("The substring to find:", substring)
print("The start indices of the substrings are:", indices)

Output

The original string is: GeeksforGeeks is best for Geeks
The substring to find: Geeks
The start indices of the substrings are: [0, 8, 26]

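Time complexity: O(nm), where n is the length of the string and m is the length of the substring.
Auxiliary space: O(k), where k is the number of occurrences of the substring, since their indices are stored in a list.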

Method #5: Using find() and a while loop

  1. Initialize an empty list to store the indices of all occurrences of the substring.
  2. Set the starting index i to 0.
  3. Use a while loop to keep searching for the substring in the string.
  4. Inside the while loop, use the find() method to find the first occurrence of the substring in the string, starting from the current index i.
  5. If find() returns -1, it means that there are no more occurrences of the substring in the string, so break out of the loop.
  6. If find() returns a non-negative value, append the index of the first character of the substring to the list, and update the starting index i to the next character after the end of the substring.
  7. Repeat steps 4-6 until there are no more occurrences of the substring in the string.
  8. Return the list of indices.

def find_all_substrings(string, substring):
    indices = []
    i = 0
    while i < len(string):
        j = string.find(substring, i)
        if j == -1:
            break
        indices.append(j)
        i = j + len(substring)
    return indices

test_str = "GeeksforGeeks is best for Geeks"
test_sub = "Geeks"

print(find_all_substrings(test_str, test_sub))

Time complexity: O(nm), where n is the length of the string and m is the length of the substring. 
Auxiliary space: O(k), where k is the number of occurrences of the substring in the string.

Method #6 : Using re.finditer() and reduce(): 

Algorithm:

1. Import the required modules – re and functools.
2. Initialize the input string test_str and the substring to be searched test_sub.
3. Use re.finditer() to find all the occurrences of the substring test_sub in the string test_str.
4. Use reduce() to get the start indices of all the occurrences found in step 3.
5. The lambda function inside reduce() takes two arguments: the first one is the list x that accumulates the start indices, and the second one is the Match object y returned by finditer(). The function adds the start index of the current Match object to the list x.
6. Convert the final result to a string and print it.

import re
from functools import reduce

test_str = "GeeksforGeeks is best for Geeks"
test_sub = "Geeks"

occurrences = re.finditer(test_sub, test_str)

res = reduce(lambda x, y: x + [y.start()], occurrences, [])

print("The start indices of the substrings are : " + str(res))

Output

The start indices of the substrings are : [0, 8, 26]

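Time Complexity: O(n) for the regex scan, where n is the length of the input string. Building the result adds up to O(m^2) in the worst case, where m is the number of occurrences, because x + [y.start()] copies the accumulated list at every step.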

Auxiliary Space: O(m), where m is the number of occurrences of the substring in the input string. This is because we need to store the start indices of all the occurrences in a list.

Last Updated :
03 May, 2023

  1. Use the string.count() function to find all occurrences of a substring in a string in Python
  2. Use list comprehension and startswith() to find all occurrences of a substring in a string in Python
  3. Use re.finditer() to find all occurrences of a substring in a string in Python

Python: find all occurrences in a string

A substring in Python is a sequence of characters that occurs within another string. Working with substrings can often be troublesome. One such problem is finding all occurrences of a substring in a given string.

This tutorial covers different methods of finding all occurrences of a substring in a string in Python.

Use the string.count() function to find all occurrences of a substring in a string in Python

string.count() is a built-in method in Python that returns the number of non-overlapping occurrences of a substring in a given string. It also accepts optional start and end parameters that specify the indices of the start and end positions.

The count() method scans the string and returns the number of times the given substring occurs in it.

The following code uses string.count() to count the occurrences of a substring in a string.

#defining string and substring
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"

#occurrence of word 'good' in whole string
count1 = str1.count(substr)
print(count1)

#occurrence of word 'good' from index 0 to 25
count2 = str1.count(substr,0,25)
print(count2)

Output:

2
1

This is a simple method that works in every case. Its only drawback is that it does not return the indices at which the substring occurs in the string.
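A further point to keep in mind is that count() only counts non-overlapping occurrences. A short sketch of the difference, using a made-up input chosen so that the occurrences overlap:

```python
text = "aaaa"
sub = "aa"

# str.count() skips past each match, so overlapping occurrences are not counted.
non_overlapping = text.count(sub)

# Checking every start position includes the overlapping occurrences as well.
overlapping = sum(1 for i in range(len(text)) if text.startswith(sub, i))

print(non_overlapping, overlapping)  # 2 3
```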

Use list comprehension and startswith() to find all occurrences of a substring in a string in Python

This method needs two things: a list comprehension and the startswith() method.

startswith() checks whether the substring begins at a given index, and the list comprehension iterates over the whole target string.

The following code uses a list comprehension and startswith() to find all occurrences of a substring in a string.

# defining string 
str1 = "This dress looks good; you have good taste in clothes."
  
# defining substring
substr = "good"
  
# printing original string 
print("The original string is : " + str1)
  
# printing substring 
print("The substring to find : " + substr)
  
# using list comprehension + startswith()
# All occurrences of substring in string 
res = [i for i in range(len(str1)) if str1.startswith(substr, i)]
  
# printing result 
print("The start indices of the substrings are : " + str(res))

Output:

The original string is : This dress looks good; you have good taste in clothes.
The substring to find : good
The start indices of the substrings are : [17, 32]

Use re.finditer() to find all occurrences of a substring in a string in Python

re.finditer() is a function of the regular expression library that Python provides for programmers to use in their code. It helps find occurrences of a given pattern in a string. To use this function, we first need to import the regular expression library re.

re.finditer() takes the parameters pattern and string. In this case, the pattern is the substring.

The following code uses the re.finditer() function to find all occurrences of a substring in a string.

import re 
 
# defining string  
str1 = "This dress looks good; you have good taste in clothes."
 
#defining substring 
substr = "good"
 
print("The original string is: " + str1) 
 
print("The substring to find: " + substr) 
 
result = [_.start() for _ in re.finditer(substr, str1)] 
 
print("The start indices of the substrings are : " + str(result))

Output:

The original string is: This dress looks good; you have good taste in clothes.
The substring to find: good
The start indices of the substrings are : [17, 32]

Original author: Team Python Pool.

Hello coders!! In this article, we look at methods in Python for finding all occurrences in a string. To make the concept clear, we walk through detailed code illustrations that achieve the required result.

What is a substring?

A substring in Python is a sequence of characters contained in another string. For example, consider the string abaabaabbaab. Here, aba is a substring that occurs twice in the string, and aab is another substring that occurs three times.

When processing strings, we often run into trouble handling substrings. One such difficulty is finding every position of a particular substring within a string. In this article, we discuss how to deal with it.

Python code to find all occurrences in a string

1) Using list comprehension + startswith() in Python to find all occurrences in a string

The startswith() method checks whether the string, or a given portion of it, begins with the specified substring.

Syntax:

string.startswith(value, start, end)

Parameter list:

  • value: Required. The value to check for at the start of the string.
  • start: Optional. An integer specifying the position at which to start the search.
  • end: Optional. An integer specifying the position at which to end the search.

Return value:

Returns True if the string starts with the given value at the specified position, and False otherwise.

Output and explanation:

In this code, the input string was “python pool for python coding” and the substring was “python”. Using the startswith() function, we found the occurrences of the substring in the string. As a result, the substring is found at indices 0 and 16.
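A minimal sketch of the approach described here, assuming the string and substring quoted above (the variable names are illustrative, not taken from the original listing):

```python
string = "python pool for python coding"
substring = "python"

# startswith(substring, i) is True when substring begins at index i.
result = [i for i in range(len(string)) if string.startswith(substring, i)]

print("The start indices of the substrings are : " + str(result))
# The start indices of the substrings are : [0, 16]
```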

2) Using re.finditer() in Python to find all occurrences in a string

This is a function of the regular expression library (https://docs.python.org/3/library/re.html) provided by Python that helps find occurrences of a given pattern in a string.

Syntax:

re.finditer(pattern, string)

Parameter list:

  • pattern: the pattern to be matched
  • string: the string to search

Return value:

This function returns an iterator of non-overlapping matches for the pattern in the string.

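import re

# input values as described in the explanation of this example
string = "python pool for python coding"
substring = "python"

print("The original string is: " + string)
print("The substring to find: " + substring)

result = [i.start() for i in re.finditer(substring, string)]

print("The start indices of the substrings are : " + str(result))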

Output and explanation:

In this code, the input string was “python pool for python coding” and the substring was “python”. Using the re.finditer() function, we found the non-overlapping occurrences of the substring in the string. As a result, the substring is found at indices 0 and 16.

3) Using re.findall() in Python to find all occurrences in a string

This function finds all non-overlapping matches of a pattern in the given string. The string is scanned from left to right, and the matches are returned in the order found.

Syntax:

re.findall(pattern, string)

Parameter list:

  • pattern: the pattern to be matched
  • string: the string to search

Return value:

It returns all matches of the pattern as a list of strings.

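import re

# input values as described in the explanation of this example
string = "python pool 123 for python 456 coding"
substring = r"\d+"  # one or more digits

print("The original string is: " + string)
print("The substring to find: " + substring)

result = re.findall(substring, string)
print(result)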

Output and explanation:

In this code, the input string was “python pool 123 for python 456 coding” and the pattern was “\d+”. Using the re.findall() function, we found the occurrences of the pattern. In this case, we are searching for integers in the string. As a result, the output is a list containing all the numbers found, as strings.
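Since re.findall() returns only the matched text, the positions are lost. A short sketch contrasting it with re.finditer() on the same input, in case the indices are needed too (the variable names are illustrative):

```python
import re

string = "python pool 123 for python 456 coding"

numbers = re.findall(r"\d+", string)    # matched text only
positions = [(m.start(), m.group()) for m in re.finditer(r"\d+", string)]

print(numbers)    # ['123', '456']
print(positions)  # [(12, '123'), (27, '456')]
```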

Conclusion: Python find all occurrences in a string

In this article, we covered different ways of finding all occurrences of a given substring in a string using various functions in Python.

If you’re new to programming or come from a programming language other than Python, you may be looking for the best way to check whether a string contains another string in Python.

Identifying such substrings comes in handy when you’re working with text content from a file or after you’ve received user input. You may want to perform different actions in your program depending on whether a substring is present or not.

In this tutorial, you’ll focus on the most Pythonic way to tackle this task, using the membership operator in. Additionally, you’ll learn how to identify the right string methods for related, but different, use cases.

Finally, you’ll also learn how to find substrings in pandas columns. This is helpful if you need to search through data from a CSV file. You could use the approach that you’ll learn in the next section, but if you’re working with tabular data, it’s best to load the data into a pandas DataFrame and search for substrings in pandas.

How to Confirm That a Python String Contains Another String

If you need to check whether a string contains a substring, use Python’s membership operator in. In Python, this is the recommended way to confirm the existence of a substring in a string:

>>> raw_file_content = """Hi there and welcome.
... This is a special hidden file with a SECRET secret.
... I don't want to tell you The Secret,
... but I do want to secretly tell you that I have one."""

>>> "secret" in raw_file_content
True

The in membership operator gives you a quick and readable way to check whether a substring is present in a string. You may notice that the line of code almost reads like English.

When you use in, the expression returns a Boolean value:

  • True if Python found the substring
  • False if Python didn’t find the substring
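As a small illustration of the two outcomes, the sketch below stores the results of a few membership checks in variables. The sample sentence and the variable names are made up for this example; not in is the negated form of the same operator.

```python
text = "This is a special hidden file with a SECRET secret."

has_secret = "secret" in text            # True: the lowercase word occurs in the text
has_password = "password" in text        # False: this substring never occurs
lacks_password = "password" not in text  # True: negated membership check

print(has_secret, has_password, lacks_password)
```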

You can use this intuitive syntax in conditional statements to make decisions in your code:

>>> if "secret" in raw_file_content:
...    print("Found!")
...
Found!

In this code snippet, you use the membership operator to check whether "secret" is a substring of raw_file_content. If it is, then you’ll print a message to the terminal. Any indented code will only execute if the Python string that you’re checking contains the substring that you provide.

The membership operator in is your best friend if you just need to check whether a Python string contains a substring.

However, what if you want to know more about the substring? If you read through the text stored in raw_file_content, then you’ll notice that the substring occurs more than once, and even in different variations!

Which of these occurrences did Python find? Does capitalization make a difference? How often does the substring show up in the text? And what’s the location of these substrings? If you need the answer to any of these questions, then keep on reading.

Generalize Your Check by Removing Case Sensitivity

Python strings are case sensitive. If the substring that you provide uses different capitalization than the same word in your text, then Python won’t find it. For example, if you check for the lowercase word "secret" on a title-case version of the original text, the membership operator check returns False:

>>> title_cased_file_content = """Hi There And Welcome.
... This Is A Special Hidden File With A Secret Secret.
... I Don't Want To Tell You The Secret,
... But I Do Want To Secretly Tell You That I Have One."""

>>> "secret" in title_cased_file_content
False

Despite the fact that the word secret appears multiple times in the title-case text title_cased_file_content, it never shows up in all lowercase. That’s why the check that you perform with the membership operator returns False. Python can’t find the all-lowercase string "secret" in the provided text.

Humans have a different approach to language than computers do. This is why you’ll often want to disregard capitalization when you check whether a string contains a substring in Python.

You can generalize your substring check by converting the whole input text to lowercase:

>>> file_content = title_cased_file_content.lower()

>>> print(file_content)
hi there and welcome.
this is a special hidden file with a secret secret.
i don't want to tell you the secret,
but i do want to secretly tell you that i have one.

>>> "secret" in file_content
True

Converting your input text to lowercase is a common way to account for the fact that humans think of words that only differ in capitalization as the same word, while computers don’t.
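The same idea can be wrapped in a small helper. The sketch below is one possible variant, not part of the original tutorial: it uses str.casefold(), which is a more aggressive form of lowercasing intended for caseless comparisons, and the function name contains_ignoring_case is hypothetical.

```python
def contains_ignoring_case(text, substring):
    """Return True if substring occurs in text, ignoring letter case."""
    # casefold() normalizes more aggressively than lower(), e.g. "ß" becomes "ss".
    return substring.casefold() in text.casefold()


print(contains_ignoring_case("I Don't Want To Tell You The Secret,", "secret"))  # True
print(contains_ignoring_case("STRASSE", "straße"))                               # True
print("straße".lower() in "STRASSE".lower())                                     # False
```

For plain ASCII text, .lower() and .casefold() behave the same, so the choice only matters for text with characters such as the German "ß".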

Now that you’ve converted the string to lowercase to avoid unintended issues stemming from case sensitivity, it’s time to dig further and learn more about the substring.

Learn More About the Substring

The membership operator in is a great way to descriptively check whether there’s a substring in a string, but it doesn’t give you any more information than that. It’s perfect for conditional checks—but what if you need to know more about the substrings?

Python provides many additional string methods that allow you to check how many target substrings the string contains, to search for substrings according to elaborate conditions, or to locate the index of the substring in your text.

In this section, you’ll cover some additional string methods that can help you learn more about the substring.

By using in, you confirmed that the string contains the substring. But you didn’t get any information on where the substring is located.

If you need to know where in your string the substring occurs, then you can use .index() on the string object:

>>> file_content = """hi there and welcome.
... this is a special hidden file with a secret secret.
... i don't want to tell you the secret,
... but i do want to secretly tell you that i have one."""

>>> file_content.index("secret")
59

When you call .index() on the string and pass it the substring as an argument, you get the index position of the first character of the first occurrence of the substring.

But what if you want to find other occurrences of the substring? The .index() method also takes a second argument that can define at which index position to start looking. By passing specific index positions, you can therefore skip over occurrences of the substring that you’ve already identified:

>>> file_content.index("secret", 60)
66

When you pass a starting index that’s past the first occurrence of the substring, then Python searches starting from there. In this case, you get another match and not a ValueError.
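Repeating this step in a loop collects every occurrence. The sketch below uses .find() instead of .index(), because .find() returns -1 rather than raising ValueError when nothing is left to find; the helper name find_all_indices is made up for this example.

```python
file_content = """hi there and welcome.
this is a special hidden file with a secret secret.
i don't want to tell you the secret,
but i do want to secretly tell you that i have one."""


def find_all_indices(text, substring):
    """Return the start index of every occurrence of substring in text."""
    indices = []
    start = text.find(substring)
    while start != -1:
        indices.append(start)
        # Continue one character later, so overlapping occurrences are found as well.
        start = text.find(substring, start + 1)
    return indices


print(find_all_indices(file_content, "secret"))  # [59, 66, 103, 128]
```

Advancing by len(substring) instead of 1 would skip overlapping occurrences, which matches how .count() counts.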

That means that the text contains the substring more than once. But how often is it in there?

You can use .count() to get your answer quickly using descriptive and idiomatic Python code:

>>> file_content.count("secret")
4

You used .count() on the lowercase string and passed the substring "secret" as an argument. Python counted how often the substring appears in the string and returned the answer. The text contains the substring four times. But what do these substrings look like?
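One detail worth keeping in mind is that .count() only counts non-overlapping occurrences. The short sketch below, using a made-up string, contrasts it with a count that also includes overlapping matches.

```python
text = "aaaa"

# .count() moves past each match before searching again, so matches never overlap.
non_overlapping = text.count("aa")

# Checking every start position also counts occurrences that overlap.
overlapping = sum(1 for i in range(len(text)) if text.startswith("aa", i))

print(non_overlapping)  # 2
print(overlapping)      # 3
```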

You can inspect all the substrings by splitting your text at default word borders and printing the words to your terminal using a for loop:

>>> for word in file_content.split():
...    if "secret" in word:
...        print(word)
...
secret
secret.
secret,
secretly

In this example, you use .split() to separate the text at whitespaces into strings, which Python packs into a list. Then you iterate over this list and use in on each of these strings to see whether it contains the substring "secret".

Now that you can inspect all the substrings that Python identifies, you may notice that Python doesn’t care whether there are any characters after the substring "secret" or not. It finds the word whether it’s followed by whitespace or punctuation. It even finds words such as "secretly".
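Without regular expressions, one rough way to keep only the exact word is to strip punctuation from each word before comparing. This is a sketch under the assumption that stripping the characters in string.punctuation from both ends of a word is good enough for your text.

```python
import string

file_content = """hi there and welcome.
this is a special hidden file with a secret secret.
i don't want to tell you the secret,
but i do want to secretly tell you that i have one."""

# Keep only the words that equal "secret" once surrounding punctuation is removed.
exact_words = [
    word
    for word in file_content.split()
    if word.strip(string.punctuation) == "secret"
]

print(exact_words)  # ['secret', 'secret.', 'secret,']
```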

That’s good to know, but what can you do if you want to place stricter conditions on your substring check?

Find a Substring With Conditions Using Regex

You may only want to match occurrences of your substring followed by punctuation, or identify words that contain the substring plus other letters, such as "secretly".

For such cases that require more involved string matching, you can use regular expressions, or regex, with Python’s re module.

For example, if you want to find all the words that start with "secret" but are then followed by at least one additional letter, then you can use the regex word character (\w) followed by the plus quantifier (+):

>>> import re

>>> file_content = """hi there and welcome.
... this is a special hidden file with a secret secret.
... i don't want to tell you the secret,
... but i do want to secretly tell you that i have one."""

>>> re.search(r"secret\w+", file_content)
<re.Match object; span=(128, 136), match='secretly'>

The re.search() function returns both the substring that matched the condition as well as its start and end index positions—rather than just True!

You can then access these attributes through methods on the Match object, which is denoted by m:

>>> m = re.search(r"secret\w+", file_content)

>>> m.group()
'secretly'

>>> m.span()
(128, 136)

These results give you a lot of flexibility to continue working with the matched substring.
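As a small sketch of what that can look like, the example below uses .start() and .end() to slice the original text, and it checks for None first, because re.search() returns None when there's no match. The sample sentences are shortened for illustration.

```python
import re

text = "but i do want to secretly tell you"

m = re.search(r"secret\w+", text)
if m is not None:
    start, end = m.start(), m.end()
    # Slicing the text with the match positions gives the same string as m.group().
    print(start, end, text[start:end])  # 17 25 secretly

# With no match, re.search() returns None instead of a Match object.
no_match = re.search(r"secret\w+", "no hidden words here")
print(no_match)  # None
```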

For example, you could search for only the substrings that are followed by a comma (,) or a period (.):

>>> re.search(r"secret[.,]", file_content)
<re.Match object; span=(66, 73), match='secret.'>

There are two potential matches in your text, but you only matched the first result fitting your query. When you use re.search(), Python again finds only the first match. What if you wanted all the mentions of "secret" that fit a certain condition?

To find all the matches using re, you can work with re.findall():

>>> re.findall(r"secret[.,]", file_content)
['secret.', 'secret,']

By using re.findall(), you can find all the matches of the pattern in your text. Python saves all the matches as strings in a list for you.

When you use a capturing group, you can specify which part of the match you want to keep in your list by wrapping that part in parentheses:

>>> re.findall(r"(secret)[.,]", file_content)
['secret', 'secret']

By wrapping secret in parentheses, you defined a single capturing group. The findall() function returns a list of strings matching that capturing group, as long as there’s exactly one capturing group in the pattern. By adding the parentheses around secret, you managed to get rid of the punctuation!
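The "exactly one capturing group" condition matters. The sketch below, with a shortened sample text, shows what re.findall() returns with two capturing groups and with a non-capturing group.

```python
import re

text = "a secret secret. the secret, secretly"

# Two capturing groups: each result is a tuple with one item per group.
pairs = re.findall(r"(secret)([.,])", text)

# A non-capturing group (?:...) groups without capturing, so whole matches are returned.
whole = re.findall(r"(?:secret)[.,]", text)

print(pairs)  # [('secret', '.'), ('secret', ',')]
print(whole)  # ['secret.', 'secret,']
```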

Using re.findall() with match groups is a powerful way to extract substrings from your text. But you only get a list of strings, which means that you’ve lost the index positions that you had access to when you were using re.search().

If you want to keep that information around, then re can give you all the matches in an iterator:

>>> for match in re.finditer(r"(secret)[.,]", file_content):
...    print(match)
...
<re.Match object; span=(66, 73), match='secret.'>
<re.Match object; span=(103, 110), match='secret,'>

When you use re.finditer() and pass it a search pattern and your text content as arguments, you can access each Match object that contains the substring, as well as its start and end index positions.
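Because each Match object keeps the matched text exactly as it appears, re.finditer() also combines well with the re.IGNORECASE flag. The sketch below is an illustration on the original mixed-case text rather than the lowercased copy; it finds every spelling of the word while preserving its original capitalization and position.

```python
import re

raw_file_content = """Hi there and welcome.
This is a special hidden file with a SECRET secret.
I don't want to tell you The Secret,
but I do want to secretly tell you that I have one."""

# re.IGNORECASE makes the pattern match regardless of capitalization.
found = [
    (match.start(), match.group())
    for match in re.finditer(r"secret", raw_file_content, flags=re.IGNORECASE)
]

print(found)  # [(59, 'SECRET'), (66, 'secret'), (103, 'Secret'), (128, 'secret')]
```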

You may notice that the punctuation shows up in these results even though you’re still using the capturing group. That’s because the string representation of a Match object displays the whole match rather than just the first capturing group.

But the Match object is a powerful container of information and, like you’ve seen earlier, you can pick out just the information that you need:

>>> for match in re.finditer(r"(secret)[.,]", file_content):
...    print(match.group(1))
...
secret
secret

By calling .group() and specifying that you want the first capturing group, you picked the word secret without the punctuation from each matched substring.

You can go into much more detail with your substring matching when you use regular expressions. Instead of just checking whether a string contains another string, you can search for substrings according to elaborate conditions.
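One such condition, sketched below as an illustration, is matching "secret" only as a whole word. The \b anchor matches at a word boundary, so "secretly" is skipped while occurrences followed by whitespace or punctuation are kept.

```python
import re

file_content = """hi there and welcome.
this is a special hidden file with a secret secret.
i don't want to tell you the secret,
but i do want to secretly tell you that i have one."""

# \b marks a boundary between a word character and a non-word character.
whole_word = re.compile(r"\bsecret\b")

starts = [match.start() for match in whole_word.finditer(file_content)]

print(starts)  # [59, 66, 103]
```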

Using regular expressions with re is a good approach if you need information about the substrings, or if you need to continue working with them after you’ve found them in the text. But what if you’re working with tabular data? For that, you’ll turn to pandas.

Find a Substring in a pandas DataFrame Column

If you work with data that doesn’t come from a plain text file or from user input, but from a CSV file or an Excel sheet, then you could use the same approach as discussed above.

However, there’s a better way to identify which cells in a column contain a substring: you’ll use pandas! In this example, you’ll work with a CSV file that contains fake company names and slogans.

When you’re working with tabular data in Python, it’s usually best to load it into a pandas DataFrame first:

>>> import pandas as pd

>>> companies = pd.read_csv("companies.csv")

>>> companies.shape
(1000, 2)

>>> companies.head()
             company                                     slogan
0      Kuvalis-Nolan      revolutionize next-generation metrics
1  Dietrich-Champlin  envisioneer bleeding-edge functionalities
2           West Inc            mesh user-centric infomediaries
3         Wehner LLC               utilize sticky infomediaries
4      Langworth Inc                 reinvent magnetic networks

In this code block, you loaded a CSV file that contains one thousand rows of fake company data into a pandas DataFrame and inspected the first five rows using .head().

After you’ve loaded the data into the DataFrame, you can quickly query the whole pandas column to filter for entries that contain a substring:

>>> companies[companies.slogan.str.contains("secret")]
              company                                  slogan
7          Maggio LLC                    target secret niches
117      Kub and Sons              brand secret methodologies
654       Koss-Zulauf              syndicate secret paradigms
656      Bernier-Kihn  secretly synthesize back-end bandwidth
921      Ward-Shields               embrace secret e-commerce
945  Williamson Group             unleash secret action-items

You can use .str.contains() on a pandas column and pass it the substring as an argument to filter for rows that contain the substring.

When you’re working with .str.contains() and you need more complex match scenarios, you can also use regular expressions! You just need to pass a regex-compliant search pattern as the substring argument:

>>> companies[companies.slogan.str.contains(r"secret\w+")]
          company                                  slogan
656  Bernier-Kihn  secretly synthesize back-end bandwidth

In this code snippet, you’ve used the same pattern that you used earlier to match only words that contain secret but then continue with one or more word characters (\w+). Only one of the companies in this fake dataset seems to operate secretly!

You can write any complex regex pattern and pass it to .str.contains() to carve from your pandas column just the rows that you need for your analysis.
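The sketch below shows a few keyword arguments of .str.contains() that are often useful in practice. It builds a tiny DataFrame inline instead of reading companies.csv, and the rows are partly made up for illustration (the uppercase slogan and the company with a missing slogan aren't from the dataset).

```python
import pandas as pd

# A tiny stand-in for the CSV data; the values are illustrative only.
companies = pd.DataFrame(
    {
        "company": ["Maggio LLC", "West Inc", "Bernier-Kihn", "Unnamed Co"],
        "slogan": [
            "target SECRET niches",
            "mesh user-centric infomediaries",
            "secretly synthesize back-end bandwidth",
            None,  # a missing slogan
        ],
    }
)

mask = companies["slogan"].str.contains(
    "secret",
    case=False,   # ignore capitalization
    regex=False,  # treat the pattern as a literal substring, not a regex
    na=False,     # count missing values as "no match"
)

matches = companies[mask]
print(matches["company"].tolist())  # ['Maggio LLC', 'Bernier-Kihn']
```

Passing na=False keeps the mask purely Boolean, so it can be used for filtering even when the column contains missing values.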

Conclusion

Like a persistent treasure hunter, you found each "secret", no matter how well it was hidden! In the process, you learned that the best way to check whether a string contains a substring in Python is to use the in membership operator.

You also learned how to descriptively use two other string methods, which are often misused to check for substrings:

  • .count() to count the occurrences of a substring in a string
  • .index() to get the index position of the beginning of the substring

After that, you explored how to find substrings according to more advanced conditions with regular expressions and a few functions in Python’s re module.

Finally, you also learned how you can use the pandas method .str.contains() to check which entries in a DataFrame column contain a substring.

You now know how to pick the most idiomatic approach when you’re working with substrings in Python. Keep using the most descriptive method for the job, and you’ll write code that’s delightful to read and quick for others to understand.
