@F.M.F's answer has a few problems in Python 3, so I made a few adjustments to make it work.
import os
from os import scandir
import ctypes

def is_sym_link(path):
    # http://stackoverflow.com/a/35915819
    # Windows-only: ctypes.windll does not exist on Linux or macOS
    FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
    return os.path.isdir(path) and (ctypes.windll.kernel32.GetFileAttributesW(str(path)) & FILE_ATTRIBUTE_REPARSE_POINT)

def find(base, filenames):
    hits = []

    def find_in_dir_subdir(direc):
        # The with block closes the directory handle when the loop ends
        with scandir(direc) as content:
            for entry in content:
                if entry.name in filenames:
                    hits.append(os.path.join(direc, entry.name))
                elif entry.is_dir() and not is_sym_link(os.path.join(direc, entry.name)):
                    try:
                        find_in_dir_subdir(os.path.join(direc, entry.name))
                    except UnicodeDecodeError:
                        print("Could not resolve " + os.path.join(direc, entry.name))
                        continue
                    except PermissionError:
                        print("Skipped " + os.path.join(direc, entry.name) + ". I lacked permission to navigate")
                        continue

    if not os.path.exists(base):
        return
    else:
        find_in_dir_subdir(base)
    return hits
unicode() was changed to str() in Python 3, so I made that adjustment in is_sym_link.
I also added an except clause for PermissionError in find_in_dir_subdir. This way, the program won't stop if it finds a folder it can't access.
Finally, I would like to give a little warning. When running the program, even if you are looking for a single file or directory, make sure you pass it as a list. Otherwise, you will get a lot of results that do not necessarily match your search.
Example of use:
find("C:", ["Python", "Homework"])
or
find("C:\\", ["Homework"])
But, for example, find("C:\\", "Homework") will give unwanted results. When filenames is a plain string, entry.name in filenames performs a substring test instead of a list-membership test, so any entry whose name appears inside "Homework" (such as "Home" or "work") is reported as a hit.
Again, this is not my code; I just made the adjustments I needed to make it work. All credit should go to @F.M.F.
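The is_sym_link helper calls ctypes.windll, which only exists on Windows, so the code above fails on Linux and macOS. Below is a rough portable sketch of the same idea. It is not @F.M.F's code, and the name find_portable is hypothetical. It assumes that skipping symbolic links via DirEntry.is_dir(follow_symlinks=False) is enough (NTFS junctions on Windows may still be followed), wraps a bare string in a set so that a single name cannot be matched as a substring, and returns an empty list instead of None when base does not exist.

```python
import os


def find_portable(base, filenames):
    """Return paths under base whose final component is one of filenames.

    Hypothetical portable variant of find(): symlinked directories are not
    followed, and unreadable directories are reported and skipped.
    """
    # Accept a single name or a list; `in` on a set compares whole names
    wanted = {filenames} if isinstance(filenames, str) else set(filenames)
    hits = []
    if not os.path.exists(base):
        return hits

    pending = [base]  # explicit stack instead of recursion
    while pending:
        directory = pending.pop()
        try:
            with os.scandir(directory) as entries:
                for entry in entries:
                    if entry.name in wanted:
                        hits.append(entry.path)
                    elif entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
        except PermissionError:
            print(f"Skipped {directory}. I lacked permission to navigate")
    return hits


# Why a bare string misbehaves in the original find():
print("Home" in "Homework")    # True: substring test on a str
print("Home" in {"Homework"})  # False: membership test on a set
```

For example, find_portable("C:\\", "Homework") now looks only for entries named exactly "Homework".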
There may be many instances when you want to search a system. Suppose, while writing an MP3 player, you want to find all the '.mp3' files present. Here's a simple way to do it.
This code searches the folder containing the script being run, along with all of its subfolders. If you want some other kind of file, just change the extension.
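import os

# Directory that contains this script
dir_path = os.path.dirname(os.path.realpath(__file__))

# Walk that directory and all of its subdirectories
for root, dirs, files in os.walk(dir_path):
    for file in files:
        if file.endswith('.mp3'):
            # os.path.join uses the correct separator for the platform
            print(os.path.join(root, file))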
os is part of Python's standard library, not an external package, so I feel this is the simplest and best way to do this.
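Because str.endswith() also accepts a tuple of suffixes, the same walk can collect several kinds of files at once. Here is a minimal sketch. The function name find_by_extensions and the example extensions are illustrative assumptions. Lowercasing each name makes the match case-insensitive, so "SONG.MP3" is found too.

```python
import os


def find_by_extensions(top, extensions):
    """Return paths of files under top whose extension is in extensions.

    Hypothetical helper; pass lowercase extensions, e.g. ('.mp3', '.flac').
    """
    suffixes = tuple(extensions)
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # endswith() returns True if any suffix in the tuple matches
            if name.lower().endswith(suffixes):
                matches.append(os.path.join(root, name))
    return matches
```

For example, find_by_extensions(".", (".mp3", ".flac")) lists the MP3 and FLAC files under the current directory. os.walk() silently yields nothing for a directory that does not exist, so the result is then an empty list.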
Use the glob module:
One alternative approach to searching for files with a specific extension is to use the glob module. This module provides a way to search for files with a specific pattern using the glob function.
For example, to search for all .mp3 files in the current directory, you could use the following code:
import glob

files = glob.glob('*.mp3')
for file in files:
    print(file)
The glob function returns a list of file paths that match the specified pattern. In this case, the pattern ‘*.mp3’ matches all files in the current directory that have the .mp3 extension.
You can also specify a different directory to search in by including the path in the pattern. For example, ‘path/to/directory/*.mp3’ would search for all .mp3 files in the path/to/directory directory.
The glob module also supports more advanced patterns. For example, '**/*.mp3' matches all .mp3 files in the current directory and all of its subdirectories, recursively, provided you pass recursive=True to glob(). (A plain '*.mp3' only matches files in a single directory.) You can learn more about the different pattern options in the documentation for the glob module.
This article is contributed by soumith kumar.
Last Updated: 29 Dec, 2022
Table of Contents
- Introduction
- The PathLib Module: Defining the Path Structure
- Finding Files in the Current Directory
- Searching for Files Recursively in Python
- Finding a Single File Recursively
- Finding an Absolute Path
- Getting the Directory of the Currently Executing File
- Creating a File In the Same Directory As a Python File
- Yes, But Does the File Exist?
- Working with Files and Directories
- Example: Recursively Listing Files and Directories in Python:
- Finding Other Files and Directories
- Get the User’s Home Directory in Python
- Getting the Current Working Directory
- Example: Creating a Directory in the User Home Directory in Python
- Finding Python Module Paths and Files
- Exploring the PYTHONPATH and sys.path Variables
- Finding the Filename from Which a Python Module Was Loaded
- Closing Thoughts
- You May Also Enjoy
Introduction
In addition to being an excellent general-purpose programming language, Python is also well known as a handy language for scripting. Naturally, therefore, Python has support for the kinds of file searching and path operations one can do in a bash script, for example.
However, finding files in Python can be daunting if you don't know where to start, especially since you can choose between more than one library to find and manipulate paths and files. Before version 3.4, one would have used the glob module to find a file and the os.path module to perform other path operations, such as getting a directory given an absolute filename or creating an absolute path from a relative path. Python 3.4 introduced the pathlib module as an object-oriented API for filesystem paths.
This guide will focus on the pathlib module and teach you the basics of finding files and manipulating paths, including using glob patterns and recursive searching. We’ll also learn how to find various paths we might need to use as a starting point. For example, how would you find a file in a subdirectory of the user’s home directory, what’s the path of the currently executing file, or given a Python module name, where did that name come from? With the tools we’ll provide in this article, you’ll be able to locate any file in your system with ease.
The PathLib Module: Defining the Path Structure
Whether we're dealing with the user's home directory, the current working directory, or the directory containing an executing Python file, we'll always need a way to build and manipulate paths robustly. Python's pathlib module provides most of what we need for this task. Although this module has a concept of "pure" paths (i.e., paths you can manipulate without referring to the file system), it's much more common that we simply want to construct a concrete path object.
For example, passing a file or directory name to the Path class constructor creates a concrete object of type PosixPath on Linux or Unix-style systems (like macOS) or a WindowsPath object on Windows. Using the Path class, the base of pathlib's concrete path classes, allows the code we'll demonstrate in this article to work in both environments.
Finding Files in the Current Directory
To start getting our hands on the pathlib module, we can run the following code either as a file or from the command prompt. This will list the contents of the current working directory, that is, wherever we run it from (not where the Python script is located).
# list_dirs.py
from pathlib import Path
here = Path(".")
files = here.glob("*")
for item in files:
    print(item)
We pass in "." as a starting point to construct the Path. We can then call the glob method on the path object to list all the files in this directory using "*". The glob expression * expands to mean "all files and directories," just as it would if we used "ls *" on Linux or "dir *" on Windows.
Listing the contents of the directory I was working in, I get the following output:
I've tried to keep this folder pretty minimal. You can see we list two Python files and the subdirectory. By changing "*" to "*.py", we could focus on just the Python files, or specify a full filename, etc.
Searching for Files Recursively in Python
The glob method also supports an extended glob syntax, "**/*", which allows you to search for files recursively. (This syntax may be new to some of my readers, but it's the same syntax supported by zsh and .gitignore files, if you're familiar with those.) This glob pattern means "search here and in all subdirectories."
# list_dirs_recursive.py
from pathlib import Path
here = Path(".")
files = here.glob("**/*")
for item in files:
    print(item)
This time we see a test file I put in the directory for the purpose and our new Python file.
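pathlib also offers Path.rglob(), which behaves like glob() with "**/" prepended to the pattern, so the recursive listing above can be written without spelling out the "**". The sketch below builds a throwaway directory that mirrors the article's my_subdirectory/something.txt, an assumption made so that it runs anywhere, and checks that both calls find the same paths.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "my_subdirectory").mkdir()
    (root / "my_subdirectory" / "something.txt").write_text("demo")

    # rglob("*") matches the same paths as glob("**/*")
    via_rglob = sorted(p.relative_to(root) for p in root.rglob("*"))
    via_glob = sorted(p.relative_to(root) for p in root.glob("**/*"))

    print(via_rglob == via_glob)
    for item in via_rglob:
        print(item)
```

In your own script, Path(".").rglob("*") does the same job as the here.glob("**/*") call above.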
Finding a Single File Recursively
Using just a little creativity, we can also search recursively for a single file or a subset of files. For example, if we just wanted to find something.txt no matter where it is located below our current working directory, we could use "**/something.txt" as the glob pattern:
# find_file.py
from pathlib import Path
here = Path(".")
files = here.glob("**/something.txt")
for item in files:
    print(item)
The output will show the folder structure relative to where the Path object is set.
my_subdirectory/something.txt
We use a for loop to iterate because glob returns a generator of path objects of the same type as the path on which it's called (in our case, PosixPath).
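Because glob() returns a generator rather than a list, you can stop at the first match with next() and supply None as a default for the case where nothing is found. Here is a short sketch. first_match is a hypothetical helper name.

```python
from pathlib import Path


def first_match(start, pattern):
    """Return the first path under start that matches pattern, or None."""
    # next() pulls a single item from the generator, so the search stops early
    return next(Path(start).glob(pattern), None)
```

For example, first_match(".", "**/something.txt") would return my_subdirectory/something.txt in the directory used above, without walking the rest of the tree once that match is found.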
Finding an Absolute Path
So far we've shown examples of using relative paths. We started out by setting a variable to Path("."), and relative to that, we were able to use glob to find "my_subdirectory/something.txt", for example. But sometimes what we need is an absolute path built from that information. On path objects like PosixPath or WindowsPath, we can use the resolve method to get it. resolve returns an absolute path, and it also resolves any symbolic links.
"""Display some known relative and absolute paths"""
from pathlib import Path
here = Path(".")
file_path = Path("my_subdirectory/something.txt")
print(f"The absolute path to {here} is {here.resolve()}")
print(f"The absolute path to {file_path} is {file_path.resolve()}")
Output:
The absolute path to . is /Users/johnlockwood/paths-demo
The absolute path to my_subdirectory/something.txt is /Users/johnlockwood/paths-demo/my_subdirectory/something.txt
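Note that resolve() does not require the path to exist. With its default strict=False it simply builds the absolute path, while strict=True makes it raise FileNotFoundError instead, so it can double as an existence check. The sketch below uses a made-up path that is assumed not to exist.

```python
from pathlib import Path

# Hypothetical path, assumed not to exist in the current directory
missing = Path("no_such_subdirectory/missing.txt")

# Default strict=False: an absolute path comes back even though nothing exists
absolute = missing.resolve()
print(absolute)

# strict=True: raises FileNotFoundError when the path does not exist
try:
    missing.resolve(strict=True)
    found = True
except FileNotFoundError:
    found = False
print(f"exists: {found}")
```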
Getting the Directory of the Currently Executing File
You may already be aware that when Python loads a module, such as a program file, it sets a variable for that module, __file__, to the (absolute) filename of the module. Given a file, therefore, we can display where it is:
# print_path.py
print(__file__)
Output:
/Users/johnlockwood/paths-demo/print_path.py
The __file__ variable is not a path object, but a string. However, by constructing a path from it, we can query two special attributes of the path, name and parent, which give us just the name portion of the file and the directory portion, respectively:
# print_directory.py
from pathlib import Path
file_path = Path(__file__)
print(f"The file {file_path.name} is located in directory {file_path.parent}.")
Output:
The file print_directory.py is located in directory /Users/johnlockwood/paths-demo.
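Besides name and parent, path objects expose stem (the name without its final suffix) and suffix (the extension, including the dot). The sketch uses PurePosixPath so that it works the same on any operating system and never touches a real file. The path itself is invented for illustration.

```python
from pathlib import PurePosixPath

# Illustrative path; pure paths never access the file system
file_path = PurePosixPath("/Users/example/paths-demo/print_directory.py")

print(file_path.name)    # print_directory.py
print(file_path.stem)    # print_directory
print(file_path.suffix)  # .py
print(file_path.parent)  # /Users/example/paths-demo
```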
Creating a File In the Same Directory As a Python File
Generally, retrieving the file's directory using the parent attribute is more useful than getting its name, since we can use it to construct an absolute filename to pass to another function. This is the correct approach to locating a file next to another Python file, since, unlike Path("."), the path given by Path(__file__) will always be relative to the file itself, not to where the file was run from.
For the Path classes in the pathlib module, we can join path components together using an overloaded "/" operator. (Note that we use "/" for this even on Windows, where the Path class will do the right thing and use the backslash separator in the constructed path.)
With Path imported and given what we have so far, we can create a new filename using the one-liner:
output_path = Path(__file__).parent / "output.txt"
To review, this code constructs a Path from the __file__ string, uses the parent attribute to get the file’s directory, then uses the slash operator to construct a new Path object to an output file in the same directory as the running Python file.
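You can check the claim that "/" does the right thing on Windows without a Windows machine, because PureWindowsPath applies Windows path rules on any platform. The directories below are invented for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same join expression applied to two flavours of path
posix_out = PurePosixPath("/home/example/project") / "output.txt"
windows_out = PureWindowsPath("C:/Users/example/project") / "output.txt"

print(posix_out)    # /home/example/project/output.txt
print(windows_out)  # C:\Users\example\project\output.txt
```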
Yes, But Does the File Exist?
In the last example, you may have noticed that we constructed a path that doesn't exist. Not to worry: we can ask a path to tell us whether what it points to exists, using (no surprises here) the exists method.
Here’s a quick IPython screenshot that demonstrates how it works:
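The same check can also be sketched as a small script. It builds a path to output.txt inside a temporary directory that stands in for the script's directory (an assumption made so the example is safe to run). It then calls exists() before and after creating the file, and uses is_file() to confirm what kind of entry it is.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    output_path = Path(tmp) / "output.txt"
    before = output_path.exists()      # nothing written yet
    output_path.write_text("hello\n")  # creates the file
    after = output_path.exists()
    print(before, after, output_path.is_file())  # False True True
```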
Next, let’s turn our attention to two more such “query” methods.
Working with Files and Directories
When we constructed a new filename based on the name of the Python __file__ variable, we knew that __file__ was a file name, so it made sense to use the parent attribute to get the directory where the file was located. Of course, given an absolute path to a directory, parent behaves "differently," returning the directory that contains this one. For example:
"""Display the current and parent directory"""
from pathlib import Path
here = Path(".").resolve()
print(f"You are here: {here}, a sub-directory of {here.parent}.")
Sample output:
You are here: /Users/johnlockwood/paths-demo, a sub-directory of /Users/johnlockwood.
So far, in these simple examples, we've always known the type of path we're dealing with. If we didn't, the concrete path classes we've been using can tell us whether we're referring to a file or a directory through the methods is_file() and is_dir(). We'll use these methods in the following example:
Example: Recursively Listing Files and Directories in Python:
Let’s put together a few of the techniques we’ve been discussing to build a sample where we do a recursive search for all files and directories and print the results.
# list_paths.py
"""Recursively list files and directories"""
from pathlib import Path
here = Path(".")
for path in sorted(here.glob("**/*")):
    path_type = "?"
    if path.is_file():
        path_type = "F"
    elif path.is_dir():
        path_type = "D"
    print(f"{path_type} {path}")
Abbreviated output:
F abs_path.py
F construct_filename.py
F directory_parent.py
F list_paths.py
D module
F module/file_in_module.py
D my_subdirectory
F my_subdirectory/something.txt
F print_directory.py
Finding Other Files and Directories
Get the User’s Home Directory in Python
To get the user's home directory, simply call the class method home on the Path class. This will retrieve the user's home directory whether in a Linux-like environment or on Windows.
"""Display the user's home directory"""
from pathlib import Path
print(Path.home())
The output of course will depend on the user.
Getting the Current Working Directory
To get us started constructing Path objects, I've been using the path given by Path(".") to retrieve the current working directory. I've tested that this method works even if you change the current working directory programmatically, but there is another way to accomplish this that is perhaps a bit clearer: Path.cwd(). As you might expect, just like Path.home, Path.cwd is a class method, so you can call it on the class itself; you don't need an instance.
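The quick check below shows how the two approaches relate. Path(".") is a relative path, while Path.cwd() is already absolute. Resolving both yields the same directory.

```python
from pathlib import Path

relative = Path(".")
current = Path.cwd()

print(relative.is_absolute())  # False
print(current.is_absolute())   # True
# Resolve both so they are compared as absolute, symlink-free paths
print(relative.resolve() == current.resolve())  # True
```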
Like the Path string constructor, Path.cwd() and Path.home() both return concrete Path objects, so we can build paths immediately using the slash operator. Do you need to query whether the user already has a configuration installed for Amazon's AWS command-line interface, for example?
from pathlib import Path
default = Path.home() / ".aws"
print(default.exists())
Example: Creating a Directory in the User Home Directory in Python
The concrete pathlib.Path class has many useful methods in addition to the ones we're focusing on here, i.e., those related to finding files and directories.
Because it's such a common use case, however, we thought we'd show an example using the Path.mkdir() method. Starting with the user's home directory, we'll create a "hidden" directory for a hypothetical CodeSolid command line.
Because creating a directory will fail with an exception if the path already exists, we check for this and do the right thing.
"""Creates a .codesolid directory in the user's home directory"""
from pathlib import Path
DIRECTORY_NAME = ".codesolid"
config_directory = Path.home() / DIRECTORY_NAME
if not config_directory.exists():
    config_directory.mkdir()
    print(f"Directory {config_directory} created.")
else:
    print(f"Directory {config_directory} already exists, skipping.")
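Checking exists() first leaves a small window in which another process could create the directory. Path.mkdir() also accepts exist_ok=True, and parents=True to create any missing intermediate directories, which together remove the need for a separate check. The sketch runs against a temporary directory instead of the real home directory. ensure_directory is a hypothetical name.

```python
import tempfile
from pathlib import Path


def ensure_directory(base: Path, name: str) -> Path:
    """Create base/name if needed and return it (hypothetical helper)."""
    target = base / name
    # parents=True creates missing parents; exist_ok=True skips FileExistsError
    target.mkdir(parents=True, exist_ok=True)
    return target


# Demonstrated in a throwaway directory instead of Path.home()
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_directory(Path(tmp), ".codesolid")
    second = ensure_directory(Path(tmp), ".codesolid")  # no error the second time
    created = first.is_dir()
    print(created, first == second)  # True True
```

exist_ok=True still raises FileExistsError if a file, rather than a directory, already has that name.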
Finding Python Module Paths and Files
Unlike the current working directory and the user’s home directory, which are common locations in which we might want to read or write files and directories, it’s perhaps less common to need to find locations related to where Python will search for modules or the actual locations containing a specific module. Nevertheless, occasionally this may help with troubleshooting, especially if the wrong code appears to be getting loaded or things are otherwise not working as you’d expect.
Exploring the PYTHONPATH and sys.path Variables
As you know, the PYTHONPATH environment variable adds search paths to the default locations through which Python will search when you import a module. Therefore you can read it as you would any environment variable:
$ export PYTHONPATH=/Users/johnlockwood/source/CodeSolid
$ python
...
>>> import os
>>> os.environ["PYTHONPATH"]
'/Users/johnlockwood/source/CodeSolid'
If you're troubleshooting, however, reading the environment variable is probably not sufficient, since what you're really interested in is the effective combination of the PYTHONPATH plus whatever the default search path is. For this purpose, the sys.path attribute is a much handier tool:
>>> import sys
>>> sys.path
['', '/Users/johnlockwood/source/CodeSolid', '/Users/johnlockwood/.pyenv/versions/3.11.0a6/lib/python311.zip', '/Users/johnlockwood/.pyenv/versions/3.11.0a6/lib/python3.11', '/Users/johnlockwood/.pyenv/versions/3.11.0a6/lib/python3.11/lib-dynload', '/Users/johnlockwood/.pyenv/versions/3.11.0a6/lib/python3.11/site-packages']
The first element of the list will be the directory containing the script used to start Python, or an empty string, as shown here, if Python was launched interactively. After that, you'll see the PYTHONPATH entries and the default module search locations.
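Indexing os.environ["PYTHONPATH"] raises KeyError when the variable is not set. A more defensive sketch uses os.environ.get() and splits the value on os.pathsep (":" on Linux and macOS, ";" on Windows), then prints the effective search order from sys.path.

```python
import os
import sys

# PYTHONPATH may be unset; .get() avoids a KeyError
raw = os.environ.get("PYTHONPATH", "")
# Entries are separated by os.pathsep; drop empty pieces
pythonpath_entries = [p for p in raw.split(os.pathsep) if p]
print("PYTHONPATH:", pythonpath_entries)

# sys.path is the search order Python actually uses for imports
for index, entry in enumerate(sys.path):
    print(index, repr(entry))
```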
PYTHONPATH shows us how the module path could have been altered, while sys.path gives us all the paths from which a module could have been loaded. We can go further than this, however, and examine the exact location from which Python has loaded (or even would load) a specific module. The simplest way to do this for a loaded module is to use our friend, the __file__ attribute. For example:
Finding the Filename from Which a Python Module Was Loaded
>>> import json
>>> json.__file__
'/Users/johnlockwood/.pyenv/versions/3.11.0a6/lib/python3.11/json/__init__.py'
Using a somewhat less reliable method, importlib.util.find_spec(), we can sometimes find the path from which a module would be loaded without actually loading it (for a dotted name such as module.file_in_module, however, the parent package is imported). This generally works for user-defined modules and for Python runtime library modules that are not loaded by the frozen-module importer (importlib.machinery.FrozenImporter). As we'll see below, it doesn't work well for the "os" module.
"""Shows how find_spec can sometimes be used to get a file load location path"""
from importlib.util import find_spec
for module in ['os', 'json', 'module.file_in_module']:
    spec = find_spec(module)
    print(f"{module} origin: {spec.origin}")
Sample output:
os origin: frozen
json origin: /Users/johnlockwood/.pyenv/versions/3.11.0a6/lib/python3.11/json/__init__.py
module.file_in_module origin: /Users/johnlockwood/paths-demo/module/file_in_module.py
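find_spec() returns None for a top-level module it cannot find, so spec.origin would then raise AttributeError. For a dotted name whose parent package is missing, it raises ModuleNotFoundError instead. Here is a defensive sketch. module_origin is a hypothetical helper, and the no_such_* names are assumed not to exist.

```python
from importlib.util import find_spec


def module_origin(name):
    """Return where module name would be loaded from, or None if not found."""
    try:
        spec = find_spec(name)
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package cannot be imported
        return None
    if spec is None:
        # Top-level module not found on sys.path
        return None
    # origin may be a file path or a marker such as "frozen" or "built-in"
    return spec.origin


for name in ["json", "no_such_module_xyz", "no_such_package_xyz.child"]:
    print(f"{name} origin: {module_origin(name)}")
```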
Closing Thoughts
As we've seen, the Python pathlib library contains most of what you need to find files in Python and list directories and files recursively as well. However, as we suggested at the outset, there's considerable overlap between what's available in pathlib and what we find in the os and os.path libraries. We chose to focus on pathlib here since we believe that in many cases it makes the code more convenient to work with, and we didn't see much point in a tutorial that taught you two ways to do something if one would suffice. That said, if you have code based on os.path that works well, we honestly see no compelling reason to go rework it.
In the end, for cases like these, sometimes we just have to open a console, type import this, and bask in the wisdom of the Zen of Python:
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
You May Also Enjoy
Python Indexing and Slicing
Python Operators: The Building Blocks of Successful Code
- Find File With the os.walk() Function in Python
- Find File With the glob.glob() Function in Python
- Find File With the Path.glob() Function in Python
This tutorial will discuss the methods to find a file in Python.
Find File With the os.walk() Function in Python
If we want to find the path of a specific file on our machine with Python, we can use the os module, which provides many OS-related functions. The os.walk() function takes a path string as an input parameter and, for each directory in the tree rooted at that path, yields the directory path, a list of its subdirectory names, and a list of its file names. The sample code below shows us how to find a file in Python with the os.walk() function.
import os

def findfile(name, path):
    # os.walk yields (dirpath, dirnames, filenames) for every directory under path
    for dirpath, dirnames, filenames in os.walk(path):
        if name in filenames:
            return os.path.join(dirpath, name)

filepath = findfile("file2.txt", "/")
print(filepath)
Output:
/Users\maisa\Documents\PythonProjects\file2.txt
In the above code, we declared the findfile() function, which uses the os.walk() function to find our file. The findfile() function takes the file's name and the root path as input parameters and returns the path of the first matching file, or None if no match is found. Because we passed an absolute root path ("/"), this approach gives us the absolute path of the file.
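findfile() stops at the first match. If the same name may occur in several folders, a generator can yield every match instead, and the caller decides whether to stop early or collect everything. Here is a sketch. findall_files is a hypothetical name.

```python
import os


def findall_files(name, path):
    """Yield the full path of every file called name under path."""
    for dirpath, dirnames, filenames in os.walk(path):
        if name in filenames:
            yield os.path.join(dirpath, name)
```

For example, list(findall_files("file2.txt", ".")) collects every file2.txt below the current directory, while next(findall_files("file2.txt", "."), None) behaves like findfile().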
Find File With the glob.glob() Function in Python
We can also use the glob.glob() function to solve our current problem. The glob.glob() function takes a pathname as an input parameter and returns a list of all the file paths that match the input argument. We can specify a shell-style wildcard pattern (not a regular expression) that matches only our file. In the code below, the ** in the pattern, together with recursive=True, searches all subdirectories. The sample code below shows us how to find a file in Python with the glob.glob() function.
import glob
filepath = glob.glob('**/file.txt', recursive=True)
print(filepath)
Output:
We passed our file name as the input parameter to the glob.glob() function, and it returned the relative path of our file. This method returns relative paths for a relative pattern and absolute paths for an absolute pattern.
Find File With the Path.glob() Function in Python
Another approach is to use the pathlib module. This Python module offers classes that represent filesystem paths for different operating systems. We can use the Path.glob() method from the pathlib module to solve our specific problem. It is similar to the glob() function in the glob module: Path.glob() takes a pattern as an input parameter and yields the path objects that match it. It returns a generator, which we wrap in sorted() to get a list. The sample code snippet shows us how to find a file in Python with the pathlib module.
import pathlib
filepath = sorted(pathlib.Path('.').glob('**/file2.txt'))
print(filepath)
Output:
[WindowsPath('file2.txt')]
We passed a pattern string that matches our file to the Path.glob() method, and sorted() turned the matches into a list of WindowsPath objects. With this method, we get path objects specific to our operating system.
The glob module, part of the Python Standard Library, is used to find the files and folders whose names follow a specific pattern. The searching rules are similar to the Unix Shell path expansion rules.
After reading this article, you will learn:
- How to find all files that match a specified pattern
- How to search files recursively using the glob() function
- How to use iglob() to iterate over a list of filenames
- How to search files using wildcard characters
The following functions are available in the glob module. We'll learn about each one in turn.
Function | Description |
---|---|
glob.glob(pathname) | Returns a list of files that match the path specified in the function argument |
glob.iglob(pathname) | Returns a generator object that we can iterate over to get the individual file names |
glob.escape(pathname) | Useful especially for filenames with special characters |
Table of contents
- Python glob() Method to Search Files
- glob() to Search Files Recursively
- Glob to Search Files Using Wildcard Characters
- Match Any Character in File Name Using asterisk (*):
- Search all files and folders in given directory
- Match Single character in File Name Using Question Mark(?):
- Match File Name using a Range of Characters
- iglob() for Looping through the Files
- Search for Filenames with Special Characters using escape() method
- glob() Files with Multiple Extensions
- Using glob() with regex
- glob for finding text in files
- Sorting the glob() output
- Deleting files using glob()
- scandir() vs glob()
Python glob() Method to Search Files
Using the glob module we can search for exact file names or even specify part of it using the patterns created using wildcard characters.
These patterns are similar to regular expressions but much simpler.
- Asterisk (*): Matches zero or more characters
- Question Mark (?): Matches exactly one character
- We can specify a range of alphanumeric characters inside square brackets ([]).
We need to import Python's built-in glob module to use the glob() function.
Syntax of glob() function
glob.glob(pathname, *, recursive=False)
Python's glob.glob() method returns a list of files or folders that match the path specified in the pathname argument. This function takes two arguments, namely pathname and the recursive flag.
pathname: An absolute path (with the complete directory structure, starting from the root of the file system) or a relative path (interpreted from the current working directory). Either kind can contain UNIX shell-style wildcards, and we can search for files by passing either one to the glob() method.
recursive: If set to True, patterns containing ** search files recursively.
Example: Search all .txt files present in the current working directory
Let’s assume the following test files are present in the current working directory.
sales_march.txt profit_march.txt sales_april.txt profit_april.txt
import glob
# relative path to search all text files
files = glob.glob("*.txt")
print(files)
Output:
['profit_april.txt', 'profit_march.txt', 'sales_april.txt', 'sales_march.txt']
Example 2: Search files using an absolute path
Also, you can use the absolute path to search files.
import glob
# absolute path to search all text files inside a specific folder
path = r'E:/performance/pynative/*.txt'
print(glob.glob(path))
glob() to Search Files Recursively
Set recursive=True to search inside all subdirectories. This is helpful if we are not sure exactly which folder our search term or file is located in, because it recursively searches files under all subdirectories of the current directory.
The default value of the recursive flag is False, i.e., it will only search in the folder specified in our search path. Note that the flag only takes effect for patterns that contain the ** directive. For example, to find abc.jpeg under all subfolders of sales, use the pattern 'sales/**/abc.jpeg' and set recursive to True; a plain path such as 'sales/abc.jpeg' only ever matches that one location.
Use Python 3.5+ to find files recursively with the glob module, which supports the ** directive. When you set the recursive flag to True, the glob method parses the given path and looks recursively in the directories.
Example to search .txt files under all subdirectories of the current directory.
import glob
# path to search file
path = '**/*.txt'
for file in glob.glob(path, recursive=True):
    print(file)
Output:
profit_april.txt
profit_march.txt
sales_april.txt
sales_march.txt
sales\march_profit_2020.txt
sales\march_sales_2020.txt
Note: If the pathname has **, the method will search the directories and subdirectories. In a large file structure, this operation will typically consume a lot of time.
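You can check how the recursive flag interacts with ** in a throwaway directory. The sketch builds a small, invented tree with an abc.jpeg at three levels under sales. With recursive=True, ** matches zero or more directory levels. Without it, ** behaves like a single *, i.e., exactly one level. glob.escape() keeps the temporary directory's own name from being read as a pattern.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical tree: sales/, sales/2020/, sales/2020/march/, each with abc.jpeg
    for sub in ["sales", "sales/2020", "sales/2020/march"]:
        os.makedirs(os.path.join(tmp, sub), exist_ok=True)
        with open(os.path.join(tmp, sub, "abc.jpeg"), "w"):
            pass

    pattern = os.path.join(glob.escape(tmp), "sales", "**", "abc.jpeg")

    deep = glob.glob(pattern, recursive=True)  # ** spans any number of levels
    shallow = glob.glob(pattern)               # ** acts like * (one level)

    print(len(deep), len(shallow))  # 3 1
```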
Glob to Search Files Using Wildcard Characters
We can use glob() with wildcard characters to search for a folder or file in a multi-level directory. Two wildcards are most commonly used for search operations. Let us see both of them with examples.
Wildcard | Matches | Example |
---|---|---|
* | Zero or more characters | *.pdf matches all files with the pdf extension |
? | Any single character | sales/??.jpeg matches all files whose names are two characters long in the sales folder |
[] | Any character in the sequence | [psr]* matches files starting with the letter p, s, or r |
[!] | Any character not in the sequence | [!psr]* matches files not starting with the letter p, s, or r |
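The glob module builds on the fnmatch module, which applies the same wildcard rules to plain strings. This makes it handy for checking what a pattern will match without creating any files. The sketch reuses filenames from this article plus an invented report.pdf. It uses fnmatchcase() so that the results do not depend on the operating system's case rules.

```python
from fnmatch import fnmatchcase

names = ["sales_march.txt", "profit_march.txt", "bar.jpeg", "p.jpeg", "report.pdf"]

for pattern in ["*.pdf", "?.jpeg", "???.jpeg", "[psr]*", "[!psr]*"]:
    matched = [name for name in names if fnmatchcase(name, pattern)]
    print(f"{pattern!r}: {matched}")
```

One difference to keep in mind is that glob() does not let * or ? match a leading dot in a filename, whereas fnmatch has no such rule.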
Match Any Character in File Name Using asterisk (*):
This wildcard character (*) will return a list of files or folders with zero or more character matches. We can extend our search with the glob() function by using the wildcard at multiple directory levels.
The following example returns all the files with a .txt extension inside the sales subdirectory.
Example:
import glob
# path to search all txt files
path = "sales/*.txt"
for file in glob.glob(path):
    print(file)
Output:
sales\march_profit_2020.txt
sales\march_sales_2020.txt
Search all files and folders in given directory
Here we will look at the following three scenarios:
- Match every pathname inside the current directory, i.e., print all folders and files present inside the current directory
- Match every file and folder inside a given directory
- Match every file and folder that starts with the word 'march'
import glob
# using glob to match every pathname
print('Inside current directory')
for item in glob.glob("*"):
    print(item)

# Match every files and folder from a given folder
print('Inside Sales folder')
for item in glob.glob("sales/*"):
    print(item)

print('All files starts with word march')
for item in glob.glob("sales/march*"):
    print(item)
Output:
Inside current directory
sales
glob_demo.py
profit_april.txt
profit_march.txt
sales_april.txt
sales_march.txt
Inside Sales folder
sales\bar.jpeg
sales\chart.jpeg
sales\march_profit_2020.txt
sales\march_sales_2020.txt
sales\p.jpeg
All files starts with word march
sales\march_profit_2020.txt
sales\march_sales_2020.txt
Match Single character in File Name Using Question Mark(?):
This wildcard (?) will return a list of files or folders with exactly one character match. It is generally used to search for filenames that are almost identical, differing in only one or a few characters.
The following example returns the files with single-character names, then those with three-character names, and then those whose names start with 'cha' followed by exactly two more characters.
import glob
# path to search single character filename
path = "sales/?.jpeg"
for file in glob.glob(path):
    print(file)

# path to search three-character filename
path = "sales/???.jpeg"
for file in glob.glob(path):
    print(file)

# search file that starts with word 'cha' followed by exact two-character
path = "sales/cha??.txt"
for file in glob.glob(path):
    print(file)
Output:
sales\p.jpeg
sales\bar.jpeg
sales\chart.txt
Match File Name using a Range of Characters
We can give a range of characters or numbers as the search string by enclosing them inside square brackets ([]).
We can use either letters or numbers in the search pattern. The following example shows how to use glob to match .txt files whose names start with a letter from a to f, and files whose names consist of a single digit from 2 to 5 followed by an extension.
import glob
print(glob.glob("sales/[a-f]*.txt"))
print(glob.glob("sales/[2-5].*"))
Output:
['salesbar.txt', 'saleschart.txt']
['sales2.txt']
iglob() for Looping through the Files
The glob.iglob() function works exactly like glob(), except that it returns an iterator yielding the file names that match the pattern. We can loop over this iterator to get the individual file names.
Syntax:
glob.iglob(pathname, *, recursive=False)
Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
Why use iglob():
When the number of files or folders to match is large, loading them all with glob() builds one big list and can use a lot of memory. iglob() instead hands you the matching file names through an iterator, which keeps memory usage low.
In other words, iglob() returns an iterator (a generator) that produces each result only when the loop asks for the next one. Please refer to this Stackoverflow answer to learn more about the performance benefits of iterators.
The example below loops over the matches returned by iglob() and then compares the return types of glob() and iglob().
Example
import glob
# using iglob
for item in glob.iglob("*.txt"):
    print(item)
# check type
print('glob()')
print(type(glob.glob("*.txt")))
print('iglob()')
print(type(glob.iglob("*.txt")))
Output:
profit_april.txt
profit_march.txt
sales_april.txt
sales_march.txt
glob()
<class 'list'>
iglob()
<class 'generator'>
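Because iglob() returns an iterator, you can stop early without collecting every match into a list. A minimal sketch, assuming you only need the first few matches: itertools.islice() takes at most n items from the iterator and then stops asking for more. The helper name first_matches is hypothetical.

```python
import glob
import itertools


def first_matches(pattern, n):
    """Hypothetical helper: return at most n paths matching pattern, consuming iglob() lazily."""
    # islice() pulls items from the iterator one at a time and stops after n
    return list(itertools.islice(glob.iglob(pattern), n))


# Usage: up to three .txt files in the current directory (order is not guaranteed)
print(first_matches("*.txt", 3))
```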
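Search for Filenames with Special Characters using escape() method
In addition to character and numeric ranges, the escape() method lets us use characters that are special to glob() as literal text inside a pattern.
Syntax:
glob.escape(pathname)
As the name suggests, this method escapes the special characters in the pathname passed as the argument. Only three characters are special in a glob pattern: *, ?, and [. escape() wraps each of them in square brackets (for example, [ becomes [[]) so that they match literally. Other characters such as _, #, and $ have no special meaning in glob and are returned unchanged.
We can use this method with glob() whenever the text we search for might contain those special characters, for example when it comes from user input. The following example finds JPEG files whose names contain _, $, or #. Since none of these is a glob special character, escape() leaves them as they are, and the search works the same with or without it.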
import glob
print("All JPEG's files")
print(glob.glob("*.jpeg"))
print("JPEGs files with special characters in their name")
# set of special characters _, $, #
char_seq = "_$#"
for char in char_seq:
    esc_set = "*" + glob.escape(char) + "*" + ".jpeg"
    for file in glob.glob(esc_set):
        print(file)
Output
All JPEG's files
['abc.jpeg', 'y_.jpeg', 'z$.jpeg', 'x#.jpeg']
JPEGs files with special characters in their name
y_.jpeg
z$.jpeg
x#.jpeg
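escape() makes a visible difference when a file name contains one of the glob special characters. The sketch below creates two hypothetical files, report[1].txt and report1.txt, in a temporary directory. Without escaping, [1] is read as a character class that matches the single character "1"; with escaping, the brackets are matched literally.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files: one name contains literal square brackets
    for name in ("report[1].txt", "report1.txt"):
        open(os.path.join(folder, name), "w").close()

    raw = os.path.join(folder, "report[1].txt")
    # Unescaped: "[1]" is a character class, so only report1.txt matches
    unescaped = [os.path.basename(p) for p in glob.glob(raw)]
    # Escaped: "[" becomes "[[]", so the bracketed name is matched literally
    escaped = [os.path.basename(p) for p in glob.glob(glob.escape(raw))]

print(glob.escape("report[1].txt"))  # report[[]1].txt
print(unescaped)                     # ['report1.txt']
print(escaped)                       # ['report[1].txt']
```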
glob() Files with Multiple Extensions
We can search for files with different extensions using the glob module. For example, suppose you want to find the files with a .pdf or .jpeg extension in a given folder.
import glob
print("All pdf and jpeg files")
extensions = ('*.pdf', '*.jpeg')
files_list = []
for ext in extensions:
    files_list.extend(glob.glob(ext))
print(files_list)
Output
['christmas_envelope.pdf', 'reindeer.pdf', '1.jpeg', '2.jpeg', '4.jpeg', '3.jpeg', 'abc.jpeg']
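The same idea can be wrapped in a small reusable function. This is a sketch under a few assumptions: the helper name find_by_extensions is hypothetical, duplicates are dropped with a set in case two patterns overlap, the folder name is escaped so special characters in it are matched literally, and the result is sorted because glob() does not guarantee any order.

```python
import glob
import os


def find_by_extensions(folder, extensions):
    """Hypothetical helper: return sorted paths in folder ending in any of extensions."""
    matches = set()  # a set drops duplicates if two patterns match the same file
    for ext in extensions:
        # glob.escape() keeps special characters in the folder name literal
        matches.update(glob.glob(os.path.join(glob.escape(folder), "*" + ext)))
    return sorted(matches)


# Usage: all PDF and JPEG files in the current directory
print(find_by_extensions(".", [".pdf", ".jpeg"]))
```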
Using glob() with regex
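The glob() function relies internally on the fnmatch module, which supports only the following four rules for pattern matching:
- * matches everything
- ? matches any single character
- [seq] matches any character in seq
- [!seq] matches any character not in seq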
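These shell-style rules map directly onto regular expressions: fnmatch.translate() converts a pattern into an equivalent regex string. The minimal sketch below uses it on a hypothetical list of names to exercise each of the four rules.

```python
import fnmatch
import re

# Hypothetical names used only to exercise each rule
names = ["data1.csv", "data2.csv", "dataX.csv", "notes.txt"]

results = {}
for pattern in ("*.csv", "data?.csv", "data[12].csv", "data[!12].csv"):
    # translate() rewrites the shell-style pattern as a regular expression string
    regex = re.compile(fnmatch.translate(pattern))
    results[pattern] = [n for n in names if regex.match(n)]
    print(pattern, "->", results[pattern])
```

Here *.csv and data?.csv match all three CSV files, data[12].csv matches data1.csv and data2.csv, and data[!12].csv matches only dataX.csv.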
If you need more flexible matching rules than these, you can combine glob with regular expressions.
Consider a folder of employee photos in JPEG format, where we want the photo whose file name contains the employee number entered by the user. glob() lists the files in the folder, and a regular expression then filters them.
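import glob
import os
import re
num = input('Enter the employee number ')
# [a-z_]+ matches the name prefix (e.g. "emp_"), {file_num} is the employee number
# re.escape() treats the input literally; \. stops 3 from also matching emp_31
regex = r'[a-z_]+{file_num}\.'.format(file_num=re.escape(num))
# search emp jpeg in employees folder
for file in glob.glob("2020/*"):
    # match against the file name only, not the folder part of the path
    if re.match(regex, os.path.basename(file)):
        print('Employee Photo:', file)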
Output:
Enter the employee number 3
Employee Photo: 2020\emp_3.jpeg
glob for finding text in files
glob itself only matches file names, but it is also handy as the first step in finding text inside files.
Often we want to replace a specific word in a file, or find the files that contain an exact piece of text, such as a user ID.
We can follow the steps below to get the files that contain specific text:
- Use glob to list all files in a directory and its subdirectories that match a file search pattern.
- Next, read each file and search for the matching text. (You can use regex if you want to find a specific pattern in the file.)
Example: Search for the word profit in files
import glob
# Look at all txt files in the current directory and its sub-directories
path = '**/*.txt'
search_word = 'profit'
# list to store files that contain the matching word
final_files = []
for file in glob.glob(path, recursive=True):
    try:
        with open(file) as fp:
            # read the file as a string
            data = fp.read()
            if search_word in data:
                final_files.append(file)
    except (OSError, UnicodeDecodeError) as exc:
        print('Exception while reading', file, exc)
print(final_files)
Output:
['sales\\data_2021.txt']
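The second step mentions regular expressions. A minimal sketch of that variant, assuming UTF-8 text files: it yields the path, line number, and text of every line that matches a pattern, and skips files it cannot read or decode. The function name grep_files and its parameters are hypothetical.

```python
import glob
import re


def grep_files(pattern, regex, recursive=True):
    """Hypothetical helper: yield (path, line_no, line) for lines matching regex."""
    compiled = re.compile(regex)
    for path in glob.glob(pattern, recursive=recursive):
        try:
            with open(path, encoding="utf-8") as fp:
                for line_no, line in enumerate(fp, start=1):
                    if compiled.search(line):
                        yield path, line_no, line.rstrip("\n")
        except (OSError, UnicodeDecodeError) as exc:
            # Skip directories, unreadable files, and files that are not valid UTF-8
            print(f"Skipping {path}: {exc}")


# Usage: lines containing "profit" followed by a number
# for path, line_no, line in grep_files("**/*.txt", r"profit\s*\d+"):
#     print(f"{path}:{line_no}: {line}")
```

Reading line by line keeps memory use small for large files and gives you the line number of each hit for free.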
Sorting the glob() output
glob() returns the matching names in arbitrary order (the order depends on the file system), so we can sort the list with the built-in sorted() function.
import glob
path = "*.txt"
print(sorted(glob.glob(path)))
Output:
['profit_april.txt', 'profit_march.txt', 'sales_april.txt', 'sales_march.txt']
We can sort the files by the date and time of their last modification by combining the glob() method with the os.path.getmtime() function.
import glob
import os
# List all files and folders in the current directory
files = glob.glob("*")
# Sort by modification time (mtime): oldest first, then newest first
files_ascending = sorted(files, key=os.path.getmtime)
print(files_ascending)
files_descending = sorted(files, key=os.path.getmtime, reverse=True)
print(files_descending)
Output:
['sales_april.txt', 'sales_march.txt', 'profit_april.txt', 'profit_march.txt', 'sales', 'glob_demo.py']
['glob_demo.py', 'sales', 'profit_march.txt', 'profit_april.txt', 'sales_april.txt', 'sales_march.txt']
Deleting files using glob()
To delete files, iterate over the list returned by glob() and call os.remove() on each path.
import glob
import os
# delete all pdf files
for pdf in glob.glob("2020/*.pdf"):
    # Removing the pdf file from the directory
    print("Removing ", pdf)
    os.remove(pdf)
Output:
Removing salesjune.pdf
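Because deletion cannot be undone, it can help to preview the matches first. The sketch below is one way to do that, under stated assumptions: the helper name delete_matching and its dry_run flag are hypothetical, and directories that happen to match the pattern are skipped, since os.remove() only deletes files.

```python
import glob
import os


def delete_matching(pattern, dry_run=True):
    """Hypothetical helper: delete regular files matching pattern; only report when dry_run."""
    removed = []
    for path in glob.glob(pattern):
        if not os.path.isfile(path):
            continue  # skip directories and other non-files
        if not dry_run:
            os.remove(path)
        removed.append(path)
        print(("Would remove " if dry_run else "Removing ") + path)
    return removed


# Preview first, then delete for real:
# delete_matching("2020/*.pdf")
# delete_matching("2020/*.pdf", dry_run=False)
```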
scandir() vs glob()
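In CPython, glob() uses os.scandir() internally. scandir() lists every entry in a single directory without applying any pattern, and glob() filters those entries against the pattern you give it.
scandir() returns an iterator of os.DirEntry objects, so entries are produced one at a time. glob() instead returns a complete list, which holds every match in memory at once; iglob() is its lazy, iterator-based counterpart.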
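If you want scandir()'s laziness together with pattern matching, you can filter its entries yourself. A minimal sketch, assuming a single (non-recursive) directory: the helper name scan_matching is hypothetical, and fnmatch supplies the wildcard rules. Unlike glob(), this version does not skip names that start with a dot.

```python
import fnmatch
import os


def scan_matching(folder, pattern):
    """Hypothetical helper: lazily yield names of regular files in folder matching pattern."""
    # The with-statement closes the directory handle when iteration ends
    with os.scandir(folder) as entries:
        for entry in entries:
            # DirEntry.is_file() can usually answer without an extra stat() call
            if entry.is_file() and fnmatch.fnmatch(entry.name, pattern):
                yield entry.name


# Usage: text files in the current directory, produced one at a time
for name in scan_matching(".", "*.txt"):
    print(name)
```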