StackStalk
Saturday, April 24, 2021

Using glob to find files recursively in Python


The Python glob module finds all pathnames matching a specified pattern according to the rules used by the Unix shell. It is part of the standard library, so there is nothing extra to install. Patterns can use the wildcards *, ? and [range]. A common use case is finding files recursively under a directory.
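As a quick illustration of the three wildcards, the sketch below matches a few made-up file names with fnmatch, the standard-library module that the glob documentation says it relies on for name matching. The file names are assumptions chosen for the example, and nothing on disk is touched.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only to show how each wildcard behaves.
names = ["a.txt", "b.txt", "c.txt", "d.py", "e.sh", "ab.txt"]

# '*' matches any run of characters, including an empty one.
star = [n for n in names if fnmatchcase(n, "*.txt")]

# '?' matches exactly one character.
question = [n for n in names if fnmatchcase(n, "?.txt")]

# '[a-b]' matches exactly one character from the given range.
in_range = [n for n in names if fnmatchcase(n, "[a-b].*")]

print(star)
print(question)
print(in_range)
```

Note that "ab.txt" matches '*.txt' but not '?.txt', because '?' stands for a single character.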

Consider the folder tree below, located at /tmp/TEST in the examples that follow.

$ tree
.
├── a.txt
├── b.txt
├── folder1
│   ├── c.txt
│   └── d.py
└── folder2
    └── e.sh

2 directories, 5 files
Sample code that uses the glob module to recursively find files matching patterns:
import glob

base_folder = "/tmp/TEST"

# Use iglob to collect all Python and shell files under base_folder
file_list = []
patterns = ('**/*.py', '**/*.sh')
for pattern in patterns:
    # recursive=True lets '**' match any number of nested folders
    file_list.extend(glob.iglob(base_folder + "/" + pattern, recursive=True))
print(file_list)

# Use glob to find all files and folders, at any depth
print(glob.glob(base_folder + '/**/*', recursive=True))

# Use glob with a character range: names starting with a, b or c
print(glob.glob(base_folder + '/**/[a-c]*', recursive=True))
Output is (glob returns results in arbitrary order, so the ordering may differ):
['/tmp/TEST/folder1/d.py', '/tmp/TEST/folder2/e.sh']
['/tmp/TEST/folder2', '/tmp/TEST/b.txt', '/tmp/TEST/a.txt', '/tmp/TEST/folder1', '/tmp/TEST/folder2/e.sh', '/tmp/TEST/folder1/d.py', '/tmp/TEST/folder1/c.txt']
['/tmp/TEST/b.txt', '/tmp/TEST/a.txt', '/tmp/TEST/folder1/c.txt']
Additional notes on glob:
  • iglob returns an iterator that yields matches one at a time instead of building the whole list in memory.
  • Using the '**' pattern in a large directory tree may consume an inordinate amount of time.
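The difference between iglob and glob can be sketched as follows. This is a minimal example under stated assumptions: it builds a throwaway tree in a temporary directory (the folder and file names are made up) and uses itertools.islice to stop after the first two matches.

```python
import glob
import itertools
import os
import tempfile

# Build a throwaway tree so the sketch does not depend on /tmp/TEST existing.
with tempfile.TemporaryDirectory() as base_folder:
    for i in range(5):
        sub = os.path.join(base_folder, f"folder{i}")
        os.makedirs(sub)
        open(os.path.join(sub, "script.py"), "w").close()

    # glob.escape protects any special characters in the temporary path itself.
    pattern = os.path.join(glob.escape(base_folder), "**", "*.py")

    # iglob yields matches one at a time; islice stops after two of them,
    # so the complete result list is never built.
    first_two = list(itertools.islice(glob.iglob(pattern, recursive=True), 2))

    # glob builds the complete list before returning.
    everything = glob.glob(pattern, recursive=True)

print(len(first_two), len(everything))
```

Stopping early like this is one way to limit the cost of '**' on a large tree when only a few matches are needed.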
A few common patterns:
  • '**/*'    - All files and folders, including those in subfolders (requires recursive=True)
  • '*'       - All files and folders at the top level
  • '*/*'     - All files and folders one level down, inside the top-level folders
  • '**/*.py' - All Python files in all folders (requires recursive=True)
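To see what each pattern matches, the sketch below recreates the sample tree from this post in a temporary directory and runs the four patterns against it. The helper name match and the use of a temporary directory instead of /tmp/TEST are assumptions made for the example; results are sorted and shown relative to the base folder so they are easy to compare.

```python
import glob
import os
import tempfile

# Relative paths copied from the sample tree in this post.
SAMPLE_FILES = ["a.txt", "b.txt", "folder1/c.txt", "folder1/d.py", "folder2/e.sh"]


def match(base, pattern):
    """Return sorted matches for pattern, relative to base, with '/' separators."""
    hits = glob.glob(os.path.join(glob.escape(base), pattern), recursive=True)
    return sorted(os.path.relpath(h, base).replace(os.sep, "/") for h in hits)


with tempfile.TemporaryDirectory() as base:
    # Create empty files (and their parent folders) for the sample tree.
    for rel in SAMPLE_FILES:
        path = os.path.join(base, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()

    results = {p: match(base, p) for p in ("**/*", "*", "*/*", "**/*.py")}

for pattern, hits in results.items():
    print(f"{pattern!r}: {hits}")
```

On this tree, '*' returns only the four top-level entries, while '*/*' returns only the three files inside folder1 and folder2.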