Should You Use Pathlib?

This post compares os and pathlib, two Python libraries for working with files, paths and directories.

Both os and pathlib are part of the Python standard library.

If you've been programming in Python for a while, you are likely using functions from os such as os.path.join. Historically these were the only tools the standard library offered for handling file paths.

pathlib was introduced in Python 3.4, and offers different ways to work with files, directories and paths.

In this post we will compare both libraries at these tasks:

  1. single paths - forming file paths from strings, getting the home & current working directory, working with file names & suffixes.
  2. making things - making directories, saving data to text files, appending data to text files.
  3. reading and finding - reading text files, finding files, finding directories.
  4. removing things - removing directories & removing files.

Working with File Paths

Creating One File Path

Operating systems take different approaches to file paths: Linux and macOS use / as a separator, whereas Windows uses \.

This complexity with different file separators on different operating systems makes portability between operating systems a key concern when dealing with file paths.

Both os.path and pathlib offer portable ways to construct paths. This means the same piece of Python code can run on either Linux or Windows, and the path it creates will be compatible with that OS.
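
To see this portability without switching machines, we can ask each library to follow the other operating system's rules. A minimal sketch, relying on the fact that os.path is the posixpath module on Linux and macOS and the ntpath module on Windows, and on pathlib's pure path classes, which build paths without touching the file system:

```python
import ntpath
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

# Calling posixpath / ntpath directly shows what os.path.join would
# produce on each operating system.
posix_join = posixpath.join('data', 'file.txt')
windows_join = ntpath.join('data', 'file.txt')

# Pure path classes do no I/O, so either flavour can be built on any OS.
posix_path = PurePosixPath('data') / 'file.txt'
windows_path = PureWindowsPath('data') / 'file.txt'

print(posix_join, windows_join)            # data/file.txt data\file.txt
print(str(posix_path), str(windows_path))  # data/file.txt data\file.txt
```

In everyday code you would use os.path.join or Path directly and let Python pick the right rules for the machine it runs on.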

os offers os.path.join to create a file path:

python
import os
path = os.path.join(os.path.expanduser('~'), 'data', 'file.txt')
# /Users/adam/data/file.txt

In pathlib the path is formed using the division operator / with an initialized Path object:

python
from pathlib import Path

path = Path.home() / 'data' / 'file.txt'
# /Users/adam/data/file.txt

The Path object is the focus of pathlib - almost all of the functionality we need can be accessed as either attributes or methods on this object.
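
Path objects also work with code that expects strings. A short sketch (data/file.txt is just an example path) showing that os.path functions accept Path objects, and that str or os.fspath converts a Path back to a string:

```python
import os
from pathlib import Path

folder = Path('data')

# os.path functions accept Path objects (anything implementing os.PathLike)
# and return plain strings.
joined = os.path.join(folder, 'file.txt')
print(type(joined).__name__)  # str

# str() or os.fspath() turn a Path back into a string for older APIs.
path = folder / 'file.txt'
print(str(path) == os.fspath(path) == joined)  # True

# The Path constructor goes the other way.
print(Path(joined) == path)  # True
```

This makes it practical to adopt pathlib gradually, since existing os.path code keeps working with Path objects.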

Get the Home Directory

The home directory is in different places on different operating systems - both our contenders offer a way to get the user's home directory that will work on both UNIX & Windows systems:

  • Ubuntu - /home/$USER
  • macOS - /Users/$USER
  • Windows - C:\Users\$USER

Get the home directory with os:

python
import os

os.path.expanduser('~')
# /Users/adam

Get the home directory with pathlib:

python
from pathlib import Path

Path.home()
# /Users/adam
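
pathlib also offers Path.expanduser, the counterpart of os.path.expanduser for paths starting with ~. A small sketch, using an example data folder, checking that three spellings point at the same place:

```python
import os
from pathlib import Path

# Three ways to refer to the same folder inside the home directory.
from_os = os.path.expanduser('~/data')
from_home = Path.home() / 'data'
from_tilde = Path('~/data').expanduser()

print(Path(from_os) == from_home == from_tilde)  # True
```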

Get the Current Working Directory

Get the current working directory with os:

python
import os

os.getcwd()

Get the current working directory with pathlib:

python
from pathlib import Path

Path.cwd()
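
Relative paths are resolved against this current working directory. As a sketch, with file.txt standing in for any relative name (the file does not need to exist):

```python
import os
from pathlib import Path

# A relative path says nothing about where it lives on its own.
relative = Path('file.txt')

# os.path.abspath and joining onto Path.cwd() both anchor it to the cwd.
from_os = os.path.abspath('file.txt')
from_pathlib = Path.cwd() / relative

print(relative.is_absolute(), from_pathlib.is_absolute())  # False True
print(Path(from_os) == from_pathlib)  # True
```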

Separating File Paths into Name, Stem & Suffix

A file path can be separated into different parts. We can decompose a file name into a stem and a suffix, where the suffix includes the leading dot: name = stem + suffix (for example, file.suffix is file plus .suffix).

The name of a file includes the suffix. The suffix is useful for determining what the file type is.

Getting the file name with os requires using basename:

python
import os

os.path.basename('/path/file.suffix')
# file.suffix

With pathlib we can use the name attribute on a Path object:

python
from pathlib import Path

Path('/path/file.suffix').name
# file.suffix

The stem is the file name without the suffix.

Getting this with os requires using both basename and splitext:

python
from os.path import basename, splitext

splitext(basename('/path/file.suffix'))[0]
# file

With pathlib we can use the stem attribute on a Path object:

python
from pathlib import Path

Path('/path/file.suffix').stem
# file

The suffix is the file extension: the part of the file name from the final dot onwards.

To get the suffix with os.path:

python
import os

os.path.splitext('/path/file.suffix')[-1]
# .suffix

pathlib has suffix as an attribute of the Path object:

python
from pathlib import Path

Path('/path/file.suffix').suffix
# .suffix
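
Names with more than one dot, or with a leading dot, are where stem and suffix can surprise. A sketch with illustrative file names (archive.tar.gz and .bashrc are just examples), which also shows with_suffix for swapping an extension:

```python
import os
from pathlib import Path

archive = Path('/path/archive.tar.gz')

# Only the last dot starts the suffix; the stem keeps everything before it.
print(archive.suffix, archive.stem)             # .gz archive.tar
print(archive.suffixes)                         # ['.tar', '.gz']
print(os.path.splitext(archive.name))           # ('archive.tar', '.gz')
print(archive.name == archive.stem + archive.suffix)  # True

# A leading dot marks a hidden file, not a suffix, in both libraries.
print(repr(Path('.bashrc').suffix))  # ''
print(os.path.splitext('.bashrc'))   # ('.bashrc', '')

# with_suffix returns a new Path with the extension replaced.
print(Path('/path/file.suffix').with_suffix('.csv').name)  # file.csv
```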

Making Directories & Files

Making Directories

We can create a directory using os with os.mkdir():

python
import os

path = os.path.join(os.path.expanduser('~'), 'python-file-paths')
os.mkdir(path)

With pathlib, we call the mkdir() method on a Path object:

python
from pathlib import Path
path = Path.home() / 'python-file-paths'
path.mkdir()

Sometimes we want to make a deeper folder structure, where many of the folders in the path don't exist.

Trying this will raise an error (as the folder foo doesn't exist yet):

python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir()
output
FileNotFoundError: [Errno 2] No such file or directory: '/Users/adam/python-file-paths/foo/bar'

We can avoid this error by using parents=True:

python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir(parents=True)

Another cause of error is trying to make a directory that already exists:

python
from pathlib import Path

path = Path.home() / 'python-file-paths' 
path.mkdir()
# FileExistsError

It's common to use both parents=True and exist_ok=True whenever we make a folder:

python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir(parents=True, exist_ok=True)
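
The os counterpart of mkdir(parents=True, exist_ok=True) is os.makedirs with exist_ok=True. A sketch comparing the two; it works inside a temporary directory from tempfile rather than the home directory, which is our own choice so that running it leaves no trace:

```python
import os
import tempfile
from pathlib import Path

# Work in a throwaway directory instead of the real home directory.
with tempfile.TemporaryDirectory() as tmp:
    # os: makedirs creates missing parents; exist_ok=True tolerates re-runs.
    os_path = os.path.join(tmp, 'python-file-paths', 'foo', 'bar')
    os.makedirs(os_path, exist_ok=True)
    os.makedirs(os_path, exist_ok=True)  # second call does not raise

    # pathlib: the same behaviour through keyword arguments to mkdir.
    pathlib_path = Path(tmp) / 'python-file-paths' / 'foo' / 'baz'
    pathlib_path.mkdir(parents=True, exist_ok=True)
    pathlib_path.mkdir(parents=True, exist_ok=True)

    foo = Path(tmp) / 'python-file-paths' / 'foo'
    created = sorted(p.name for p in foo.iterdir())

print(created)  # ['bar', 'baz']
```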

The examples above are all about creating a directory from a path.

Sometimes we actually have a full file path (including both folders and a file name). If we call mkdir on a full file path, we end up making a directory with the same name as our soon-to-be file!

We can use Path.parent to access the enclosing folder of our file, and call .mkdir on that folder:

python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'foo' / 'bar' / 'baz.file'
path.parent.mkdir(parents=True, exist_ok=True)
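
A sketch of the whole sequence, again inside a temporary directory: create only the enclosing folders, confirm the file itself does not exist yet, then write it. The os equivalent of Path.parent here is os.path.dirname:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / 'python-file-paths' / 'foo' / 'bar' / 'baz.file'

    # Create only the enclosing folders; the file is not created.
    file_path.parent.mkdir(parents=True, exist_ok=True)
    parent_is_dir = file_path.parent.is_dir()
    file_exists_before = file_path.exists()

    # os equivalent: os.path.dirname gives the enclosing folder.
    os.makedirs(os.path.dirname(file_path), exist_ok=True)

    # The file can now be written into the existing folder.
    file_path.write_text('hello')
    file_exists_after = file_path.is_file()

print(parent_is_dir, file_exists_before, file_exists_after)  # True False True
```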

Writing Data to Files

Imagine we have a dataset of 32 samples, and we want to save each sample in a file in $HOME/python-file-paths/.

First using os. Because os.mkdir has no exist_ok argument, we need to check whether the base folder exists before making it. The samples are generated with NumPy, which is where the format of the printed output below comes from:

python
import os

import numpy as np

np.random.seed(42)
dataset = np.random.uniform(0, 100, size=(32, 4))

base_path = os.path.join(os.path.expanduser('~'), 'python-file-paths')
if not os.path.exists(base_path):
    os.mkdir(base_path)

for n, sample in enumerate(dataset):
    file_path = os.path.join(base_path, f'sample_{n}.data')
    with open(file_path, 'w') as file:
        file.write(str(sample))

We can use cat to print out our first sample:

shell-session
$ cat ~/python-file-paths/sample_0.data
[37.45401188 95.07143064 73.19939418 59.86584842]

And then using pathlib, where we can use exist_ok=True along with the write_text method on our Path object:

python
from pathlib import Path

import numpy as np

np.random.seed(42)
dataset = np.random.uniform(0, 100, size=(32, 4))
for n, sample in enumerate(dataset):
    path = Path.home() / 'python-file-paths' / f'sample_{n}.data'
    path.parent.mkdir(exist_ok=True)
    path.write_text(str(sample))

Using cat again to print out our first sample, which is the same thanks to our random seed:

shell-session
$ cat ~/python-file-paths/sample_0.data
[37.45401188 95.07143064 73.19939418 59.86584842]
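
write_text opens the file in write mode, so each call replaces the file's contents rather than adding to them. That suits one file per sample, but it means appending needs a different approach. A quick check in a temporary directory (passing encoding='utf-8' explicitly is our own choice; the examples above rely on the platform default):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'sample_0.data'

    # Each call opens the file in 'w' mode, discarding earlier contents.
    path.write_text('first', encoding='utf-8')
    path.write_text('second', encoding='utf-8')

    contents = path.read_text(encoding='utf-8')

print(contents)  # second
```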

Appending Data to a File

The above task was writing to many files - one file per sample. Other times we want to append to a file - the advantage being all our data is stored in one file.

These examples append text to a single file, samples.data. First with os:

python
import os

import numpy as np

np.random.seed(42)
dataset = np.random.uniform(0, 100, size=(32, 4))

base_path = os.path.join(os.path.expanduser('~'), 'python-file-paths')
if not os.path.exists(base_path):
    os.mkdir(base_path)

file_path = os.path.join(base_path, 'samples.data')
for sample in dataset:
    # mode 'a' appends to the end of the file, creating it if needed
    with open(file_path, 'a') as file:
        file.write(str(sample) + '\n')

And with pathlib. There is no append counterpart to write_text, so we use Path.open with the append mode 'a' inside a with block:

python
from pathlib import Path

import numpy as np

np.random.seed(42)
dataset = np.random.uniform(0, 100, size=(32, 4))
for sample in dataset:
    path = Path.home() / 'python-file-paths' / 'samples.data'
    path.parent.mkdir(exist_ok=True)
    with path.open('a') as fi:
        fi.write(str(sample) + '\n')

Now our data is stored in a single file (one line per sample):

shell-session
$ head -n 2 ~/python-file-paths/samples.data
[37.45401188 95.07143064 73.19939418 59.86584842]
[15.60186404 15.59945203  5.80836122 86.61761458]
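
If you append often, a small helper can hide the with block. append_text below is a hypothetical helper of our own, not a pathlib method; a sketch tested in a temporary directory:

```python
import tempfile
from pathlib import Path


def append_text(path: Path, text: str, encoding: str = 'utf-8') -> None:
    """Hypothetical helper: append text to path, creating the file if needed."""
    with path.open('a', encoding=encoding) as fi:
        fi.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'samples.data'
    for sample in ([1, 2], [3, 4], [5, 6]):
        append_text(path, str(sample) + '\n')
    lines = path.read_text(encoding='utf-8').splitlines()

print(lines)  # ['[1, 2]', '[3, 4]', '[5, 6]']
```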

Reading & Finding Files

Reading from Text Files

Let's open one of the text files we created earlier.

os has no dedicated function for reading a file: we build the path with os.path and pass it to the built-in open, using a with block so the file is closed after reading:

python
from os.path import join, expanduser

path = join(expanduser('~'), 'python-file-paths', 'samples.data')
with open(path, 'r') as fi:
    data = fi.read()

With pathlib we can open, read & close the file using the read_text() method on our Path object:

python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'samples.data'
data = path.read_text()
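
read_text returns the whole file as one string. To work with the samples row by row we can split that string into lines or, for files too large to load at once, iterate over the open file, which reads a line at a time. A sketch using a small file in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'samples.data'
    path.write_text('[1.0 2.0]\n[3.0 4.0]\n')

    # Option 1: read everything, then split into lines (newlines removed).
    rows = path.read_text().splitlines()

    # Option 2: stream the file; each line keeps its trailing newline.
    with path.open() as fi:
        streamed = [line.rstrip('\n') for line in fi]

print(rows)              # ['[1.0 2.0]', '[3.0 4.0]']
print(rows == streamed)  # True
```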

Finding Many Files Recursively

Sometimes we want to find the paths of many files, including files deep inside nested folders.

With os we can use os.walk:

python
from os import walk
from os.path import join, expanduser

paths = []
for root, dirs, files in walk(join(expanduser('~'), 'python-file-paths')):
    # files holds the names of the files directly inside root
    for name in files:
        if name.endswith('.py'):
            paths.append(join(root, name))

With pathlib, glob is best:

python
from pathlib import Path

path = Path.home() / 'python-file-paths'
paths = [p for p in path.glob('**/*.py') if p.is_file()]

glob makes no guarantee about the order of the paths it returns; if you rely on the order, call sorted on paths.
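
Path.rglob(pattern) is shorthand for glob('**/' + pattern). The sketch below builds a tiny made-up tree in a temporary directory, collects the .py files with os.walk, glob and rglob, and sorts each result so the three can be compared:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical example tree: two .py files and one .txt file.
    (base / 'pkg').mkdir()
    for name in ('main.py', 'pkg/util.py', 'pkg/notes.txt'):
        (base / name).write_text('')

    # os.walk, sorted so the order is predictable.
    walked = sorted(
        os.path.join(root, name)
        for root, dirs, files in os.walk(base)
        for name in files
        if name.endswith('.py')
    )

    # glob('**/*.py') and rglob('*.py') find the same files.
    globbed = sorted(str(p) for p in base.glob('**/*.py') if p.is_file())
    rglobbed = sorted(str(p) for p in base.rglob('*.py') if p.is_file())

print(len(walked), walked == globbed == rglobbed)  # 2 True
```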

Finding All Directories

Often we want a list of directories at a certain path - here we use the user's home directory. We don't want this to be recursive.

For os we use os.listdir to list the entries at a path, with os.path.isdir to check whether each entry is a directory:

python
from os import listdir
from os.path import expanduser, join, isdir

path = expanduser('~')
dirs = [join(path, p) for p in listdir(path) if isdir(join(path, p))]

For pathlib we use path.iterdir and path.is_dir - both methods are called on the Path object:

python
from pathlib import Path

path = Path.home()
# keep full Path objects, matching the full paths returned by the os version
dirs = [p for p in path.iterdir() if p.is_dir()]
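
os also provides os.scandir, which yields DirEntry objects with their own is_dir() method and can often answer it without an extra system call per entry. A sketch listing folders in a temporary directory with made-up contents:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical contents: two folders and one file.
    (base / 'data').mkdir()
    (base / 'src').mkdir()
    (base / 'readme.txt').write_text('')

    # os.scandir yields DirEntry objects; the with block closes the iterator.
    with os.scandir(base) as entries:
        scanned = sorted(entry.name for entry in entries if entry.is_dir())

    # pathlib equivalent, reduced to names for comparison.
    iterated = sorted(p.name for p in base.iterdir() if p.is_dir())

print(scanned, scanned == iterated)  # ['data', 'src'] True
```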

Finding All Directories Recursively

Sometimes we want to look beyond a single path, and recursively search for folders.

We can do this using os.walk:

python
from os import walk
from os.path import expanduser, join

paths = []
for root, dirs, files in walk(join(expanduser('~'), 'python-file-paths')):
    # dirs holds the names of the subfolders directly inside root
    for name in dirs:
        paths.append(join(root, name))

With pathlib this is best done using path.glob:

python
from pathlib import Path

path = Path.home() / 'python-file-paths'
paths = [p for p in path.glob('**/*') if p.is_dir()]
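
To check that os.walk and glob agree, the sketch below collects folders both ways on a small made-up tree in a temporary directory. With os.walk the folder names come from the dirs list yielded at each level; glob('**/*') yields files as well as folders, hence the is_dir() filter:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical tree: a/b/c plus a sibling folder d and one file.
    (base / 'a' / 'b' / 'c').mkdir(parents=True)
    (base / 'd').mkdir()
    (base / 'a' / 'file.txt').write_text('')

    # os.walk lists subfolder names in dirs at every level.
    walked = sorted(
        os.path.join(root, name)
        for root, dirs, files in os.walk(base)
        for name in dirs
    )

    # glob('**/*') yields files and folders, so keep only folders.
    globbed = sorted(str(p) for p in base.glob('**/*') if p.is_dir())

print(len(walked), walked == globbed)  # 4 True
```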

Removing Things

Removing Directories

To remove a non-empty directory we need, for the first time, a function from outside pathlib: shutil.rmtree, which removes a directory along with everything inside it.

rmtree accepts both strings and Path objects, so the only difference between os and pathlib here is how the path is built. The example below uses pathlib:

python
from shutil import rmtree
from pathlib import Path

path = Path.home() / 'python-file-paths'
rmtree(path)

This is usually the behaviour you want when removing directories - remove even if not empty.
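
By contrast, Path.rmdir (like os.rmdir) only removes empty directories and raises an OSError otherwise. A sketch in a temporary directory showing both behaviours:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / 'python-file-paths'
    folder.mkdir()
    (folder / 'sample_0.data').write_text('[1, 2]')

    # rmdir refuses to delete a directory that still has contents.
    try:
        folder.rmdir()
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # rmtree deletes the directory and everything inside it.
    shutil.rmtree(folder)
    still_exists = folder.exists()

print(rmdir_failed, still_exists)  # True False
```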

Removing Files

Sometimes we want to remove specific files - when we know the path.

We can do this with os:

python
import os
from os.path import expanduser, join

path = join(expanduser('~'), 'python-file-paths', 'data.txt')
if os.path.exists(path):
    os.remove(path)

And with pathlib:

python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'data.txt'
path.unlink(missing_ok=True)
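
The missing_ok argument was added to Path.unlink in Python 3.8. On older versions, catching FileNotFoundError does the same job, and it also avoids the gap between the exists check and os.remove in the os example. A sketch, where remove_if_present is a hypothetical helper name:

```python
import tempfile
from pathlib import Path


def remove_if_present(path: Path) -> bool:
    """Hypothetical helper: delete a file, returning False if it was missing."""
    try:
        path.unlink()
    except FileNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'data.txt'
    path.write_text('hello')
    first = remove_if_present(path)   # file existed and is deleted
    second = remove_if_present(path)  # already gone, no exception raised

print(first, second)  # True False
```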

A summary of the different options for removing files and directories:

| Task | os | pathlib |
|---|---|---|
| Remove empty directory | os.rmdir | Path.rmdir |
| Remove file | os.remove | Path.unlink |
| Remove non-empty directory | shutil.rmtree | shutil.rmtree |

Summary

Thanks for reading! Here are the key takeaways:

  • pathlib moves functionality into a single Path object,
  • use exist_ok and parents when creating directories,
  • the Path.parent attribute gives easy access to the folder a file is in,
  • work with text files using Path.write_text & Path.read_text,
  • check whether a path is a directory with Path.is_dir() or a file with Path.is_file().

This content is a sample from our Introduction to Python course.

