Written by Adam Green.
This post compares `os` and `pathlib`, two Python libraries for working with files, paths and directories.
Both `os` and `pathlib` are part of the Python standard library.
If you've been programming in Python for a while, it's likely you are using functions in `os` such as `os.path.join` - historically this was the only functionality offered by the standard library for handling file paths.
`pathlib` was introduced in Python 3.4, and offers different ways to work with files, directories and paths.
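In this post we will compare both libraries at common tasks: building paths, finding the home and working directories, splitting a path into its parts, creating directories, reading and writing files, finding files and folders, and removing them.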
Operating systems take different approaches to file paths - Linux uses `/` as a separator, whereas Windows uses `\`.
Because separators differ between operating systems, portability is a key concern when dealing with file paths.
Both `os.path` and `pathlib` offer portable ways to construct paths. This means the same piece of Python code can run on either Linux or Windows, and the path it creates will be compatible with that OS.
`os` offers `os.path.join` to create a file path:
python
import os

path = os.path.join(os.path.expanduser('~'), 'data', 'file.txt')
# /Users/adam/data/file.txt
In `pathlib` the path is formed using the division operator `/` with an initialized `Path` object:
python
from pathlib import Path

path = Path.home() / 'data' / 'file.txt'
# /Users/adam/data/file.txt
The `Path` object is the focus of `pathlib` - almost all of the functionality we need can be accessed as either attributes or methods on this object.
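To see the portability claim without switching machines, here is a minimal sketch that uses the pure path classes in `pathlib` (`PurePosixPath` and `PureWindowsPath`, which are not otherwise used in this post) to render the same parts with each operating system's separator. The folder and file names are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths never touch the file system, so both flavours
# can be built on any operating system.
parts = ('data', 'raw', 'file.txt')  # hypothetical path components

posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)    # data/raw/file.txt
print(windows_path)  # data\raw\file.txt
```

In everyday code a plain `Path` is usually enough, as it picks the flavour that matches the machine the code runs on.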
The home directory is in different places on different operating systems - both our contenders offer a way to get the user's home directory that works on both UNIX & Windows systems:
- Linux: `/home/$USER`
- macOS: `/Users/$USER`
- Windows: `C:\Users\$USER`
Get the home directory with `os`:
python
import os

os.path.expanduser('~')  # /Users/adam
Get the home directory with `pathlib`:
python
from pathlib import Path

Path.home()  # /Users/adam
Get the current working directory with `os`:
python
import os

os.getcwd()
Get the current working directory with `pathlib`:
python
from pathlib import Path

Path.cwd()
A file path can be separated into different parts. We can decompose a file name into a stem and suffix - `name = stem.suffix`.
The name of a file includes the suffix. The suffix is useful for determining what the file type is.
Getting the file name with `os` requires using `basename`:
python
import os

os.path.basename('/path/file.suffix')  # file.suffix
With `pathlib` we can use the `name` attribute on a `Path` object:
python
from pathlib import Path

Path('/path/file.suffix').name  # file.suffix
The stem is the file name without the suffix.
Getting this with `os` requires using both `basename` and `splitext`:
python
from os.path import basename, splitext

splitext(basename('/path/file.suffix'))[0]  # file
With `pathlib` we can use the `stem` attribute on a `Path` object:
python
from pathlib import Path

Path('/path/file.suffix').stem  # file
The suffix is the final part of a file name - the file extension, including the leading dot.
To get the suffix with `os.path`:
python
import os

os.path.splitext('/path/file.suffix')[-1]  # .suffix
`pathlib` has `suffix` as an attribute of the `Path` object:
python
from pathlib import Path

Path('/path/file.suffix').suffix  # .suffix
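File names with more than one dot are a common source of confusion, so here is a small sketch of how the parts above behave for a made-up file called `archive.tar.gz`. It also uses the `suffixes` attribute, which is not covered elsewhere in this post.

```python
import os.path
from pathlib import Path

path = Path('/path/archive.tar.gz')  # hypothetical file, never opened

name = path.name          # 'archive.tar.gz'
stem = path.stem          # 'archive.tar' - only the last suffix is removed
suffix = path.suffix      # '.gz'
suffixes = path.suffixes  # ['.tar', '.gz']

# os.path.splitext also splits at the last dot
ext = os.path.splitext('/path/archive.tar.gz')[-1]  # '.gz'

print(name == stem + suffix)  # True
```

Both libraries treat only the last dot as the start of the suffix, so neither gives you `.tar.gz` in one step.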
We can create a directory using `os` with `os.mkdir()`:
python
import os

path = os.path.join(os.path.expanduser('~'), 'python-file-paths')
os.mkdir(path)
With `pathlib`, we use the `Path.mkdir()` method on a `Path` object:
python
from pathlib import Path

path = Path.home() / 'python-file-paths'
path.mkdir()
Sometimes we want to make a deeper folder structure, where many of the folders in the path don't exist.
Trying this will raise an error (as the folder `foo` doesn't exist yet):
python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir()
output
FileNotFoundError: [Errno 2] No such file or directory: '/Users/adam/python-file-paths/foo/bar'
We can avoid this error by using `parents=True`:
python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir(parents=True)
Another cause of error is trying to make a directory that already exists:
python
from pathlib import Path

path = Path.home() / 'python-file-paths'
path.mkdir()  # FileExistsError
It's common to use both `parents=True` and `exist_ok=True` whenever we make a folder:
python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'foo' / 'bar'
path.mkdir(parents=True, exist_ok=True)
The examples above are all about creating a directory from a path.
Sometimes we actually have a full file path (including both folders and a filename). If we use `mkdir` on a full file path, we will end up making a directory with the same name as our soon-to-be file!
We can use `Path.parent` to access the enclosing folder of our file, and call `.mkdir` on that folder:
python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'foo' / 'bar' / 'baz.file'
path.parent.mkdir(parents=True, exist_ok=True)
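The behaviour of `parents` and `exist_ok` can be checked safely in a throwaway folder. This is a minimal sketch that assumes a temporary directory from `tempfile` (not used elsewhere in this post) so that nothing is written to your home directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / 'foo' / 'bar' / 'baz.file'

    # Create the enclosing folders, not a folder named baz.file
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # A second call is harmless thanks to exist_ok=True
    file_path.parent.mkdir(parents=True, exist_ok=True)

    parent_is_dir = file_path.parent.is_dir()  # True
    file_exists = file_path.exists()           # False - only folders were made

    # Without exist_ok, repeating mkdir raises FileExistsError
    try:
        file_path.parent.mkdir()
        raised = False
    except FileExistsError:
        raised = True

print(parent_is_dir, file_exists, raised)  # True False True
```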
Imagine we have a dataset of 32 samples, and we want to save each sample in a file in `$HOME/python-file-paths/`.
First using `os`, where the lack of an `exist_ok` argument in `os.mkdir` means we need to check if the base folder exists before making it:
python
import os
import random

random.seed(42)
dataset = [[random.uniform(0, 100) for _ in range(4)] for _ in range(32)]

base_path = os.path.join(os.path.expanduser('~'), 'python-file-paths')
if not os.path.exists(base_path):
    os.mkdir(base_path)

for n, sample in enumerate(dataset):
    file_path = os.path.join(base_path, f'sample_{n}.data')
    with open(file_path, 'w') as file:
        file.write(str(sample))
We can use `cat` to print out our first sample:
shell-session
$ cat ~/python-file-paths/sample_0.data
[37.45401188 95.07143064 73.19939418 59.86584842]
And then using `pathlib`, where we can use `exist_ok=True` along with the `write_text` method on our `Path` object:
python
from pathlib import Path
import random

random.seed(42)
dataset = [[random.uniform(0, 100) for _ in range(4)] for _ in range(32)]

for n, sample in enumerate(dataset):
    path = Path.home() / 'python-file-paths' / f'sample_{n}.data'
    path.parent.mkdir(exist_ok=True)
    path.write_text(str(sample))
Again using `cat` to print out our first sample, which due to our random seed, is the same:
shell-session
$ cat ~/python-file-paths/sample_0.data
[37.45401188 95.07143064 73.19939418 59.86584842]
The above task was writing to many files - one file per sample. Other times we want to append to a file - the advantage being all our data is stored in one file.
These examples append text to a single file `samples.data`. First with `os`:
python
import os
import random

random.seed(42)
dataset = [[random.uniform(0, 100) for _ in range(4)] for _ in range(32)]

base_path = os.path.join(os.path.expanduser('~'), 'python-file-paths')
if not os.path.exists(base_path):
    os.mkdir(base_path)

file_path = os.path.join(base_path, 'samples.data')
for sample in dataset:
    # Mode 'a' appends to the end of the file
    with open(file_path, 'a') as file:
        file.write(str(sample) + '\n')
And with `pathlib` - note that here we are forced to use a context manager in order to pass the append mode `a`:
python
from pathlib import Path
import random

random.seed(42)
dataset = [[random.uniform(0, 100) for _ in range(4)] for _ in range(32)]

path = Path.home() / 'python-file-paths' / 'samples.data'
path.parent.mkdir(exist_ok=True)

for sample in dataset:
    with path.open('a') as fi:
        fi.write(str(sample) + '\n')
Now our data is stored in a single file (one sample per line):
shell-session
$ head -n 2 ~/python-file-paths/samples.data
[37.45401188 95.07143064 73.19939418 59.86584842]
[15.60186404 15.59945203 5.80836122 86.61761458]
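As a quick check of how append mode behaves, here is a minimal sketch that writes three made-up samples to a file in a temporary directory (an assumption for this example, so your home directory is left alone) and reads the lines back.

```python
import tempfile
from pathlib import Path

samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # made-up data

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'samples.data'

    for sample in samples:
        # Mode 'a' creates the file if needed, then adds to the end
        with path.open('a') as fi:
            fi.write(str(sample) + '\n')

    lines = path.read_text().splitlines()

print(lines)  # ['[1.0, 2.0]', '[3.0, 4.0]', '[5.0, 6.0]']
```

Because the file is opened in append mode, running the loop a second time against the same file would add three more lines rather than replace the first three.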
Let's open one of the text files we created earlier.
Reading a text file with `os` requires a context manager to properly close the file after opening:
python
from os.path import join, expanduser

path = join(expanduser('~'), 'python-file-paths', 'samples.data')
with open(path, 'r') as fi:
    data = fi.read()
With `pathlib` we can open, read & close the file using the `read_text()` method on our `Path` object:
python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'samples.data'
data = path.read_text()
Sometimes we want to find the paths of many files, including files deep in the file system.
With `os` we can use `os.walk`:
python
from os import walk
from os.path import join, expanduser

paths = []
for root, dirs, files in walk(join(expanduser('~'), 'python-file-paths')):
    for name in files:
        if name.endswith('.py'):
            paths.append(join(root, name))
With `pathlib`, `glob` is best:
python
from pathlib import Path

path = Path.home()
paths = [p for p in path.glob('**/*.py') if p.is_file()]
`glob` does not return paths in a guaranteed order - if you are relying on the order, be sure to call `sorted` on `paths`.
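The recursive pattern and the sorting advice can be tried on a small, made-up folder tree. This sketch builds the tree in a temporary directory (an assumption for the example) and prints the matches relative to it.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical project layout
    for rel in ['b.py', 'a.py', 'pkg/c.py', 'pkg/notes.txt']:
        file_path = base / rel
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text('')

    # '**/*.py' matches .py files in base and in every folder below it
    found = sorted(p for p in base.glob('**/*.py') if p.is_file())
    relative = [p.relative_to(base).as_posix() for p in found]

print(relative)  # ['a.py', 'b.py', 'pkg/c.py']
```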
Often we want a list of directories at a certain path - here we use the user's home directory. We don't want this to be recursive.
For `os` we use `os.listdir` to list the entries at a path, with `os.path.isdir` to check that each entry is a directory:
python
from os import listdir
from os.path import expanduser, join, isdir

path = expanduser('~')
dirs = [join(path, p) for p in listdir(path) if isdir(join(path, p))]
For `pathlib` we use `path.iterdir` and `path.is_dir` - both are methods called on a `Path` object:
python
from pathlib import Path

path = Path.home()
# p.name keeps only the folder name; use p itself for the full path
dirs = [p.name for p in path.iterdir() if p.is_dir()]
Sometimes we want to look beyond a single path, and recursively search for folders.
We can do this using `os.walk`:
python
from os import walk
from os.path import expanduser, join

paths = []
for root, dirs, files in walk(join(expanduser('~'), 'python-file-paths')):
    # dirs holds the names of the folders found directly inside root
    for name in dirs:
        paths.append(join(root, name))
With `pathlib` this is best done using `path.glob`:
python
from pathlib import Path

path = Path.home()
paths = [p for p in path.glob('**/*') if p.is_dir()]
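To make the difference between the single-level and recursive listings concrete, here is a short sketch on a made-up folder tree in a temporary directory (an assumption for this example).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / 'foo' / 'bar').mkdir(parents=True)
    (base / 'baz').mkdir()
    (base / 'file.txt').write_text('')

    # iterdir only looks one level down
    top_level = sorted(p.name for p in base.iterdir() if p.is_dir())

    # glob('**/*') walks the whole tree below base
    all_dirs = sorted(
        p.relative_to(base).as_posix()
        for p in base.glob('**/*')
        if p.is_dir()
    )

print(top_level)  # ['baz', 'foo']
print(all_dirs)   # ['baz', 'foo', 'foo/bar']
```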
For the first time we need to use a function from outside `pathlib` - `shutil.rmtree`, which will remove a directory even if it is not empty.
There is no real difference between `os` & `pathlib` except for when creating the filepath - the example below uses `pathlib`:
python
from shutil import rmtree
from pathlib import Path

path = Path.home() / 'python-file-paths'
rmtree(path)
This is usually the behaviour you want when removing directories - remove even if not empty.
Sometimes we want to remove specific files - when we know the path.
We can do this with `os`:
python
import os
from os.path import expanduser, join

path = join(expanduser('~'), 'python-file-paths', 'data.txt')
if os.path.exists(path):
    os.remove(path)
And with `pathlib`:
python
from pathlib import Path

path = Path.home() / 'python-file-paths' / 'data.txt'
path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
A summary of the different options for removing files and directories:
Task | os | pathlib |
---|---|---|
Remove empty directory | os.rmdir | path.rmdir |
Remove file | os.remove | path.unlink |
Remove non-empty directory | shutil.rmtree | shutil.rmtree |
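The three removal options in the table can be compared side by side. This is a minimal sketch using made-up folder and file names inside a temporary directory (an assumption for this example), so nothing of yours is deleted.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    empty_dir = base / 'empty'
    full_dir = base / 'full'
    empty_dir.mkdir()
    full_dir.mkdir()
    (full_dir / 'a.txt').write_text('a')
    (full_dir / 'b.txt').write_text('b')

    empty_dir.rmdir()  # works because the folder is empty

    # rmdir refuses to remove a folder that still has contents
    try:
        full_dir.rmdir()
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    (full_dir / 'a.txt').unlink()  # remove a single file
    shutil.rmtree(full_dir)        # remove the folder and what is left in it

    remaining = [p.name for p in base.iterdir()]

print(rmdir_failed, remaining)  # True []
```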
Thanks for reading! Here are the key takeaways:
- `pathlib` moves functionality into a single `Path` object,
- use `exist_ok` and `parents` when creating directories,
- the `Path.parent` attribute allows easy access to the folder a file is in,
- write and read text files with `Path.write_text` & `Path.read_text`,
- check whether a path is a folder with `Path.is_dir()` or a file with `Path.is_file()`.

This content is a sample from our Introduction to Python course.