Written by Adam Green.
SQL databases are the backbone of the data ecosystem.
The popularity of SQL makes using a SQL database a core skill for developers and data professionals.
Using a SQL database requires two things - a database tool and SQL queries.
Different database tools excel at different tasks. SQL can be written in different ways, offering tradeoffs that are useful in different situations.
This article is a practical guide that presents the different options for database tools and writing SQL queries.
Many data professionals use a combination of tools to interact with databases.
SQL database tools include GUI programs, command-line tools or code.
Each tool has different advantages and disadvantages, making it better in some use cases than others.
For example, a developer could use code to create a database schema and data, and then use a GUI to manually inspect the data during debugging.
Different database tools offer different options for writing SQL:
A key insight here is the usefulness of raw SQL - every tool offers a way to run raw SQL.
SQL (Structured Query Language) is a language for querying and manipulating data in relational databases.
The query language offered by different databases (such as SQLite or Postgres) is largely the same - most SQL dialects will have a SELECT
statement.
There are however important points of difference between the SQL dialects. These can include things like the data types (such as a TIMESTAMP
type) and functions (such as CURRENT_TIMESTAMP
).
A single SQL dialect can be written in different ways.
The different options for writing SQL can be organized by their level of abstraction over raw SQL:
A GUI program allows interacting with a database using visual tools. A database GUI can use tools like buttons to creating tables or insert data.
Many database GUIs are able to run raw SQL queries, allowing you to view the results directly within the interface. With smart code completion and the ability to save queries, a database GUI is great for debugging.
Some database GUI's only allow a read-only view of the database; other tools will allow you to make changes to the database schema or to delete rows.
Some GUI programs, such as DBeaver or DataGrip, interact with many kinds of databases. Other programs, like PGAdmin for Postgres or Datasette for SQLite, are only able to be used with one kind of database.
Advantages of GUI programs include:
Disadvantages of GUI programs include:
Use a database GUI when you are debugging - they are great for running one-off queries.
A CLI (command-line interface) allows interacting with a database using a shell.
Many databases offer a CLI as part of the core database software.
A CLI is usually database specific. Examples include sqlite3
for SQLite and psql
for Postgres.
Advantages of CLIs include:
Disadvantages of CLIs include:
Use a CLI when:
One of the main advantages of a CLI is that they are useful for both humans and machines.
Below are examples of how the sqlite3
CLI can be used in a human and machine friendly way.
A human friendly use of a CLI is via an interactive REPL (Read-Evaluate-Print Loop).
We can create a new SQLite database using the sqlite3
command-line program, which is part of the Python standard library. This opens up an interactive REPL where we can interact with the database:
shell-session
$ sqlite3 database.sqlite sqlite>
We can create a table in the database by running a CREATE TABLE
statement - below we create a table named cities
with columns of city
, population
, and hemisphere
:
shell-session
sqlite> CREATE TABLE cities (city TEXT, population REAL, hemisphere TEXT);
We can insert data into a table by running a SQL INSERT INTO
statement - below we insert a row into the cities
table:
shell-session
sqlite> INSERT INTO cities VALUES ('auckland', 1.6, 'south');
We can retrieve data from a table by running a SELECT
statement - below we retrieve all the data (a single row) from the cities
table:
shell-session
sqlite> SELECT * FROM cities; auckland|1.6|south
This style of interactive, REPL based querying is great for debugging and exploring a database.
A machine friendly use of the CLI is via a shell script. A shell script can be automated in a workflow like Github Actions.
Below is a single shell script that does the same steps as in our human-friendly version - creating a table cities
, inserting data and then selecting the entire cities
table:
create-db.sh | bash
#!/bin/sh sqlite3 database.db "CREATE TABLE cities (city TEXT, population REAL, hemisphere TEXT);" sqlite3 database.db "INSERT INTO cities VALUES ('auckland', 1.6, 'south');" sqlite3 database.db "SELECT * FROM cities;"
We can run all this SQL as a single shell command by running the script:
shell-session
$ sh create-db.sh auckland|1.6|south
We can also achieve the same thing as above by keeping our SQL in a .sql
file called cities.sql
by using the cat
command:
cities.sql | sql
CREATE TABLE cities (city TEXT, population REAL, hemisphere TEXT); INSERT INTO cities VALUES ('auckland', 1.6, 'south'); SELECT * FROM cities;
We can then run these SQL commands directly from the file by passing them into the sqlite3
CLI. We use the shell redirection operator <
to pass the contents of the file into the sqlite3
CLI:
shell-session
$ sqlite3 database.db < cities.sql auckland|1.6|south
Writing SQL in code allows a flexible and automatable way to interact with a database.
Interacting with a database through code is convenient when you are using a database as part of an application or when running a script.
Many programming languages have libraries that allow interacting with SQL databases. In Python we can use the sqlite3
module from the standard library to interact with sqlite
.
One way to work with SQL in a programming language is to write raw SQL inside your code.
Care must be taken when constructing SQL queries in code to prevent injection attacks, a security vulnerability where an attacker can manipulate SQL queries to compromise the database.
An example of a SQL injection is below - user input ends up being executed as SQL:
python
# SQL injection from user input user_input = "DROP TABLE users;" cursor.execute(f"SELECT * from user_input WHERE name = {user_input};")
The correct way to insert data into a SQL query is to escape user input. This can be done by using prepared statements or parameterized queries.
python
# parameterizing the query in SQLite cursor.execute(f"SELECT * from user_input WHERE name = ?", (user_input, ))
Advantages of raw SQL include:
Disadvantages of raw SQL include:
Use raw SQL in code when:
In Python we can interact with a SQLite database using the sqlite3
module in the standard library:
python
# import the sqlite3 library # sqlite3 is part of the Python standard library import sqlite3 # create a connection to the sqlite database file # if the database does not exist, it will be created conn = sqlite3.connect('cities.db') # the cursor is a core interface - it executes SQL commands cursor = conn.cursor() # create a table with a `CREATE TABLE` SQL statement # Cursor.execute() executes SQL commands cursor.execute("CREATE TABLE cities (city TEXT, population REAL, hemisphere TEXT)") # insert data into the table # second argument of execute() is a tuple of the data to be inserted # this is a parameterized query for city, population, hemisphere in zip( data['city'], data['population'], data['hemisphere'] ): cursor.execute("INSERT INTO cities VALUES (?, ?, ?)", (city, population, hemisphere)) # commit the transaction to the database # this writes the data to the database conn.commit() # execute a `SELECT` statement to retrieve all rows from the `cities` table cursor.execute("SELECT * FROM cities") # fetch all the rows of a query result rows = cursor.fetchall() # the query result is a list of tuples for row in rows: print(row) # close the Cursor and Connection objects # closing connection allows SQLite to commit any changes & release resources cursor.close() conn.close()
An ORM (Object-Relational Mapping) allows interacting with a database through an object-oriented paradigm.
An ORM will allow a developer to define the structure & functionality of a database using classes - these classes are the objects in the Object-Relational model.
Database tables become classes, with the attributes of those classes becoming the columns.
python
class City(Base): __tablename__ = 'cities' city = Column(String, primary_key=True) population = Column(Float) hemisphere = Column(String)
An ORM can simplify the process of creating tables, inserting data, querying and retrieving data. Under the hood an ORM will translate the object-oriented code into raw SQL.
ORMs are often database agnostic - the same object-oriented code can work with different databases, giving you some level of protection from the database concretions.
ORMs can also provide additional functionality such as connection pooling, lazy loading, caching, and transaction management.
Advantages of an ORM:
Disadvantages of an ORM:
Use an ORM when maintaining a large database - if you have a complex database schema, an ORM can provide a clean way to express the database tables & relationships.
Here is an example of how to use SQLAlchemy with our cities
dataset:
python
# import objects from sqlalchemy from sqlalchemy import create_engine, Column, String, Float from sqlalchemy.ext.declarative import declarative_base from sqlalchemy.orm import sessionmaker # create a new engine with our sqlite database file # if the database does not exist, it will be created # the engine is the core interface to the database engine = create_engine('sqlite:///cities.db') # create a base class for declarative class definitions # all mapped classes inherit from this Base class Base = declarative_base() # define a new class City that inherits from Base # this corresponds to a table in the database # each attribute of the class is a column class City(Base): __tablename__ = 'cities' city = Column(String, primary_key=True) population = Column(Float) hemisphere = Column(String) # create all the tables we have defined as classes Base.metadata.create_all(engine) # sessionmaker is a factory that constructs new session objects Session = sessionmaker(bind=engine) # create a Session instance session = Session() # insert data into the table by creating new City objects # this data is pending - it will be inserted into the database on transaction commit for city, population, hemisphere in zip( data['city'], data['population'], data['hemisphere'] ): session.add(City(city=city, population=population, hemisphere=hemisphere)) # commit the transaction to the database # this writes the data to the database session.commit() # query the database to get all cities using the `City` class # this is equivalent to "SELECT * FROM cities" in raw SQL cities = session.query(City).all() # our query returns a list of `City` class instances for city in cities: print(city.city, city.population, city.hemisphere) # close the session session.close()
Thanks for reading!
Here are the different approaches for using a SQL database, with their advantages and disadvantages:
GUI Programs | Command Line Tools | Raw SQL | ORMs | |
---|---|---|---|---|
Ease of Use | High | Medium | Low | Medium |
Discoverability | High | Low | Low | Medium |
Automation | Low | High | High | High |
Database Versatility | High | Low | High | High |
Key takeaways from this post are:
Use each tool in the following scenarios:
This content is a sample from the Introduction to Databases course.
Thanks for reading!
If you enjoyed this blog post, make sure to check out our free 77 data science lessons across 22 courses.