How to Use a SQL Database

SQL databases are the backbone of the data ecosystem.

The popularity of SQL makes using a SQL database a core skill for developers and data professionals.

Created with Stable Diffusion.
Prompt: 'a person using a sql database via a terminal, vibrant, style of dali'. Seed: 42.

Using a SQL database requires two things - a database tool and SQL queries.

Different database tools excel at different tasks. SQL can be written in different ways, offering tradeoffs that are useful in different situations.

This article is a practical guide that presents the different options for database tools and writing SQL queries.

Database Tools & SQL Queries

Many data professionals use a combination of tools to interact with databases.

SQL database tools include GUI programs, command-line tools or code.

Each tool has different advantages and disadvantages, making it better in some use cases than others.

For example, a developer could use code to create a database schema and data, and then use a GUI to manually inspect the data during debugging.

Different database tools offer different options for writing SQL:

graph LR subgraph "Database Tools" gui[GUI Program] cli[Command Line] code[Code] end subgraph "How to Write SQL" rawsql[Raw SQL] vis[Visual Tools] orm[ORM] clit[Shell] end gui --> vis gui --> rawsql cli --> clit cli --> rawsql code --> rawsql code --> orm ;

A key insight here is the usefulness of raw SQL - every tool offers a way to run raw SQL.

There Are Different Ways to Write SQL

SQL (Structured Query Language) is a language for querying and manipulating data in relational databases.

SQL Dialects

The query language offered by different databases (such as SQLite or Postgres) is largely the same - most SQL dialects will have a SELECT statement.

There are however important points of difference between the SQL dialects. These can include things like the data types (such as a TIMESTAMP type) and functions (such as CURRENT_TIMESTAMP).

Abstractions over Raw SQL

A single SQL dialect can be written in different ways.

The different options for writing SQL can be organized by their level of abstraction over raw SQL:

  1. raw SQL offers no additional abstraction,
  2. ORMs offer additional abstractions in the form of objects in a programming language like Python,
  3. graphical tools offer a visual interface, with high level abstractions like buttons.

A Guide to Database Tools

GUI Programs

A GUI program allows interacting with a database using visual tools. A database GUI can use tools like buttons to creating tables or insert data.

Many database GUIs are able to run raw SQL queries, allowing you to view the results directly within the interface. With smart code completion and the ability to save queries, a database GUI is great for debugging.

The raw SQL query functionality in Datasette

Some database GUI's only allow a read-only view of the database; other tools will allow you to make changes to the database schema or to delete rows.

Some GUI programs, such as DBeaver or DataGrip, interact with many kinds of databases. Other programs, like PGAdmin for Postgres or Datasette for SQLite, are only able to be used with one kind of database.

Advantages of GUI programs include:

  • ease of use - you can easily navigate and manipulate data through a GUI,
  • discoverability - visual tools are easy to discover and use through buttons and menus.

Disadvantages of GUI programs include:

  • less control - GUIs often hide the underlying SQL code, which can make it harder to understand and optimize what's happening behind the scenes,
  • repeatability - it's hard to repeat the same actions over and over again when clicking through an interface.

Use a database GUI when you are debugging - they are great for running one-off queries.

Command-Line Tools

A CLI (command-line interface) allows interacting with a database using a shell.

Many databases offer a CLI as part of the core database software.

A CLI is usually database specific. Examples include sqlite3 for SQLite and psql for Postgres.

Advantages of CLIs include:

  • automation - CLIs can be automated easily,
  • resource efficiency - CLIs are usually lightweight and fast compared to GUI programs,
  • functionality - a CLI can expose more functionality than a GUI.

Disadvantages of CLIs include:

  • discoverability - understanding all the options of a CLI can require multiple pages of documentation,
  • database specific - CLIs are usually only for one type of database.

Use a CLI when:

  • debugging - a CLI is a useful debugging tool for interactive queries in a REPL,
  • automating - a CLI can be automated through shell scripts.

One of the main advantages of a CLI is that they are useful for both humans and machines.

Below are examples of how the sqlite3 CLI can be used in a human and machine friendly way.

Human Friendly CLI Usage

A human friendly use of a CLI is via an interactive REPL (Read-Evaluate-Print Loop).

We can create a new SQLite database using the sqlite3 command-line program, which is part of the Python standard library. This opens up an interactive REPL where we can interact with the database:

shell-session
$ sqlite3 database.sqlite
sqlite>

We can create a table in the database by running a CREATE TABLE statement - below we create a table named cities with columns of city, population, and hemisphere:

shell-session
sqlite> CREATE TABLE cities (city TEXT, population REAL, hemisphere TEXT);

We can insert data into a table by running a SQL INSERT INTO statement - below we insert a row into the cities table:

shell-session
sqlite> INSERT INTO cities VALUES ('auckland', 1.6, 'south');

We can retrieve data from a table by running a SELECT statement - below we retrieve all the data (a single row) from the cities table:

shell-session
sqlite> SELECT * FROM cities;
auckland|1.6|south

This style of interactive, REPL based querying is great for debugging and exploring a database.

Machine Friendly CLI Usage

A machine friendly use of the CLI is via a shell script. A shell script can be automated in a workflow like Github Actions.

Below is a single shell script that does the same steps as in our human-friendly version - creating a table cities, inserting data and then selecting the entire cities table:

create-db.sh | bash
#!/bin/sh
sqlite3 database.db "CREATE TABLE cities (city TEXT, population REAL, hemisphere TEXT);"
sqlite3 database.db "INSERT INTO cities VALUES ('auckland', 1.6, 'south');"
sqlite3 database.db "SELECT * FROM cities;"

We can run all this SQL as a single shell command by running the script:

shell-session
$ sh create-db.sh
auckland|1.6|south

We can also achieve the same thing as above by keeping our SQL in a .sql file called cities.sql by using the cat command:

cities.sql | sql
CREATE TABLE cities (city TEXT, population REAL, hemisphere TEXT);
INSERT INTO cities VALUES ('auckland', 1.6, 'south');
SELECT * FROM cities;

We can then run these SQL commands directly from the file by passing them into the sqlite3 CLI. We use the shell redirection operator < to pass the contents of the file into the sqlite3 CLI:

shell-session
$ sqlite3 database.db < cities.sql
auckland|1.6|south

Code

Writing SQL in code allows a flexible and automatable way to interact with a database.

Interacting with a database through code is convenient when you are using a database as part of an application or when running a script.

Many programming languages have libraries that allow interacting with SQL databases. In Python we can use the sqlite3 module from the standard library to interact with sqlite.

Raw SQL in Code

One way to work with SQL in a programming language is to write raw SQL inside your code.

Care must be taken when constructing SQL queries in code to prevent injection attacks, a security vulnerability where an attacker can manipulate SQL queries to compromise the database.

An example of a SQL injection is below - user input ends up being executed as SQL:

python
# SQL injection from user input
user_input = "DROP TABLE users;"
cursor.execute(f"SELECT * from user_input WHERE name = {user_input};")

The correct way to insert data into a SQL query is to escape user input. This can be done by using prepared statements or parameterized queries.

python
# parameterizing the query in SQLite
cursor.execute(f"SELECT * from user_input WHERE name = ?", (user_input, ))

Advantages of raw SQL include:

  • flexibility - raw SQL is portable, and can be used across many different SQL databases,
  • performance - Writing raw SQL allows developers to optimize queries for performance. The developers can control how the SQL is written and what joins are used which can lead to more efficient and faster running queries.

Disadvantages of raw SQL include:

  • maintainability - raw SQL queries embedded in code can be challenging to understand, debug, modify,
  • security - raw SQL has the risk of a SQL injection,
  • database specificity - different databases have their own flavors of SQL with minor syntax and functionality differences. If you're switching databases, some raw SQL may need to be rewritten.

Use raw SQL in code when:

  • automating - code execution can be automated,
  • quick, one off queries - if you only need to make one or two queries, writing the SQL queries in code may be easier than introducing an ORM.

In Python we can interact with a SQLite database using the sqlite3 module in the standard library:

python
# import the sqlite3 library 
# sqlite3 is part of the Python standard library
import sqlite3

# create a connection to the sqlite database file
# if the database does not exist, it will be created
conn = sqlite3.connect('cities.db')

# the cursor is a core interface - it executes SQL commands
cursor = conn.cursor()

# create a table with a `CREATE TABLE` SQL statement
# Cursor.execute() executes SQL commands
cursor.execute("CREATE TABLE cities (city TEXT, population REAL, hemisphere TEXT)")

# insert data into the table
# second argument of execute() is a tuple of the data to be inserted
# this is a parameterized query 
for city, population, hemisphere in zip(
    data['city'],
    data['population'],
    data['hemisphere']
):
    cursor.execute("INSERT INTO cities VALUES (?, ?, ?)", (city, population, hemisphere))

# commit the transaction to the database
# this writes the data to the database
conn.commit()

# execute a `SELECT` statement to retrieve all rows from the `cities` table
cursor.execute("SELECT * FROM cities")

# fetch all the rows of a query result
rows = cursor.fetchall()

# the query result is a list of tuples
for row in rows:
    print(row)

# close the Cursor and Connection objects
# closing connection allows SQLite to commit any changes & release resources 
cursor.close()
conn.close()

ORMs

An ORM (Object-Relational Mapping) allows interacting with a database through an object-oriented paradigm.

An ORM will allow a developer to define the structure & functionality of a database using classes - these classes are the objects in the Object-Relational model.

Database tables become classes, with the attributes of those classes becoming the columns.

python
class City(Base):
    __tablename__ = 'cities'
    city = Column(String, primary_key=True)
    population = Column(Float)
    hemisphere = Column(String)

An ORM can simplify the process of creating tables, inserting data, querying and retrieving data. Under the hood an ORM will translate the object-oriented code into raw SQL.

ORMs are often database agnostic - the same object-oriented code can work with different databases, giving you some level of protection from the database concretions.

ORMs can also provide additional functionality such as connection pooling, lazy loading, caching, and transaction management.

Advantages of an ORM:

  • database agnostic - designed to work with any type of database without needing to change your code.
  • maintainability - ORMs typically promote cleaner, more maintainable code.

Disadvantages of an ORM:

  • amount of code - an ORM will require writing & maintaining more code than raw SQL,
  • performance - ORMs can be less performant for complex queries.

Use an ORM when maintaining a large database - if you have a complex database schema, an ORM can provide a clean way to express the database tables & relationships.

Here is an example of how to use SQLAlchemy with our cities dataset:

python
# import objects from sqlalchemy
from sqlalchemy import create_engine, Column, String, Float
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

# create a new engine with our sqlite database file
# if the database does not exist, it will be created
# the engine is the core interface to the database
engine = create_engine('sqlite:///cities.db')

# create a base class for declarative class definitions
# all mapped classes inherit from this Base class  
Base = declarative_base()

# define a new class City that inherits from Base
# this corresponds to a table in the database
# each attribute of the class is a column
class City(Base):
    __tablename__ = 'cities'
    city = Column(String, primary_key=True)
    population = Column(Float)
    hemisphere = Column(String)

# create all the tables we have defined as classes
Base.metadata.create_all(engine)

# sessionmaker is a factory that constructs new session objects
Session = sessionmaker(bind=engine)

# create a Session instance
session = Session()

# insert data into the table by creating new City objects
# this data is pending - it will be inserted into the database on transaction commit
for city, population, hemisphere in zip(
    data['city'],
    data['population'],
    data['hemisphere']
):
    session.add(City(city=city, population=population, hemisphere=hemisphere))

# commit the transaction to the database
# this writes the data to the database
session.commit()

# query the database to get all cities using the `City` class
# this is equivalent to "SELECT * FROM cities" in raw SQL
cities = session.query(City).all()

# our query returns a list of `City` class instances
for city in cities:
    print(city.city, city.population, city.hemisphere)

# close the session
session.close()

Summary

Thanks for reading!

Here are the different approaches for using a SQL database, with their advantages and disadvantages:

GUI ProgramsCommand Line ToolsRaw SQLORMs
Ease of UseHighMediumLowMedium
DiscoverabilityHighLowLowMedium
AutomationLowHighHighHigh
Database VersatilityHighLowHighHigh

Key takeaways from this post are:

  • data professionals use a variety of tools to interact with a SQL database - different tools are strong in different tasks,
  • raw SQL is useful as you can use it with all database tools,
  • a database GUI program can be used with any database,
  • a CLI offers an automatable way to interact with a database,
  • raw SQL or an ORM can be used to interact with a database using code,
  • ORMs offer a database agnostic way to use a database with objects.

Use each tool in the following scenarios:

  • debugging - use a database GUI or CLI tool,
  • automating - use a CLI in a shell script or code,
  • a few automated queries - use raw SQL in code,
  • a large database application - use an ORM in code.

This content is a sample from the Introduction to Databases course.


Thanks for reading!

If you enjoyed this blog post, make sure to check out our free 77 data science lessons across 22 courses.