Advanced PostgreSQL with Vapor

Learn to use advanced PostgreSQL functionalities, joining tables, views, indexes and full-text search in your Vapor server app. By Mahdi Bahrami.

Login to leave a rating/review
Download materials
Save for later
Share

Learn to use advanced PostgreSQL functionalities, joining tables, views, indexes and full-text search in your Vapor server app.

The majority of applications today have some data they need to store in a database. Relational SQL databases like PostgreSQL are the most popular form of databases. They come with valuable features and commands that make using and managing data much easier.

In this tutorial, you’ll work on a sample project called Recruiter, which handles relations between a person and their company. You’ll learn how to use:

  • Joins to join tables together.
  • Views to simplify using multiple tables together and more.
  • Full-text search to search text documents.
  • Indexes for a significant performance boost when using operators.
Note: You’ll need the following for this project:

Getting Started

Download the starter project by clicking the Download Materials button at the top or bottom of this tutorial.

Open the starter project. You’ll see a variety of files and folders:

Image showing Recruiter's file and folders in Xcode's navigator

Open CreateCompany or CreatePerson. You’ll see two migrations in each file: one for creating the table and one for seeding the table with some initial values. The seeding uses two JSON files available under the Resources folder. These files contain the information of 50 companies and 100,000 people. These initial values will help demonstrate text-search and the usefulness of indexes.

Make sure Docker is running, then open the Terminal and copy-paste the following to start a new PostgreSQL database named recruiter:

docker run --name recruiter -e POSTGRES_DB=vapor_database \
  -e POSTGRES_USER=vapor_username -e POSTGRES_PASSWORD=vapor_password \
  -p 5432:5432 -d postgres

Don’t forget to set the custom working directory to the project’s directory. If you’re unfamiliar with how to do that, look at this section of Vapor’s official documentation.

Build and run Recruiter using the shortcut Command-R or the top-left Play button. The first run might take up to a minute because it’s adding information on 100,000 people to the person table by the SeedPerson migration.

You’ll eventually see a NOTICE indicating Recruiter’s successful run on address http://127.0.0.1:8080:

Image showing Xcode running Recruiter, with the console opened and a notice indicating successful run of Recruiter on address http://127.0.0.1:8080

Joining Tables

Open MainController.swift and look at getAllEmployees(_:):

func getAllEmployees(_ req: Request) async throws -> [Person] {
  try await Person
    .query(on: req.db)
    .filter(\.$company.$id != nil)
    .with(\.$company)
    .all()
}

This function finds all Persons that have a company and returns them.

Try it to get a better feel for how employed Persons appear. Build and run Recruiter, and open your preferred REST client. Send a GET request to the URL http://localhost:8080/employees/all, which will trigger the getAllEmployees(_:) function. You’ll receive a lot of people:

Postman window showing an HTTP GET request to the URL mentioned above, with a lot of people in response

In the code, two lines can be optimized:

.filter(\.$company.$id != nil)
.with(\.$company)

In these two lines, you filter out people who don’t have a company, by eager-loading each Person‘s Company. Remember that in this function, you only want people who have a company and are already employed.

with(_:) eager-loads other Models that have a relation with the Model you’re querying. Although this is convenient, it comes with a disadvantage. with(_:) queries all Persons, then does another query to retrieve their companies. That means it uses two queries instead of one, which is sub-optimal.

Using Joins

SQL Joins have a similar purpose to with(_:): joining tables related to each other but using only one query. Although it has with(_:), Fluent supports using Joins, too. Replace getAllEmployees(_:) with the following implementation, which uses Joins:

func getAllEmployees(_ req: Request) async throws -> [Person] {
  // 1
  let employees = try await Person
    .query(on: req.db)
     // 2
    .join(parent: \.$company, method: .inner)
    .all()

  // 3
  for employee in employees {
    employee.$company.value = try employee.joined(Company.self)
  }

  // 4
  return employees
}

In the code above, you:

  1. Load all employees.
  2. Use join(parent: \.$company, method: .inner) to join each Person with their Company.
  3. Iterate over each employee and set the company’s value using joined(_:). You need to set the company’s value using $company.value because Fluent doesn’t allow direct mutation of the value of a relation.
  4. Return the employees.

Note that using any of the join() functions, unlike with(_:), does not automatically set the related field’s value. Here, this means the company is loaded in Person‘s storage and can be retrieved using joined(_:), but a Person‘s company still needs to be manually populated.

But what is method: .inner doing in .join(parent: \.$company, method: .inner)?

You might have noticed that .filter(\.$company.$id != nil) has been removed from the new function, which uses joins. That’s exactly what method: .inner does.

Different Types of Joins

To improve your understanding, it’s time to go through the four Join methods PostgreSQL supports:

  • Inner: Joins two tables only if rows for both the primary table and joined table are available.
    In your case, this will load all the people with their company only if the person has a company. This is exactly why you don’t need .filter(\.$company.$id != nil) anymore: Inner join handles that for you.
  • Left: Includes all primary table rows and adds the joined table rows if available.

    If you were to use left join, it would mean you’d have all the Persons, regardless of whether they have a company. If the Person had a company, the result would contain the company; if the Person didn’t have a company, they would still be in the result.

  • Right: This method is the opposite of the left method. It includes all the joined table rows and adds the primary ones if available.
    In this context, it means you would query all the Companys alongside the Persons who have a Company. However, this won’t work out nicely because you’re using Fluent. You only get a Person object with the query in getAllEmployees(_:), and you won’t have access to any Company that doesn’t have a related Person.
  • Full: The full join includes all rows of both the primary and the joined table. If two rows in each table are related, PostgreSQL returns them together as a single row. If rows don’t have a corresponding row in the other table, they will still be in the result, just alone.
Different types of joins in SQL

The different types of joins in SQL

Right now, you only need to retrieve employees in one function. But employees are an important part of your app, and you’ll soon need them in many more places. Needing to write the same query repeatedly just to get all the employees will become a bit redundant. In the next section, you’ll learn how to use PostgreSQL’s Views to simplify frequently used queries in a SQL-native way.

Using Views for Frequently Accessed Queries

Views are simple yet powerful. Using Views, you can write a complex query and name it so you can call it by name the next time you need it.

In your case, you have two tables called person and company, which will be joined to form employees. This makes a good candidate for a view. You can make a new view named employee and set it to only return persons who are already employed. You can also exclude a person‘s hobby because it’s irrelevant to a person‘s employment status and you won’t need it.

For that, you need to make a normal model for employee, then instruct PostgreSQL with your new employee view’s properties.

Making an Employee Model

Make a new file called Employee in Models and replace its contents with the following:

import Fluent
import Vapor

final class Employee: Model, Content {
  
  static let schema = "employee"
  
  @ID
  var id: UUID?
  
  @Field(key: FieldKeys.firstName)
  var firstName: String
  
  @Field(key: FieldKeys.lastName)
  var lastName: String
  
  @Field(key: FieldKeys.job)
  var job: String

  @Field(key: FieldKeys.email)
  var email: String
  
  @Field(key: FieldKeys.companyName)
  var companyName: String
  
  @Field(key: FieldKeys.companyLocation)
  var companyLocation: String
  
  init() { }
}

extension Employee {
  enum FieldKeys {
    static let firstName: FieldKey = "first_name"
    static let lastName: FieldKey = "last_name"
    static let job: FieldKey = "job"
    static let email: FieldKey = "email"
    static let companyName: FieldKey = "company_name"
    static let companyLocation: FieldKey = "company_location"
  }
}

It seems pretty standard, doesn’t it? You’re treating employee as if it’s just a new table.

Creating Views Using SQLKit

Now, make a file called CreateEmployee in your Migrations folder. You need a new migration to tell PostgreSQL how does an Employee look like:

import Fluent
import SQLKit

struct CreateEmployee: AsyncMigration {
    
  private typealias CompanyKeys = Company.FieldKeys
  private typealias PersonKeys = Person.FieldKeys
  private typealias EmployeeKeys = Employee.FieldKeys
    
  func prepare(on database: Database) async throws {
    // 1
    let sqlDatabase = database as! SQLDatabase

    // 2
    let select = sqlDatabase
      .select()
      .column(SQLColumn(FieldKey.id.description, table: Person.schema))
      .column(
         SQLColumn(PersonKeys.firstName.description, table: Person.schema))
      .column(
         SQLColumn(PersonKeys.lastName.description, table: Person.schema))
      .column(SQLColumn(PersonKeys.job.description, table: Person.schema))
      .column(SQLColumn(PersonKeys.email.description, table: Person.schema))
      .column(
        SQLColumn(CompanyKeys.name.description, table: Company.schema),
        as: EmployeeKeys.companyName.description
      )
      .column(
        SQLColumn(CompanyKeys.location.description, table: Company.schema),
        as: EmployeeKeys.companyLocation.description
      )
      .column(
        SQLColumn(PersonKeys.companyId.description, table: Person.schema))
      .from(Person.schema)
      .join(
        SQLIdentifier(Company.schema),
        method: SQLJoinMethod.inner,
        on: SQLBinaryExpression(
          left: SQLColumn(
            PersonKeys.companyId.description, table: Person.schema),
          op: SQLBinaryOperator.equal,
          right: SQLColumn(FieldKey.id.description, table: Company.schema)
        )
      )
    
    // 3
    try await sqlDatabase.raw("""
    CREATE VIEW "\(raw: Employee.schema)" AS
    \(select.query)
    """)
    // 4
    .run()
  }
  
  func revert(on database: Database) async throws {
    let sqlDatabase = database as! SQLDatabase
    try await sqlDatabase.raw("DROP VIEW \(raw: Employee.schema)").run()
  }
}

Look at prepare(on:). This function seems slightly different from the usual Fluent code you write. That’s because it’s not Fluent! To write a migration for a new view, you need to drop down to SQLKit. SQLKit is what Fluent uses under the hood. It provides lower-level tools to communicate with your SQL database.

Go slowly through prepare(on:) to see how SQLKit works:

  1. First things first, you see let sqlDatabase = database as! SQLDatabase. Every database in Vapor is an abstraction over a lower-level database object, which handles the actual communication. Because you know you’re using PostgreSQL, you can safely force-cast your database to a SQLDatabase, because that’s the protocol all Vapor’s PostgreSQL databases conform to.
  2. Creating a view requires a Select query, which will be used to query a new View. Here, you make your Select query to select rows from the person table, then specify the columns you need, optionally give the columns a new name and, at the end, perform an inner join of each person with their company.

    Using SQLKit is basically writing raw SQL queries but with Swift syntax! This is possible thanks to SQLExpressions, which know how to turn their own values from what you write to something acceptable by your SQL database. All types used in a SQLKit query conform to SQLExpression, including SQLColumn, SQLIdentifier and SQLJoinMethod.

  3. SQLKit doesn’t yet fully support making views, so you need to use a bit of raw SQL. The syntax for making any views is as CREATE VIEW [view-name] AS [select-query], and this is what’s happening right here.

    Raw queries are not simple Strings. They use a SQLQueryString type, which is expressible by String. This enables interpolating different value types to a raw query. As you can see, you use \(raw: Employee.schema) to interpolate a plain String to the query, but you don’t need any raw labels when interpolating a SQLExpression like \(select.query). You can also bind values using \(bind:) for protection against injection attacks, but that’s not handy for this query.

  4. At the end, you execute the query to make a new view.

What happens in revert(on:) is much simpler. It’s just removing the view using the DROP VIEW command, in case you don’t want the employee view anymore.

Remember to add CreateEmployee to the list of migrations in your configure file:

app.migrations.add([
  CreateCompany(),
  SeedCompany(),
  CreatePerson(),
  SeedPerson(),
  CreateEmployee(),
])

Now, go back to getAllEmployees(_:) in MainController. You have some changes to make.

Replace getAllEmployees(_:) with the following code:

func getAllEmployees(_ req: Request) async throws -> [Employee] {
  try await Employee
    .query(on: req.db)
    .all()
}

This returns the same information as the previous code but uses your new Employee. That makes it simpler and nicer to reuse.

Using Materialized Views for Frequently Accessed Data

Materialized views are another useful feature of PostgreSQL.

The view you created in the last section is merely a saved query in the database, which will be executed when it’s called by its name. Materialized views, on the other hand, are different. They save the result of the Select query into another table upon creation and will only access that dedicated table when you query the materialized view. This has the advantage of having the data ready for faster queries while consuming disk space and needing to be updated manually or using another mechanism.

In summary, materialized views have three differences from normal views. For materialized views, you need to use:

  • CREATE MATERIALIZED VIEW to create.
  • DROP MATERIALIZED VIEW to delete.
  • REFRESH MATERIALIZED VIEW to refresh.

Now, you have the employees under control. But there are around 70,000 different employees and 30,000 unemployed people, and you need to be able to filter them on demand.

Full-Text Search

PostgreSQL comes with a few operators for pattern matching in text. The simplest of them is =. You can also use LIKE and ILIKE for more pattern-matching flexibility, or use SIMILAR, ~~ ~~* with regular expressions.

Although these are some helpful operators, they can result in unwanted matches when dealing with documents.

Imagine you want to find all sentences in a column in your database containing the word Knife. You could get away with filtering sentences using filters such as ILIKE '%knife%', which will return any columns that case-insensitively contain knife anywhere in them.

That wouldn’t be too bad! But what if you have this: “Remember to bring a few knives for cutting meats”?

This sentence has knives in it, which is the plural form of knife. Your ILIKE '%knife%' will fail to find this sentence even though it contains the plural form of knife.

Now, this is bad! You can try to use pattern matching using regular expressions for finding Knives alongside Knife. But that will be not only more complicated than you’d hope for but also slower.

Trying Full-Text Search

The good news is that PostgreSQL can do text search while understanding natural language.

It’s simple to start using PostgreSQL’s full text search. You only need to use its dedicated operator @@ instead of the previous operators.

Open MainController and look at getPeopleByHobby(_:). Right now, you have this:

func getPeopleByHobby(_ req: Request) async throws -> [Person] {
  let hobby = try req.query.get(String.self, at: "hobby")
  return try await Person
    .query(on: req.db)
    .filter(\.$hobby == hobby)
    .all()
}

Fluent doesn’t natively support @@, so you’ll need to provide a custom operator.

Change the filter part to use @@ like so:

.filter(\.$hobby, .custom("@@"), hobby)

Now, your filter will use PostgreSQL’s Full Text Search. Build and run Recruiter, and open your preferred REST client. Send a GET request to the URL http://localhost:8080/people/hobby?hobby=car to see if any people like hanging around cars. You’ll receive some people:

Image showing an HTTP request to http://localhost:8080/people/hobby?hobby=car returning an array of people

Look at hobby for a few people. Because you have 100,000 people, you’ll notice many people have the same interests around cars: Some mention Car spotting as their hobby, others Car riding and so on.

It’s important to note that although you have people with hobbies like Houseplant care or Wood carving, PostgreSQL never matched those for your car search because care and carving have nothing to do with a car.

Using Full-Text Search Operators

PostgreSQL also supports performing a full-text search for multiple words simultaneously.

For that, you need to turn your text input to tsquery using PostgreSQL’s to_tsquery() function. tsquery is the type that PostgreSQL uses for all Full Text Searches. The reason you didn’t need to_tsquery() before was that PostgreSQL automatically counted your text input as a single tsquery value. That won’t work when you want to use a more complex tsquery with different operators, so you need to use to_tsquery().

Change your filter part of the code to explicitly tell PostgreSQL your input is a tsquery. You’ll need to provide a custom filter to use the to_tsquery function:

.filter(.custom("\(Person.FieldKeys.hobby) @@ to_tsquery('\(hobby)')"))

Now, you can use different tsquery operators. The most important ones are | for OR, & for AND and ! for NOT.

Build and run Recruiter, and try the previous URL but with car & !ride hobby filter to find all hobbies that have car in them but no words related to ride. Be sure to use the URL-encoded form of &, which is %26. The new URL will be http://localhost:8080/people/hobby?hobby=car %26 !ride:

Image showing an HTTP request to http://localhost:8080/people/hobby?hobby=car %26 !ride, returning an array of people

You’ll notice you can no longer find the Car riding hobby in any of the people, because it includes riding.

This isn’t all that PostgreSQL’s full-text search offers. Full-text search has many more handy features. It not only supports many languages other than English but also can rank results or highlight parts of them. Learning about those other features will have to wait for another time, though. :]

The only problem with any text search is that it can become very slow. For now, you only have 100,000 people, and your personal computer won’t have any problems searching through them. But imagine having millions of people and your PostgreSQL server having limited resources. Your users expect an answer in a second or two, while your text-search query alone can take tens of seconds.

That’s when indexes shine!

Indexing Columns

Indexes are like the index page of a book. They keep track of what is where in a table, so you can find what you want faster.

They can drastically improve your query times at the expense of more workload beforehand, a trade-off that’s likely worth it for big tables.

Running Queries in a Database Client

Open your database client and connect to your PostgreSQL database using the password you set when creating the database, which was vapor_password. It looks like this in Postico:

Image showing Postico's connection window populated with recruiter PostgreSQL database info

After connection, you’ll see an overview of your database. Choose SQL Query so you can run a few raw SQL queries to test your current query speed:

Image showing Postico's overview window with table and view names in it and a 'SQL Query' button around the top

Now, try this raw SQL query statement. This will use normal pattern matching to find people with first names case-insensitively containing jul anywhere in them. Then, press Execute Statement or use the shortcut Command-Return:

SELECT * FROM person WHERE first_name ILIKE '%jul%';

You’ll see many people with first names like Jules, Julious, Julianna etc…:

Postico query window with the query above written and executed in it

Explaining SQL Queries

PostgreSQL comes with a handy command to explain and analyze the performance of any query. Simply add EXPLAIN ANALYZE to the previous Select command and execute:

EXPLAIN ANALYZE SELECT * FROM person WHERE first_name ILIKE '%jul%';

Postico query window with the query above written and executed in it

This explains what happens when you execute the Select command. The only important part for you is that this query takes about 55ms for PostgreSQL to execute.

This is acceptable for now, but your database is growing bigger every day. And with a few million people in it, your queries can start to take tens of seconds, which degrades your app’s user experience.

The solution is to use indexes. PostgreSQL supports a wide variety of indexes, each with a different purpose.

For searching through texts, you can use a Gin index.

Creating Indexes

Go back to Recruiter. Make a file named CreateIndexes in Migrations. Add this migration to it:

import Fluent
import SQLKit

struct CreatePG_TRGMExtension: AsyncMigration {
  
  func prepare(on database: Database) async throws {
    let sqlDatabase = (database as! SQLDatabase)
    try await sqlDatabase.raw("CREATE EXTENSION pg_trgm").run()
  }
  
  func revert(on database: Database) async throws {
    let sqlDatabase = (database as! SQLDatabase)
    try await sqlDatabase.raw("DROP EXTENSION pg_trgm").run()
  }
}

This enables PostgreSQL’s pg_trgm module, which is disabled by default. You’ll need this module to instruct PostgreSQL on how to make the a Gin index over the columns you want.

Add another migration to this file, this time to create a Gin index over the first_name and last_name columns:

struct CreatePersonFirstNameAndLastNameIndex: AsyncMigration {
  
  private typealias FieldKeys = Person.FieldKeys
  
  func prepare(on database: Database) async throws {
    let sqlDatabase = (database as! SQLDatabase)
    try await sqlDatabase.raw("""
    CREATE INDEX person_first_name_last_name_idx
    ON person
    USING GIN
    (\(raw: FieldKeys.firstName.description) gin_trgm_ops,
    \(raw: FieldKeys.lastName.description) gin_trgm_ops)
    """).run()
  }
  
  func revert(on database: Database) async throws {
    let sqlDatabase = (database as! SQLDatabase)
    try await sqlDatabase
      .raw("DROP INDEX person_first_name_last_name_idx")
      .run()
  }
}

In prepare(on:), you ask PostgreSQL to create a new index:

  • Named person_first_name_last_name_idx.
  • On the person table.
  • Using the Gin index type.
  • Over the columns first_name and last_name and using the gin_trgm_ops operator class. Gin indexes support specifying more than one column, so here you create a multi-column index on two columns instead of creating two indexes.

gin_trgm_ops is part of the pg_trgm module, which is why you needed to enable pg_trgm in the last migration.

Note: Operator classes such as gin_trgm_ops are out of scope for this tutorial. You can learn more about them in PostgreSQL’s official documentation.

Don’t forget to add your new migrations to your configure.swift:

app.migrations.add([
    CreateCompany(),
    SeedCompany(),
    CreatePerson(),
    SeedPerson(),
    CreateEmployee(),
    CreatePG_TRGMExtension(),
    CreatePersonFirstNameAndLastNameIndex(),
])

Build and run Recruiter to create the index. Gin indexes are helpful and flexible, but they’re also quite heavy, so creating and keeping them updated is costly. Luckily, you have only 100,000 people, so the index creation shouldn’t take more than a second!

Using Indexes in SQL Queries

Now, open Postico and rerun the Explain statement you did last time:

EXPLAIN ANALYZE SELECT * FROM person WHERE first_name ILIKE '%jul%';

Postico query window with the query above written and executed in it

PostgreSQL automatically uses the index you made earlier. That’s because PostgreSQL’s query planner can identify that the first_name column has an index available and that it’ll be helpful for your specific query.

You can see the execution time has decreased from 55ms, when you had no index over the first_name column, to 2.3ms when there’s a useful index available. That’s more than 20 times faster than before, and this execution time difference will only increase with more people in the person table.

Note that the execution times might differ on your device, but they will remain close to what you see here.

Where to Go From Here?

You can download the sample project by clicking Download Materials at the top or bottom of this tutorial.

To learn more about SQLKit, you can read SQLKit’s readme.

If you want to learn more about PostgreSQL’s great features, see the official PostgreSQL documentation.

We hope you enjoyed this tutorial. If you have any questions or comments, please join the forum discussion below.