Full Text Search using tsvector with Neon Postgres

A step-by-step guide describing how to implement full text search with tsvector in Postgres

The tsvector type enables you to use full text search on your text content in Postgres. Full text search allows you to search text content in a more flexible way than using LIKE. Full text search also supports features like stemming, which means searching for the word "run" will match variations like "ran" and "running".

Steps

Set up a table with a tsvector column
Execute your first full text search
Search for multiple words
Rank the results
Create a GIN index

Set up a table with a `tsvector` column

To set up full text search, you need to create a column of type tsvector that will enable full text search. You can run the following CREATE TABLE statement in the Neon SQL Editor or from a client such as psql that is connected to Neon. This statement will create a table with a column searchable of type tsvector.

CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  title TEXT,
  body TEXT,
  searchable tsvector
);

Next, insert two new rows into the documents table. The [to_tsvector()] (https://www.postgresql.org/docs/current/textsearch-controls.html) function takes in a language and the text content to tokenize. In the following example, the text content is the title and body columns concatenated together.

INSERT INTO documents (title, body, searchable)
  VALUES (
    'PostgreSQL Full-Text Search',
    'This is an introduction to full-text search in PostgreSQL.',
    to_tsvector('english', 'PostgreSQL Full-Text Search This is an introduction to full-text search in PostgreSQL.')
  );

INSERT INTO documents (title, body, searchable)
  VALUES (
    'My Mashed Potatoes Recipe',
    'These amazing homemade mashed potatoes are perfectly rich and creamy, full of great flavor, easy to make, and always a crowd fave!',
    to_tsvector('english', 'My Mashed Potatoes Recipe These amazing homemade mashed potatoes are perfectly rich and creamy, full of great flavor, easy to make, and always a crowd fave!')
  );

Once you have inserted the new rows, try running SELECT * FROM documents;. You will see that the data stored in the searchable column looks like the following.

'full':3,12 'full-text':2,11 'introduct':9 'postgresql':1,16 'search':5,14 'text':4,13

Internally, to_tsvector() uses a parser to break the text content into tokens for easier searching.

Execute your first full text search

You can execute a full text search query using a WHERE clause with @@ to_tsquery('english', 'content here') as shown below. The following query returns the "mashed potatoes" row because, although the word "flavorful" does not appear in that row's text content, the word "flavor" does. And "flavor" matches "flavorful".

SELECT
    *
  FROM documents
  WHERE searchable @@ to_tsquery('english', 'flavorful');

Similarly, the following query returns the "PostgeSQL" row, even though the word "searching" does not appear in that row's text content, but "search" does.

SELECT
    *
  FROM documents
  WHERE searchable @@ to_tsquery('english', 'searching');

The @@ operator is a special operator which compares the tsvector value stored in the searchable column with the tsquery value provided in the query. The tsquery type is different from the tsvector type. For example, if you run the following command, Postgres will return the string "searching".

SELECT to_tsquery('english', 'searching');

Search for multiple words

If you try using to_tsquery() with multiple words, like to_tsquery('english', 'searching text');, Postgres will throw the following error.

ERROR: syntax error in tsquery: "searching text" (SQLSTATE 42601)

That's because the input to to_tsquery() must be tokens separated by tsquery operators like &. The correct way to search for "searching" and "text" would be to_tsquery('english', 'searching & text');. To make full text search easier to work with, Postgres also has a phraseto_tsquery() function that converts text into a tsquery with no need for operators. The following query will successfully return the "PostgreSQL" row.

SELECT
    *
  FROM documents
  WHERE searchable @@ phraseto_tsquery('english', 'searching text');

tsquery also supports negations. For example, the following query will search for rows whose text content matches "searching" and does not match "text".

SELECT
    *
  FROM documents
  WHERE searchable @@ to_tsquery('english', 'searching & !text');

Postgres also supports a websearch_to_tsquery() function, which uses an alternative syntax that doesn't require putting operators between all tokens. websearch_to_tsquery() supports negations by prefixing a token with -. The following query also searches for rows whose text content matches "searching" and does not match "text".

SELECT
    *
  FROM documents
  WHERE searchable @@ websearch_to_tsquery('english', 'searching -text');

Rank the results

Postgres provides two functions for ranking the results, allowing you to sort by which results are the best match. The following statement sorts rows that match "searching text" using the ts_rank() function, which counts the number of tokens that match.

SELECT
    id,
    title,
    ts_rank(searchable, websearch_to_tsquery('english', 'searching text')) AS rank
  FROM documents
  WHERE searchable @@ websearch_to_tsquery('english', 'searching text')
  ORDER BY rank DESC;

To see how sorting works in practice, insert two more rows as follows. The first row contains 6 tokens that match "search" and "text", so that row should show up first.

INSERT INTO documents (title, body, searchable)
  VALUES (
    'PostgreSQL Text Search',
    'A comprehensive, searchable guide in plain text format. Covers full text search in PostgreSQL',
    to_tsvector('english', 'PostgreSQL Text Search A comprehensive, searchable guide in plain text format. Covers full text search in PostgreSQL')
  );

Running the ts_rank() SELECT statement with these two new rows outputs rows in the following order. "PostgreSQL Text Search" appears first because it has the most occurrences of tokens that match "search" and "text".

2	PostgreSQL Text Search	0.34941113
1	PostgreSQL Full-Text Search	0.3054688

Postgres also has a ts_rank_cd() function which uses an alternative ranking algorithm based on cover density. ts_rank_cd() also takes proximity of matching tokens into consideration, so the "PostgreSQL Text Search" row will rank slightly lower with ts_rank_cd() because there's more words between the matching tokens.

SELECT
    id,
    title,
    ts_rank_cd(searchable, websearch_to_tsquery('english', 'searching text')) AS rank
  FROM documents
  WHERE searchable @@ websearch_to_tsquery('english', 'searching text')
  ORDER BY rank DESC;

1	PostgreSQL Full-Text Search	0.21666667
2	PostgreSQL Text Search	0.21428572

Create a GIN index

GIN indexes allow you to index your tsvector properties, which can make your full text search queries faster as your data grows. Just be careful, GIN indexes can slow down your updates. Below is how you can create a GIN index on the searchable column.

CREATE INDEX searchable_idx ON documents USING GIN(searchable);

To test out the GIN index, let's first insert 100 copies of the "mashed potatoes" document. Sometimes Postgres decides to skip using indexes and use a sequential scan instead when a query matches most of the table.

DO $$
BEGIN
  FOR i IN 1..100 LOOP
    INSERT INTO documents (title, body, searchable)
    VALUES (
      'My Mashed Potatoes Recipe',
      'These amazing homemade mashed potatoes are perfectly rich and creamy, full of great flavor, easy to make, and always a crowd fave!',
      to_tsvector('english', 'My Mashed Potatoes Recipe These amazing homemade mashed potatoes are perfectly rich and creamy, full of great flavor, easy to make, and always a crowd fave!')
    );
  END LOOP;
END $$;

Next, you can run an EXPLAIN ANALYZE query (or just click the Explain button in the Neon SQL Editor) to confirm that Postgres is using your GIN index.

EXPLAIN ANALYZE
SELECT
    id, title
  FROM documents
WHERE searchable @@ to_tsquery('english', 'search');

The EXPLAIN ANALYZE query should produce output that resembles the following. The Bitmap Index Scan on searchable_idx means that Postgres is using a GIN index rather than a sequential scan to answer the query.

Bitmap Heap Scan on documents  (cost=8.54..13.10 rows=2 width=29) (actual time=0.021..0.022 rows=2 loops=1)
  Recheck Cond: (searchable @@ '''search'''::tsquery)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on searchable_idx  (cost=0.00..8.54 rows=2 width=0) (actual time=0.009..0.009 rows=2 loops=1)
        Index Cond: (searchable @@ '''search'''::tsquery)
Planning Time: 0.095 ms
Execution Time: 0.105 ms