PostgreSQL SELECT DISTINCT

Summary: in this tutorial, you will learn how to use the PostgreSQL SELECT DISTINCT clause to remove duplicate rows from a result set returned by a query.

Introduction to PostgreSQL SELECT DISTINCT clause

The SELECT DISTINCT removes duplicate rows from a result set. The SELECT DISTINCT clause retains one row for each group of duplicates.

The SELECT DISTINCT clause can be applied to one or more columns in the select list of the SELECT statement.

The following illustrates the syntax of the DISTINCT clause:

SELECT
  DISTINCT column1
FROM
  table_name;

In this syntax, the SELECT DISTINCT uses the values in the column1 column to evaluate the duplicate.

If you specify multiple columns, the SELECT DISTINCT clause will evaluate the duplicate based on the combination of values in these columns. For example:

SELECT
   DISTINCT column1, column2
FROM
   table_name;

In this syntax, the SELECT DISTINCT uses the combination of values in both column1 and column2 columns for evaluating the duplicate.

Note that PostgreSQL also offers the DISTINCT ON clause that retains the first unique entry of a column or combination of columns in the result set.

If you want to find distinct values of all columns in a table, you can use SELECT DISTINCT *:

SELECT DISTINCT *
FROM table_name;

The star or asterisk (*) means all columns of the table_name.

PostgreSQL SELECT DISTINCT examples

Let’s create a new table to practice the SELECT DISTINCT clause.

Note that you will learn how to create a table and insert data into it in the subsequent tutorial. In this tutorial, you need to execute the statement in psql or pgAdmin to execute the statements.

First, create the colors table that has three columns: id, bcolor and fcolor using the following CREATE TABLE statement:

CREATE TABLE colors(
  id SERIAL PRIMARY KEY,
  bcolor VARCHAR,
  fcolor VARCHAR
);

Second, insert some rows into the colors table:

INSERT INTO
  colors (bcolor, fcolor)
VALUES
  ('red', 'red'),
  ('red', 'red'),
  ('red', NULL),
  (NULL, 'red'),
  (NULL, NULL),
  ('green', 'green'),
  ('blue', 'blue'),
  ('blue', 'blue');

Third, retrieve the data from the colors table using the SELECT statement:

SELECT
  id,
  bcolor,
  fcolor
FROM
  colors;

Output:

id | bcolor | fcolor
----+--------+--------
  1 | red    | red
  2 | red    | red
  3 | red    | null
  4 | null   | red
  5 | null   | null
  6 | green  | green
  7 | blue   | blue
  8 | blue   | blue
(8 rows)

1) PostgreSQL SELECT DISTINCT one column example

The following statement selects unique values from the bcolor column of the t1 table and sorts the result set in alphabetical order by using the ORDER BY clause.

SELECT
  DISTINCT bcolor
FROM
  colors
ORDER BY
  bcolor;

Output:

bcolor
--------
 blue
 green
 red
 null
(4 rows)

The bcolor column has 3 red values, two NULL, 1 green value, and two blue values. The DISTINCT removes two read values, 1 NULL, and one blue.

Note that PostgreSQL treats NULLs as duplicates so that it keeps one NULL for all NULLs when you apply the SELECT DISTINCT clause.

2) SELECT DISTINCT on multiple columns

The following statement applies the SELECT DISTINCT clause to both bcolor and fcolor columns:

SELECT
  DISTINCT bcolor, fcolor
FROM
  colors
ORDER BY
  bcolor,
  fcolor;

Output:

bcolor | fcolor
--------+--------
 blue   | blue
 green  | green
 red    | red
 red    | null
 null   | red
 null   | null
(6 rows)

In this example, the query uses the values from both bcolor and fcolor columns to evaluate the uniqueness of rows.

3) Using the SELECT DISTINCT clause in practice

In practice, you often use the SELECT DISTINCT clause to analyze the uniqueness of values in a column.

For example, you may want to know how many rental rates for films from the film table:

PostgreSQL SELECT DISTINCT - sample table To achieve this, you can specify the rental_rate column in the SELECT DISTINCT clause as follows:

SELECT DISTINCT
  rental_rate
FROM
  film
ORDER BY
  rental_rate;

Output:

rental_rate
-------------
        0.99
        2.99
        4.99
(3 rows)

The output indicates that there are only three distinct rental rates 0.99, 2.99, and 4.99.

Summary

Use the SELECT DISTINCT to remove duplicate rows from a result set of a query.