PostgreSQL WIDTH_BUCKET() Function
Summary: in this tutorial, you will learn how to use the PostgreSQL WIDTH_BUCKET()
function for binning numeric data into discrete buckets.
Introduction to the PostgreSQL WIDTH_BUCKET() function
The WIDTH_BUCKET()
function allows you to categorize numeric values into discrete buckets based on specified boundaries.
Here’s the basic syntax of the WIDTH_BUCKET()
function:
In this syntax:
numeric_value
: The numeric value that you want to place into a bucket.lower_bound
: The lower bound of the range within which thenumeric_value
falls. It is inclusive.upper_bound
: The upper bound of the range within which thenumeric_value
falls. It is exclusive.num_buckets
: The total number of buckets to divide the range into.
The WIDTH_BUCKET()
function returns an integer value that represents the bucket number into which the specified value falls.
If numeric_value
is less than the lower bound, the WIDTH_BUCKET()
function returns 0.
If the numeric_value
is greater than or equal to the lower_bound
but less than the upper bound of the first bucket, the WIDTH_BUCKET()
function returns 1.
If the numeric value falls within the second bucket, the WIDTH_BUCKET()
function returns 2.
This pattern continues until the numeric value falls into the last bucket, where the WIDTH_BUCKET()
function returns the num_buckets
If the numeric_value
is less than the lower_bound
, the WIDTH_BUCKET()
function returns 0. If the numeric_value
is greater than the upper_bound
, the WIDTH_BUCKET()
function returns the num_buckets
plus 1.
In practice, you often use the WIDTH_BUCKET()
function for data binning tasks and histogram generation.
PostgreSQL WIDTH_BUCKET() function examples
Let’s explore some examples of using the WIDTH_BUCKET()
function.
1) Basic WIDTH_BUCKET() function example
The following example uses the WIDTH_BUCKET()
function to assign numeric values to three buckets (0, 10), (10, 20), and (20, 30):
Output:
In this example:
- The
WIDTH_BUCKET()
function returns 0 for the value -1 because -1 is less than the lower bound (0). - The
WIDTH_BUCKET()
function returns 1 for the value 0 because 0 falls in the range of the first bucket (0,10). - The
WIDTH_BUCKET()
function returns 2 for the value 12 because 12 falls in the second bucket (10, 20). - The
WIDTH_BUCKET()
function returns 3 for the value 25 because 25 falls in the third bucket (20, 30). - The
WIDTH_BUCKET()
functions return 3 for the value 35 because 35 is greater than the upper bound (30).
2) Using the WIDTH_BUCKET() function with table data
We’ll use the film
table from the sample database:
The following example uses the WIDTH_BUCKET()
function to categorize the films into six buckets:
Output:
3) Using WIDTH_BUCKET() function to generate histogram
The following example uses a common table expression (CTE) to generate the bucket numbers and then calculate the frequency of films falling into each bucket:
Output:
How it works.
CTE:
- First, define a common table expression named
buckets_cte
. - Second, use the
WIDTH_BUCKET()
function to assign each film’s length into one of six buckets.
Main query:
- First, select data from the
buckets_cte
. - Second, count the number of films for each bucket using the
COUNT(*)
function and group the result by thebucket_number
using theGROUP BY
clause. - Third, sort the result by the
bucket_number
.
4) Generate ASCII histograms
Based on the result set, you can generate a histogram in the application. But if you want to generate a histogram in psql, you can use the following query:
Output:
How it works.
The query defines two common table expressions and generates the histogram using the asterisk character (*).
buckets_cte
CTE
– Assign films to buckets using theWIDTH_BUCKET()
function based on their lengths.histogram_cte
CTE
– Select data from thebuckets_cte
by grouping the frequency of the bucket numbers.- The main query – Select data from the
histogram_cte
CTE
. We use theREPEAT
() function to generate the histogram bars by repeating the asterisk (*) character a number of times which equals the integer division of frequency by 10. This is to scale down the histogram to prevent it from being too wide. To perform integer divisions, we use theDIV()
function. Since theREPEAT
function accepts an integer only, we cast the result of theDIV()
to an integer using the cast operator (::).
Summary
- Use the PostgreSQL
WIDTH_BUCKET()
function for binning numeric data into discrete buckets or generating histograms.