Postgres regexp_match() function
Extract substrings matching a regular expression pattern
The Postgres regexp_match()
function is used to extract substrings that match a regular expression pattern from a given string. It returns an array of matching substrings, including capture groups if specified in the pattern.
This function is particularly useful for complex string parsing tasks, such as extracting structured information from semi-structured text data. For example, it can be used to parse log files, extract specific components from URLs, or analyze text data for specific patterns.
Function signature
The regexp_match()
function has the following form:
string
: The input string to search for matches.pattern
: A POSIX regular expression pattern to match against the string.flags
(optional): A string of one or more single-letter flags that modify how the regular expression is interpreted.
The function returns an array of text values, where each element corresponds to a substring within the first match of the pattern in the input string. If there are no matches, the function returns NULL. If there are no capture groups in the pattern, the array contains a single element with the full match.
Example usage
Consider a table log_entries
with a log_text
column containing log messages. We can use regexp_match()
to extract specific information from these logs.
This query extracts the timestamp, log level, and message from each log entry. The regular expression pattern \[(.*?)\] (\w+): (.*)$
captures three groups:
- The timestamp between square brackets
- The log level (INFO, ERROR, etc.), which is alphabetical and terminated with a colon
- The rest of the message
Advanced examples
regexp_match()
with regex flags
Use The regexp_match()
function accepts optional flags to modify how the regular expression is interpreted. Here's an example using the 'i' flag for case-insensitive matching:
This query extracts the browser name and version from user agent strings, using case-insensitive matching.
regexp_match()
in a WHERE clause
Use You can use regexp_match()
in a WHERE clause to filter rows based on a regex pattern:
This query selects all rows from the emails
table where the email address ends with .com
, ``.org, or
.io`.
Additional considerations
Performance implications
Using regexp_match()
can be computationally expensive, especially on large datasets or with complex patterns. For better performance:
- Use simpler patterns when possible.
- Consider using
LIKE
orSIMILAR TO
for simple pattern matching. - If you frequently filter based on regex patterns, consider creating a functional index using the
regexp_match()
expression.
NULL handling
regexp_match()
returns NULL if there's no match or if the input string is NULL. This behavior can be useful in WHERE
clauses but may require careful handling in SELECT
lists.
Alternative functions
regexp_matches()
: Returns a set of all matches, useful for extracting multiple occurrences of the pattern in the input string.regexp_replace()
: Replaces substrings matching a regex pattern within a specified string.regexp_split_to_array()
: Splits a string using a regex pattern as the delimiter and returns the result as an array.substring()
: Extracts substrings based on a regex pattern similar toregexp_match()
, but only returns the first captured group of the match.
Resources
- PostgreSQL documentation: Pattern Matching
- PostgreSQL documentation: Regular Expression Details
- Regular Expression Tester: A useful tool for testing and debugging regular expressions