REGEXP_SUBSTR

Returns the occurence of a regex match.

Syntax

REGEXP_SUBSTR( string_expr, string_test_expr [ , start_index [ , occurence ] ] ) --> INT

Arguments

Parameter	Description
`string_expr`	String to test
`string_test_expr`	Test pattern
`start_index`	The character index offset to start counting from. Defaults to 1
`occurence`	Which occurence to search for. Defaults to 1
`return_position`	Setes the position within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. Defaults to 0

Test patterns

Pattern	Description
`^`	Match the beginning of a string
`$`	Match the end of a string
`.`	Match any character (including whitespace such as carriage return and newline)
`*`	Match the preceding pattern zero or more times
`+`	Match the preceding pattern at least once
`?`	Match the preceding pattern once at most
`de\|abc`	Match either `de` or `abc`
`(abc)*`	Match zero or more instances of the sequence `abc`
`{2}`	Match the preceding pattern exactly two times
`{2,4}`	Match the preceding pattern between two and four times
`[a-dX]`, `[^a-dX]`	Matches any character that is (or is not when negated with `^`) either `a`, `b`, `c`, `d`, or `X`. The `-` character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal `]` character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

Returns

Returns the same type as the argument supplied, or Integer start position of a regex match, or an empty string ('') if no match was found.

Notes

The test pattern must be literal string. Column references or complex expressions are currently unsupported.
If the value is NULL, the result is NULL.

Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" text(40),
   "Team" text(40),
   "Number" tinyint,
   "Position" text(2),
   "Age" tinyint,
   "Height" text(4),
   "Weight" real,
   "College" text(40),
   "Salary" float
 );

Here’s a peek at the table contents (Download nba.csv):

nba.csv
Name	Team	Number	Position	Age	Height	Weight	College	Salary
Avery Bradley	Boston Celtics	0.0	PG	25.0	6-2	180.0	Texas	7730337.0
Jae Crowder	Boston Celtics	99.0	SF	25.0	6-6	235.0	Marquette	6796117.0
John Holland	Boston Celtics	30.0	SG	27.0	6-5	205.0	Boston University
R.J. Hunter	Boston Celtics	28.0	SG	22.0	6-5	185.0	Georgia State	1148640.0
Jonas Jerebko	Boston Celtics	8.0	PF	29.0	6-10	231.0		5000000.0
Amir Johnson	Boston Celtics	90.0	PF	29.0	6-9	240.0		12000000.0
Jordan Mickey	Boston Celtics	55.0	PF	21.0	6-8	235.0	LSU	1170960.0
Kelly Olynyk	Boston Celtics	41.0	C	25.0	7-0	238.0	Gonzaga	2165160.0
Terry Rozier	Boston Celtics	12.0	PG	22.0	6-2	190.0	Louisville	1824360.0

Find players with ‘o’ in their name

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+') FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
James Young        | Young
Thaddeus Young     | Young
Nick Young         | Young
Metta World Peace  | World
Christian Wood     | Wood
Justise Winslow    | Winslow
Wilson Chandler    | Wilson
C.J. Wilcox        | Wilcox
Shayne Whittington | Whittington
Russell Westbrook  | Westbrook

Using the `return_position` argument

Get the last name (or middle name) for players with ‘o’ in their first and last name. We set start_index to 1 (the default)

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+', 1, 2) FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
Joe Young          | Young
Tony Wroten        | Wroten
Noah Vonleh        | Vonleh
Karl-Anthony Towns | Towns
Anthony Tolliver   | Tolliver
Hollis Thompson    | Thompson
Jason Thompson     | Thompson
Donald Sloan       | Sloan
Jonathon Simmons   | Simmons
Ramon Sessions     | Sessions