REGEXP_SUBSTR

Returns the occurrence of a regex match.

Syntax

REGEXP_SUBSTR( string_expr, string_test_expr [ , start_index [ , occurrence ] ] ) --> TEXT

Arguments

Parameter	Description
`string_expr`	String to test
`string_test_expr`	Test pattern
`start_index`	The character index offset to start counting from. Defaults to 1
`occurrence`	Which occurrence to search for. Defaults to 1
`return_position`	Sets the position within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. Defaults to 0

Supported RegEx Patterns

Pattern	Description
`^`	Match the beginning of a string
`[^]`	Characters that do not match the specified string
`$`	Match the end of a string
`.`	Match any character (including white-space such as carriage return and newline)
`*`	Match the preceding pattern zero or more times
`+`	Match the preceding pattern at least once
`?`	Match the preceding pattern once at most (`0` or `1` time)
`(abc)*`	Match zero or more instances of the sequence `abc`, treating the expression within the parentheses as a single unit
`{m}`	Match the preceding pattern exactly `m` times
`{m,n}`	Match the preceding pattern at least `m` times but no more than `n` times
`[...]`	Match any sing character from the list within the parentheses
`\|`	`OR` clause
`\`	Treating the subsequent characters in the expression as ordinary characters rather than meta-characters
`\n`	Matching the nth (`1`-`9`) preceding sub-expression grouped within parentheses
`*?`	Occurs zero or more times
`+?`	Occurs one or more times
`??`	Occurs zero or one times

Returns

Returns the same type as the argument supplied, or Integer start position of a regex match, or an empty string ('') if no match was found.

Notes

The test pattern must be literal string. Column references or complex expressions are currently unsupported.
If the value is NULL, the result is NULL.

Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" text(40),
   "Team" text(40),
   "Number" tinyint,
   "Position" text(2),
   "Age" tinyint,
   "Height" text(4),
   "Weight" real,
   "College" text(40),
   "Salary" float
 );

Here’s a peek at the table contents (Download nba.csv):

nba.csv
Name	Team	Number	Position	Age	Height	Weight	College	Salary
Avery Bradley	Boston Celtics	0.0	PG	25.0	6-2	180.0	Texas	7730337.0
Jae Crowder	Boston Celtics	99.0	SF	25.0	6-6	235.0	Marquette	6796117.0
John Holland	Boston Celtics	30.0	SG	27.0	6-5	205.0	Boston University
R.J. Hunter	Boston Celtics	28.0	SG	22.0	6-5	185.0	Georgia State	1148640.0
Jonas Jerebko	Boston Celtics	8.0	PF	29.0	6-10	231.0		5000000.0
Amir Johnson	Boston Celtics	90.0	PF	29.0	6-9	240.0		12000000.0
Jordan Mickey	Boston Celtics	55.0	PF	21.0	6-8	235.0	LSU	1170960.0
Kelly Olynyk	Boston Celtics	41.0	C	25.0	7-0	238.0	Gonzaga	2165160.0
Terry Rozier	Boston Celtics	12.0	PG	22.0	6-2	190.0	Louisville	1824360.0

Find players with ‘o’ in their name

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+') FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
James Young        | Young
Thaddeus Young     | Young
Nick Young         | Young
Metta World Peace  | World
Christian Wood     | Wood
Justise Winslow    | Winslow
Wilson Chandler    | Wilson
C.J. Wilcox        | Wilcox
Shayne Whittington | Whittington
Russell Westbrook  | Westbrook

Using the `return_position` argument

Get the last name (or middle name) for players with ‘o’ in their first and last name. We set start_index to 1 (the default)

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+', 1, 2) FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
Joe Young          | Young
Tony Wroten        | Wroten
Noah Vonleh        | Vonleh
Karl-Anthony Towns | Towns
Anthony Tolliver   | Tolliver
Hollis Thompson    | Thompson
Jason Thompson     | Thompson
Donald Sloan       | Sloan
Jonathon Simmons   | Simmons
Ramon Sessions     | Sessions