REGEXP_SUBSTR¶

Returns the occurence of a regex match.

Syntax¶

REGEXP_SUBSTR( string_expr, string_test_expr [ , start_index [ , occurence ] ] ) --> INT

Arguments¶

Parameter	Description
`string_expr`	String to test
`string_test_expr`	Test pattern
`start_index`	The character index offset to start counting from. Defaults to 1
`occurence`	Which occurence to search for. Defaults to 1
`return_position`	Setes the position within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. Defaults to 0

Test patterns¶

Pattern	Description
`^`	Match the beginning of a string
`$`	Match the end of a string
`.`	Match any character (including whitespace such as carriage return and newline)
`*`	Match the preceding pattern zero or more times
`+`	Match the preceding pattern at least once
`?`	Match the preceding pattern once at most
`de\|abc`	Match either `de` or `abc`
`(abc)*`	Match zero or more instances of the sequence `abc`
`{2}`	Match the preceding pattern exactly two times
`{2,4}`	Match the preceding pattern between two and four times
`[a-dX]`, `[^a-dX]`	Matches any character that is (or is not when negated with `^`) either `a`, `b`, `c`, `d`, or `X`. The `-` character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal `]` character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

Returns¶

Returns the same type as the argument supplied, or Integer start position of a regex match, or an empty string ('') if no match was found.

Notes¶

The test pattern must be literal string. Column references or complex expressions are currently unsupported.
If the value is NULL, the result is NULL.

Examples¶

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" TEXT,
   "Team" TEXT,
   "Number" TINYINT,
   "Position" TEXT,
   "Age" TINYINT,
   "Height" TEXT,
   "Weight" REAL,
   "College" TEXT,
   "Salary" FLOAT
 );

Here’s a peek at the table contents (Download nba.csv):

nba.csv¶
Name	Team	Number	Position	Age	Height	Weight	College	Salary
Avery Bradley	Boston Celtics	0.0	PG	25.0	6-2	180.0	Texas	7730337.0
Jae Crowder	Boston Celtics	99.0	SF	25.0	6-6	235.0	Marquette	6796117.0
John Holland	Boston Celtics	30.0	SG	27.0	6-5	205.0	Boston University
R.J. Hunter	Boston Celtics	28.0	SG	22.0	6-5	185.0	Georgia State	1148640.0
Jonas Jerebko	Boston Celtics	8.0	PF	29.0	6-10	231.0		5000000.0
Amir Johnson	Boston Celtics	90.0	PF	29.0	6-9	240.0		12000000.0
Jordan Mickey	Boston Celtics	55.0	PF	21.0	6-8	235.0	LSU	1170960.0
Kelly Olynyk	Boston Celtics	41.0	C	25.0	7-0	238.0	Gonzaga	2165160.0
Terry Rozier	Boston Celtics	12.0	PG	22.0	6-2	190.0	Louisville	1824360.0

Find players with ‘o’ in their name¶

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+') FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
James Young        | Young
Thaddeus Young     | Young
Nick Young         | Young
Metta World Peace  | World
Christian Wood     | Wood
Justise Winslow    | Winslow
Wilson Chandler    | Wilson
C.J. Wilcox        | Wilcox
Shayne Whittington | Whittington
Russell Westbrook  | Westbrook

Using the `return_position` argument¶

Get the last name (or middle name) for players with ‘o’ in their first and last name. We set start_index to 1 (the default)

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+', 1, 2) FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
Joe Young          | Young
Tony Wroten        | Wroten
Noah Vonleh        | Vonleh
Karl-Anthony Towns | Towns
Anthony Tolliver   | Tolliver
Hollis Thompson    | Thompson
Jason Thompson     | Thompson
Donald Sloan       | Sloan
Jonathon Simmons   | Simmons
Ramon Sessions     | Sessions