REGEXP_SUBSTR

Returns the occurence of a regex match.

See also: REGEXP_INSTR, REGEXP_COUNT.

Syntax

REGEXP_SUBSTR( string_expr, string_test_expr [ , start_index [ , occurence ] ] ) --> INT

Arguments

Parameter Description
string_expr String to test
string_test_expr Test pattern
start_index The character index offset to start counting from. Defaults to 1
occurence Which occurence to search for. Defaults to 1
return_position Setes the position within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. Defaults to 0

Test patterns

Pattern Description
^ Match the beginning of a string
$ Match the end of a string
. Match any character (including whitespace such as carriage return and newline)
* Match the preceding pattern zero or more times
+ Match the preceding pattern at least once
? Match the preceding pattern once at most
de|abc Match either de or abc
(abc)* Match zero or more instances of the sequence abc
{2} Match the preceding pattern exactly two times
{2,4} Match the preceding pattern between two and four times
[a-dX], [^a-dX] Matches any character that is (or is not when negated with ^) either a, b, c, d, or X. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

Returns

Returns the same type as the argument supplied, or Integer start position of a regex match, or an empty string ('') if no match was found.

Notes

  • The test pattern must be literal string. Column references or complex expressions are currently unsupported.
  • If the value is NULL, the result is NULL.

Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" varchar(40),
   "Team" varchar(40),
   "Number" tinyint,
   "Position" varchar(2),
   "Age" tinyint,
   "Height" varchar(4),
   "Weight" real,
   "College" varchar(40),
   "Salary" float
 );

Here’s a peek at the table contents (Download nba.csv):

nba.csv
Name Team Number Position Age Height Weight College Salary
Avery Bradley Boston Celtics 0.0 PG 25.0 6-2 180.0 Texas 7730337.0
Jae Crowder Boston Celtics 99.0 SF 25.0 6-6 235.0 Marquette 6796117.0
John Holland Boston Celtics 30.0 SG 27.0 6-5 205.0 Boston University  
R.J. Hunter Boston Celtics 28.0 SG 22.0 6-5 185.0 Georgia State 1148640.0
Jonas Jerebko Boston Celtics 8.0 PF 29.0 6-10 231.0   5000000.0
Amir Johnson Boston Celtics 90.0 PF 29.0 6-9 240.0   12000000.0
Jordan Mickey Boston Celtics 55.0 PF 21.0 6-8 235.0 LSU 1170960.0
Kelly Olynyk Boston Celtics 41.0 C 25.0 7-0 238.0 Gonzaga 2165160.0
Terry Rozier Boston Celtics 12.0 PG 22.0 6-2 190.0 Louisville 1824360.0

Find players with ‘o’ in their name

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+') FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
James Young        | Young
Thaddeus Young     | Young
Nick Young         | Young
Metta World Peace  | World
Christian Wood     | Wood
Justise Winslow    | Winslow
Wilson Chandler    | Wilson
C.J. Wilcox        | Wilcox
Shayne Whittington | Whittington
Russell Westbrook  | Westbrook

Using the return_position argument

Get the last name (or middle name) for players with ‘o’ in their first and last name. We set start_index to 1 (the default)

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+', 1, 2) FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
Joe Young          | Young
Tony Wroten        | Wroten
Noah Vonleh        | Vonleh
Karl-Anthony Towns | Towns
Anthony Tolliver   | Tolliver
Hollis Thompson    | Thompson
Jason Thompson     | Thompson
Donald Sloan       | Sloan
Jonathon Simmons   | Simmons
Ramon Sessions     | Sessions