REGEXP_SUBSTR
Returns the occurence of a regex match.
See also: REGEXP_INSTR, REGEXP_COUNT.
Syntax
REGEXP_SUBSTR( string_expr, string_test_expr [ , start_index [ , occurence ] ] ) --> INT
Arguments
Parameter |
Description |
---|---|
|
String to test |
|
Test pattern |
|
The character index offset to start counting from. Defaults to 1 |
|
Which occurence to search for. Defaults to 1 |
|
Setes the position within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. Defaults to 0 |
Test patterns
Pattern |
Description |
---|---|
|
Match the beginning of a string |
|
Match the end of a string |
|
Match any character (including whitespace such as carriage return and newline) |
|
Match the preceding pattern zero or more times |
|
Match the preceding pattern at least once |
|
Match the preceding pattern once at most |
|
Match either |
|
Match zero or more instances of the sequence |
|
Match the preceding pattern exactly two times |
|
Match the preceding pattern between two and four times |
|
Matches any character that is (or is not when negated with |
Returns
Returns the same type as the argument supplied, or Integer start position of a regex match, or an empty string (''
) if no match was found.
Notes
The test pattern must be literal string. Column references or complex expressions are currently unsupported.
If the value is NULL, the result is NULL.
Examples
For these examples, assume a table named nba
, with the following structure:
CREATE TABLE nba
(
"Name" text(40),
"Team" text(40),
"Number" tinyint,
"Position" text(2),
"Age" tinyint,
"Height" text(4),
"Weight" real,
"College" text(40),
"Salary" float
);
Here’s a peek at the table contents (Download nba.csv
):
Name |
Team |
Number |
Position |
Age |
Height |
Weight |
College |
Salary |
---|---|---|---|---|---|---|---|---|
Avery Bradley |
Boston Celtics |
0.0 |
PG |
25.0 |
6-2 |
180.0 |
Texas |
7730337.0 |
Jae Crowder |
Boston Celtics |
99.0 |
SF |
25.0 |
6-6 |
235.0 |
Marquette |
6796117.0 |
John Holland |
Boston Celtics |
30.0 |
SG |
27.0 |
6-5 |
205.0 |
Boston University |
|
R.J. Hunter |
Boston Celtics |
28.0 |
SG |
22.0 |
6-5 |
185.0 |
Georgia State |
1148640.0 |
Jonas Jerebko |
Boston Celtics |
8.0 |
PF |
29.0 |
6-10 |
231.0 |
5000000.0 |
|
Amir Johnson |
Boston Celtics |
90.0 |
PF |
29.0 |
6-9 |
240.0 |
12000000.0 |
|
Jordan Mickey |
Boston Celtics |
55.0 |
PF |
21.0 |
6-8 |
235.0 |
LSU |
1170960.0 |
Kelly Olynyk |
Boston Celtics |
41.0 |
C |
25.0 |
7-0 |
238.0 |
Gonzaga |
2165160.0 |
Terry Rozier |
Boston Celtics |
12.0 |
PG |
22.0 |
6-2 |
190.0 |
Louisville |
1824360.0 |
Find players with ‘o’ in their name
nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+') FROM nba ORDER BY 2 DESC LIMIT 10;
Name | regexp_substr
-------------------+--------------
James Young | Young
Thaddeus Young | Young
Nick Young | Young
Metta World Peace | World
Christian Wood | Wood
Justise Winslow | Winslow
Wilson Chandler | Wilson
C.J. Wilcox | Wilcox
Shayne Whittington | Whittington
Russell Westbrook | Westbrook
Using the return_position
argument
Get the last name (or middle name) for players with ‘o’ in their first and last name.
We set start_index
to 1
(the default)
nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+', 1, 2) FROM nba ORDER BY 2 DESC LIMIT 10;
Name | regexp_substr
-------------------+--------------
Joe Young | Young
Tony Wroten | Wroten
Noah Vonleh | Vonleh
Karl-Anthony Towns | Towns
Anthony Tolliver | Tolliver
Hollis Thompson | Thompson
Jason Thompson | Thompson
Donald Sloan | Sloan
Jonathon Simmons | Simmons
Ramon Sessions | Sessions