REGEXP_SUBSTR

Returns the occurence of a regex match.

See also: REGEXP_INSTR, REGEXP_COUNT.

Syntax

REGEXP_SUBSTR( string_expr, string_test_expr [ , start_index [ , occurence ] ] ) --> INT

Arguments

Parameter

Description

string_expr

String to test

string_test_expr

Test pattern

start_index

The character index offset to start counting from. Defaults to 1

occurence

Which occurence to search for. Defaults to 1

return_position

Setes the position within the string to return. Using 0, the function returns the string position of the first character of the substring that matches the pattern. Defaults to 0

Test patterns

Pattern

Description

^

Match the beginning of a string

$

Match the end of a string

.

Match any character (including whitespace such as carriage return and newline)

*

Match the preceding pattern zero or more times

+

Match the preceding pattern at least once

?

Match the preceding pattern once at most

de|abc

Match either de or abc

(abc)*

Match zero or more instances of the sequence abc

{2}

Match the preceding pattern exactly two times

{2,4}

Match the preceding pattern between two and four times

[a-dX], [^a-dX]

Matches any character that is (or is not when negated with ^) either a, b, c, d, or X. The - character between two other characters forms a range that matches all characters from the first character to the second. For example, [0-9] matches any decimal digit. To include a literal ] character, it must immediately follow the opening bracket [. To include a literal - character, it must be written first or last. Any character that does not have a defined special meaning inside a [] pair matches only itself.

Returns

Returns the same type as the argument supplied, or Integer start position of a regex match, or an empty string ('') if no match was found.

Notes

  • The test pattern must be literal string. Column references or complex expressions are currently unsupported.

  • If the value is NULL, the result is NULL.

Examples

For these examples, assume a table named nba, with the following structure:

CREATE TABLE nba
(
   "Name" text(40),
   "Team" text(40),
   "Number" tinyint,
   "Position" text(2),
   "Age" tinyint,
   "Height" text(4),
   "Weight" real,
   "College" text(40),
   "Salary" float
 );

Here’s a peek at the table contents (Download nba.csv):

nba.csv

Name

Team

Number

Position

Age

Height

Weight

College

Salary

Avery Bradley

Boston Celtics

0.0

PG

25.0

6-2

180.0

Texas

7730337.0

Jae Crowder

Boston Celtics

99.0

SF

25.0

6-6

235.0

Marquette

6796117.0

John Holland

Boston Celtics

30.0

SG

27.0

6-5

205.0

Boston University

R.J. Hunter

Boston Celtics

28.0

SG

22.0

6-5

185.0

Georgia State

1148640.0

Jonas Jerebko

Boston Celtics

8.0

PF

29.0

6-10

231.0

5000000.0

Amir Johnson

Boston Celtics

90.0

PF

29.0

6-9

240.0

12000000.0

Jordan Mickey

Boston Celtics

55.0

PF

21.0

6-8

235.0

LSU

1170960.0

Kelly Olynyk

Boston Celtics

41.0

C

25.0

7-0

238.0

Gonzaga

2165160.0

Terry Rozier

Boston Celtics

12.0

PG

22.0

6-2

190.0

Louisville

1824360.0

Find players with ‘o’ in their name

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+') FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
James Young        | Young
Thaddeus Young     | Young
Nick Young         | Young
Metta World Peace  | World
Christian Wood     | Wood
Justise Winslow    | Winslow
Wilson Chandler    | Wilson
C.J. Wilcox        | Wilcox
Shayne Whittington | Whittington
Russell Westbrook  | Westbrook

Using the return_position argument

Get the last name (or middle name) for players with ‘o’ in their first and last name. We set start_index to 1 (the default)

nba=> SELECT "Name", REGEXP_SUBSTR("Name", '([a-zA-Z]+o[a-zA-Z]+)+', 1, 2) FROM nba ORDER BY 2 DESC LIMIT 10;
Name               | regexp_substr
-------------------+--------------
Joe Young          | Young
Tony Wroten        | Wroten
Noah Vonleh        | Vonleh
Karl-Anthony Towns | Towns
Anthony Tolliver   | Tolliver
Hollis Thompson    | Thompson
Jason Thompson     | Thompson
Donald Sloan       | Sloan
Jonathon Simmons   | Simmons
Ramon Sessions     | Sessions