POSIX Operators

Pattern-matching conditions

Description

A POSIX regular expression is a sequence of characters that specifies a match pattern. A string matches a regular expression if it is a member of the regular set described by the regular expression. POSIX regular expression patterns can match any portion of a string. (Redshift SQL Language reference POSIX Operators).

This grammar is partially supported in Snowflake. POSIX Operators are transformed to REGEXP_COUNT in Snowflake.

Grammar Syntax

expression [ ! ] ~ pattern

POSIX pattern-matching metacharacters

POSIX pattern matching supports the following metacharacters (all the cases are supported in Snowflake):

POSIX
Description

.

Matches any single character.

*

Matches zero or more occurrences.

+

Matches one or more occurrences.

?

Matches zero or one occurrence.

|

Specifies alternative matches.

^

Matches the beginning-of-line character.

$

Matches the end-of-line character.

$

Matches the end of the string.

[ ]

Brackets specify a matching list, that should match one expression in the list.

( )

Parentheses group items into a single logical item.

{m}

Repeat the previous item exactly m times.

{m,}

Repeat the previous item m or more times.

{m,n}

Repeat the previous item at least m and not more than n times.

[: :]

Matches any character within a POSIX character class. In the following character classes, Amazon Redshift supports only ASCII characters, just like Snowflake: [:alnum:], [:alpha:], [:lower:], [:upper:]

The parameters 'm' (enables multiline mode) and 's' (allows the POSIX wildcard character . to match new lines) are used to achieve full equivalence in Snowflake. For more information please refer to Specifying the parameters for the regular expression in Snowflake.

Sample Source Patterns

Setup data

CREATE TABLE posix_test_table (
    id INT,
    column_name VARCHAR(255)
);

INSERT INTO posix_test_table (id, column_name)
VALUES
    (1, 'abc123\nhello world'),
    (2, 'test string\nwith multiple lines\nin this entry'),
    (3, '123abc\nanother line\nabc123'),
    (4, 'line1\nline2\nline3'),
    (5, 'start\nmiddle\nend'),
    (6, 'a@b#c!\nmore text here'),
    (7, 'alpha\nbeta\ngamma'),
    (8, 'uppercase\nlowercase'),
    (9, 'line1\nline2\nline3\nline4'),
    (10, '1234567890\nmore digits'),
    (11, 'abc123\nabc456\nabc789'),
    (12, 'start\nend\nmiddle'),
    (13, 'this is the first line\nthis is the second line'),
    (14, 'special characters\n!@#$%^&*()');

. : Matches any character

Input Code:

IN -> Redshift_01.sql
SELECT id, column_name
FROM posix_test_table
WHERE column_name ~ 'a.c';

Output Code:

OUT -> Redshift_01.sql
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, 'a.c', 1, 'ms') > 0;

* : Matches zero or more occurrences.

Input Code:

IN -> Redshift_02.sql
SELECT id, column_name
FROM posix_test_table
WHERE column_name ~ 'a*b';

Output Code:

OUT -> Redshift_02.sql
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, 'a*b', 1, 'ms') > 0;

? : Matches zero or one occurrence

Input Code:

IN -> Redshift_03.sql
SELECT id, column_name
FROM posix_test_table
WHERE column_name !~ 'a?b';

Output Code:

OUT -> Redshift_03.sql
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, 'a?b', 1, 'ms') = 0;

^ : Matches the beginning-of-line character

Input Code:

IN -> Redshift_04.sql
SELECT id, column_name
FROM posix_test_table
WHERE column_name ~ '^abc';

Output Code:

OUT -> Redshift_04.sql
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, '^abc', 1, 'ms') > 0;

$ : Matches the end of the string.

Input Code:

IN -> Redshift_05.sql
SELECT id, column_name
FROM posix_test_table
WHERE column_name !~ '123$';

Output Code:

OUT -> Redshift_05.sql
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, '123$', 1, 'ms') = 0;

Usage of collate columns

Arguments with COLLATE specifications are not currently supported in the RLIKE function. As a result, the COLLATE clause must be disabled to use this function. However, this may lead to differences in the results.

Input Code:

IN -> Redshift_06.sql
CREATE TABLE collateTable (
col1 VARCHAR(20) COLLATE CASE_INSENSITIVE,
col2 VARCHAR(30) COLLATE CASE_SENSITIVE);

INSERT INTO collateTable values ('HELLO WORLD!', 'HELLO WORLD!');

SELECT
col1 ~ 'Hello.*' as ci,
col2 ~ 'Hello.*' as cs
FROM collateTable;

Output Code:

OUT -> Redshift_12.sql
CREATE TABLE collateTable (
col1 VARCHAR(20) COLLATE 'en-ci',
col2 VARCHAR(30) COLLATE 'en-cs'
)
COMMENT = '{ "origin": "sf_sc", "name": "snowconvert", "version": {  "major": 0,  "minor": 0,  "patch": "0" }, "attributes": {  "component": "redshift",  "convertedOn": "01/16/2025",  "domain": "test" }}';

INSERT INTO collateTable
values ('HELLO WORLD!', 'HELLO WORLD!');

SELECT
REGEXP_COUNT(COLLATE(
--** SSC-FDM-PG0011 - THE USE OF THE COLLATE COLUMN CONSTRAINT HAS BEEN DISABLED FOR THIS PATTERN-MATCHING CONDITION. **
col1, ''), 'Hello.*', 1, 'ms') > 0 as ci,
REGEXP_COUNT(COLLATE(
--** SSC-FDM-PG0011 - THE USE OF THE COLLATE COLUMN CONSTRAINT HAS BEEN DISABLED FOR THIS PATTERN-MATCHING CONDITION. **
col2, ''), 'Hello.*', 1, 'ms') > 0 as cs
FROM
collateTable;

If you require equivalence for these scenarios, you can manually add the following parameters to the function to achieve functional equivalence:

Parameter

Description

c

Case-sensitive matching

i

Case-insensitive matching

Known Issues

Known Issues

  1. The behavior of fixed char types may differ. Click here for more information.

  2. Arguments with COLLATE specifications are not currently supported in the REGEXP_COUNT function.

  • SSC-FDM-PG0011: The use of the COLLATE column constraint has been disabled for this pattern-matching condition.

Last updated