POSIX Operators
Pattern-matching conditions
Description
A POSIX regular expression is a sequence of characters that specifies a match pattern. A string matches a regular expression if it is a member of the regular set described by the regular expression. POSIX regular expression patterns can match any portion of a string. (Redshift SQL Language reference POSIX Operators).
This grammar is partially supported in Snowflake. POSIX Operators are transformed to REGEXP_COUNT in Snowflake.
Grammar Syntax
expression [ ! ] ~ pattern
POSIX pattern-matching metacharacters
POSIX pattern matching supports the following metacharacters (all the cases are supported in Snowflake):
.
Matches any single character.
*
Matches zero or more occurrences.
+
Matches one or more occurrences.
?
Matches zero or one occurrence.
|
Specifies alternative matches.
^
Matches the beginning-of-line character.
$
Matches the end-of-line character.
$
Matches the end of the string.
[ ]
Brackets specify a matching list, that should match one expression in the list.
( )
Parentheses group items into a single logical item.
{m}
Repeat the previous item exactly m times.
{m,}
Repeat the previous item m or more times.
{m,n}
Repeat the previous item at least m and not more than n times.
[: :]
Matches any character within a POSIX character class. In the following character classes, Amazon Redshift supports only ASCII characters, just like Snowflake: [:alnum:]
, [:alpha:]
, [:lower:]
, [:upper:]
The parameters 'm' (enables multiline mode) and 's' (allows the POSIX wildcard character .
to match new lines) are used to achieve full equivalence in Snowflake. For more information please refer to Specifying the parameters for the regular expression in Snowflake.
Sample Source Patterns
Setup data
CREATE TABLE posix_test_table (
id INT,
column_name VARCHAR(255)
);
INSERT INTO posix_test_table (id, column_name)
VALUES
(1, 'abc123\nhello world'),
(2, 'test string\nwith multiple lines\nin this entry'),
(3, '123abc\nanother line\nabc123'),
(4, 'line1\nline2\nline3'),
(5, 'start\nmiddle\nend'),
(6, 'a@b#c!\nmore text here'),
(7, 'alpha\nbeta\ngamma'),
(8, 'uppercase\nlowercase'),
(9, 'line1\nline2\nline3\nline4'),
(10, '1234567890\nmore digits'),
(11, 'abc123\nabc456\nabc789'),
(12, 'start\nend\nmiddle'),
(13, 'this is the first line\nthis is the second line'),
(14, 'special characters\n!@#$%^&*()');
. : Matches any character
Input Code:
SELECT id, column_name
FROM posix_test_table
WHERE column_name ~ 'a.c';
Output Code:
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, 'a.c', 1, 'ms') > 0;
* : Matches zero or more occurrences.
Input Code:
SELECT id, column_name
FROM posix_test_table
WHERE column_name ~ 'a*b';
Output Code:
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, 'a*b', 1, 'ms') > 0;
? : Matches zero or one occurrence
Input Code:
SELECT id, column_name
FROM posix_test_table
WHERE column_name !~ 'a?b';
Output Code:
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, 'a?b', 1, 'ms') = 0;
^ : Matches the beginning-of-line character
Input Code:
SELECT id, column_name
FROM posix_test_table
WHERE column_name ~ '^abc';
Output Code:
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, '^abc', 1, 'ms') > 0;
$ : Matches the end of the string.
Input Code:
SELECT id, column_name
FROM posix_test_table
WHERE column_name !~ '123$';
Output Code:
SELECT id, column_name
FROM posix_test_table
WHERE REGEXP_COUNT(column_name, '123$', 1, 'ms') = 0;
Usage of collate columns
Arguments with COLLATE specifications are not currently supported in the RLIKE function. As a result, the COLLATE clause must be disabled to use this function. However, this may lead to differences in the results.
Input Code:
CREATE TABLE collateTable (
col1 VARCHAR(20) COLLATE CASE_INSENSITIVE,
col2 VARCHAR(30) COLLATE CASE_SENSITIVE);
INSERT INTO collateTable values ('HELLO WORLD!', 'HELLO WORLD!');
SELECT
col1 ~ 'Hello.*' as ci,
col2 ~ 'Hello.*' as cs
FROM collateTable;
Output Code:
CREATE TABLE collateTable (
col1 VARCHAR(20) COLLATE 'en-ci',
col2 VARCHAR(30) COLLATE 'en-cs'
)
COMMENT = '{ "origin": "sf_sc", "name": "snowconvert", "version": { "major": 0, "minor": 0, "patch": "0" }, "attributes": { "component": "redshift", "convertedOn": "01/16/2025", "domain": "test" }}';
INSERT INTO collateTable
values ('HELLO WORLD!', 'HELLO WORLD!');
SELECT
REGEXP_COUNT(COLLATE(
--** SSC-FDM-PG0011 - THE USE OF THE COLLATE COLUMN CONSTRAINT HAS BEEN DISABLED FOR THIS PATTERN-MATCHING CONDITION. **
col1, ''), 'Hello.*', 1, 'ms') > 0 as ci,
REGEXP_COUNT(COLLATE(
--** SSC-FDM-PG0011 - THE USE OF THE COLLATE COLUMN CONSTRAINT HAS BEEN DISABLED FOR THIS PATTERN-MATCHING CONDITION. **
col2, ''), 'Hello.*', 1, 'ms') > 0 as cs
FROM
collateTable;
If you require equivalence for these scenarios, you can manually add the following parameters to the function to achieve functional equivalence:
Parameter
Description
c
Case-sensitive matching
i
Case-insensitive matching
Known Issues
Known Issues
The behavior of fixed char types may differ. Click here for more information.
Arguments with COLLATE specifications are not currently supported in the REGEXP_COUNT function.
Related EWIs
SSC-FDM-PG0011: The use of the COLLATE column constraint has been disabled for this pattern-matching condition.
Last updated