SIMILAR TO

Pattern-matching conditions

Description

The SIMILAR TO operator matches a string expression, such as a column name, with a SQL standard regular expression pattern. A SQL regular expression pattern can include a set of pattern-matching metacharacters, including the two supported by the LIKE operator. (Redshift SQL Language reference SIMILAR TO).

This grammar is partially supported in Snowflake. SIMILAR TO is transformed to RLIKE in Snowflake.

Grammar Syntax

expression [ NOT ] SIMILAR TO pattern [ ESCAPE 'escape_char' ]

Pattern-matching metacharacters

Redshift
Snowflake
Notes
%
.*

Matches any sequence of zero or more characters. To achieve full equivalence in Snowflake, we need to replace the '%' operator with '.*' in the pattern.

_
.

Matches any single character. To achieve full equivalence in Snowflake, we need to replace the _ operator with . and add the s parameter to enable the POSIX wildcard character . to match newline characters.

|
|

Denotes alternation. This case is fully supported in Snowflake.

*
*

Repeat the previous item zero or more times. This can have a different behavior when newline characters are included.

+
+

Repeat the previous item one or more times. This can have a different behavior when newline characters are included.

?
?

Repeat the previous item zero or one time. This can have a different behavior when newline characters are included.

{m}
{m}

Repeat the previous item exactly m times and it is fully supported in Snowflake.

{m,}
{m,}

Repeat the previous item at least m and not more than n times and it is fully supported in Snowflake.

{m,n}
{m,n}

Repeat the previous item m or more times and it is fully supported in Snowflake.

()
()

Parentheses group items into a single logical item and it is fully supported in Snowflake.

[...]
[...]

A bracket expression specifies a character class, just as in POSIX regular expressions.

Sample Source Patterns

Setup data

CREATE TABLE similar_table_ex (
    column_name VARCHAR(255)
);

INSERT INTO similar_table_ex (column_name)
VALUES
    ('abc_123'),
    ('a_cdef'),
    ('bxyz'),
    ('abcc'),
    ('start_hello'),
    ('apple'),
    ('banana'),
    ('xyzabc'),
    ('abc\ncccc'),
    ('\nabccc'),
    ('abc%def'),
    ('abc_xyz'),
    ('abc_1_xyz'),
    ('applepie'),
    ('start%_abc'),
    ('ab%_xyz'),
    ('abcs_123_xyz'),
    ('aabc123'),
    ('xyzxyz'),
    ('123abc\nanother line\nabc123');

% : Matches any sequence of zero or more characters

Input Code:

IN -> Redshift_01.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO '%abc%';

Output Code:

OUT -> Redshift_01.sql
SELECT column_name
FROM similar_table_ex
WHERE RLIKE (column_name, '.*abc.*', 's');

_ : Matches any single character

Input Code:

IN -> Redshift_02.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO 'a_c%';

Output Code:

OUT -> Redshift_02.sql
SELECT column_name
FROM similar_table_ex
WHERE RLIKE (column_name, 'a.c.*', 's'); 

| : Denotes alternation

Input Code:

IN -> Redshift_03.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO 'a|b%';

Output Code:

OUT -> Redshift_03.sql
SELECT column_name
FROM similar_table_ex
WHERE RLIKE (column_name, 'a|b.*', 's');

{m, n} : Repeat the previous item exactly m times.

Input Code:

IN -> Redshift_04.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO 'abc{2,4}';

Output Code:

OUT -> Redshift_04.sql
SELECT column_name
FROM similar_table_ex
WHERE RLIKE (column_name, 'abc{2,4}', 's');

+ : Repeat the previous item one or more times

Input Code:

IN -> Redshift_05.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO 'abc+';

Output Code:

OUT -> Redshift_05.sql
SELECT column_name
FROM similar_table_ex
WHERE RLIKE (column_name, 'abc+', 's');

* : Repeat the previous item zero or more times

Input Code:

IN -> Redshift_06.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO 'abc*c';

Output Code:

OUT -> Redshift_06.sql
SELECT column_name
FROM similar_table_ex
WHERE RLIKE (column_name, 'abc*c', 's');

? : Repeat the previous item zero or one time

Input Code:

IN -> Redshift_07.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO 'abc?c';

Output Code:

OUT -> Redshift_07.sql
SELECT column_name
FROM
similar_table_ex
WHERE
RLIKE( column_name, 'abc?c', 's');

() : Parentheses group items into a single logical item

Input Code:

IN -> Redshift_08.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO '(abc|xyz)%';

Output Code:

OUT -> Redshift_08.sql
SELECT column_name
FROM similar_table_ex
WHERE RLIKE (column_name, '(abc|xyz).*', 's');

[...] : Specifies a character class

Input Code:

IN -> Redshift_09.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO '[a-c]%';

Output Code:

OUT -> Redshift_09.sql
SELECT column_name
FROM similar_table_ex
WHERE RLIKE (column_name, '[a-c].*', 's');

Escape characters

The following characters will be escaped if they appear in the pattern and are not the escape character itself:

  • .

  • $

  • ^

Input Code:

IN -> Redshift_10.sql
SELECT column_name
FROM similar_table_ex
WHERE column_name SIMILAR TO '%abc^_%' ESCAPE '^';

SELECT '$0.87' SIMILAR TO '$[0-9]+(.[0-9][0-9])?' r1;

Output Code:

OUT -> Redshift_10.sql
SELECT column_name
FROM
similar_table_ex
WHERE
RLIKE( column_name, '.*abc\\_.*', 's');

SELECT
RLIKE( '$0.87', '\\$[0-9]+(\\.[0-9][0-9])?', 's') r1;

Pattern stored in a variable

If these patterns are stored in a variable, the required adjustments for equivalence will not be applied. You can refer to the recommendations outlined in the table at the beginning of this document for additional equivalence guidelines.

Input Code:

IN -> Redshift_11.sql
WITH pattern AS (
    SELECT '%abc%'::VARCHAR AS search_pattern
)
SELECT column_name
FROM similar_table_ex, pattern
WHERE column_name SIMILAR TO pattern.search_pattern;

Output Code:

OUT -> Redshift_11.sql
WITH pattern AS (
    SELECT '%abc%'::VARCHAR AS search_pattern
)
SELECT column_name
FROM
similar_table_ex,
pattern
WHERE
RLIKE( column_name,
                    --** SSC-FDM-0032 - PARAMETER 'search_pattern' IS NOT A LITERAL VALUE, TRANSFORMATION COULD NOT BE FULLY APPLIED **
                    pattern.search_pattern, 's');

Usage of collate columns

Arguments with COLLATE specifications are not currently supported in the RLIKE function. As a result, the COLLATE clause must be disabled to use this function. However, this may lead to differences in the results.

Input Code:

IN -> Redshift_12.sql
CREATE TABLE collateTable (
col1 VARCHAR(20) COLLATE CASE_INSENSITIVE,
col2 VARCHAR(30) COLLATE CASE_SENSITIVE);

INSERT INTO collateTable values ('HELLO WORLD!', 'HELLO WORLD!');

SELECT
col1 SIMILAR TO 'Hello%' as ci,
col2 SIMILAR TO 'Hello%' as cs
FROM collateTable;

Output Code:

OUT -> Redshift_12.sql
CREATE TABLE collateTable (
col1 VARCHAR(20) COLLATE 'en-ci',
col2 VARCHAR(30) COLLATE 'en-cs'
)
COMMENT = '{ "origin": "sf_sc", "name": "snowconvert", "version": {  "major": 0,  "minor": 0,  "patch": "0" }, "attributes": {  "component": "redshift",  "convertedOn": "01/16/2025",  "domain": "test" }}';

INSERT INTO collateTable
values ('HELLO WORLD!', 'HELLO WORLD!');

SELECT
RLIKE(COLLATE(
--** SSC-FDM-PG0011 - THE USE OF THE COLLATE COLUMN CONSTRAINT HAS BEEN DISABLED FOR THIS PATTERN-MATCHING CONDITION. **
col1, ''), 'Hello.*', 's') as ci,
RLIKE(COLLATE(
--** SSC-FDM-PG0011 - THE USE OF THE COLLATE COLUMN CONSTRAINT HAS BEEN DISABLED FOR THIS PATTERN-MATCHING CONDITION. **
col2, ''), 'Hello.*', 's') as cs
FROM
collateTable;

If you require equivalence for these scenarios, you can manually add the following parameters to the function to achieve functional equivalence:

Parameter

Description

c

Case-sensitive matching

i

Case-insensitive matching

Known Issues

  1. The behavior of fixed char types may differ. Click here for more information.

  2. The RLIKE function uses POSIX extended regular expressions, which may result in different behavior in certain cases, especially when line breaks are involved. It appears that when line breaks are present in the string and a match occurs on one line, it returns a positive result for the entire string, even though the match only occurred on a single line and not across the whole string. For example:

IN -> Redshift_13.sql
CREATE TABLE table1 (
col1 VARCHAR(20)
);

INSERT INTO table1 values ('abcccc'), ('abc\neab'), ('abc\nccc');

SELECT col1
FROM table1
WHERE col1 SIMILAR TO 'abc*c';
  1. To achieve maximum equivalence, some modifications are made to the pattern operators.

  2. If these patterns are stored in a variable, SnowConvert does not apply the necessary adjustments for equivalence.

  3. Arguments with COLLATE specifications are not currently supported in the RLIKE function.

  • SSC-FDM-0032: Parameter is not a literal value, transformation could not be fully applied.

  • SSC-FDM-PG0011: The use of the COLLATE column constraint has been disabled for this pattern-matching condition.

Last updated