REGEXP LIKE UDF

Description

REGEXP_LIKE performs regular expression matching. This condition evaluates strings using characters as defined by the input character set. (Oracle Language Regerence REGEXP_LIKE Condition)

REGEXP_LIKE(source_char, pattern [, match_param ])

Oracle REGEXP_LIKE and Snowflake REGEXP_LIKE condition, have some functional differences, to minimize these differences and replicate Oracle REGEXP_LIKE function better, we added a custom UDF. The main idea is to escape the backslash symbol from the regular expression where it is required. These are the special characters that need to be escaped when they come with a backslash: 'd', 'D', 'w', 'W', 's', 'S', 'A', 'Z', 'n'. Also, the backreference expression (matches the same text as most recently matched by the "number specified" capturing group) needs to be escaped.

Custom UDF overloads

REGEXP_LIKE_UDF(string, string)

Parameters

  1. COL: is the character expression that serves as the search value.

  2. PATTERN: is the regular expression.

CREATE OR REPLACE FUNCTION REGEXP_LIKE_UDF(COL STRING, PATTERN STRING) 
RETURNS BOOLEAN LANGUAGE JAVASCRIPT IMMUTABLE AS
$$
return COL.match(new RegExp(PATTERN));
$$;

Oracle

SELECT first_name, last_name
FROM hr.employees
WHERE REGEXP_LIKE (first_name, '^Ste(v|ph)en$')
ORDER BY first_name, last_name;

Snowflake

SELECT first_name, last_name
FROM hr.employees
WHERE /*** MSC-WARNING - MSCEWI1020 - CUSTOM UDF 'REGEXP_LIKE_UDF' INSERTED. ***/
 REGEXP_LIKE_UDF(first_name, '^Ste(v|ph)en$')
ORDER BY first_name, last_name;

REGEXP_LIKE_UDF(string, string, string)

Parameters

  1. COL: is the character expression that serves as the search value.

  2. PATTERN: is the regular expression.

  3. MATCHPARAM: is a character expression that let's change the default matching behavior of the condition. In the following table, there are the Oracle characters with their description and their equivalent in the UDF.

Match ParameterDescriptionUDF Equivalent

'i'

Specifies case-insensitive matching, even if the determined collation of the condition is case-sensitive.

'i'

'c'

Specifies case-sensitive and accent-sensitive matching, even if the determined collation of the condition is case-insensitive or accent-insensitive.

Does not have an equivalent. It is being removed from the parameter..

'n'

Allows the period (.), which is the match-any-character wildcard character, to match the newline character. If you omit this parameter, then the period does not match the newline character.

's'

'm'

Treats the source string as multiple lines. Oracle interprets ^ and $ as the start and end, respectively, of any line anywhere in the source string, rather than only at the start or end of the entire source string. If you omit this parameter, then Oracle treats the source string as a single line.

'm'

'x'

Ignores whitespace characters. By default, whitespace characters match themselves.

Does not have an equivalent. It is being removed from the parameter.

The next query is only to show the change of the flags into the match parameter. It does not have functionality:

Oracle

SELECT col1 FROM TABLE1 WHERE REGEXP_LIKE(col1, 'regular_expression', 'icnmx');

Snowflake

SELECT col1 FROM PUBLIC.TABLE1 WHERE /*** MSC-WARNING - MSCEWI1020 - CUSTOM UDF 'REGEXP_LIKE_UDF' INSERTED. ***/
 /*** MSC-WARNING - MSCEWI3089 - REGEXP_LIKE_UDF MATCH PARAMETER MAY HAVE SOME FUNCTIONAL DIFFERENCES COMPARED TO ORACLE. ***/
 PUBLIC.REGEXP_LIKE_UDF(col1, 'regular_expression', 'ism');
CREATE OR REPLACE FUNCTION REGEXP_LIKE_UDF(COL STRING, PATTERN STRING, MATCHPARAM STRING) 
RETURNS BOOLEAN LANGUAGE JAVASCRIPT AS
$$
return COL.match(new RegExp(PATTERN, MATCHPARAM));
$$;

Oracle

SELECT last_name
FROM hr.employees
WHERE REGEXP_LIKE (last_name, '([aeiou])\1', 'i')
ORDER BY last_name;

Snowflake

SELECT last_name
FROM hr.employees
WHERE /*** MSC-WARNING - MSCEWI1020 - CUSTOM UDF 'REGEXP_LIKE_UDF' INSERTED. ***/
 /*** MSC-WARNING - MSCEWI3089 - REGEXP_LIKE_UDF MATCH PARAMETER MAY HAVE SOME FUNCTIONAL DIFFERENCES COMPARED TO ORACLE. ***/
 REGEXP_LIKE_UDF(last_name, '([aeiou])\\1', 'i')
ORDER BY last_name;

Known Issues

1. UDF match parameter may not behave as expected

Due to all the characters available in the Oracle match parameter does not have their equivalent in the user-defined function, the query result may have some functional differences compared to Oracle.

2. UDF pattern parameter does not allow only '\\' as a regular expression

If as a pattern parameter the regular expression used is only '\\' an exception will be thrown like this: JavaScript execution error: Uncaught SyntaxError: Invalid regular expression: //: \ at end of pattern in REGEXP_LIKE_UDF at 'return COL.match(new RegExp(PATTERN));' position 17 stackstrace: REGEXP_LIKE_UDF

  1. MSCEWI1020: CUSTOM UDF INSERTED.

Last updated