REGEXP LIKE UDF

Some parts in the output code are omitted for clarity reasons.

Description

REGEXP_LIKE performs regular expression matching. This condition evaluates strings using characters as defined by the input character set. (Oracle Language Regerence REGEXP_LIKE Condition)

REGEXP_LIKE(source_char, pattern [, match_param ])

Oracle REGEXP_LIKE and Snowflake REGEXP_LIKE condition, have some functional differences, to minimize these differences and replicate Oracle REGEXP_LIKE function better, we added a custom UDF. The main idea is to escape the backslash symbol from the regular expression where it is required. These are the special characters that need to be escaped when they come with a backslash: 'd', 'D', 'w', 'W', 's', 'S', 'A', 'Z', 'n'. Also, the backreference expression (matches the same text as most recently matched by the "number specified" capturing group) needs to be escaped.

Custom UDF overloads

REGEXP_LIKE_UDF(string, string)

Parameters

  1. COL: is the character expression that serves as the search value.

  2. PATTERN: is the regular expression.

CREATE OR REPLACE FUNCTION REGEXP_LIKE_UDF(COL STRING, PATTERN STRING) 
RETURNS BOOLEAN
LANGUAGE JAVASCRIPT
IMMUTABLE
COMMENT = '{"origin":"sf_sc","name":"snowconvert","version":{"major":1, "minor":0},"attributes":{"component":"udf"}}'
AS
$$
return COL.match(new RegExp(PATTERN));
$$;

Oracle

Snowflake

REGEXP_LIKE_UDF(string, string, string)

Parameters

  1. COL: is the character expression that serves as the search value.

  2. PATTERN: is the regular expression.

  3. MATCHPARAM: is a character expression that let's change the default matching behavior of the condition. In the following table, there are the Oracle characters with their description and their equivalent in the UDF.

Match ParameterDescriptionUDF Equivalent

'i'

Specifies case-insensitive matching, even if the determined collation of the condition is case-sensitive.

'i'

'c'

Specifies case-sensitive and accent-sensitive matching, even if the determined collation of the condition is case-insensitive or accent-insensitive.

Does not have an equivalent. It is being removed from the parameter..

'n'

Allows the period (.), which is the match-any-character wildcard character, to match the newline character. If you omit this parameter, then the period does not match the newline character.

's'

'm'

Treats the source string as multiple lines. Oracle interprets ^ and $ as the start and end, respectively, of any line anywhere in the source string, rather than only at the start or end of the entire source string. If you omit this parameter, then Oracle treats the source string as a single line.

'm'

'x'

Ignores whitespace characters. By default, whitespace characters match themselves.

Does not have an equivalent. It is being removed from the parameter.

CREATE OR REPLACE FUNCTION REGEXP_LIKE_UDF(COL STRING, PATTERN STRING, MATCHPARAM STRING) 
RETURNS BOOLEAN
LANGUAGE JAVASCRIPT
IMMUTABLE
COMMENT = '{"origin":"sf_sc","name":"snowconvert","version":{"major":1, "minor":0},"attributes":{"component":"udf"}}'
AS
$$
return COL.match(new RegExp(PATTERN, MATCHPARAM));
$$;

Oracle

Snowflake

Known Issues

1. UDF match parameter may not behave as expected

Due to all the characters available in the Oracle match parameter does not have their equivalent in the user-defined function, the query result may have some functional differences compared to Oracle.

2. UDF pattern parameter does not allow only '\\' as a regular expression

If as a pattern parameter the regular expression used is only '\\' an exception will be thrown like this: JavaScript execution error: Uncaught SyntaxError: Invalid regular expression: //: \ at end of pattern in REGEXP_LIKE_UDF at 'return COL.match(new RegExp(PATTERN));' position 17 stackstrace: REGEXP_LIKE_UDF

Last updated