A subquery is a query within another query. Subqueries in a FROM or WHERE clause are used to provide data that will be used to limit or compare/evaluate the data returned by the containing query. (Snowflake subqueries documentation).
Subqueries can be correlated/uncorrelated as well as scalar/non-scalar.
Correlated subqueries reference columns from the outer query. In Snowflake, correlated subqueries execute for each row in the query. On the other hand, Uncorrelated subqueries do not reference the outer query and are executed once for the entire query.
Scalar subqueries return a single value as result, otherwise the subquery is non-scalar.
The following patterns are based on these categories.
Sample Source Patterns
Setup data
Teradata
CREATE TABLE tableA( col1 INTEGER, col2 VARCHAR(20));CREATE TABLE tableB( col3 INTEGER, col4 VARCHAR(20));INSERT INTO tableA VALUES (50, 'Hey');INSERT INTO tableA VALUES (20, 'Example');INSERT INTO tableB VALUES (50, 'Hey');INSERT INTO tableB VALUES (20, 'Bye');
Snowflake evaluates correlated subqueries at compile time to determine if they are scalar and therefore valid in the context were a single return value is expected. To solve this, the ANY_VALUE aggregate function is added to the returned column when the result is not an aggregate function. This allows the compiler to determine a single value return is expected. Since scalar subqueries are expected to return a single value the function ANY_VALUE will not change the result, it will just return the original value as is.
Teradata
SELECT col2FROM tableAWHERE col1 = (SELECT col3 FROM tableB WHERE col2 = col4);
+------+
| col2 |
+------+
| Hey |
+------+
Snowflake
SELECT col2FROM tableAWHERE col1 =--** SSC-FDM-0002 - CORRELATED SUBQUERIES MAY HAVE SOME FUNCTIONAL DIFFERENCES. ** ( SELECT ANY_VALUE(col3) FROM tableB WHERE RTRIM( col2) = RTRIM(col4));
SELECT col2 FROM tableA WHERE col1 IN ( SELECT col3 FROM tableB ); SELECT col2 FROM tableA WHERE col1 >= ALL( SELECT col3 FROM tableB ); SELECT col2, myDerivedTable.col4 FROM tableA, ( SELECT* FROM tableB ) AS myDerivedTable WHERE col1 = myDerivedTable.col3;
1. Subqueries with FETCH first that are not uncorrelated scalar
Oracle allows using the FETCH clause in subqueries, Snowflake only allows using this clause if the subquery is uncorrelated scalar, otherwise an exception will be generated.
SnowConvert will mark any inalid usage of FETCH in subqueries with SSC-EWI-0108
Oracle:
-- Correlated scalarSELECT col2FROM tableAWHERE col2 = (SELECT col4 FROM tableB WHERE col3 = col1 FETCH FIRST ROW ONLY);-- Uncorrelated scalarSELECT col2FROM tableAWHERE col2 = (SELECT col4 FROM tableB FETCH FIRST ROW ONLY);
Snowflake:
-- Correlated scalarSELECT col2FROM tableA WHERE col2 =--** SSC-FDM-0002 - CORRELATED SUBQUERIES MAY HAVE SOME FUNCTIONAL DIFFERENCES. ** !!!RESOLVE EWI!!! /*** SSC-EWI-0108 - THE FOLLOWING SUBQUERY MATCHES AT LEAST ONE OF THE PATTERNS CONSIDERED INVALID AND MAY PRODUCE COMPILATION ERRORS ***/!!! (SELECT ANY_VALUE( col4) FROM tableB WHERE col3 = col1 FETCH FIRST 1 ROW ONLY);-- Uncorrelated scalarSELECT col2FROM tableA WHERE col2 = (SELECT col4 FROM tableB FETCH FIRST 1 ROW ONLY);
Related EWIs
SSC-FDM-0002: Correlated subquery may have functional differences
SSC-EWI-0108: The following subquery matches at least one of the patterns considered invalid and may produce compilation errors