SAMPLE clause
Description
The SAMPLE clause in Teradata reduces the number of rows to be processed and it returns one or more samples of rows as a list of fractions or as a list of numbers of rows. The clause is used in the SELECT query. Please review the following Teradata documentation for more information.
Teradata syntax
SAMPLE
[ WITH REPLACEMENT ]
[ RANDOMIZED LOCALIZATION ]
{ { fraction_description | count_description } [,...] |
when_clause ]
}Snowflake syntax
Review the following Snowflake documentation for more information. SAMPLE and TABLESAMPLE are synonymous.
SELECT ...
FROM ...
{ SAMPLE | TABLESAMPLE } [ samplingMethod ]
[ ... ]Where:
In Snowflake, the following keywords can be used interchangeably:
SAMPLE | TABLESAMPLEBERNOULLI | ROWSYSTEM | BLOCKREPEATABLE | SEED
Review the following table to check on key differences.
Sample by probability
Also known as fraction description. It must be a fractional number between 0,1 and 1.
Decimal number between 0 and 100.
Fixed number of rows
Also known as count description. It is a positive integer that determines the number of rows to be sampled.
It specifies the number of rows (up to 1,000,000) to sample from the table. Can be any integer between 0 (no rows selected) and 1000000 inclusive.
Repeated rows
It is known as WITH REPLACEMENT. This is used to query more samples than there are rows in the table.
It is known as REPEATABLE or SEED. This is used to make the query deterministic. It means that the same set of rows will be the same for each query run.
Sampling methods
Proportional and RANDOMIZED ALLOCATION.
BERNOULLI or SYSTEM.
Sample Source Patterns
Sample data
Teradata
Snowflake
SAMPLE clause
Fixed number of rows
Notice that for this example, the number of rows are a fixed number but not necessarily are the same result for each run.
Teradata
2 rows.
Snowflake
2 rows.
Rows number based on probability
This option will return a variety of rows depending on the probability set.
Teradata
25% of probability for each row: 1 output row.
Snowflake
25% of probability for each row: 1 output row.
Known Issues
Fixed number of rows with replacement
This option will return a fixed number of rows and will allows the repetition of the rows. In Snowflake, it is not possible to request more samples than rows in a table.
Teradata sample
5
Eve
100
5
Eve
100
5
Eve
100
4
David
200
4
David
200
3
Charlie
500
1
Alice
100
1
Alice
100
SAMPLEID related functionality
In Teradata, it is possible to assign a unique ID to each sample that is specified. It helps to identify which belongs to which sample. This is not ANSI grammar, instead it is an extension of Teradata.
Teradata sample
Eve
3
Charlie
1
Alice
1
David
2
Bob
1
In Snowflake, there is not a SAMPLEID function. A possible workaround may be the following, but it has to be adaptaed to each single case:
Snowflake possible workaround
Alice
1
Bob
1
Charlie
2
David
3
Eve
3
Conditional sampling
In Snowflake there is not conditional sampling. This can be achieve by using CTE's.
Teradata sample
3
Charlie
500
4
David
200
2
Bob
300
Related EWIs
SSC-EWI-0021: Syntax not supported in Snowflake.
Last updated
