Working with ASTs
A brief descriptiong of ASTs and how to use them in your additional translation rules
Different SQL dialects ==> Different Parsers
The SnowConvert parser is actually a family of parsers, in which the ANSI parser is located at the top of the hierarchy. Following the same logic as in the SQL dialects (Teradata, Oracle, SqlServer, and others), SnowConvert has an ANSI parser class that is inherited by every language-specific parser class, in order to share all the common syntax among the multiple SQL dialects.
As mentioned before, the parser recognizes the input source code and creates an intermediate representation for it in the form of ASTs (Abstract Syntax Trees). Every parser of SnowConvert creates a specific type of ASTs for the respective SQL dialect it represents. Every kind of ASTs follows a naming prefix in order to easily identify them. For example, each node that is created by the ANSI parser starts with the "Sql" prefix, and every node that is created by the DB2 parser starts with the "Db2" prefix.
So, when the IBM DB2 parser finds the code
it will create the following AST
Notice there is a mix of ANSI and DB2 specific nodes. Each one of the nodes above is an instance of a class that has multiple fields and properties which can be explored in Visual Studio using SnowConvert Extensibility API.
AST Viewer Tool
When writing additional translation rules it is important to know the name of the AST that represents a given syntax. One way to do this is by using an internal Windows-only tool called AST Viewer, which can be downloaded at the following link
To use it, just select the source code language at the top, then enter your code and hit the "Parse" button. The corresponding AST will be shown at the bottom of the application.
As shown in the image, the FOO(1)
part of the syntax is represented by an SqlFunctionExpr
as has been mentioned in previous sections.
Things to know about ASTs
Bottom-Up Traverse of the Tree
Once the parser has built the complete AST for a file, SnowConvert begins traversing the three by executing translation rules for each one of its nodes. The order in which the process takes place is bottom-up, meaning the children of a node are executed before their parent. For example, let's consider the following grammar:
In this case, before executing the translation rule for the main node that represents a CREATE TABLE
, the tableOptions
will be the first elements to be processed by a translation rule. The columns would be the next ones, and since each column can have three children, the order in which they are executed is the following:
colOptions
if presentcolName
colType
So, before a whole column is considered translated all its children must have been traversed by a translation rule; and the same applies to each node in the AST.
ASTs are immutable
Once an AST is created by a SnowConvert parser it cannot be modified. This is a design choice. ASTs are the internal representation of the input source code, and since this does not change throughout the application's life cycle nor should it change its representation. When SnowConvert needs to generate an output code different than the input one, new ASTs should be created to represent it. Translation rules are exactly the time and place where this happens. It is in the translation rules that new ASTs representing the output source code are created.
Creating new ASTs
Since ASTs represent source code, SnowConvert must first generate ASTs for the Snowflake output code that will be printed to disk. In order to do this in the translation rules, emitting instances are used. Emitting classes are singleton classes used to generate specific types of ASTs; for each SQL dialect, there is a specific emitting class. For example, the SqlEmit
class produces ASTs with the Sql
prefix representing the ANSI syntax and the SfEmit
class, on the other hand, produces ASTs with the Sf
prefix representing the Snowflake-specific syntax. Sometimes, the Snowflake output code that needs to be printed to disk can be fully represented via Sql ASTs, because it is ANSI compliant, but this is not always the case.
In order to facilitate the use of Emitting classes in additional translation rules, the SnowConvert Extensibility API exposes the following two properties in the BaseExtensibleTranslationRule
class definition.
That means that in every additional translation rule class, these instances can be used in the replace method to generate new ASTs, if needed.
For a better understanding of the use of Emitter classes, let's say we are creating a translation rule for function calls, represented by the SqlFuntionExpr
AST, which will generate a new function call as the output AST, with a new name and using some of the arguments from the input function call. We can do all that with the following code in the Replace
method.
Notice how there are 5 uses of the SqlEmitter
instance.
Last updated