Working with ASTs

A brief descriptiong of ASTs and how to use them in your additional translation rules

Different SQL dialects ==> Different Parsers

The SnowConvert parser is actually a family of parsers, in which the ANSI parser is located at the top of the hierarchy. Following the same logic as in the SQL dialects (Teradata, Oracle, SqlServer, and others), SnowConvert has an ANSI parser class that is inherited by every language-specific parser class, in order to share all the common syntax among the multiple SQL dialects.

As mentioned before, the parser recognizes the input source code and creates an intermediate representation for it in the form of ASTs (Abstract Syntax Trees). Every parser of SnowConvert creates a specific type of ASTs for the respective SQL dialect it represents. Every kind of ASTs follows a naming prefix in order to easily identify them. For example, each node that is created by the ANSI parser starts with the "Sql" prefix, and every node that is created by the DB2 parser starts with the "Db2" prefix.

So, when the IBM DB2 parser finds the code

SELECT * FROM T1 ORDER BY INPUT SEQUENCE

it will create the following AST

SqlSelect
	SqlKeyword='SELECT'
	Db2SelectBody
		SqlSingleNameExpr
			SqlIdent='*'
		SqlFromClause
			SqlKeyword='FROM'
			Db2TableOrQueryFactor
				SqlSimpleName
					SqlIdent='T1'
		SqlOrderBy
			SqlKeyword='ORDER'
			SqlKeyword='BY'
			Db2InputSequence
				SqlKeyword='INPUT'
				SqlKeyword='SEQUENCE'

Notice there is a mix of ANSI and DB2 specific nodes. Each one of the nodes above is an instance of a class that has multiple fields and properties which can be explored in Visual Studio using SnowConvert Extensibility API.

AST Viewer Tool

When writing additional translation rules it is important to know the name of the AST that represents a given syntax. One way to do this is by using an internal Windows-only tool called AST Viewer, which can be downloaded at the following link

To use it, just select the source code language at the top, then enter your code and hit the "Parse" button. The corresponding AST will be shown at the bottom of the application.

As shown in the image, the FOO(1) part of the syntax is represented by an SqlFunctionExpr as has been mentioned in previous sections.

Things to know about ASTs

Bottom-Up Traverse of the Tree

Once the parser has built the complete AST for a file, SnowConvert begins traversing the three by executing translation rules for each one of its nodes. The order in which the process takes place is bottom-up, meaning the children of a node are executed before their parent. For example, let's consider the following grammar:

CREATE TABLE {tableName} ( 
   { {colType} {colName} [colOptions] } ... 
) [tableOptions]

In this case, before executing the translation rule for the main node that represents a CREATE TABLE, the tableOptions will be the first elements to be processed by a translation rule. The columns would be the next ones, and since each column can have three children, the order in which they are executed is the following:

  1. colOptions if present

  2. colName

  3. colType

So, before a whole column is considered translated all its children must have been traversed by a translation rule; and the same applies to each node in the AST.

ASTs are immutable

Once an AST is created by a SnowConvert parser it cannot be modified. This is a design choice. ASTs are the internal representation of the input source code, and since this does not change throughout the application's life cycle nor should it change its representation. When SnowConvert needs to generate an output code different than the input one, new ASTs should be created to represent it. Translation rules are exactly the time and place where this happens. It is in the translation rules that new ASTs representing the output source code are created.

Creating new ASTs

Since ASTs represent source code, SnowConvert must first generate ASTs for the Snowflake output code that will be printed to disk. In order to do this in the translation rules, emitting instances are used. Emitting classes are singleton classes used to generate specific types of ASTs; for each SQL dialect, there is a specific emitting class. For example, the SqlEmit class produces ASTs with the Sql prefix representing the ANSI syntax and the SfEmit class, on the other hand, produces ASTs with the Sf prefix representing the Snowflake-specific syntax. Sometimes, the Snowflake output code that needs to be printed to disk can be fully represented via Sql ASTs, because it is ANSI compliant, but this is not always the case.

In order to facilitate the use of Emitting classes in additional translation rules, the SnowConvert Extensibility API exposes the following two properties in the BaseExtensibleTranslationRule class definition.

/// <summary>
/// The SQL emit instance.
/// </summary>
public static SqlEmit SqlEmitter => SqlEmit.Instance;

/// <summary>
/// The Snowflake emit instance.
/// </summary>
public static SfEmit SfEmitter => SfEmit.Instance;

That means that in every additional translation rule class, these instances can be used in the replace method to generate new ASTs, if needed.

For a better understanding of the use of Emitter classes, let's say we are creating a translation rule for function calls, represented by the SqlFuntionExprAST, which will generate a new function call as the output AST, with a new name and using some of the arguments from the input function call. We can do all that with the following code in the Replace method.

protected override void Replace(SqlFunctionExpr node, out SqlFunctionExpr result)
{
    // the first argument of the original function call
    var firstArgument = node.ArgumentList.Arguments[0];
    // emits an AST representing an argument with a literal expression: 20
    var secondArgument = SqlEmitter.Argument(SqlEmitter.LiteralExpr(20));
    
    // emits an AST used to represent an argument list
    var newArguments = SqlEmitter.ArgumentList(
        new List<SqlArgumentBase>() { firstArgument, secondArgument });

    // emits a new function call with a new name and the newArguments
    result = SqlEmitter.FunctionExpr(
        SqlEmitter.SingleNameExpr("NewFunctionName"), 
        newArguments);
}

Notice how there are 5 uses of the SqlEmitter instance.

Last updated