SQL Embedded code
Transform the SQL Code embedded in your Scala / Python Code
There are instances where the Python or Scala code has embedded SQL code that requires transformation. SMA parses embedded SQL code in the following extensions:
Python files (.py)
Scala files (.scala)
Jupyter Notebook (.ipynb)
Databricks (.python, .scala)
Databricks Notebooks (.dbc)
Embedded SQL Code transformation Samples
Supported Case
Using a spark.sql function on Python:
# Original in Spark
spark.sql("""MERGE INTO people_target pt
USING people_source ps
ON (pt.person_id1 = ps.person_id2)
WHEN NOT MATCHED BY SOURCE THEN DELETE""")
Unsupported Cases
When the SMA identifies an unsupported case, it generates an EWI in the output code.
Some unsupported scenarios are:
Using string variables with SQL Code:
query = "SELECT COUNT(COUNTRIES) FROM SALES"
dfSales = spark.sql(query)
Using basic concatenation to create SQL code:
base = "SELECT "
criteria_1 = " COUNT(*) "
criteria_2 = " * "
fromClause = " FROM COUNTRIES"
df1 = spark.sql(bas + criteria_1 + fromClause)
df2 = spark.sql(bas + criteria_2 + fromClause)
Using interpolations to create SQL code:
# Old Style interpolation
UStbl = "SALES_US"
salesUS = spark.sql("SELECT * FROM %s" % (UStbl))
# Using format function
COLtbl = "COL_SALES WHERE YEAR(saleDate) > 2023"
salesCol = spark.sql("SELECT * FROM {}".format(COLtbl))
# New Style
UKTbl = " UK_SALES_JUN_18"
salesUk = spark.sql(f"SELECT * FROM {UKTbl}")
Using functions that create SQL queries:
def ByMonth(month):
query = f"SELECT * LOGS WHERE MONTH(access_date) = {month}"
return spark.sql(query)
Unsupported Cases and EWI messages
In Scala code the error code for unsupported embedded SQL is SPRKSCL1173
/*Scala*/
class SparkSqlExample {
def main(spark: SparkSession) : Unit = {
/*EWI: SPRKSCL1173 => SQL embedded code cannot be processed.*/
spark.sql("CREATE VIEW IF EXISTS My View AS Select * From my Table WHERE date < current_date() ")
}
For Python code the error code for unsupported embedded SQL is SPRKPY1077
# Python Output
#EWI: SPRKPY1077 => SQL embedded code cannot be processed.
b = spark.sql("CREATE VIEW IF EXISTS My View AS Select * From my Table WHERE date < current_date() ")
Last updated