Create a table on a serverless database with specified DATA_SOURCE and FILE_FORMAT from PySpark.

Afroz Shaik 0 Reputation points
2023-09-26T16:45:06.38+00:00

Hello Community,

I want to create a table from PySpark, specifying LOCATION, DATA_SOURCE and FILE_FORMAT, on a serverless database, but the syntax below is not working:

spark.sql("""Create External Table dbo.table1(col1 varchar(400),
	col2 varchar(400),
	col3 varchar(400),
	col4 datetime2(7)
	)
	WITH (
	LOCATION = 'folder/folder1/table/**',
	DATA_SOURCE = silver,
	FILE_FORMAT = SynapseDeltaFormat
	)""")


Here is the error:

ParseException: 
Syntax error at or near '(': extra input '('(line 6, pos 6)
== SQL ==
Create External Table dbo.table1(col1 varchar(400),
	col2 varchar(400),
	col3 varchar(400),
	col4 datetime2(7)
	)
	WITH (
------^^^
	LOCATION = 'folder/folder1/table/**',
	DATA_SOURCE = silver,
	FILE_FORMAT = SynapseDeltaFormat
	)

But the same script works if I run it as a SQL script.

Is there a way to create a table like this without using pyodbc?


1 answer

  1. Amira Bedhiafi 24,711 Reputation points
    2023-09-27T14:54:24.0533333+00:00

    I think PySpark is failing to parse the SQL command itself: spark.sql() uses the Spark SQL parser, while CREATE EXTERNAL TABLE ... WITH (LOCATION, DATA_SOURCE, FILE_FORMAT) is T-SQL syntax for the serverless SQL pool, which is why the parser stops at the WITH clause. First make sure that you are using the right connector to work with Azure Synapse Analytics from PySpark; you might need the JDBC connector for Azure Synapse Analytics.

    I tried to update your code as below (but I didn't execute it):

    spark.sql("""
    CREATE EXTERNAL TABLE dbo.table1(
        col1 VARCHAR(400),
        col2 VARCHAR(400),
        col3 VARCHAR(400),
        col4 DATETIME2(7)
    )
    USING DELTA
    OPTIONS (
        'LOCATION' = 'folder/folder1/table/**',
        'DATA_SOURCE' = 'silver',
        'FILE_FORMAT' = 'SynapseDeltaFormat'
    )
    """)
    

    If you're still encountering issues, you can wrap your command inside a try-except block to capture and analyze the exception more effectively:

    try:
        spark.sql("...")  # your spark.sql command here
    except Exception as e:
        print(e)
    
