MongoDB database copy to Azure Databricks

Question

Rajeev Singh 21 Reputation points

Hi,

I have set up a data transfer from MongoDB to Azure Databricks. I am able to retrieve all the table names from the database, but I am unable to get data from those tables.

I am receiving the error message "Data used in creating the Delta table doesn't have any columns." Please help me resolve this issue.

Regards,

Rajeev

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-05-04T20:09:44.4366667+00:00
Hello Rajeev Singh,

Welcome to the MS Q&A platform.

As per the error message, it could be due to a mismatch between the schema of the data and the schema of the Delta table.

Can you please check the below?

Check if the schema of the data matches the schema of the Delta table. If not, update the schema of the data to match the schema of the Delta table.

Check if the data is in the correct format. If not, convert the data to the correct format before creating the Delta table.

Check if the Delta table is created with the correct schema. If not, update the schema of the Delta table to match the schema of the data.

If you could provide the code you are using to retrieve the data from the MongoDB tables in Azure Databricks, it would help to troubleshoot the issue.

Also, please provide the full error message.
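
As a rough illustration of the schema checks suggested above, the sketch below compares the column names of the source data with those of the target table. The helper name compare_columns is hypothetical and not part of any library; it only assumes that both sides can supply a list of column names (for a Spark DataFrame, that is its columns attribute).

```python
def compare_columns(source_columns, target_columns):
    """Compare two lists of column names and report the differences.

    Both arguments are plain lists of strings, for example `df.columns`
    for the DataFrame read from MongoDB and the `columns` of a DataFrame
    loaded from the existing Delta table.
    """
    source = list(source_columns)
    target = list(target_columns)
    return {
        # True when the source has no columns at all, which is the
        # situation the error message in this thread describes.
        "source_is_empty": len(source) == 0,
        # Columns the target expects but the source does not provide.
        "missing_in_source": [c for c in target if c not in source],
        # Columns the source provides but the target does not have.
        "missing_in_target": [c for c in source if c not in target],
    }


# Possible use in a notebook (assumes `df`, `spark`, `tgt_db_name` and
# `tgt_table_name` already exist; not executed here):
# report = compare_columns(df.columns,
#                          spark.table(f"{tgt_db_name}.{tgt_table_name}").columns)
# print(report)
```

If source_is_empty comes back True, the problem is on the read side rather than in the Delta table definition.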

Rajeev Singh 21 Reputation points

2023-05-05T06:46:16.68+00:00

Hi Bhargava,

I will share a few code snippets so that you can get a brief idea of the approach.

from pyspark.sql import SparkSession
import pandas as pd

#print(collectionNames)

for table_name in collectionNames:
    # Create a SparkSession
    spark = SparkSession.builder.master('local').appName("Get all columns").config('org.mongodb.spark:mongo-spark-connector_2.12:3.0.1').getOrCreate()

    df = spark.read.format("mongo").option("database", "xxxxxxx").option("spark.mongodb.input.uri", jdbc_hostname).option("collection","yyyyyyyy").load()
    df.show()
    print(f"Data count for the table {table_name} from {prev_time} to {current_time} is: {df.count()}")

    tgt_db_name = tgt_db_name
    tgt_table_name = table_name

    if action == 'upsert':
        write_delta_v2(type = 'data_copy', df=df, mode=action, dbname= tgt_db_name, targetTable=tgt_table_name, sourceTable=tgt_table_name, keyCol='id')
    else:
        write_delta_v2(type = 'data_copy', df=df, mode=action, dbname= tgt_db_name, targetTable=tgt_table_name)

Error: AnalysisException: Data used in creating the Delta table doesn't have any columns.

Answer 1

Hello Rajeev Singh,

Sorry for the delayed response.

It seems that the DataFrame df is empty or has no columns, which causes the error when the Delta table is created. To resolve this issue, you can add a condition that checks whether the DataFrame has columns before calling the write_delta_v2 function.

The code below skips tables that have no columns.
Please try it and let me know.

from pyspark.sql import SparkSession
import pandas as pd

for table_name in collectionNames:
    # Create a SparkSession
    spark = SparkSession.builder.master('local').appName("Get all columns").config('org.mongodb.spark:mongo-spark-connector_2.12:3.0.1').getOrCreate()

    df = spark.read.format("mongo").option("database", "xxxxxxx").option("spark.mongodb.input.uri", jdbc_hostname).option("collection","yyyyyyyy").load()
    df.show()
    print(f"Data count for the table {table_name} from {prev_time} to {current_time} is: {df.count()}")

    tgt_db_name = tgt_db_name
    tgt_table_name = table_name

    # Check if the DataFrame has columns
    if len(df.columns) > 0:
        if action == 'upsert':
            write_delta_v2(type = 'data_copy', df=df, mode=action, dbname= tgt_db_name, targetTable=tgt_table_name, sourceTable=tgt_table_name, keyCol='id')
        else:
            write_delta_v2(type = 'data_copy', df=df, mode=action, dbname= tgt_db_name, targetTable=tgt_table_name)
    else:
        print(f"Skipping table {table_name} as it has no columns.")
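
The same skip logic can also be separated from the Spark calls, so that it is easier to see which collections were written and which were skipped. This is only a sketch: copy_collections, load_collection and write_table are hypothetical names introduced for illustration. It assumes each iteration is meant to read the collection named by the loop variable, and that write_delta_v2 (the asker's own function, whose implementation is not shown in the thread) does the actual Delta write.

```python
def copy_collections(collection_names, load_collection, write_table):
    """Copy each collection, skipping those whose DataFrame has no columns.

    load_collection(name) -> object with a `columns` list (e.g. a Spark DataFrame)
    write_table(name, df) -> performs the write to the target table
    Returns two lists of collection names: (written, skipped).
    """
    written, skipped = [], []
    for name in collection_names:
        df = load_collection(name)
        # A DataFrame without columns cannot be used to create a Delta table.
        if len(df.columns) == 0:
            skipped.append(name)
            continue
        write_table(name, df)
        written.append(name)
    return written, skipped


# Possible wiring in a notebook (assumes `spark`, `jdbc_hostname`, `action`,
# `tgt_db_name` and `write_delta_v2` exist as in the thread; not executed here):
# def load_collection(name):
#     return (spark.read.format("mongo")
#             .option("spark.mongodb.input.uri", jdbc_hostname)
#             .option("database", "xxxxxxx")
#             .option("collection", name)
#             .load())
#
# def write_table(name, df):
#     write_delta_v2(type='data_copy', df=df, mode=action,
#                    dbname=tgt_db_name, targetTable=name)
#
# written, skipped = copy_collections(collectionNames, load_collection, write_table)
# print("Skipped (no columns):", skipped)
```

Returning the skipped names makes it possible to review afterwards which collections produced no columns, instead of relying only on printed messages.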

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2023-05-15T22:12:32.13+00:00

Hello Rajeev Singh,

I am checking to see if you have any further questions here.
