azure databricks table join returns no rows but query works for a single where predicate

Vivek Singh 1 Reputation point
2021-05-12T03:52:05.877+00:00

I have two azure databricks tables,table one having 10 million rows and tabletwo having five thousand rows. there is common key - itemcode
The big table was created from parquet files and the other from Excel file using azure databricks GUI. Both table have columns as STRINGS.

when i run the query as -->

%sql
select * from tableone where itemcode = <some value> without quotes --- it returns the rows expected

but when i use a join as shown below it returns no rows, strange

select * from tableone join tabletwo on tableone.itemcode = tabletwo.itemcode

2) tried dataframe method same no results

sqlDF1 = spark.sql("SELECT * FROM tableone")

display(sqlDF1)

sqlDF1.printSchema()

sqlDF2 = spark.sql("SELECT * FROM tabletwo")

display(sqlDF2)

sqlDF2.printSchema()

df = sqlDF1.join(sqlDF2, on=['ItemBarcode'], how='inner')
display(df)

Azure Databricks
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. PRADEEPCHEEKATLA 91,656 Reputation points Moderator
    2021-05-12T07:31:14.483+00:00

    Hello @Vivek Singh ,

    Welcome to the Microsoft Q&A platform.

    As per my repro, above two methods works as excepted. When you say it doesn't provide any result - could you please provide more detailed screenshot along with the sample data which you are trying and details about the databricks runtime version.

    I had tested this with the sample data on databricks runtime version - 8.2 (includes Apache Spark 3.1.1, Scala 2.12)

    Method1: To explain you in detail, I had created two tables named tableone as employee , tabletwo as department , and itemcode as deptno with the following data as shown below:

    95892-image.png

    Result of the method1 are as shown below:

    Note: When you the join statement it will return deptno from both the tables.

    95946-image.png

    In order to avoid the deptno from both the tables, you need to use Inner Join as shown below:

    -- Use employee and department tables to demonstrate inner join.  
    SELECT id, name, employee.deptno, deptname  
        FROM employee INNER JOIN department ON employee.deptno = department.deptno;  
    

    95926-image.png

    For more details, refer Azure Databricks - Joins examples.

    Method2: Using dataframes, to explain you in detail, I had created two tables named tableone as employee , tabletwo as department , and itemcode as deptno with the following data as shown below:

    95864-image.png

    Result of the method2 are as shown below:

    95857-image.png

    Hope this helps. Do let us know if you any further queries.

    ------------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.

    0 comments No comments

  2. Vivek Singh 1 Reputation point
    2021-05-12T19:53:10.463+00:00

    Hi Pradeep,

    Thanks for the reply and uploading the pics. In fact I did follow the same steps as mentioned by you for both SQL & dataframes.

    When joined it returns "no results" as shown below

    96105-image.png

    But when I try the same value against the table i get results as shown

    96093-image.png


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.