What is the Polybase Architecture?

Flash Gordon 176 Reputation points
2021-03-31T21:47:52.113+00:00

Hi, I recently connected MS SQL Community 2019 with a local Oracle 19c database using Polybase. Polybase seems really insistent on using "Create External Table" to get access to the external database. So do I have to Create External Table for every foreign table I want to read? That seems really tedious and you have to deal with column data type issues between ms sql and the foreign database. So is the architecture of Polybase to individually connect every table in the foreign database one at a time? Prior approaches like Linked Server let you make one connection to Oracle and you could then just use queries to refer to all the table resources in the other database.

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,756 questions
0 comments No comments
{count} votes

Accepted answer
  1. Erland Sommarskog 101.4K Reputation points MVP
    2021-03-31T22:11:45.617+00:00

    Yeah, if you are connecting to say an Oracle database, it's kind of tedious.

    But if the data source is Hadoop or Spark or something which does not really have tables, it makes more sense, and I think is what has been driving this design.

    Then they have also added this for relational sources so that you can use the same interface all over the place.

    And there are some advantages with external tables over linked servers. One is that you can opt to only expose some of the columns on the remote source. Another is security. A linked server is not a securable, so once you have set up a linked server, every one can access the linked server as such. If you have self-mapping, it's not so bad, because everyone needs to have access on the other server. But if you have set up a login mapping and set it up for everyone - that's really bad. But manage so that only certain users that can access it - that's a nightmare.

    With external tables, you can grant SELECT just like you can on any other table, so that is really neat.

    But what I have found when testing is that latency is too high. The prime area is big data, and if you are to inject a 200 GB log file from Hadoop, there is no issue if there is a startup time of two seconds for the query. But it is unacceptable for a point-lookup query.

    So for traditional database application, Polybase is not that hot. That is my conclusion.


0 additional answers

Sort by: Most helpful