How to pass a custom query to CosmosDB's Synapse Link & Does Synapse Link support predicate pushdown?

Alibek Cholponbaev 20 Reputation points
2024-04-10T16:48:47.57+00:00

Hello!

I have been trying to connect Synapse Notebook to Cosmos DB via Synapse Link, and I have a couple of questions regarding optimizations:

If I want to process one day of data, I managed to find two ways of doing it:

  1. Pass a custom query with Transactional mode
        db = spark.read\
            .format("cosmos.oltp")\
            .option("spark.cosmos.accountEndpoint", COSMOS_DB_ENDPOINT )\
            .option("spark.cosmos.accountKey", COSMOS_DB_ACCOUNT_KEY)\
            .option("spark.cosmos.database", COSMOS_DB_DB_NAME)\
            .option("spark.cosmos.container", COSMOS_DB_CONTAINER_ID)\
            .option("spark.cosmos.read.customQuery", query)\
            .load()
    
  2. Load all data with Analytical mode, then filter it using spark
       db = spark.read\
           .format("cosmos.olap")\
           .option("spark.cosmos.accountEndpoint", COSMOS_DB_ENDPOINT )\
           .option("spark.cosmos.accountKey", COSMOS_DB_ACCOUNT_KEY)\
           .option("spark.cosmos.database", COSMOS_DB_DB_NAME)\
           .option("spark.cosmos.container", COSMOS_DB_CONTAINER_ID)\
           .load()\  
           .where(predicate)
    

Question 1: Is it possible to pass a custom query using Analytical mode?

Question 2: Does Spark and Synapse Link support predicate pushdown, which should lead to the same effect when I need to materialize the dataframe?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,395 questions
Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,448 questions
{count} votes

Accepted answer
  1. phemanth 5,840 Reputation points Microsoft Vendor
    2024-04-10T17:25:59.34+00:00

    @Alibek Cholponbaev Thanks for reaching out to Microsoft Q&A.

    I'll address your questions and provide insights for optimization:

    Question 1: Custom Queries in Analytical Mode:

    • No, custom queries aren't directly supported with Analytical mode. It's optimized for aggregation and exploration, not tailored filtering.
    • For specific filtering needs, use Transactional mode with custom queries.

    Question 2: Predicate Pushdown for Materialization:

    Yes, Spark and Synapse Link support predicate pushdown for materializing data frames. Queries are effectively filtered at the Cosmos DB analytical store for optimized data retrieval.

    • Here's how to apply predicate pushdown:
          db = spark.read \
              .format("cosmos.olap") \
              .option("spark.cosmos.accountEndpoint", COSMOS_DB_ENDPOINT) \
              .option("spark.cosmos.accountKey", COSMOS_DB_ACCOUNT_KEY) \
              .option("spark.cosmos.database", COSMOS_DB_DB_NAME) \
              .option("spark.cosmos.container", COSMOS_DB_CONTAINER_ID) \
              .load() \
              .where(predicate) \
              .write \
              .format("parquet") \
              .save("path/to/materialized/data")
        
        
      

    Recommendations for Optimization:

    • Choose the appropriate mode: Transactional mode for granular control with custom queries. Analytical mode for aggregations and exploration over large datasets.
    • Leverage predicate pushdown to minimize data transfer and processing overhead.
    • Consider data partitioning in Cosmos DB for targeted queries and improved performance.
    • Explore Spark optimizations: Caching frequently accessed data. Using efficient data structures and algorithms. Tuning Spark configuration for your workload.
    • Monitor performance: Track query execution times and resource usage. Identify bottlenecks and make necessary adjustments.
    • Keep Cosmos DB RU/s (Request Units per second) in mind, as analytical queries can consume them.
    • Leverage Synapse Link's change feed for incremental updates to materialized data.
    • Stay updated on Azure Synapse Analytics and Cosmos DB for new features and optimizations.

    Hope this helps. Do let us know if you any further queries.

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful