https://stackoverflow.com/questions/68151153/cosmos-db-spatial-query-using-spark
https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/quickstart-spark
https://github.com/Azure/azure-cosmosdb-spark/wiki/Configuration-references
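You can try using the cosmos.oltp format of the Spark 3 connector instead and pass the spatial query through the spark.cosmos.read.customQuery option:

```python
# cosmosEndpoint, cosmosMasterKey, cosmosDatabaseName and cosmosContainerName
# are assumed to be defined earlier (account endpoint, key, database, container).
outlets_cfg = {
    "spark.cosmos.accountEndpoint": cosmosEndpoint,
    "spark.cosmos.accountKey": cosmosMasterKey,
    "spark.cosmos.database": cosmosDatabaseName,
    "spark.cosmos.container": cosmosContainerName,
    # Sent to Cosmos DB as-is instead of a query generated from pushed-down
    # Spark filters. ST_DISTANCE returns meters, so this keeps points within 1 km.
    "spark.cosmos.read.customQuery": (
        "SELECT * FROM c WHERE ST_DISTANCE(c.location, "
        '{"type": "Point", "coordinates": [12.832489, 18.9553242]}) < 1000'
    ),
}

df = spark.read.format("cosmos.oltp").options(**outlets_cfg)\
    .option("spark.cosmos.read.inferSchema.enabled", "true")\
    .load()
```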
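Hand-escaping the GeoJSON literal inside the query string is easy to get wrong. A minimal sketch, assuming two hypothetical helpers (build_distance_query and with_custom_query, not part of the connector) that build the same query with json.dumps and attach it to a config dict:

```python
import json


def build_distance_query(lon: float, lat: float, max_meters: float) -> str:
    """Hypothetical helper: Cosmos DB SQL selecting documents whose
    c.location lies within max_meters of the point (lon, lat)."""
    # GeoJSON positions are ordered [longitude, latitude].
    point = json.dumps({"type": "Point", "coordinates": [lon, lat]})
    return f"SELECT * FROM c WHERE ST_DISTANCE(c.location, {point}) < {max_meters}"


def with_custom_query(base_cfg: dict, query: str) -> dict:
    """Hypothetical helper: copy of a connector config with the custom query set."""
    cfg = dict(base_cfg)  # leave the caller's dict untouched
    cfg["spark.cosmos.read.customQuery"] = query
    return cfg


if __name__ == "__main__":
    print(build_distance_query(12.832489, 18.9553242, 1000))
```

The result could then be passed as spark.read.format("cosmos.oltp").options(**with_custom_query(outlets_cfg, q)).load(). Because GeoJSON uses [longitude, latitude] order, it is worth checking which value is which before relying on the coordinates above.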
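The Microsoft Learn quickstart reads a container with the same cosmos.oltp format and filters the result with a regular Spark DataFrame predicate instead:

```python
from pyspark.sql.functions import col

# cfg is the quickstart's connection config (endpoint, key, database,
# container), i.e. outlets_cfg without the custom query.
df = spark.read.format("cosmos.oltp").options(**cfg)\
    .option("spark.cosmos.read.inferSchema.enabled", "true")\
    .load()

# Simple Spark filters like this one are candidates for push-down to Cosmos DB.
df.filter(col("isAlive") == True)\
    .show()
```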
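If the custom query is not an option, one fallback (a different technique from what either document shows) is to check the distance in Spark after reading. A filter written as a Spark UDF runs after the documents are fetched, so it is not pushed down to Cosmos DB. A sketch of the distance check using the haversine great-circle formula on a spherical Earth, which only approximates ST_DISTANCE; the function names are hypothetical:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Approximate great-circle distance in meters between two [lon, lat] points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within(center, max_meters):
    """Return a predicate keeping [lon, lat] points closer than max_meters to center."""
    lon0, lat0 = center

    def keep(point):
        lon, lat = point
        return haversine_m(lon0, lat0, lon, lat) < max_meters

    return keep
```

In PySpark, such a predicate could be wrapped with pyspark.sql.functions.udf and applied in df.filter(...), assuming the inferred schema exposes location.coordinates as an array; that schema is an assumption. Reading every document first makes this much more expensive than the custom query on large containers.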
The older Azure Cosmos DB Spark connector (azure-cosmosdb-spark) documents a configuration called query_custom that overrides the default query used to fetch data from Cosmos DB. However, it is not clear whether this setting can be used with the cosmos.olap or cosmos.oltp format, or whether it can run custom queries such as JOIN operations.
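On the first point, query_custom belongs to the older connector (format com.microsoft.azure.cosmosdb.spark), while cosmos.oltp comes from the Spark 3 connector, whose counterpart appears to be spark.cosmos.read.customQuery as used in the first snippet. On JOINs, a Cosmos DB SQL JOIN is a self-join within a single document (for example over an array), so a custom query should be able to contain one, but it cannot join two containers. A minimal sketch of translating legacy read options to cosmos.oltp options; the LEGACY_TO_OLTP table and to_oltp_config are hypothetical and the mapping is an assumption to verify against both connectors' configuration references:

```python
# Assumed mapping from azure-cosmosdb-spark (Spark 2) read options to
# Spark 3 "cosmos.oltp" options; verify against both configuration references.
LEGACY_TO_OLTP = {
    "Endpoint": "spark.cosmos.accountEndpoint",
    "Masterkey": "spark.cosmos.accountKey",
    "Database": "spark.cosmos.database",
    "Collection": "spark.cosmos.container",
    "query_custom": "spark.cosmos.read.customQuery",
}


def to_oltp_config(legacy_cfg: dict) -> dict:
    """Translate known legacy keys; raise KeyError for keys this sketch does not map."""
    unknown = set(legacy_cfg) - set(LEGACY_TO_OLTP)
    if unknown:
        raise KeyError(f"no mapping for: {sorted(unknown)}")
    return {LEGACY_TO_OLTP[key]: value for key, value in legacy_cfg.items()}
```

Raising on unmapped keys, rather than silently dropping them, makes it obvious when a legacy option has no known equivalent and needs a manual look.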