I can't use geopandas on Synapse SparkPool

lucas.pontes 11 Reputation points
2022-04-27T11:39:40.967+00:00

I need to use some geospatial Python packages, such as geopandas, on my Synapse SparkPool. I was able to install them as described in the geospatial-processing-analytics documentation, using this yml file:

name: aoi-env  
channels:  
  - conda-forge  
  - defaults  
dependencies:  
  - gdal>=3.3.0  
  - pip>=20.1.1  
  - azure-storage-file-datalake  
  - libgdal  
  - shapely  
  - pyproj  
  - pip:  
    - rasterio  
    - geopandas  
    - apache-sedona  

However, it doesn't work when I try to read data from Data Lake Gen2. It returns the following error: 'No such file or directoryDriverError', and I'm pretty sure the path is correct.

I think it may be a dependency problem, so I tried listing geopandas as a conda dependency instead of a pip package, but that doesn't work either because the installation fails. After running for 30 minutes, the installation process is cancelled. This is the yml file I used:

name: aoi-env
channels:
  - conda-forge
  - defaults
dependencies:
  - gdal>=3.3.0
  - pip>=20.1.1
  - azure-storage-file-datalake
  - libgdal
  - shapely
  - pyproj
  - geopandas
  - pip:
    - rasterio
    - apache-sedona
Azure Synapse Analytics

2 answers

  1. lucas.pontes 11 Reputation points
    2022-05-10T11:08:02.967+00:00

    Hi,

    I used a workaround: I opened the file with pandas and converted it to a geopandas GeoDataFrame. It's not ideal, but it works for now.

    1 person found this answer helpful.

  2. Jimmy Dobbins 1 Reputation point Microsoft Employee
    2022-12-28T14:57:37.85+00:00

    I am getting the same error. I know I have geopandas installed and working as I can read in sample datasets and explore them using a webmap. I know the permissions are correct, since I can list and read all files that make up the shapefile I am trying to read (.shp, .prj, .dbf, .shx, .cpg). I still get the same "no such file or directory" error when I use a geopandas.read_file command. Please post a workaround if you have one. I like Sedona for processing, but geopandas for visualization. Thanks, this thread has been really helpful!
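
The workaround from the first answer (read with pandas, then convert to geopandas) could look roughly like the sketch below. It assumes the file is tabular with longitude and latitude columns. The column names `lon` and `lat`, the helper `to_geodataframe`, and the EPSG:4326 CRS are illustrative assumptions, not details from the thread. An in-memory CSV stands in for the file on the data lake.

```python
import io

import geopandas as gpd
import pandas as pd


def to_geodataframe(df, lon_col="lon", lat_col="lat", crs="EPSG:4326"):
    """Build a point GeoDataFrame from plain longitude/latitude columns.

    Hypothetical helper: the column names and CRS are assumptions.
    """
    # points_from_xy creates one Point geometry per row (x = lon, y = lat).
    geometry = gpd.points_from_xy(df[lon_col], df[lat_col])
    return gpd.GeoDataFrame(df.copy(), geometry=geometry, crs=crs)


# Stand-in data. In the notebook, replace this with the pandas read
# that already works for your Data Lake path.
sample_csv = io.StringIO(
    "name,lon,lat\n"
    "A,-46.63,-23.55\n"
    "B,-43.17,-22.91\n"
)
df = pd.read_csv(sample_csv)

gdf = to_geodataframe(df)
print(gdf.head())
```

If the file stores geometries as WKT text rather than coordinate columns, `geopandas.GeoSeries.from_wkt` can be used to build the geometry column instead of `points_from_xy`.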