GraphFrames

GraphFrames is a package for Apache Spark that provides DataFrame-based graphs, with high-level APIs in Java, Python, and Scala. It aims to offer both the functionality of GraphX and extended functionality that takes advantage of Spark DataFrames, including motif finding, DataFrame-based serialization, and highly expressive graph queries.
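As a rough illustration of what "DataFrame-based graphs" and motif finding look like in the Python API, the following is a minimal sketch rather than a definitive recipe. The toy data, the column names `name` and `relationship`, and the helper functions `build_graph` and `mutual_follows` are hypothetical; the sketch also assumes a SparkSession is already available (as `spark` in a notebook) and that the `graphframes` package is installed on the cluster.

```python
# Hypothetical toy data: three people and who follows whom.
vertices = [("a", "Alice"), ("b", "Bob"), ("c", "Carol")]
edges = [("a", "b", "follows"), ("b", "a", "follows"), ("b", "c", "follows")]


def build_graph(spark):
    """Build a GraphFrame from the toy data using an existing SparkSession."""
    # Imported lazily so the data above can be inspected without GraphFrames installed.
    from graphframes import GraphFrame

    # GraphFrames expects an "id" column on vertices and "src"/"dst" columns on edges.
    v = spark.createDataFrame(vertices, ["id", "name"])
    e = spark.createDataFrame(edges, ["src", "dst", "relationship"])
    return GraphFrame(v, e)


def mutual_follows(graph):
    """Motif query: pairs of vertices connected by edges in both directions."""
    # The result is a DataFrame with one struct column per named element (x, e1, y, e2).
    return graph.find("(x)-[e1]->(y); (y)-[e2]->(x)")
```

In a notebook, `mutual_follows(build_graph(spark)).show()` would display the matches. Because the motif is symmetric, a mutual pair such as Alice and Bob appears once per ordering.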

This article includes three example notebooks: a tutorial notebook available in Python and in Scala, and a Python user guide. For additional examples using GraphFrames with Scala, see GraphFrames user guide - Scala.

Databricks recommends using a cluster running Databricks Runtime for Machine Learning, as it includes an optimized installation of GraphFrames.

If you are not using a cluster running Databricks Runtime ML, download the JAR file from the GraphFrames library, upload it to a volume, and install it on your cluster.

GraphFrames tutorial

The following notebooks show you how to use GraphFrames to perform graph analysis.

Graph Analysis with GraphFrames (Python)


Graph Analysis with GraphFrames (Scala)


GraphFrames user guide (Python)

The following notebook includes Python code examples of how to use GraphFrames.

GraphFrames Python notebook
