Photon runtime
Photon is the native vectorized query engine on Azure Databricks, written to be directly compatible with Apache Spark APIs so it works with your existing code. It is developed in C++ to take advantage of modern hardware, and uses the latest techniques in vectorized query processing to capitalize on data- and instruction-level parallelism in CPUs, enhancing performance on real-world data and applications-—all natively on your data lake. Photon is part of a high-performance runtime that runs your existing SQL and DataFrame API calls faster and reduces your total cost per workload. Photon is used by default in Databricks SQL warehouses.
Azure Databricks clusters
Photon is available for clusters running Databricks Runtime 9.1 LTS and above.
To enable Photon acceleration, select the Use Photon Acceleration checkbox when you create the cluster. If you create the cluster using the clusters API, set runtime_engine
to PHOTON
.
Photon supports a number of instance types on the driver and worker nodes. Photon instance types consume DBUs at a different rate than the same instance type running the non-Photon runtime. For more information about Photon instances and DBU consumption, see the Azure Databricks pricing page.
Photon advantages
- Supports SQL and equivalent DataFrame operations against Delta and Parquet tables.
- Accelerates queries that process a significant amount of data (100GB+) and include aggregations and joins.
- Faster performance when data is accessed repeatedly from the disk cache.
- More robust scan performance on tables with many columns and many small files.
- Faster Delta and Parquet writing using
UPDATE
,DELETE
,MERGE INTO
,INSERT
, andCREATE TABLE AS SELECT
, especially for wide tables (hundreds to thousands of columns). - Replaces sort-merge joins with hash-joins.
Photon coverage
Operators
- Scan, Filter, Project
- Hash Aggregate/Join/Shuffle
- Nested-Loop Join
- Null-Aware Anti Join
- Union, Expand, ScalarSubquery
- Delta/Parquet Write Sink
- Sort
- Window Function
Expressions
- Comparison / Logic
- Arithmetic / Math (most)
- Conditional (IF, CASE, etc.)
- String (common ones)
- Casts
- Aggregates(most common ones)
- Date/Timestamp
Data types
- Byte/Short/Int/Long
- Boolean
- String/Binary
- Decimal
- Float/Double
- Date/Timestamp
- Struct
- Array
- Map
The following table lists supported Azure Databricks expressions and the minimum Databricks Runtime release version that supports it.
Name |
---|
Abs |
Acos |
Add |
AddMonths |
AesDecrypt |
AesEncrypt |
And |
ArrayContains |
ArrayDistinct |
ArrayExcept |
ArrayExists |
ArrayFilter |
ArrayForAll |
ArrayIntersect |
ArrayJoin |
ArraySize |
ArrayTransform |
ArrayUnion |
Atan |
Atan2 |
Average |
Base64 |
Bin |
BitAndAgg |
BitLength |
BitOrAgg |
BitwiseAnd |
BitwiseNot |
BitwiseOr |
BitwiseReverse |
BitwiseXor |
BitXorAgg |
BoundaryAsGeojson |
BoundaryAsWkb |
BoundaryAsWkt |
Cast |
Cbrt |
CeilExpressionBuilder |
CenterAsGeojson |
CenterAsWkb |
CenterAsWkt |
Chr |
Coalesce |
CollectList |
Concat |
ConcatWs |
Conv |
Cos |
Count |
CreateArray |
CreateMap |
CreateNamedStruct |
CreateStruct |
CurrentCatalog |
CurrentDatabase |
CurrentDate |
CurrentTimestamp |
CurrentTimeZone |
CurrentUser |
DateAdd |
DateDiff |
DateFormatClass |
DateFromUnixDate |
DateSub |
DayOfMonth |
DayOfWeek |
DayOfYear |
Decode |
DenseRank |
Divide |
ElementAt |
EqualNullSafe |
EqualTo |
Exp |
Explode |
Extract |
First |
FloorExpressionBuilder |
FromUnixTime |
FromUTCTimestamp* |
Get |
GetJsonObject |
GreaterThan |
GreaterThanOrEqual |
Greatest |
GridDistance |
H3ToString |
Hex |
Hour |
If |
In |
InitCap |
InputFileBlockLength |
InputFileBlockStart |
InputFileName |
InSet |
IntegralDivide |
IsChildOf |
IsNaN |
IsNotNull |
IsNull |
IsPentagon |
IsValid |
JsonToStructs |
Lag |
Last |
LastDay |
Lead |
Least |
Length |
LengthOfJsonArray |
LessThan |
Levenshtein |
Like |
Log |
Log2 |
LongLatAsH3 |
LongLatAsH3String |
Lower |
LPadExpressionBuilder |
MakeDate |
MakeTimestamp |
Max |
MaxChild |
Md5 |
MicrosToTimestamp |
MillisToTimestamp |
Min |
MinChild |
Minute |
MonotonicallyIncreasingID |
Month |
MonthsBetween |
Multiply |
Murmur3Hash |
NaNvl |
NextDay |
Not |
Now |
NthValue |
NTile |
NullIf |
Nvl |
Nvl2 |
OctetLength |
ParseToDate |
ParseToTimestamp |
Percentile |
PercentRank |
Pi |
Pmod |
PosExplode |
Pow |
Quarter |
Rand |
Rank |
RegExpExtract |
RegExpExtractAll |
RegExpReplace |
RegrAvgX |
RegrAvgY |
Remainder |
Resolution |
Reverse |
Reverse |
RLike |
Round |
RowNumber |
RPadExpressionBuilder |
Second |
SecondsToTimestamp |
Sha1 |
Sha2 |
ShiftLeft |
ShiftRight |
ShiftRightUnsigned |
Sin |
Size |
Slice |
SoundEx |
SparkVersion |
Sqrt |
StddevPop |
StddevSamp |
StringInstr |
StringLocate |
StringRepeat |
StringSpace |
StringSplit |
StringToH3 |
StringTranslate |
StringTrim |
StringTrimBoth |
StringTrimLeft |
StringTrimRight |
StructsToJson |
Substring |
Subtract |
Sum |
Tan |
ToChildren |
ToParent |
ToRadians |
ToUnixTimestamp |
ToUTCTimestamp |
TruncDate |
TruncTimestamp |
TryElementAt |
TryValidate |
UnaryMinus |
UnBase64 |
Unhex |
UnixDate |
UnixMicros |
UnixMillis |
UnixSeconds |
UnixTimestamp |
Upper |
Uuid |
Validate |
VarianceSamp |
WeekDay |
WeekOfYear |
XxHash64 |
Year |
*from_utc_timestamp is not fully supported by Photon. See from_utc_timestamp for more information. |
Limitations
- Structured Streaming: Photon currently supports stateless streaming with Delta, Parquet, and CSV. Kafka and Kinesis support is in Public Preview
- Does not support UDFs.
- Does not support RDD APIs.
- Not expected to improve short-running queries (<2 seconds), for example, queries against small amounts of data.
Features not supported by Photon run the same way they would with Databricks Runtime; there is no performance advantage for those features.
Feedback
Submit and view feedback for