Hi,
Thanks for reaching out to Microsoft Q&A.
This is a known limitation rather than a bug in your code. In serverless Spark environments (especially Databricks Serverless), typed Dataset operations (`map`, `filter` with case classes, encoders) rely on JVM-level serialization: encoders, closures, and bytecode generation. Serverless isolates execution and restricts parts of the JVM execution model, so these encoder-based transformations often fail at runtime, especially in structured streaming.
Why your tests behave this way:
- Test 1 (DataFrame) -> works because it uses Catalyst + Tungsten (no JVM object encoding)
- Test 2 (`as[Type]` only) -> works because no transformation is executed yet
- Tests 3 & 4 (`filter`, `map`) -> fail because they trigger encoder-based execution plus closure serialization, which is not fully supported in serverless
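To make the distinction concrete, here is a minimal sketch of what the four tests likely look like. The `Event` case class, the `rate` source, and the column names are assumptions for illustration, not your actual code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

// Hypothetical payload type standing in for your case class.
case class Event(id: Long, name: String)

object TypedVsUntyped {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("typed-vs-untyped").getOrCreate()
    import spark.implicits._

    // Untyped streaming DataFrame (the `rate` source is just a stand-in).
    val df = spark.readStream.format("rate").load()

    // Test 1: pure DataFrame API -- planned by Catalyst, runs on Tungsten rows,
    // no JVM objects are ever materialized.
    val t1 = df.select($"value".as("id")).where($"value" > 0)

    // Test 2: as[Type] alone -- only attaches an encoder to the plan lazily;
    // nothing executes yet, so nothing fails yet.
    val t2 = df.select($"value".as("id"), lit("x").as("name")).as[Event]

    // Test 3: typed filter -- serializes the closure and decodes each row
    // into an Event object; this is the path serverless restricts.
    val t3 = t2.filter(e => e.id % 2 == 0)

    // Test 4: typed map -- same encoder + closure-serialization path.
    val t4 = t2.map(e => e.copy(name = e.name.toUpperCase))
  }
}
```

Tests 1 and 2 never leave Catalyst's internal row format; Tests 3 and 4 are the first points where real JVM objects and serialized closures are required, which is why the failure only shows up there.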
The exception you see is typically a wrapper hiding the real issue (an unsupported encoder or serialization path).
Bottom line: Typed Datasets are not reliably supported in serverless streaming jobs. This is by design in current implementations.
What you should do:
- Stick to DataFrame APIs (`select`, `withColumn`, `where`) for serverless
- Avoid `map`, `flatMap`, and strongly-typed `filter`
- If you need typed logic, either:
  - switch to standard (non-serverless) clusters, or
  - rewrite the logic using SQL/DataFrame expressions
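As a sketch of that last recommendation, here is the same piece of logic expressed both ways. The `orders` table, its columns, and the business rule are all hypothetical; the point is the shape of the rewrite:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical typed view of the data.
case class Order(id: Long, amount: Double)

object TypedToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("typed-to-dataframe").getOrCreate()
    import spark.implicits._

    // Hypothetical streaming source.
    val orders = spark.readStream.table("orders")

    // Typed version -- may fail at runtime on serverless:
    // orders.as[Order]
    //   .filter(o => o.amount > 100)
    //   .map(o => o.copy(amount = o.amount * 1.1))

    // DataFrame equivalent -- pure Catalyst expressions, no encoders
    // or closure serialization involved:
    val safe = orders
      .where($"amount" > 100)
      .withColumn("amount", $"amount" * 1.1)
  }
}
```

The rewrite is mechanical in most cases: lambdas over case-class fields become `Column` expressions over the same column names, and the plan stays entirely inside Catalyst.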
Architectural takeaway (important for you as a data architect): serverless Spark is optimised for declarative transformations, not JVM-level functional transformations. Treat it as closer to a SQL engine plus distributed optimizer than a full Scala runtime.
Please 'Upvote' (Thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.