From the article:
[
  {"classification": "spark", "properties": {"maximizeResourceAllocation": "true"}},
  {"classification": "spark-defaults", "properties": {"spark.network.timeout": "1500"}},
  {"classification": "hdfs-site", "properties": {"dfs.replication": "2"}},
  {"classification": "livy-conf", "properties": {"livy.server.session.timeout": "10h"}},
  {"classification": "emrfs-site", "properties": {"fs.s3.maxConnections": "100"}}
]
explanation:
"maximizeResourceAllocation":"true" -- Configures your executors to use the maximum resources available on each node in the cluster. This EMR-specific option calculates the maximum compute and memory resources available to an executor on an instance in the core instance group, then sets the corresponding spark-defaults settings accordingly. [reference]

"livy.server.session.timeout":"10h" -- Addresses errors from long-running Spark tasks in a Jupyter/EMR notebook that die after an hour of execution:

An error was encountered:
Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active." [reference]

"fs.s3.maxConnections":"100" -- Addresses the "Timeout waiting for connection from pool" error. [reference]
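One way to keep a configuration like this maintainable is to build it in Python and serialize it to a file that can then be attached at cluster creation (for example via `aws emr create-cluster --configurations file://...`). A minimal sketch, mirroring the article's JSON exactly; the filename `configurations.json` is an assumption:

```python
import json

# Classification -> properties mapping, taken verbatim from the article's JSON.
settings = {
    "spark": {"maximizeResourceAllocation": "true"},
    "spark-defaults": {"spark.network.timeout": "1500"},
    "hdfs-site": {"dfs.replication": "2"},
    "livy-conf": {"livy.server.session.timeout": "10h"},
    "emrfs-site": {"fs.s3.maxConnections": "100"},
}

# Expand into the list-of-objects shape shown above.
configurations = [
    {"classification": name, "properties": props}
    for name, props in settings.items()
]

# Write to a file for use at cluster-creation time (filename is illustrative).
with open("configurations.json", "w") as f:
    json.dump(configurations, f, indent=2)
```

Building the list from a dict keeps each classification's properties in one place and makes it harder to introduce a duplicate or malformed entry than editing the raw JSON by hand.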