When using a persistent Curl handle (the handle is not opened and closed on every https request) with TLS, the Curl handle can enter a state where the Sphere OS will open a TCP connection with the remote server and then the Sphere OS will immediately reset the TCP connection without sending any data. Once in this failure mode this sequence repeats every time a https request is made using the Curl handle. When verbose curl output is enabled the following error is reported on each request attempt:
- SSL: SSL_set_session failed: unknown error number
When the Curl handle first enters this failure mode the Sphere OS sends a TLS Rec Layer-1 Encrypted Alert to the server and the server responds with an Encrypted Alert.
If the Curl handle is closed and reopened the https request works immediately and correctly until the next event triggers the failure mode.
Based on observation it seems that this failure mode tends to occur when background scanning takes place, probably because there is a reasonable probability of packet loss. See:
https://learn.microsoft.com/en-us/answers/questions/234972/azure-sphere-os-80211-null-data-frames-and-wifi-ba.html?childToView=236318#comment-236318
- The TLS connection has a problem, perhaps due to a missed frame while the Sphere OS is background scanning, and sends one TLS Encrypted Alert to the server.
- After the TLS Encrypted Alert is sent any https requests on the curl handle will open a TCP connection with the server and then the Sphere OS immediately resets the TCP connection without sending any data to the server. The opening of the TCP connection and subsequent reset occurs on every subsequent request.
- Closing and reopening the Curl handle allows the connection to the server to be made immediately and correctly until the failure mode occurs again.
I have monitored Application memory usage and there are no memory leaks occurring in the Application code.
The failure mode is generally reproducible every 1000-200000 https requests.
There seems likely a bug in the Curl/WolfSSL integration that causes this persistent failure mode.
Although closing and reopening the Curl handle appears to be a workaround I am concerned that there may be other resource leaks in the Sphere OS layer that would eventually cause the workaround to fail to recover successfully.
I believe that packet loss triggers this problem, so it would probably occur in challenging RF environment even without packet loss caused by background scanning.