Azure Sphere OS Persistent Curl Handle using TLS Enters Failure Mode That Keeps Resetting Its Own TCP Connection

James Higgins 336 Reputation points
2021-01-28T13:59:41.233+00:00

When using a persistent Curl handle (the handle is not opened and closed on every https request) with TLS, the Curl handle can enter a state where the Sphere OS will open a TCP connection with the remote server and then the Sphere OS will immediately reset the TCP connection without sending any data. Once in this failure mode this sequence repeats every time a https request is made using the Curl handle. When verbose curl output is enabled the following error is reported on each request attempt:

  • SSL: SSL_set_session failed: unknown error number

When the Curl handle first enters this failure mode the Sphere OS sends a TLS Rec Layer-1 Encrypted Alert to the server and the server responds with an Encrypted Alert.

If the Curl handle is closed and reopened the https request works immediately and correctly until the next event triggers the failure mode.

Based on observation it seems that this failure mode tends to occur when background scanning takes place, probably because there is a reasonable probability of packet loss. See:
https://learn.microsoft.com/en-us/answers/questions/234972/azure-sphere-os-80211-null-data-frames-and-wifi-ba.html?childToView=236318#comment-236318

  1. The TLS connection has a problem, perhaps due to a missed frame while the Sphere OS is background scanning, and sends one TLS Encrypted Alert to the server.
  2. After the TLS Encrypted Alert is sent any https requests on the curl handle will open a TCP connection with the server and then the Sphere OS immediately resets the TCP connection without sending any data to the server. The opening of the TCP connection and subsequent reset occurs on every subsequent request.
  3. Closing and reopening the Curl handle allows the connection to the server to be made immediately and correctly until the failure mode occurs again.

I have monitored Application memory usage and there are no memory leaks occurring in the Application code.
The failure mode is generally reproducible every 1000-200000 https requests.

There seems likely a bug in the Curl/WolfSSL integration that causes this persistent failure mode.

Although closing and reopening the Curl handle appears to be a workaround I am concerned that there may be other resource leaks in the Sphere OS layer that would eventually cause the workaround to fail to recover successfully.

I believe that packet loss triggers this problem, so it would probably occur in challenging RF environment even without packet loss caused by background scanning.

Azure Sphere
Azure Sphere
An Azure internet of things security solution including hardware, operating system, and cloud components.
166 questions
{count} votes

1 answer

Sort by: Most helpful
  1. António Sérgio Azevedo 7,666 Reputation points Microsoft Employee
    2021-03-11T23:34:10.52+00:00

    @James Higgins thank you so much for your great questions and follow-up offline. Sharing with broader community what we concluded.

    We are not aware of any issue with workaround described. We also use a similar model in the OS where we cleanup/re-init a curl handle. To our knowledge, there is no leak in libcurl or the OS with this pattern. One thing to be aware of, is that libcurl by default will try to reuse an established TCP connection as long as it has not been idle for more than 118 secs. See CURLOPT_MAXAGE_CONN . It would also be good to confirm that the appropriate libcurl cleanup and init APIs (eg. curl_easy_cleanup/curl_easy_init) are being used for successive connections.

    **Update May 13 2021**
    Azure Sphere Product Team is working on a fix so we don't need to cleanup/re-init a curl handle. We will prioritize this in the future releases and update this thread when the fix is done.