1.3 Overview

This protocol is used to establish media flow between a callee endpoint and a caller endpoint. In typical deployments, a network address translation (NAT) device or firewall might exist between the two endpoints that are intended to communicate. NATs and firewalls are deployed to provide private address space and to secure the private networks to which the endpoints belong. This type of deployment blocks incoming traffic. If the endpoint advertises its local interface address, the remote endpoint might not be able to reach it.

The endpoints therefore need to determine the externally routable address that the NAT maps to their local interface address, known as the NAT-mapped address. Moreover, NATs and firewalls differ in the way they create NAT-mapped addresses. Interactive Connectivity Establishment (ICE) provides a generic mechanism to assist media in traversing NATs and firewalls without requiring the endpoints to be aware of their network topologies. It does so by gathering one or more transport addresses that the two endpoints can potentially use to communicate, and then determining which transport address is best for both endpoints to use to establish a media session.

The following figure shows a typical deployment scenario with two endpoints that establish a media session.

Figure 1: ICE deployment scenario

ICE requires a communication channel through which the endpoints can exchange messages, such as Session Description Protocol (SDP) messages, using a signaling protocol such as Session Initiation Protocol (SIP). ICE assumes that such a channel exists and is not intended to provide NAT traversal for the signaling protocol itself. ICE is often deployed in conjunction with Simple Traversal of UDP through NAT (STUN) and Traversal Using Relay NAT (TURN) servers. The endpoints can share the same STUN and TURN servers or use different servers.

The sequence diagram in the following figure outlines the various phases involved in establishing a session between two endpoints using this protocol. These phases are:

  1. Candidates gathering and the exchange of gathered transport addresses between the caller and callee endpoints.

  2. Connectivity checks.

  3. The exchange of candidates selected by the connectivity checks.

Figure 2: ICE sequence diagram

During the candidates gathering phase, the caller attempts to establish a media session and gathers transport addresses that can potentially be used to communicate with its peer. These potential transport addresses include:

  • Transport addresses obtained by binding to attached network interfaces, including both physical interfaces and virtual interfaces such as those of a virtual private network (VPN). Each such address is a Host Candidate.

  • Transport addresses that are mappings on the public side of a NAT. Each such address is a Server Reflexive Candidate.

  • Transport addresses allocated from a TURN server. Each such address is a Relayed Candidate.

The gathered transport addresses are used to form candidates. A candidate is a set of transport addresses that can potentially be used for media flow. For example, in the case of real-time media flow using Real-Time Transport Protocol (RTP), each candidate consists of two components, one for RTP and another for Real-Time Transport Control Protocol (RTCP).

Each gathered candidate is assigned a foundation and a priority value based on how it was obtained. The priority indicates the preference of an endpoint to use one candidate over another if both candidates are reachable from the peer. Candidates obtained from local network interfaces are often given a higher priority than candidates obtained from TURN servers. The foundation is a string associated with each candidate. Two candidates have the same foundation only if they are of the same type, have the same base, and are derived from the same STUN or TURN server. The candidate types are Host Candidates, Server Reflexive Candidates, Relayed Candidates, and peer-derived candidates. The endpoint also designates one of the gathered candidates as the default candidate based on local policy.
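The following Python sketch illustrates how a priority and a foundation could be derived for a candidate. It is not taken from this document: the priority formula and the type preference values are assumed from the recommendations in RFC 5245, and the Candidate class, its field names, and the "|"-separated foundation string are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass

# Type preferences as recommended by RFC 5245 (an assumption; this
# document does not state numeric values).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass(frozen=True)
class Candidate:
    kind: str                      # "host", "srflx", "relay" or "prflx"
    address: str                   # candidate IP address
    port: int
    base: str                      # local address the candidate was derived from
    server: str = ""               # STUN or TURN server used; empty for host candidates
    component: int = 1             # 1 = RTP, 2 = RTCP
    local_preference: int = 65535  # hypothetical default, single-interface case

    @property
    def priority(self) -> int:
        # RFC 5245 formula (assumed):
        # 2^24 * type preference + 2^8 * local preference + (256 - component ID)
        return ((TYPE_PREFERENCE[self.kind] << 24)
                + (self.local_preference << 8)
                + (256 - self.component))

    @property
    def foundation(self) -> str:
        # Same type, same base, and same server give the same foundation.
        return f"{self.kind}|{self.base}|{self.server}"


host = Candidate("host", "10.0.0.5", 50000, base="10.0.0.5")
srflx = Candidate("srflx", "198.51.100.7", 61000, base="10.0.0.5",
                  server="stun.example.net")
relay = Candidate("relay", "203.0.113.9", 62000, base="10.0.0.5",
                  server="turn.example.net")
```

Under these assumptions the host candidate ranks above the server reflexive candidate, which in turn ranks above the relayed candidate, matching the ordering described above.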

The gathered candidates are then sent to the peer in the offer. The offer can be encoded into an SDP offer and exchanged over a signaling protocol such as SIP. The caller endpoint serves as the controlling agent and is responsible for selecting the final candidates for media flow.

The callee, after receiving the offer, follows the same procedure to gather its candidates. The gathered candidates are encoded and sent to the caller in the answer. With the exchange of candidates complete, both the endpoints are now aware of their peer's candidates.

The connectivity checks phase starts at an endpoint as soon as it is aware of its peer's candidates. Both endpoints pair up the local candidates and remote candidates to form a Check List of candidate pairs, ordered by candidate pair priority. Each candidate pair consists of constituent component pairs, each of which has the same foundation as the candidate pair. In the case of RTP, each candidate pair has an RTP component pair and an RTCP component pair. The candidate pair priorities are computed from the priorities of the local candidate and the remote candidate so that both endpoints arrive at the same ordering of candidate pairs.

The foundation of a candidate pair is the concatenation of the foundations of its local candidate and its remote candidate. Candidate pairs with the same foundation have similar network properties, and this is leveraged to reduce the number of connectivity checks: if connectivity checks for a component pair fail, it is very likely that connectivity checks for other component pairs with the same foundation will also fail. Each endpoint goes through the candidate pair Check List and sets the state of the higher component pair, or the RTCP component pair, to a frozen state. If more than one candidate pair has the same foundation, all of them except the highest priority candidate pair are set to a frozen state. When the connectivity check for a component pair succeeds, all component pairs with the same foundation are unfrozen.

The callee serves as the controlled agent and waits for the controlling agent to select the final candidate pair for media flow.
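A minimal sketch of Check List construction follows. The pair priority formula is assumed from RFC 5245, where G is the priority of the controlling agent's candidate and D is that of the controlled agent's candidate. The Pair tuple, the build_check_list function, and the "waiting" state name are hypothetical. The sketch treats each pair as a single component and therefore omits the freezing of RTCP component pairs.

```python
from collections import namedtuple

# Hypothetical record for one candidate pair.
Pair = namedtuple(
    "Pair", "local_foundation remote_foundation controlling_prio controlled_prio")


def pair_priority(g: int, d: int) -> int:
    # RFC 5245 formula (assumed): 2^32 * MIN(G, D) + 2 * MAX(G, D) + (G > D ? 1 : 0)
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def build_check_list(pairs):
    """Order pairs by priority and freeze all but the best pair per foundation."""
    ordered = sorted(
        pairs,
        key=lambda p: pair_priority(p.controlling_prio, p.controlled_prio),
        reverse=True,
    )
    seen = set()
    check_list = []
    for p in ordered:
        # Pair foundation: concatenation of local and remote foundations.
        foundation = p.local_foundation + p.remote_foundation
        if foundation in seen:
            check_list.append((p, "frozen"))
        else:
            seen.add(foundation)
            check_list.append((p, "waiting"))
    return check_list
```

Because both endpoints feed the same two priorities into the same formula, with G always taken from the controlling side, they compute identical pair priorities and therefore identical orderings.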

Both endpoints systematically perform connectivity checks, starting from the top of the candidate pair Check List, to determine the highest priority candidate pair that the endpoints can use to establish a media session. Connectivity checks involve sending peer-to-peer STUN binding request messages and responses from the local transport addresses to the remote transport addresses of each candidate pair in the list. Once a STUN binding request message is received and it generates a successful STUN binding response message for a component pair, the component pair is considered to be in the successful state.

The endpoints can begin streaming media from the local default candidate to the remote default candidate after the exchange of candidates is finished, even before the default candidate pair is validated by connectivity checks, but there is no guarantee that the media will reach the peer during this time.

The connectivity checks for the candidate pairs are spaced at regular intervals to avoid flooding the network. These connectivity checks, sent periodically to validate the candidate pairs, are called Ordinary Checks. In addition, to optimize the connectivity checks, an endpoint that receives a STUN binding request for a candidate pair immediately schedules a connectivity check for that candidate pair. These connectivity checks are called Triggered Checks. Depending on the topology, many of the candidate pairs might fail connectivity checks. For example, in the topology illustrated in the preceding figure titled "ICE deployment scenario", the transport addresses obtained from the local network interfaces cannot be used directly to establish a connection, because both endpoints are behind NATs.
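The interplay between the two kinds of checks can be sketched as a small scheduler. This is an illustration only: the CheckScheduler class and its method names are hypothetical, and serving the triggered queue before the ordinary list at each pacing interval is an assumption borrowed from RFC 5245 rather than a statement of this protocol's exact behavior.

```python
from collections import deque


class CheckScheduler:
    """Hypothetical scheduler: one check is released per pacing interval."""

    def __init__(self, check_list):
        self.ordinary = deque(check_list)  # pairs already ordered by priority
        self.triggered = deque()           # pairs for which a request arrived

    def on_binding_request(self, pair):
        # A received STUN binding request schedules a check on the same pair.
        if pair not in self.triggered:
            self.triggered.append(pair)

    def next_check(self):
        """Called once per pacing interval; returns (kind, pair) or None."""
        if self.triggered:
            return ("triggered", self.triggered.popleft())
        if self.ordinary:
            return ("ordinary", self.ordinary.popleft())
        return None


scheduler = CheckScheduler(["pair-1", "pair-2", "pair-3"])
first = scheduler.next_check()          # ordinary check on the top pair
scheduler.on_binding_request("pair-3")  # peer probed pair-3
second = scheduler.next_check()         # triggered check jumps the queue
```

A fuller implementation would also track per-pair state so that a pair already checked through a triggered check is not checked again as an ordinary check. The sketch leaves this out.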

The endpoints can also discover new candidates during the connectivity check phase. This can happen in either of two scenarios:

  • The STUN binding request message is received from a transport address that does not match any of the remote candidates.

  • The STUN binding response message has a mapped address that does not match the transport address of any of the local candidates.

These scenarios arise if new external mappings are created by the NATs residing between the endpoints. Connectivity checks are sent out on candidate pairs formed using these newly created candidates. These candidates can potentially be used for media flow as well.
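The two discovery conditions above can be expressed as simple membership tests. In the sketch below, transport addresses are modeled as (IP address, port) tuples, and both function names are hypothetical. Building the new candidate and pairing it for further checks is left out.

```python
def discover_from_request(source_addr, remote_candidates):
    """Return the source address of a binding request if it matches no
    known remote candidate, otherwise None."""
    return None if source_addr in remote_candidates else source_addr


def discover_from_response(mapped_addr, local_candidates):
    """Return the mapped address from a binding response if it matches no
    known local candidate, otherwise None."""
    return None if mapped_addr in local_candidates else mapped_addr


# Example addresses from documentation ranges; purely illustrative.
remote_candidates = {("198.51.100.7", 61000)}
local_candidates = {("10.0.0.5", 50000), ("203.0.113.20", 40000)}

new_remote = discover_from_request(("198.51.100.7", 61555), remote_candidates)
new_local = discover_from_response(("203.0.113.20", 40000), local_candidates)
```

Here the request arrives from a port that the peer never advertised, so a new remote candidate is discovered, while the mapped address in the response is already a known local candidate and yields nothing new.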

The controlling agent concludes the connectivity checks by nominating a valid candidate pair found by the connectivity checks for media flow. The controlling agent can follow either Regular Nomination or Aggressive Nomination to nominate the validated candidate pairs. If the controlling agent is following Regular Nomination, it allows connectivity checks to continue until at least one valid candidate pair has been found. At the end of the connectivity checks, the controlling agent picks the best valid candidate pair from the Valid List and sends another round of STUN binding requests for this candidate pair with a flag set to notify the peer that this candidate pair has been nominated for media flow.

In the case of Aggressive Nomination, the controlling agent sets this flag on every STUN binding request. With Aggressive Nomination, the ICE processing completes when connectivity checks succeed for the first candidate pair, and the controlling agent does not have to send a second STUN binding request to nominate the candidate pair. Aggressive Nomination is faster than Regular Nomination but does not always select the optimal path that has the lowest latency.

At the end of the connectivity checks phase, the controlling agent sends a final offer with only the best local and remote candidate selected during the connectivity checks phase. The peer acknowledges the final offer with an answer, and both endpoints begin using the selected candidate pair for media flow.
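The difference in outcome between the two nomination modes can be illustrated with a short sketch. The nominate function and the shape of its input are hypothetical, and the STUN messaging itself is not modeled. Only the selection rule described above is shown: Aggressive Nomination settles on the first pair whose check succeeds, whereas Regular Nomination picks the highest priority pair among all valid pairs.

```python
def nominate(results, aggressive):
    """results: (pair_name, pair_priority, succeeded) tuples, listed in the
    order in which the connectivity checks completed."""
    valid = [(name, prio) for name, prio, ok in results if ok]
    if not valid:
        return None  # no valid pair was found
    if aggressive:
        # The first successful check carries the nomination flag already.
        return valid[0][0]
    # Regular Nomination: best pair from the Valid List.
    return max(valid, key=lambda v: v[1])[0]


# Illustrative check results; names and priorities are made up.
results = [
    ("relay-relay", 10, True),
    ("host-host", 300, False),
    ("srflx-srflx", 200, True),
]
aggressive_choice = nominate(results, aggressive=True)
regular_choice = nominate(results, aggressive=False)
```

In this example the relayed pair happens to succeed first, so Aggressive Nomination selects it, while Regular Nomination selects the higher priority server reflexive pair.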