1.3 Overview

This protocol is used to establish media flow between a caller endpoint and a callee endpoint. In typical deployments, network address translators (NATs) or firewalls exist between the two endpoints that are intended to communicate. NATs and firewalls are deployed to provide private address space and to secure the private networks to which the endpoints belong. This type of deployment blocks incoming traffic. If the endpoint advertises its local interface address, the remote endpoint might not be able to reach it.

Advertising the address exposed by the NAT or firewall is not straightforward either, because each endpoint needs to determine the externally routable address that the NAT maps to its local interface address, which is called a NAT-mapped address. Moreover, NATs and firewalls differ in the way they create NAT-mapped addresses. For more information about NAT types, see [IETFDRAFT-STUN-02] section 5. ICE provides a generic mechanism to assist media in traversing NATs and firewalls without requiring the endpoints to be aware of their network topologies. It does so by gathering one or more transport addresses that the two endpoints can potentially use to communicate, and then determining which transport address is best for both endpoints to use to establish a media session.

The following figure shows a typical deployment scenario with two endpoints that establish a media session.

Figure 1: ICE deployment scenario

To facilitate ICE, the endpoints need a communication channel over a signaling protocol, such as Session Initiation Protocol (SIP), through which they can exchange messages. The messages can be encoded using, for example, Session Description Protocol (SDP), as described in [RFC3264]. ICE assumes that such a channel exists; it is not intended to provide NAT traversal for the signaling protocols themselves. ICE is typically deployed in conjunction with Simple Traversal of UDP through NAT (STUN) and Traversal Using Relay NAT (TURN) servers. The endpoints can share the same STUN and TURN servers or use different servers. For more information, see [IETFDRAFT-STUN-02] and [MS-TURN].

The sequence diagram in the following figure outlines the various phases involved in establishing a session between two endpoints using this protocol. These phases are:

  1. The candidates gathering phase.

  2. The exchange of gathered transport addresses between the caller and callee endpoints.

  3. The connectivity checks phase.

  4. The exchange of candidates selected by the connectivity checks phase.

Figure 2: ICE sequence diagram

During the candidates gathering phase, the caller attempts to establish a media session and gathers transport addresses that can potentially be used to communicate with its peer. These potential transport addresses include:

  • Transport addresses obtained by binding to attached network interfaces, which are also called "local" transport addresses. These include both physical interfaces and virtual interfaces, such as virtual private network (VPN) interfaces.

  • Transport addresses that are mappings on the public side of a NAT, which are also called STUN-derived transport addresses.

  • Transport addresses allocated from a TURN server, which are also called TURN-derived transport addresses.

The gathered transport addresses are used to form candidates. A candidate is a set of transport addresses that can be potentially used for media flow. For example, in the case of real-time media flow using Real-Time Transport Protocol (RTP), each candidate consists of two transport addresses, one for RTP and another for Real-Time Transport Control Protocol (RTCP). Each gathered candidate is assigned a unique identifier, called the candidate identifier, and a priority value based on how it was obtained. This priority indicates the preference of an endpoint to use one candidate over another, if both candidates are reachable from the peer. Typically, candidates obtained from local network interfaces are given a higher priority than the candidates obtained from TURN servers. The endpoint also designates one of the gathered candidates as the default candidate, based on local policy. The gathered candidates are then sent to the peer in the offer. The offer is typically encoded into an SDP message and exchanged over a signaling protocol such as SIP.
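The candidate structure described above can be sketched as follows. This is a minimal illustration, not the protocol's wire format: the class names, the numeric preference values, and the `choose_default` policy are all assumptions made for the example. The text only states that local candidates are typically preferred over TURN-derived ones and that the default candidate is chosen by local policy.

```python
from dataclasses import dataclass
from enum import Enum


class CandidateType(Enum):
    LOCAL = "local"
    STUN_DERIVED = "stun-derived"
    TURN_DERIVED = "turn-derived"


# Hypothetical preference values (higher means more preferred).
# The only ordering taken from the text is LOCAL > TURN_DERIVED.
TYPE_PREFERENCE = {
    CandidateType.LOCAL: 3,
    CandidateType.STUN_DERIVED: 2,
    CandidateType.TURN_DERIVED: 1,
}


@dataclass(frozen=True)
class TransportAddress:
    ip: str
    port: int


@dataclass(frozen=True)
class Candidate:
    candidate_id: str        # unique identifier for this candidate
    kind: CandidateType      # how the transport addresses were obtained
    rtp: TransportAddress    # transport address used for RTP
    rtcp: TransportAddress   # transport address used for RTCP

    @property
    def priority(self) -> int:
        return TYPE_PREFERENCE[self.kind]


def choose_default(candidates):
    """Illustrative local policy (an assumption): pick the lowest-priority
    candidate, on the theory that a relayed address is the most likely to
    be reachable before connectivity checks have run."""
    return min(candidates, key=lambda c: c.priority)
```

A real implementation would derive priorities and the default candidate from its own policy; the values here exist only to make the ordering concrete.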

The callee, after receiving the offer, follows the same procedure and gathers its candidates. The gathered candidates are encoded and sent to the caller in the answer. With the exchange of transport addresses complete, both endpoints are now aware of their peer's transport addresses. The connectivity checks phase starts at an endpoint when it becomes aware of its peer's candidates. Both endpoints pair up the local and remote candidates to form a list of candidate pairs that is ordered based on the priorities of the candidates. The candidate pair that consists of the default local candidate and the default remote candidate is designated as the default candidate pair. The default candidate pair is moved to the top of the candidate pair Check List.
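The pairing and ordering step can be sketched as follows. The text says only that pairs are ordered by candidate priority and that the default pair is moved to the top; ordering by the sum of the two priorities is an assumption chosen for simplicity, and the `Candidate` class and `build_check_list` function are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    priority: int
    is_default: bool = False


def build_check_list(local, remote):
    """Pair every local candidate with every remote candidate, order the
    pairs by priority, then move the default pair to the top."""
    pairs = list(product(local, remote))

    # Assumed ordering rule: highest combined priority first.
    # list.sort is stable, so ties keep their pairing order.
    pairs.sort(key=lambda p: p[0].priority + p[1].priority, reverse=True)

    def is_default_pair(pair):
        return pair[0].is_default and pair[1].is_default

    default = [p for p in pairs if is_default_pair(p)]
    others = [p for p in pairs if not is_default_pair(p)]
    return default + others
```

With one host and one relayed candidate on each side, where the relayed candidates are the defaults, the relay-to-relay pair is checked first even though it has the lowest combined priority.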

Both endpoints systematically perform connectivity checks, starting from the top of the candidate pair list, to determine the highest priority candidate pair that the endpoints can use to establish a media session. Connectivity checks involve sending peer-to-peer STUN binding request messages from the local transport addresses to the remote transport addresses of each candidate pair in the list, and receiving the corresponding STUN binding response messages. When an endpoint receives a STUN binding request message for a candidate pair and generates a successful STUN binding response message, that candidate pair is considered valid for sending. When an endpoint receives a successful STUN binding response message for a STUN binding request message that it sent for a candidate pair, that candidate pair is considered valid for receiving. A candidate pair is considered valid if it is both valid for sending and valid for receiving. The endpoints can start streaming media from the local default candidate to the remote default candidate after the exchange of candidates is finished, even before the default candidate pair is validated by connectivity checks, but there is no guarantee that the media will reach the peer during this time.
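The validity rule for a candidate pair can be sketched as a small state record. This is an illustration under stated assumptions: `PairState`, its method names, and `first_valid` are hypothetical, and no actual STUN messages are sent.

```python
from dataclasses import dataclass


@dataclass
class PairState:
    valid_for_sending: bool = False
    valid_for_receiving: bool = False

    def on_request_answered(self):
        """A binding request arrived on this pair and a successful
        binding response was generated for it."""
        self.valid_for_sending = True

    def on_response_received(self):
        """A successful binding response arrived for a binding request
        that this endpoint sent on this pair."""
        self.valid_for_receiving = True

    @property
    def is_valid(self):
        # A pair is valid only when both directions have been confirmed.
        return self.valid_for_sending and self.valid_for_receiving


def first_valid(check_list):
    """Return the name of the first valid pair in check-list order,
    or None if no pair has been validated yet.

    `check_list` is a list of (name, PairState) tuples."""
    for name, state in check_list:
        if state.is_valid:
            return name
    return None
```

Keeping the two flags separate mirrors the text: a pair that has only answered a request, or only received a response, is not yet usable as a validated pair.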

The connectivity checks for the transport address pairs are spaced at regular intervals to avoid flooding the network. Depending on the topology, many of the possible candidate pairs might fail connectivity checks. For example, in the topology illustrated in the preceding figure titled "ICE deployment scenario," the transport addresses obtained from the local network interfaces cannot be used directly to establish a connection because both endpoints are behind NATs.

The endpoints can also discover new candidates during the connectivity check phase. This can happen in either of two scenarios:

  • The STUN binding request message is received from a new transport address.

  • The STUN binding response message indicates that the corresponding request was received from a new mapped transport address.

These scenarios arise if new external mappings are created by the NATs residing between the endpoints. Connectivity checks are sent out on candidate pairs formed using these newly discovered candidates. These candidates can potentially be used for media flow as well. At the end of the connectivity checks phase, the caller sends a final offer that contains only the best local and remote candidates selected during the connectivity checks phase. The peer acknowledges the final offer with an answer, and both endpoints start using the selected transport addresses for sending media.
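The discovery of new candidates during connectivity checks can be sketched as follows. This is a simplified illustration: transport addresses are modeled as `(ip, port)` tuples, the function names are hypothetical, and the addresses come from documentation ranges.

```python
def learn_remote_address(known_remote, source):
    """Record `source` as a new remote transport address if it has not
    been seen before. Returns True when the address was new.

    `known_remote` is a set of (ip, port) tuples; `source` is the
    (ip, port) from which a binding request was received."""
    if source in known_remote:
        return False
    known_remote.add(source)
    return True


def pairs_for_new_address(local_addresses, source):
    """Form the additional candidate pairs to check for a newly
    discovered remote address."""
    return [(local, source) for local in local_addresses]


local_addresses = [("192.0.2.10", 50000)]
known_remote = {("198.51.100.5", 60000)}

# A binding request arrives from an address that was not in the offer.
new_source = ("203.0.113.7", 61000)
if learn_remote_address(known_remote, new_source):
    extra_pairs = pairs_for_new_address(local_addresses, new_source)
```

The extra pairs would then go through the same connectivity checks as the pairs formed from the originally exchanged candidates.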