Office 365, latency and GEO-DNS or (The Unexpected Virtue of Cloud Resiliency Knowledge)
By: Caio Ribeiro César and Robson Elias da Silva
Latency can show up in different scenarios: sending emails in Outlook, attaching files in OWA, bulk actions performed by the administrator in the EAC, and so on. The root cause is not always the one we describe below, so always start your troubleshooting by understanding your scenario (different users, networks, computers).
The villain is well known, and the solution is simple. Before we jump to the solution, let’s discuss what customers, partners and sysadmins need to know about Cloud, DNS, roundtrip and latency.
First, the cloud.
In Exchange Online, you can inspect the tenant structure with the “Get-OrganizationConfig” cmdlet.
Using the table below, we can match each tenant to a location:
- Tenant caiocbr15.onmicrosoft.com was created in the North America structure (NAMPR), OriginatingServer is located in “CY1”: Cheyenne, US;
- Tenant c4iocesar.onmicrosoft.com was created in the Latin America structure (LAMPR), OriginatingServer is located in “BLU”: Virginia, US.
Datacenter | Region | Location |
CP1 | LAM | Brazil |
GRU | LAM | Brazil |
GRX | LAM | Brazil |
HKN | APC | Hong Kong |
HKX | APC | Hong Kong |
HK2 | APC | Hong Kong |
SIX | APC | Singapore |
SIN | APC | Singapore |
SG2 | APC | Singapore |
KAW | JPN | Japan |
OS1 | JPN | Japan |
OS2 | JPN | Japan |
TY1 | JPN | Japan |
AM3 | EUR | Amsterdam, Netherlands |
AM2 | EUR | Amsterdam, Netherlands |
AMS | EUR | Amsterdam, Netherlands |
AMX | EUR | Amsterdam, Netherlands |
DB3 | EUR | Dublin, Ireland |
DB4 | EUR | Dublin, Ireland |
DBX | EUR | Dublin, Ireland |
HE1 | EUR | Finland |
VI1 | EUR | Austria |
BL2 | NAM | Virginia, USA |
BL3 | NAM | Virginia, USA |
BLU | NAM | Virginia, USA |
SN1 | NAM | San Antonio, USA |
SN2 | NAM | San Antonio, USA |
BN1 | NAM | Virginia, USA |
BN3 | NAM | Virginia, USA |
DM1 | NAM | Des Moines, Iowa, USA |
DM2 | NAM | Des Moines, Iowa, USA |
BY1 | NAM | Bay Area, USA |
BY2 | NAM | Bay Area, USA |
CY1 | NAM | Cheyenne, Wyoming, USA |
CY2 | NAM | Cheyenne, Wyoming, USA |
CO1 | NAM | Quincy, Washington, USA |
CO2 | NAM | Quincy, Washington, USA |
CH1 | NAM | Chicago, USA |
So does this mean that all mailboxes hosted in “c4iocesar.onmicrosoft.com” are hosted in BLU (Virginia, US)? No.
As the output of the cmdlet below shows, mailboxes are hosted on different servers in different locations:
Get-Mailbox | Format-List Identity,ServerName
Identity : aux.pg
ServerName : cp1pr80mb0453
Identity : Barbosa BB. JJ
ServerName : blupr80mb1106
Identity : chuck
ServerName : grupr80mb172
Identity : DiscoverySearchMailbox
ServerName : cp1pr80mb0391
Identity : stevbios
ServerName : grupr80mb0396
Identity : diretor
ServerName : grupr80mb345
Identity : testefrederico
ServerName : grupr80mb0747
Identity : usercloud
ServerName : grupr80mb201
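The output above can be matched against the datacenter table with a few lines of code. The following is a minimal sketch in Python, assuming that the first three characters of each server name are the datacenter code (as in “blupr80mb1106” and “BLU”). The `DATACENTERS` dictionary is only a subset of the table, and the `locate` helper is a hypothetical name used for illustration.

```python
from collections import Counter

# Subset of the datacenter table above: code -> (region, location).
DATACENTERS = {
    "CP1": ("LAM", "Brazil"),
    "GRU": ("LAM", "Brazil"),
    "BLU": ("NAM", "Virginia, USA"),
    "CY1": ("NAM", "Cheyenne, Wyoming, USA"),
}

# Identity / ServerName pairs copied from the cmdlet output above.
MAILBOXES = {
    "aux.pg": "cp1pr80mb0453",
    "Barbosa BB. JJ": "blupr80mb1106",
    "chuck": "grupr80mb172",
    "DiscoverySearchMailbox": "cp1pr80mb0391",
    "stevbios": "grupr80mb0396",
    "diretor": "grupr80mb345",
    "testefrederico": "grupr80mb0747",
    "usercloud": "grupr80mb201",
}


def locate(server_name):
    """Return (code, (region, location)) for a server name.

    Assumption: the datacenter code is the first three characters of the
    server name. Unknown codes map to None instead of raising.
    """
    code = server_name[:3].upper()
    return code, DATACENTERS.get(code)


def mailboxes_per_location(mailboxes):
    """Count mailboxes per location, using 'unknown' for unmapped codes."""
    counts = Counter()
    for server_name in mailboxes.values():
        _, entry = locate(server_name)
        counts[entry[1] if entry else "unknown"] += 1
    return dict(counts)


for identity, server_name in MAILBOXES.items():
    code, entry = locate(server_name)
    print(f"{identity}: {code} -> {entry}")
```

For the eight mailboxes listed, `mailboxes_per_location(MAILBOXES)` groups seven in Brazil (CP1 and GRU) and one in Virginia (BLU), which matches the point that a single tenant is spread across datacenters.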
Exchange provides reliability and availability for O365 clients. This means that for each tenant, there are replicas hosted in different physical locations.
https://o365datacentermap.azurewebsites.net/
Most companies that provide cloud solutions offer resiliency. Some deliver GEO-resiliency, which raises the availability level even further.
If we filter by region, we can list the Exchange Online structure available in two Latin American countries:
Although we list these countries as EXO-enabled, it doesn’t mean that mailboxes will be hosted only in those datacenters. For service assurance and quality, we have reliability and high availability (consequently replicas in other locations).
Now that we have explained a little more about the structure, let’s talk about something that concerns most administrators: “my mailbox is not located in my country, so I will have latency”.
Ruling out any other latency issue that might affect your organization, latency is not expected if your DNS points to the country you are located in. The reason is that the connection is made through the closest datacenter (in this scenario, Brazil means GRU). If your DNS points to another location, such as the US, you will face a longer round trip and, consequently, latency.
We can use two simple tools to understand this behavior: nslookup and ping.
Using a US-hosted DNS with the mailbox in BLU or San Antonio (SN), the connection points to “outlook-namnorthwest2”. Since the end user is physically located in Brazil, we will face latency:
Below we have the latency for a simple ping request.
With DNS pointing to the US, the ping takes 171 ms; with DNS pointing to Brazil, it takes 15 ms. This happens because we use the same FQDN (outlook.office365.com), but DNS servers respond with different records based on their location.
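The same comparison can be scripted. The sketch below is in Python and is not the nslookup/ping procedure itself: it resolves the FQDN with whatever DNS server the machine is currently configured to use, then times a TCP connection to port 443 as a rough stand-in for one ICMP round trip. The function names (`resolve_ipv4`, `tcp_connect_ms`, `summarize`, `probe`) are hypothetical, and the choice of port 443 and five attempts is an assumption.

```python
import socket
import statistics
import time


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver returns for hostname."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def tcp_connect_ms(address, port=443, timeout=3.0):
    """Time one TCP handshake to address:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((address, port), timeout=timeout):
        pass  # the handshake is all we want to measure
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms):
    """Reduce a list of timings to min / median / max."""
    return {
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }


def probe(hostname, attempts=5):
    """Resolve hostname, then time several connections to the first address."""
    address = resolve_ipv4(hostname)[0]
    samples = [tcp_connect_ms(address) for _ in range(attempts)]
    return address, summarize(samples)
```

Running `probe("outlook.office365.com")` once with each DNS server configured and comparing the returned address and median should show the same kind of gap as the ping results above, although the absolute numbers will differ from ICMP.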
Latency occurs because the round trip is longer. Consequently, products that use O365 FQDNs will be delayed.
The mailbox below is located in “cp1”. The computer is using a DNS hosted in Brazil:
Using OWA to attach a 25MB file, upload takes 27 seconds:
When we switch to a US DNS, the same upload takes 36 seconds:
Let’s compare this to a 20 MB file being attached in Outlook while the computer is using a US DNS:
It takes approximately 5 minutes for the end user to upload it. This happens because Outlook uses RPC/HTTP or MAPI/HTTP, meaning it encapsulates the request.
This splits the request into several small chunks: in this example, 640 chunks of approximately 32 KB each. Therefore, using a DNS from the region where the connection originates will improve product performance and give the end user a better experience.
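To see why many small pieces make the round trip matter so much, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model that is not taken from the Outlook protocol documentation: the pieces are sent one after another and each one costs exactly one round trip (`round_trips_per_chunk=1`), ignoring bandwidth and server processing time. The function names are hypothetical.

```python
def chunk_count(file_bytes, chunk_bytes):
    """Number of chunks needed to carry file_bytes (ceiling division)."""
    return -(-file_bytes // chunk_bytes)


def round_trip_overhead_seconds(file_bytes, chunk_bytes, rtt_ms,
                                round_trips_per_chunk=1):
    """Time spent only waiting on round trips, under the sequential model.

    Assumption: chunks are sent strictly one after another and each costs
    round_trips_per_chunk round trips. Real clients may pipeline or need more.
    """
    chunks = chunk_count(file_bytes, chunk_bytes)
    return chunks * round_trips_per_chunk * rtt_ms / 1000.0


FILE_BYTES = 20 * 1024 * 1024   # the 20 MB attachment from the example
CHUNK_BYTES = 32 * 1024         # roughly 32 KB per chunk

print(chunk_count(FILE_BYTES, CHUNK_BYTES))                       # 640
print(round_trip_overhead_seconds(FILE_BYTES, CHUNK_BYTES, 171))  # US DNS
print(round_trip_overhead_seconds(FILE_BYTES, CHUNK_BYTES, 15))   # Brazil DNS
```

With the ping values measured earlier, the waiting time alone comes to about 109 seconds at 171 ms versus under 10 seconds at 15 ms. The model does not reproduce the measured five minutes; it only illustrates that the round-trip cost is multiplied by the number of pieces.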
There are scenarios in which the time difference is perceptible to the end user (OneDrive upload, OST download) and also to the administrator (slow mailbox migration, Exchange Server with latency to reach the endpoint).
“My DNS is already set to a name server in my country, but I have the same delay and the FQDN resolves to the US or another region”. Some ISPs use forwarders to the United States or other countries. You can either ask the ISP to exclude O365 FQDNs from forwarding, or use another DNS for those connections.
This roundtrip scenario is also explained here.