Troubleshooting M365 Tenant Latency Issues
Latency Issues Accessing Microsoft 365 Sites - SharePoint, Admin Center
When using a SaaS (Software as a Service) product like Microsoft 365, troubleshooting performance issues can be challenging due to the lack of server-level access. This limitation makes it difficult to analyze network traffic or monitor server resources like CPU, RAM, or storage, which could impact the performance of SharePoint sites.
We were experiencing errors, timeouts, delays, and non-functional features for users. It took weeks to identify the root cause, during which business teams frequently asked if resource-intensive processes could be disabled. The issue was finally traced to a firewall update that caused a problem with the Domain Name Service. This article covers troubleshooting techniques that may help.
From the network tab, we observed frequent ERR_CONNECTION_RESET errors during office hours, which affected all users.
Cause: Delays Loading CDN Files
Pages from SharePoint, SharePoint Admin Center, Microsoft 365 Admin Center, Viva Engage, and other M365 applications were taking an unusually long time to load. Debugging the issue was a nightmare.
Example Errors
https://res-1.cdn.office.net/files/odsp-web-prod_2024-02-09.011/splistwebpack/1302.js
GET https://res-1.cdn.office.net/files/sp-client/sp-pages-assembly_en-us_dabadef8fce89e16a072e1ef7c166a4d.js?1709042329628 net::ERR_CONNECTION_RESET 200 (OK)
GET https://shell.cdn.office.net/shellux/en/shellstrings.52af792134b43bb66ac6fb020ec0b324.json net::ERR_HTTP2_PING_FAILED 200 (OK)
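When many such lines are copied out of the browser console, it can help to group them by host and error code to see which CDN endpoints fail most often. The following is a minimal sketch, not part of the original investigation: the function name summarize_errors and the regular expression are illustrative assumptions, and the pattern only covers lines shaped like the GET examples above.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches console lines shaped like: GET <url> net::ERR_SOMETHING 200 (OK)
ERROR_LINE = re.compile(r"^(?:GET|POST)\s+(\S+)\s+net::(ERR_[A-Z0-9_]+)")


def summarize_errors(lines):
    """Count (host, error code) pairs found in copied console lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            url, code = match.groups()
            counts[(urlparse(url).hostname, code)] += 1
    return counts


# The two GET lines from the examples above.
sample = [
    "GET https://res-1.cdn.office.net/files/sp-client/"
    "sp-pages-assembly_en-us_dabadef8fce89e16a072e1ef7c166a4d.js?1709042329628 "
    "net::ERR_CONNECTION_RESET 200 (OK)",
    "GET https://shell.cdn.office.net/shellux/en/"
    "shellstrings.52af792134b43bb66ac6fb020ec0b324.json "
    "net::ERR_HTTP2_PING_FAILED 200 (OK)",
]

for (host, code), count in summarize_errors(sample).items():
    print(host, code, count)
```

Lines without a net:: error code, such as a bare URL, are ignored.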
Troubleshooting Steps
CDN investigation
The business users do not control the CDN profiles for res-1.cdn.office.net or res.cdn.office.net, as these are managed by Microsoft. We needed to determine why latency issues were occurring in the background.
Common CDN Paths and URLs
Frequently Accessed CDN Paths:
/USERPHOTO.ASPX
/SITEASSETS
/SITES/CONNECTDESIGNASSETS/SITETEMPLATES
/SITES/CONNECTDESIGNASSETS/APPROVEDPHOTOS
/MASTERPAGE
/STYLE LIBRARY
/CLIENTSIDEASSETS
Example CDN URLs:
https://tenant.sharepoint.com/SITES/CONNECTDESIGNASSETS/APPROVEDPHOTOS
https://tenant.sharepoint.com/SITES/CONNECTDESIGNASSETS/SITETEMPLATES
Findings from CDN Investigation
Using tracert to res-1.cdn.office.net, delays were identified at an intermediate node. Below are the key findings:
SPO CDN Investigation Results:
- Delays were observed for requests to:
https://shell.cdn.office.net
https://res-1.cdn.office.net
Example Analysis for https://res-1.cdn.office.net:
- Server Timing: 40ms
- Client Network Delay: 70% of the delay was due to queuing and stalling.
Reasons for Queuing and Stalling
According to the Microsoft Edge Developer Documentation, queuing and stalling can occur due to:
- Higher priority requests.
- The browser reaching the limit of six TCP connections for the origin (applies to HTTP/1.0 and HTTP/1.1).
- Temporary allocation of space in the disk cache.
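The share of time spent queuing and stalling can also be estimated from a HAR file exported from the browser's network tab. The sketch below assumes the standard HAR timings object, where blocked is the time spent waiting in a queue, -1 marks a phase that does not apply, and ssl is already included in connect. The function names and the sample values are made up for illustration and are not taken from the actual trace.

```python
from urllib.parse import urlparse

# Phases of the HAR "timings" object; "ssl" is omitted because it is
# already counted inside "connect".
PHASES = ("blocked", "dns", "connect", "send", "wait", "receive")


def blocked_share(entry):
    """Return the fraction of a request's time spent queued or stalled."""
    timings = entry["timings"]
    # HAR uses -1 for phases that do not apply, so clamp those to 0.
    total = sum(max(timings.get(phase, -1), 0) for phase in PHASES)
    blocked = max(timings.get("blocked", -1), 0)
    return blocked / total if total else 0.0


def report(har, host_suffix="cdn.office.net"):
    """List (url, blocked percentage) for matching hosts, worst first."""
    rows = []
    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        if (urlparse(url).hostname or "").endswith(host_suffix):
            rows.append((url, round(blocked_share(entry) * 100, 1)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Illustrative, hand-written data; a real file would come from json.load().
sample_har = {"log": {"entries": [
    {"request": {"url": "https://res-1.cdn.office.net/files/example.js"},
     "timings": {"blocked": 140, "dns": -1, "connect": 10, "ssl": 8,
                 "send": 0, "wait": 40, "receive": 10}},
    {"request": {"url": "https://tenant.sharepoint.com/sites/example"},
     "timings": {"blocked": 1, "dns": -1, "connect": -1,
                 "send": 0, "wait": 30, "receive": 5}},
]}}

for url, percent in report(sample_har):
    print(f"{percent:5.1f}% blocked  {url}")
```

A consistently high blocked percentage for CDN hosts points at the client or the network path rather than at server processing time.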
Network Behavior Observed
The TCP and TLS handshake between the client and server completed successfully. However, the following issues were observed:
- Multiple duplicate acknowledgments (dup-acks) were sent by the client.
- Fast retransmissions were sent by the server.
- Eventually, the server sent a TCP reset, aborting the connection.
Root Cause
The delays were attributed to:
- Network congestion.
- Packet loss.
- Errors on intermediate devices involved in the communication.
SharePoint-Specific Troubleshooting
Page Diagnostics Tool
Use the Page Diagnostics Tool for SharePoint to analyze page performance and identify potential bottlenecks.
Backend Debugging
With the help of Microsoft engineers, investigate the SharePoint backend logs to uncover errors or performance issues that may be causing delays.
Networking Tools
1. Command-Line Tracert
To trace the route to your SharePoint tenant, run the following command in the command prompt:
tracert tenant.sharepoint.com
Sample output:
1 11 ms 4 ms 5 ms 000.000.00.1
2 8 ms 8 ms 7 ms 000.000.00.1
3 21 ms 20 ms 23 ms 00.000.00.97
4 19 ms 17 ms 17 ms aaaa-core-aa-aaaa-aaa.network.virginmedia.net [00.000.00.81]
5 * * * Request timed out.
6 * * * Request timed out.
7 27 ms 25 ms 34 ms aaaa-core-aa-aaaa-ddd.network.virginmedia.net [00.000.000.186]
8 * * * Request timed out.
9 43 ms 26 ms 26 ms aaaa-core-aa-aaaa-aaa.ntwk.msn.net [000.00.00.160]
10 * 28 ms 25 ms aaaa-core-aa-aaaa-aaa.ntwk.msn.net [000.00.00.127]
11 25 ms 25 ms 27 ms 00.000.000.000
12 * * * Request timed out.
13 * * * Request timed out.
14 * * * Request timed out.
15 * * * Request timed out.
16 22 ms 19 ms 17 ms 00.000.000.10
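Output like this is easier to compare across runs once it is parsed into hop records. The following sketch assumes the Windows tracert layout shown above (hop number, three probes, then the host or a timeout message). The parse_hop helper is a hypothetical name introduced for illustration.

```python
import re

# A probe is either "*" (no reply) or a time such as "19 ms" or "<1 ms".
PROBE = r"(\*|<?\d+ ms)"
HOP_LINE = re.compile(rf"^\s*(\d+)\s+{PROBE}\s+{PROBE}\s+{PROBE}\s+(.*)$")


def parse_hop(line):
    """Parse one tracert line into a dict, or return None if it does not match."""
    match = HOP_LINE.match(line)
    if not match:
        return None
    hop, *probes, target = match.groups()
    times = [int(p.lstrip("<").split()[0]) for p in probes if p != "*"]
    return {
        "hop": int(hop),
        "lost": probes.count("*"),  # probes that received no reply
        "avg_ms": sum(times) / len(times) if times else None,
        "target": target.strip(),
    }


# Three lines taken from the sample output above.
sample = [
    "4 19 ms 17 ms 17 ms aaaa-core-aa-aaaa-aaa.network.virginmedia.net [00.000.00.81]",
    "5 * * * Request timed out.",
    "10 * 28 ms 25 ms aaaa-core-aa-aaaa-aaa.ntwk.msn.net [000.00.00.127]",
]

for line in sample:
    print(parse_hop(line))
```

A hop that times out is not necessarily faulty, because many routers ignore or deprioritize the probes that tracert sends. A sustained rise in latency that carries through to later hops is usually a stronger signal.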
2. WinMTR Tool:
Use WinMTR to identify packet loss and latency issues.
3. Netsh Trace:
To capture detailed network traces, follow these steps:
Step 1: Start the trace: Open the command prompt as an administrator and run:
Netsh trace start scenario=NetConnection,wfp-ipsec capture=yes report=yes filemode=circular overwrite=yes maxsize=1024 persistent=yes tracefile=c:\%computername%_nettrace.etl
Step 2: Reproduce the issue: Perform the actions that are causing the latency or errors.
Step 3: Stop the trace: Once the issue has been reproduced, stop the trace by running:
Netsh trace stop
Step 4: Upload the trace: Analyze the trace file or share it with your network team for further investigation.
Observations
In this scenario, delays were observed starting from the VPN, and an intermediate hop was identified as the cause of the issue.
Resolution
The issue was caused by a combination of an update to the Protective Domain Name Service (PDNS) and the firewall. The update was reverted temporarily until the issue was fixed.
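Because the root cause involved DNS, timing name lookups for the affected hosts from a client machine can be a quick early check. The sketch below is an assumption-laden illustration rather than the method used in this investigation: time_lookup is a hypothetical helper, it relies on the operating system resolver through socket.getaddrinfo, and operating system caching can make repeated lookups look faster than the first one.

```python
import socket
import time


def time_lookup(host, resolver=socket.getaddrinfo, attempts=5):
    """Return lookup durations in milliseconds, with None for failed lookups."""
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolver(host, 443)
            results.append((time.perf_counter() - start) * 1000)
        except OSError:  # socket.gaierror is a subclass of OSError
            results.append(None)
    return results


# Example call (performs real DNS lookups when run):
# print(time_lookup("res-1.cdn.office.net"))
```

The resolver parameter exists only so that the function can be exercised without network access. Running the same check on and off the VPN, or before and after a DNS or firewall change, gives a rough comparison.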
References
Microsoft 365 network connectivity test
Use Microsoft 365 CDN with SPO
Microsoft 365 network connectivity principles
Protective Domain Name Service