As organizations begin piloting Windows 11 24H2 deployments across their enterprise environments, unexpected anomalies may surface during early adoption. One such issue has emerged in a recent pilot: a workstation, upgraded via Windows Update for Business (WUfB) from Windows 10 22H2 to Windows 11 24H2, experienced a loss of trust relationship with the Active Directory domain.
While this issue affected only 1 out of 12 devices in the pilot cohort, it raises significant concerns for wider rollouts. This post provides a deep dive into the root cause analysis, troubleshooting methodology, and a preventive configuration to mitigate the risk in future deployments.
Observed Symptoms
In the context of a Windows 11 24H2 pilot upgrade via Intune’s WUfB mechanism:
- One workstation became unable to authenticate to the domain post-upgrade.
- Users were presented with a “The trust relationship between this workstation and the primary domain failed” error upon login.
- Rejoining the domain temporarily resolved the issue, but concerns remained about recurrence across the fleet.
Given that the upgrade path was formally supported, this behaviour was unexpected and warranted a thorough investigation.
Technical Background
This problem typically indicates that the secure channel between the workstation and the domain controller (DC) has been broken. The secure channel is used for machine authentication and is established during domain join. Failures can occur due to:
- A mismatch in the machine account password (rotated every 30 days by default).
- Time synchronization issues.
- Kerberos authentication failures, especially those related to response size or transport negotiation.
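These three categories can be separated with a quick triage pass before any deeper analysis. The sketch below is hypothetical: `Observation` and `likely_causes` are illustrative names, the inputs are values an administrator would collect by hand (clock offset, machine password age, the last Kerberos error seen in a capture), and the 5-minute skew tolerance is the commonly documented Kerberos default, assumed here rather than read from the domain's policy.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed defaults: the 30-day machine password rotation mentioned above and
# the commonly documented 5-minute Kerberos clock-skew tolerance.
MACHINE_PASSWORD_MAX_AGE_DAYS = 30
MAX_CLOCK_SKEW_SECONDS = 5 * 60
KRB_ERR_RESPONSE_TOO_BIG = 52  # error code defined in RFC 4120


@dataclass
class Observation:
    """Hypothetical snapshot of facts collected from a failing workstation."""
    clock_skew_seconds: float           # client time minus DC time
    password_age_days: float            # days since the machine password was last changed
    kerberos_error_code: Optional[int]  # last Kerberos error seen in a capture, if any


def likely_causes(obs: Observation) -> List[str]:
    """Map an observation onto the failure categories listed above."""
    causes = []
    if abs(obs.clock_skew_seconds) > MAX_CLOCK_SKEW_SECONDS:
        causes.append("time synchronization")
    if obs.password_age_days > MACHINE_PASSWORD_MAX_AGE_DAYS:
        # Rotation is overdue; compare the password state on client and DC.
        causes.append("machine account password")
    if obs.kerberos_error_code == KRB_ERR_RESPONSE_TOO_BIG:
        causes.append("Kerberos response size / transport")
    return causes


print(likely_causes(Observation(clock_skew_seconds=12, password_age_days=3,
                                kerberos_error_code=KRB_ERR_RESPONSE_TOO_BIG)))
# ['Kerberos response size / transport']
```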
In this case, evidence pointed to an issue during the post-upgrade reinitialization of secure communications, particularly Kerberos ticket exchanges that failed under specific network transport conditions.
Troubleshooting and Root Cause Analysis
Upon examining the affected workstation’s event logs and comparing them with network captures and domain controller diagnostics, the root cause was identified:
The Kerberos Key Distribution Center (KDC) on the DC attempted to respond to the workstation’s request, but the response exceeded the KDC’s maximum UDP reply size, triggering a protocol-level error.
Specifically:
- The client initiated a Kerberos Authentication Service Request (AS_REQ) over UDP.
- The KDC’s response (AS_REP) was too large to fit within the default MaxDatagramReplySize (1465 bytes).
- The KDC responded with KRB_ERR_RESPONSE_TOO_BIG, signaling that the client should switch to TCP.
- The workstation failed to retry over TCP in a timely or expected manner, leading to authentication failure and a broken secure channel.
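The exchange in the list above can be modelled with a toy Python sketch. It is not a Kerberos implementation: the reply size is simply an input, `kdc_reply_over_udp` and `client_authenticate` are hypothetical helpers, and the KDC’s comparison against `MaxDatagramReplySize` is assumed to be a plain "greater than" check. It shows why a client that never retries over TCP ends up with no ticket at all once the reply crosses the threshold.

```python
from typing import Optional

KRB_ERR_RESPONSE_TOO_BIG = 52          # error code defined in RFC 4120
DEFAULT_MAX_DATAGRAM_REPLY_SIZE = 1465  # KDC default, in bytes


def kdc_reply_over_udp(reply_size: int,
                       max_datagram_reply_size: int = DEFAULT_MAX_DATAGRAM_REPLY_SIZE) -> Optional[int]:
    """Toy KDC: return None if the AS_REP can be sent over UDP,
    otherwise the KRB_ERR_RESPONSE_TOO_BIG error code (assumed '>' check)."""
    if reply_size > max_datagram_reply_size:
        return KRB_ERR_RESPONSE_TOO_BIG
    return None


def client_authenticate(reply_size: int, retries_over_tcp: bool,
                        max_datagram_reply_size: int = DEFAULT_MAX_DATAGRAM_REPLY_SIZE) -> str:
    """Toy client: return the transport that delivered the AS_REP,
    or 'failed' if UDP was refused and no TCP retry happened."""
    error = kdc_reply_over_udp(reply_size, max_datagram_reply_size)
    if error is None:
        return "udp"
    if error == KRB_ERR_RESPONSE_TOO_BIG and retries_over_tcp:
        return "tcp"  # this model places no size limit on TCP replies
    return "failed"


print(client_authenticate(1200, retries_over_tcp=False))  # udp
print(client_authenticate(1800, retries_over_tcp=True))   # tcp
print(client_authenticate(1800, retries_over_tcp=False))  # failed
```

The third call corresponds to the affected workstation: the KDC behaved as designed, and the failure comes entirely from the missing TCP retry.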
This behaviour was not present in prior builds (e.g., 22H2), suggesting a change in network stack behaviour, Kerberos token size inflation (possibly due to group memberships or SID history), or handling differences introduced in Windows 11 24H2.
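Token size inflation can be sanity-checked with the rough formula Microsoft has published for sizing `MaxTokenSize`: 1200 + 40d + 8s bytes. The sketch assumes that formula is a reasonable proxy in this context. It estimates the Kerberos token rather than the exact AS_REP on the wire, so treat the result as an order-of-magnitude comparison against the 1465-byte default, and the group counts in the usage lines as illustrative.

```python
def estimated_token_size(d: int, s: int) -> int:
    """Rough Kerberos token-size estimate: 1200 + 40d + 8s bytes.

    d: domain local groups, universal groups outside the account's domain,
       and SID history entries
    s: global groups and universal groups inside the account's domain
    """
    return 1200 + 40 * d + 8 * s


print(estimated_token_size(d=0, s=20))   # 1360, below the 1465-byte default
print(estimated_token_size(d=10, s=50))  # 2000, above it
```

A handful of domain local groups or SID history entries is enough to move an account past the UDP threshold, which fits the observation that only one machine/user combination was affected.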
Network Layer Insight
Network analysis revealed that UDP fragmentation was being mishandled in some parts of the network. If the network drops fragmented UDP packets or improperly reassembles them, Kerberos replies exceeding the size threshold fail to reach the client.
Furthermore, many enterprise environments use VPN tunnels, MTU restrictions, or firewalls that interfere with large UDP responses, making Kerberos over TCP the more reliable transport.
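The fragmentation concern comes down to simple IPv4 arithmetic. This sketch assumes a 20-byte IPv4 header with no options and an 8-byte UDP header; `ipv4_fragments` is a hypothetical helper, and the 1400-byte MTU in the usage lines is an illustrative VPN tunnel value, not one measured in this pilot.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def ipv4_fragments(udp_payload: int, mtu: int) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram over a link with this MTU."""
    datagram = UDP_HEADER + udp_payload    # bytes IP must carry
    max_payload = mtu - IPV4_HEADER        # room in a single packet
    if datagram <= max_payload:
        return 1
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = max_payload // 8 * 8
    return 1 + math.ceil((datagram - max_payload) / per_fragment)


print(ipv4_fragments(1465, 1500))  # 1: standard Ethernet path
print(ipv4_fragments(1465, 1400))  # 2: hypothetical tunnel MTU
```

A reply at the default threshold fits in a single packet on a 1500-byte path but needs two fragments on the smaller tunnel, and a device that drops fragments will lose that reply without any error reaching the client.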
In this pilot, the issue likely surfaced on only one device due to:
- A unique token size for that machine/user combination.
- A network path with suboptimal UDP handling.
- A Windows 11 24H2 client configuration that did not gracefully fall back to TCP.
Recommended Workaround: Forcing Kerberos to Prefer TCP
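To mitigate this, a registry value can be set on domain controllers to adjust the reply size above which the KDC stops answering over UDP and signals the client to use TCP. Depending on the direction of the change, this either allows larger Kerberos responses over UDP or encourages fallback to TCP.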
Registry Key: MaxDatagramReplySize
Path: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Kdc
Entry: MaxDatagramReplySize
Type: REG_DWORD
Default Value: 1465 (decimal, bytes)
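To stage the same value consistently across domain controllers, the setting can be expressed as a `.reg` file. This Python helper (`kdc_reg_file` is a hypothetical name) only renders the text. It writes the DWORD in the eight-digit hexadecimal form `.reg` files use, so the default of 1465 bytes becomes `000005b9`. Saving the file in an encoding regedit accepts and validating the import on a lab DC are assumed to happen in your normal change process.

```python
KDC_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Kdc"


def kdc_reg_file(max_datagram_reply_size: int) -> str:
    """Render .reg text that sets MaxDatagramReplySize (REG_DWORD, decimal bytes)."""
    if not 0 <= max_datagram_reply_size <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KDC_KEY}]\r\n"
        # .reg files store DWORD data as 8 hex digits
        f'"MaxDatagramReplySize"=dword:{max_datagram_reply_size:08x}\r\n'
    )


print(kdc_reg_file(1465))
```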
Guidance:
- Increasing this value raises the threshold for the TCP fallback, so larger replies are sent over UDP, but this increases the risk of UDP fragmentation.
- Conversely, reducing this value causes more replies to trigger the fallback, encouraging clients to use TCP more often at the cost of a slight increase in authentication latency.
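The trade-off in this guidance can be made concrete with a toy classifier. It assumes IPv4 with a 20-byte header and no options, and treats any reply above the threshold as a TCP fallback; `udp_reply_outcome` is a hypothetical helper, and the reply size and thresholds in the usage lines are illustrative rather than taken from the pilot.

```python
def udp_reply_outcome(reply_size: int, max_datagram_reply_size: int, path_mtu: int) -> str:
    """Classify how a KDC reply of reply_size bytes reaches the client (toy model).

    "tcp-fallback":   reply exceeds the KDC threshold; client must retry over TCP
    "udp-single":     reply fits in one IPv4 packet on this path
    "udp-fragmented": reply goes over UDP but must be fragmented on this path
    """
    if reply_size > max_datagram_reply_size:
        return "tcp-fallback"
    ipv4_header, udp_header = 20, 8
    if udp_header + reply_size <= path_mtu - ipv4_header:
        return "udp-single"
    return "udp-fragmented"


for threshold in (1000, 1465, 4000):
    print(threshold, udp_reply_outcome(1800, threshold, path_mtu=1500))
# 1000 tcp-fallback
# 1465 tcp-fallback
# 4000 udp-fragmented
```

Raising the threshold turns a clean TCP fallback into a fragmented UDP reply, which only helps if every network path reliably delivers fragments; lowering it trades that risk for an extra TCP handshake.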