Well, we finally got the green light at one of my customers to upgrade them to Windows 2012 R2 Active Directory. We went through all of our due diligence and planned out the project accordingly. We took a phased approach, with the following high-level steps to complete our Migration:
- Create a new Conceptual Design for Active Directory 2012 R2 – Done!
- Create a new Detailed Design for Active Directory 2012 R2 – Done!
- Create a test plan to ensure we could validate everything before moving to production – Not Done!
- Customer didn’t have a Pre-Prod environment → So, time to cowboy up! This would come back to bite us later, as you will see.
- Create a Method of Procedure document listing all of the steps required to build our new Windows 2012 R2 Domain Controller – Done!
- Build the Base Windows 2012 R2 Servers per the Method of Procedure documents – Done!
- Check for Hotfixes on the Windows 2012 R2 Servers running the Active Directory and DNS Roles – Done!
- What I didn’t check for was client hotfixes!! → Wait for it, I will explain in a little while.
So let me jump ahead a bit here. We successfully performed the following activities:
- Backed up the existing Active Directory DC’s.
- Extended the SCHEMA using ADPREP on the existing Schema Master running Active Directory 2003 R2
- Forced Replication and ensured the Schema Attributes had changed properly
- Raised the Forest Functional Level to Windows 2003
- Installed the Active Directory and DNS Roles on my First new Windows 2012 R2 Domain Controller
- Ran the Active Directory (DCPromo) process from Server Manager on my new Windows 2012 R2 Domain Controller
- Added the Domain Controller as a Replica DC in an existing domain
- After the Reboot – Changed the IPv6 DNS server entry, which the promotion sets to localhost, back to dynamic
- Stopped and Started the Netlogon Service to force registration of the Service Records that AD uses
- Triggered the Active Directory Knowledge Consistency Checker (KCC) to create AD Site Connection Objects between the domain controllers
- Forced replication with Repadmin
- Validated that AD Integrated DNS had replicated and that AD Replication was working
The above steps are all pretty normal for an AD Migration Project. I have done a lot of these over my career, so this is second nature to me now.
What I didn’t expect was what happened next…..
We performed the Migration on a Sunday evening and I normally don’t receive any phone calls on Monday… Everything is normally pretty transparent to the end users.
There it is, the dreaded Monday morning phone call…. “Umm, Dave, we have a problem here…”
Customer: We have a couple of web servers that are throwing exceptions on the user logons… These are critical to the business and it needs to be fixed ASAP.
Yikes… That is not the kind of call a consultant like me wants to hear…
So we dug in a little bit and found that the error message they were receiving on the Windows 2008 R2 Web Server was this:
“An error (1301) occurred while enumerating the groups. The group’s SID could not be resolved.”
This happened when the application performed a UserPrincipal.GetAuthorizationGroups() call.
Error 1301 (ERROR_SOME_NOT_MAPPED) returned.
We realized that this had to be an issue with the new Domain Controller so we used a technique that I have used for years to have it stop authenticating users.
You can change the LdapSrvPriority by creating a DWORD value of that name in HKLM\System\CurrentControlSet\Services\Netlogon\Parameters
Set the Value to something other than 0, which is the default for all domain controllers. Then stop and start the Netlogon service and force AD Replication.
This change was made on our new Windows 2012 R2 Domain Controller. In essence it simply gave its records a higher priority number, and a higher number means it won’t be used until the other domain controllers are not available.
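If you would rather script the change than click through regedit, here is a minimal sketch that only builds the text of a .reg file for it. The function name and the priority of 100 are my own assumptions for illustration (we just needed "something other than 0"), and the 0 to 65535 range check reflects the 16-bit priority field of a DNS SRV record.

```python
# Registry key from the step above, written out in full .reg form.
NETLOGON_PARAMETERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"
)


def ldap_srv_priority_reg(priority):
    """Return .reg file text that sets LdapSrvPriority (hypothetical helper)."""
    # SRV priorities are 16-bit unsigned values.
    if not 0 <= priority <= 65535:
        raise ValueError("priority must be between 0 and 65535")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{NETLOGON_PARAMETERS_KEY}]\n"
        f'"LdapSrvPriority"=dword:{priority:08x}\n'
    )


# 100 is an assumed example value, not the one used at the customer.
print(ldap_srv_priority_reg(100))
```

Importing the resulting file on the domain controller does the same thing as creating the DWORD by hand. You still need to restart Netlogon afterwards so the Service Records are registered again.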
Clients always choose the lowest priority value when looking at the domain controllers’ Service (SRV) Records in DNS.
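To make the priority logic concrete, here is a small sketch of how a client could pick a target from a set of SRV records: lowest priority number first, then a weighted pick within that group. The record type, function names and host names are all made up for illustration, and the real DC locator also considers things like AD sites, so treat this as a simplified model only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    target: str
    priority: int
    weight: int


def preferred_targets(records):
    """Return only the records in the lowest-numbered priority group."""
    if not records:
        return []
    best = min(r.priority for r in records)
    return [r for r in records if r.priority == best]


def pick_target(records, rng=random):
    """Pick one target: lowest priority group first, then by weight."""
    candidates = preferred_targets(records)
    if not candidates:
        return None
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights zero: fall back to a uniform choice.
        return rng.choice(candidates).target
    return rng.choices(candidates, weights=weights, k=1)[0].target


# Hypothetical host names; the new DC carries the raised priority value.
records = [
    SrvRecord("dc2003.example.local", priority=0, weight=100),
    SrvRecord("dc2012r2.example.local", priority=100, weight=100),
]
print(pick_target(records))  # prints dc2003.example.local
```

With the new domain controller at priority 100 it is never a candidate while the old one at priority 0 is still listed, which is exactly the effect we were after.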
Now we rebooted the affected Web Server and checked the logon server by opening a command prompt and typing SET, which lists the LOGONSERVER variable.
The logon server was now showing the old Windows 2003 Domain Controller and users were able to get back to work.
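If you have to check this on more than one box, reading through SET output by eye gets old quickly. Below is a minimal sketch that pulls the LOGONSERVER value out of that output. The function name and the sample text (host and domain names included) are invented for illustration.

```python
def logon_server(set_output):
    """Return the LOGONSERVER value from SET output, without the leading \\\\."""
    for line in set_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().upper() == "LOGONSERVER":
            return value.strip().lstrip("\\")
    return None


# Made-up sample of what SET might print on the web server.
sample = "COMPUTERNAME=WEB01\nLOGONSERVER=\\\\DC2003\nUSERDOMAIN=EXAMPLE\n"
print(logon_server(sample))  # prints DC2003
```

Keep in mind that LOGONSERVER reflects the domain controller that handled the current logon session, so check it from a fresh logon after making changes.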
Emergency averted… not quite… I now have a nice new Windows 2012 R2 Domain Controller that I can’t use because it breaks this mission critical web site.
So our next step was to involve the development team to see if there would be a way to change their code or do something to fix this problem.
As part of our research we stumbled across this hotfix which had just been released on June 30, 2014.
Hotfix # 2830145… It seemed to address the exact issue we were having… So I read the fine print on the Hotfix.
Cause:
This issue occurs because SID S-1-18-1 and SID S-1-18-2 cannot be resolved on Windows 7-based or Windows Server 2008 R2-based computers.
Note: In Windows Server 2012, two new security principal SIDs are introduced to differentiate between proof of possession and Service-for-User-to-Self (S4U2Self) protocol transitions.
For more information about the new SIDs, go to the following Microsoft website:
Resolution:
To resolve this issue, install the hotfix on the Windows 7-based and Windows Server 2008 R2-based computers in the domain.
WHAT!! I need to deploy a hotfix to all 2500 Windows 7 machines in the Domain and all of the Servers???
That won’t be fun… Then I got to thinking that this is likely only an issue where the code is executing.
So… We deployed the hotfix onto the affected Windows 2008 R2 Webserver… Reverted the registry setting for LdapSrvPriority above …
Validated that it was authenticating to the new Windows 2012 R2 Domain Controller and that the Web Application was indeed working.
We didn’t need to deploy this to every machine… Only the ones affected.
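Here is a toy model of why only the machines running the code are affected. It is not the real .NET or Windows implementation: the function, the exception class and the lookup tables are my own stand-ins. The idea is simply that group enumeration fails as soon as the local machine cannot resolve one SID in the user's token, and an unpatched Windows 7 or 2008 R2 box does not know the two new S-1-18-x SIDs.

```python
ERROR_SOME_NOT_MAPPED = 1301

# The two well-known SIDs introduced with Windows Server 2012.
NEW_2012_SIDS = {
    "S-1-18-1": "Authentication authority asserted identity",
    "S-1-18-2": "Service asserted identity",
}


class GroupEnumerationError(Exception):
    """Stand-in for the exception the web application was throwing."""

    def __init__(self, unresolved):
        super().__init__(
            f"An error ({ERROR_SOME_NOT_MAPPED}) occurred while "
            f"enumerating the groups: {unresolved}"
        )
        self.code = ERROR_SOME_NOT_MAPPED
        self.unresolved = unresolved


def enumerate_groups(token_sids, known_sids):
    """Resolve every SID to a name, failing if any SID is unknown locally."""
    unresolved = [sid for sid in token_sids if sid not in known_sids]
    if unresolved:
        raise GroupEnumerationError(unresolved)
    return [known_sids[sid] for sid in token_sids]


# What each machine can resolve (simplified to a single ordinary group).
unpatched = {"S-1-5-32-545": "BUILTIN\\Users"}
patched = {**unpatched, **NEW_2012_SIDS}

# A token issued via the 2012 R2 DC may carry one of the new SIDs.
token = ["S-1-5-32-545", "S-1-18-1"]

try:
    enumerate_groups(token, unpatched)
except GroupEnumerationError as err:
    print("unpatched machine fails with", err.code)

print(enumerate_groups(token, patched))
```

A machine that never enumerates those groups never hits the lookup, which matches what we saw: patching the web server alone was enough.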
Remember to always check for current hotfixes prior to commencing a project like this… It will save you a ton of time.
This was indeed a weird problem and I am glad that we were able to fix it and share the knowledge with you!
Thanks,
Dave