How things work: Group Policy Processing
I have recently seen an error message appear during the first logon following a Windows 10 Anniversary Update. Below is information on how the problem manifests itself, the underlying technology (hardcore Group Policy knowledge) and how to work around it – as it can affect Specops customers.
The error message displayed to the user after the first logon:
After clicking OK, you return to the logon prompt. The next logon works fine, and all logons thereafter are good.
Since this is Group Policy related, we wanted to make sure the cause was not our Group Policy extensions.
After some investigation, we found that the problem is caused by how the Group Policy Service is configured during a fresh install (installing the Windows 10 Anniversary Update is in essence a full install but with settings and apps still intact), and how it is reconfigured when 3rd party CSEs are detected. While we are not the culprit, we are involved in the process, and depending on how OS images are deployed, there is a small chance that customers will see this behavior. The good news is that the workaround is simple, and aside from the annoyance, it does not break anything.
The Group Policy Service, which all Group Policy processing runs inside, is hosted in a generic process (executable) called svchost.exe, as are many other built-in Windows services. Svchost.exe is shipped with the OS, and enables multiple services to share a process, and thus use common resources more efficiently. A quick look at the list of running processes gives a good understanding of how common it is for services to run in the generic host. Compare it to spoolsv.exe (the print spooler), which runs in its own custom service host process.
The Services tab displays the name of each service, as well as its Status and Group. The Group is very important, as it is part of the root of the problem. In this screenshot, we can see that the Group Policy Service (gpsvc) shares the same process (PID 972 in this case) and group (netsvcs) as a bunch of other Windows services.
Since the Group Policy Service shares memory, resources, etc. with other services in the same process, it can also potentially create problems for any other service if it crashes or writes to memory that it should not. For example, if the Server Service responsible for file sharing in Windows is running in the same process, a faulty Group Policy CSE could potentially kill it. In Windows 2000 and XP (prior to the Group Policy Service) Group Policy actually executed directly inside Winlogon and Windows itself. Back then, a crash in GP could actually BSOD the entire OS, compared to now when only core services die if any Group Policy Extension behaves badly.
There are differences between Group Policy CSEs shipped with Windows, and 3rd party extensions. If a third party CSE like Specops Deploy is installed, an event with ID 5331 is written to the Group Policy operational event log at the next GP refresh. The message text looks like this: “Service configuration update to standalone was attempted due to the presence of Group Policy client extension Specops Deploy that is not part of the operating system and completed with status 0x0.”
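If you collect these events centrally, it can be handy to pull the CSE name and status code out of the message. The following is a minimal sketch, assuming the message text matches the wording quoted above; the exact wording may differ between Windows versions and languages, and the function name parse_event_5331 is made up for this illustration.

```python
import re

# Pattern built from the event 5331 message text quoted in the article.
# Assumption: the wording is identical on the system being examined.
EVENT_5331_PATTERN = re.compile(
    r"Service configuration update to (?P<mode>\w+) was attempted due to "
    r"the presence of Group Policy client extension (?P<cse>.+?) that is "
    r"not part of the operating system and completed with status "
    r"(?P<status>0x[0-9A-Fa-f]+)"
)


def parse_event_5331(message):
    """Return mode, CSE name and numeric status, or None if no match."""
    match = EVENT_5331_PATTERN.search(message)
    if match is None:
        return None
    return {
        "mode": match.group("mode"),
        "cse": match.group("cse"),
        "status": int(match.group("status"), 16),
    }


sample = (
    "Service configuration update to standalone was attempted due to the "
    "presence of Group Policy client extension Specops Deploy that is not "
    "part of the operating system and completed with status 0x0."
)
print(parse_event_5331(sample))
```

A status of 0 corresponds to the 0x0 in the quoted message, which we read as the reconfiguration having completed without error.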
What happens here is that Microsoft wants protection from a scenario where a poorly written (not applicable to Specops of course…) Group Policy CSE can kill the other Windows Services. Therefore, it changes the configuration of the Group Policy Service to run in its own process where the worst-case scenario would be killing the Gpsvc in case of a poorly programmed CSE.
After installing Specops Deploy, when you look at the Services tab, the Group Policy Service is no longer in the netsvcs group, but in its own GPSvcGroup, and its process does not include any other services.
In the Services part of the registry, you can see that the Type value for the service is now 0x10, meaning that it is standalone and does not share its svchost.exe process with any other service. Before the 3rd party CSE was detected, the value was 0x20, meaning shared.
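The two Type values can be checked with a few lines of script. This is a rough sketch rather than a supported tool: the constants are the standard Windows service type flags, the helper names are our own, and the registry path SYSTEM\CurrentControlSet\Services\gpsvc is assumed to be where the value lives on your system. The registry read only works on Windows, so it is kept in a separate function that is not called here.

```python
# Standard Windows service type flags (winnt.h).
SERVICE_WIN32_OWN_PROCESS = 0x10    # service runs in its own process
SERVICE_WIN32_SHARE_PROCESS = 0x20  # service shares a process with others


def describe_service_type(type_value):
    """Translate a service Type value into the terms used in this article."""
    if type_value & SERVICE_WIN32_OWN_PROCESS:
        return "standalone"
    if type_value & SERVICE_WIN32_SHARE_PROCESS:
        return "shared"
    return "other"


def read_gpsvc_type():
    """Read the Type value for gpsvc from the registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch runs anywhere

    path = r"SYSTEM\CurrentControlSet\Services\gpsvc"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_kind = winreg.QueryValueEx(key, "Type")
    return value


print(describe_service_type(0x10))
print(describe_service_type(0x20))
```

On a Windows machine, describe_service_type(read_gpsvc_type()) should tell you which mode the Group Policy Service is currently configured for.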
So with this knowledge, we ask: why does the initial error pop up once? Why is Windows the culprit, but our CSEs are still involved? And why do we need to know about this? One scenario is a vanilla Windows installation booting for the first time (where the Group Policy Service is running in shared mode) with the Specops Deploy CSE already installed. It sounds impossible, but it is really straightforward if you have a Windows 10 box with Specops Deploy already installed, and then apply the Windows 10 Anniversary Update. What happens is an in-place upgrade/installation with all previous software still installed.
This means that on the first boot after installation, the Group Policy Service starts in shared mode, immediately detects that there is a 3rd party CSE, reconfigures itself, shuts down, and starts up again in the correct standalone mode. The problem is that Winlogon has already initiated RPC communication with the Group Policy Service in the shared process, but that service has been stopped, the process is long gone, and it has been replaced with a new single-process Group Policy Service. So when Winlogon tries to call the GPsvc that was there during boot, it’s gone, and Winlogon displays:
The Group Policy Client Service failed the sign-in.
The universal unique identifier (UUID) type is not supported.
The text about UUIDs is likely not even the correct message for what went wrong, but rather a result of this unexpected behavior in the Group Policy Service, where a “never booted” vanilla OS already has third party CSEs installed and performs a foreground GP refresh. As the second logon works, it seems that Winlogon realizes that the RPC calls fail, and reestablishes a connection to the Group Policy Service. This time, the single-process Group Policy Service picks up the phone and things just work again.
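The sequence described above can be illustrated with a toy model. To be clear, this is not how Winlogon or RPC are implemented: every class and function name below is hypothetical, and the reconnect after the first failure is only our reading of the observed behavior. It simply shows why a caller holding a reference to the old shared instance fails once and then succeeds.

```python
class StaleEndpointError(Exception):
    """Raised when a call targets a service instance that was stopped."""


class ServiceInstance:
    """Hypothetical stand-in for one running instance of the service."""

    def __init__(self, mode):
        self.mode = mode
        self.running = True

    def process_policy(self):
        if not self.running:
            raise StaleEndpointError(self.mode + " instance is gone")
        return "policy applied by " + self.mode + " instance"


class ServiceManager:
    """Hypothetical stand-in for the component that starts the service."""

    def __init__(self, third_party_cse_present):
        self.third_party_cse_present = third_party_cse_present
        self.current = ServiceInstance("shared")  # fresh install default

    def connect(self):
        return self.current

    def reconfigure_if_needed(self):
        # A 3rd party CSE forces a restart in standalone mode.
        if self.third_party_cse_present and self.current.mode == "shared":
            self.current.running = False
            self.current = ServiceInstance("standalone")


def simulate_logons():
    manager = ServiceManager(third_party_cse_present=True)
    handle = manager.connect()        # caller binds during boot
    manager.reconfigure_if_needed()   # service restarts as standalone
    results = []
    for _ in range(2):                # first and second logon attempt
        try:
            results.append(handle.process_policy())
        except StaleEndpointError:
            results.append("sign-in failed")
            handle = manager.connect()  # assumed reconnect after failure
    return results


print(simulate_logons())
```

In this model the first attempt fails because the handle still points at the stopped shared instance, and the second succeeds against the standalone one, which mirrors what the user sees.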
Fortunately, there is a very simple workaround – a reboot or second logon. The extra boot could be added to the Task Sequence if Specops Deploy is used for imaging, but normally the installation of the Deploy CSE would happen before the capture phase, and the needed boot would most likely already have taken place.
As an interesting side note, the SCCM OS deployment CSE has this exact issue, and since it is forced into their image using DISM, it is likely to trigger the problem frequently. https://support.microsoft.com/en-us/kb/2976660