Hey Checkyourlogs Fans,
In the previous post, we talked about the essential operator's report for Storage Spaces Direct in Windows Server 2019. Today, I want to talk about the performance penalty that comes with running Deduplication on Windows Server 2019. Nothing in life is free, and that is certainly the case for Deduplication. Don't get me wrong, I love this feature and think it is one of the best in Windows Server 2019; however, you need to understand exactly what it is that you are configuring.
The first thing you need to understand is the event logs that capture information about Deduplication. These logs carry essential information, especially the historical statistics for Deduplication jobs.
$DedupStatus = Get-DedupStatus | Select-Object *
$DedupVolumeStats = Get-DedupVolume D: | Select-Object *
$Dedupevents = Get-WinEvent -MaxEvents 10 -LogName Microsoft-Windows-Deduplication/Diagnostic | Select-Object *
$DedupDiagevents = Get-WinEvent -MaxEvents 10 -LogName Microsoft-Windows-Deduplication/Scrubbing | Select-Object *
The two logs are:
Microsoft-Windows-Deduplication/Diagnostic
Microsoft-Windows-Deduplication/Scrubbing
As jobs are running, you can check historical statistics about them.
Specifically, you want to watch for Event ID 10240. This event shows the maximum amount of RAM that was used during the Deduplication job.
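If you just want those entries on their own, a small filter like this should do the trick (a quick sketch; adjust -MaxEvents to suit):
# Pull the most recent Event ID 10240 entries from the Dedup Diagnostic log
Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft-Windows-Deduplication/Diagnostic'; Id = 10240 } -MaxEvents 10 | Select-Object TimeCreated, Id, Message | Format-List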
Here is what a 2-node S2D Cluster (both nodes with 2 x 2TB NVMe SSD drives) looks like:
(As you can see, 3009 MB max memory and 4 cores) with 2 x CSVs
Here is what a 4-node S2D Cluster (all nodes with 17 x 2TB NVMe SSD drives) looks like:
(As you can see, 31416 MB max memory and 24 cores) with 4 x CSVs
Here is what a 4-node S2D Cluster (all nodes with 16 x 2TB NVMe journal drives and 96 x 10TB HDDs) looks like:
(As you can see, 78186 MB max memory and 32 cores) with 8 x CSVs
So, from what I can tell based on my observations, the total amount of RAM consumed appears to be based on the number of drives in the system plus the number of CSVs. I'm going to have to confirm the actual calculation with Microsoft, as there is nothing posted online about this.
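If you want to pull those two numbers for your own cluster and compare them against the memory reported in Event ID 10240, a quick sketch like this (run from one of the cluster nodes) will do it:
# Count the physical disks visible to the node and the number of Cluster Shared Volumes
(Get-PhysicalDisk).Count
(Get-ClusterSharedVolume).Count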
We can see that there is a performance penalty for Deduplication though.
To view the default settings in Windows Server 2019, you can run the following:
Get-DedupSchedule | Select-Object Type, Priority, InputOutputThrottleLevel, Days, Cores, Duration, Enabled, FastStart, Full, IdleTimeout, InputOutputThrottle, Memory, Name, ReadOnly, ScheduledTask | Out-GridView
As you can see, by default the jobs will consume all of the available CPU cores and up to 50% of the available memory on the system. This is why our results above varied with the performance penalty. The two-node S2D Cluster only had 20% of memory left free on each node. The other production all-flash four-node S2D cluster had about 35% free. Lastly, the four-node hybrid S2D Cluster didn't have many VMs running on it and had 85% free.
I think it is fair to say that with Deduplication enabled, your performance penalty will vary depending on how much RAM is left available on the system. From an architecture perspective, this could be quite problematic, because if we need to drain roles off systems while Deduplication jobs are running, there might not be enough free RAM to perform the live migrations. So in your designs, you may want to consider tweaking these values if you find you are having issues.
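As a rough sanity check before (or while) jobs run, you can poll free memory on every node. This is just a sketch; it assumes you are running it from a cluster node with PowerShell remoting enabled:
# Report free vs. total physical memory on every node in the cluster
Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
    $os = Get-CimInstance Win32_OperatingSystem
    [pscustomobject]@{
        Node        = $env:COMPUTERNAME
        FreeGB      = [math]::Round($os.FreePhysicalMemory / 1MB, 1)
        TotalGB     = [math]::Round($os.TotalVisibleMemorySize / 1MB, 1)
        PercentFree = [math]::Round(($os.FreePhysicalMemory / $os.TotalVisibleMemorySize) * 100)
    }
} | Select-Object Node, FreeGB, TotalGB, PercentFree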
So, how can we tune this? It is pretty easy; Microsoft has given us an example in the following document showing how to strip out the default jobs and recreate them with your own settings.
https://docs.microsoft.com/en-us/windows-server/storage/data-deduplication/advanced-settings
I like their example, where Dedup is scheduled to run during off-peak hours (weekends and weeknights after 7:00 PM) with a maximum duration of 11 hours.
5 easy steps get you there:
- Disable the scheduled hourly Optimization Jobs
- Remove the currently scheduled Garbage Collection and Integrity Scrubbing Jobs.
- Create a nightly Optimization job that runs at 7:00 PM with high priority and all the CPUs and memory available on the system
- Create a weekly Garbage Collection job that runs on Saturday starting at 7:00 AM with high priority and all the CPUs and memory available on the system
- Create a weekly integrity scrubbing job that runs on Sunday starting at 7 AM with high priority and all the CPUs and memory available on the system
#1. Disable the scheduled hourly Optimization jobs
Set-DedupSchedule -Name BackgroundOptimization -Enabled $false
Set-DedupSchedule -Name PriorityOptimization -Enabled $false
#2. Remove the currently scheduled Garbage Collection and Integrity Scrubbing jobs
Get-DedupSchedule -Type GarbageCollection | ForEach-Object { Remove-DedupSchedule -InputObject $_ }
Get-DedupSchedule -Type Scrubbing | ForEach-Object { Remove-DedupSchedule -InputObject $_ }
#3. Create a nightly Optimization job that runs at 7:00 PM with high priority and all the CPUs and memory available on the system
New-DedupSchedule -Name "NightlyOptimization" -Type Optimization -DurationHours 11 -Memory 100 -Cores 100 -Priority High -Days @(1,2,3,4,5) -Start (Get-Date "2018-08-08 19:00:00")
#4. Create a weekly Garbage Collection job that runs on Saturday starting at 7:00 AM with high priority and all the CPUs and memory available on the system
New-DedupSchedule -Name "WeeklyGarbageCollection" -Type GarbageCollection -DurationHours 23 -Memory 100 -Cores 100 -Priority High -Days @(6) -Start (Get-Date "2016-08-13 07:00:00")
#5. Create a weekly Integrity Scrubbing job that runs on Sunday starting at 7:00 AM with high priority and all the CPUs and memory available on the system
New-DedupSchedule -Name "WeeklyIntegrityScrubbing" -Type Scrubbing -DurationHours 23 -Memory 100 -Cores 100 -Priority High -Days @(0) -Start (Get-Date "2016-08-14 07:00:00")
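Once the new jobs are in place, it's worth a quick check that they landed the way you expect, using the same properties we looked at earlier:
# Confirm the recreated schedules and their resource settings
Get-DedupSchedule | Select-Object Name, Type, Days, Duration, Priority, Memory, Cores, Enabled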
Another thing you may want to consider when performance tuning your Deduplication jobs is whether or not to enable Partial File Optimization. This setting ensures that the majority of a file gets optimized even though sections of the file change regularly.
#Enable Optimization on Partial Files (optional) on a per-volume level
Get-DedupVolume -Volume C:\ClusterStorage\Volume1 | Select-Object *
Set-DedupVolume -Volume C:\ClusterStorage\Volume1 -OptimizePartialFiles
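To confirm the change took effect, you can read the flag back from the volume (OptimizePartialFiles should now show as $true):
# Verify Partial File Optimization is enabled on the volume
Get-DedupVolume -Volume C:\ClusterStorage\Volume1 | Select-Object Volume, OptimizePartialFiles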
Lastly, you should look at your Garbage Collection settings. For example, if the Dedup usage type is set to Backup, Garbage Collection never runs automatically. Here is what Microsoft says about Garbage Collection:
What is the difference between full and regular Garbage Collection?
There are two types of Garbage Collection:
Regular Garbage Collection uses a statistical algorithm to find large unreferenced chunks that meet certain criteria (low in memory and IOPs). Regular Garbage Collection compacts a chunk store container only if a minimum percentage of the chunks are unreferenced. This type of Garbage Collection runs much faster and uses fewer resources than full Garbage Collection. The default schedule of the regular Garbage Collection job is to run once a week.
Full Garbage Collection does a much more thorough job of finding unreferenced chunks and freeing more disk space. Full Garbage Collection compacts every container even if just a single chunk in the container is unreferenced. Full Garbage Collection will also free space that may have been in use if there was a crash or power failure during an Optimization job. Full Garbage Collection jobs will recover 100 percent of the available space that can be recovered on a deduplicated volume at the cost of requiring more time and system resources compared to a regular Garbage Collection job. The full Garbage Collection job will typically find and release up to 5 percent more of the unreferenced data than a regular Garbage Collection job. The default schedule of the full Garbage Collection job is to run every fourth-time Garbage Collection is scheduled.
#Manually run Full Garbage Collection to gain approximately another 5% or so of free disk space back
#If the Dedup usage type is set to Backup, Garbage Collection is NEVER run automatically; you need to do it manually
Start-DedupJob -Type GarbageCollection -Full
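If you are not sure which usage type a volume was enabled with, it shows up right on the volume object:
# Check the Dedup usage type for each volume (Backup volumes need the manual Full Garbage Collection above)
Get-DedupVolume | Select-Object Volume, Enabled, UsageType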
You can grab a copy of the script used in this blog here:
https://github.com/dkawula/Operations/blob/master/S2D/PerfTuneDedup.ps1
Thanks, I hope this helps and have a great weekend,
Dave
Dave – Step #2 and Step #3 seem to have the same command for New-DedupSchedule, with the exception of the year it becomes active. Was this intentional?
Good catch, Tim. I fixed it.
Hi Dave, great series of articles! You seem to be the only one out there who understands how Windows Server 2019 dedup works 🙂 We're trying to get dedup up and running on our Windows Server 2019 file server. Our file server is configured with 2 cores and 6GB of memory. Although we have a number of disks, we only want to run dedup on one of them: a 2TB ReFS disk with 330GB free space used to store FSLogix disks. The scheduled jobs are failing with 0x80565309 ("A required filter driver is either not installed, not loaded, or not ready for service"), which is reported in the Microsoft-Windows-Deduplication\Operational log. Microsoft-Windows-Deduplication\Diagnostic reports the following: Minimum memory: 1470MB; Maximum memory: 8182MB; Minimum disk: 1024MB; Maximum cores: 8. Based on this article, I'm having some trouble understanding what these values mean. How can it be reporting more than 8GB of memory use when there is only 6GB in the server? Via the paging file maybe? I could understand if the paging file was included, however it also reports 8 cores being used when the server only has 2. Is it saying that it needs 8GB of memory and 8 cores to be able to work? Are the two events linked? I would appreciate your advice. Thanks for your time!
If the Dedup.sys filter driver is missing, make sure the Deduplication feature is installed.
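For example, something like this (assuming the standard feature name) will check for the feature, add it if needed, and confirm the filter driver is loaded:
# Check whether the Data Deduplication feature is installed
Get-WindowsFeature -Name FS-Data-Deduplication
# Install it if it is missing
Install-WindowsFeature -Name FS-Data-Deduplication
# Confirm the Dedup filter driver is loaded
fltmc filters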