I’m currently working on an Office 365 migration for a client and have been seeing extremely long migration times for mailboxes. We are talking 30+ hours to move a single 20GB mailbox. The client has a 100Mbps symmetrical fiber pipe, so we know bandwidth isn’t the issue, and while their hypervisor and SAN setup aren’t the highest performing, we should still be seeing much better performance than that.
The first thing we have to do is figure out just how bad the situation is. To do that, we follow this “You had me at EHLO” blog post. You need to download this TechNet gallery script and save it locally. Next, open an administrative PowerShell window, change directory to the folder where you saved the script, and run the following commands (or, if you prefer, dump them into PowerShell ISE and run them from there).
# Enable unsigned scripts.
Set-ExecutionPolicy Unrestricted

# Dot-source the script to create the ProcessStats function.
# Notice the space between the two periods. This is important:
# it is period, space, period, backslash.
. .\AnalyzeMoveRequestStats.ps1

# Connect to the MSOL service.
Connect-MsolService

# Set the variables.
$moves = Get-MoveRequest | ?{$_.Status -ne 'queued'}
$stats = $moves | Get-MoveRequestStatistics -IncludeReport

# Generate your report.
ProcessStats -stats $stats -name ProcessedStats1
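One gotcha worth calling out: Get-MoveRequest and Get-MoveRequestStatistics are Exchange Online cmdlets, so Connect-MsolService on its own won’t expose them; you also need a remote PowerShell session to Exchange Online. Here is a minimal sketch of what that looks like, using the standard Exchange Online remote PowerShell endpoint (the ConnectionUri and session parameters below are the commonly documented values, not something specific to this environment).

# Prompt for Office 365 admin credentials.
$cred = Get-Credential

# Create a remote PowerShell session to Exchange Online
# (standard remote PowerShell endpoint for Office 365).
$session = New-PSSession -ConfigurationName Microsoft.Exchange `
    -ConnectionUri https://outlook.office365.com/powershell-liveid/ `
    -Credential $cred -Authentication Basic -AllowRedirection

# Import the Exchange Online cmdlets (Get-MoveRequest,
# Get-MoveRequestStatistics, etc.) into the current session.
Import-PSSession $session

# When finished, clean up the session.
# Remove-PSSession $session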
After running this, my results looked like this.
Name                            : ProcessedStats1
StartTime                       :
EndTime                         : 3/6/2017 5:29:27 PM
MigrationDuration               : 1 day(s) 04:08:33
MailboxCount                    : 61
TotalGBTransferred              : 302.20
PercentComplete                 : 92.01
MaxPerMoveTransferRateGBPerHour : 1.75
MinPerMoveTransferRateGBPerHour : 0.19
AvgPerMoveTransferRateGBPerHour : 1.05
MoveEfficiencyPercent           : 80.05
AverageSourceLatency            : 1,469.07
AverageDestinationLatency       : 929.00
IdleDuration                    : 214.40 %
SourceSideDuration              : 71.68 %
DestinationSideDuration         : 14.10 %
WordBreakingDuration            : 7.14 %
TransientFailureDurations       : 0.91 %
OverallStallDurations           : 0.82 %
ContentIndexingStalls           : 0.00 %
HighAvailabilityStalls          : 0.00 %
TargetCPUStalls                 : 0.77 %
SourceCPUStalls                 : 0.05 %
MailboxLockedStall              : 0.00 %
ProxyUnknownStall               : 0.00 %
So the on-premises server is the source of the problem. Both AverageSourceLatency and SourceSideDuration are significantly higher than their destination-side counterparts. We also notice that even the best per-move transfer rate was only 1.75GB per hour (the average was just 1.05GB per hour), which is again horrible.
The resolution for all of this is to increase the MaxActiveMovesPerTargetMDB and MaxActiveMovesPerTargetServer settings from their defaults of 2 and 5 respectively to anything between 10 and 100. Personally, I set mine to 20. Tony Redmond does an excellent job of explaining what these settings do, and how to modify them, in his blog post here. He also explains why Microsoft sets them so low to begin with. The TL;DR is that raising these settings uses significantly more CPU and disk IO on your CAS servers, which in smaller environments can disrupt services for your users, and can also overwhelm your CAS to the point where it can’t keep up with the number of migrations it is running and they time out. All the articles and TechNet forum posts I’ve seen suggest a value of 10 for these, which I agree is a good starting point; you can then tune your server from there. One more note: you need to modify the config on each CAS server in your environment and restart the Microsoft Exchange Mailbox Replication service on each server afterwards so that the change takes effect.
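For reference, here is a rough sketch of making that change with PowerShell. It assumes the commonly documented layout: the settings live as attributes on the MRSConfiguration element in MsExchangeMailboxReplication.exe.config under the Exchange Bin folder, and $env:ExchangeInstallPath points at the Exchange install directory. Verify the exact path, element name, and defaults against Tony Redmond’s post and your own servers before running anything like this.

# Path to the MRS config file (assumes the default Exchange install path
# and the MsExchangeMailboxReplication.exe.config file name).
$configPath = Join-Path $env:ExchangeInstallPath 'Bin\MsExchangeMailboxReplication.exe.config'

# Back up the existing config before touching it.
Copy-Item $configPath "$configPath.bak"

# Load the config as XML and raise the two throttling values
# (assumes they are attributes on the MRSConfiguration element).
[xml]$config = Get-Content $configPath
$mrs = $config.configuration.MRSConfiguration
$mrs.MaxActiveMovesPerTargetMDB    = '20'
$mrs.MaxActiveMovesPerTargetServer = '20'
$config.Save($configPath)

# Restart the Mailbox Replication service so the new values take effect.
Restart-Service MSExchangeMailboxReplication

Repeat this on every CAS server handling moves; the service restart is what actually picks up the new values.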