
VMware Paravirtual SCSI adapter: Is it really that much faster?

I asked myself the same question after reading a best practice guide from Datrium that suggested using the VMware PVSCSI controller instead of the LSI SAS controller that VMware recommends by default when you create a Windows VM.

Out of curiosity, I spun up a new Windows Server 2016 VM with 4 cores, 8 GB of RAM, and a 100 GB drive, hosted on my Datrium storage, to find out how much of a difference there was.

I ran this test during a normal production workload and used Microsoft DiskSpd with a 16 KB I/O size (my current average for my app servers) to see what results we would get. The specific command I used was:

diskspd.exe -b16K -d1800 -h -L -o2 -t4 -r -w50 -c10G C:\io.dat
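
For anyone new to DiskSpd, here is the same run broken out flag by flag. This is just a sketch in PowerShell (it assumes diskspd.exe sits in the current directory); the flag descriptions line up with the input parameters echoed in the output below.

# Hypothetical PowerShell wrapper for the exact same test, with one comment per flag.
$diskspdArgs = @(
    '-b16K'      # 16 KB I/O size
    '-d1800'     # run for 1,800 seconds (30 minutes)
    '-h'         # disable software caching and hardware write caching (write-through)
    '-L'         # capture latency statistics
    '-o2'        # 2 outstanding I/Os per thread
    '-t4'        # 4 worker threads against the target file
    '-r'         # random I/O
    '-w50'       # 50% writes / 50% reads
    '-c10G'      # create a 10 GB test file
    'C:\io.dat'  # target file
)
& .\diskspd.exe @diskspdArgs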

The first run on the VMware LSI SAS controller resulted in this.

Command Line: C:\Users\cjoyce_admin\Downloads\Diskspd-v2.0.17\amd64fre\diskspd.exe -b16K -d1800 -h -L -o2 -t4 -r -w50 -c10G c:\io.dat

Input parameters:

timespan: 1
-------------
duration: 1800s
warm up time: 5s
cool down time: 0s
measuring latency
random seed: 0
path: 'c:\io.dat'
think time: 0ms
burst size: 0
software cache disabled
hardware write cache disabled, writethrough on
performing mix test (read/write ratio: 50/50)
block size: 16384
using random I/O (alignment: 16384)
number of outstanding I/O operations: 2
thread stride size: 0
threads per file: 4
using I/O Completion Ports
IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time: 1800.00s
thread count: 4
proc count: 4

CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 8.35%| 1.84%| 6.50%| 91.65%
1| 8.38%| 1.89%| 6.48%| 91.62%
2| 7.78%| 1.79%| 5.99%| 92.22%
3| 7.39%| 1.60%| 5.79%| 92.61%
-------------------------------------------
avg.| 7.97%| 1.78%| 6.19%| 92.03%

Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 15150776320 | 924730 | 8.03 | 513.74 | 3.888 | 3.175 | c:\io.dat (10240MB)
1 | 15089106944 | 920966 | 7.99 | 511.65 | 3.904 | 3.289 | c:\io.dat (10240MB)
2 | 15108947968 | 922177 | 8.00 | 512.32 | 3.899 | 3.140 | c:\io.dat (10240MB)
3 | 15109013504 | 922181 | 8.01 | 512.32 | 3.898 | 3.086 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 60457844736 | 3690054 | 32.03 | 2050.03 | 3.897 | 3.173

Read IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 7574110208 | 462287 | 4.01 | 256.83 | 3.274 | 2.741 | c:\io.dat (10240MB)
1 | 7539032064 | 460146 | 3.99 | 255.64 | 3.297 | 2.966 | c:\io.dat (10240MB)
2 | 7562526720 | 461580 | 4.01 | 256.43 | 3.297 | 2.861 | c:\io.dat (10240MB)
3 | 7543046144 | 460391 | 4.00 | 255.77 | 3.293 | 2.613 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 30218715136 | 1844404 | 16.01 | 1024.67 | 3.290 | 2.798

Write IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 7576666112 | 462443 | 4.01 | 256.91 | 4.501 | 3.448 | c:\io.dat (10240MB)
1 | 7550074880 | 460820 | 4.00 | 256.01 | 4.510 | 3.479 | c:\io.dat (10240MB)
2 | 7546421248 | 460597 | 4.00 | 255.89 | 4.501 | 3.289 | c:\io.dat (10240MB)
3 | 7565967360 | 461790 | 4.01 | 256.55 | 4.503 | 3.389 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 30239129600 | 1845650 | 16.02 | 1025.36 | 4.504 | 3.402
%-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 0.000 | 0.000 | 0.000
25th | 1.360 | 2.258 | 1.709
50th | 2.818 | 3.885 | 3.269
75th | 4.481 | 6.093 | 5.443
90th | 6.259 | 8.370 | 7.195
95th | 7.163 | 9.928 | 8.987
99th | 10.090 | 13.425 | 12.593
3-nines | 23.523 | 30.284 | 27.785
4-nines | 47.191 | 52.535 | 49.878
5-nines | 190.339 | 161.402 | 190.339
6-nines | 534.581 | 534.289 | 534.289
7-nines | 545.593 | 535.040 | 545.593
8-nines | 545.593 | 535.040 | 545.593
9-nines | 545.593 | 535.040 | 545.593
max | 545.593 | 535.040 | 545.593

Overall, not terrible. Now let's look at what we get when we replace the LSI SAS controller with a PVSCSI controller.
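
For reference, the controller swap can be scripted with VMware PowerCLI along these lines. Treat it as a sketch: the VM name is made up, the VM needs to be powered off first, and the PVSCSI driver (installed with VMware Tools) must already be present in the guest or Windows won't boot from the converted disk.

# Hedged PowerCLI sketch of the LSI SAS to PVSCSI swap (VM name is hypothetical).
# Assumes PowerCLI is installed, Connect-VIServer has been run, and the VM is powered off.
$vm         = Get-VM -Name 'PVSCSI-Test-VM'
$controller = Get-ScsiController -VM $vm              # the existing LSI SAS controller
Set-ScsiController -ScsiController $controller -Type ParaVirtual
Start-VM -VM $vm

With the controller swapped, the same DiskSpd command produced the following: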

Input parameters:

timespan: 1
-------------
duration: 1800s
warm up time: 5s
cool down time: 0s
measuring latency
random seed: 0
path: 'c:\io.dat'
think time: 0ms
burst size: 0
software cache disabled
hardware write cache disabled, writethrough on
performing mix test (read/write ratio: 50/50)
block size: 16384
using random I/O (alignment: 16384)
number of outstanding I/O operations: 2
thread stride size: 0
threads per file: 4
using I/O Completion Ports
IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time: 1800.00s
thread count: 4
proc count: 4

CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 7.37%| 1.53%| 5.84%| 92.63%
1| 7.02%| 1.40%| 5.62%| 92.98%
2| 6.35%| 1.25%| 5.10%| 93.65%
3| 6.04%| 1.22%| 4.82%| 93.96%
-------------------------------------------
avg.| 6.70%| 1.35%| 5.35%| 93.30%

Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 15667019776 | 956239 | 8.30 | 531.24 | 3.760 | 2.938 | c:\io.dat (10240MB)
1 | 15743369216 | 960899 | 8.34 | 533.83 | 3.741 | 3.011 | c:\io.dat (10240MB)
2 | 15789637632 | 963723 | 8.37 | 535.40 | 3.730 | 2.841 | c:\io.dat (10240MB)
3 | 15788425216 | 963649 | 8.36 | 535.36 | 3.731 | 2.914 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 62988451840 | 3844510 | 33.37 | 2135.84 | 3.740 | 2.926

Read IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 7831814144 | 478016 | 4.15 | 265.56 | 2.660 | 2.405 | c:\io.dat (10240MB)
1 | 7862943744 | 479916 | 4.17 | 266.62 | 2.640 | 2.538 | c:\io.dat (10240MB)
2 | 7904346112 | 482443 | 4.19 | 268.02 | 2.632 | 2.247 | c:\io.dat (10240MB)
3 | 7881277440 | 481035 | 4.18 | 267.24 | 2.631 | 2.557 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 31480381440 | 1921410 | 16.68 | 1067.45 | 2.641 | 2.440

Write IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 7835205632 | 478223 | 4.15 | 265.68 | 4.859 | 3.010 | c:\io.dat (10240MB)
1 | 7880425472 | 480983 | 4.18 | 267.21 | 4.840 | 3.045 | c:\io.dat (10240MB)
2 | 7885291520 | 481280 | 4.18 | 267.38 | 4.831 | 2.946 | c:\io.dat (10240MB)
3 | 7907147776 | 482614 | 4.19 | 268.12 | 4.827 | 2.833 | c:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 31508070400 | 1923100 | 16.69 | 1068.39 | 4.839 | 2.959
%-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 0.000 | 0.000 | 0.000
25th | 1.189 | 2.947 | 1.810
50th | 1.868 | 4.126 | 3.120
75th | 3.536 | 6.037 | 4.971
90th | 5.392 | 8.026 | 6.924
95th | 6.269 | 9.628 | 8.417
99th | 9.446 | 13.234 | 12.021
3-nines | 22.655 | 32.422 | 28.825
4-nines | 45.679 | 50.249 | 48.554
5-nines | 158.326 | 159.371 | 159.371
6-nines | 475.470 | 427.329 | 427.329
7-nines | 475.711 | 427.338 | 475.711
8-nines | 475.711 | 427.338 | 475.711
9-nines | 475.711 | 427.338 | 475.711
max | 475.711 | 427.338 | 475.711

So overall we see roughly a 4% improvement across the board: slightly higher throughput, slightly lower latency, and slightly lower CPU usage. These aren't groundbreaking numbers, but if you're trying to squeeze every last drop of performance out of your VMs, this could be a big step in the right direction.
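
If you want to check that figure yourself, here is the quick math in PowerShell, using the totals copied from the two runs above:

# Back-of-the-envelope comparison of the two runs (values taken from the DiskSpd totals above).
$lsiIops    = 2050.03   # LSI SAS, total I/O per second
$pvscsiIops = 2135.84   # PVSCSI, total I/O per second
[math]::Round((($pvscsiIops - $lsiIops) / $lsiIops) * 100, 1)   # ~4.2 (% more IOPS)

$lsiLat    = 3.897      # LSI SAS, average latency in ms
$pvscsiLat = 3.740      # PVSCSI, average latency in ms
[math]::Round((($lsiLat - $pvscsiLat) / $lsiLat) * 100, 1)      # ~4.0 (% lower latency)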

Speaking of squeezing every last drop, let's see what happens when we test against a ReFS-formatted disk.
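
In case it's useful, here is a minimal PowerShell sketch for bringing a new data disk online as ReFS inside the guest. It assumes the fresh VMDK shows up as disk number 1 and should become the E: drive; adjust the disk number, drive letter, and label for your environment.

# Hypothetical example: initialize the new disk and format it with ReFS.
Initialize-Disk -Number 1 -PartitionStyle GPT
New-Partition -DiskNumber 1 -DriveLetter E -UseMaximumSize
Format-Volume -DriveLetter E -FileSystem ReFS -NewFileSystemLabel 'ReFS-Data'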

Command Line: C:\Users\cjoyce_admin\Downloads\Diskspd-v2.0.17\amd64fre\diskspd.exe -b16K -d1800 -h -L -o2 -t4 -r -w50 -c10G E:\io.dat

Input parameters:

timespan: 1
-------------
duration: 1800s
warm up time: 5s
cool down time: 0s
measuring latency
random seed: 0
path: 'E:\io.dat'
think time: 0ms
burst size: 0
software cache disabled
hardware write cache disabled, writethrough on
performing mix test (read/write ratio: 50/50)
block size: 16384
using random I/O (alignment: 16384)
number of outstanding I/O operations: 2
thread stride size: 0
threads per file: 4
using I/O Completion Ports
IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time: 1800.02s
thread count: 4
proc count: 4

CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 8.65%| 1.62%| 7.03%| 91.35%
1| 8.69%| 1.49%| 7.20%| 91.31%
2| 7.83%| 1.35%| 6.47%| 92.17%
3| 7.43%| 1.36%| 6.07%| 92.57%
-------------------------------------------
avg.| 8.15%| 1.46%| 6.69%| 91.85%

Total IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 18047041536 | 1101504 | 9.56 | 611.94 | 3.263 | 2.708 | E:\io.dat (10240MB)
1 | 18078842880 | 1103445 | 9.58 | 613.02 | 3.258 | 3.004 | E:\io.dat (10240MB)
2 | 18066751488 | 1102707 | 9.57 | 612.61 | 3.260 | 2.712 | E:\io.dat (10240MB)
3 | 18132910080 | 1106745 | 9.61 | 614.85 | 3.248 | 2.727 | E:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 72325545984 | 4414401 | 38.32 | 2452.42 | 3.257 | 2.791

Read IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 9020080128 | 550542 | 4.78 | 305.85 | 2.762 | 2.399 | E:\io.dat (10240MB)
1 | 9030025216 | 551149 | 4.78 | 306.19 | 2.760 | 2.927 | E:\io.dat (10240MB)
2 | 9041592320 | 551855 | 4.79 | 306.58 | 2.759 | 2.342 | E:\io.dat (10240MB)
3 | 9050865664 | 552421 | 4.80 | 306.90 | 2.752 | 2.479 | E:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 36142563328 | 2205967 | 19.15 | 1225.53 | 2.758 | 2.547

Write IO
thread | bytes | I/Os | MB/s | I/O per s | AvgLat | LatStdDev | file
-----------------------------------------------------------------------------------------------------
0 | 9026961408 | 550962 | 4.78 | 306.09 | 3.764 | 2.899 | E:\io.dat (10240MB)
1 | 9048817664 | 552296 | 4.79 | 306.83 | 3.754 | 2.998 | E:\io.dat (10240MB)
2 | 9025159168 | 550852 | 4.78 | 306.03 | 3.762 | 2.954 | E:\io.dat (10240MB)
3 | 9082044416 | 554324 | 4.81 | 307.96 | 3.742 | 2.870 | E:\io.dat (10240MB)
-----------------------------------------------------------------------------------------------------
total: 36182982656 | 2208434 | 19.17 | 1226.90 | 3.756 | 2.931
%-ile | Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
min | 0.267 | 0.297 | 0.267
25th | 1.252 | 1.773 | 1.403
50th | 2.019 | 3.097 | 2.618
75th | 3.724 | 5.038 | 4.275
90th | 5.581 | 6.998 | 6.240
95th | 6.395 | 8.584 | 7.525
99th | 9.641 | 12.213 | 11.021
3-nines | 20.505 | 26.232 | 23.305
4-nines | 42.971 | 45.559 | 44.280
5-nines | 238.498 | 175.573 | 204.921
6-nines | 502.382 | 359.149 | 435.862
7-nines | 547.128 | 547.124 | 547.128
8-nines | 547.128 | 547.124 | 547.128
9-nines | 547.128 | 547.124 | 547.128
max | 547.128 | 547.124 | 547.128

With a ReFS-formatted disk on top of PVSCSI, total throughput jumps to roughly 2,452 IOPS: about a 15% increase over the NTFS-on-PVSCSI run, and nearly 20% over the original LSI SAS baseline.
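
Again, the quick math, using the totals from all three runs above:

# Comparison of the ReFS run against both earlier runs (values from the DiskSpd totals above).
$refsIops   = 2452.42   # ReFS on PVSCSI, total I/O per second
$pvscsiIops = 2135.84   # NTFS on PVSCSI
$lsiIops    = 2050.03   # NTFS on LSI SAS
[math]::Round((($refsIops - $pvscsiIops) / $pvscsiIops) * 100, 1)   # ~14.8 (% over PVSCSI/NTFS)
[math]::Round((($refsIops - $lsiIops) / $lsiIops) * 100, 1)         # ~19.6 (% over the LSI SAS baseline)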

So if your applications support it, and you truly want to squeeze every last drop out of your storage, ReFS on a PVSCSI controller is the combination to go with!

Cameron Joyce is a full stack engineer with experience across all aspects of the on-premises datacenter, virtualization, disaster recovery, cybersecurity, and cloud-based *aaS technologies. He has spent more than a decade working in managed services and brings that experience to his role as a Sr. Presales Engineer. He now shares solutions to problems and new technology overviews through his blog and YouTube channels.
