aio=native or aio=threads – Intro
In this post we will compare disk IO performance under different QEMU configurations, namely aio=native versus aio=threads, combined with the different cache options. The system used in this test is the following:
Component | Hardware | QEMU options |
---|---|---|
CPU | Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz | -cpu max,kvm=off,check -smp 8,sockets=1,cores=4,threads=2 -enable-kvm |
RAM | 32 GB in system | -m 8G -mem-prealloc -mem-path /dev/hugepages |
Disk | Samsung 850 EVO 500GB SSD | dedicated to the client; -device virtio-scsi-pci with format=raw |
Client OS | Windows 10 | virtio-win-0.1.149.iso drivers |
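For reference, here is a minimal sketch of how these options might be combined into a single invocation; the disk path, the drive and device ids, and the ISO location are illustrative assumptions, not the exact command used:

```sh
# Hedged sketch of the test VM: combines the options from the table above.
# /dev/sda, drive0/scsi0 and the ISO path are assumptions.
qemu-system-x86_64 \
  -enable-kvm \
  -cpu max,kvm=off,check \
  -smp 8,sockets=1,cores=4,threads=2 \
  -m 8G -mem-prealloc -mem-path /dev/hugepages \
  -device virtio-scsi-pci,id=scsi0 \
  -drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none,aio=native \
  -device scsi-hd,drive=drive0,bus=scsi0.0 \
  -drive file=virtio-win-0.1.149.iso,media=cdrom
```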
Benchmark
The SSD under test was dedicated to the client running Windows 10, either:
- as is: file=/dev/sda, or
- as an LVM partition created on it: file=/dev/qemu_vol_1/win10_os
But always as VirtIO and raw! We also used LVM to understand the overhead of adding another layer; both variants are sketched below.
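A hedged sketch of the two disk configurations; the LVM commands and the drive wiring are illustrative assumptions based on the paths above:

```sh
# Variant 1: pass the whole SSD through as a raw block device.
# QEMU drive option (drive id is an assumption):
-drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none

# Variant 2: create an LVM logical volume on the SSD and pass that through.
pvcreate /dev/sda                             # make the SSD an LVM physical volume
vgcreate qemu_vol_1 /dev/sda                  # volume group matching the path above
lvcreate -l 100%FREE -n win10_os qemu_vol_1   # one LV spanning the whole group
# QEMU drive option:
-drive file=/dev/qemu_vol_1/win10_os,format=raw,if=none,id=drive0,cache=none
```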
Benchmark tool
For our tests we used CrystalDiskMark 6.0.x inside Windows 10.
Description of Cache Modes
Mode | Host Page Cache | Disk Write Cache | Notes |
---|---|---|---|
none | disabled | enabled | balances performance and safety (better writes) |
directsync | disabled | disabled | safest but slowest (relative to the others) |
writethrough | enabled | disabled | balances performance and safety (better reads) |
writeback | enabled | enabled | fast, but can lose data on power outage depending on the hardware used |
unsafe | enabled | enabled | doesn't flush data; fastest and least safe |
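The cache mode is selected per drive on the QEMU command line; a minimal sketch (the file path and drive id are illustrative assumptions):

```sh
# cache= accepts one of: none, directsync, writethrough, writeback, unsafe
-drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none
```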
cache=none
- host doesn't do caching
- guest disk cache is writeback
- Warning: like writeback, you can lose data in case of a power failure; if your Linux guest kernel is < 2.6.37, you need to use the barrier option in fstab to avoid filesystem corruption
This mode causes qemu-kvm to interact with the disk image file or block device with O_DIRECT semantics, so the host page cache is bypassed and I/O happens directly between the qemu-kvm userspace buffers and the storage device. Because the actual storage device may report a write as completed when it has merely been placed in its write queue, the guest's virtual storage adapter is informed that there is a writeback cache, so the guest is expected to send down flush commands as needed to manage data integrity. Performance-wise, this is equivalent to direct access to your host's disk.
cache=writethrough
- host does read caching
- guest disk cache mode is writethrough
Writethrough makes an fsync for each write, so it is the most secure cache mode: you can't lose data. It is also the slowest. This mode causes qemu-kvm to interact with the disk image file or block device with O_DSYNC semantics, where writes are reported as completed only when the data has been committed to the storage device. The host page cache is used in what can be termed a writethrough caching mode. The guest's virtual storage adapter is informed that there is no writeback cache, so the guest does not need to send down flush commands to manage data integrity. The storage behaves as if there is a writethrough cache.
cache=directsync
- host doesn't do caching
- guest disk cache mode is writethrough
- similar to writethrough, an fsync is made for each write
This mode causes qemu-kvm to interact with the disk image file or block device with both O_DSYNC and O_DIRECT semantics: writes are reported as completed only when the data has been committed to the storage device, and the host page cache is bypassed as well. Like cache=writethrough, it is helpful to guests that do not send flushes when needed. It was the last cache mode added, completing the possible combinations of caching and direct access semantics.
cache=writeback
- host does read/write caching
- guest disk cache mode is writeback
- Warning: you can lose data in case of a power failure; if your Linux guest kernel is < 2.6.37, you need to use the barrier option in fstab to avoid filesystem corruption
This mode causes qemu-kvm to interact with the disk image file or block device with neither O_DSYNC nor O_DIRECT semantics, so the host page cache is used and writes are reported to the guest as completed when placed in the host page cache; the normal page cache management then handles commitment to the storage device. Additionally, the guest's virtual storage adapter is informed of the writeback cache, so the guest is expected to send down flush commands as needed to manage data integrity. Analogous to a RAID controller with a RAM cache.
cache=unsafe
This mode is similar to the cache=writeback mode discussed above. The key aspect of this unsafe mode is that all flush commands from the guest are ignored. Using this mode implies that the user has accepted the trade-off of performance over the risk of data loss in the event of a host failure. It is useful, for example, during a guest installation, but not for production workloads.
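As the text suggests, a plausible use is a throwaway install run; a hedged sketch (the path and drive id are illustrative assumptions):

```sh
# One-off guest install: flushes are ignored, so any host failure can
# corrupt the image. Switch back to a safer cache mode afterwards.
-drive file=/dev/qemu_vol_1/win10_os,format=raw,if=none,id=drive0,cache=unsafe
```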
Performance Implications of Cache Modes
The choice to make full use of the page cache, to write through it, or to bypass it altogether can have dramatic performance implications. Other factors that influence disk performance include the capabilities of the actual storage system, the disk image format used, the potential size of the page cache, and the IO scheduler used. Additionally, not flushing the write cache increases performance, but at a risk, as noted above. As a general rule, high-end systems typically perform best with cache=none, because of the reduced data copying that occurs. The potential benefit of having multiple guests share the common host page cache, the ratio of reads to writes, and the use of aio=native (see below) should also be considered.
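As a hedged sketch, the two aio back-ends compared in this post are selected per drive like this (the file path and drive id are illustrative assumptions). Note that in current QEMU versions aio=native requires the host page cache to be bypassed, i.e. cache=none or cache=directsync:

```sh
# aio=threads (the default): I/O is performed by a pool of
# userspace worker threads
-drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none,aio=threads

# aio=native: Linux native AIO (io_submit); only valid with
# O_DIRECT, i.e. cache=none or cache=directsync
-drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none,aio=native
```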
Benchmark Results
Read
QEMU Disk IO Performance - Read (all values in MB/s)
ID | aio | cache | Seq Q32T1 | 4KiB Q8T8 | 4KiB Q32T1 | 4KiB Q1T1 | LVM |
---|---|---|---|---|---|---|---|
1 | native | none | 546.60 | 109.00 | 107.80 | 26.61 | no |
2 | threads | none | 1,087.20 | 94.62 | 142.20 | 22.16 | no |
3 | native | directsync | 0.00 | 0.00 | 0.00 | 0.00 | no |
4 | threads | directsync | 1,045.00 | 95.18 | 145.40 | 22.92 | no |
5 | native | writethrough | 0.00 | 0.00 | 0.00 | 0.00 | no |
6 | threads | writethrough | 4,239.30 | 100.70 | 158.70 | 52.75 | no |
7 | native | writeback | 0.00 | 0.00 | 0.00 | 0.00 | no |
8 | threads | writeback | 3,270.90 | 76.17 | 128.10 | 40.70 | no |
9 | native | unsafe | 0.00 | 0.00 | 0.00 | 0.00 | no |
10 | threads | unsafe | 4,403.50 | 131.00 | 158.40 | 52.17 | no |
Write
QEMU Disk IO Performance - Write (all values in MB/s)
ID | aio | cache | Seq Q32T1 | 4KiB Q8T8 | 4KiB Q32T1 | 4KiB Q1T1 | LVM |
---|---|---|---|---|---|---|---|
1 | native | none | 527.50 | 104.00 | 113.40 | 46.05 | no |
2 | threads | none | 524.10 | 93.03 | 132.50 | 36.69 | no |
3 | native | directsync | 0.00 | 0.00 | 0.00 | 0.00 | no |
4 | threads | directsync | 303.40 | 16.35 | 15.84 | 2.81 | no |
5 | native | writethrough | 0.00 | 0.00 | 0.00 | 0.00 | no |
6 | threads | writethrough | 85.18 | 4.72 | 5.57 | 3.16 | no |
7 | native | writeback | 0.00 | 0.00 | 0.00 | 0.00 | no |
8 | threads | writeback | 3,395.80 | 77.56 | 1.26 | 40.33 | no |
9 | native | unsafe | 0.00 | 0.00 | 0.00 | 0.00 | no |
10 | threads | unsafe | 4,241.20 | 98.45 | 146.40 | 49.51 | no |
LVM Performance
Using LVM didn't add any significant overhead to the disk IO performance, so the tables above show only the benchmarks done without it.