aio=native or aio=threads – Intro
In this post we will compare disk IO performance under different QEMU configurations, namely aio=native versus aio=threads, combined with the different cache options. The system used in this test is the following:
Component | Hardware | QEMU options |
---|---|---|
CPU | Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz | -cpu max,kvm=off,check -smp 8,sockets=1,cores=4,threads=2 -enable-kvm |
RAM | 32 GB in system | -m 8G -mem-prealloc -mem-path /dev/hugepages |
Disk | Samsung 850 EVO 500GB SSD | dedicated to the client; -device virtio-scsi-pci with format=raw |
Client OS | Windows 10 | virtio-win-0.1.149.iso drivers |
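For reference, here is a minimal sketch of how these options might be combined into a single invocation; the disk path, the drive and device ids, and the ISO location are illustrative assumptions, not the exact command used:

```sh
# Hedged sketch of the test VM: combines the options from the table above.
# /dev/sda, drive0/scsi0 and the ISO path are assumptions.
qemu-system-x86_64 \
  -enable-kvm \
  -cpu max,kvm=off,check \
  -smp 8,sockets=1,cores=4,threads=2 \
  -m 8G -mem-prealloc -mem-path /dev/hugepages \
  -device virtio-scsi-pci,id=scsi0 \
  -drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none,aio=native \
  -device scsi-hd,drive=drive0,bus=scsi0.0 \
  -drive file=virtio-win-0.1.149.iso,media=cdrom
```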
Benchmark
The SSD under test was dedicated to the client running Windows 10, either:
- as is: file=/dev/sda, or
- as an LVM partition created on it: file=/dev/qemu_vol_1/win10_os
But always as VirtIO and raw! We also used LVM to understand the overhead of adding another layer; both variants are sketched below.
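A hedged sketch of the two disk configurations; the LVM commands and the drive wiring are illustrative assumptions based on the paths above:

```sh
# Variant 1: pass the whole SSD through as a raw block device.
# QEMU drive option (drive id is an assumption):
-drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none

# Variant 2: create an LVM logical volume on the SSD and pass that through.
pvcreate /dev/sda                             # make the SSD an LVM physical volume
vgcreate qemu_vol_1 /dev/sda                  # volume group matching the path above
lvcreate -l 100%FREE -n win10_os qemu_vol_1   # one LV spanning the whole group
# QEMU drive option:
-drive file=/dev/qemu_vol_1/win10_os,format=raw,if=none,id=drive0,cache=none
```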
Benchmark tool
For our tests we used CrystalDiskMark 6.0.x inside Windows 10.
Description of Cache Modes
Mode | Host Page Cache | Disk Write Cache | Notes |
---|---|---|---|
none | disabled | enabled | balances performance and safety (better writes) |
directsync | disabled | disabled | safest but slowest (relative to the others) |
writethrough | enabled | disabled | balances performance and safety (better reads) |
writeback | enabled | enabled | fast, but can lose data on power outage depending on the hardware used |
unsafe | enabled | enabled | doesn't flush data; fastest and least safe |
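The cache mode is selected per drive on the QEMU command line; a minimal sketch (the file path and drive id are illustrative assumptions):

```sh
# cache= accepts one of: none, directsync, writethrough, writeback, unsafe
-drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none
```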
cache=none
- host doesn't do caching
- guest disk cache is writeback
- Warning: like writeback, you can lose data in case of a power failure; if your Linux guest kernel is < 2.6.37, you need to use the barrier option in fstab to avoid filesystem corruption
This mode causes qemu-kvm to interact with the disk image file or block device with O_DIRECT semantics, so the host page cache is bypassed and I/O happens directly between the qemu-kvm userspace buffers and the storage device. Because the actual storage device may report a write as completed when it has merely been placed in its write queue, the guest's virtual storage adapter is informed that there is a writeback cache, so the guest is expected to send down flush commands as needed to manage data integrity. Performance-wise, this is equivalent to direct access to your host's disk.
cache=writethrough
- host does read caching
- guest disk cache mode is writethrough
Writethrough makes an fsync for each write, so it is the most secure cache mode: you can't lose data. It is also the slowest. This mode causes qemu-kvm to interact with the disk image file or block device with O_DSYNC semantics, where writes are reported as completed only when the data has been committed to the storage device. The host page cache is used in what can be termed a writethrough caching mode. The guest's virtual storage adapter is informed that there is no writeback cache, so the guest does not need to send down flush commands to manage data integrity. The storage behaves as if there is a writethrough cache.
cache=directsync
- host doesn't do caching
- guest disk cache mode is writethrough
- similar to writethrough, an fsync is made for each write
This mode causes qemu-kvm to interact with the disk image file or block device with both O_DSYNC and O_DIRECT semantics: writes are reported as completed only when the data has been committed to the storage device, and the host page cache is bypassed as well. Like cache=writethrough, it is helpful to guests that do not send flushes when needed. It was the last cache mode added, completing the possible combinations of caching and direct access semantics.
cache=writeback
- host does read/write caching
- guest disk cache mode is writeback
- Warning: you can lose data in case of a power failure; if your Linux guest kernel is < 2.6.37, you need to use the barrier option in fstab to avoid filesystem corruption
This mode causes qemu-kvm to interact with the disk image file or block device with neither O_DSYNC nor O_DIRECT semantics, so the host page cache is used and writes are reported to the guest as completed when placed in the host page cache; the normal page cache management then handles commitment to the storage device. Additionally, the guest's virtual storage adapter is informed of the writeback cache, so the guest is expected to send down flush commands as needed to manage data integrity. Analogous to a RAID controller with a RAM cache.
cache=unsafe
This mode is similar to the cache=writeback mode discussed above. The key aspect of this unsafe mode is that all flush commands from the guest are ignored. Using this mode implies that the user has accepted the trade-off of performance over the risk of data loss in the event of a host failure. It is useful, for example, during a guest installation, but not for production workloads.
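As the text suggests, a plausible use is a throwaway install run; a hedged sketch (the path and drive id are illustrative assumptions):

```sh
# One-off guest install: flushes are ignored, so any host failure can
# corrupt the image. Switch back to a safer cache mode afterwards.
-drive file=/dev/qemu_vol_1/win10_os,format=raw,if=none,id=drive0,cache=unsafe
```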
Performance Implications of Cache Modes
The choice to make full use of the page cache, to write through it, or to bypass it altogether can have dramatic performance implications. Other factors that influence disk performance include the capabilities of the actual storage system, the disk image format used, the potential size of the page cache, and the IO scheduler used. Additionally, not flushing the write cache increases performance, but at a risk, as noted above. As a general rule, high-end systems typically perform best with cache=none, because of the reduced data copying that occurs. The potential benefit of having multiple guests share the common host page cache, the ratio of reads to writes, and the use of aio=native (see below) should also be considered.
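As a hedged sketch, the two aio back-ends compared in this post are selected per drive like this (the file path and drive id are illustrative assumptions). Note that in current QEMU versions aio=native requires the host page cache to be bypassed, i.e. cache=none or cache=directsync:

```sh
# aio=threads (the default): I/O is performed by a pool of
# userspace worker threads
-drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none,aio=threads

# aio=native: Linux native AIO (io_submit); only valid with
# O_DIRECT, i.e. cache=none or cache=directsync
-drive file=/dev/sda,format=raw,if=none,id=drive0,cache=none,aio=native
```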
Benchmark Results
Read
QEMU Disk IO Performance - Read (all values in MB/s)
ID | aio | cache | Seq Q32T1 | 4KiB Q8T8 | 4KiB Q32T1 | 4KiB Q1T1 | LVM |
---|---|---|---|---|---|---|---|
1 | native | none | 546.60 | 109.00 | 107.80 | 26.61 | no |
2 | threads | none | 1,087.20 | 94.62 | 142.20 | 22.16 | no |
3 | native | directsync | 0.00 | 0.00 | 0.00 | 0.00 | no |
4 | threads | directsync | 1,045.00 | 95.18 | 145.40 | 22.92 | no |
5 | native | writethrough | 0.00 | 0.00 | 0.00 | 0.00 | no |
6 | threads | writethrough | 4,239.30 | 100.70 | 158.70 | 52.75 | no |
7 | native | writeback | 0.00 | 0.00 | 0.00 | 0.00 | no |
8 | threads | writeback | 3,270.90 | 76.17 | 128.10 | 40.70 | no |
9 | native | unsafe | 0.00 | 0.00 | 0.00 | 0.00 | no |
10 | threads | unsafe | 4,403.50 | 131.00 | 158.40 | 52.17 | no |
Write
QEMU Disk IO Performance - Write (all values in MB/s)
ID | aio | cache | Seq Q32T1 | 4KiB Q8T8 | 4KiB Q32T1 | 4KiB Q1T1 | LVM |
---|---|---|---|---|---|---|---|
1 | native | none | 527.50 | 104.00 | 113.40 | 46.05 | no |
2 | threads | none | 524.10 | 93.03 | 132.50 | 36.69 | no |
3 | native | directsync | 0.00 | 0.00 | 0.00 | 0.00 | no |
4 | threads | directsync | 303.40 | 16.35 | 15.84 | 2.81 | no |
5 | native | writethrough | 0.00 | 0.00 | 0.00 | 0.00 | no |
6 | threads | writethrough | 85.18 | 4.72 | 5.57 | 3.16 | no |
7 | native | writeback | 0.00 | 0.00 | 0.00 | 0.00 | no |
8 | threads | writeback | 3,395.80 | 77.56 | 1.26 | 40.33 | no |
9 | native | unsafe | 0.00 | 0.00 | 0.00 | 0.00 | no |
10 | threads | unsafe | 4,241.20 | 98.45 | 146.40 | 49.51 | no |
LVM Performance
Using LVM didn't add any significant overhead to the disk IO performance, so the tables above show only the benchmarks done without it.