Asynchronous, Out-of-Band Early Boot Setup of QEMU Guests
Gabriel L. Somlo
Last updated: Tue. Mar. 08, 2016
Feedback to: somlo at cmu dot edu
I needed a method of communicating early-boot configuration data from host to guest that is both:
- Asynchronous: No further coordination is required between host and guest beyond the host providing the data at the time it launches the guest VM (e.g., as part of the command line starting the guest process).
- Out-of-band: No (guest-)user-visible elements (e.g., guest floppy or CD-rom devices) should be commandeered as part of the data transfer process.
I considered some combination of the following options, before giving up and deciding to implement something new:
- Using QEMU's Guest Agent fails to satisfy the first requirement: the host would have to synchronize with the guest, ensuring the latter has successfully started its Guest Agent and is ready to receive configuration data.
- Using the "-kernel" and "-append" QEMU command line options would only target Linux guests, and would fail to satisfy the second requirement (being out-of-band, i.e., refraining from taking over a standard guest configuration element).
- Using something similar to Cloud-init would also require passing data into the guest via a floppy or CD-rom image which the latter would have to mount and read as part of its boot-up process, once again breaking the out-of-band requirement.
Let's assume we want to boot a guest from a generic disk image, and have it start up with a "static" hostname and IP address (i.e., not using DHCP). The generic disk image is used with multiple guest VMs, so we can't simply hard-code the hostname and IP address in the image -- instead, we must pass in these parameters from the host when the VM is created, and there must be a way for the VM to retrieve this information and use it to edit its version of /etc/hostname and /etc/sysconfig/network-scripts/ifcfg-eth0. From the host, we need to pass to the guest the following two "environment variables":
HOSTNAME=foobar.example.com
IPADDR=192.168.133.7
NOTE: In reality we'd also need a netmask/prefix and default gateway, but here I'm simply trying to illustrate the data transfer mechanism.
On the guest, we have to somehow retrieve $HOSTNAME and $IPADDR and edit the relevant config files before the guest-specific init system expects to use them. In the near future (likely qemu-2.6.0 and linux-4.6), this could be accomplished by starting the QEMU guest like this:
cat > guestinfo.txt <<- EOT
HOSTNAME=foobar.example.com
IPADDR=192.168.133.7
EOT
qemu-system-x86_64 -machine pc,accel=kvm -m 2048 -hdd fedora_22.img \
-fw_cfg name=opt/GuestInfo,file=./guestinfo.txt
This is as close to asynchronous, "fire-and-forget" as one can get!
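Because the same generic image is launched for many guests, the per-guest file can be produced by a small host-side helper. The sketch below is illustrative only: the function names write_guestinfo and check_blob_name are hypothetical, and the name check encodes two assumptions taken from my reading of QEMU's fw_cfg documentation (user-supplied blob names should start with "opt/", and a name may be at most 55 characters long). Confirm both against the QEMU version you run.

```shell
#!/bin/sh
# Hypothetical host-side helpers; they are not part of QEMU.

# write_guestinfo OUTFILE HOSTNAME IPADDR
# Writes the two KEY=VALUE lines used in the example above to OUTFILE.
write_guestinfo() {
    printf 'HOSTNAME=%s\nIPADDR=%s\n' "$2" "$3" > "$1"
}

# check_blob_name NAME
# Succeeds only if NAME starts with "opt/" and is at most 55 characters
# long (assumed limits, see the paragraph above).
check_blob_name() {
    case "$1" in
        opt/?*) ;;
        *) return 1 ;;
    esac
    [ "${#1}" -le 55 ]
}

# Example use before launching a guest:
#   check_blob_name opt/GuestInfo &&
#       write_guestinfo ./guestinfo.txt foobar.example.com 192.168.133.7
```

Keeping the file generation separate from the QEMU invocation means the launch command itself stays identical for every guest; only the file contents change.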
Next, from within the newly started guest:
# source the expected "environment variables" given to us by the host:
. /sys/firmware/qemu-fw-cfg/by_name/opt/GuestInfo/raw
# replace locally hard-coded config before init can find it:
echo "$HOSTNAME" > /etc/hostname
sed -i "s/^IPADDR.*$/IPADDR=$IPADDR/" /etc/sysconfig/network-scripts/ifcfg-eth0
The guest may do this whenever it wants to (or not at all). Either way, the host no longer has to worry about when (and whether, at all) the guest decides to retrieve this information. Also, the guest doesn't have to mount an obvious device such as the floppy or CD-rom: the snippet shown above can be invoked by a carefully placed script or service executed by the init system before any other services are started. From the perspective of the guest VM's users, this is as unobtrusive (out-of-band) as reasonably possible.
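Sourcing the blob directly is the shortest route, but it executes whatever the file contains. Where that is a concern, the blob can be parsed instead. The following is a minimal sketch, assuming the blob holds plain KEY=VALUE lines as in the example; load_guestinfo is a hypothetical helper name, and the list of accepted keys is just the two variables used in this article.

```shell
#!/bin/sh
# load_guestinfo FILE
# Reads KEY=VALUE lines from FILE and sets only the variables listed in
# the case statement; any other line is ignored rather than executed.
# (Hypothetical helper; the accepted keys match the example in the text.)
load_guestinfo() {
    # "|| [ -n "$key" ]" also handles a last line without a trailing newline.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case "$key" in
            HOSTNAME) HOSTNAME=$value ;;
            IPADDR)   IPADDR=$value ;;
        esac
    done < "$1"
}

# Example use inside the guest (path as in the snippet above):
#   load_guestinfo /sys/firmware/qemu-fw-cfg/by_name/opt/GuestInfo/raw
```

After the call, $HOSTNAME and $IPADDR can be written into the config files exactly as shown above.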
It turns out that the most popular QEMU virtual machine types (x86 and the various flavors of ARM) already use a Firmware Configuration device (a.k.a. fw_cfg) intended precisely for asynchronous host-guest data transfers. The fw_cfg device allows QEMU to prepare various data items (or "blobs") in host memory, and offers a guest-side interface that lets the guest, once launched by the host, read the blobs using port or memory-mapped I/O operations.
Until recently, fw_cfg was used exclusively by the guest firmware (e.g., the BIOS). Blobs were inserted programmatically by QEMU, and read by hardcoded routines in the guest firmware, which then proceeded to install things such as SMBIOS and ACPI tables in guest memory, before proceeding to boot the guest kernel.
This work extends the use of fw_cfg to allow passing arbitrary data to the guest from the QEMU command line on the host machine:
- The guest-side fw_cfg interface (IOPort or MMIO register access) is documented here; the host-side (programmatic) fw_cfg API used by QEMU itself is shown here.
- Based on the above documentation, dumping the content of a named blob (e.g., "opt/GuestInfo") can be accomplished with this (WARNING: x86 specific!) sample program, using e.g. "./FwCfgDump opt/GuestInfo".
- The ability to insert arbitrary host-side files as fw_cfg "blobs" on the QEMU command line was officially introduced in QEMU v2.4.0.
- For Linux guests, a much more elegant way to access all fw_cfg blobs (whether inserted programmatically by QEMU or supplied as user-provided files on the host-side command line) is available via a SysFS driver, which allows retrieval as illustrated in the example above, by simply accessing a file in /sys/firmware/.... This might ship with Linux kernel v4.6, but I don't have a clear idea of how long the commits will "bake" in the driver-core testing branch. In the interim (and also for any current or older kernel which doesn't ship with a built-in fw_cfg/sysfs driver), an out-of-tree driver is available: simply download, unpack, and run "make" to build a module against your currently running kernel.
- The Linux SysFS driver above can autodetect a fw_cfg device listed in either Device Tree (DT) on ARM, or in ACPI on either x86 or ARM guest machine types. QEMU should start advertising fw_cfg in ACPI beginning with (the upcoming, at the time of this writing) v2.6.0.
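Once the SysFS driver is loaded, a quick way to see what the host has provided is to walk the by_name tree. The sketch below assumes each blob appears as a directory (or a symlink to one) containing a "raw" file, as in the example earlier; list_fw_cfg_blobs is a hypothetical helper name. The default path is copied from that example, so pass the path explicitly if your kernel exposes the driver under a different directory name (for instance, with underscores instead of hyphens).

```shell
#!/bin/sh
# list_fw_cfg_blobs [BY_NAME_DIR]
# Prints "<size in bytes> <blob name>" for every blob under BY_NAME_DIR.
# (Hypothetical helper; the default path is taken from the example in
# the text and may differ on your kernel.)
list_fw_cfg_blobs() {
    root=${1:-/sys/firmware/qemu-fw-cfg/by_name}
    root=${root%/}    # tolerate a trailing slash
    # -L follows symlinks, so entries that link elsewhere are still found.
    find -L "$root" -type f -name raw | sort | while IFS= read -r raw; do
        dir=${raw%/raw}
        name=${dir#"$root"/}
        # Count bytes by reading the file rather than trusting file metadata.
        size=$(( $(cat "$raw" | wc -c) ))
        printf '%d %s\n' "$size" "$name"
    done
}

# Example use inside the guest:
#   list_fw_cfg_blobs
```

The size is measured by reading each blob, which is slower than consulting file metadata but avoids relying on how the driver reports attribute sizes.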
Right now, the main item left to solve is how to retrieve fw_cfg blobs from non-Linux guest operating systems, specifically Windows. One could build the FwCfgDump.c program (with e.g. Cygwin) and use the resulting binary to retrieve host-provided early boot environment variables. I'd be interested in figuring out how to write a Windows driver for fw_cfg, although I'm not quite sure how the output would be presented (does Windows have anything at all similar to Linux SysFS?).