Running Mac OS X as a QEMU/KVM Guest

Gabriel L. Somlo

Last updated: Fri. Dec. 19, 2013
Feedback to: somlo at cmu dot edu

0. I Just Want It Working, Right Now !

OK, here's what you'll need (or skip to the technical details instead):

KVM: Patched version of the current KVM git master:
```
	mkdir -p /home/$(whoami)/OSXGUEST;
	cd /home/$(whoami)/OSXGUEST
	wget http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/kvm-mwait-nop-20131213.patch
	git clone git://git.kernel.org/pub/scm/virt/kvm/kvm.git
	(cd kvm; patch -p1 < ../kvm-mwait-nop-20131213.patch)
```
To use the standard kernel shipping with your desktop distro (e.g., at the time of this writing I use Fedora 18 with kernel version 3.11.10-100.fc18), additionally get the kvm-kmod "wrapper":
```
	git clone git://git.kiszka.org/kvm-kmod.git
	cd kvm-kmod
	./configure
	make LINUX=../kvm clean sync all
```
Notice that I'm deviating slightly from Jan Kiszka's original instructions. Rather than checking out kvm as a submodule of kvm-kmod, I keep the two as separate stand-alone repositories, and simply point to kvm from within kvm-kmod during build (i.e., by adding "LINUX=../kvm" to the make command line). That way, I can stay on the bona-fide master branch of kvm, rather than being stuck with a detached HEAD matching some old(er) commit from the kvm tree.
Then, as root, while still in the kvm-kmod directory (substitute kvm_amd for kvm_intel, depending on your CPU):
```
	cp ./x86/kvm*.ko /lib/modules/<your-kernel>/kernel/arch/x86/kvm/
	modprobe -r kvm_intel
	modprobe kvm_intel
```
Any hacking and modification of the KVM source will happen under the kvm repo. After making changes, re-run
```
	make clean sync all
```
from under the kvm-kmod folder, and don't forget to copy and reload the kvm*.ko modules as shown above.

QEMU: Patched version of the current QEMU git master. Configure and build using something like:

	cd /home/$(whoami)/OSXGUEST
	git clone git://git.qemu.org/qemu.git;
	wget http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/qemu-osx-20131110.patch
	cd qemu
	patch -p1 < ../qemu-osx-20131110.patch
	./configure --prefix=/home/$(whoami)/OSXGUEST --target-list=x86_64-softmmu
	make clean; make; make install

SeaBIOS: Patched version of the current SeaBIOS git master. Configure and build using something like:

	cd /home/$(whoami)/OSXGUEST
	git clone git://git.seabios.org/seabios.git
	wget http://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/seabios-osx-20131110.patch
	cd seabios
	patch -p1 < ../seabios-osx-20131110.patch
	make
	cp out/bios.bin /home/$(whoami)/OSXGUEST/share/qemu/bios-mac.bin

Chameleon: an additional bootloader is currently needed to bridge the gap between SeaBIOS and the Apple EFI BIOS expected by Mac OS X. Source is available from the project's SVN repo, but since it requires Mac OS X and Xcode to build, I've uploaded a binary image which you may use until you're ready to build your own.

Once the above components are in place, you'll need a HDD image for your Mac OS X guest:

	cd /home/$(whoami)/OSXGUEST
	qemu-img create -f qcow2 mac_hdd.img 30G

To start your Mac OS X guest in QEMU, use the following command line:

	bin/qemu-system-x86_64 -enable-kvm -m 2048 -cpu core2duo \
	  -usb -device usb-kbd -device usb-mouse \
	  -bios bios-mac.bin -kernel ./chameleon_2.0_boot \
	  -device isa-applesmc,osk="insert-real-64-byte-OSK-string-here" \
	  -device ahci,id=ide \
	  -device ide-drive,bus=ide.2,drive=MacHDD \
	  -drive id=MacHDD,if=none,file=./mac_hdd.img \
	  -monitor stdio

When running the OS X guest for the first time, you'll need to install the operating system to the HDD image, so you'll need to add (and boot from) an install DVD image on the command line.

	  -device ide-drive,bus=ide.0,drive=MacDVD \
	  -drive id=MacDVD,if=none,snapshot=on,file=../DVD/SnowLeopard.10.6.iso

I obtained mine from the original retail SnowLeopard DVD release (version 10.6, a.k.a. 10.6.0), using the following command:

	dd if=/dev/dvd of=../DVD/SnowLeopard.10.6.iso

Another optional command line parameter could be used to start an SMP guest, such as:

	  -smp 8,cores=4
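
The base command line and the two optional fragments above can also be assembled programmatically. The following is only a sketch: the function and parameter names (build_qemu_args, osk, install_iso, smp) are made up for illustration, and it uses nothing beyond the QEMU options already shown in this section.

```python
import shlex


def build_qemu_args(osk, install_iso=None, smp=None):
    """Return the QEMU argument list from this section as a Python list.

    Paths are the ones used in the text; the parameter names are
    hypothetical and exist only for this sketch.
    """
    args = [
        "bin/qemu-system-x86_64", "-enable-kvm", "-m", "2048",
        "-cpu", "core2duo",
        "-usb", "-device", "usb-kbd", "-device", "usb-mouse",
        "-bios", "bios-mac.bin", "-kernel", "./chameleon_2.0_boot",
        "-device", "isa-applesmc,osk=" + osk,
        "-device", "ahci,id=ide",
        "-device", "ide-drive,bus=ide.2,drive=MacHDD",
        "-drive", "id=MacHDD,if=none,file=./mac_hdd.img",
        "-monitor", "stdio",
    ]
    if install_iso is not None:
        # First boot only: attach the install DVD image on bus ide.0.
        args += [
            "-device", "ide-drive,bus=ide.0,drive=MacDVD",
            "-drive", "id=MacDVD,if=none,snapshot=on,file=" + install_iso,
        ]
    if smp is not None:
        # Optional SMP guest, e.g. smp="8,cores=4".
        args += ["-smp", smp]
    return args


if __name__ == "__main__":
    cmd = build_qemu_args("insert-real-64-byte-OSK-string-here",
                          install_iso="../DVD/SnowLeopard.10.6.iso",
                          smp="8,cores=4")
    # Print a copy-and-paste friendly command line; nothing is executed.
    print(" ".join(shlex.quote(a) for a in cmd))
```

Keeping the arguments in a list (rather than one long string) avoids quoting problems around the osk="..." value if the list is later handed to something like subprocess.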

There are still a few remaining issues to be worked out:

You will need to supply the correct 64-character string as the argument of the osk="..." command line parameter above. This string is the concatenated result of two 32-byte keys, OSK0 and OSK1, which can be read from the AppleSMC chip shipping on genuine Apple machines. I wrote a small userspace program based on the Linux kernel applesmc driver, SmcDumpHw.c, which can be used (on a Mac running Linux) to query the SMC for various key values. For the full details, please see here.
Currently, only SnowLeopard (10.6) works. Leopard (10.5) used to work with QEMU v.0.10.6, but is no longer supported. Work is in progress to get (Mountain)Lion (10.7 and 10.8) to boot in QEMU/KVM.
Ethernet "link" on the virtual network card does not come up automatically, as it should. If you used the command line above, you may be able to force link negotiation from the QEMU monitor command line (the device name below assumes the default e1000 NIC; "info network" lists the actual names):
```
	set_link e1000.0 off
	set_link e1000.0 on
```
The main goal of this work is to get Mac OS X working with upstream KVM/QEMU/etc., so I'm mainly targeting these projects' git master branches. The patches I refer to may work against recently released versions, but please consider that as nothing more than a happy coincidence :)

Legal Disclaimer: I am not a lawyer, and this is not legal advice. Taking into account that OS X is now officially supported on commercial virtualization solutions such as VMware Fusion and Parallels, and after a careful reading of Apple's OS X EULA (which states that "[...] you are granted a [...] license to install, use and run one (1) copy of the Apple Software on a single Apple-Branded computer at any one time"), it is my belief that it's OK to run Mac OS X as a QEMU guest, provided the host hardware is a genuine, Apple-manufactured Mac computer, running an arbitrary (e.g. Linux) host OS. This happens to be how I'm using it, but YMMV.

Please read on for a detailed overview of all the issues involved !

1. Intro, Motivation, and Overview

This work started out as my OS Practicum project during the Fall of 2012, where I picked Mac OS X on QEMU/KVM as my topic. I'm a "native" Linux user, but my IT department supports quite a few Mac OS X users, and I decided I could be much faster and more productive if I could access a Mac as a VM guest rather than dual-boot or cold-start an additional machine each time it was needed.

The original work to enable OS X to run under QEMU/KVM was done by Alexander Graf. Additional kernel and userspace patches were contributed by René Rebe. Previously available patches and documentation I reviewed include:

René's patches at T2
git://gitorious.org/~drizztbsd/kvm/qemu-kvm-osx.git
http://github.com/saucelabs/mac-osx-on-kvm.git
The by now really out of date Wiki instructions at
http://d4wiki.goddamm.it/index.php?title=Howto:_Mac_OSX_on_KVM

Several different components must successfully interact with each other to facilitate running Mac OS X as a QEMU/KVM guest. Each component is listed below, with a link to its dedicated section covering relevant aspects and outstanding issues related to Mac OS X guest support:

KVM Kernel Module: at the lowest level, this component provides an in-kernel hardware assisted virtualization infrastructure.
QEMU: userspace emulator which takes advantage of the hardware virtualization facilities provided by KVM to run a guest at nearly native hardware performance.
SeaBIOS: default BIOS used with QEMU.
Chameleon: a bootloader used to bridge the gap between SeaBIOS and the EFI-compatible BIOS shipping with Apple hardware.

2. KVM Kernel Module

KVM is the component located at the lowest level, providing an interface to the virtualization support offered by (among a few others) the x86_64 hardware architecture. Let's begin with a brief list of upstream resources:

Project home page: http://www.linux-kvm.org/
Master git repo: git://git.kernel.org/pub/scm/virt/kvm/kvm.git
This is actually a full clone of the mainline kernel git repo, with KVM-specific commits added on top. Therefore, it is slightly less convenient to work with, although any attempts at committing changes upstream must apply cleanly (and work correctly) against it.
Kmod "wrapper": git://git.kiszka.org/kvm-kmod.git
This allows building KVM kernel modules against (and loadable into) a distro kernel (such as what ships on e.g., F18), as shown here.

Two outstanding issues require out-of-tree patching of the KVM module. The first one is known as "ioapic-polarity", and the second one is related to the use of the MONITOR/MWAIT pair of CPU instructions by Mac OS X.

2.1. The "ioapic-polarity" patch

This patch is a misleadingly simple-looking one liner, which removes a single line from virt/kvm/ioapic.c (or x86/ioapic.c in the kvm-kmod version):

-	irq_level ^= entry.fields.polarity;

Without removing this line, a Mac OS X guest hangs during boot with an error message that reads: "still waiting for root device".

According to Alex Graf, this patch was submitted to (and rejected by) mainline KVM several years ago. There seems to be some inconsistency between Mac OS X and the QEMU/SeaBIOS ACPI DSDT about whether PCI APIC lines should be active-high or active-low (active-low being the way things work in practice). The patch ignores the distinction by always issuing an interrupt whenever the line status changes.
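
To make the effect of the removed line concrete, here is a toy model of the XOR. It is not KVM code: the two function names are hypothetical, and the only fact borrowed from hardware is that a polarity bit of 0 means active-high and 1 means active-low in an I/O APIC redirection entry.

```python
def irq_asserted_with_polarity(line_level, polarity):
    """Stock behaviour: irq_level ^= entry.fields.polarity.

    polarity 0 = active-high, 1 = active-low.
    """
    return line_level ^ polarity


def irq_asserted_patched(line_level, polarity):
    """Patched behaviour: the XOR is gone, so the polarity bit is ignored."""
    return line_level


if __name__ == "__main__":
    print("level polarity stock patched")
    for level in (0, 1):
        for polarity in (0, 1):
            print(level, polarity,
                  irq_asserted_with_polarity(level, polarity),
                  irq_asserted_patched(level, polarity))
```

The two variants only differ for entries that the guest programs as active-low (polarity 1), which is where the disagreement described above would show up.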

Understanding the source of the discrepancy, and fixing the problem (most likely somewhere in SeaBIOS/ACPI and/or Chameleon) is one of the major outstanding issues preventing out-of-the-box Mac OS X guest support in QEMU/KVM.

2.2. MONITOR/MWAIT instruction support

The MONITOR and MWAIT instructions have become a very popular method for modern operating systems to implement Idle Threads in an energy-efficient way. These instructions started shipping on x86_64 chips (Intel and AMD) in the 2006 time frame. According to specs, the OS must check for their availability via CPUID before utilizing them, to avoid encountering an Invalid Opcode exception.

The problem is that, on one hand, KVM does not support MONITOR and MWAIT, issuing Invalid Opcode exceptions to any guest attempting to execute such an instruction. On the other hand, Mac OS X will invoke the instructions from its default idle thread (AppleIntelCPUPowerManagement.kext) indiscriminately, without checking CPUID. My hypothesis is that since any and all Intel-based Mac computers ever shipped came with CPUs that supported MONITOR and MWAIT, OS developers at Apple simply assume the instructions are always present on any hardware configuration they could possibly care to support.

As explained in more detail in the specs and in my Idle Thread slides, MONITOR will "arm" the CPU monitoring hardware to watch for writes to a given memory location; a write to that location will "trigger" the monitoring hardware; and MWAIT will put the CPU core to sleep if the monitoring hardware is armed, until such time that a write, interrupt, or some other event triggers it. If unarmed, MWAIT simply behaves as a NOP.

Architecturally, both MONITOR and MWAIT are equivalent to a NOP instruction, and the monitoring hardware is invisible to the CPU programmer. MWAIT can be seen as a prolonged, low-power NOP instruction whose duration is determined by the above-mentioned "invisible" state in the CPU silicon. Additionally, the CPU programmer must expect spurious wakeups (MWAIT may wake before it's "supposed to"), and therefore use the instruction from within a polling loop.
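
The arm/trigger/wait behavior described above can be captured in a few lines. This is a deliberately simplified model of a single core, with hypothetical class and method names; it ignores C-states, MWAIT hints, and spurious wakeups.

```python
class MonitorModel:
    """Toy model of one core's monitoring hardware (hypothetical names)."""

    def __init__(self):
        self.armed = False
        self.address = None

    def monitor(self, address):
        # MONITOR arms the hardware to watch `address`.
        self.armed = True
        self.address = address

    def write(self, address):
        # A write to the watched location triggers (disarms) the hardware.
        if self.armed and address == self.address:
            self.armed = False

    def interrupt(self):
        # An interrupt (or similar event) also triggers the hardware.
        self.armed = False

    def mwait(self):
        # "sleep" if the core would enter a low-power wait, "nop" otherwise.
        return "sleep" if self.armed else "nop"


if __name__ == "__main__":
    core = MonitorModel()
    print(core.mwait())      # unarmed: behaves as a NOP
    core.monitor(0x1000)
    print(core.mwait())      # armed: would sleep until triggered
    core.write(0x1000)
    print(core.mwait())      # triggered by the write: NOP again
```

Since the model's state is invisible to the caller, code using it still has to re-check its own wake-up condition after every mwait(), which is the polling loop mentioned above.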

The remainder of this section discusses a variety of ways in which KVM can offer sufficient support for MONITOR and MWAIT to enable Mac OS X guest execution.

2.2.1. Allow MONITOR/MWAIT to be executed in guest mode.

Currently, KVM configures the physical CPU in a way that causes it to trap out of guest mode back into host mode (a.k.a. perform a VM exit) when one of a list of privileged instructions (e.g. HLT, but also MONITOR and MWAIT) is encountered. One of the early patches I encountered simply removed MONITOR and MWAIT from that list, preventing a VM exit upon their encounter during guest (a.k.a. VMX-non-root or L>0) mode.

According to the spec (see V3, S25-3, pp25-8), under certain conditions (which happen to be met by the OS X idle thread), guest-mode MWAIT will always default to being treated as a NOP, never entering a low-power sleep state. This causes each guest VCPU to always utilize 100% of a host core, regardless of the actual level of guest activity.

2.2.2. Emulate MONITOR/MWAIT as NOP

As mentioned above, KVM currently requests that, among other instructions, MONITOR and MWAIT generate a VM exit and be handled in host (rather than guest) mode. The current handler for both instructions generates an Invalid Opcode exception, a behavior consistent with the non-support for the instructions advertised via CPUID.

My current stable patch against KVM replaces the handler for MONITOR and MWAIT with one that emulates the NOP instruction (i.e., skips the current MONITOR or MWAIT and re-enters VM guest mode execution from the immediately following instruction). As before, these are short, non-power-saving NOP instructions, and therefore each VCPU will utilize 100% of the available cycles of a physical core on the host.

As a workaround, it is possible to force Mac OS X to revert to a HLT-based idle thread by removing the default MONITOR/MWAIT one:

	sudo rm -rf /System/Library/Extensions/AppleIntelCPUPowerManagement.kext

This reduces host core utilization to single digits during guest idle times, since the guest VCPUs are removed from scheduling and execution on the host while halted.

Due to its simplicity and relative cleanliness, this combined approach may be a viable long term solution: NOP-based MONITOR/MWAIT emulation will allow booting Mac OS X from factory-default install media. Once installed, the guest may be "optimized" for power consumption and host CPU utilization by forcing it to fall back to a HLT-based idle thread.

2.2.3. Emulate MWAIT as HLT

Assuming the requirement to run a completely unmodified OS X guest install, we must support the default MONITOR/MWAIT idle thread in production, and attempt to alleviate host CPU utilization without the option of falling back to the HLT-based version. An interesting observation is that, on single-processor systems, MWAIT behaves very much like HLT: there is no other (V)CPU to trigger the monitoring hardware, and therefore MWAIT will only wake when a hardware interrupt is asserted.

If we attempted to emulate MWAIT as (something similar to) HLT, while continuing to treat MONITOR as a NOP, we might be able to reduce host CPU utilization at the price of having the MWAIT-based idle thread be somewhat "sluggish" (waking up "late", on hardware interrupt, as opposed to "on time" when another VCPU writes to the monitored memory location). This may have a negative impact on e.g. the real-time performance of the OS X guest, but considering we're already running under virtualization, you can't lose what you ain't never had :)

We'd have to pay attention to the fact that the OS X idle thread runs with interrupts disabled (RFLAGS.IF=0), but sets %ecx=1 to make MWAIT wake on interrupt regardless. This experimental patch implements MWAIT as an always-interruptible (regardless of RFLAGS.IF) version of HLT. The patch works well enough on a single-processor guest, reducing host CPU utilization during guest idle to about 15%. However, when booting on an SMP guest, OS X crashes with an "HPET not found" panic, which could indicate any number of problems:

imperfect HPET emulation under SMP
not all VCPUs are guaranteed to receive hardware (e.g., timer) interrupts that would wake them from MWAIT
more ACPI/SeaBIOS/Chameleon bugs

These issues are currently under investigation, so please stay tuned...

2.2.4. Emulate the monitoring hardware

Although the spec strongly warns against assuming any connection between the size of the memory chunk being monitored and the size of a cache line, it is obvious that MONITOR and MWAIT are implemented on top of the processor's cache coherence protocol (e.g., MESI). A plausible approach could work like this:

MONITOR counts as a memory access, ensuring that the monitored memory area counts as a valid (i.e., M, E, or S state) cache entry on the respective CPU core, in addition to setting the "armed" flag.
a write to the monitored memory area from another CPU core will cause everyone else's (including the MONITOR-ing core's) corresponding cache line to be invalidated (I state). When a monitored cache line is invalidated, the "armed" flag is also turned off (i.e., the monitoring hardware is "triggered" or "disarmed").
MWAIT acts as a NOP if it finds the monitor "disarmed"; otherwise, it enters a C-state and waits for a triggering write, or interrupt, etc.

A relatively straightforward way to emulate MONITOR and MWAIT in KVM would be to utilize the virtual MMU module to write-protect MONITOR-ed memory areas, and handle the subsequent write faults (by emulating the actual guest write from within the host, and updating the state of the emulated monitoring hardware accordingly). This approach has one major caveat: memory monitoring necessarily happens at page-level granularity, which is significantly larger than the extent of the (few) cache line(s) typically used on real hardware. It is true that the exact extent of the monitored memory area is advertised via CPUID, but OS X is already known to have a poor track record of honoring CPUID.

This new experimental patch implements MONITOR/MWAIT by emulating the monitoring hardware on top of the KVM MMU, as described above. As an optimization step (to avoid a TLB shootdown each time the write-protection on a monitored page is switched on or off), the patch is implemented as follows:

we assume a (very) limited number of MONITOR-ed memory locations is used by the guest (Mac OS X only utilizes one such location, which is shared by all instances of its idle thread).
only the first MONITOR on a given page causes it to be write-protected.
write operations cause a fault and are handled (i.e., emulated) in host mode, but do not switch the page back to being writable; instead, they set a "recently accessed" flag for the page.
periodically, a cleanup pass will relinquish stale monitored pages that have not had their "recently accessed" flag set since the previous pass. This step is not yet implemented in the current version of the patch.
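
The bookkeeping in the list above can be sketched in user-space terms as follows. This is not the patch itself: all names are hypothetical, 4 KiB pages are assumed, and a freshly protected page starts with its "recently accessed" flag set so that it survives at least one cleanup pass (an assumption of this sketch, not something stated above).

```python
PAGE_SHIFT = 12  # assumes 4 KiB pages


class PageMonitor:
    """Toy model of page-granular MONITOR/MWAIT emulation (hypothetical names)."""

    def __init__(self):
        self.protected = {}  # page number -> "recently accessed" flag
        self.armed = {}      # vcpu id -> monitored address (absent = disarmed)

    def monitor(self, vcpu, address):
        page = address >> PAGE_SHIFT
        # Only the first MONITOR on a page has to write-protect it.
        first = page not in self.protected
        if first:
            self.protected[page] = True
        self.armed[vcpu] = address
        return first

    def write_fault(self, address):
        page = address >> PAGE_SHIFT
        if page not in self.protected:
            return []  # ordinary write, no fault in this model
        # The page stays write-protected; just note that it is still in use.
        self.protected[page] = True
        woken = sorted(v for v, a in self.armed.items()
                       if a >> PAGE_SHIFT == page)
        for v in woken:
            del self.armed[v]  # trigger (disarm) every VCPU watching this page
        return woken

    def mwait(self, vcpu):
        return "sleep" if vcpu in self.armed else "nop"

    def cleanup(self):
        # Drop pages not written since the previous pass, reset the rest.
        # A real implementation would also have to consider VCPUs that are
        # still armed on a stale page; that is ignored here.
        stale = sorted(p for p, recent in self.protected.items() if not recent)
        for p in stale:
            del self.protected[p]
        for p in self.protected:
            self.protected[p] = False
        return stale
```

Note how write_fault() wakes every VCPU armed anywhere on the page: that is the page-level granularity caveat from above showing up in the model.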

Similarly to the MWAIT as HLT method, this patch only works reliably on single-VCPU guests. The patch sometimes works with '-smp 2,cores=2' (about 30% of the time) as shown in this screenshot. Some other times, the emulated disk controller (AHCI) hangs. Other times, as well as with any attempt at SMP higher than 2, we get the dreaded HPET panic. I'm currently looking for ways to first explain, then debug this behavior.

3. QEMU

QEMU is a multi-architecture emulator running in user-space. For certain architectures (such as x86_64), QEMU is able to take advantage of hardware virtualization features offered through e.g. KVM, which allow it to run guests at near-native performance levels. Here is roughly how QEMU and KVM work together to implement a guest VM:

QEMU starts as a user-mode process, launching one thread for each VCPU that will be part of the guest VM. The guest system's "physical" memory is allocated as part of the virtual address space of the QEMU process, and various handlers for the emulated virtual hardware of the guest systems are prepared.
Each QEMU VCPU thread makes an ioctl() call into the kernel (where it will be serviced by KVM), requesting that the VCPU be scheduled on a physical core. This ioctl() call will only return to userspace if/when KVM encounters a VM exit it can't handle itself. This typically involves a need for userspace emulation of specific guest hardware. Normally, when the userspace emulation is complete, the QEMU VCPU thread loops back to the spot where it calls into the kernel via the ioctl().
KVM handles the kernel-side of the ioctl() call made by each QEMU VCPU thread. Normally, it causes the physical core on which the thread is scheduled to enter VM guest mode (a.k.a. L>0). Whenever a VM exit (back to host mode) occurs, KVM attempts to handle the exit cause itself, and immediately re-enters VM guest mode if successful. Otherwise, the VM exit reason is passed back to userspace by falling out of the ioctl() call, at which point QEMU must handle it as described above, before calling back into KVM again via the ioctl().
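
The division of labor described above boils down to two nested loops. The sketch below simulates them with a scripted list of VM exits; FakeKVM, vcpu_thread, and the exit names are all invented for illustration, and the choice of which exits count as "handled in the kernel" is arbitrary. No real ioctl() is issued.

```python
class FakeKVM:
    """Stand-in for the kernel side, driven by a scripted list of VM exits."""

    # Exits this fake "KVM" handles itself (illustrative choice only).
    IN_KERNEL = {"hlt", "external_interrupt"}

    def __init__(self, exits):
        self.exits = list(exits)
        self.handled_in_kernel = []

    def run(self):
        """Models the blocking ioctl(): returns only for exits needing userspace."""
        while self.exits:
            reason = self.exits.pop(0)
            if reason in self.IN_KERNEL:
                # Handled in the kernel; re-enter guest mode right away.
                self.handled_in_kernel.append(reason)
                continue
            return reason
        return "shutdown"


def vcpu_thread(kvm, device_handlers):
    """Models one QEMU VCPU thread: call into KVM, emulate, loop back."""
    handled = []
    while True:
        reason = kvm.run()
        if reason == "shutdown":
            return handled
        device_handlers[reason]()  # userspace device emulation
        handled.append(reason)


if __name__ == "__main__":
    kvm = FakeKVM(["hlt", "io", "external_interrupt", "mmio"])
    handlers = {"io": lambda: print("emulating port I/O"),
                "mmio": lambda: print("emulating MMIO access")}
    print(vcpu_thread(kvm, handlers))
    print(kvm.handled_in_kernel)
```

In the real system the inner loop lives in the kernel and the outer loop in each QEMU VCPU thread, so only the exits that fall through to userspace pay the cost of a kernel/user transition.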

A quick list of QEMU upstream resources might include:

Project home page: http://qemu.org/
Master git repo: git://git.qemu.org/qemu.git

There are three relevant topics regarding support for Mac OS X guests: QEMU's recent inclusion of support for the Q35/ICH9 based architecture; emulation of Apple's System Management Controller (a.k.a. AppleSMC), including automatically advertising its presence via ACPI; and, finally, a few issues related to QEMU's emulation of the e1000 network controller.

3.1. The Q35/ICH9 architecture

Support for the Q35 architecture was recently (Dec. 2012) merged into QEMU mainline. Q35 replaces the old I440FX with Intel's more modern ICH9 chipset, which also happens to be used on most Intel-based Apple hardware. Among other hardware, ICH9 includes an integrated AHCI disk controller, which had to be added explicitly (via "-device ahci,id=ide") on the pre-Q35 QEMU command line. With "-M q35", the command line becomes:

	bin/qemu-system-x86_64 -enable-kvm -m 2048 -cpu core2duo \
	  -usb -device usb-kbd -device usb-mouse \
	  -bios bios-mac.bin -kernel ./chameleon_2.0_boot \
	  -device isa-applesmc,osk="insert-real-64-byte-OSK-string-here" \
	  -M q35 \
	  -device ide-drive,bus=ide.2,drive=MacHDD \
	  -drive id=MacHDD,if=none,file=./mac_hdd.img \
	  -monitor stdio

As Q35 is slated to become the new default "machine type" in QEMU in the near future, the bulk of the effort (development, debugging, and testing) to get Mac OS X supported under QEMU will be focused on this platform.

While the Q35 command line boots an up-to-date SnowLeopard (10.6.8 or later) without problems, it hangs (most likely due to some USB/uhci/ehci related disagreements) on earlier 10.6.* versions, which unfortunately include the retail install image. Recent combinations of QEMU+SeaBIOS sometimes manage to work with the install disk, but only on an SMP guest. For now, installing a new OS X guest from scratch requires the pre-Q35 command line.

3.2. The AppleSMC emulator

The AppleSMC (or System Management Controller) is a chip specific to Intel-based computers manufactured by Apple. Its main purpose is to control (and report on) fan speeds, temperature sensors, screen and keyboard light intensity levels, and miscellaneous other power management features.

From the point of view of the operating system driver, interaction with the SMC happens via port-based I/O: The name of a 4-character key is written to a control port, and the key's numeric or ASCII value is then read from (or written to) a data port. Keys typically represent fan speeds, light intensity levels, or temperatures.

There are currently three outstanding issues with QEMU's AppleSMC emulation which could improve Mac OS X guest support, outlined below.

3.2.1. Automatic OSK "pass-through" on Apple hardware

The AppleSMC is also used to store a 64-byte ASCII string copyrighted by Apple, spread across two 32-byte key values, named OSK0 and OSK1. This string is used by Mac OS X to determine whether it's being booted on genuine Apple hardware. QEMU does not set up AppleSMC emulation by default (since only OS X guests require it at this time). To set it up, the following QEMU command line snippet is required:

	-device isa-applesmc,osk="insert-real-64-byte-OSK-string-here"

The user is required to supply the correct value of the 64-byte OSK string as an argument, and is responsible for honoring Apple's OS X EULA (which states that "[...] you are granted a [...] license to install, use and run one (1) copy of the Apple Software on a single Apple-Branded computer at any one time").
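
For illustration, this is how the osk= argument could be assembled from the two key values once they have been read from the hardware. The function name is hypothetical and the values used in the demo are obvious placeholders, not real keys.

```python
def build_applesmc_option(osk0, osk1):
    """Concatenate OSK0 and OSK1 into the argument of QEMU's -device option."""
    for name, value in (("OSK0", osk0), ("OSK1", osk1)):
        if len(value) != 32:
            raise ValueError(name + " must be exactly 32 characters long")
        value.encode("ascii")  # raises UnicodeEncodeError if not plain ASCII
    return "isa-applesmc,osk=" + osk0 + osk1


if __name__ == "__main__":
    # Placeholder values only; the real strings must come from your own Mac.
    print("-device", build_applesmc_option("A" * 32, "B" * 32))
```

Checking the length up front gives a clearer error than a guest that silently fails its hardware check later on.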

I wrote a small C program, SmcDumpHw.c, which can be used to read various SMC key values (including OSK0 and OSK1) from an Intel Mac running Linux. However, a significant improvement in usability and ease of compliance with the OS X EULA could be accomplished by allowing QEMU's AppleSMC emulator to automatically acquire the OSK strings from the underlying (Apple) host hardware.

Currently, the drivers/hwmon/applesmc.c Linux driver populates a Sysfs directory (/sys/devices/platform/applesmc.768/) which offers access to most SMC key values. Unfortunately, that does not include OSK0 and OSK1. I submitted this patch against the applesmc.c Linux driver, but encountered two main objections (also see the various other replies in the referenced thread):

/sys/devices/platform/applesmc.768/ is (or should be) reserved for hardware-monitoring related keys and values only. This point seems to be contradicted by Documentation/sysfs-rules.txt in the Linux kernel sources, which states that each device should get its own node (directory) in a device tree, and does not recommend spreading any device's entries into separate spots across Sysfs by any sort of "category".
The OSK values are constant, so it makes no sense to query the hardware if we know ahead of time what the returned value will be. My counter argument to that is that it makes perfect sense to query the hardware each time, precisely because Apple claims copyright on the returned string, which can therefore never be legally hardcoded (and distributed) in any open source project such as QEMU.

I'm planning to follow up with the Linux maintainer of applesmc.c again in the near future to reiterate these points, but in the meantime I could really use some advice and feedback from anyone with successful patch submission experience to the Linux kernel :)

3.2.2. OS X fails to write a key value to the emulated SMC

During boot, Mac OS X 10.6 (Snow Leopard, the only version currently working under QEMU) logs a few non-fatal SMC-related errors:

	SMC::smcGetVersWithSMC ERROR: smcReadKey REV failed (0xff)
	SMC::smcInitHelper ERROR: smcPublishVersion failed (0xff)
	SMC::smcInitHelper ERROR: smcPublishShutdownCause failed (0xff)
	SMC::smcPublisVersion ERROR: smcGetVers for SMC 0 failed (0xff)
	SMC::smcNotificationPublishedHandler ERROR: smcWriteKey NTOK failed (0xff), will not receive interrupts

It appears that emulating a few extra SMC keys, as well as allowing the OS X guest to write (as opposed to just read) some of the supported keys might make these errors go away.

3.3. The e1000 virtual network card

By default, QEMU uses the emulated e1000 network controller when starting a new guest, unless a different model is explicitly requested. Older versions of QEMU defaulted to the rtl8139 controller. OS X 10.6 (Snow Leopard) can use either model, but as a QEMU guest was unable to recognize the e1000 on older versions of QEMU, and is unable to recognize the rtl8139 on current QEMU git master. Since the e1000 is a more modern controller (as well as the new default), I decided not to spend any time figuring out why the rtl8139 isn't recognized, and get e1000 to work properly instead.

An already committed patch fixes the failure of OS X to configure the default MAC address (reporting "00:00:00:00:00:00" instead). OS X most likely expects the Apple EFI BIOS to initialize the network controller to the point where it's capable of listening for packets targeted at its own factory-default MAC address. Most other operating systems will take care of all this from within their respective drivers, but for now OS X needs QEMU to take care of this part on its behalf.

Even once OS X can detect the e1000 network card with the correct MAC address, we are left with a failure to negotiate a virtual Ethernet "link" once the guest completes its boot sequence. The third (e1000.c) part of my current QEMU patch fixes that problem, but whether it's acceptable for upstream or not is currently under debate. The alternative would be to teach SeaBIOS enough about the e1000 network card and have it perform the initialization steps that Apple's UEFI BIOS performs on behalf of OS X "in the wild". If this plan of attack turns out to be workable, the MAC address fix described in the above paragraph (QEMU commit 372254c6e5c078fb13b236bb648d2b9b2b0c70f1) should also be reverted, and its functionality added to SeaBIOS.

3.4. Boot OS X on QEMU without KVM hardware assistance

This part isn't truly necessary for supporting production OS X guests expected to do real work. It would however be interesting for debugging purposes to be able to run OS X in fully emulated mode, without any KVM hardware assistance.

An initial attempt at running OS X on mainline QEMU without KVM resulted in a kernel panic from Darwin:

panic(cpu 0 caller 0xffffff80002d1a80): "Local APIC version 0x11, 0x14 or more
expected\n"@/SourceCache/xnu/xnu-1456.1.25/osfmk/i386/lapic.c:215

Once the APIC version was changed from 0x11 to 0x14, a subsequent attempt at running OS X in emulation-only mode resulted in another kernel panic:

panic(cpu 0 caller 0xffffff80002cd439): "commpage no match for last, next address
ffff1000"@/SourceCache/xnu/xnu-1456.1.25/osfmk/i386/commpage/commpage.c:324

when using "-cpu core2duo". When using "-cpu coreduo", the boot process makes it a bit further along, but then gets stuck with this error:

...
MAC Framework successfully initialized
using 10485 buffer headers and 4096 cluster IO buffer headers
IOAPIC: Version 0x11 Vectors 64:87
ACPI: System State [S0 S3 S4 S5] (S3)
RTC: Only single RAM bank (128 bytes)
mbinit: done (64 MB memory set for mbuf pool)
From path: "uuid",
Waiting for boot volume with UUID 038F2F32-EFF9-3A63-A30E-1D8A610BCB42
Waiting on <dict ID="0"><key>IOProviderClass</key><string ID="1">IOResources</string>
<key>IOResourceMatch</key><string ID="2">boot-uuid-media</string></dict>
com.apple.AppleFSCompressionTypeZlib load succeeded
AppleIntelCPUPowerManagementClient: ready

This happens on both mainline and on the q35 tree.

More digging might reveal further interesting information...

3.5. A brief intro to ACPI BIOS hackery

Recently, the QEMU project took over responsibility for generating and exporting ACPI tables from the SeaBIOS project. This is an improvement, since QEMU is responsible for providing the "hardware" for guests, and as such, it is in the best position to provide a hardware specification by way of ACPI tables. ACPI (Advanced Configuration and Power Interface) is an open standard for how modern operating systems should implement device configuration and power management. Its specification defines (among many other things) a set of tables (such as the DSDT) which provide an interface between compliant operating systems and system firmware, and a language (ASL) in which so-called "device nodes" are defined in these tables. A controversial feature of ACPI is the requirement that the operating system execute, with full privileges, the bytecode compiled into the BIOS from ASL source (which, from the operating system's point of view, is externally provided).

The DSDT (Differentiated System Description Table) supplies critical information about the various hardware devices which comprise the base system. We will focus on the entries (or "nodes") of two such devices: the AppleSMC and the HPET. ASL source code for the DSDT of a system running Linux can be extracted using the following steps:

	acpidump > acpidump.out
	acpixtract acpidump.out
	iasl -d DSDT.dat

The last step generates a file named DSDT.dsl, which contains ASL source code. Let's examine the entry corresponding to the SMC:

	Device (SMC) {
	    Name (_HID, EisaId ("APP0001"))
	    Name (_STA, 0x0B)
	    Name (_CRS, ResourceTemplate () {
		IO (Decode16, 0x0300, 0x0300, 0x01, 0x20)
		IRQNoFlags () {6}
	    })
	}

Section 6 of the ACPI spec explains the meaning of each object:

_HID: the device's PnP hardware ID
_STA: current device status as 32-bit integer, encoding the following bitmap:
- Bit 0: present
- Bit 1: enabled
- Bit 2: show device in the (ACPI/BIOS) UI
- Bit 3: functional (passes diagnostics)
- Bit 4: battery present
- Bit 5-31: reserved (must be cleared)
The value 0x0B means that the SMC is present, enabled, and functional (but does not need to be "shown in the UI", and does not have a battery).
_CRS: stands for "current resource settings", and returns a buffer containing a description of the current system resources allocated to the device. In our SMC example above, we have reserved I/O ports starting at 0x300, aligned at 1-byte boundaries, spanning 32 bytes (i.e. 0x20). The device also uses active-high, edge-triggered IRQ line #6. (see pages 310 and 767 of the ACPI spec).
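As a worked example of the two encodings above, the sketch below decodes an _STA value into the flag names from the bitmap, and computes the last I/O port covered by a base address and length. The function names and the flag labels (e.g. "show-in-ui") are made up for illustration; only the bit positions and the 0x0300/0x20 values come from the text.

```shell
# Decode an ACPI _STA value (decimal or 0x-prefixed hex) into flag names.
sta_decode() {
    sta=$(( $1 ))
    flags=""
    bit=0
    # Bit 0 is "present", bit 1 "enabled", and so on, in the order listed.
    for name in present enabled show-in-ui functional battery; do
        if [ $(( (sta >> bit) & 1 )) -eq 1 ]; then
            flags="$flags $name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "${flags# }"
}

# Print the inclusive I/O port range for a base address and a length.
io_range() {
    printf '0x%04x-0x%04x\n' $(( $1 )) $(( $1 + $2 - 1 ))
}

sta_decode 0x0B        # prints "present enabled functional"
io_range 0x0300 0x20   # prints "0x0300-0x031f"
```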

Objects are often specified as constants (e.g., our SMC._STA() always returns a hardcoded value of 0xB, and no computation is performed to actually probe the device). In that case they may be declared as "Named Objects", as shown in the example above. Should we require an actual computation, we would have to declare the relevant object as a "Method". For example, the expression "Name (_STA, 0x0B)" is equivalent to:

	Method (_STA, 0) {
	    Return (0x0B)
	}

This latter form, once the hardcoded return value is replaced by an actual probe of the device, would enable the operating system to find out at boot time whether or not a functional SMC device is included with the underlying (virtual) machine.

3.5.1. Mac OS X and the SMC DSDT node

My current QEMU patch adds an SMC node to the DSDT with a hardcoded constant _STA value of 0x0B (very similar to the one found on real hardware). This makes it unacceptable for upstream, because most often QEMU will be started without the "-device isa-applesmc" command line option, and the resulting guest VM will therefore lack a present/enabled/functional SMC. Since QEMU is now entirely responsible for generating and manipulating its guests' ACPI tables, there are two plausible options:

Permanently include an SMC node in the DSDT, but dynamically patch the value associated with _STA during guest initialization (i.e., ensure it's 0x00 unless "-device isa-applesmc" was given on the command line, in which case patch it to be 0x0B).
Do not include an SMC node in the DSDT, and, if "-device isa-applesmc" was given on the command line, dynamically "paste" its entire AML blob into the DSDT before starting the guest.
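The decision behind the first option can be modeled in a few lines. This is only a sketch of the logic, not QEMU code (QEMU would make this choice internally while building the tables), and `smc_sta_for_args` is a hypothetical name. It assumes the device is requested as "-device isa-applesmc", optionally followed by comma-separated properties.

```shell
# Given a QEMU-style argument list, print the _STA value the SMC node
# should end up with: 0x0B if an AppleSMC device was requested, else 0x00.
smc_sta_for_args() {
    prev=""
    for arg in "$@"; do
        if [ "$prev" = "-device" ]; then
            case "$arg" in
                isa-applesmc|isa-applesmc,*)
                    echo 0x0B
                    return 0
                    ;;
            esac
        fi
        prev=$arg
    done
    echo 0x00
}

smc_sta_for_args -m 2048 -device isa-applesmc   # prints "0x0B"
smc_sta_for_args -m 2048                        # prints "0x00"
```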

In either case, the SMC node must be added to the DSDT (i.e., not to some other table like the SSDT, as has been suggested). OS X is sensitive to that, and refuses to start if the SMC is not included specifically in the DSDT.

3.5.2. Mac OS X and the HPET DSDT node

I have submitted an HPET DSDT patch to enable QEMU+SeaBIOS to boot Mac OS X guests. The line:

	    IRQNoFlags() {2, 8}

in the _CRS method is needed for booting an SMP (multi-VCPU) OS X guest. However, it seems to cause trouble for Windows XP, and has since been partially reverted. A solution that addresses all concerns could be to conditionally add the IRQNoFlags line to _CRS only when running an OS X guest, which we may assume is strongly correlated to the presence of an SMC node (or to the success of the SMC._STA() method). One example bases the conditional inclusion of the IRQ resource on the presence of the SMC node:

	Name(RES_MIO, ResourceTemplate() {	/* MMIO resource */
	    Memory32Fixed(ReadOnly, 0xFED00000, 0x00000400)
	})
	Name(RES_IRQ, ResourceTemplate() {	/* IRQ resource */
	    IRQNoFlags() {2, 8}
	})
	Method(_CRS, 0) {
	    If (CondRefOf(\_SB.PCI0.ISA.SMC, Local0)) {
		/* AppleSMC present, include IRQ resource */
		ConcatenateResTemplate(RES_MIO, RES_IRQ, Local1)
		Return (Local1)
	    } else {
		/* AppleSMC not present, omit IRQ resource */
		Return (RES_MIO)
	    }
	}

Another example uses the result of the SMC._STA() method, assuming an SMC node is always present in the DSDT:

	Name(RES_MIO, ResourceTemplate() {	/* MMIO resource */
	    Memory32Fixed(ReadOnly, 0xFED00000, 0x00000400)
	})
	Name(RES_IRQ, ResourceTemplate() {	/* IRQ resource */
	    IRQNoFlags() {2, 8}
	})
	Method(_CRS, 0) {
	    Store(\_SB.PCI0.ISA.SMC._STA(), Local0)
	    If (LEqual(Local0, 0x0B)) {
		/* AppleSMC present, include IRQ resource */
		ConcatenateResTemplate(RES_MIO, RES_IRQ, Local1)
		Return (Local1)
	    } else {
		/* AppleSMC not present, omit IRQ resource */
		Return (RES_MIO)
	    }
	}
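To see how the second variant behaves for different SMC status values, here is a small model of its _CRS logic. It is an illustration only: `hpet_crs` is a hypothetical name, and "MMIO" and "IRQ" stand in for the RES_MIO and RES_IRQ templates.

```shell
# Model of the _STA-based _CRS method: the IRQ resource is appended only
# when SMC._STA is exactly 0x0B, mirroring the LEqual() test above.
hpet_crs() {
    if [ $(( $1 )) -eq $(( 0x0B )) ]; then
        echo "MMIO IRQ"
    else
        echo "MMIO"
    fi
}

for sta in 0x00 0x0B 0x0F; do
    echo "$sta -> $(hpet_crs "$sta")"
done
```

Because the comparison is an exact match rather than a bit test, a status of 0x0F (the same as 0x0B plus the "show in UI" bit) falls into the "omit IRQ resource" branch in this model.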

Of course, an HPET DSDT entry that works with both XP and OS X (in addition to Linux, other Windows versions, BSDs, etc.) without conditional hacks would be even better. Stay tuned: work is still in progress on this one.

4. SeaBIOS

SeaBIOS is the default BIOS for QEMU/KVM. The QEMU source tree includes a relatively up-to-date snapshot of SeaBIOS (in pre-built, binary form). However, supporting Mac OS X currently requires patching SeaBIOS, and therefore we build a separate binary (bios-mac.bin) from current git master, and use it to start the OS X guest instead. Upstream resources include:

Project home page: http://www.seabios.org/
Master git repo: git://git.seabios.org/seabios.git

The current SeaBIOS patch was written by Alex Graf and/or René Rebe, and adds Mac-specific data structures to the SMBIOS component of SeaBIOS. The main effect of this patch is to enable "About This Mac" functionality from the OS X system menu, which would otherwise result in Finder crashing and being restarted.

The ACPI changes which used to also be part of this patch have now been moved to QEMU, see above.

I don't yet know how I would go about upstreaming this particular bit into SeaBIOS, but suggestions are welcome !

5. Chameleon

Chameleon is a Darwin/XNU boot loader based on Apple's boot-132. It is currently used to override some of the underlying BIOS settings, and generally bridge the gap between the existing BIOS and the EFI BIOS of genuine Apple Mac computers expected by OS X during boot. Project resources include:

Project home page: http://chameleon.osx86.hu/
SVN repository: http://forge.voodooprojects.org/svn/chameleon

The project is meant to be built under OS X itself, and also depends on Xcode. Once these prerequisites are met, simply running 'make' in the appropriate SVN subdirectory will build the bootloader. Luckily, the SnowLeopard install DVD also includes a copy of Xcode, so I used my OS X guest VM to build Chameleon 2.0 from the SVN repository. To initially boot the VM, you may temporarily use my precompiled Chameleon binary. After the Chameleon source build completes, find the resulting "boot" binary and copy it to our working directory as e.g. "chameleon_2.0_boot".

5.1. Chameleon Build Issues

The current SVN trunk (revision 2168 at the time of this writing), as well as the branch tagged 2.1, fails to build on SnowLeopard+Xcode. When attempting to link boot.sys, we get this error:

	[LD] boot.sys
	ld: warning: -segaddr __INIT not page aligned, rounding down
	ld: warning: -segaddr __TEXT not page aligned, rounding down
	ld: warning: -segaddr __DATA not page aligned, rounding down
	ld: segments overlap: __DATA (0x0005DFFE + 0x0000F000) and __LINKEDIT (0x0005E000 + 0x00001000)

It is possible that later versions of OS X and/or Xcode are required to properly build the latest Chameleon versions.
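The numbers in the linker error can be checked by hand. The sketch below redoes the arithmetic, assuming 4 KiB (0x1000) pages; the helper names are hypothetical and the addresses are taken from the error message.

```shell
# End address (exclusive) of a segment given its start and size.
seg_end() { printf '0x%08X\n' $(( $1 + $2 )); }

# Round an address down to a 4 KiB page boundary (page size is an assumption).
page_round_down() { printf '0x%08X\n' $(( $1 & ~0xFFF )); }

# Succeed if [start1, start1+size1) and [start2, start2+size2) intersect.
ranges_overlap() {
    [ $(( $1 )) -lt $(( $3 + $4 )) ] && [ $(( $3 )) -lt $(( $1 + $2 )) ]
}

seg_end 0x0005DFFE 0x0000F000      # __DATA ends at 0x0006CFFE
page_round_down 0x0005DFFE         # prints 0x0005D000
ranges_overlap 0x0005DFFE 0x0000F000 0x0005E000 0x00001000 && echo "overlap"
```

With these values __DATA starts two bytes below 0x0005E000 and extends well past it, so it covers the whole of __LINKEDIT, which is consistent with the "segments overlap" message.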

5.2. Supporting Mac OS X versions beyond 10.7 (*Lion)

When using pre-2.0 versions of Chameleon, attempting to boot (Mountain)Lion from unmodified Apple boot media results in a crash behaving a lot like a triple-fault. With Chameleon 2.0, (Mountain)Lion no longer triple-faults on startup, but instead generates the following kernel panic during boot:

	Unable to find driver for this platform:
	"\ACPI\".\n"@/SourceCache/xnu/xnu-1699.22.73/iokit/Kernel/IOPlatformExpert.cpp:104

Solving this problem will likely also involve further SeaBIOS fixes, but successfully building and testing with the latest Chameleon SVN trunk would be a good start.

I have received a few reports of MountainLion running as a QEMU guest, but usually also involving a few Hackintosh-inspired steps that are not 100% clear and reproducible to me. I'd be really interested in reports of working (Mountain)Lion QEMU guests, but with (as many of) the following details (as humanly possible):

How was the HDD image generated?
- From install media under QEMU?
- By copying a pre-installed HDD image?
  - How was that HDD image generated ?
- By updating an older (e.g. SnowLeopard) HDD image?
  - Under QEMU?
  - Under some other platform? Which one?
  - What (if any) hacks were "perpetrated" before or after updating to (Mountain)Lion? (e.g., removing or installing various *.kext modules, etc.)
- Other method?
Regarding the boot media used for the original install (QEMU or other platform), and/or used to update an older version HDD image:
- Was it a vanilla image as shipped by Apple?
- Were any hacks or mods performed on the media?
  - Is it a well-known "Hackintosh" installer?
  - Are the mod steps published and reproducible?

6. Conclusion and Future Work

My ultimate goal is to make whatever changes are needed to KVM/QEMU/SeaBIOS/Chameleon to allow installing from unmodified, vanilla Apple install images, and to run the guest with no requirement to perform any post-install "surgery" on the guest HDD image. If absolutely necessary, a small number of well documented mods may be tolerated. As I mentioned elsewhere, I'm not looking to run a Hackintosh, but rather a well-supported OS X guest under QEMU/KVM on an Apple-made machine natively running Linux.

Currently, I'm planning to work on the issues outlined above, in the following order of priority:

MONITOR/MWAIT patches
e1000 link negotiation
AppleSMC issues, then HPET DSDT fixes
Find out what it takes to boot (Mountain)Lion in QEMU

That's a fair amount of work, so ideas, suggestions, or, should you be interested, actual help hacking on this stuff would be most welcome and appreciated!

For the (long-term) future, another interesting idea I have seen floated is to completely replace the Chameleon+SeaBIOS tandem with a (U)EFI compliant BIOS such as TianoCore, and try to get it to boot every OS (OS X included) natively.