This vulnerability is more historical than practical, but it caught the attention of the Digital Edge security team because we think it is the first hypervisor vulnerability that allows a guest to attack the hypervisor host.
The idea behind virtualization is that each virtual instance runs in its own jail and cannot communicate with other virtual instances or with the physical host itself. This isolation is what makes people confident about moving into the “cloud”: in theory, nobody can break out of the jail, so your “neighbors” cannot damage you.
If the isolation fails, a criminal can purchase a virtual machine “next” to you and hack into your machine. Hypervisor software therefore does everything it can to block visibility from one virtual instance into another or into the physical host.
A new vulnerability, CVE-2015-7835, was logged today. Its description simply states:
“The mod_l2_entry function in arch/x86/mm.c in Xen 3.4 through 4.6.x does not properly validate level 2 page table entries, which allows local PV guest administrators to gain privileges via a crafted superpage mapping.”
What it actually means is that a hacker can purchase a VM, take control of its physical host, and then take control of the other VMs running on that host. In our opinion, it is the worst bug we have seen.
Xen tracks the bug internally as XSA-148.
The quote from Xen is:
“The code to validate level 2 page table entries is bypassed when certain conditions are satisfied. This means that a PV guest can create writeable mappings using super page mappings. Such writeable mappings can violate Xen intended invariants for pages which Xen is supposed to keep read-only.”
The above is a diplomatic way of saying that the bug is critical. It is probably the worst we have seen affecting the Xen hypervisor, ever, and in our opinion it should, in theory, shake the notion of “cloud” invincibility.
What is troubling is that this bug sat in a mature virtualization platform for 7 years. It is exploitable only under very specific circumstances, but we are worried that more flaws in hypervisors will be discovered.
Explanation From the Experts:
The bug is in the mod_l2_entry() function which handles requests from the PV guests to update their page table mappings:
/* Update the L2 entry at pl2e to new value nl2e. pl2e is within frame pfn. */
static int mod_l2_entry(l2_pgentry_t *pl2e,
                        l2_pgentry_t nl2e,
                        unsigned long pfn,
                        int preserve_ad,
                        struct vcpu *vcpu)
{
    l2_pgentry_t ol2e;
    struct domain *d = vcpu->domain;
    struct page_info *l2pg = mfn_to_page(pfn);
    unsigned long type = l2pg->u.inuse.type_info;
    int rc = 0;

    if ( unlikely(!is_guest_l2_slot(d, type, pgentry_ptr_to_slot(pl2e))) )
    {
        //...
    }

    if ( unlikely(__copy_from_user(&ol2e, pl2e, sizeof(ol2e)) != 0) )
        return -EFAULT;

    if ( l2e_get_flags(nl2e) & _PAGE_PRESENT )
    {
        if ( unlikely(l2e_get_flags(nl2e) & L2_DISALLOW_MASK) )
        {
            //...
        }

        /* Fast path for identical mapping and presence. */
        if ( !l2e_has_changed(ol2e, nl2e, _PAGE_PRESENT) )
        {
            adjust_guest_l2e(nl2e, d);
            if ( UPDATE_ENTRY(l2, pl2e, ol2e, nl2e, pfn, vcpu, preserve_ad) )
                return 0;
            return -EBUSY;
        }
The bug in the code above becomes apparent only when we also look at the definition of the L2_DISALLOW_MASK macro:
#define L2_DISALLOW_MASK (base_disallow_mask & ~_PAGE_PSE)
and also how the base_disallow_mask gets initialized:
void __init arch_init_memory(void)
{
    (...)
    /* Basic guest-accessible flags: PRESENT, R/W, USER, A/D, AVAIL[0,1,2] */
    base_disallow_mask = ~(_PAGE_PRESENT|_PAGE_RW|_PAGE_USER|
                           _PAGE_ACCESSED|_PAGE_DIRTY|_PAGE_AVAIL);
We see that the attacker can request that the (PSE | RW) bits be set in the L2 PDE. This is possible because L2_DISALLOW_MASK does not disallow the PSE bit, an exception that was added to support superpage mappings for PV guests. The whole L1 table then becomes accessible to the guest with R/W rights (it is now seen as a large 2 MB page), and the attacker can modify one or more of the PTEs there to point to any MFN she wants to access.
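To make the mask arithmetic concrete, here is a minimal sketch that recomputes the two masks quoted above and checks which flag combinations they reject. It is an illustration, not Xen code: the toy_ functions are hypothetical helpers, and the numeric flag values are the usual x86 bit positions, which we assume match the Xen definitions.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: standard x86 page-table flag bit positions. */
#define _PAGE_PRESENT  0x001ull
#define _PAGE_RW       0x002ull
#define _PAGE_USER     0x004ull
#define _PAGE_ACCESSED 0x020ull
#define _PAGE_DIRTY    0x040ull
#define _PAGE_PSE      0x080ull
#define _PAGE_AVAIL    0xE00ull  /* the three software-available bits 9..11 */

/* Mirrors the initialisation quoted from arch_init_memory(). */
static uint64_t toy_base_disallow_mask(void)
{
    return ~(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
             _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_AVAIL);
}

/* Mirrors the quoted L2_DISALLOW_MASK: PSE is carved out of the base mask. */
static uint64_t toy_l2_disallow_mask(void)
{
    return toy_base_disallow_mask() & ~_PAGE_PSE;
}

/* Hypothetical helper: true if the L2 mask check would reject these flags. */
static bool toy_l2_flags_rejected(uint64_t flags)
{
    return (flags & toy_l2_disallow_mask()) != 0;
}
```

With these definitions, toy_l2_flags_rejected(_PAGE_PRESENT | _PAGE_RW | _PAGE_PSE) is false, so the flag combination the attacker needs passes the mask check, even though the base mask on its own would have rejected PSE.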
That alone would not be fatal if the attacker had no way of tricking Xen into treating this superpage (now under the attacker's control) as a valid table of PTEs again. Sadly, there is nothing to stop her from doing that. We thus end up with Xen treating the attacker-filled memory as a set of valid PTEs for the (PV) guest. This means that the guest, by referencing the addresses mapped by these pages, is really accessing whatever MFNs the attacker decided to write into the PTEs. In other words, the guest can now access all of the system's memory. Reliably.
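The two steps can be illustrated with a deliberately tiny model. This is a sketch under heavy assumptions, not an exploit and not Xen code: physical memory is a small array, a "PTE" is just a bare frame number, and every name (toy_mem, toy_attack, the frame numbers) is hypothetical. It only shows why a writable view of a frame that is later walked as a page table hands the writer control over what gets mapped.

```c
#include <stdint.h>

#define TOY_FRAMES  8   /* toy machine: 8 frames ... */
#define TOY_ENTRIES 4   /* ... of 4 words each (real x86-64 tables hold 512) */

/* Toy physical memory, indexed by machine frame number (MFN). */
static uint64_t toy_mem[TOY_FRAMES][TOY_ENTRIES];

enum { TOY_L1_MFN = 2, TOY_GUEST_MFN = 3, TOY_SECRET_MFN = 7 };

/* A table walk: treat frame l1_mfn as PTEs and follow one of them. */
static uint64_t toy_read_via_l1(unsigned l1_mfn, unsigned slot, unsigned offset)
{
    unsigned target_mfn = (unsigned)toy_mem[l1_mfn][slot]; /* PTE = bare MFN */
    return toy_mem[target_mfn][offset];
}

/* A writable superpage mapping: plain stores into any frame it covers. */
static void toy_write_via_superpage(unsigned covered_mfn, unsigned offset,
                                    uint64_t value)
{
    toy_mem[covered_mfn][offset] = value;
}

static uint64_t toy_attack(void)
{
    toy_mem[TOY_SECRET_MFN][0] = 0xC0FFEE;   /* data the guest must never see */
    toy_mem[TOY_L1_MFN][0] = TOY_GUEST_MFN;  /* legitimate PTE: guest's own frame */

    /* Step 1: the L1 frame is reachable through the R/W superpage,
     * so the guest overwrites a PTE with an MFN of its choosing. */
    toy_write_via_superpage(TOY_L1_MFN, 0, TOY_SECRET_MFN);

    /* Step 2: the same frame is walked as a page table again,
     * and the forged PTE is followed without question. */
    return toy_read_via_l1(TOY_L1_MFN, 0, 0);
}
```

In this model toy_attack() returns the word stored in the "secret" frame. The real attack has to deal with address alignment, hypercalls and TLB state, all of which the sketch leaves out.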
The attack works irrespective of whether opt_allow_superpage is true or not.
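One way to see why the option does not matter is to model the fast-path test from the mod_l2_entry() excerpt: it is only asked to compare the PRESENT flag and never looks at opt_allow_superpage. The sketch below is not Xen code. It assumes that l2e_has_changed() compares the frame address plus only the flags passed to it, and it assumes the usual x86-64 layout of address bits. The toy_ names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: standard x86 page-table flag bit positions. */
#define _PAGE_PRESENT 0x001ull
#define _PAGE_RW      0x002ull
#define _PAGE_USER    0x004ull
#define _PAGE_PSE     0x080ull
/* Assumption: bits 12..51 of an entry carry the frame address. */
#define TOY_ADDR_MASK 0x000FFFFFFFFFF000ull

/* Assumed semantics of l2e_has_changed(): address plus the named flags only. */
static bool toy_l2e_has_changed(uint64_t ol2e, uint64_t nl2e, uint64_t flags)
{
    return ((ol2e ^ nl2e) & (TOY_ADDR_MASK | flags)) != 0;
}

/* True if an update from ol2e to nl2e would take the quoted fast path,
 * that is, be written without any further validation in the excerpt. */
static bool toy_takes_fast_path(uint64_t ol2e, uint64_t nl2e)
{
    return !toy_l2e_has_changed(ol2e, nl2e, _PAGE_PRESENT);
}
```

Under these assumptions, rewriting an existing entry such as 0x200000 | _PAGE_PRESENT | _PAGE_USER with the same address plus _PAGE_RW | _PAGE_PSE takes the fast path, because neither the address nor the PRESENT bit changed. Changing the address does not.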