From 0eb31ff1b9ea3c91c5d71ec4f4061f1dbd3c0d71 Mon Sep 17 00:00:00 2001
Message-Id: <0eb31ff1b9ea3c91c5d71ec4f4061f1dbd3c0d71.1335172957.git.minovotn@redhat.com>
In-Reply-To: <8e20c74819db1d275690c2a426b9b9c9311ce679.1335172957.git.minovotn@redhat.com>
References: <8e20c74819db1d275690c2a426b9b9c9311ce679.1335172957.git.minovotn@redhat.com>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Sun, 22 Apr 2012 15:42:48 +0200
Subject: [PATCH 2/2] virtio: add missing mb() on notification

RH-Author: Michael S. Tsirkin <mst@redhat.com>
Message-id: <20120422154247.GA12494@redhat.com>
Patchwork-id: 39370
O-Subject: [PATCH RHEL6.3] virtio: add missing mb() on notification
Bugzilla: 804578
RH-Acked-by: Xiao Wang <jasowang@redhat.com>
RH-Acked-by: Amos Kong <akong@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>

During normal operation, virtio host first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail flag/used event index values.
Guest does the reverse: writes the index/flag,
then checks the used ring.

The ordering is important: if host avail flag read bypasses the used
index write, we could in effect get this timing:

host avail flag/used event index read
		guest enable interrupts: this performs
			 avail flag/used event index write
		guest check used ring: ring is empty
host used index write

This timing results in a lost interrupt: guest will never be
notified about the used ring update.

This has actually been observed in the field, when using qemu-kvm such
that the guest vcpu and qemu io run on different host cpus,
but only seems to trigger on some specific hardware,
and only with userspace virtio: vhost has the necessary smp_mb() in
place to prevent the reordering, so the same workload stalls forever
waiting for an interrupt with vhost=off but works fine with vhost=on.

Insert an smp_mb() barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Upstream status: posted
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=804578
Tested: on x86_64

Note: upstream uses smp_XX style barrier macros,
and has code for architectures besides x86_64.
We don't have any of this on RHEL6 - this is a minimal patch that just
ports the bugfix.

Please-review: Amit Shah <amit.shah@redhat.com>
Please-review: Juan Quintela <quintela@redhat.com>
Please-review: Jason Wang <jasowang@redhat.com>

---
 hw/virtio.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

Signed-off-by: Michal Novotny <minovotn@redhat.com>
---
 hw/virtio.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/virtio.c b/hw/virtio.c
index c7f39f2..2db9b68 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -29,6 +29,7 @@
  */
 
 #define wmb() __asm__ __volatile__("": : :"memory")
+#define mb()    __asm__ __volatile__("mfence":::"memory")
 
 typedef struct VRingDesc
 {
@@ -691,6 +692,8 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     uint16_t old, new;
     bool v;
+    /* We need to expose used array entries before checking used event. */
+    mb();
     /* Always notify when queue is empty (when feature acknowledge) */
     if (((vdev->guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) &&
          !vq->inuse && vring_avail_idx(vq) == vq->last_avail_idx)) {
-- 
1.7.7.6

