Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock
Please unblock package xen
unblock xen/4.8.1-1
This update includes three security fixes and a large number of other
important bugfixes.
When preparing this update I had to choose between either (i) taking
the upstream 4.8.1 stable point release and reverting any changes I
felt inappropriate, or (ii) cherry-picking the commits I felt
appropriate.
Looking at the git log [1] I concluded that the majority of the
non-security fixes were clearly bugfixes. Many of those bugfixes are
for crashes or races.
I decided that the lower risk approach would be to start with all the
commits from upstream, and revert any that ought to be excluded. This
reduces the risk of dropping an important bugfix.
Reviewing the commit log in detail, there were two changes for which
the justification for backporting seemed quite unclear to me:
"xen/arm: *: Relax hw domain mapping attributes" - two commits, one
for ACPI and one for DT; and "x86/ept: allow write-combining on
!mfn_valid() MMIO mappings again". I queried these with other
upstream developers and came to the conclusion that they ought to be
included.
There are a number of other commits which are clear bugfixes, with a
low risk of regression, but also a low impact. I think it is probably
better to include these and ship Xen 4.8.1 in stretch, than to revert
them.
[1] git-log-4.8.1-1.txt, attached.
I'm afraid the debdiff will be hard to read - not because the changes
interact so much, but because there are quite a lot of them.
In the debdiff you will see a change to Config.mk. That change has no
effect on the Debian package build and could be stripped out. I chose
not to do this because I felt that messing with things was more likely
to break things than to fix them (see above).
Thanks for your attention and I hope this approach meets with your
approval.
Regards,
Ian.
-- System Information:
Debian Release: 8.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (x86_64)
Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
commit 5ebb4de45c501ae12964a244ccd85fe1169a5f7c
Author: Jan Beulich <jbeulich@suse.com>
Date: Mon Apr 10 15:21:48 2017 +0200
update Xen version to 4.8.1
commit e1c62cdf782085605ea1186912fc419dd9464027
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Tue Mar 28 18:57:52 2017 +0100
oxenstored: trim history in the frequent_ops function
We were trimming the history of commits only at the end of each
transaction (regardless of how it ended).
Therefore if non-transactional writes were being made but no
transactions were being ended, the history would grow
indefinitely. Now we trim the history at regular intervals.
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
commit 336afa82ca86fe61f9c46f89ae6726ff94754f34
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Mon Mar 27 14:36:34 2017 +0100
oxenstored transaction conflicts: improve logging
For information related to transaction conflicts, potentially frequent
logging at "info" priority has been changed to "debug" priority, and
once per two minutes there is an "info" priority summary.
Additional detailed logging has been added at "debug" priority.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
commit 3ee0d82af271897e7e8f74949a4c50d47d460309
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Fri Mar 24 19:55:03 2017 +0000
oxenstored: don't wake to issue no conflict-credit
In the main loop, when choosing the timeout for the select function
call, we were setting it so as to wake up to issue conflict-credit to
any domains that could accept it. When xenstore is idle, this would
mean waking up every 50ms (by default) to do no work. With this
commit, we check whether any domain is below its cap, and if not then
we set the timeout for longer (the same timeout as before the
conflict-protection feature was added).
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
commit 84ee808e363887910984b3eb443466ce42e8010f
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Fri Mar 24 16:16:10 2017 +0000
oxenstored: do not commit read-only transactions
The packet telling us to end the transaction has always carried an
argument telling us whether to commit.
If the transaction made no modifications to the tree, now we ignore
that argument and do not commit: it is just a waste of effort.
This makes read-only transactions immune to conflicts, and means that
we do not need to store any of their details in the history that is
used for assigning blame for conflicts.
We count a transaction as a read-only transaction only if it contains
no operations that modified the tree.
This means that (for example) a transaction that creates a new node
then deletes it would NOT count as read-only, even though it makes no
change overall. A more sophisticated algorithm could judge the
transaction based on comparison of its initial and final states, but
this would add complexity and computational cost.
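For illustration, a minimal C-style sketch of the read-only test described
above (hypothetical names; the real oxenstored code is OCaml):

    #include <stdbool.h>

    /* Count tree-modifying operations (write, mkdir, rm, setperms, ...)
     * as the transaction runs. */
    struct transaction {
        unsigned int nr_modifying_ops;
    };

    static bool txn_is_read_only(const struct transaction *t)
    {
        /* A create-then-delete pair still counts as modifying, as
         * noted above: we do not compare initial and final states. */
        return t->nr_modifying_ops == 0;
    }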
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
commit cb778dee017504505a5f20aea1831abef31a3e97
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Thu Mar 23 19:06:54 2017 +0000
oxenstored: allow self-conflicts
We already avoid inter-domain conflicts but now allow intra-domain
conflicts. Although there are no known practical examples of a domain
that might perform operations that conflict with its own transactions,
this is conceivable, so here we avoid changing those semantics
unnecessarily.
When a transaction commit fails with a conflict and we look through
the history of commits to see which connection(s) to blame, ignore
historical commits that were made by the same connection as the
failing commit.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
commit fa0b2b9555366e5836a5fdacb62bb054cdefc3d6
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Thu Mar 23 14:28:16 2017 +0000
oxenstored: blame the connection that caused a transaction conflict
Blame each connection found to have made a commit that would cause this
transaction to fail. Each blamed connection is penalised by having its
conflict-credit decremented.
Note the change in semantics for the replay function: we no longer stop after
finding the first operation that can't be replayed. This allows us to identify
all operations that conflicted with this transaction, not just the one that
conflicted first.
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
v1 Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
Changes since v1:
* use correct log levels for informational messages
Changes since v2:
* fix the blame algorithm and improve logging
(fix was reviewed by Jonathan Davies)
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
commit 9ea503220d33b9efae45405eeac5a3a08a902833
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Mon Mar 27 08:58:29 2017 +0000
oxenstored: track commit history
Since the list of historic activity cannot grow without bound, it is safe to use
this to track commits.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Thomas Sanders <thomas.sanders@citrix.com>
commit c68276082ac2bea5caf2bff26cc89771598e0de9
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Thu Mar 23 14:25:16 2017 +0000
oxenstored: discard old commit-history on txn end
The history of commits is to be used for working out which historical
commit(s) (including atomic writes) caused conflicts with a
currently-failing commit of a transaction. Any commit that was made
before the current transaction started cannot be relevant. Therefore
we never need to keep history from before the start of the
longest-running transaction that is open at any given time: whenever a
transaction ends (with or without a commit) then if it was the
longest-running open transaction we can delete history up until the
start of the next-longest-running open transaction.
Some transactions might stay open for a very long time, so if any
transaction exceeds conflict_max_history_seconds then we remove it
from consideration in this context, and will not guarantee to keep
remembering about historical commits made during such a transaction.
We implement this by keeping a list of all open transactions that have
not been open too long. When a transaction ends, we remove it from the
list, along with any that have been open longer than the maximum; then
we delete any history from before the start of the longest-running
transaction remaining in the list.
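A minimal C-flavoured sketch of that trimming rule (hypothetical types and
names; the real oxenstored code is OCaml):

    #include <stddef.h>

    struct txn {
        double start_time;
        int still_open;
    };

    /* Return the cut-off time: history entries older than the start of
     * the oldest still-open, not-too-old transaction can be deleted. */
    static double history_cutoff(const struct txn *txns, size_t n,
                                 double now, double max_age)
    {
        double cutoff = now;
        for (size_t i = 0; i < n; i++) {
            /* Skip ended transactions and ones that exceeded
             * conflict_max_history_seconds. */
            if (!txns[i].still_open || now - txns[i].start_time > max_age)
                continue;
            if (txns[i].start_time < cutoff)
                cutoff = txns[i].start_time;
        }
        return cutoff;
    }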
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit 9a2c5b42ad29ea731ed95d7aae5b59df1c526eb3
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Thu Mar 23 14:20:33 2017 +0000
oxenstored: only record operations with side-effects in history
There is no need to record "read" operations as they will never cause another
transaction to fail.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Thomas Sanders <thomas.sanders@citrix.com>
commit 567051b61858424ec8725efe23641d12ee69791c
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Tue Mar 14 13:20:07 2017 +0000
oxenstored: support commit history tracking
Add ability to track xenstore tree operations -- either non-transactional
operations or committed transactions.
For now, the call to actually retain commits is commented out because history
can grow without bound.
For now, we call record_commit for all non-transactional operations. A
subsequent patch will make it retain only the ones with side-effects.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit 4f4596a0e90ebf7ed971b1949244e3b2cbed5d11
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Tue Mar 14 12:17:38 2017 +0000
oxenstored: add transaction info relevant to history-tracking
Specifically:
* retain the original store (not just the root) in full transactions
* store commit count at the time of the start of the transaction
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit b795db0e3d04dff4fd31b380eb7dbc58c8926964
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Tue Mar 14 12:15:52 2017 +0000
oxenstored: ignore domains with no conflict-credit
When processing connections, skip those from domains with no remaining
conflict-credit.
Also, issue a point of conflict-credit at regular intervals, the
period being set by the configuration option
"conflict-max-history-seconds". When issuing conflict-credit, we give
a point either to
every domain at once (one each) or only to the single domain at the
front of the queue, depending on the configuration option
"conflict-rate-limit-is-aggregate".
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit 6636c70b369ada87f08bcb1810d0715687bc1fe8
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Tue Mar 14 12:15:52 2017 +0000
oxenstored: handling of domain conflict-credit
This commit gives each domain a conflict-credit variable, which will
later be used for limiting how often a domain can cause other domains'
transaction-commits to fail.
This commit also provides functions and data for manipulating domains
and their conflict-credit, and checking whether they have credit.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit f2c7ab1f47ea58b7bd397c42185e93ed1f162ac5
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Tue Mar 14 12:15:52 2017 +0000
oxenstored: comments explaining some variables
It took a while of reading and reasoning to work out what these are
for, so here are comments to make life easier for everyone reading
this code in future.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit f3b7100424200938edc49c463e8aa1b8b73f2778
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Tue Mar 7 16:09:13 2017 +0000
xenstored: Log when the write transaction rate limit bites
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
plus:
xenstore: don't increment bool variable
Instead of incrementing a bool variable just set it to true.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
commit 4cd02a2513dc224e343eaa8a88418a14ade092b3
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Tue Mar 7 16:09:12 2017 +0000
xenstored: apply a write transaction rate limit
This avoids a rogue client being able to stall another client (e.g. the
toolstack) indefinitely.
This is XSA-206.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Backported to 4.8 (not entirely trivial).
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
commit e0354e65fec21a51e573bf82ef930cb97ed11c96
Author: Paul Durrant <paul.durrant@citrix.com>
Date: Wed Feb 22 13:27:34 2017 +0000
tools/libxenctrl: fix error check after opening libxenforeignmemory
Checking the value of xch->xcall is clearly incorrect. The code should be
checking xch->fmem (i.e. the return of the previously called function).
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 80a7d04f532ddc3500acd7988917708a536ae15f)
commit a085f0ca12a3db203f9dcfc96dc3722d0f0f3fbf
Author: Juergen Gross <jgross@suse.com>
Date: Wed Feb 15 12:11:12 2017 +0100
libxl: correct xenstore entry for empty cdrom
Specifying an empty cdrom device will result in a Xenstore entry
params = aio:(null)
as the physical device path doesn't exist. This makes a domain booted
via OVMF hang as OVMF is checking for "aio:" only in order to detect
the empty cdrom case.
Use an empty string for the physical device path in this case. As a
cdrom device for HVM is always backed by qdisk we only need to cover this
backend.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
commit ec7f9e1df2aa6cf8376d26eafca554c6521d2e7c
Author: Juergen Gross <jgross@suse.com>
Date: Tue Apr 4 14:55:55 2017 +0200
x86: use 64 bit mask when masking away mfn bits
When using _PAGE_PSE_PAT as base for a negated bit mask make sure it is
propagated to 64 bits when applied to a 64 bit value.
There seems to be only one place where this is a problem, so fix this
by casting _PAGE_PSE_PAT to 64 bits there.
Not doing so will probably lead to problems on hosts with more than
16 TB of memory.
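The pitfall in miniature (illustrative stand-alone program; _PAGE_PSE_PAT
is bit 12, as in Xen, but the other values are made up):

    #include <stdint.h>
    #include <stdio.h>

    #define _PAGE_PSE_PAT 0x1000U    /* a 32-bit constant */

    int main(void)
    {
        uint64_t entry = 0x123456789000ULL;

        /* Wrong: ~_PAGE_PSE_PAT is computed in 32 bits, so the top
         * 32 bits of the mask are zero and high mfn bits are lost. */
        uint64_t bad  = entry & ~_PAGE_PSE_PAT;

        /* Right: widen to 64 bits first, then negate. */
        uint64_t good = entry & ~(uint64_t)_PAGE_PSE_PAT;

        printf("%llx vs %llx\n",
               (unsigned long long)bad, (unsigned long long)good);
        return 0;
    }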
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
master commit: 4edb1a42e3320757e3559f17edf6903bc1777de3
master date: 2017-03-30 15:11:24 +0200
commit 06403aa5f28bf697051de0435ef942f4c0d25849
Author: Jan Beulich <jbeulich@suse.com>
Date: Tue Apr 4 14:55:00 2017 +0200
memory: properly check guest memory ranges in XENMEM_exchange handling
The use of guest_handle_okay() here (as introduced by the XSA-29 fix)
is insufficient; guest_handle_subrange_okay() needs to be used
instead.
Note that the uses are okay in
- XENMEM_add_to_physmap_batch handling due to the size field being only
16 bits wide,
- livepatch_list() due to the limit of 1024 enforced on the
number-of-entries input (leaving aside the fact that this can be
called by a privileged domain only anyway),
- compat mode handling due to counts there being limited to 32 bits,
- everywhere else due to guest arrays being accessed sequentially from
index zero.
This is CVE-2017-7228 / XSA-212.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 938fd2586eb081bcbd694f4c1f09ae6a263b0d90
master date: 2017-04-04 14:47:46 +0200
commit f3623bdbe5f7ff63e728865a8b986b2312231685
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Fri Mar 31 08:33:20 2017 +0200
xen: sched: don't call hooks of the wrong scheduler via VCPU2OP
Within context_saved(), we call the context_saved hook,
and we use VCPU2OP() to determine which scheduler it belongs to.
VCPU2OP uses DOM2OP, which uses d->cpupool, which is
NULL when d is the idle domain. And in that case,
DOM2OP just returns ops, the scheduler of cpupool0.
Therefore, if:
- cpupool0's scheduler defines context_saved (like
Credit2 and RTDS do),
- we are not in cpupool0 (i.e., our scheduler is
not ops),
- we are context switching from idle,
we call VCPU2OP(idle_vcpu), which means
DOM2OP(idle->cpupool), which is ops.
Therefore, we both:
- check if context_saved is defined in the wrong
scheduler;
- if yes, call the wrong one.
When using Credit2 at boot, and also Credit2 in
the other cpupool, this is wrong but innocuous,
because it only involves the idle vcpus.
When using Credit2 at boot, and Credit1 in the
other cpupool, this is *totally* wrong, and
it's by chance it does not explode!
When using Credit2 and other schedulers I'm
developing, I hit the following assert (in
sched_credit2.c, on a CPU inside a cpupool that
does not use Credit2):
csched2_context_saved()
{
...
ASSERT(!vcpu_on_runq(svc));
...
}
Fix this by dealing explicitly, in VCPU2OP, with
idle vcpus, returning the scheduler of the pCPU
they (always) run on.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: a3653e6a279213ba4e883b2252415dc98633106a
master date: 2017-03-27 14:28:05 +0100
commit c95bad938f77a863f46bbce6cad74012714776bb
Author: Jan Beulich <jbeulich@suse.com>
Date: Fri Mar 31 08:32:51 2017 +0200
x86/EFI: avoid Xen image when looking for module/kexec position
When booting straight from EFI, we don't further try to relocate Xen.
As a result, so far we also didn't avoid the area Xen uses when looking
for a location to put modules or the kexec area. Introduce a fake
module slot to deal with that without having to fiddle with a lot of
code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e22e1c47958a4778cd7baa3980f74e52f525ba28
master date: 2017-03-20 09:27:12 +0100
commit 4ec1cb0b01332c0bbf0e4d232c1e33390ae1a95c
Author: Jan Beulich <jbeulich@suse.com>
Date: Fri Mar 31 08:32:22 2017 +0200
x86/EFI: avoid IOMMU faults on [_end,__2M_rwdata_end)
Commit c9a4a1c419 ("x86/layout: Correct Xen's idea of its own memory
layout") didn't go far enough with the conversion, causing IOMMU faults
when memory from that range was handed to a domain. We must not make
this memory available for allocation (the change is benign to xen.gz at
this point in time).
Note that the change to tboot_shutdown() is fixing another issue at
once: As it looks, the function so far skipped all memory below the Xen
image.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d522571a408a7dd21a06705f6dd51cdafd2db4fc
master date: 2017-03-20 09:25:36 +0100
commit 093a1f1b1c894e397f8fe82a1d69d486e4ade33f
Author: Jan Beulich <jbeulich@suse.com>
Date: Fri Mar 31 08:31:53 2017 +0200
x86/EFI: avoid overrunning mb_modules[]
Commit 436fb462ab ("x86/microcode: enable boot time (pre-Dom0)
loading") added a 4th module without providing an array slot for it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 02b37b7eff39e40828041b2fe480725ab8443258
master date: 2017-03-17 15:45:22 +0100
commit 47501b612494b98318079416a25ed6690c41deb1
Author: Roger Pau Monné <roger.pau@citrix.com>
Date: Fri Mar 31 08:31:14 2017 +0200
build/clang: fix XSM dummy policy when using clang 4.0
There seems to be some weird bug in clang 4.0 that prevents xsm_pmu_op from
working as expected, and vpmu.o ends up with a reference to
__xsm_action_mismatch_detected which makes the build fail:
[...]
ld -melf_x86_64_fbsd -T xen.lds -N prelink.o \
xen/common/symbols-dummy.o -o xen/.xen-syms.0
prelink.o: In function `xsm_default_action':
xen/include/xsm/dummy.h:80: undefined reference to `__xsm_action_mismatch_detected'
xen/xen/include/xsm/dummy.h:80: relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__xsm_action_mismatch_detected'
ld: xen/xen/.xen-syms.0: hidden symbol `__xsm_action_mismatch_detected' isn't defined
Then doing a search in the objects files:
# find xen/ -type f -name '*.o' -print0 | xargs -0 bash -c \
'for filename; do nm "$filename" | \
grep -q __xsm_action_mismatch_detected && echo "$filename"; done' bash
xen/arch/x86/prelink.o
xen/arch/x86/cpu/vpmu.o
xen/arch/x86/cpu/built_in.o
xen/arch/x86/built_in.o
The current patch is the only way I've found to fix this so far, by simply
moving the XSM_PRIV check into the default case in xsm_pmu_op. This also fixes
the behavior of do_xenpmu_op, which will now return -EINVAL for unknown
XENPMU_* operations, instead of -EPERM when called by a privileged domain.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 9e4d116faff4545a7f21c2b01008e94d68e6db58
master date: 2017-03-14 18:19:29 +0100
commit 2859b25a3ba9ba4eff6dba8d6e60dd9520ebbdb4
Author: Roger Pau Monné <roger.pau@citrix.com>
Date: Fri Mar 31 08:28:49 2017 +0200
x86: drop unneeded __packed attributes
There where a couple of unneeded packed attributes in several x86-specific
structures, that are obviously aligned. The only non-trivial one is
vmcb_struct, which has been checked to have the same layout with and without
the packed attribute using pahole. In that case add a build-time size check to
be on the safe side.
No functional change is expected as a result of this commit.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 4036e7c592905c2292cdeba8269e969959427237
master date: 2017-03-07 17:11:06 +0100
commit ca41491f0507150139fc35ff6c9f076fdbe9487b
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Wed Mar 29 11:32:34 2017 -0700
arm: xen_size should be paddr_t for consistency
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit 26dec7af0d019ea0ace95421b756235a552a7877
Author: Wei Chen <Wei.Chen@arm.com>
Date: Mon Mar 27 16:40:50 2017 +0800
xen/arm: alternative: Register re-mapped Xen area as a temporary virtual region
While I was using the alternative patching in the SErrors patch series [1],
I used a branch instruction as the alternative instruction.
ALTERNATIVE("nop",
"b skip_check",
SKIP_CHECK_PENDING_VSERROR)
Unfortunately, I got a system panic message with this code:
(XEN) build-id: f64081d86e7e88504b7d00e1486f25751c004e39
(XEN) alternatives: Patching with alt table 100b9480 -> 100b9498
(XEN) Xen BUG at alternative.c:61
(XEN) ----[ Xen-4.9-unstable arm32 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) PC: 00252b68 alternative.c#__apply_alternatives+0x128/0x1d4
(XEN) CPSR: 800000da MODE:Hypervisor
(XEN) R0: 00000000 R1: 00000000 R2: 100b9490 R3: 100b949c
(XEN) R4: eafeff84 R5: 00000000 R6: 100b949c R7: 10079290
(XEN) R8: 100792ac R9: 00000001 R10:100b948c R11:002cfe04 R12:002932c0
(XEN) HYP: SP: 002cfdc4 LR: 00239128
(XEN)
(XEN) VTCR_EL2: 80003558
(XEN) VTTBR_EL2: 0000000000000000
(XEN)
(XEN) SCTLR_EL2: 30cd187f
(XEN) HCR_EL2: 000000000038663f
(XEN) TTBR0_EL2: 00000000bff09000
(XEN)
(XEN) ESR_EL2: 00000000
(XEN) HPFAR_EL2: 0000000000000000
(XEN) HDFAR: 00000000
(XEN) HIFAR: 00000000
(XEN)
(XEN) Xen stack trace from sp=002cfdc4:
(XEN) 00000000 00294328 002e0004 00000001 10079290 002cfe14 100b9490 00000000
(XEN) 10010000 10122700 00200000 002cfe1c 00000080 00252c14 00000000 002cfe64
(XEN) 00252dd8 00000007 00000000 000bfe00 100b9480 100b9498 002cfe1c 002cfe1c
(XEN) 10010000 10122700 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 002ddf30 00000000 003113e8 0030f018 002cfe9c
(XEN) 00238914 00000002 00000000 00000000 00000000 0028b000 00000002 00293800
(XEN) 00000002 0030f238 00000002 00290640 00000001 002cfea4 002a2840 002cff54
(XEN) 002a65fc 11112131 10011142 00000000 0028d194 00000000 00000000 00000000
(XEN) bdffb000 80000000 00000000 c0000000 00000000 00000002 00000000 c0000000
(XEN) 002b8060 00002000 002b8040 00000000 c0000000 bc000000 00000000 c0000000
(XEN) 00000000 be000000 00000000 00112701 00000000 bff12701 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000018 00000000 00000001 00000000
(XEN) 9fece000 80200000 80000000 00400000 00200550 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) Xen call trace:
(XEN) [<00252b68>] alternative.c#__apply_alternatives+0x128/0x1d4 (PC)
(XEN) [<00239128>] is_active_kernel_text+0x10/0x28 (LR)
(XEN) [<00252dd8>] alternative.c#__apply_alternatives_multi_stop+0x1c4/0x204
(XEN) [<00238914>] stop_machine_run+0x1e8/0x254
(XEN) [<002a2840>] apply_alternatives_all+0x38/0x54
(XEN) [<002a65fc>] start_xen+0xcf4/0xf88
(XEN) [<00200550>] arm32/head.o#paging+0x94/0xd8
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at alternative.c:61
(XEN) ****************************************
This panic was triggered by the BUG() in branch_insn_requires_update.
That's because in this case the alternative patching needs to update the
offset of the branch instruction. But the new target address of the branch
instruction could not pass the check of is_active_kernel_text().
The reason is that when Xen is booting, it calls apply_alternatives_all
to do patching with the alternative tables. In this process, we should update
the offset of branch instructions if required. This means we should modify
the Xen text section. But the Xen text section is marked as read-only and we
configure the hardware to not allow a region to be writable and executable at
the same time. So we re-map Xen in a temporary area for writing. In this case,
the calculation of the new target address of the branch instruction is based
on this re-mapped area. The new target address will point to a value in the
re-mapped area. But we haven't registered this area as an active kernel text,
so the check of is_active_kernel_text will always return false.
We have to register the re-mapped Xen area as a virtual region temporarily to
solve this problem.
1. https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg01939.html
Signed-off-by: Wei Chen <Wei.Chen@arm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit eca97a466dc8d8f99fbff8f51a117d6e8255ecdc
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Tue Mar 21 18:44:24 2017 +0000
QEMU_TAG update
commit c75fe6473b73705c9b9f7d8ecc3d043afef55727
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Fri Feb 10 18:05:22 2017 -0800
arm: read/write rank->vcpu atomically
We don't need a lock in vgic_get_target_vcpu anymore, solving the
following lock inversion bug: the rank lock should be taken first, then
the vgic lock. However, gic_update_one_lr is called with the vgic lock
held, and it calls vgic_get_target_vcpu, which tries to obtain the rank
lock.
Coverity-ID: 1381855
Coverity-ID: 1381853
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit af18ca94f3fdbea87687c07ba532269dbb483e64
Author: Julien Grall <julien.grall@arm.com>
Date: Wed Mar 8 18:06:02 2017 +0000
xen/arm: p2m: Perform local TLB invalidation on vCPU migration
The ARM architecture allows an OS to have per-CPU page tables, as it
guarantees that TLBs never migrate from one CPU to another.
This works fine until this is done in a guest. Consider the following
scenario:
- vcpu-0 maps P to V
- vpcu-1 maps P' to V
If run on the same physical CPU, vcpu-1 can hit in TLBs generated by
vcpu-0 accesses, and access the wrong physical page.
The solution to this is to keep a per-p2m map of which vCPU ran last
on each given pCPU and invalidate local TLBs if two vCPUs from the same
VM run on the same CPU.
Unfortunately it is not possible to allocate a per-cpu variable on the
fly. So for now the size of the array is NR_CPUS; this is fine because
we still have space in struct domain. We may want to add a helper to
allocate per-cpu variables in the future.
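Sketched in C (hypothetical structure and helper names, not the actual Xen
code):

    #include <stdint.h>

    #define NR_CPUS 128
    #define INVALID_VCPU_ID 0xff   /* stand-in; see the next commit */

    struct p2m_domain {
        /* Per-pCPU: which vCPU of this VM ran here last. */
        uint8_t last_vcpu_ran[NR_CPUS];
    };

    static void flush_guest_tlb_local(void) { /* e.g. TLBI on ARM */ }

    /* On context switch in: invalidate local TLBs if a different vCPU
     * of the same VM ran last on this pCPU. */
    static void p2m_restore(struct p2m_domain *p2m, unsigned int cpu,
                            uint8_t vcpu_id)
    {
        if (p2m->last_vcpu_ran[cpu] != INVALID_VCPU_ID &&
            p2m->last_vcpu_ran[cpu] != vcpu_id)
            flush_guest_tlb_local();
        p2m->last_vcpu_ran[cpu] = vcpu_id;
    }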
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit 30c2dd762bcf938475632e28fcbd8d6592a71d5d
Author: Julien Grall <julien.grall@arm.com>
Date: Wed Mar 8 18:06:01 2017 +0000
xen/arm: Introduce INVALID_VCPU_ID
Define INVALID_VCPU_ID as MAX_VIRT_CPUS to avoid a casting problem later
on. At the moment it can always fit in uint8_t.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit 1780ea794780cf410fcb857d83add72ee088ff6e
Author: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Date: Mon Feb 1 14:56:13 2016 +0530
xen/arm: Set nr_cpu_ids to available number of cpus
nr_cpu_ids for arm platforms is incorrectly set to NR_CPUS,
irrespective of the number of cpus supported by the platform.
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit 42290f02715e62bfe9edf32daac1b224758b7ae4
Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Date: Thu Jan 26 14:16:02 2017 +0100
xen/arm: acpi: Relax hw domain mapping attributes to p2m_mmio_direct_c
Since the hardware domain is a trusted domain, we extend the
trust to include making final decisions on what attributes to
use when mapping memory regions.
For ACPI configured hardware domains, this patch relaxes the hardware
domains mapping attributes to p2m_mmio_direct_c. This will allow the
hardware domain to control the attributes via its S1 mappings.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
commit bd684c2d0aae7edc587f8dfd3dbffef739c853e4
Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Date: Thu Jan 26 14:16:01 2017 +0100
Revert "xen/arm: Map mmio-sram nodes as un-cached memory"
This reverts commit 1e75ed8b64bc1a9b47e540e6f100f17ec6d97f1b.
The default attribute mapping for MMIO has been relaxed and now relies on
the hardware domain to set the correct memory attributes.
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
commit 783b67073f4e0348af617a1f470f991814254ae2
Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Date: Thu Jan 26 14:16:00 2017 +0100
xen/arm: dt: Relax hw domain mapping attributes to p2m_mmio_direct_c
Since the hardware domain is a trusted domain, we extend the
trust to include making final decisions on what attributes to
use when mapping memory regions.
For device-tree configured hardware domains, this patch relaxes
the hardware domains mapping attributes to p2m_mmio_direct_c.
This will allow the hardware domain to control the attributes
via its S1 mappings.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
commit 07f9ddfc9abe9d25288168dfe3c4b830b416f33b
Author: Tamas K Lengyel <tamas.lengyel@zentific.com>
Date: Fri Jan 27 11:25:23 2017 -0700
xen/arm: flush icache as well when XEN_DOMCTL_cacheflush is issued
When the toolstack modifies memory of a running ARM VM it may happen
that the underlying memory of a current vCPU PC is changed. Without
flushing the icache the vCPU may continue executing stale instructions.
Also expose the xc_domain_cacheflush through xenctrl.h.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit d31d0cd810b038f4711553d07b26aee6f4b80934
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Wed Dec 21 18:15:10 2016 -0800
xen/arm: fix GIC_INVALID_LR
GIC_INVALID_LR should be 0xff, but actually, defined as ~(uint8_t)0, is
0xffffffff. Fix the problem by placing the ~ operator before the cast.
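The promotion trap in isolation (stand-alone illustration):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* ~ promotes its operand to int, so the cast must come last. */
        uint32_t wrong = ~(uint8_t)0;  /* 0 -> int, ~0 = 0xffffffff    */
        uint32_t right = (uint8_t)~0;  /* ~0 truncated to 8 bits: 0xff */

        printf("%#x %#x\n", wrong, right);
        return 0;
    }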
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit b2e678e81dd9635eb33279e2817168d13b78c1fa
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Thu Dec 8 17:17:04 2016 -0800
fix out of bound access to mode_strings
mode == ARRAY_SIZE(mode_strings) causes an out-of-bounds access to
the mode_strings array.
Coverity-ID: 1381859
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit 05946b58420c693748366b7c6f71ec2ec2456242
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Thu Dec 8 16:59:28 2016 -0800
missing vgic_unlock_rank in gic_remove_irq_from_guest
Add missing vgic_unlock_rank on the error path in
gic_remove_irq_from_guest.
Coverity-ID: 1381843
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit e020ff3fff796459399015460929edefa8c94568
Author: Artem Mygaiev <artem_mygaiev@epam.com>
Date: Tue Dec 6 16:16:45 2016 +0200
xen/arm: Fix macro for ARM Jazelle CPU feature identification
Fix macro for ARM Jazelle CPU feature identification: a value of 0 indicates
that the CPU does not support ARM Jazelle (ID_PFR0[11:8]).
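The intended test, sketched (illustrative helper, not the Xen macro itself;
ID_PFR0 bits [11:8] describe Jazelle support, with 0 meaning not
implemented):

    #include <stdbool.h>
    #include <stdint.h>

    static bool cpu_has_jazelle(uint32_t id_pfr0)
    {
        /* Extract ID_PFR0[11:8]; non-zero means Jazelle is supported. */
        return ((id_pfr0 >> 8) & 0xf) != 0;
    }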
Coverity-ID: 1381849
Signed-off-by: Artem Mygaiev <artem_mygaiev@epam.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit 308c646ee6f11fa87d67343005942a3186a69206
Author: Julien Grall <julien.grall@arm.com>
Date: Mon Dec 5 17:43:23 2016 +0000
xen/arm: traps: Emulate ICC_SRE_EL1 as RAZ/WI
Recent Linux kernels (4.4 and onwards [1]) check whether it is possible
to enable sysreg access (ICC_SRE_EL1.SRE) when the ID register
(ID_AA64PFR0_EL1.GIC) is reporting the presence of the sysreg interface.
When the guest has been configured to use GICv2, the hypervisor will
disable sysreg access for this vm (via ICC_SRE_EL2.Enable) and therefore
accesses to system registers such as ICC_SRE_EL1 are trapped in EL2.
However, ICC_SRE_EL1 is not emulated by the hypervisor. This means that
Linux will crash as soon as it tries to access ICC_SRE_EL1.
To solve this problem, Xen can implement ICC_SRE_EL1 as read-as-zero
write-ignore. The emulation will only be used when sysreg access is
disabled for EL1.
[1] 963fcd409 "arm64: cpufeatures: Check ICC_EL1_SRE.SRE before
enabling ARM64_HAS_SYSREG_GIC_CPUIF"
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit fceae911f6e7af87cd31321385d779b47eff1857
Author: Artem Mygaiev <artem_mygaiev@epam.com>
Date: Wed Nov 30 15:53:11 2016 +0200
xen/arm: Fix misplaced parentheses for PSCI version check
Fix misplaced parentheses for PSCI version check
Signed-off-by: Artem Mygaiev <artem_mygaiev@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit f66739326c9de51acc15e8b6b335b3781b4e3f48
Author: Oleksandr Tyshchenko <olekstysh@gmail.com>
Date: Fri Dec 2 18:38:16 2016 +0200
arm/irq: Reorder check when the IRQ is already used by someone
Call irq_get_domain for the IRQ we are interested in
only after making sure that it is the guest IRQ to avoid
ASSERT(test_bit(_IRQ_GUEST, &desc->status)) triggering.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit 768b250b31361bf8acfef4b7dca61ee37c91f3f6
Author: Jun Sun <jsun@junsun.net>
Date: Mon Oct 10 12:27:56 2016 -0700
Don't clear HCR_VM bit when updating VTTBR.
Currently function p2m_restore_state() would clear HCR_VM bit, i.e.,
disabling stage2 translation, before updating VTTBR register. After
some research and talking to ARM support, I got confirmed that this is not
necessary. We are currently working on a new platform that would need this
to be removed.
The patch is tested on FVP foundation model.
Signed-off-by: Jun Sun <jsun@junsun.net>
Acked-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
commit 049b13dce84655cd73ac4acc051e7df46af00a4f
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Mar 14 12:43:25 2017 +0100
x86/emul: Correct the decoding of mov to/from cr/dr
The mov to/from cr/dr behave as if they were encoded with Mod = 3. When
encoded with Mod != 3, no displacement or SIB bytes are fetched.
Add a test with a deliberately malformed ModRM byte. (Also add the
automatically-generated simd.h to .gitignore.)
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c2e316b2f220af06dab76b1219e61441c31f6ff9
master date: 2017-03-07 17:29:16 +0000
commit e26a2a00169bad403c9dcc597218080626cee861
Author: Jan Beulich <jbeulich@suse.com>
Date: Tue Mar 14 12:42:58 2017 +0100
x86emul: correct decoding of vzero{all,upper}
These VEX encoded insns aren't followed by a ModR/M byte.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 26735f30dffe1091686bbe921aacbea8ba371cc8
master date: 2017-03-02 16:08:27 +0100
commit 866f3636f832ecae0260b04e90b8de432aaa3129
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Tue Mar 14 12:42:19 2017 +0100
xen: credit2: don't miss accounting while doing a credit reset.
A credit reset basically means going through all the
vCPUs of a runqueue and altering their credits, as a
consequence of a 'scheduling epoch' having come to an
end.
Blocked or runnable vCPUs are fine, all the credits
they've spent running so far have been accounted to
them when they were scheduled out.
But if a vCPU is running on a pCPU, when a reset event
occurs (on another pCPU), that does not get properly
accounted. Let's therefore begin to do so, for better
accuracy and fairness.
In fact, after this patch, we see this in a trace:
csched2:schedule cpu 10, rq# 1, busy, not tickled
csched2:burn_credits d1v5, credit = 9998353, delta = 202996
runstate_continue d1v5 running->running
...
csched2:schedule cpu 12, rq# 1, busy, not tickled
csched2:burn_credits d1v6, credit = -1327, delta = 9999544
csched2:reset_credits d0v13, credit_start = 10500000, credit_end = 10500000, mult = 1
csched2:reset_credits d0v14, credit_start = 10500000, credit_end = 10500000, mult = 1
csched2:reset_credits d0v7, credit_start = 10500000, credit_end = 10500000, mult = 1
csched2:burn_credits d1v5, credit = 201805, delta = 9796548
csched2:reset_credits d1v5, credit_start = 201805, credit_end = 10201805, mult = 1
csched2:burn_credits d1v6, credit = -1327, delta = 0
csched2:reset_credits d1v6, credit_start = -1327, credit_end = 9998673, mult = 1
Which shows how d1v5 actually executed for ~9.796 ms,
on pCPU 10, when reset_credit() is executed, on pCPU
12, because of d1v6's credits going below 0.
Without this patch, these 9.796ms are not accounted
to anyone. With this patch, d1v5 is charged for that,
and its credits drop down from 9796548 to 201805.
And this is important, as it means that it will
begin the new epoch with 10201805 credits, instead
of 10500000 (which it would have had, before this patch).
Basically, we were forgetting one round of accounting
in epoch x, for the vCPUs that are running at the time
the epoch ends. And this meant favouring a little bit
these same vCPUs, in epoch x+1, providing them with
the chance to execute longer than their fair share.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 4fa4f8a3cd5afd4980ad9517755d002dc316abdc
master date: 2017-03-01 16:56:34 +0000
commit 354c3e4c728b5e8f04dc8d9eabfa316e7823cbc5
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Tue Mar 14 12:41:54 2017 +0100
xen: credit2: always mark a tickled pCPU as... tickled!
In fact, whether or not a pCPU has been tickled, and is
therefore about to re-schedule, is something we look at
and base decisions on in various places.
So, let's make sure that we do that basing on accurate
information.
While there, also tweak a little bit smt_idle_mask_clear()
(used for implementing SMT support), so that it only alters
the relevant cpumask when there is an actual need for this.
(This is only for reduced overhead; behavior remains the
same.)
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: a76645240bd14e964e85dbc975a8989edea6aa27
master date: 2017-03-01 16:56:34 +0000
commit 8c2da8f4649bf5e29b6f3338132e36369e8f5700
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Mar 14 12:41:21 2017 +0100
x86/layout: Correct Xen's idea of its own memory layout
c/s b4cd59fe "x86: reorder .data and .init when linking" had an unintended
side effect, where xen_in_range() and the tboot S3 MAC were no longer correct.
In practice, it means that Xen's .data section is excluded from consideration,
which means:
1) Default IOMMU construction for the hardware domain could create mappings.
2) .data isn't included in the tboot MAC checked on resume from S3.
Adjust the comments and virtual address anchors used to define the regions.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c9a4a1c419cebac83a8fb60c4532ad8ccc973dc4
master date: 2017-02-28 16:18:38 +0000
commit 6289c3b7c4756bca341ba59e4e246706040f7919
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Mar 14 12:40:36 2017 +0100
x86/vmx: Don't leak host syscall MSR state into HVM guests
hvm_hw_cpu->msr_flags is in fact the VMX dirty bitmap of MSRs needing to be
restored when switching into guest context. It should never have been part of
the migration state to start with, and Xen must not make any decisions based
on the value seen during restore.
Identify it as obsolete in the header files, consistently save it as zero and
ignore it on restore.
The MSRs must be considered dirty during VMCS creation to cause the proper
defaults of 0 to be visible to the guest.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 2f1add6e1c8789d979daaafa3d80ddc1bc375783
master date: 2017-02-21 11:06:39 +0000
commit 2e68fda962226d4de916d5ceab9d9d6037d94d45
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Thu Mar 2 17:15:26 2017 -0800
xen/arm: fix affected memory range by dcache clean functions
clean_dcache_va_range and clean_and_invalidate_dcache_va_range don't
calculate the range correctly when "end" is not cacheline aligned. As a
result, the last cacheline is not skipped. Fix the issue by aligning the
start address to the cacheline size.
In addition, make the code simpler and faster in
invalidate_dcache_va_range, by removing the modulo operation and using
bitmasks instead. Also remove the size adjustments in
invalidate_dcache_va_range, because the size variable is not used later
on.
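The alignment fix in outline (illustrative sketch with a fixed cacheline
size; Xen reads the real size at boot, and the actual cache-maintenance
instruction is elided):

    #include <stdint.h>

    #define CACHELINE 64UL   /* illustrative */

    static void clean_dcache_va_range(const void *p, unsigned long size)
    {
        /* Round the start down to a cacheline boundary so the first
         * (and, by stepping until past "end", the last) partially
         * covered line is included; a bitmask replaces the modulo. */
        uintptr_t va  = (uintptr_t)p & ~(CACHELINE - 1);
        uintptr_t end = (uintptr_t)p + size;

        for (; va < end; va += CACHELINE)
            (void)va;   /* placeholder for e.g. "dc cvac" per line */
    }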
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
commit f85fc979a6859541dc1bf583817ca5cce9287e1e
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Wed Mar 1 11:43:15 2017 -0800
xen/arm: introduce vwfi parameter
Introduce a new Xen command line parameter called "vwfi", which stands for
virtual wfi. The default is "trap": Xen traps guest wfi and wfe
instructions. In the case of wfi, Xen calls vcpu_block on the guest
vcpu; in the case of guest wfe, Xen calls vcpu_yield on the guest vcpu.
The behavior can be changed by setting vwfi to "native", in which case
Xen doesn't trap either wfi or wfe, running them in guest context.
The result is a strong reduction in irq latency (from 5000ns to 2000ns,
measured using https://github.com/edgarigl/tbm, the physical timer, and
1 pcpu dedicated to 1 vcpu). The downside is that the scheduler thinks
that the guest is busy when it is actually sleeping, leading to suboptimal
scheduling decisions.
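As a usage note (illustrative GRUB snippet), the parameter goes on the
hypervisor command line, e.g.:

    multiboot /boot/xen.gz ... vwfi=native

with vwfi=trap remaining the default behaviour described above.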
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit 9967251965a4cea19e6f69f7c5472174c4c5b971
Author: Julien Grall <julien.grall@arm.com>
Date: Fri Feb 24 10:01:59 2017 +0100
arm/p2m: remove the page from p2m->pages list before freeing it
The p2m code is using the page list field to link all the pages used
for the stage-2 page tables. The page is added into the p2m->pages
list just after the allocation but never removed from the list.
The page list field is also used by the allocator, so not removing the
page may result in a later Xen crash due to inconsistency (see [1]).
This bug was introduced by the reworking of p2m code in commit 2ef3e36ec7
"xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry".
[1] https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00524.html
Reported-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: cf5e1a74b9687be3d146e59ab10c26be6da9d0d4
master date: 2017-02-24 09:58:50 +0100
commit 34305da2df62c67a559c20d22bdd25b549bfd1d8
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Wed Feb 22 16:26:41 2017 +0000
QEMU_TAG update
commit 437a8e63adb3b2f819dd11557e65d9cda331c9b1
Author: Jan Beulich <jbeulich@suse.com>
Date: Mon Feb 20 15:58:02 2017 +0100
VMX: fix VMCS race on context-switch paths
When __context_switch() is being bypassed during original context
switch handling, the vCPU "owning" the VMCS partially loses control of
it: It will appear non-running to remote CPUs, and hence their attempt
to pause the owning vCPU will have no effect on it (as it already
looks to be paused). At the same time the "owning" CPU will re-enable
interrupts eventually (at the latest when entering the idle loop) and
hence becomes subject to IPIs from other CPUs requesting access to the
VMCS. As a result, when __context_switch() finally gets run, the CPU
may no longer have the VMCS loaded, and hence any accesses to it would
fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
For consistency use the new function also in vmx_do_resume(), to
avoid leaving an open-coded incarnation of it around.
Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
master commit: 2f4d2198a9b3ba94c959330b5c94fe95917c364c
master date: 2017-02-17 15:49:56 +0100
commit 9028ba82efca076609d11f33ed6fa2a636ae9e58
Author: George Dunlap <george.dunlap@citrix.com>
Date: Mon Feb 20 15:57:37 2017 +0100
xen/p2m: Fix p2m_flush_table for non-nested cases
Commit 71bb7304e7a7a35ea6df4b0cedebc35028e4c159 added flushing of
nested p2m tables whenever the host p2m table changed. Unfortunately
in the process, it added a filter to the p2m_flush_table() function so
that the p2m would only be flushed if it was being used as a nested
p2m. This meant that the p2m was not being flushed at all for altp2m
callers.
Only check np2m_base if the p2m_class indicates a nested p2m.
NB that this is not a security issue: The only time this codepath is
called is in cases where either nestedp2m or altp2m is enabled, and
neither of them is in security support.
Reported-by: Matt Leinhos <matt@starlab.io>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
master commit: 6192e6378e094094906950120470a621d5b2977c
master date: 2017-02-15 17:15:56 +0000
commit 1c28394aaab9727f5ce9c5f53e8617c50687d0dc
Author: David Woodhouse <dwmw@amazon.com>
Date: Mon Feb 20 15:56:48 2017 +0100
x86/ept: allow write-combining on !mfn_valid() MMIO mappings again
For some MMIO regions, such as those high above RAM, mfn_valid() will
return false.
Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
consistent cachability of MMIO mappings"), guests have no longer been
able to use PAT to obtain write-combining on such regions because the
'ignore PAT' bit is set in EPT.
We probably want to err on the side of caution and preserve that
behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
mappings. That necessitates a slight refactoring to check mfn_valid()
later, and let the MMIO case get through to the right code path.
Since we're not bailing out for !mfn_valid() immediately, the range
checks need to be adjusted to cope simply by masking in the low bits
to account for 'order' instead of adding, to avoid overflow when the mfn
is INVALID_MFN (which happens on unmap, since we carefully call this
function to fill in the EMT even though the PTE won't be valid).
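The overflow-safe arithmetic in miniature (illustrative, with INVALID_MFN
as all-ones):

    #include <stdint.h>

    #define INVALID_MFN (~0UL)

    /* Last mfn of an order-sized, order-aligned block starting at mfn. */
    static unsigned long block_end(unsigned long mfn, unsigned int order)
    {
        /* Adding ((1UL << order) - 1) would wrap when mfn == INVALID_MFN;
         * masking in the low bits cannot overflow. */
        return mfn | ((1UL << order) - 1);
    }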
The range checks are also slightly refactored to put only one of them in
the fast path in the common case. If it doesn't overlap, then it
*definitely* isn't contained, so we don't need both checks. And if it
overlaps and is only one page, then it definitely *is* contained.
Finally, add a comment clarifying how that 'return -1' works: it isn't
returning an error and causing the mapping to fail; it relies on
resolve_misconfig() being able to split the mapping later. So it's
*only* sane to do it where order>0 and the 'problem' will be solved by
splitting the large page. Not for blindly returning 'error', which I was
tempted to do in my first attempt.
Signed-off-by: David Woodhouse <dwmw@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 30921dc2df3665ca1b2593595aa6725ff013d386
master date: 2017-02-07 14:30:01 +0100
commit c24629612fea2d44c8f03f0a2583e44dbbfc5e05
Author: Oleksandr Tyshchenko <olekstysh@gmail.com>
Date: Wed Feb 15 12:20:48 2017 +0000
IOMMU: always call teardown callback
There is a possible scenario when (d)->need_iommu remains unset
during guest domain execution. For example, when no devices
were assigned to it. Taking into account that the teardown callback
is not called when (d)->need_iommu is unset, we might have unreleased
resources after destroying the domain.
So, always call teardown callback to roll back actions
that were performed in init callback.
This is XSA-207.
Signed-off-by: Oleksandr Tyshchenko <olekstysh@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Julien Grall <julien.grall@arm.com>
commit 10baa197d218c222f298ac5ba0d4ef5afd1401ff
Author: George Dunlap <george.dunlap@citrix.com>
Date: Thu Feb 9 10:25:58 2017 +0100
x86/emulate: don't assume that addr_size == 32 implies protected mode
Callers of x86_emulate() generally define addr_size based on the code
segment. In vm86 mode, the code segment is set by the hardware to be
16-bits; but it is entirely possible to enable protected mode, set the
CS to 32-bits, and then disable protected mode. (This is commonly
called "unreal mode".)
But the instruction decoder only checks for protected mode when
addr_size == 16. So in unreal mode, hardware will throw a #UD for VEX
prefixes, but our instruction decoder will decode them, triggering an
ASSERT() further on in _get_fpu(). (With debug=n the emulator will
incorrectly emulate the instruction rather than throwing a #UD, but
this is only a bug, not a crash, so it's not a security issue.)
Teach the instruction decoder to check that we're in protected mode,
even if addr_size is 32.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Split real mode and VM86 mode handling, as VM86 mode is strictly 16-bit
at all times. Re-base.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 05118b1596ffe4559549edbb28bd0124a7316123
master date: 2017-01-25 15:09:55 +0100
commit 4582c2b9597ff4b5be3f6b26449a3b8a0872e46e
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Thu Feb 9 10:25:33 2017 +0100
xen: credit2: fix shutdown/suspend when playing with cpupools.
In fact, during shutdown/suspend, we temporarily move all
the vCPUs to the BSP (i.e., pCPU 0, as of now). For Credit2
domains, we call csched2_vcpu_migrate(), which expects to find the
target pCPU in the domain's pool.
Therefore, if Credit2 is the default scheduler and we have
removed pCPU 0 from cpupool0, shutdown/suspend fails like
this:
RIP: e008:[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
Xen call trace:
[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
[<ffff82d080129138>] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
[<ffff82d08012c468>] schedule.c#vcpu_move_locked+0x69/0x6f
[<ffff82d08012ec14>] cpu_disable_scheduler+0x3d7/0x430
[<ffff82d08019669b>] __cpu_disable+0x299/0x2b0
[<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
[<ffff82d0801312d8>] stop_machine.c#stopmachine_action+0x7f/0x8d
[<ffff82d0801330b8>] tasklet.c#do_tasklet_work+0x74/0xab
[<ffff82d0801333ed>] do_tasklet+0x66/0x8b
[<ffff82d080166a73>] domain.c#idle_loop+0x3b/0x5e
****************************************
Panic on CPU 8:
Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
****************************************
On the other hand, if Credit2 is the scheduler of another
pool, when trying (still during shutdown/suspend) to move
the vCPUs of the Credit2 domains to pCPU 0, it figures
out that pCPU 0 is not a Credit2 pCPU, and fails like this:
RIP: e008:[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
Xen call trace:
[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
[<ffff82d08012c4e9>] schedule.c#vcpu_move_locked+0x69/0x6f
[<ffff82d08012edfc>] cpu_disable_scheduler+0x3d7/0x430
[<ffff82d08019687b>] __cpu_disable+0x299/0x2b0
[<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
[<ffff82d0801314c0>] stop_machine.c#stopmachine_action+0x7f/0x8d
[<ffff82d0801332a0>] tasklet.c#do_tasklet_work+0x74/0xab
[<ffff82d0801335d5>] do_tasklet+0x66/0x8b
[<ffff82d080166c53>] domain.c#idle_loop+0x3b/0x5e
The solution is to recognise this specific situation inside
csched2_vcpu_migrate() and, considering it is something temporary
which only happens during shutdown/suspend, quickly deal with it.
Then, in the resume path, in restore_vcpu_affinity(), things
are set back to normal, and a new v->processor is chosen, for
each vCPU, from the proper set of pCPUs (i.e., the ones of
the proper cpupool).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
xen: credit2: non Credit2 pCPUs are ok during shutdown/suspend.
Commit 7478ebe1602e6 ("xen: credit2: fix shutdown/suspend
when playing with cpupools"), while doing the right thing
for actual code, forgot to update the ASSERT()s accordingly,
in csched2_vcpu_migrate().
In fact, as stated there already, during shutdown/suspend,
we must allow a Credit2 vCPU to temporarily migrate to a
non-Credit2 BSP, without any ASSERT() triggering.
Move them down, below the check for whether or not we are
shutting down, where the assumption that the pCPU is a
valid Credit2 one actually holds.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
master commit: 7478ebe1602e6bb8242a18840b15757a1d5ad18a
master date: 2017-01-24 17:02:07 +0000
master commit: ad5808d9057248e7879cf375662f0a449fff7005
master date: 2017-02-01 14:44:51 +0000
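Schematically, the fix looks something like this (a sketch following the commit message, not the verbatim patch):

    static void csched2_vcpu_migrate(const struct scheduler *ops,
                                     struct vcpu *vc, unsigned int new_cpu)
    {
        /* Sketch: while the system is going down, the target pCPU may
         * legitimately be outside our cpupool; just note the new
         * processor and let restore_vcpu_affinity() repair things on
         * resume. */
        if ( unlikely(system_state == SYS_STATE_suspend) )
        {
            vc->processor = new_cpu;
            return;
        }
        /* Only now is it safe to assert that new_cpu is a Credit2 pCPU. */
        ASSERT(cpumask_test_cpu(new_cpu, &csched2_priv(ops)->initialized));
        /* ... normal migration path ... */
    }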
commit a20300baf5714ed6098a4068e0f464d6971fe0a7
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Thu Feb 9 10:24:56 2017 +0100
xen: credit2: never consider CPUs outside of our cpupool.
In fact, relying on the mask of what pCPUs belong to
which Credit2 runqueue is not enough. If we only do that,
when Credit2 is the boot scheduler, we may ASSERT() or
panic when moving a pCPU from Pool-0 to another cpupool.
This is because pCPUs outside of any pool are considered
part of cpupool0. This puts us at risk of a crash when those
same pCPUs are added to another pool and something
other than the idle domain is found to be running
on them.
Note that, even if we prevent the above from happening (which
is the purpose of this patch), things are still pretty bad.
In fact, when we remove a pCPU from Pool-0:
- in Credit1, we do *not* update prv->ncpus and
prv->credit, which means we're considering the wrong
total credits when doing accounting;
- in Credit2, the pCPU remains part of one runqueue,
and is hence at least considered during load balancing,
even if no vCPU should really run there.
In Credit1, this "only" causes skewed accounting and
no crashes because there is a lot of `cpumask_and`ing
going on with the cpumask of the domains' cpupool
(which, BTW, comes at a price).
A quick and not too involved (and easily backportable)
solution for Credit2 is to do exactly the same.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
master commit: e7191920261d20e52ca4c06a03589a1155981b04
master date: 2017-01-24 17:02:07 +0000
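"Doing exactly the same" as Credit1 amounts to intersecting with the domain's cpupool mask before picking a pCPU; roughly (a sketch, with cpupool_domain_cpumask() standing for the pool-mask helper, surrounding names illustrative):

    /* Sketch: never consider pCPUs outside the domain's cpupool; always
     * intersect affinity with the pool's cpumask, not just the bare
     * runqueue mask. */
    cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
                cpupool_domain_cpumask(vc->domain));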
commit 23e33036f8d5f33add75d7fbecad13bcb2cb239e
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Thu Feb 9 10:24:32 2017 +0100
xen: credit2: use the correct scratch cpumask.
In fact, there is one scratch mask for each CPU. When
you use a CPU's mask, it must be true that:
- the CPU belongs to your cpupool and scheduler,
- you own the runqueue lock (the one you take via
{v,p}cpu_schedule_lock()) for that CPU.
This was not the case within the following functions:
get_fallback_cpu(), csched2_cpu_pick(): as we can't be
sure we are either on, or hold the lock for, the CPU
in the vCPU's 'v->processor'.
migrate(): it's ok, when called from balance_load(),
because that comes from csched2_schedule(), which takes
the runqueue lock of the CPU where it executes. But it is
not ok when we come from csched2_vcpu_migrate(), which
can be called from other places.
The fix is to explicitly use the scratch space of the
CPUs for which we know we hold the runqueue lock.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 548db8742872399936a2090cbcdfd5e1b34fcbcc
master date: 2017-01-24 17:02:07 +0000
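The rule, as a sketch (assuming a per-CPU scratch accessor along the lines of cpumask_scratch_cpu(); not the literal patch):

    /* Sketch: index scratch space by a CPU whose runqueue lock we hold
     * ('cpu'), never blindly by v->processor. */
    cpumask_t *scratch = cpumask_scratch_cpu(cpu);
    cpumask_and(scratch, v->cpu_hard_affinity, online_mask);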
commit 95f1f99a7a2a7c8fbf0eeb1dc6b8473d6e09f535
Author: Joao Martins <joao.m.martins@oracle.com>
Date: Thu Feb 9 10:23:52 2017 +0100
x86/hvm: do not set msr_tsc_adjust on hvm_set_guest_tsc_fixed
Commit 6e03363 ("x86: Implement TSC adjust feature for HVM guest")
implemented the TSC_ADJUST MSR for HVM guests. However, while booting
an HVM guest, the boot CPU would have its value set to delta_tsc -
guest_tsc while the secondary CPUs would have 0. For example, one can
observe:
$ xen-hvmctx 17 | grep tsc_adjust
TSC_ADJUST: tsc_adjust ff9377dfef47fe66
TSC_ADJUST: tsc_adjust 0
TSC_ADJUST: tsc_adjust 0
TSC_ADJUST: tsc_adjust 0
Upcoming Linux 4.10 now validates whether this MSR is correct and
adjusts it under the following conditions: a value < 0 (our case for
CPU 0), a value > 0x7FFFFFFF, or values that don't match across all
CPUs. Under these conditions it will force the value to 0. If this
MSR is not correct we would see messages such as:
[Firmware Bug]: TSC ADJUST: CPU0: -30517044286984129 force to 0
And on HVM guests supporting TSC_ADJUST (which requires at least
Intel Haswell) the guest won't boot.
Our current vCPU 0 value is incorrect: the Intel SDM, in section
"Time-Stamp Counter Adjustment", states that "On RESET, the value
of the IA32_TSC_ADJUST MSR is 0", hence we should set it to 0 and be
consistent across all vCPUs. Perhaps this MSR should only be
changed by the guest, which already happens through the
hvm_set_guest_tsc_adjust(..) routines (see below). After this patch,
guests running Linux 4.10 will see a valid IA32_TSC_ADJUST MSR of
value 0 for all CPUs and are able to boot.
On the same section of the spec ("Time-Stamp Counter Adjustment") it is
also stated:
"If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR
adds (or subtracts) value X from the TSC, the logical processor also
adds (or subtracts) value X from the IA32_TSC_ADJUST MSR.
Unlike the TSC, the value of the IA32_TSC_ADJUST MSR changes only in
response to WRMSR (either to the MSR itself, or to the
IA32_TIME_STAMP_COUNTER MSR). Its value does not otherwise change as
time elapses. Software seeking to adjust the TSC can do so by using
WRMSR to write the same value to the IA32_TSC_ADJUST MSR on each logical
processor."
This suggests these MSR values should only be changed by the guest,
i.e. through intercepted MSR writes. We keep the IA32_TSC MSR logic
such that writes accommodate adjustments to TSC_ADJUST, hence no
functional change in msr_tsc_adjust for the IA32_TSC MSR. Though, we
do that in a separate routine, namely hvm_set_guest_tsc_msr, instead
of through hvm_set_guest_tsc(...).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 98297f09bd07bb63407909aae1d309d8adeb572e
master date: 2017-01-24 12:37:36 +0100
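The resulting behaviour, as a sketch close to (but not verbatim) the patch described above:

    /* Sketch: per the SDM, TSC_ADJUST is 0 on (v)CPU reset ... */
    v->arch.hvm_vcpu.msr_tsc_adjust = 0;

    /* ... and only guest WRMSRs move it: a write to IA32_TSC adds the
     * same delta to IA32_TSC_ADJUST, while host-side TSC updates no
     * longer touch it. */
    static void hvm_set_guest_tsc_msr(struct vcpu *v, uint64_t guest_tsc)
    {
        uint64_t tsc_offset = guest_tsc - hvm_get_guest_tsc(v);

        v->arch.hvm_vcpu.msr_tsc_adjust += tsc_offset;
        hvm_set_guest_tsc(v, guest_tsc);
    }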
commit 9b0e6d34cb8e05d9ec5e308576c559f0aac5ba55
Author: Jan Beulich <jbeulich@suse.com>
Date: Thu Feb 9 10:23:22 2017 +0100
x86emul: correct FPU stub asm() constraints
Properly inform the compiler about fic's role as both an input (its
insn_bytes field) and output (its exn_raised field).
Take the opportunity and bring emulate_fpu_insn_stub() more in line
with emulate_fpu_insn_stub_eflags().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3dfbb8df335f12297cfc7db9d3df2b74c474921b
master date: 2017-01-24 12:35:59 +0100
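For readers less familiar with asm() constraints, the essence is this (a sketch, not the literal stub):

    /* Sketch: 'fic' is both read (insn_bytes) and written (exn_raised)
     * by the stub, so it must be a read-write operand; "+m" stops the
     * compiler caching its fields across the asm. */
    asm volatile ( "call *%[stub]"
                   : "+m" (fic)
                   : [stub] "r" (stub.func)
                   : "memory" );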
commit b843de7f541037e8ff5779a017b837c71e7804af
Author: Jan Beulich <jbeulich@suse.com>
Date: Thu Feb 9 10:22:55 2017 +0100
x86: segment attribute handling adjustments
Null selector loads into SS (possible in 64-bit mode only, and only in
rings other than ring 3) must not alter SS.DPL. (This was found to be
an issue on KVM, and fixed in Linux commit 33ab91103b.)
Further, arch_set_info_hvm_guest() didn't make sure that the ASSERT()s
in hvm_set_segment_register() wouldn't trigger: Add further checks, but
tolerate (adjust) clear accessed (CS, SS, DS, ES) and busy (TR) bits.
Finally the setting of the accessed bits for user segments was lost by
commit dd5c85e312 ("x86/hvm: Reposition the modification of raw segment
data from the VMCB/VMCS"), yet VMX requires them to be set for usable
segments. Add respective ASSERT()s (the only path not properly setting
them was arch_set_info_hvm_guest()).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 366ff5f1b3252f9069d5aedb2ffc2567bb0a37c9
master date: 2017-01-20 14:39:12 +0100
commit ba7e250cc48d068b3777ffddc2bb8b2f43d05e53
Author: Jan Beulich <jbeulich@suse.com>
Date: Thu Feb 9 10:22:28 2017 +0100
x86emul: LOCK check adjustments
BT, being encoded as DstBitBase just like BT{C,R,S}, nevertheless does
not write its (register or memory) operand and hence also doesn't allow
a LOCK prefix to be used.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f2d4f4ba80de8a03a1b0f300d271715a88a8433d
master date: 2017-01-20 14:37:33 +0100
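Conceptually, the check becomes something like this (a sketch; the predicate in the real decoder is structured differently, and is_bt is an illustrative name):

    /* Sketch: LOCK is only legal on a bit op that writes memory, i.e.
     * BTS/BTR/BTC with a memory destination; BT never qualifies. */
    if ( lock_prefix && (is_bt || dst.type != OP_MEM) )
        generate_exception(EXC_UD);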
commit 6240d926c4cfe5f83fc940e61d1c0418a8710791
Author: Jan Beulich <jbeulich@suse.com>
Date: Thu Feb 9 10:21:50 2017 +0100
x86emul: VEX.B is ignored in compatibility mode
While VEX.R and VEX.X are guaranteed to be 1 in compatibility mode
(and hence a respective mode_64bit() check can be dropped), VEX.B can
be encoded as zero, but would be ignored by the processor. Since we
emulate instructions in 64-bit mode (except possibly in the test
harness), we need to force the bit to 1 in order to not act on the
wrong {X,Y,Z}MM register (which has no bad effect on 32-bit test
harness builds, as there the bit would again be ignored by the
hardware, and would by default be expected to be 1 anyway).
We must not, however, fiddle with the high bit of VEX.VVVV in the
decode phase, as that would undermine the checking of instructions
requiring the field to be all ones independent of mode. This is
being enforced in copy_REX_VEX() instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86emul: correct VEX/XOP/EVEX operand size handling for 16-bit code
Operand size defaults to 32 bits in that case, but would not have been
set that way in the absence of an operand size override.
Reported-by: Wei Liu <wei.liu2@citrix.com> (by AFL fuzzing)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 89c76ee7f60777b81c8fd0475a6af7c84e72a791
master date: 2017-01-17 10:32:25 +0100
master commit: beb82042447c5d6e7073d816d6afc25c5a423cde
master date: 2017-01-25 15:08:59 +0100
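A sketch of the decode-phase part (field names indicative, not verbatim):

    /* Sketch: outside 64-bit mode the CPU ignores VEX.B, so normalise
     * it to 1 to avoid selecting the wrong {X,Y,Z}MM register; VEX.VVVV
     * is deliberately left alone so copy_REX_VEX() can still check it. */
    if ( !mode_64bit() )
        vex.b = 1;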
commit b378b1f9fa4796b5048e8ac0c58cdbb6307a55c4
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu Feb 9 10:20:45 2017 +0100
x86/xstate: Fix array overrun on hardware with LWP
c/s da62246e4c "x86/xsaves: enable xsaves/xrstors/xsavec in xen" introduced
setup_xstate_features() to allocate and fill xstate_offsets[] and
xstate_sizes[].
However, fls() casts xfeature_mask to 32 bits, which truncates LWP out of the
calculation. As a result, the arrays are allocated too short, and the cpuid
infrastructure reads off the end of them when calculating xstate_size for the
guest.
On one test system, this results in 0x3fec83c0 being returned as the maximum
size of an xsave area, which surprisingly appears not to bother Windows or
Linux too much. I suspect they both use the current size based on xcr0, which Xen
forwards from real hardware.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: fe0d67576e335c02becf1cea8e67005509fa90b6
master date: 2017-01-16 17:37:26 +0000
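The root cause in miniature (a sketch; fls()/flsl() as in Xen's bitops):

    /* Sketch of the bug: fls() takes an unsigned int, so passing the
     * 64-bit xfeature_mask silently drops LWP (bit 62)... */
    unsigned int n_bad  = fls(xfeature_mask);  /* truncated to 32 bits */
    /* ...whereas a 64-bit find-last-set sizes the arrays correctly. */
    unsigned int n_good = flsl(xfeature_mask);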
commit b29aed8b0355fe9f7d49faa9aef12b2f8f983c2c
Author: Tamas K Lengyel <tamas.lengyel@zentific.com>
Date: Wed Jan 25 09:12:01 2017 -0700
arm/p2m: Fix regression during domain shutdown with active mem_access
The change in commit 438c5fe4f0c introduced a regression for domains where
mem_access is or was active. When relinquish_p2m_mapping attempts to clear
a page where the order is not 0, the following ASSERT is triggered:
ASSERT(!p2m->mem_access_enabled || page_order == 0);
This regression was unfortunately not caught during testing in preparation
for the 4.8 release.
In this patch we adjust the ASSERT so it does not trip when the domain
is being shut down.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Julien Grall <julien.grall@arm.com>
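The adjustment amounts to one extra disjunct (a sketch; the exact predicate in the patch may differ):

    /* Sketch: tolerate order > 0 clears while the domain is dying. */
    ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
           p2m->domain->is_dying);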
diff --git a/Config.mk b/Config.mk
index a83a205..d9ebcb7 100644
--- a/Config.mk
+++ b/Config.mk
@@ -277,8 +277,8 @@ SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
endif
OVMF_UPSTREAM_REVISION ?= bc54e50e0fe03c570014f363b547426913e92449
-QEMU_UPSTREAM_REVISION ?= qemu-xen-4.8.0
-MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.8.0
+QEMU_UPSTREAM_REVISION ?= qemu-xen-4.8.1
+MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.8.1
# Wed Sep 28 11:50:04 2016 +0200
# minios: fix build issue with xen_*mb defines
@@ -289,9 +289,7 @@ SEABIOS_UPSTREAM_REVISION ?= rel-1.10.0
ETHERBOOT_NICS ?= rtl8139 8086100e
-QEMU_TRADITIONAL_REVISION ?= 095261a9ad5c31b9ed431f8382e8aa223089c85b
-# Mon Nov 14 17:19:46 2016 +0000
-# qemu: ioport_read, ioport_write: be defensive about 32-bit addresses
+QEMU_TRADITIONAL_REVISION ?= xen-4.8.1
# Specify which qemu-dm to use. This may be `ioemu' to use the old
# Mercurial in-tree version, or a local directory, or a git URL.
diff --git a/debian/changelog b/debian/changelog
index fafbb7e..0e6cf0f 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,13 @@
+xen (4.8.1-1) unstable; urgency=high
+
+ * Update to upstream 4.8.1 release.
+ Changes include numerous bugfixes, including security fixes for:
+ XSA-212 / CVE-2017-7228 Closes:#859560
+ XSA-207 / no cve yet Closes:#856229
+ XSA-206 / no cve yet no Debian bug
+
+ -- Ian Jackson <ian.jackson@eu.citrix.com> Tue, 18 Apr 2017 18:05:00 +0100
+
xen (4.8.1~pre.2017.01.23-1) unstable; urgency=medium
* Update to current upstream stable-4.8 git branch (Xen 4.8.1-pre).
diff --git a/debian/control.md5sum b/debian/control.md5sum
index d2d7fcf..218cada 100644
--- a/debian/control.md5sum
+++ b/debian/control.md5sum
@@ -1,4 +1,4 @@
-d74356cd54456cb07dc4a89ff001c233 debian/changelog
+414390ca652da67ac85ebd905500eb66 debian/changelog
dc7b5d9f0538e3180af4e9aff9b0bd57 debian/bin/gencontrol.py
20e336dbea44b1641802eff0dde9569b debian/templates/control.main.in
a15fa64ce6deead28d33c1581b14dba7 debian/templates/xen-hypervisor.postinst.in
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 0138978..54acc60 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1619,6 +1619,21 @@ Note that if **watchdog** option is also specified vpmu will be turned off.
As the virtualisation is not 100% safe, don't use the vpmu flag on
production systems (see http://xenbits.xen.org/xsa/advisory-163.html)!
+### vwfi
+> `= trap | native`
+
+> Default: `trap`
+
+WFI is the ARM instruction to "wait for interrupt". WFE is similar and
+means "wait for event". This option, which is ARM specific, changes the
+way guest WFI and WFE are implemented in Xen. By default, Xen traps both
+instructions. In the case of WFI, Xen blocks the guest vcpu; in the case
+of WFE, Xen yields the guest vcpu. When setting vwfi to `native`, Xen
+doesn't trap either instruction, running them in guest context. Setting
+vwfi to `native` reduces irq latency significantly. It can also lead to
+suboptimal scheduling decisions, but only when the system is
+oversubscribed (i.e., in total there are more vCPUs than pCPUs).
+
### watchdog
> `= force | <boolean>`
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2c83544..a71e98e 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2710,6 +2710,14 @@ int xc_livepatch_revert(xc_interface *xch, char *name, uint32_t timeout);
int xc_livepatch_unload(xc_interface *xch, char *name, uint32_t timeout);
int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
+/*
+ * Ensure cache coherency after memory modifications. A call to this function
+ * is only required on ARM as the x86 architecture provides cache coherency
+ * guarantees. Calling this function on x86 is allowed but has no effect.
+ */
+int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
+ xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
+
/* Compat shims */
#include "xenctrl_compat.h"
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 296b852..98ab6ba 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -74,10 +74,10 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
/*
* The x86 architecture provides cache coherency guarantees which prevent
* the need for this hypercall. Avoid the overhead of making a hypercall
- * just for Xen to return -ENOSYS.
+ * just for Xen to return -ENOSYS. It is safe to ignore this call on x86
+ * so we just return 0.
*/
- errno = ENOSYS;
- return -1;
+ return 0;
#else
DECLARE_DOMCTL;
domctl.cmd = XEN_DOMCTL_cacheflush;
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index d57c39a..9ba4b73 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -64,8 +64,7 @@ struct xc_interface_core *xc_interface_open(xentoollog_logger *logger,
goto err;
xch->fmem = xenforeignmemory_open(xch->error_handler, 0);
-
- if ( xch->xcall == NULL )
+ if ( xch->fmem == NULL )
goto err;
return xch;
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 97445ae..fddebdc 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -366,9 +366,6 @@ void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);
/* Optionally flush file to disk and discard page cache */
void discard_file_cache(xc_interface *xch, int fd, int flush);
-int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
- xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
-
#define MAX_MMU_UPDATES 1024
struct xc_mmu {
mmu_update_t updates[MAX_MMU_UPDATES];
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0386f28..acf714e 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2255,7 +2255,8 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
case LIBXL_DISK_BACKEND_QDISK:
flexarray_append(back, "params");
flexarray_append(back, GCSPRINTF("%s:%s",
- libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+ libxl__device_disk_string_of_format(disk->format),
+ disk->pdev_path ? : ""));
if (libxl_defbool_val(disk->colo_enable)) {
flexarray_append(back, "colo-host");
flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_host));
diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile
index 1769e55..d238836 100644
--- a/tools/ocaml/xenstored/Makefile
+++ b/tools/ocaml/xenstored/Makefile
@@ -53,6 +53,7 @@ OBJS = paths \
domains \
connection \
connections \
+ history \
parse_arg \
process \
xenstored
diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml
index 3ffd35b..a66d2f7 100644
--- a/tools/ocaml/xenstored/connection.ml
+++ b/tools/ocaml/xenstored/connection.ml
@@ -296,3 +296,8 @@ let debug con =
let domid = get_domstr con in
let watches = List.map (fun (path, token) -> Printf.sprintf "watch %s: %s %s\n" domid path token) (list_watches con) in
String.concat "" watches
+
+let decr_conflict_credit doms con =
+ match con.dom with
+ | None -> () (* It's a socket connection. We don't know which domain we're in, so treat it as if it's free to conflict *)
+ | Some dom -> Domains.decr_conflict_credit doms dom
diff --git a/tools/ocaml/xenstored/connections.ml b/tools/ocaml/xenstored/connections.ml
index f9bc225..ae76928 100644
--- a/tools/ocaml/xenstored/connections.ml
+++ b/tools/ocaml/xenstored/connections.ml
@@ -44,12 +44,14 @@ let add_domain cons dom =
| Some p -> Hashtbl.add cons.ports p con;
| None -> ()
-let select cons =
- Hashtbl.fold
- (fun _ con (ins, outs) ->
- let fd = Connection.get_fd con in
- (fd :: ins, if Connection.has_output con then fd :: outs else outs))
- cons.anonymous ([], [])
+let select ?(only_if = (fun _ -> true)) cons =
+ Hashtbl.fold (fun _ con (ins, outs) ->
+ if (only_if con) then (
+ let fd = Connection.get_fd con in
+ (fd :: ins, if Connection.has_output con then fd :: outs else outs)
+ ) else (ins, outs)
+ )
+ cons.anonymous ([], [])
let find cons =
Hashtbl.find cons.anonymous
diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml
index e9d957f..5a604d1 100644
--- a/tools/ocaml/xenstored/define.ml
+++ b/tools/ocaml/xenstored/define.ml
@@ -29,6 +29,10 @@ let maxwatch = ref (50)
let maxtransaction = ref (20)
let maxrequests = ref (-1) (* maximum requests per transaction *)
+let conflict_burst_limit = ref 5.0
+let conflict_max_history_seconds = ref 0.05
+let conflict_rate_limit_is_aggregate = ref true
+
let domid_self = 0x7FF0
exception Not_a_directory of string
diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
index ab34314..4515650 100644
--- a/tools/ocaml/xenstored/domain.ml
+++ b/tools/ocaml/xenstored/domain.ml
@@ -31,8 +31,13 @@ type t =
mutable io_credit: int; (* the rounds of ring process left to do, default is 0,
usually set to 1 when there is work detected, could
also set to n to give "lazy" clients extra credit *)
+ mutable conflict_credit: float; (* Must be positive to perform writes; a commit
+ that later causes conflict with another
+ domain's transaction costs credit. *)
+ mutable caused_conflicts: int64;
}
+let is_dom0 d = d.id = 0
let get_path dom = "/local/domain/" ^ (sprintf "%u" dom.id)
let get_id domain = domain.id
let get_interface d = d.interface
@@ -48,6 +53,10 @@ let set_io_credit ?(n=1) domain = domain.io_credit <- max 0 n
let incr_io_credit domain = domain.io_credit <- domain.io_credit + 1
let decr_io_credit domain = domain.io_credit <- max 0 (domain.io_credit - 1)
+let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
+
+let is_free_to_conflict = is_dom0
+
let string_of_port = function
| None -> "None"
| Some x -> string_of_int (Xeneventchn.to_int x)
@@ -84,6 +93,12 @@ let make id mfn remote_port interface eventchn = {
port = None;
bad_client = false;
io_credit = 0;
+ conflict_credit = !Define.conflict_burst_limit;
+ caused_conflicts = 0L;
}
-let is_dom0 d = d.id = 0
+let log_and_reset_conflict_stats logfn dom =
+ if dom.caused_conflicts > 0L then (
+ logfn dom.id dom.caused_conflicts;
+ dom.caused_conflicts <- 0L
+ )
diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
index 395f3a9..fdae298 100644
--- a/tools/ocaml/xenstored/domains.ml
+++ b/tools/ocaml/xenstored/domains.ml
@@ -15,20 +15,77 @@
*)
let debug fmt = Logging.debug "domains" fmt
+let error fmt = Logging.error "domains" fmt
+let warn fmt = Logging.warn "domains" fmt
type domains = {
eventchn: Event.t;
table: (Xenctrl.domid, Domain.t) Hashtbl.t;
+
+ (* N.B. the Queue module is not thread-safe but oxenstored is single-threaded. *)
+ (* Domains queue up to regain conflict-credit; we have a queue for
+ domains that are carrying some penalty and so are below the
+ maximum credit, and another queue for domains that have run out of
+ credit and so have had their access paused. *)
+ doms_conflict_paused: (Domain.t option ref) Queue.t;
+ doms_with_conflict_penalty: (Domain.t option ref) Queue.t;
+
+ (* A callback function to be called when we go from zero to one paused domain.
+ This will be to reset the countdown until the next unit of credit is issued. *)
+ on_first_conflict_pause: unit -> unit;
+
+ (* If config is set to use individual instead of aggregate conflict-rate-limiting,
+ we use these counts instead of the queues. The second one includes the first. *)
+ mutable n_paused: int; (* Number of domains with zero or negative credit *)
+ mutable n_penalised: int; (* Number of domains with less than maximum credit *)
}
-let init eventchn =
- { eventchn = eventchn; table = Hashtbl.create 10 }
+let init eventchn on_first_conflict_pause = {
+ eventchn = eventchn;
+ table = Hashtbl.create 10;
+ doms_conflict_paused = Queue.create ();
+ doms_with_conflict_penalty = Queue.create ();
+ on_first_conflict_pause = on_first_conflict_pause;
+ n_paused = 0;
+ n_penalised = 0;
+}
let del doms id = Hashtbl.remove doms.table id
let exist doms id = Hashtbl.mem doms.table id
let find doms id = Hashtbl.find doms.table id
let number doms = Hashtbl.length doms.table
let iter doms fct = Hashtbl.iter (fun _ b -> fct b) doms.table
+let rec is_empty_queue q =
+ Queue.is_empty q ||
+ if !(Queue.peek q) = None
+ then (
+ ignore (Queue.pop q);
+ is_empty_queue q
+ ) else false
+
+let all_at_max_credit doms =
+ if !Define.conflict_rate_limit_is_aggregate
+ then
+ (* Check both because if burst limit is 1.0 then a domain can go straight
+ * from max-credit to paused without getting into the penalty queue. *)
+ is_empty_queue doms.doms_with_conflict_penalty
+ && is_empty_queue doms.doms_conflict_paused
+ else doms.n_penalised = 0
+
+(* Functions to handle queues of domains given that the domain might be deleted while in a queue. *)
+let push dom queue =
+ Queue.push (ref (Some dom)) queue
+
+let rec pop queue =
+ match !(Queue.pop queue) with
+ | None -> pop queue
+ | Some x -> x
+
+let remove_from_queue dom queue =
+ Queue.iter (fun d -> match !d with
+ | None -> ()
+ | Some x -> if x=dom then d := None) queue
+
let cleanup xc doms =
let notify = ref false in
let dead_dom = ref [] in
@@ -52,6 +109,11 @@ let cleanup xc doms =
let dom = Hashtbl.find doms.table id in
Domain.close dom;
Hashtbl.remove doms.table id;
+ if dom.Domain.conflict_credit <= !Define.conflict_burst_limit
+ then (
+ remove_from_queue dom doms.doms_with_conflict_penalty;
+ if (dom.Domain.conflict_credit <= 0.) then remove_from_queue dom doms.doms_conflict_paused
+ )
) !dead_dom;
!notify, !dead_dom
@@ -82,3 +144,74 @@ let create0 doms =
Domain.bind_interdomain dom;
Domain.notify dom;
dom
+
+let decr_conflict_credit doms dom =
+ dom.Domain.caused_conflicts <- Int64.add 1L dom.Domain.caused_conflicts;
+ let before = dom.Domain.conflict_credit in
+ let after = max (-1.0) (before -. 1.0) in
+ debug "decr_conflict_credit dom%d %F -> %F" (Domain.get_id dom) before after;
+ dom.Domain.conflict_credit <- after;
+ let newly_penalised =
+ before >= !Define.conflict_burst_limit
+ && after < !Define.conflict_burst_limit in
+ let newly_paused = before > 0.0 && after <= 0.0 in
+ if !Define.conflict_rate_limit_is_aggregate then (
+ if newly_penalised
+ && after > 0.0
+ then (
+ push dom doms.doms_with_conflict_penalty
+ ) else if newly_paused
+ then (
+ let first_pause = Queue.is_empty doms.doms_conflict_paused in
+ push dom doms.doms_conflict_paused;
+ if first_pause then doms.on_first_conflict_pause ()
+ ) else (
+ (* The queues are correct already: no further action needed. *)
+ )
+ ) else (
+ if newly_penalised then doms.n_penalised <- doms.n_penalised + 1;
+ if newly_paused then (
+ doms.n_paused <- doms.n_paused + 1;
+ if doms.n_paused = 1 then doms.on_first_conflict_pause ()
+ )
+ )
+
+(* Give one point of credit to one domain, and update the queues appropriately. *)
+let incr_conflict_credit_from_queue doms =
+ let process_queue q requeue_test =
+ let d = pop q in
+ let before = d.Domain.conflict_credit in (* just for debug-logging *)
+ d.Domain.conflict_credit <- min (d.Domain.conflict_credit +. 1.0) !Define.conflict_burst_limit;
+ debug "incr_conflict_credit_from_queue: dom%d: %F -> %F" (Domain.get_id d) before d.Domain.conflict_credit;
+ if requeue_test d.Domain.conflict_credit then (
+ push d q (* Make it queue up again for its next point of credit. *)
+ )
+ in
+ let paused_queue_test cred = cred <= 0.0 in
+ let penalty_queue_test cred = cred < !Define.conflict_burst_limit in
+ try process_queue doms.doms_conflict_paused paused_queue_test
+ with Queue.Empty -> (
+ try process_queue doms.doms_with_conflict_penalty penalty_queue_test
+ with Queue.Empty -> () (* Both queues are empty: nothing to do here. *)
+ )
+
+let incr_conflict_credit doms =
+ if !Define.conflict_rate_limit_is_aggregate
+ then incr_conflict_credit_from_queue doms
+ else (
+ (* Give a point of credit to every domain, subject only to the cap. *)
+ let inc dom =
+ let before = dom.Domain.conflict_credit in
+ let after = min (before +. 1.0) !Define.conflict_burst_limit in
+ dom.Domain.conflict_credit <- after;
+ debug "incr_conflict_credit dom%d: %F -> %F" (Domain.get_id dom) before after;
+
+ if before <= 0.0 && after > 0.0
+ then doms.n_paused <- doms.n_paused - 1;
+
+ if before < !Define.conflict_burst_limit
+ && after >= !Define.conflict_burst_limit
+ then doms.n_penalised <- doms.n_penalised - 1
+ in
+ if doms.n_penalised > 0 then iter doms inc
+ )
diff --git a/tools/ocaml/xenstored/history.ml b/tools/ocaml/xenstored/history.ml
new file mode 100644
index 0000000..f39565b
--- /dev/null
+++ b/tools/ocaml/xenstored/history.ml
@@ -0,0 +1,73 @@
+(*
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *)
+
+type history_record = {
+ con: Connection.t; (* connection that made a change *)
+ tid: int; (* transaction id of the change (may be Transaction.none) *)
+ before: Store.t; (* the store before the change *)
+ after: Store.t; (* the store after the change *)
+ finish_count: int64; (* the commit-count at which the transaction finished *)
+}
+
+let history : history_record list ref = ref []
+
+(* Called from periodic_ops to ensure we don't discard symbols that are still needed. *)
+(* There is scope for optimisation here, since in consecutive commits one commit's `after`
+ * is the same thing as the next commit's `before`, but not all commits in history are
+ * consecutive. *)
+let mark_symbols () =
+ (* There are gaps where dom0's commits are missing. Otherwise we could assume that
+ * each element's `before` is the same thing as the next element's `after`
+ * since the next element is the previous commit *)
+ List.iter (fun hist_rec ->
+ Store.mark_symbols hist_rec.before;
+ Store.mark_symbols hist_rec.after;
+ )
+ !history
+
+(* Keep only enough commit-history to protect the running transactions that we are still tracking *)
+(* There is scope for optimisation here, replacing List.filter with something more efficient,
+ * probably on a different list-like structure. *)
+let trim ?txn () =
+ Transaction.trim_short_running_transactions txn;
+ history := match Transaction.oldest_short_running_transaction () with
+ | None -> [] (* We have no open transaction, so no history is needed *)
+ | Some (_, txn) -> (
+ (* keep records with finish_count recent enough to be relevant *)
+ List.filter (fun r -> r.finish_count > txn.Transaction.start_count) !history
+ )
+
+let end_transaction txn con tid commit =
+ let success = Connection.end_transaction con tid commit in
+ trim ~txn ();
+ success
+
+let push (x: history_record) =
+ let dom = x.con.Connection.dom in
+ match dom with
+ | None -> () (* treat socket connections as always free to conflict *)
+ | Some d -> if not (Domain.is_free_to_conflict d) then history := x :: !history
+
+(* Find the connections from records since commit-count [since] for which [f record] returns [true] *)
+let filter_connections ~ignore ~since ~f =
+ (* The "mem" call is an optimisation, to avoid calling f if we have picked con already. *)
+ (* Using a hash table rather than a list is to optimise the "mem" call. *)
+ List.fold_left (fun acc hist_rec ->
+ if hist_rec.finish_count > since
+ && not (hist_rec.con == ignore)
+ && not (Hashtbl.mem acc hist_rec.con)
+ && f hist_rec
+ then Hashtbl.replace acc hist_rec.con ();
+ acc
+ ) (Hashtbl.create 1023) !history
diff --git a/tools/ocaml/xenstored/oxenstored.conf.in b/tools/ocaml/xenstored/oxenstored.conf.in
index 82117a9..536611e 100644
--- a/tools/ocaml/xenstored/oxenstored.conf.in
+++ b/tools/ocaml/xenstored/oxenstored.conf.in
@@ -9,6 +9,38 @@ test-eagain = false
# Activate transaction merge support
merge-activate = true
+# Limits applied to domains whose writes cause other domains' transaction
+# commits to fail. Must include decimal point.
+
+# The burst limit is the number of conflicts a domain can cause to
+# fail in a short period; this value is used for both the initial and
+# the maximum value of each domain's conflict-credit, which falls by
+# one point for each conflict caused, and when it reaches zero the
+# domain's requests are ignored.
+conflict-burst-limit = 5.0
+
+# The conflict-credit is replenished over time:
+# one point is issued after each conflict-max-history-seconds, so this
+# is the minimum pause-time during which a domain will be ignored.
+conflict-max-history-seconds = 0.05
+
+# If the conflict-rate-limit-is-aggregate flag is true then after each
+# tick one point of conflict-credit is given to just one domain: the
+# one at the front of the queue. If false, then after each tick each
+# domain gets a point of conflict-credit.
+#
+# In environments where it is known that every transaction will
+# involve a set of nodes that is writable by at most one other domain,
+# then it is safe to set this aggregate-limit flag to false for better
+# performance. (This can be determined by considering the layout of
+# the xenstore tree and permissions, together with the content of the
+# transactions that require protection.)
+#
+# A transaction which involves a set of nodes which can be modified by
+# multiple other domains can suffer conflicts caused by any of those
+# domains, so the flag must be set to true.
+conflict-rate-limit-is-aggregate = true
+
# Activate node permission system
perms-activate = true
diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
index 7b60376..8a688c4 100644
--- a/tools/ocaml/xenstored/process.ml
+++ b/tools/ocaml/xenstored/process.ml
@@ -16,6 +16,7 @@
let error fmt = Logging.error "process" fmt
let info fmt = Logging.info "process" fmt
+let debug fmt = Logging.debug "process" fmt
open Printf
open Stdext
@@ -25,6 +26,7 @@ exception Transaction_nested
exception Domain_not_match
exception Invalid_Cmd_Args
+(* This controls the do_debug fn in this module, not the debug logging-function. *)
let allow_debug = ref false
let c_int_of_string s =
@@ -293,6 +295,11 @@ let write_response_log ~ty ~tid ~con ~response =
| Packet.Reply x -> write_answer_log ~ty ~tid ~con ~data:x
| Packet.Error e -> write_answer_log ~ty:(Xenbus.Xb.Op.Error) ~tid ~con ~data:e
+let record_commit ~con ~tid ~before ~after =
+ let inc r = r := Int64.add 1L !r in
+ let finish_count = inc Transaction.counter; !Transaction.counter in
+ History.push {History.con=con; tid=tid; before=before; after=after; finish_count=finish_count}
+
(* Replay a stored transaction against a fresh store, check the responses are
all equivalent: if so, commit the transaction. Otherwise send the abort to
the client. *)
@@ -301,25 +308,57 @@ let transaction_replay c t doms cons =
| Transaction.No ->
error "attempted to replay a non-full transaction";
false
- | Transaction.Full(id, oldroot, cstore) ->
+ | Transaction.Full(id, oldstore, cstore) ->
let tid = Connection.start_transaction c cstore in
- let new_t = Transaction.make tid cstore in
+ let replay_t = Transaction.make ~internal:true tid cstore in
let con = sprintf "r(%d):%s" id (Connection.get_domstr c) in
- let perform_exn (request, response) =
- write_access_log ~ty:request.Packet.ty ~tid ~con ~data:request.Packet.data;
+
+ let perform_exn ~wlog txn (request, response) =
+ if wlog then write_access_log ~ty:request.Packet.ty ~tid ~con ~data:request.Packet.data;
let fct = function_of_type_simple_op request.Packet.ty in
- let response' = input_handle_error ~cons ~doms ~fct ~con:c ~t:new_t ~req:request in
- write_response_log ~ty:request.Packet.ty ~tid ~con ~response:response';
- if not(Packet.response_equal response response') then raise Transaction_again in
+ let response' = input_handle_error ~cons ~doms ~fct ~con:c ~t:txn ~req:request in
+ if wlog then write_response_log ~ty:request.Packet.ty ~tid ~con ~response:response';
+ if not(Packet.response_equal response response') then raise Transaction_again
+ in
finally
(fun () ->
try
Logging.start_transaction ~con ~tid;
- List.iter perform_exn (Transaction.get_operations t);
- Logging.end_transaction ~con ~tid;
+ List.iter (perform_exn ~wlog:true replay_t) (Transaction.get_operations t); (* May throw EAGAIN *)
- Transaction.commit ~con new_t
- with e ->
+ Logging.end_transaction ~con ~tid;
+ Transaction.commit ~con replay_t
+ with
+ | Transaction_again -> (
+ Transaction.failed_commits := Int64.add !Transaction.failed_commits 1L;
+ let victim_domstr = Connection.get_domstr c in
+ debug "Apportioning blame for EAGAIN in txn %d, domain=%s" id victim_domstr;
+ let punish guilty_con =
+ debug "Blaming domain %s for conflict with domain %s txn %d"
+ (Connection.get_domstr guilty_con) victim_domstr id;
+ Connection.decr_conflict_credit doms guilty_con
+ in
+ let judge_and_sentence hist_rec = (
+ let can_apply_on store = (
+ let store = Store.copy store in
+ let trial_t = Transaction.make ~internal:true Transaction.none store in
+ try List.iter (perform_exn ~wlog:false trial_t) (Transaction.get_operations t);
+ true
+ with Transaction_again -> false
+ ) in
+ if can_apply_on hist_rec.History.before
+ && not (can_apply_on hist_rec.History.after)
+ then (punish hist_rec.History.con; true)
+ else false
+ ) in
+ let guilty_cons = History.filter_connections ~ignore:c ~since:t.Transaction.start_count ~f:judge_and_sentence in
+ if Hashtbl.length guilty_cons = 0 then (
+ debug "Found no culprit for conflict in %s: must be self or not in history." con;
+ Transaction.failed_commits_no_culprit := Int64.add !Transaction.failed_commits_no_culprit 1L
+ );
+ false
+ )
+ | e ->
info "transaction_replay %d caught: %s" tid (Printexc.to_string e);
false
)
@@ -358,13 +397,20 @@ let do_transaction_end con t domains cons data =
| x :: _ -> raise (Invalid_argument x)
| _ -> raise Invalid_Cmd_Args
in
+ let commit = commit && not (Transaction.is_read_only t) in
let success =
let commit = if commit then Some (fun con trans -> transaction_replay con trans domains cons) else None in
- Connection.end_transaction con (Transaction.get_id t) commit in
+ History.end_transaction t con (Transaction.get_id t) commit in
if not success then
raise Transaction_again;
- if commit then
- process_watch (List.rev (Transaction.get_paths t)) cons
+ if commit then begin
+ process_watch (List.rev (Transaction.get_paths t)) cons;
+ match t.Transaction.ty with
+ | Transaction.No ->
+ () (* no need to record anything *)
+ | Transaction.Full(id, oldstore, cstore) ->
+ record_commit ~con ~tid:id ~before:oldstore ~after:cstore
+ end
let do_introduce con t domains cons data =
if not (Connection.is_dom0 con)
@@ -434,6 +480,37 @@ let function_of_type ty =
| _ -> function_of_type_simple_op ty
(**
+ * Determines which individual (non-transactional) operations we want to retain.
+ * We only want to retain operations that have side-effects in the store since
+ * these can be the cause of transactions failing.
+ *)
+let retain_op_in_history ty =
+ match ty with
+ | Xenbus.Xb.Op.Write
+ | Xenbus.Xb.Op.Mkdir
+ | Xenbus.Xb.Op.Rm
+ | Xenbus.Xb.Op.Setperms -> true
+ | Xenbus.Xb.Op.Debug
+ | Xenbus.Xb.Op.Directory
+ | Xenbus.Xb.Op.Read
+ | Xenbus.Xb.Op.Getperms
+ | Xenbus.Xb.Op.Watch
+ | Xenbus.Xb.Op.Unwatch
+ | Xenbus.Xb.Op.Transaction_start
+ | Xenbus.Xb.Op.Transaction_end
+ | Xenbus.Xb.Op.Introduce
+ | Xenbus.Xb.Op.Release
+ | Xenbus.Xb.Op.Getdomainpath
+ | Xenbus.Xb.Op.Watchevent
+ | Xenbus.Xb.Op.Error
+ | Xenbus.Xb.Op.Isintroduced
+ | Xenbus.Xb.Op.Resume
+ | Xenbus.Xb.Op.Set_target
+ | Xenbus.Xb.Op.Restrict
+ | Xenbus.Xb.Op.Reset_watches
+ | Xenbus.Xb.Op.Invalid -> false
+
+(**
* Nothrow guarantee.
*)
let process_packet ~store ~cons ~doms ~con ~req =
@@ -448,7 +525,19 @@ let process_packet ~store ~cons ~doms ~con ~req =
else
Connection.get_transaction con tid
in
- let response = input_handle_error ~cons ~doms ~fct ~con ~t ~req in
+
+ let execute () = input_handle_error ~cons ~doms ~fct ~con ~t ~req in
+
+ let response =
+ (* Note that transactions are recorded in history separately. *)
+ if tid = Transaction.none && retain_op_in_history ty then begin
+ let before = Store.copy store in
+ let response = execute () in
+ let after = Store.copy store in
+ record_commit ~con ~tid ~before ~after;
+ response
+ end else execute ()
+ in
let response = try
if tid <> Transaction.none then
diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
index 223ee21..9f619b8 100644
--- a/tools/ocaml/xenstored/store.ml
+++ b/tools/ocaml/xenstored/store.ml
@@ -211,6 +211,7 @@ let apply rnode path fct =
lookup rnode path fct
end
+(* The Store.t type *)
type t =
{
mutable stat_transaction_coalesce: int;
diff --git a/tools/ocaml/xenstored/transaction.ml b/tools/ocaml/xenstored/transaction.ml
index 6b37fc2..23e7ccf 100644
--- a/tools/ocaml/xenstored/transaction.ml
+++ b/tools/ocaml/xenstored/transaction.ml
@@ -14,6 +14,8 @@
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*)
+let error fmt = Logging.error "transaction" fmt
+
open Stdext
let none = 0
@@ -69,34 +71,73 @@ let can_coalesce oldroot currentroot path =
else
false
-type ty = No | Full of (int * Store.Node.t * Store.t)
+type ty = No | Full of (
+ int * (* Transaction id *)
+ Store.t * (* Original store *)
+ Store.t (* A pointer to the canonical store: its root changes on each transaction-commit *)
+)
type t = {
ty: ty;
- store: Store.t;
+ start_count: int64;
+ store: Store.t; (* This is the store that we change in write operations. *)
quota: Quota.t;
mutable paths: (Xenbus.Xb.Op.operation * Store.Path.t) list;
mutable operations: (Packet.request * Packet.response) list;
mutable read_lowpath: Store.Path.t option;
mutable write_lowpath: Store.Path.t option;
}
+let get_id t = match t.ty with No -> none | Full (id, _, _) -> id
-let make id store =
- let ty = if id = none then No else Full(id, Store.get_root store, store) in
- {
+let counter = ref 0L
+let failed_commits = ref 0L
+let failed_commits_no_culprit = ref 0L
+let reset_conflict_stats () =
+ failed_commits := 0L;
+ failed_commits_no_culprit := 0L
+
+(* Scope for optimisation: different data-structure and functions to search/filter it *)
+let short_running_txns = ref []
+
+let oldest_short_running_transaction () =
+ let rec last = function
+ | [] -> None
+ | [x] -> Some x
+ | x :: xs -> last xs
+ in last !short_running_txns
+
+let trim_short_running_transactions txn =
+ let cutoff = Unix.gettimeofday () -. !Define.conflict_max_history_seconds in
+ let keep = match txn with
+ | None -> (function (start_time, _) -> start_time >= cutoff)
+ | Some t -> (function (start_time, tx) -> start_time >= cutoff && tx != t)
+ in
+ short_running_txns := List.filter
+ keep
+ !short_running_txns
+
+let make ?(internal=false) id store =
+ let ty = if id = none then No else Full(id, Store.copy store, store) in
+ let txn = {
ty = ty;
+ start_count = !counter;
store = if id = none then store else Store.copy store;
quota = Quota.copy store.Store.quota;
paths = [];
operations = [];
read_lowpath = None;
write_lowpath = None;
- }
+ } in
+ if id <> none && not internal then (
+ let now = Unix.gettimeofday () in
+ short_running_txns := (now, txn) :: !short_running_txns
+ );
+ txn
-let get_id t = match t.ty with No -> none | Full (id, _, _) -> id
let get_store t = t.store
let get_paths t = t.paths
+let is_read_only t = t.paths = []
let add_wop t ty path = t.paths <- (ty, path) :: t.paths
let add_operation ~perm t request response =
if !Define.maxrequests >= 0
@@ -155,7 +196,7 @@ let commit ~con t =
let has_commited =
match t.ty with
| No -> true
- | Full (id, oldroot, cstore) ->
+ | Full (id, oldstore, cstore) -> (* "cstore" meaning current canonical store *)
let commit_partial oldroot cstore store =
(* get the lowest path of the query and verify that it hasn't
been modified by others transactions. *)
@@ -198,7 +239,7 @@ let commit ~con t =
if !test_eagain && Random.int 3 = 0 then
false
else
- try_commit oldroot cstore t.store
+ try_commit (Store.get_root oldstore) cstore t.store
in
if has_commited && has_write_ops then
Disk.write t.store;
diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
index 2efcce6..5474ece 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -53,14 +53,16 @@ let process_connection_fds store cons domains rset wset =
let process_domains store cons domains =
let do_io_domain domain =
- if not (Domain.is_bad_domain domain) then
- let io_credit = Domain.get_io_credit domain in
- if io_credit > 0 then (
- let con = Connections.find_domain cons (Domain.get_id domain) in
- Process.do_input store cons domains con;
- Process.do_output store cons domains con;
- Domain.decr_io_credit domain;
- ) in
+ if Domain.is_bad_domain domain
+ || Domain.get_io_credit domain <= 0
+ || Domain.is_paused_for_conflict domain
+ then () (* nothing to do *)
+ else (
+ let con = Connections.find_domain cons (Domain.get_id domain) in
+ Process.do_input store cons domains con;
+ Process.do_output store cons domains con;
+ Domain.decr_io_credit domain
+ ) in
Domains.iter domains do_io_domain
let sigusr1_handler store =
@@ -89,6 +91,9 @@ let parse_config filename =
let pidfile = ref default_pidfile in
let options = [
("merge-activate", Config.Set_bool Transaction.do_coalesce);
+ ("conflict-burst-limit", Config.Set_float Define.conflict_burst_limit);
+ ("conflict-max-history-seconds", Config.Set_float Define.conflict_max_history_seconds);
+ ("conflict-rate-limit-is-aggregate", Config.Set_bool Define.conflict_rate_limit_is_aggregate);
("perms-activate", Config.Set_bool Perms.activate);
("quota-activate", Config.Set_bool Quota.activate);
("quota-maxwatch", Config.Set_int Define.maxwatch);
@@ -260,7 +265,23 @@ let _ =
let store = Store.create () in
let eventchn = Event.init () in
- let domains = Domains.init eventchn in
+ let next_frequent_ops = ref 0. in
+ let advance_next_frequent_ops () =
+ next_frequent_ops := (Unix.gettimeofday () +. !Define.conflict_max_history_seconds)
+ in
+ let delay_next_frequent_ops_by duration =
+ next_frequent_ops := !next_frequent_ops +. duration
+ in
+ let domains = Domains.init eventchn advance_next_frequent_ops in
+
+ (* For things that need to be done periodically but more often
+ * than the periodic_ops function *)
+ let frequent_ops () =
+ if Unix.gettimeofday () > !next_frequent_ops then (
+ History.trim ();
+ Domains.incr_conflict_credit domains;
+ advance_next_frequent_ops ()
+ ) in
let cons = Connections.create () in
let quit = ref false in
@@ -356,6 +377,7 @@ let _ =
let last_scan_time = ref 0. in
let periodic_ops now =
+ debug "periodic_ops starting";
(* we garbage collect the string->int dictionary after a sizeable amount of operations,
* there's no need to be really fast even if we got loose
* objects since names are often reused.
@@ -365,6 +387,7 @@ let _ =
Symbol.mark_all_as_unused ();
Store.mark_symbols store;
Connections.iter cons Connection.mark_symbols;
+ History.mark_symbols ();
Symbol.garbage ()
end;
@@ -374,7 +397,11 @@ let _ =
(* make sure we don't print general stats faster than 2 min *)
if now > (!last_stat_time +. 120.) then (
+ info "Transaction conflict statistics for last %F seconds:" (now -. !last_stat_time);
last_stat_time := now;
+ Domains.iter domains (Domain.log_and_reset_conflict_stats (info "Dom%d caused %Ld conflicts"));
+ info "%Ld failed transactions; of these no culprit was found for %Ld" !Transaction.failed_commits !Transaction.failed_commits_no_culprit;
+ Transaction.reset_conflict_stats ();
let gc = Gc.stat () in
let (lanon, lanon_ops, lanon_watchs,
@@ -392,23 +419,38 @@ let _ =
gc.Gc.heap_words gc.Gc.heap_chunks
gc.Gc.live_words gc.Gc.live_blocks
gc.Gc.free_words gc.Gc.free_blocks
- )
- in
+ );
+ let elapsed = Unix.gettimeofday () -. now in
+ debug "periodic_ops took %F seconds." elapsed;
+ delay_next_frequent_ops_by elapsed
+ in
- let period_ops_interval = 15. in
- let period_start = ref 0. in
+ let period_ops_interval = 15. in
+ let period_start = ref 0. in
let main_loop () =
-
+ let is_peaceful c =
+ match Connection.get_domain c with
+ | None -> true (* Treat socket-connections as exempt, and free to conflict. *)
+ | Some dom -> not (Domain.is_paused_for_conflict dom)
+ in
+ frequent_ops ();
let mw = Connections.has_more_work cons in
+ let peaceful_mw = List.filter is_peaceful mw in
List.iter
(fun c ->
match Connection.get_domain c with
| None -> () | Some d -> Domain.incr_io_credit d)
- mw;
+ peaceful_mw;
+ let start_time = Unix.gettimeofday () in
let timeout =
- if List.length mw > 0 then 0. else period_ops_interval in
- let inset, outset = Connections.select cons in
+ let until_next_activity =
+ if Domains.all_at_max_credit domains
+ then period_ops_interval
+ else min (max 0. (!next_frequent_ops -. start_time)) period_ops_interval in
+ if peaceful_mw <> [] then 0. else until_next_activity
+ in
+ let inset, outset = Connections.select ~only_if:is_peaceful cons in
let rset, wset, _ =
try
Select.select (spec_fds @ inset) outset [] timeout
@@ -418,6 +460,7 @@ let _ =
List.partition (fun fd -> List.mem fd spec_fds) rset in
if List.length sfds > 0 then
process_special_fds sfds;
+
if List.length cfds > 0 || List.length wset > 0 then
process_connection_fds store cons domains cfds wset;
if timeout <> 0. then (
@@ -425,6 +468,7 @@ let _ =
if now > !period_start +. period_ops_interval then
(period_start := now; periodic_ops now)
);
+
process_domains store cons domains
in
diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index 9b31a36..7b467fe 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -163,6 +163,18 @@ static inline uint64_t xgetbv(uint32_t xcr)
(ebx & (1U << 5)) != 0; \
})
+static int read_segment(
+ enum x86_segment seg,
+ struct segment_register *reg,
+ struct x86_emulate_ctxt *ctxt)
+{
+ if ( !is_x86_user_segment(seg) )
+ return X86EMUL_UNHANDLEABLE;
+ memset(reg, 0, sizeof(*reg));
+ reg->attr.fields.p = 1;
+ return X86EMUL_OKAY;
+}
+
static int read_cr(
unsigned int reg,
unsigned long *val,
@@ -215,6 +227,7 @@ static struct x86_emulate_ops emulops = {
.write = write,
.cmpxchg = cmpxchg,
.cpuid = cpuid,
+ .read_segment = read_segment,
.read_cr = read_cr,
.get_fpu = get_fpu,
};
@@ -732,6 +745,27 @@ int main(int argc, char **argv)
goto fail;
printf("okay\n");
+ printf("%-40s", "Testing mov %%cr4,%%esi (bad ModRM)...");
+ /*
+ * Mod = 1, Reg = 4, R/M = 6 would normally encode a memory reference of
+ * disp8(%esi), but mov to/from cr/dr are special and behave as if they
+ * were encoded with Mod == 3.
+ */
+ instr[0] = 0x0f; instr[1] = 0x20, instr[2] = 0x66;
+ instr[3] = 0; /* Supposed disp8. */
+ regs.esi = 0;
+ regs.eip = (unsigned long)&instr[0];
+ rc = x86_emulate(&ctxt, &emulops);
+ /*
+ * We don't care precisely what gets read from %cr4 into %esi, just so
+ * long as ModRM is treated as a register operand and 0(%esi) isn't
+ * followed as a memory reference.
+ */
+ if ( (rc != X86EMUL_OKAY) ||
+ (regs.eip != (unsigned long)&instr[3]) )
+ goto fail;
+ printf("okay\n");
+
#define decl_insn(which) extern const unsigned char which[], which##_len[]
#define put_insn(which, insn) ".pushsection .test, \"ax\", @progbits\n" \
#which ": " insn "\n" \
diff --git a/tools/xenstore/Makefile b/tools/xenstore/Makefile
index f6dee14..5968f44 100644
--- a/tools/xenstore/Makefile
+++ b/tools/xenstore/Makefile
@@ -34,6 +34,7 @@ XENSTORED_OBJS_$(CONFIG_FreeBSD) = xenstored_posix.o
XENSTORED_OBJS_$(CONFIG_MiniOS) = xenstored_minios.o
XENSTORED_OBJS += $(XENSTORED_OBJS_y)
+LDLIBS_xenstored += -lrt
ifneq ($(XENSTORE_STATIC_CLIENTS),y)
LIBXENSTORE := libxenstore.so
@@ -75,7 +76,7 @@ endif
$(XENSTORED_OBJS): CFLAGS += $(CFLAGS_libxengnttab)
xenstored: $(XENSTORED_OBJS)
- $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) $(call LDFLAGS_RPATH,../lib) -o $@ $(APPEND_LDFLAGS)
+ $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) $(LDLIBS_xenstored) $(call LDFLAGS_RPATH,../lib) -o $@ $(APPEND_LDFLAGS)
xenstored.a: $(XENSTORED_OBJS)
$(AR) cr $@ $^
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 3df977b..dc9a26f 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -358,6 +358,7 @@ static void initialize_fds(int sock, int *p_sock_pollfd_idx,
int *ptimeout)
{
struct connection *conn;
+ struct wrl_timestampt now;
if (fds)
memset(fds, 0, sizeof(struct pollfd) * current_array_size);
@@ -377,8 +378,12 @@ static void initialize_fds(int sock, int *p_sock_pollfd_idx,
xce_pollfd_idx = set_fd(xenevtchn_fd(xce_handle),
POLLIN|POLLPRI);
+ wrl_gettime_now(&now);
+ wrl_log_periodic(now);
+
list_for_each_entry(conn, &connections, list) {
if (conn->domain) {
+ wrl_check_timeout(conn->domain, now, ptimeout);
if (domain_can_read(conn) ||
(domain_can_write(conn) &&
!list_empty(&conn->out_list)))
@@ -833,6 +838,7 @@ static void delete_node_single(struct connection *conn, struct node *node)
corrupt(conn, "Could not delete '%s'", node->name);
return;
}
+
domain_entry_dec(conn, node);
}
@@ -972,6 +978,7 @@ static void do_write(struct connection *conn, struct buffered_data *in)
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
send_ack(conn, XS_WRITE);
}
@@ -1003,6 +1010,7 @@ static void do_mkdir(struct connection *conn, struct buffered_data *in)
return;
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
}
send_ack(conn, XS_MKDIR);
@@ -1129,6 +1137,7 @@ static void do_rm(struct connection *conn, struct buffered_data *in)
if (_rm(conn, node, name)) {
add_change_node(conn->transaction, name, true);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, true);
send_ack(conn, XS_RM);
}
@@ -1205,6 +1214,7 @@ static void do_set_perms(struct connection *conn, struct buffered_data *in)
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
send_ack(conn, XS_SET_PERMS);
}
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index ecc614f..9e9d960 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -33,6 +33,12 @@
#include "list.h"
#include "tdb.h"
+#define MIN(a, b) (((a) < (b))? (a) : (b))
+
+typedef int32_t wrl_creditt;
+#define WRL_CREDIT_MAX (1000*1000*1000)
+/* ^ satisfies non-overflow condition for wrl_xfer_credit */
+
struct buffered_data
{
struct list_head list;
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 5de93d4..18ac327 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -21,6 +21,8 @@
#include <unistd.h>
#include <stdlib.h>
#include <stdarg.h>
+#include <time.h>
+#include <syslog.h>
#include "utils.h"
#include "talloc.h"
@@ -74,6 +76,11 @@ struct domain
/* number of watch for this domain */
int nbwatch;
+
+ /* write rate limit */
+ wrl_creditt wrl_credit; /* [ -wrl_config_writecost, +_dburst ] */
+ struct wrl_timestampt wrl_timestamp;
+ bool wrl_delay_logged;
};
static LIST_HEAD(domains);
@@ -206,6 +213,8 @@ static int destroy_domain(void *_domain)
fire_watches(NULL, domain, "@releaseDomain", false);
+ wrl_domain_destroy(domain);
+
return 0;
}
@@ -253,6 +262,9 @@ void handle_event(void)
bool domain_can_read(struct connection *conn)
{
struct xenstore_domain_interface *intf = conn->domain->interface;
+
+ if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
+ return false;
return (intf->req_cons != intf->req_prod);
}
@@ -284,6 +296,8 @@ static struct domain *new_domain(void *context, unsigned int domid,
domain->domid = domid;
domain->path = talloc_domain_path(domain, domid);
+ wrl_domain_new(domain);
+
list_add(&domain->list, &domains);
talloc_set_destructor(domain, destroy_domain);
@@ -751,6 +765,233 @@ int domain_watch(struct connection *conn)
: 0;
}
+static wrl_creditt wrl_config_writecost = WRL_FACTOR;
+static wrl_creditt wrl_config_rate = WRL_RATE * WRL_FACTOR;
+static wrl_creditt wrl_config_dburst = WRL_DBURST * WRL_FACTOR;
+static wrl_creditt wrl_config_gburst = WRL_GBURST * WRL_FACTOR;
+static wrl_creditt wrl_config_newdoms_dburst =
+ WRL_DBURST * WRL_NEWDOMS * WRL_FACTOR;
+
+long wrl_ntransactions;
+
+static long wrl_ndomains;
+static wrl_creditt wrl_reserve; /* [-wrl_config_newdoms_dburst, +_gburst ] */
+static time_t wrl_log_last_warning; /* 0: no previous warning */
+
+void wrl_gettime_now(struct wrl_timestampt *now_wt)
+{
+ struct timespec now_ts;
+ int r;
+
+ r = clock_gettime(CLOCK_MONOTONIC, &now_ts);
+ if (r)
+ barf_perror("Could not find time (clock_gettime failed)");
+
+ now_wt->sec = now_ts.tv_sec;
+ now_wt->msec = now_ts.tv_nsec / 1000000;
+}
+
+static void wrl_xfer_credit(wrl_creditt *debit, wrl_creditt debit_floor,
+ wrl_creditt *credit, wrl_creditt credit_ceil)
+ /*
+ * Transfers zero or more credit from "debit" to "credit".
+ * Transfers as much as possible while maintaining
+ * debit >= debit_floor and credit <= credit_ceil.
+ * (If that's violated already, does nothing.)
+ *
+ * Sufficient conditions to avoid overflow, either of:
+ * |every argument| <= 0x3fffffff
+ * |every argument| <= 1E9
+ * |every argument| <= WRL_CREDIT_MAX
+ * (And this condition is preserved.)
+ */
+{
+ wrl_creditt xfer = MIN( *debit - debit_floor,
+ credit_ceil - *credit );
+ if (xfer > 0) {
+ *debit -= xfer;
+ *credit += xfer;
+ }
+}
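
To illustrate the clamped transfer with some made-up numbers (this
standalone sketch is for exposition only - it is not part of the patch):

    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t wrl_creditt;

    /* The same transfer rule as wrl_xfer_credit above. */
    static void xfer(wrl_creditt *debit, wrl_creditt debit_floor,
                     wrl_creditt *credit, wrl_creditt credit_ceil)
    {
        wrl_creditt spare = *debit - debit_floor;   /* what the donor may give */
        wrl_creditt room = credit_ceil - *credit;   /* what the recipient may take */
        wrl_creditt amount = spare < room ? spare : room;

        if (amount > 0) {
            *debit -= amount;
            *credit += amount;
        }
    }

    int main(void)
    {
        wrl_creditt reserve = 7000, domain = -2000;

        /* Top the domain up to at most 0, as wrl_credit_update does. */
        xfer(&reserve, 0, &domain, 0);
        printf("reserve=%d domain=%d\n", reserve, domain);  /* 5000, 0 */
        return 0;
    }

Only 2000 credit moves, because the recipient's ceiling (0) is the
binding constraint here, not the donor's floor.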
+
+void wrl_domain_new(struct domain *domain)
+{
+ domain->wrl_credit = 0;
+ wrl_gettime_now(&domain->wrl_timestamp);
+ wrl_ndomains++;
+ /* Steal up to DBURST from the reserve */
+ wrl_xfer_credit(&wrl_reserve, -wrl_config_newdoms_dburst,
+ &domain->wrl_credit, wrl_config_dburst);
+}
+
+void wrl_domain_destroy(struct domain *domain)
+{
+ wrl_ndomains--;
+ /*
+ * Don't bother recalculating domain's credit - this just
+ * means we don't give the reserve the ending domain's credit
+ * for time elapsed since last update.
+ */
+ wrl_xfer_credit(&domain->wrl_credit, 0,
+ &wrl_reserve, wrl_config_dburst);
+}
+
+void wrl_credit_update(struct domain *domain, struct wrl_timestampt now)
+{
+ /*
+ * We want to calculate
+ * credit += (now - timestamp) * RATE / ndoms;
+ * But we want it to saturate, and to avoid floating point.
+ * To avoid rounding errors from constantly adding small
+ * amounts of credit, we only add credit for whole milliseconds.
+ */
+ long seconds = now.sec - domain->wrl_timestamp.sec;
+ long milliseconds = now.msec - domain->wrl_timestamp.msec;
+ long msec;
+ int64_t denom, num;
+ wrl_creditt surplus;
+
+ seconds = MIN(seconds, 1000*1000); /* arbitrary, prevents overflow */
+ msec = seconds * 1000 + milliseconds;
+
+ if (msec < 0)
+ /* shouldn't happen with CLOCK_MONOTONIC */
+ msec = 0;
+
+ /* 32x32 -> 64 cannot overflow */
+ denom = (int64_t)msec * wrl_config_rate;
+ num = (int64_t)wrl_ndomains * 1000;
+ /* denom / num <= 1E6 * wrl_config_rate, so with
+ reasonable wrl_config_rate, denom / num << 2^64 */
+
+ /* at last! */
+ domain->wrl_credit = MIN( (int64_t)domain->wrl_credit + denom / num,
+ WRL_CREDIT_MAX );
+ /* (maybe briefly violating the DBURST cap on wrl_credit) */
+
+ /* maybe take from the reserve to make us nonnegative */
+ wrl_xfer_credit(&wrl_reserve, 0,
+ &domain->wrl_credit, 0);
+
+ /* return any surplus (over DBURST) to the reserve */
+ surplus = 0;
+ wrl_xfer_credit(&domain->wrl_credit, wrl_config_dburst,
+ &surplus, WRL_CREDIT_MAX);
+ wrl_xfer_credit(&surplus, 0,
+ &wrl_reserve, wrl_config_gburst);
+ /* surplus is now implicitly discarded */
+
+ domain->wrl_timestamp = now;
+
+ trace("wrl: dom %4d %6ld msec %9ld credit %9ld reserve"
+ " %9ld discard\n",
+ domain->domid,
+ msec,
+ (long)domain->wrl_credit, (long)wrl_reserve,
+ (long)surplus);
+}
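
To put concrete (hypothetical) numbers on the accrual: with the new
defaults in xenstored_domain.h (further down), wrl_config_rate =
WRL_RATE * WRL_FACTOR = 200000. On a system with four domains, 50ms of
elapsed time gives denom = 50 * 200000 = 10000000 and num = 4 * 1000 =
4000, so the domain gains 2500 credit. At wrl_config_writecost =
WRL_FACTOR = 1000 per write that is two and a half writes, i.e. a
steady-state allowance of 200/4 = 50 writes per second per domain.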
+
+void wrl_check_timeout(struct domain *domain,
+ struct wrl_timestampt now,
+ int *ptimeout)
+{
+ uint64_t num, denom;
+ int wakeup;
+
+ wrl_credit_update(domain, now);
+
+ if (domain->wrl_credit >= 0)
+ /* not blocked */
+ return;
+
+ if (!*ptimeout)
+ /* already decided on immediate wakeup,
+ so no need to calculate our timeout */
+ return;
+
+ /* calculate wakeup = now + -credit / (RATE / ndoms); */
+
+ /* credit cannot go more -ve than one transaction,
+ * so the first multiplication cannot overflow even 32-bit */
+ num = (uint64_t)(-domain->wrl_credit * 1000) * wrl_ndomains;
+ denom = wrl_config_rate;
+
+ wakeup = MIN( num / denom /* uint64_t */, INT_MAX );
+ if (*ptimeout == -1 || wakeup < *ptimeout)
+ *ptimeout = wakeup;
+
+ trace("wrl: domain %u credit=%ld (reserve=%ld) SLEEPING for %d\n",
+ domain->domid,
+ (long)domain->wrl_credit, (long)wrl_reserve,
+ wakeup);
+}
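
Continuing the hypothetical four-domain example: a domain exactly one
write in debt has wrl_credit = -1000, so num = 1000 * 1000 * 4 =
4000000 and denom = 200000, giving wakeup = 20 - a 20ms sleep, which is
just the time needed to accrue the missing 1000 credit at 50 credit per
millisecond.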
+
+#define WRL_LOG(now, ...) \
+ (syslog(LOG_WARNING, "write rate limit: " __VA_ARGS__))
+
+void wrl_apply_debit_actual(struct domain *domain)
+{
+ struct wrl_timestampt now;
+
+ if (!domain)
+ /* sockets escape the write rate limit */
+ return;
+
+ wrl_gettime_now(&now);
+ wrl_credit_update(domain, now);
+
+ domain->wrl_credit -= wrl_config_writecost;
+ trace("wrl: domain %u credit=%ld (reserve=%ld)\n",
+ domain->domid,
+ (long)domain->wrl_credit, (long)wrl_reserve);
+
+ if (domain->wrl_credit < 0) {
+ if (!domain->wrl_delay_logged) {
+ domain->wrl_delay_logged = true;
+ WRL_LOG(now, "domain %ld is affected",
+ (long)domain->domid);
+ } else if (!wrl_log_last_warning) {
+ WRL_LOG(now, "rate limiting restarts");
+ }
+ wrl_log_last_warning = now.sec;
+ }
+}
+
+void wrl_log_periodic(struct wrl_timestampt now)
+{
+ if (wrl_log_last_warning &&
+ (now.sec - wrl_log_last_warning) > WRL_LOGEVERY) {
+ WRL_LOG(now, "not in force recently");
+ wrl_log_last_warning = 0;
+ }
+}
+
+void wrl_apply_debit_direct(struct connection *conn)
+{
+ if (!conn)
+ /* some writes are generated internally */
+ return;
+
+ if (conn->transaction)
+ /* these are accounted for when the transaction ends */
+ return;
+
+ if (!wrl_ntransactions)
+ /* we don't conflict with anyone */
+ return;
+
+ wrl_apply_debit_actual(conn->domain);
+}
+
+void wrl_apply_debit_trans_commit(struct connection *conn)
+{
+ if (wrl_ntransactions <= 1)
+ /* our own transaction appears in the counter */
+ return;
+
+ wrl_apply_debit_actual(conn->domain);
+}
+
/*
* Local variables:
* c-file-style: "linux"
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 2554423..561ab5d 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -65,4 +65,31 @@ void domain_watch_inc(struct connection *conn);
void domain_watch_dec(struct connection *conn);
int domain_watch(struct connection *conn);
+/* Write rate limiting */
+
+#define WRL_FACTOR 1000 /* for fixed-point arithmetic */
+#define WRL_RATE 200
+#define WRL_DBURST 10
+#define WRL_GBURST 1000
+#define WRL_NEWDOMS 5
+#define WRL_LOGEVERY 120 /* seconds */
+
+struct wrl_timestampt {
+ time_t sec;
+ int msec;
+};
+
+extern long wrl_ntransactions;
+
+void wrl_gettime_now(struct wrl_timestampt *now_ts);
+void wrl_domain_new(struct domain *domain);
+void wrl_domain_destroy(struct domain *domain);
+void wrl_credit_update(struct domain *domain, struct wrl_timestampt now);
+void wrl_check_timeout(struct domain *domain,
+ struct wrl_timestampt now,
+ int *ptimeout);
+void wrl_log_periodic(struct wrl_timestampt now);
+void wrl_apply_debit_direct(struct connection *conn);
+void wrl_apply_debit_trans_commit(struct connection *conn);
+
#endif /* _XENSTORED_DOMAIN_H */
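
As I read these defaults (WRL_FACTOR being only the fixed-point scale):
unprivileged domains share a sustained budget of 200 writes per second,
each domain may burst up to 10 writes, the shared reserve may hold up
to 1000 writes' worth of unused credit, newly created domains may
collectively drive the reserve up to 5 * 10 = 50 writes' worth into
debt, and once limiting has kicked in, a "no longer in force" notice
follows after 120 quiet seconds.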
diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
index 84cb0bf..5059a11 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -120,6 +120,7 @@ static int destroy_transaction(void *_transaction)
{
struct transaction *trans = _transaction;
+ wrl_ntransactions--;
trace_destroy(trans, "transaction");
if (trans->tdb)
tdb_close(trans->tdb);
@@ -183,6 +184,7 @@ void do_transaction_start(struct connection *conn, struct buffered_data *in)
talloc_steal(conn, trans);
talloc_set_destructor(trans, destroy_transaction);
conn->transaction_started++;
+ wrl_ntransactions++;
snprintf(id_str, sizeof(id_str), "%u", trans->id);
send_reply(conn, XS_TRANSACTION_START, id_str, strlen(id_str)+1);
@@ -218,6 +220,9 @@ void do_transaction_end(struct connection *conn, struct buffered_data *in)
send_error(conn, EAGAIN);
return;
}
+
+ wrl_apply_debit_trans_commit(conn);
+
if (!replace_tdb(trans->tdb_name, trans->tdb)) {
send_error(conn, errno);
return;
diff --git a/xen/Makefile b/xen/Makefile
index 22d1361..25bd1f3 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -2,7 +2,7 @@
# All other places this is stored (eg. compile.h) should be autogenerated.
export XEN_VERSION = 4
export XEN_SUBVERSION = 8
-export XEN_EXTRAVERSION ?= .1-pre$(XEN_VENDORVERSION)
+export XEN_EXTRAVERSION ?= .1$(XEN_VENDORVERSION)
export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
-include xen-version
diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c
index b9c2b3a..fdf5911 100644
--- a/xen/arch/arm/alternative.c
+++ b/xen/arch/arm/alternative.c
@@ -25,6 +25,7 @@
#include <xen/vmap.h>
#include <xen/smp.h>
#include <xen/stop_machine.h>
+#include <xen/virtual_region.h>
#include <asm/alternative.h>
#include <asm/atomic.h>
#include <asm/byteorder.h>
@@ -155,8 +156,12 @@ static int __apply_alternatives_multi_stop(void *unused)
int ret;
struct alt_region region;
mfn_t xen_mfn = _mfn(virt_to_mfn(_start));
- unsigned int xen_order = get_order_from_bytes(_end - _start);
+ paddr_t xen_size = _end - _start;
+ unsigned int xen_order = get_order_from_bytes(xen_size);
void *xenmap;
+ struct virtual_region patch_region = {
+ .list = LIST_HEAD_INIT(patch_region.list),
+ };
BUG_ON(patched);
@@ -170,6 +175,15 @@ static int __apply_alternatives_multi_stop(void *unused)
BUG_ON(!xenmap);
/*
+ * If we generate a new branch instruction, the target will be
+ * calculated in this re-mapped Xen region. So we have to register
+ * this re-mapped Xen region as a virtual region temporarily.
+ */
+ patch_region.start = xenmap;
+ patch_region.end = xenmap + xen_size;
+ register_virtual_region(&patch_region);
+
+ /*
* Find the virtual address of the alternative region in the new
* mapping.
* alt_instr contains relative offset, so the function
@@ -183,6 +197,8 @@ static int __apply_alternatives_multi_stop(void *unused)
/* The patching is not expected to fail during boot. */
BUG_ON(ret != 0);
+ unregister_virtual_region(&patch_region);
+
vunmap(xenmap);
/* Barriers provided by the cache flushing */
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e8a400c..418b1cc 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -48,20 +48,6 @@ struct map_range_data
p2m_type_t p2mt;
};
-static const struct dt_device_match dev_map_attrs[] __initconst =
-{
- {
- __DT_MATCH_COMPATIBLE("mmio-sram"),
- __DT_MATCH_PROP("no-memory-wc"),
- .data = (void *) (uintptr_t) p2m_mmio_direct_dev,
- },
- {
- __DT_MATCH_COMPATIBLE("mmio-sram"),
- .data = (void *) (uintptr_t) p2m_mmio_direct_nc,
- },
- { /* sentinel */ },
-};
-
//#define DEBUG_11_ALLOCATION
#ifdef DEBUG_11_ALLOCATION
# define D11PRINT(fmt, args...) printk(XENLOG_DEBUG fmt, ##args)
@@ -1159,21 +1145,6 @@ static int handle_device(struct domain *d, struct dt_device_node *dev,
return 0;
}
-static p2m_type_t lookup_map_attr(struct dt_device_node *node,
- p2m_type_t parent_p2mt)
-{
- const struct dt_device_match *r;
-
- /* Search and if nothing matches, use the parent's attributes. */
- r = dt_match_node(dev_map_attrs, node);
-
- /*
- * If this node does not dictate specific mapping attributes,
- * it inherits its parent's attributes.
- */
- return r ? (uintptr_t) r->data : parent_p2mt;
-}
-
static int handle_node(struct domain *d, struct kernel_info *kinfo,
struct dt_device_node *node,
p2m_type_t p2mt)
@@ -1264,7 +1235,6 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
"WARNING: Path %s is reserved, skip the node as we may re-use the path.\n",
path);
- p2mt = lookup_map_attr(node, p2mt);
res = handle_device(d, node, p2mt);
if ( res)
return res;
@@ -1319,7 +1289,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
static int prepare_dtb(struct domain *d, struct kernel_info *kinfo)
{
- const p2m_type_t default_p2mt = p2m_mmio_direct_dev;
+ const p2m_type_t default_p2mt = p2m_mmio_direct_c;
const void *fdt;
int new_size;
int ret;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 63c744a..a5348f2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -205,7 +205,10 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
*/
if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
!test_bit(_IRQ_DISABLED, &desc->status) )
+ {
+ vgic_unlock_rank(v_target, rank, flags);
return -EBUSY;
+ }
}
clear_bit(_IRQ_GUEST, &desc->status);
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 06d4843..508028b 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -477,26 +477,32 @@ int route_irq_to_guest(struct domain *d, unsigned int virq,
*/
if ( desc->action != NULL )
{
- struct domain *ad = irq_get_domain(desc);
-
- if ( test_bit(_IRQ_GUEST, &desc->status) && d == ad )
+ if ( test_bit(_IRQ_GUEST, &desc->status) )
{
- if ( irq_get_guest_info(desc)->virq != virq )
+ struct domain *ad = irq_get_domain(desc);
+
+ if ( d == ad )
+ {
+ if ( irq_get_guest_info(desc)->virq != virq )
+ {
+ printk(XENLOG_G_ERR
+ "d%u: IRQ %u is already assigned to vIRQ %u\n",
+ d->domain_id, irq, irq_get_guest_info(desc)->virq);
+ retval = -EBUSY;
+ }
+ }
+ else
{
- printk(XENLOG_G_ERR
- "d%u: IRQ %u is already assigned to vIRQ %u\n",
- d->domain_id, irq, irq_get_guest_info(desc)->virq);
+ printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
+ irq, ad->domain_id);
retval = -EBUSY;
}
- goto out;
}
-
- if ( test_bit(_IRQ_GUEST, &desc->status) )
- printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
- irq, ad->domain_id);
else
+ {
printk(XENLOG_G_ERR "IRQ %u is already used by Xen\n", irq);
- retval = -EBUSY;
+ retval = -EBUSY;
+ }
goto out;
}
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 99588a3..596283f 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -390,6 +390,16 @@ void flush_page_to_ram(unsigned long mfn)
clean_and_invalidate_dcache_va_range(v, PAGE_SIZE);
unmap_domain_page(v);
+
+ /*
+ * For some of the instruction cache (such as VIPT), the entire I-Cache
+ * needs to be flushed to guarantee that all the aliases of a given
+ * physical address will be removed from the cache.
+ * Invalidating the I-Cache by VA highly depends on the behavior of the
+ * I-Cache (See D4.9.2 in ARM DDI 0487A.k_iss10775). Instead of using flush
+ * by VA on select platforms, we just flush the entire cache here.
+ */
+ invalidate_icache();
}
void __init arch_init_memory(void)
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index cc5634b..c7c726b 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -135,13 +135,12 @@ void p2m_restore_state(struct vcpu *n)
{
register_t hcr;
struct p2m_domain *p2m = &n->domain->arch.p2m;
+ uint8_t *last_vcpu_ran;
if ( is_idle_vcpu(n) )
return;
hcr = READ_SYSREG(HCR_EL2);
- WRITE_SYSREG(hcr & ~HCR_VM, HCR_EL2);
- isb();
WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
isb();
@@ -156,6 +155,17 @@ void p2m_restore_state(struct vcpu *n)
WRITE_SYSREG(hcr, HCR_EL2);
isb();
+
+ last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];
+
+ /*
+ * Flush local TLB for the domain to prevent wrong TLB translation
+ * when running multiple vCPUs of the same domain on a single pCPU.
+ */
+ if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
+ flush_tlb_local();
+
+ *last_vcpu_ran = n->vcpu_id;
}
static void p2m_flush_tlb(struct p2m_domain *p2m)
@@ -734,6 +744,7 @@ static void p2m_free_entry(struct p2m_domain *p2m,
unsigned int i;
lpae_t *table;
mfn_t mfn;
+ struct page_info *pg;
/* Nothing to do if the entry is invalid. */
if ( !p2m_valid(entry) )
@@ -771,7 +782,10 @@ static void p2m_free_entry(struct p2m_domain *p2m,
mfn = _mfn(entry.p2m.base);
ASSERT(mfn_valid(mfn_x(mfn)));
- free_domheap_page(mfn_to_page(mfn_x(mfn)));
+ pg = mfn_to_page(mfn_x(mfn));
+
+ page_list_del(pg, &p2m->pages);
+ free_domheap_page(pg);
}
static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
@@ -982,9 +996,10 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
/*
* The radix-tree can only work on 4KB. This is only used when
- * memaccess is enabled.
+ * memaccess is enabled and during shutdown.
*/
- ASSERT(!p2m->mem_access_enabled || page_order == 0);
+ ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
+ p2m->domain->is_dying);
/*
* The access type should always be p2m_access_rwx when the mapping
* is removed.
@@ -1176,7 +1191,7 @@ int map_dev_mmio_region(struct domain *d,
if ( !(nr && iomem_access_permitted(d, mfn_x(mfn), mfn_x(mfn) + nr - 1)) )
return 0;
- res = map_mmio_regions(d, gfn, nr, mfn);
+ res = p2m_insert_mapping(d, gfn, nr, mfn, p2m_mmio_direct_c);
if ( res < 0 )
{
printk(XENLOG_G_ERR "Unable to map MFNs [%#"PRI_mfn" - %#"PRI_mfn" in Dom%d\n",
@@ -1308,6 +1323,7 @@ int p2m_init(struct domain *d)
{
struct p2m_domain *p2m = &d->arch.p2m;
int rc = 0;
+ unsigned int cpu;
rwlock_init(&p2m->lock);
INIT_PAGE_LIST_HEAD(&p2m->pages);
@@ -1336,6 +1352,17 @@ int p2m_init(struct domain *d)
rc = p2m_alloc_table(d);
+ /*
+ * Make sure that the type chosen is able to store any vCPU ID between
+ * 0 and the maximum number of virtual CPUs supported, as well as
+ * INVALID_VCPU_ID.
+ */
+ BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < MAX_VIRT_CPUS);
+ BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < INVALID_VCPU_ID);
+
+ for_each_possible_cpu(cpu)
+ p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
+
return rc;
}
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 7966b5e..34ee97e 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -147,7 +147,7 @@ int __init psci_init_0_2(void)
psci_ver = call_smc(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
/* For the moment, we only support PSCI 0.2 and PSCI 1.x */
- if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver != 1) )
+ if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver) != 1 )
{
printk("Error: Unrecognized PSCI version %u.%u\n",
PSCI_VERSION_MAJOR(psci_ver), PSCI_VERSION_MINOR(psci_ver));
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 38eb888..861c39e 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -784,6 +784,8 @@ void __init start_xen(unsigned long boot_phys_offset,
smp_init_cpus();
cpus = smp_get_max_cpus();
+ printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", cpus);
+ nr_cpu_ids = cpus;
init_xen_time();
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 8ff73fe..90aba2a 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -101,6 +101,19 @@ static int debug_stack_lines = 40;
integer_param("debug_stack_lines", debug_stack_lines);
+static enum {
+ TRAP,
+ NATIVE,
+} vwfi;
+
+static void __init parse_vwfi(const char *s)
+{
+ if ( !strcmp(s, "native") )
+ vwfi = NATIVE;
+ else
+ vwfi = TRAP;
+}
+custom_param("vwfi", parse_vwfi);
void init_traps(void)
{
@@ -127,8 +140,8 @@ void init_traps(void)
/* Setup hypervisor traps */
WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
- HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,
- HCR_EL2);
+ (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
+ HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB, HCR_EL2);
isb();
}
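
(For testing: vwfi is a Xen command line parameter, so booting the
hypervisor with, say, vwfi=native leaves WFI/WFE untrapped, while any
other value - or omitting the option - keeps the previous behaviour of
trapping them via HCR_TWI|HCR_TWE.)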
@@ -643,7 +656,7 @@ static const char *mode_string(uint32_t cpsr)
};
mode = cpsr & PSR_MODE_MASK;
- if ( mode > ARRAY_SIZE(mode_strings) )
+ if ( mode >= ARRAY_SIZE(mode_strings) )
return "Unknown";
return mode_strings[mode] ? : "Unknown";
}
@@ -2280,6 +2293,20 @@ static void do_sysreg(struct cpu_user_regs *regs,
return inject_undef64_exception(regs, hsr.len);
/*
+ * ICC_SRE_EL2.Enable = 0
+ *
+ * GIC Architecture Specification (IHI 0069C): Section 8.1.9
+ */
+ case HSR_SYSREG_ICC_SRE_EL1:
+ /*
+ * Trapped when the guest is using GICv2 whilst the platform
+ * interrupt controller is GICv3. In this case, the register
+ * should be emulated as RAZ/WI to tell the guest to use the GIC
+ * memory mapped interface (i.e. GICv2 compatibility).
+ */
+ return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
+
+ /*
* HCR_EL2.TIDCP
*
* ARMv8 (DDI 0487A.d): D1-1501 Table D1-43
diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index c6d280e..92188a2 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -79,7 +79,7 @@ static uint32_t vgic_fetch_itargetsr(struct vgic_irq_rank *rank,
offset &= ~(NR_TARGETS_PER_ITARGETSR - 1);
for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++ )
- reg |= (1 << rank->vcpu[offset]) << (i * NR_BITS_PER_TARGET);
+ reg |= (1 << read_atomic(&rank->vcpu[offset])) << (i * NR_BITS_PER_TARGET);
return reg;
}
@@ -152,7 +152,7 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
/* The vCPU ID always starts from 0 */
new_target--;
- old_target = rank->vcpu[offset];
+ old_target = read_atomic(&rank->vcpu[offset]);
/* Only migrate the vIRQ if the target vCPU has changed */
if ( new_target != old_target )
@@ -162,7 +162,7 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
virq);
}
- rank->vcpu[offset] = new_target;
+ write_atomic(&rank->vcpu[offset], new_target);
}
}
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index ec038a3..2d71cac 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -107,7 +107,7 @@ static uint64_t vgic_fetch_irouter(struct vgic_irq_rank *rank,
/* Get the index in the rank */
offset &= INTERRUPT_RANK_MASK;
- return vcpuid_to_vaffinity(rank->vcpu[offset]);
+ return vcpuid_to_vaffinity(read_atomic(&rank->vcpu[offset]));
}
/*
@@ -135,7 +135,7 @@ static void vgic_store_irouter(struct domain *d, struct vgic_irq_rank *rank,
offset &= virq & INTERRUPT_RANK_MASK;
new_vcpu = vgic_v3_irouter_to_vcpu(d, irouter);
- old_vcpu = d->vcpu[rank->vcpu[offset]];
+ old_vcpu = d->vcpu[read_atomic(&rank->vcpu[offset])];
/*
* From the spec (see 8.9.13 in IHI 0069A), any write with an
@@ -153,7 +153,7 @@ static void vgic_store_irouter(struct domain *d, struct vgic_irq_rank *rank,
if ( new_vcpu != old_vcpu )
vgic_migrate_irq(old_vcpu, new_vcpu, virq);
- rank->vcpu[offset] = new_vcpu->vcpu_id;
+ write_atomic(&rank->vcpu[offset], new_vcpu->vcpu_id);
}
static inline bool vgic_reg64_check_access(struct hsr_dabt dabt)
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 0965119..d12e6f0 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -85,7 +85,7 @@ static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index,
rank->index = index;
for ( i = 0; i < NR_INTERRUPT_PER_RANK; i++ )
- rank->vcpu[i] = vcpu;
+ write_atomic(&rank->vcpu[i], vcpu);
}
int domain_vgic_register(struct domain *d, int *mmio_count)
@@ -218,28 +218,11 @@ int vcpu_vgic_free(struct vcpu *v)
return 0;
}
-/* The function should be called by rank lock taken. */
-static struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
-{
- struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
-
- ASSERT(spin_is_locked(&rank->lock));
-
- return v->domain->vcpu[rank->vcpu[virq & INTERRUPT_RANK_MASK]];
-}
-
-/* takes the rank lock */
struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
{
- struct vcpu *v_target;
struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
- unsigned long flags;
-
- vgic_lock_rank(v, rank, flags);
- v_target = __vgic_get_target_vcpu(v, virq);
- vgic_unlock_rank(v, rank, flags);
-
- return v_target;
+ int target = read_atomic(&rank->vcpu[virq & INTERRUPT_RANK_MASK]);
+ return v->domain->vcpu[target];
}
static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
@@ -326,7 +309,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
irq = i + (32 * n);
- v_target = __vgic_get_target_vcpu(v, irq);
+ v_target = vgic_get_target_vcpu(v, irq);
p = irq_to_pending(v_target, irq);
clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
gic_remove_from_queues(v_target, irq);
@@ -368,7 +351,7 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
irq = i + (32 * n);
- v_target = __vgic_get_target_vcpu(v, irq);
+ v_target = vgic_get_target_vcpu(v, irq);
p = irq_to_pending(v_target, irq);
set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index eae643f..093856a 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1315,16 +1315,24 @@ static inline int check_segment(struct segment_register *reg,
return 0;
}
- if ( seg != x86_seg_tr && !reg->attr.fields.s )
+ if ( seg == x86_seg_tr )
{
- gprintk(XENLOG_ERR,
- "System segment provided for a code or data segment\n");
- return -EINVAL;
- }
+ if ( reg->attr.fields.s )
+ {
+ gprintk(XENLOG_ERR, "Code or data segment provided for TR\n");
+ return -EINVAL;
+ }
- if ( seg == x86_seg_tr && reg->attr.fields.s )
+ if ( reg->attr.fields.type != SYS_DESC_tss_busy )
+ {
+ gprintk(XENLOG_ERR, "Non-32-bit-TSS segment provided for TR\n");
+ return -EINVAL;
+ }
+ }
+ else if ( !reg->attr.fields.s )
{
- gprintk(XENLOG_ERR, "Code or data segment provided for TR\n");
+ gprintk(XENLOG_ERR,
+ "System segment provided for a code or data segment\n");
return -EINVAL;
}
@@ -1387,7 +1395,8 @@ int arch_set_info_hvm_guest(struct vcpu *v, const vcpu_hvm_context_t *ctx)
#define SEG(s, r) ({ \
s = (struct segment_register){ .base = (r)->s ## _base, \
.limit = (r)->s ## _limit, \
- .attr.bytes = (r)->s ## _ar }; \
+ .attr.bytes = (r)->s ## _ar | \
+ (x86_seg_##s != x86_seg_tr ? 1 : 2) }; \
check_segment(&s, x86_seg_ ## s); })
rc = SEG(cs, regs);
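
(The constant OR-ed into attr.bytes here is the low part of the
segment type field: for code and data segments it pre-sets the
"accessed" bit, and for TR it turns an "available TSS" type into the
busy one that the reworked check_segment() now requires - that is my
reading of the descriptor layout, at any rate.)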
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 388c4ea..d11b9c4 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -13,7 +13,11 @@ static struct file __initdata ucode;
static multiboot_info_t __initdata mbi = {
.flags = MBI_MODULES | MBI_LOADERNAME
};
-static module_t __initdata mb_modules[3];
+/*
+ * The array size needs to be one larger than the number of modules we
+ * support - see __start_xen().
+ */
+static module_t __initdata mb_modules[5];
static void __init edd_put_string(u8 *dst, size_t n, const char *src)
{
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f8ef6e5..6c30bec 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -387,13 +387,20 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
}
delta_tsc = guest_tsc - tsc;
- v->arch.hvm_vcpu.msr_tsc_adjust += delta_tsc
- - v->arch.hvm_vcpu.cache_tsc_offset;
v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc;
hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, at_tsc);
}
+static void hvm_set_guest_tsc_msr(struct vcpu *v, u64 guest_tsc)
+{
+ uint64_t tsc_offset = v->arch.hvm_vcpu.cache_tsc_offset;
+
+ hvm_set_guest_tsc(v, guest_tsc);
+ v->arch.hvm_vcpu.msr_tsc_adjust += v->arch.hvm_vcpu.cache_tsc_offset
+ - tsc_offset;
+}
+
void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
{
v->arch.hvm_vcpu.cache_tsc_offset += tsc_adjust
@@ -3940,7 +3947,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
break;
case MSR_IA32_TSC:
- hvm_set_guest_tsc(v, msr_content);
+ hvm_set_guest_tsc_msr(v, msr_content);
break;
case MSR_IA32_TSC_ADJUST:
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 228dac1..cc448e7 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -776,17 +776,19 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
if ( v->domain != d )
v = d->vcpu ? d->vcpu[0] : NULL;
- if ( !mfn_valid(mfn_x(mfn)) ||
- rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
- {
- *ipat = 1;
- return MTRR_TYPE_UNCACHABLE;
- }
-
+ /* Mask, not add, for order so it works with INVALID_MFN on unmapping */
if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+ {
+ if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+ {
+ *ipat = 1;
+ return MTRR_TYPE_UNCACHABLE;
+ }
+ /* Force invalid memory type so resolve_misconfig() will split it */
return -1;
+ }
if ( direct_mmio )
{
@@ -798,6 +800,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
return MTRR_TYPE_WRBACK;
}
+ if ( !mfn_valid(mfn_x(mfn)) )
+ {
+ *ipat = 1;
+ return MTRR_TYPE_UNCACHABLE;
+ }
+
if ( !need_iommu(d) && !cache_flush_permitted(d) )
{
*ipat = 1;
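
(The masking trick matters because INVALID_MFN is all ones:
mfn_x(mfn) | ((1UL << order) - 1) leaves it all ones, so the
mmio_ro_ranges checks stay out of the way, whereas adding
(1UL << order) - 1 would wrap it around to a small, bogus frame
number. That is my understanding of the "Mask, not add" comment, at
least.)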
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 37bd6c4..8edc846 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -353,7 +353,7 @@ static void svm_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
data->msr_cstar = vmcb->cstar;
data->msr_syscall_mask = vmcb->sfmask;
data->msr_efer = v->arch.hvm_vcpu.guest_efer;
- data->msr_flags = -1ULL;
+ data->msr_flags = 0;
}
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 9ea014f..f982fc9 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -72,6 +72,9 @@ static int construct_vmcb(struct vcpu *v)
struct arch_svm_struct *arch_svm = &v->arch.hvm_svm;
struct vmcb_struct *vmcb = arch_svm->vmcb;
+ /* Build-time check of the size of VMCB AMD structure. */
+ BUILD_BUG_ON(sizeof(*vmcb) != PAGE_SIZE);
+
vmcb->_general1_intercepts =
GENERAL1_INTERCEPT_INTR | GENERAL1_INTERCEPT_NMI |
GENERAL1_INTERCEPT_SMI | GENERAL1_INTERCEPT_INIT |
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 0995496..4646ecc 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v)
local_irq_restore(flags);
}
+void vmx_vmcs_reload(struct vcpu *v)
+{
+ /*
+ * As we may be running with interrupts disabled, we can't acquire
+ * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+ * the VMCS can't be taken away from us anymore if we still own it.
+ */
+ ASSERT(v->is_running || !local_irq_is_enabled());
+ if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+ return;
+
+ vmx_load_vmcs(v);
+}
+
int vmx_cpu_up_prepare(unsigned int cpu)
{
/*
@@ -1090,6 +1104,9 @@ static int construct_vmcs(struct vcpu *v)
vmx_disable_intercept_for_msr(v, MSR_IA32_BNDCFGS, MSR_TYPE_R | MSR_TYPE_W);
}
+ /* All guest MSR state is dirty. */
+ v->arch.hvm_vmx.msr_state.flags = ((1u << VMX_MSR_COUNT) - 1);
+
/* I/O access bitmap. */
__vmwrite(IO_BITMAP_A, __pa(d->arch.hvm_domain.io_bitmap));
__vmwrite(IO_BITMAP_B, __pa(d->arch.hvm_domain.io_bitmap) + PAGE_SIZE);
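
(The ((1u << VMX_MSR_COUNT) - 1) expression sets one dirty bit per
guest MSR slot - with the three slots visible in this debdiff (LSTAR,
STAR, SYSCALL_MASK) that would be 0x7 - so that every entry gets
reloaded into hardware; the same mask reappears in vmx_load_cpu_state
below.)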
@@ -1652,10 +1669,7 @@ void vmx_do_resume(struct vcpu *v)
bool_t debug_state;
if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
- {
- if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
- vmx_load_vmcs(v);
- }
+ vmx_vmcs_reload(v);
else
{
/*
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7b2c50c..9a42e2e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -739,13 +739,12 @@ static int vmx_vmcs_restore(struct vcpu *v, struct hvm_hw_cpu *c)
static void vmx_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
{
struct vmx_msr_state *guest_state = &v->arch.hvm_vmx.msr_state;
- unsigned long guest_flags = guest_state->flags;
data->shadow_gs = v->arch.hvm_vmx.shadow_gs;
data->msr_cstar = v->arch.hvm_vmx.cstar;
/* save msrs */
- data->msr_flags = guest_flags;
+ data->msr_flags = 0;
data->msr_lstar = guest_state->msrs[VMX_INDEX_MSR_LSTAR];
data->msr_star = guest_state->msrs[VMX_INDEX_MSR_STAR];
data->msr_syscall_mask = guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK];
@@ -756,7 +755,7 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
struct vmx_msr_state *guest_state = &v->arch.hvm_vmx.msr_state;
/* restore msrs */
- guest_state->flags = data->msr_flags & 7;
+ guest_state->flags = ((1u << VMX_MSR_COUNT) - 1);
guest_state->msrs[VMX_INDEX_MSR_LSTAR] = data->msr_lstar;
guest_state->msrs[VMX_INDEX_MSR_STAR] = data->msr_star;
guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK] = data->msr_syscall_mask;
@@ -896,6 +895,18 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
if ( unlikely(!this_cpu(vmxon)) )
return;
+ if ( !v->is_running )
+ {
+ /*
+ * When this vCPU isn't marked as running anymore, a remote pCPU's
+ * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+ * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+ * away the VMCS from us. As we're running with interrupts disabled,
+ * we also can't call vmx_vmcs_enter().
+ */
+ vmx_vmcs_reload(v);
+ }
+
vmx_fpu_leave(v);
vmx_save_guest_msrs(v);
vmx_restore_host_msrs();
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 3b025d5..9e246b6 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -452,7 +452,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
mfn |= _PAGE_PSE_PAT >> PAGE_SHIFT;
}
else
- mfn &= ~(_PAGE_PSE_PAT >> PAGE_SHIFT);
+ mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
flags |= _PAGE_PSE;
}
e = l1e_from_pfn(mfn, flags);
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6a45185..162120c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2048,7 +2048,8 @@ p2m_flush_table(struct p2m_domain *p2m)
ASSERT(page_list_empty(&p2m->pod.super));
ASSERT(page_list_empty(&p2m->pod.single));
- if ( p2m->np2m_base == P2M_BASE_EADDR )
+ /* No need to flush if it's already empty */
+ if ( p2m_is_nestedp2m(p2m) && p2m->np2m_base == P2M_BASE_EADDR )
{
p2m_unlock(p2m);
return;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b130671..1bfe4ce 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -890,6 +890,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
mod[i].reserved = 0;
}
+ if ( efi_enabled )
+ {
+ /*
+ * This needs to remain in sync with xen_in_range() and the
+ * respective reserve_e820_ram() invocation below.
+ */
+ mod[mbi->mods_count].mod_start = PFN_DOWN(mbi->mem_upper);
+ mod[mbi->mods_count].mod_end = __pa(__2M_rwdata_end) -
+ (mbi->mem_upper & PAGE_MASK);
+ }
+
modules_headroom = bzimage_headroom(bootstrap_map(mod), mod->mod_end);
bootstrap_map(NULL);
@@ -925,7 +936,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
1UL << (PAGE_SHIFT + 32)) )
e = min(HYPERVISOR_VIRT_END - DIRECTMAP_VIRT_START,
1UL << (PAGE_SHIFT + 32));
-#define reloc_size ((__pa(&_end) + mask) & ~mask)
+#define reloc_size ((__pa(__2M_rwdata_end) + mask) & ~mask)
/* Is the region suitable for relocating Xen? */
if ( !xen_phys_start && e <= limit )
{
@@ -1070,8 +1081,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
if ( mod[j].reserved )
continue;
- /* Don't overlap with other modules. */
- end = consider_modules(s, e, size, mod, mbi->mods_count, j);
+ /* Don't overlap with other modules (or Xen itself). */
+ end = consider_modules(s, e, size, mod,
+ mbi->mods_count + efi_enabled, j);
if ( highmem_start && end > highmem_start )
continue;
@@ -1096,9 +1108,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
*/
while ( !kexec_crash_area.start )
{
- /* Don't overlap with modules. */
- e = consider_modules(s, e, PAGE_ALIGN(kexec_crash_area.size),
- mod, mbi->mods_count, -1);
+ /* Don't overlap with modules (or Xen itself). */
+ e = consider_modules(s, e, PAGE_ALIGN(kexec_crash_area.size), mod,
+ mbi->mods_count + efi_enabled, -1);
if ( s >= e )
break;
if ( e > kexec_crash_area_limit )
@@ -1122,8 +1134,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
if ( !xen_phys_start )
panic("Not enough memory to relocate Xen.");
- reserve_e820_ram(&boot_e820, efi_enabled ? mbi->mem_upper : __pa(&_start),
- __pa(&_end));
+
+ /* This needs to remain in sync with xen_in_range(). */
+ reserve_e820_ram(&boot_e820, efi_enabled ? mbi->mem_upper : __pa(_stext),
+ __pa(__2M_rwdata_end));
/* Late kexec reservation (dynamic start address). */
kexec_reserve_area(&boot_e820);
@@ -1672,7 +1686,7 @@ int __hwdom_init xen_in_range(unsigned long mfn)
paddr_t start, end;
int i;
- enum { region_s3, region_text, region_bss, nr_regions };
+ enum { region_s3, region_ro, region_rw, nr_regions };
static struct {
paddr_t s, e;
} xen_regions[nr_regions] __hwdom_initdata;
@@ -1683,12 +1697,20 @@ int __hwdom_init xen_in_range(unsigned long mfn)
/* S3 resume code (and other real mode trampoline code) */
xen_regions[region_s3].s = bootsym_phys(trampoline_start);
xen_regions[region_s3].e = bootsym_phys(trampoline_end);
- /* hypervisor code + data */
- xen_regions[region_text].s =__pa(&_stext);
- xen_regions[region_text].e = __pa(&__init_begin);
- /* bss */
- xen_regions[region_bss].s = __pa(&__bss_start);
- xen_regions[region_bss].e = __pa(&__bss_end);
+
+ /*
+ * This needs to remain in sync with the uses of the same symbols in
+ * - __start_xen() (above)
+ * - is_xen_fixed_mfn()
+ * - tboot_shutdown()
+ */
+
+ /* hypervisor .text + .rodata */
+ xen_regions[region_ro].s = __pa(&_stext);
+ xen_regions[region_ro].e = __pa(&__2M_rodata_end);
+ /* hypervisor .data + .bss */
+ xen_regions[region_rw].s = __pa(&__2M_rwdata_start);
+ xen_regions[region_rw].e = __pa(&__2M_rwdata_end);
}
start = (paddr_t)mfn << PAGE_SHIFT;
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index e5d7c42..562efcd 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -12,6 +12,7 @@
#include <asm/processor.h>
#include <asm/e820.h>
#include <asm/tboot.h>
+#include <asm/setup.h>
#include <crypto/vmac.h>
/* tboot=<physical address of shared page> */
@@ -282,7 +283,7 @@ static void tboot_gen_xenheap_integrity(const uint8_t key[TB_KEY_SIZE],
if ( !mfn_valid(mfn) )
continue;
- if ( (mfn << PAGE_SHIFT) < __pa(&_end) )
+ if ( is_xen_fixed_mfn(mfn) )
continue; /* skip Xen */
if ( (mfn >= PFN_DOWN(g_tboot_shared->tboot_base - 3 * PAGE_SIZE))
&& (mfn < PFN_UP(g_tboot_shared->tboot_base
@@ -363,20 +364,22 @@ void tboot_shutdown(uint32_t shutdown_type)
if ( shutdown_type == TB_SHUTDOWN_S3 )
{
/*
- * Xen regions for tboot to MAC
+ * Xen regions for tboot to MAC. This needs to remain in sync with
+ * xen_in_range().
*/
g_tboot_shared->num_mac_regions = 3;
/* S3 resume code (and other real mode trampoline code) */
g_tboot_shared->mac_regions[0].start = bootsym_phys(trampoline_start);
g_tboot_shared->mac_regions[0].size = bootsym_phys(trampoline_end) -
bootsym_phys(trampoline_start);
- /* hypervisor code + data */
+ /* hypervisor .text + .rodata */
g_tboot_shared->mac_regions[1].start = (uint64_t)__pa(&_stext);
- g_tboot_shared->mac_regions[1].size = __pa(&__init_begin) -
+ g_tboot_shared->mac_regions[1].size = __pa(&__2M_rodata_end) -
__pa(&_stext);
- /* bss */
- g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__bss_start);
- g_tboot_shared->mac_regions[2].size = __pa(&__bss_end) - __pa(&__bss_start);
+ /* hypervisor .data + .bss */
+ g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__2M_rwdata_start);
+ g_tboot_shared->mac_regions[2].size = __pa(&__2M_rwdata_end) -
+ __pa(&__2M_rwdata_start);
/*
* MAC domains and other Xen memory
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index b06c456..3dc6f10 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -331,7 +331,11 @@ union vex {
#define copy_REX_VEX(ptr, rex, vex) do { \
if ( (vex).opcx != vex_none ) \
+ { \
+ if ( !mode_64bit() ) \
+ vex.reg |= 8; \
ptr[0] = 0xc4, ptr[1] = (vex).raw[0], ptr[2] = (vex).raw[1]; \
+ } \
else if ( mode_64bit() ) \
ptr[1] = rex | REX_PREFIX; \
} while (0)
@@ -870,15 +874,15 @@ do{ struct fpu_insn_ctxt fic; \
put_fpu(&fic); \
} while (0)
-#define emulate_fpu_insn_stub(_bytes...) \
+#define emulate_fpu_insn_stub(bytes...) \
do { \
- uint8_t *buf = get_stub(stub); \
- unsigned int _nr = sizeof((uint8_t[]){ _bytes }); \
- struct fpu_insn_ctxt fic = { .insn_bytes = _nr }; \
- memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1); \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
- stub.func(); \
- put_fpu(&fic); \
+ unsigned int nr_ = sizeof((uint8_t[]){ bytes }); \
+ struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ }; \
+ memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1); \
+ get_fpu(X86EMUL_FPU_fpu, &fic_); \
+ asm volatile ( "call *%[stub]" : "+m" (fic_) : \
+ [stub] "rm" (stub.func) ); \
+ put_fpu(&fic_); \
put_stub(stub); \
} while (0)
@@ -893,7 +897,7 @@ do { \
"call *%[func];" \
_POST_EFLAGS("[eflags]", "[mask]", "[tmp]") \
: [eflags] "+g" (_regs.eflags), \
- [tmp] "=&r" (tmp_) \
+ [tmp] "=&r" (tmp_), "+m" (fic_) \
: [func] "rm" (stub.func), \
[mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) ); \
put_fpu(&fic_); \
@@ -1356,6 +1360,11 @@ protmode_load_seg(
}
memset(sreg, 0, sizeof(*sreg));
sreg->sel = sel;
+
+ /* Since CPL == SS.DPL, we need to put back DPL. */
+ if ( seg == x86_seg_ss )
+ sreg->attr.fields.dpl = sel;
+
return X86EMUL_OKAY;
}
@@ -2017,16 +2026,21 @@ x86_decode(
default:
BUG(); /* Shouldn't be possible. */
case 2:
- if ( in_realmode(ctxt, ops) || (state->regs->eflags & EFLG_VM) )
+ if ( state->regs->eflags & EFLG_VM )
break;
/* fall through */
case 4:
- if ( modrm_mod != 3 )
+ if ( modrm_mod != 3 || in_realmode(ctxt, ops) )
break;
/* fall through */
case 8:
/* VEX / XOP / EVEX */
generate_exception_if(rex_prefix || vex.pfx, EXC_UD, -1);
+ /*
+ * With operand size override disallowed (see above), op_bytes
+ * should not have changed from its default.
+ */
+ ASSERT(op_bytes == def_op_bytes);
vex.raw[0] = modrm;
if ( b == 0xc5 )
@@ -2053,6 +2067,12 @@ x86_decode(
op_bytes = 8;
}
}
+ else
+ {
+ /* Operand size fixed at 4 (no override via W bit). */
+ op_bytes = 4;
+ vex.b = 1;
+ }
switch ( b )
{
case 0x62:
@@ -2071,7 +2091,7 @@ x86_decode(
break;
}
}
- if ( mode_64bit() && !vex.r )
+ if ( !vex.r )
rex_prefix |= REX_R;
ext = vex.opcx;
@@ -2113,12 +2133,21 @@ x86_decode(
opcode |= b | MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK);
+ if ( !(d & ModRM) )
+ {
+ modrm_reg = modrm_rm = modrm_mod = modrm = 0;
+ break;
+ }
+
modrm = insn_fetch_type(uint8_t);
modrm_mod = (modrm & 0xc0) >> 6;
break;
}
+ }
+ if ( d & ModRM )
+ {
modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3);
modrm_rm = modrm & 0x07;
@@ -2182,6 +2211,17 @@ x86_decode(
break;
}
break;
+ case 0x20: /* mov cr,reg */
+ case 0x21: /* mov dr,reg */
+ case 0x22: /* mov reg,cr */
+ case 0x23: /* mov reg,dr */
+ /*
+ * Mov to/from cr/dr ignore the encoding of Mod, and behave as
+ * if they were encoded as reg/reg instructions. No further
+ * disp/SIB bytes are fetched.
+ */
+ modrm_mod = 3;
+ break;
}
break;
@@ -4730,7 +4770,7 @@ x86_emulate(
case X86EMUL_OPC(0x0f, 0x21): /* mov dr,reg */
case X86EMUL_OPC(0x0f, 0x22): /* mov reg,cr */
case X86EMUL_OPC(0x0f, 0x23): /* mov reg,dr */
- generate_exception_if(ea.type != OP_REG, EXC_UD, -1);
+ ASSERT(ea.type == OP_REG); /* Early operand adjustment ensures this. */
generate_exception_if(!mode_ring0(), EXC_GP, 0);
modrm_reg |= lock_prefix << 3;
if ( b & 2 )
@@ -5050,6 +5090,7 @@ x86_emulate(
}
case X86EMUL_OPC(0x0f, 0xa3): bt: /* bt */
+ generate_exception_if(lock_prefix, EXC_UD, 0);
emulate_2op_SrcV_nobyte("bt", src, dst, _regs.eflags);
dst.type = OP_NONE;
break;
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 993c576..708ce78 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -71,7 +71,7 @@ enum x86_swint_emulation {
* Attribute for segment selector. This is a copy of bit 40:47 & 52:55 of the
* segment descriptor. It happens to match the format of an AMD SVM VMCB.
*/
-typedef union __attribute__((__packed__)) segment_attributes {
+typedef union segment_attributes {
uint16_t bytes;
struct
{
@@ -91,7 +91,7 @@ typedef union __attribute__((__packed__)) segment_attributes {
* Full state of a segment register (visible and hidden portions).
* Again, this happens to match the format of an AMD SVM VMCB.
*/
-struct __attribute__((__packed__)) segment_register {
+struct segment_register {
uint16_t sel;
segment_attributes_t attr;
uint32_t limit;
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 7676de9..1154996 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -299,7 +299,7 @@ SECTIONS
}
ASSERT(__image_base__ > XEN_VIRT_START ||
- _end <= XEN_VIRT_END - NR_CPUS * PAGE_SIZE,
+ __2M_rwdata_end <= XEN_VIRT_END - NR_CPUS * PAGE_SIZE,
"Xen image overlaps stubs area")
#ifdef CONFIG_KEXEC
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 85a0116..a5da858 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -92,7 +92,7 @@ static int setup_xstate_features(bool_t bsp)
if ( bsp )
{
- xstate_features = fls(xfeature_mask);
+ xstate_features = flsl(xfeature_mask);
xstate_offsets = xzalloc_array(unsigned int, xstate_features);
if ( !xstate_offsets )
return -ENOMEM;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 21797ca..17f9e1e 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -437,8 +437,8 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
goto fail_early;
}
- if ( !guest_handle_okay(exch.in.extent_start, exch.in.nr_extents) ||
- !guest_handle_okay(exch.out.extent_start, exch.out.nr_extents) )
+ if ( !guest_handle_subrange_okay(exch.in.extent_start, exch.nr_exchanged,
+ exch.in.nr_extents - 1) )
{
rc = -EFAULT;
goto fail_early;
@@ -448,11 +448,27 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
{
in_chunk_order = exch.out.extent_order - exch.in.extent_order;
out_chunk_order = 0;
+
+ if ( !guest_handle_subrange_okay(exch.out.extent_start,
+ exch.nr_exchanged >> in_chunk_order,
+ exch.out.nr_extents - 1) )
+ {
+ rc = -EFAULT;
+ goto fail_early;
+ }
}
else
{
in_chunk_order = 0;
out_chunk_order = exch.in.extent_order - exch.out.extent_order;
+
+ if ( !guest_handle_subrange_okay(exch.out.extent_start,
+ exch.nr_exchanged << out_chunk_order,
+ exch.out.nr_extents - 1) )
+ {
+ rc = -EFAULT;
+ goto fail_early;
+ }
}
d = rcu_lock_domain_by_any_id(exch.in.domid);
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef8e0d8..6f7860a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -491,12 +491,15 @@ void smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers,
}
/*
- * Clear the bits of all the siblings of cpu from mask.
+ * Clear the bits of all the siblings of cpu from mask (if necessary).
*/
static inline
void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
{
- cpumask_andnot(mask, mask, per_cpu(cpu_sibling_mask, cpu));
+ const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
+
+ if ( cpumask_subset(cpu_siblings, mask) )
+ cpumask_andnot(mask, mask, per_cpu(cpu_sibling_mask, cpu));
}
/*
@@ -510,24 +513,26 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
*/
static int get_fallback_cpu(struct csched2_vcpu *svc)
{
- int cpu;
+ struct vcpu *v = svc->vcpu;
+ int cpu = v->processor;
- if ( likely(cpumask_test_cpu(svc->vcpu->processor,
- svc->vcpu->cpu_hard_affinity)) )
- return svc->vcpu->processor;
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
- &svc->rqd->active);
- cpu = cpumask_first(cpumask_scratch);
- if ( likely(cpu < nr_cpu_ids) )
+ if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
return cpu;
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
- cpupool_domain_cpumask(svc->vcpu->domain));
+ if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+ &svc->rqd->active)) )
+ {
+ cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
+ cpumask_scratch_cpu(cpu));
+ return cpumask_first(cpumask_scratch_cpu(cpu));
+ }
- ASSERT(!cpumask_empty(cpumask_scratch));
+ ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
- return cpumask_first(cpumask_scratch);
+ return cpumask_first(cpumask_scratch_cpu(cpu));
}
/*
@@ -898,6 +903,14 @@ __runq_remove(struct csched2_vcpu *svc)
void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
+static inline void
+tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
+{
+ __cpumask_set_cpu(cpu, &rqd->tickled);
+ smt_idle_mask_clear(cpu, &rqd->smt_idle);
+ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+}
+
/*
* Check what processor it is best to 'wake', for picking up a vcpu that has
* just been put (back) in the runqueue. Logic is as follows:
@@ -941,6 +954,9 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
(unsigned char *)&d);
}
+ cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
+ cpupool_domain_cpumask(new->vcpu->domain));
+
/*
* First of all, consider idle cpus, checking if we can just
* re-use the pcpu where we were running before.
@@ -953,7 +969,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
else
cpumask_copy(&mask, &rqd->smt_idle);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
i = cpumask_test_or_cycle(cpu, &mask);
if ( i < nr_cpu_ids )
{
@@ -968,7 +984,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
* gone through the scheduler yet.
*/
cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
i = cpumask_test_or_cycle(cpu, &mask);
if ( i < nr_cpu_ids )
{
@@ -984,7 +1000,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
*/
cpumask_andnot(&mask, &rqd->active, &rqd->idle);
cpumask_andnot(&mask, &mask, &rqd->tickled);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
if ( cpumask_test_cpu(cpu, &mask) )
{
cur = CSCHED2_VCPU(curr_on_cpu(cpu));
@@ -1062,9 +1078,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
sizeof(d),
(unsigned char *)&d);
}
- __cpumask_set_cpu(ipid, &rqd->tickled);
- smt_idle_mask_clear(ipid, &rqd->smt_idle);
- cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
+
+ tickle_cpu(ipid, rqd);
if ( unlikely(new->tickled_cpu != -1) )
SCHED_STAT_CRANK(tickled_cpu_overwritten);
@@ -1104,18 +1119,28 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
list_for_each( iter, &rqd->svc )
{
+ unsigned int svc_cpu;
struct csched2_vcpu * svc;
int start_credit;
svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+ svc_cpu = svc->vcpu->processor;
ASSERT(!is_idle_vcpu(svc->vcpu));
ASSERT(svc->rqd == rqd);
+ /*
+ * If svc is running, it is our responsibility to make sure, here,
+ * that the credit it has spent so far gets accounted.
+ */
+ if ( svc->vcpu == curr_on_cpu(svc_cpu) )
+ burn_credits(rqd, svc, now);
+
start_credit = svc->credit;
- /* And add INIT * m, avoiding integer multiplication in the
- * common case. */
+ /*
+ * Add INIT * m, avoiding integer multiplication in the common case.
+ */
if ( likely(m==1) )
svc->credit += CSCHED2_CREDIT_INIT;
else
@@ -1378,7 +1403,9 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
SCHED_STAT_CRANK(vcpu_sleep);
if ( curr_on_cpu(vc->processor) == vc )
- cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+ {
+ tickle_cpu(vc->processor, svc->rqd);
+ }
else if ( __vcpu_on_runq(svc) )
{
ASSERT(svc->rqd == RQD(ops, vc->processor));
@@ -1492,7 +1519,7 @@ static int
csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
{
struct csched2_private *prv = CSCHED2_PRIV(ops);
- int i, min_rqi = -1, new_cpu;
+ int i, min_rqi = -1, new_cpu, cpu = vc->processor;
struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
s_time_t min_avgload = MAX_LOAD;
@@ -1512,7 +1539,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
* just grab the prv lock. Instead, we'll have to trylock, and
* do something else reasonable if we fail.
*/
- ASSERT(spin_is_locked(per_cpu(schedule_data, vc->processor).schedule_lock));
+ ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
if ( !read_trylock(&prv->lock) )
{
@@ -1526,6 +1553,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
goto out;
}
+ cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+ cpupool_domain_cpumask(vc->domain));
+
/*
* First check to see if we're here because someone else suggested a place
* for us to move.
@@ -1537,13 +1567,13 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
printk(XENLOG_WARNING "%s: target runqueue disappeared!\n",
__func__);
}
- else
+ else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+ &svc->migrate_rqd->active) )
{
- cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&svc->migrate_rqd->active);
- new_cpu = cpumask_any(cpumask_scratch);
- if ( new_cpu < nr_cpu_ids )
- goto out_up;
+ new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
+ goto out_up;
}
/* Fall-through to normal cpu pick */
}
@@ -1571,12 +1601,12 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
*/
if ( rqd == svc->rqd )
{
- if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
}
else if ( spin_trylock(&rqd->lock) )
{
- if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
rqd_avgload = rqd->b_avgload;
spin_unlock(&rqd->lock);
@@ -1598,9 +1628,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
goto out_up;
}
- cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&prv->rqd[min_rqi].active);
- new_cpu = cpumask_any(cpumask_scratch);
+ new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
BUG_ON(new_cpu >= nr_cpu_ids);
out_up:
@@ -1675,6 +1705,8 @@ static void migrate(const struct scheduler *ops,
struct csched2_runqueue_data *trqd,
s_time_t now)
{
+ int cpu = svc->vcpu->processor;
+
if ( unlikely(tb_init_done) )
{
struct {
@@ -1696,8 +1728,8 @@ static void migrate(const struct scheduler *ops,
svc->migrate_rqd = trqd;
__set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
__set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
- cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
SCHED_STAT_CRANK(migrate_requested);
+ tickle_cpu(cpu, svc->rqd);
}
else
{
@@ -1711,9 +1743,11 @@ static void migrate(const struct scheduler *ops,
}
__runq_deassign(svc);
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+ cpupool_domain_cpumask(svc->vcpu->domain));
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&trqd->active);
- svc->vcpu->processor = cpumask_any(cpumask_scratch);
+ svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
ASSERT(svc->vcpu->processor < nr_cpu_ids);
__runq_assign(svc, trqd);
@@ -1737,8 +1771,14 @@ static void migrate(const struct scheduler *ops,
static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
struct csched2_runqueue_data *rqd)
{
+ struct vcpu *v = svc->vcpu;
+ int cpu = svc->vcpu->processor;
+
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
+
return !(svc->flags & CSFLAG_runq_migrate_request) &&
- cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+ cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
}
static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
@@ -1928,10 +1968,40 @@ static void
csched2_vcpu_migrate(
const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
{
+ struct domain *d = vc->domain;
struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
struct csched2_runqueue_data *trqd;
+ s_time_t now = NOW();
+
+ /*
+ * Being passed a target pCPU which is outside of our cpupool is only
+ * valid if we are shutting down (or doing ACPI suspend), and we are
+ * moving everyone to the BSP, no matter whether or not the BSP is inside our
+ * cpupool.
+ *
+ * And since there indeed is the chance that it is not part of it, all
+ * we must do is remove _and_ unassign the vCPU from any runqueue, as
+ * well as updating v->processor with the target, so that the suspend
+ * process can continue.
+ *
+ * It will then be during resume that a new, meaningful, value for
+ * v->processor will be chosen, and during actual domain unpause that
+ * the vCPU will be assigned to and added to the proper runqueue.
+ */
+ if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
+ {
+ ASSERT(system_state == SYS_STATE_suspend);
+ if ( __vcpu_on_runq(svc) )
+ {
+ __runq_remove(svc);
+ update_load(ops, svc->rqd, NULL, -1, now);
+ }
+ __runq_deassign(svc);
+ vc->processor = new_cpu;
+ return;
+ }
- /* Check if new_cpu is valid */
+ /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity. */
ASSERT(cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
@@ -1946,7 +2016,7 @@ csched2_vcpu_migrate(
* pointing to a pcpu where we can't run any longer.
*/
if ( trqd != svc->rqd )
- migrate(ops, svc, trqd, NOW());
+ migrate(ops, svc, trqd, now);
else
vc->processor = new_cpu;
}
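(The hunks above replace the single global cpumask_scratch with a per-pCPU
scratch mask, so concurrent scheduler operations on different pCPUs cannot
clobber each other's intermediate results, and they now also intersect with
the domain's cpupool mask. A minimal sketch of the resulting pattern, reusing
names from the diff; simplified, not the literal Xen code:

    /* Sketch: pick a target pCPU via this pCPU's private scratch mask. */
    static unsigned int pick_cpu(const struct vcpu *v,
                                 const struct csched2_runqueue_data *rqd)
    {
        unsigned int cpu = v->processor;

        /* hard affinity AND the cpupool's pCPUs AND the runqueue's pCPUs */
        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
                    cpupool_domain_cpumask(v->domain));
        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                    &rqd->active);
        return cpumask_any(cpumask_scratch_cpu(cpu));
    }
)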
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5b444c4..47b2155 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -84,7 +84,27 @@ static struct scheduler __read_mostly ops;
: (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
#define DOM2OP(_d) (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
-#define VCPU2OP(_v) (DOM2OP((_v)->domain))
+static inline struct scheduler *VCPU2OP(const struct vcpu *v)
+{
+ struct domain *d = v->domain;
+
+ if ( likely(d->cpupool != NULL) )
+ return d->cpupool->sched;
+
+ /*
+ * If d->cpupool is NULL, this is a vCPU of the idle domain. And this
+ * case is special because the idle domain does not really belong to
+ * a cpupool and, hence, doesn't really have a scheduler. In fact, its
+ * vCPUs (may) run on pCPUs which are in different pools, with different
+ * schedulers.
+ *
+ * What we want, in this case, is the scheduler of the pCPU where this
+ * particular idle vCPU is running. And, since v->processor never changes
+ * for idle vCPUs, it is safe to use it, with no locks, to figure that out.
+ */
+ ASSERT(is_idle_domain(d));
+ return per_cpu(scheduler, v->processor);
+}
#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
static inline void trace_runstate_change(struct vcpu *v, int new_state)
@@ -633,8 +653,11 @@ void vcpu_force_reschedule(struct vcpu *v)
void restore_vcpu_affinity(struct domain *d)
{
+ unsigned int cpu = smp_processor_id();
struct vcpu *v;
+ ASSERT(system_state == SYS_STATE_resume);
+
for_each_vcpu ( d, v )
{
spinlock_t *lock = vcpu_schedule_lock_irq(v);
@@ -643,18 +666,34 @@ void restore_vcpu_affinity(struct domain *d)
{
cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
v->affinity_broken = 0;
+
}
- if ( v->processor == smp_processor_id() )
+ /*
+ * During suspend (in cpu_disable_scheduler()), we moved every vCPU
+ * to BSP (which, as of now, is pCPU 0), as a temporary measure to
+ * allow the nonboot processors to have their data structure freed
+ * and go to sleep. But nothing guarantees that the BSP is a valid
+ * pCPU for a particular domain.
+ *
+ * Therefore, here, before actually unpausing the domains, we should
+ * set v->processor of each of their vCPUs to something that will
+ * make sense for the scheduler of the cpupool they are in.
+ */
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
+ v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+
+ if ( v->processor == cpu )
{
set_bit(_VPF_migrating, &v->pause_flags);
- vcpu_schedule_unlock_irq(lock, v);
+ spin_unlock_irq(lock);
vcpu_sleep_nosync(v);
vcpu_migrate(v);
}
else
{
- vcpu_schedule_unlock_irq(lock, v);
+ spin_unlock_irq(lock);
}
}
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index d793f5d..5e81813 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -244,8 +244,7 @@ void iommu_domain_destroy(struct domain *d)
if ( !iommu_enabled || !dom_iommu(d)->platform_ops )
return;
- if ( need_iommu(d) )
- iommu_teardown(d);
+ iommu_teardown(d);
arch_iommu_domain_destroy(d);
}
diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index ba61f65..6a92f53 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -46,6 +46,8 @@
#define MAX_VIRT_CPUS 8
#endif
+#define INVALID_VCPU_ID MAX_VIRT_CPUS
+
#define asmlinkage /* Nothing needed */
#define __LINUX_ARM_ARCH__ 7
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index af60fe3..c0a25ae 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -24,7 +24,7 @@
#define cpu_has_arm (boot_cpu_feature32(arm) == 1)
#define cpu_has_thumb (boot_cpu_feature32(thumb) >= 1)
#define cpu_has_thumb2 (boot_cpu_feature32(thumb) >= 3)
-#define cpu_has_jazelle (boot_cpu_feature32(jazelle) >= 0)
+#define cpu_has_jazelle (boot_cpu_feature32(jazelle) > 0)
#define cpu_has_thumbee (boot_cpu_feature32(thumbee) == 1)
#define cpu_has_aarch32 (cpu_has_arm || cpu_has_thumb)
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index fdb6b47..9e71776 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -95,6 +95,9 @@ struct p2m_domain {
/* back pointer to domain */
struct domain *domain;
+
+ /* Keep track of which CPU this p2m was used on, and by which vCPU */
+ uint8_t last_vcpu_ran[NR_CPUS];
};
/*
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index c492d6d..a0f9344 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -292,24 +292,20 @@ extern size_t cacheline_bytes;
static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
{
- size_t off;
const void *end = p + size;
+ size_t cacheline_mask = cacheline_bytes - 1;
dsb(sy); /* So the CPU issues all writes to the range */
- off = (unsigned long)p % cacheline_bytes;
- if ( off )
+ if ( (uintptr_t)p & cacheline_mask )
{
- p -= off;
+ p = (void *)((uintptr_t)p & ~cacheline_mask);
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
p += cacheline_bytes;
- size -= cacheline_bytes - off;
}
- off = (unsigned long)end % cacheline_bytes;
- if ( off )
+ if ( (uintptr_t)end & cacheline_mask )
{
- end -= off;
- size -= off;
+ end = (void *)((uintptr_t)end & ~cacheline_mask);
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (end));
}
@@ -323,9 +319,10 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
static inline int clean_dcache_va_range(const void *p, unsigned long size)
{
- const void *end;
+ const void *end = p + size;
dsb(sy); /* So the CPU issues all writes to the range */
- for ( end = p + size; p < end; p += cacheline_bytes )
+ p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));
+ for ( ; p < end; p += cacheline_bytes )
asm volatile (__clean_dcache_one(0) : : "r" (p));
dsb(sy); /* So we know the flushes happen before continuing */
/* ARM callers assume that dcache_* functions cannot fail. */
@@ -335,9 +332,10 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
static inline int clean_and_invalidate_dcache_va_range
(const void *p, unsigned long size)
{
- const void *end;
+ const void *end = p + size;
dsb(sy); /* So the CPU issues all writes to the range */
- for ( end = p + size; p < end; p += cacheline_bytes )
+ p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));
+ for ( ; p < end; p += cacheline_bytes )
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
dsb(sy); /* So we know the flushes happen before continuing */
/* ARM callers assume that dcache_* functions cannot fail. */
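(The rewritten dcache helpers above round addresses down to a cache-line
boundary with mask arithmetic rather than a modulo and a subtraction. A
standalone illustration of the rounding, assuming a hypothetical
cacheline_bytes of 64; any power of two behaves the same:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uintptr_t cacheline_bytes = 64;        /* example value */
        uintptr_t mask = cacheline_bytes - 1;  /* 0x3f */
        uintptr_t p = 0x1234;

        /* Round down to the containing cache line: 0x1234 -> 0x1200 */
        printf("%#lx -> %#lx\n", (unsigned long)p,
               (unsigned long)(p & ~mask));
        return 0;
    }
)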
diff --git a/xen/include/asm-arm/sysregs.h b/xen/include/asm-arm/sysregs.h
index 570f43e..887368e 100644
--- a/xen/include/asm-arm/sysregs.h
+++ b/xen/include/asm-arm/sysregs.h
@@ -90,6 +90,7 @@
#define HSR_SYSREG_ICC_SGI1R_EL1 HSR_SYSREG(3,0,c12,c11,5)
#define HSR_SYSREG_ICC_ASGI1R_EL1 HSR_SYSREG(3,1,c12,c11,6)
#define HSR_SYSREG_ICC_SGI0R_EL1 HSR_SYSREG(3,2,c12,c11,7)
+#define HSR_SYSREG_ICC_SRE_EL1 HSR_SYSREG(3,0,c12,c12,5)
#define HSR_SYSREG_CONTEXTIDR_EL1 HSR_SYSREG(3,0,c13,c0,1)
#define HSR_SYSREG_PMCR_EL0 HSR_SYSREG(3,3,c9,c12,0)
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 300f461..51b187f 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -69,7 +69,7 @@ struct pending_irq
unsigned long status;
struct irq_desc *desc; /* only set if the irq corresponds to a physical irq */
unsigned int irq;
-#define GIC_INVALID_LR ~(uint8_t)0
+#define GIC_INVALID_LR (uint8_t)~0
uint8_t lr;
uint8_t priority;
/* inflight is used to append instances of pending_irq to
@@ -107,7 +107,9 @@ struct vgic_irq_rank {
/*
* It's more convenient to store a target VCPU per vIRQ
- * than the register ITARGETSR/IROUTER itself
+ * than the register ITARGETSR/IROUTER itself.
+ * Use atomic operations to read/write the vcpu fields to avoid
+ * taking the rank lock.
*/
uint8_t vcpu[32];
};
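(The GIC_INVALID_LR change above is not cosmetic: because of C's integer
promotions, ~(uint8_t)0 is the int -1 (0xffffffff), while (uint8_t)~0 is
0xff, so only the latter compares equal to a uint8_t sentinel. A standalone
demonstration:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t lr = 0xff;

        printf("%d %d\n",
               lr == ~(uint8_t)0,   /* 0: compares 255 with int -1 */
               lr == (uint8_t)~0);  /* 1: compares 255 with 255 */
        return 0;
    }
)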
diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
index bad2382..a3cd1b1 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -308,7 +308,7 @@ enum VMEXIT_EXITCODE
/* Definition of segment state is borrowed by the generic HVM code. */
typedef struct segment_register svm_segment_register_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -322,7 +322,7 @@ typedef union __packed
} fields;
} eventinj_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -340,7 +340,7 @@ typedef union __packed
} fields;
} vintr_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -357,7 +357,7 @@ typedef union __packed
} fields;
} ioio_info_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -366,7 +366,7 @@ typedef union __packed
} fields;
} lbrctrl_t;
-typedef union __packed
+typedef union
{
uint32_t bytes;
struct
@@ -401,7 +401,7 @@ typedef union __packed
#define IOPM_SIZE (12 * 1024)
#define MSRPM_SIZE (8 * 1024)
-struct __packed vmcb_struct {
+struct vmcb_struct {
u32 _cr_intercepts; /* offset 0x00 - cleanbit 0 */
u32 _dr_intercepts; /* offset 0x04 - cleanbit 0 */
u32 _exception_intercepts; /* offset 0x08 - cleanbit 0 */
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 997f4f5..0dfd5f8 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -238,6 +238,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
void vmx_vmcs_enter(struct vcpu *v);
bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
#define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004
#define CPU_BASED_USE_TSC_OFFSETING 0x00000008
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 1b4d1c3..6687dbc 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -253,8 +253,8 @@ struct spage_info
#define is_xen_heap_mfn(mfn) \
(__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
#define is_xen_fixed_mfn(mfn) \
- ((((mfn) << PAGE_SHIFT) >= __pa(&_start)) && \
- (((mfn) << PAGE_SHIFT) <= __pa(&_end)))
+ ((((mfn) << PAGE_SHIFT) >= __pa(&_stext)) && \
+ (((mfn) << PAGE_SHIFT) <= __pa(&__2M_rwdata_end)))
#define PRtype_info "016lx"/* should only be used for printk's */
diff --git a/xen/include/asm-x86/x86_64/uaccess.h b/xen/include/asm-x86/x86_64/uaccess.h
index 953abe7..4275e66 100644
--- a/xen/include/asm-x86/x86_64/uaccess.h
+++ b/xen/include/asm-x86/x86_64/uaccess.h
@@ -29,8 +29,9 @@ extern void *xlat_malloc(unsigned long *xlat_page_current, size_t size);
/*
* Valid if in +ve half of 48-bit address space, or above Xen-reserved area.
* This is also valid for range checks (addr, addr+size). As long as the
- * start address is outside the Xen-reserved area then we will access a
- * non-canonical address (and thus fault) before ever reaching VIRT_START.
+ * start address is outside the Xen-reserved area, sequential accesses
+ * (starting at addr) will hit a non-canonical address (and thus fault)
+ * before ever reaching VIRT_START.
*/
#define __addr_ok(addr) \
(((unsigned long)(addr) < (1UL<<47)) || \
@@ -40,7 +41,8 @@ extern void *xlat_malloc(unsigned long *xlat_page_current, size_t size);
(__addr_ok(addr) || is_compat_arg_xlat_range(addr, size))
#define array_access_ok(addr, count, size) \
- (access_ok(addr, (count)*(size)))
+ (likely(((count) ?: 0UL) < (~0UL / (size))) && \
+ access_ok(addr, (count) * (size)))
#define __compat_addr_ok(d, addr) \
((unsigned long)(addr) < HYPERVISOR_COMPAT_VIRT_START(d))
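(The array_access_ok change above rejects ranges whose count * size product
would overflow an unsigned long; previously the wrapped product could be
small enough to pass access_ok. A standalone illustration, with hypothetical
guest-supplied values:

    #include <stdio.h>

    int main(void)
    {
        unsigned long count = 0x2000000000000000UL; /* hypothetical value */
        unsigned long size = 16;

        printf("count * size = %#lx\n", count * size);       /* wraps to 0 */
        printf("guard passes: %d\n", count < (~0UL / size)); /* 0: rejected */
        return 0;
    }
)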
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 8d73b51..419a3b2 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -135,7 +135,7 @@ struct hvm_hw_cpu {
uint64_t shadow_gs;
/* msr content saved/restored. */
- uint64_t msr_flags;
+ uint64_t msr_flags; /* Obsolete, ignored. */
uint64_t msr_lstar;
uint64_t msr_star;
uint64_t msr_cstar;
@@ -249,7 +249,7 @@ struct hvm_hw_cpu_compat {
uint64_t shadow_gs;
/* msr content saved/restored. */
- uint64_t msr_flags;
+ uint64_t msr_flags; /* Obsolete, ignored. */
uint64_t msr_lstar;
uint64_t msr_star;
uint64_t msr_cstar;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 5bf840f..315a4e8 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -222,9 +222,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_machphys_mapping_t);
* XENMEM_add_to_physmap_batch only. */
#define XENMAPSPACE_dev_mmio 5 /* device mmio region
ARM only; the region is mapped in
- Stage-2 using the memory attribute
- "Device-nGnRE" (previously named
- "Device" on ARMv7) */
+ Stage-2 using the Normal Memory
+ Inner/Outer Write-Back Cacheable
+ memory attribute. */
/* ` } */
/*
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 95460af..edc9086 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -712,18 +712,13 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
XSM_ASSERT_ACTION(XSM_OTHER);
switch ( op )
{
- case XENPMU_mode_set:
- case XENPMU_mode_get:
- case XENPMU_feature_set:
- case XENPMU_feature_get:
- return xsm_default_action(XSM_PRIV, d, current->domain);
case XENPMU_init:
case XENPMU_finish:
case XENPMU_lvtpc_set:
case XENPMU_flush:
return xsm_default_action(XSM_HOOK, d, current->domain);
default:
- return -EPERM;
+ return xsm_default_action(XSM_PRIV, d, current->domain);
}
}
diff -Nru xen-4.8.1~pre.2017.01.23/Config.mk xen-4.8.1/Config.mk
--- xen-4.8.1~pre.2017.01.23/Config.mk 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/Config.mk 2017-04-10 14:21:48.000000000 +0100
@@ -277,8 +277,8 @@
MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
endif
OVMF_UPSTREAM_REVISION ?= bc54e50e0fe03c570014f363b547426913e92449
-QEMU_UPSTREAM_REVISION ?= qemu-xen-4.8.0
-MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.8.0
+QEMU_UPSTREAM_REVISION ?= qemu-xen-4.8.1
+MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.8.1
# Wed Sep 28 11:50:04 2016 +0200
# minios: fix build issue with xen_*mb defines
@@ -289,9 +289,7 @@
ETHERBOOT_NICS ?= rtl8139 8086100e
-QEMU_TRADITIONAL_REVISION ?= 095261a9ad5c31b9ed431f8382e8aa223089c85b
-# Mon Nov 14 17:19:46 2016 +0000
-# qemu: ioport_read, ioport_write: be defensive about 32-bit addresses
+QEMU_TRADITIONAL_REVISION ?= xen-4.8.1
# Specify which qemu-dm to use. This may be `ioemu' to use the old
# Mercurial in-tree version, or a local directory, or a git URL.
diff -Nru xen-4.8.1~pre.2017.01.23/debian/changelog xen-4.8.1/debian/changelog
--- xen-4.8.1~pre.2017.01.23/debian/changelog 2017-01-23 16:23:58.000000000 +0000
+++ xen-4.8.1/debian/changelog 2017-04-18 18:05:00.000000000 +0100
@@ -1,3 +1,13 @@
+xen (4.8.1-1) unstable; urgency=high
+
+ * Update to upstream 4.8.1 release.
+ Changes include numerous bugfixes, including security fixes for:
+ XSA-212 / CVE-2017-7228 Closes:#859560
+ XSA-207 / no cve yet Closes:#856229
+ XSA-206 / no cve yet no Debian bug
+
+ -- Ian Jackson <ian.jackson@eu.citrix.com> Tue, 18 Apr 2017 18:05:00 +0100
+
xen (4.8.1~pre.2017.01.23-1) unstable; urgency=medium
* Update to current upstream stable-4.8 git branch (Xen 4.8.1-pre).
diff -Nru xen-4.8.1~pre.2017.01.23/debian/control.md5sum xen-4.8.1/debian/control.md5sum
--- xen-4.8.1~pre.2017.01.23/debian/control.md5sum 2017-01-23 16:23:58.000000000 +0000
+++ xen-4.8.1/debian/control.md5sum 2017-04-18 18:05:13.000000000 +0100
@@ -1,4 +1,4 @@
-d74356cd54456cb07dc4a89ff001c233 debian/changelog
+414390ca652da67ac85ebd905500eb66 debian/changelog
dc7b5d9f0538e3180af4e9aff9b0bd57 debian/bin/gencontrol.py
20e336dbea44b1641802eff0dde9569b debian/templates/control.main.in
a15fa64ce6deead28d33c1581b14dba7 debian/templates/xen-hypervisor.postinst.in
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/config-prefix.diff xen-4.8.1/debian/patches/config-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/config-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/config-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:45 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 3ff81ee48afd44afd4c5bc2dbd4daf2edeb0d8fc
+X-Dgit-Generated: 4.8.1-1 a376dc60f2926c349685de141c3993c7d791a494
Subject: config-prefix.diff
Patch-Name: config-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/Config.mk
-+++ xen-4.8.1~pre.2017.01.23/Config.mk
+--- xen-4.8.1.orig/Config.mk
++++ xen-4.8.1/Config.mk
@@ -82,7 +82,7 @@ EXTRA_LIB += $(EXTRA_PREFIX)/lib
endif
@@ -18,8 +18,8 @@
# The above requires that prefix contains *no spaces*. This variable is here
# to permit the user to set PYTHON_PREFIX_ARG to '' to workaround this bug:
# https://bugs.launchpad.net/ubuntu/+bug/362570
---- xen-4.8.1~pre.2017.01.23.orig/config/Paths.mk.in
-+++ xen-4.8.1~pre.2017.01.23/config/Paths.mk.in
+--- xen-4.8.1.orig/config/Paths.mk.in
++++ xen-4.8.1/config/Paths.mk.in
@@ -13,6 +13,7 @@
# http://wiki.xen.org/wiki/Category:Host_Configuration#System_wide_xen_configuration
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/rerun-autogen.sh-stretch xen-4.8.1/debian/patches/rerun-autogen.sh-stretch
--- xen-4.8.1~pre.2017.01.23/debian/patches/rerun-autogen.sh-stretch 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/rerun-autogen.sh-stretch 2017-04-18 18:07:28.000000000 +0100
@@ -1,6 +1,6 @@
From: Ian Jackson <ian.jackson@citrix.com>
Date: Fri, 28 Oct 2016 14:52:13 +0100
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 b3c8b0d4caa81fac565ec8439f33ff8677827dc5
+X-Dgit-Generated: 4.8.1-1 b1ceff30c4420ee49c49761e183b4ee2a66e3ed4
Subject: Rerun autogen.sh (stretch)
Using autoconf 2.69-10 (amd64)
@@ -9,8 +9,8 @@
---
---- xen-4.8.1~pre.2017.01.23.orig/configure
-+++ xen-4.8.1~pre.2017.01.23/configure
+--- xen-4.8.1.orig/configure
++++ xen-4.8.1/configure
@@ -641,6 +641,7 @@ infodir
docdir
oldincludedir
@@ -60,8 +60,8 @@
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
---- xen-4.8.1~pre.2017.01.23.orig/docs/configure
-+++ xen-4.8.1~pre.2017.01.23/docs/configure
+--- xen-4.8.1.orig/docs/configure
++++ xen-4.8.1/docs/configure
@@ -632,6 +632,7 @@ infodir
docdir
oldincludedir
@@ -111,8 +111,8 @@
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
---- xen-4.8.1~pre.2017.01.23.orig/stubdom/configure
-+++ xen-4.8.1~pre.2017.01.23/stubdom/configure
+--- xen-4.8.1.orig/stubdom/configure
++++ xen-4.8.1/stubdom/configure
@@ -659,6 +659,7 @@ infodir
docdir
oldincludedir
@@ -162,8 +162,8 @@
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
---- xen-4.8.1~pre.2017.01.23.orig/tools/configure
-+++ xen-4.8.1~pre.2017.01.23/tools/configure
+--- xen-4.8.1.orig/tools/configure
++++ xen-4.8.1/tools/configure
@@ -767,6 +767,7 @@ infodir
docdir
oldincludedir
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-blktap2-prefix.diff xen-4.8.1/debian/patches/tools-blktap2-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-blktap2-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-blktap2-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:53 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 a28766f0ef4d267d0af7becdca134ad5a1d669e1
+X-Dgit-Generated: 4.8.1-1 ad82a5763c9d4ebeb72fa838c4abc77b72596370
Subject: tools-blktap2-prefix.diff
Patch-Name: tools-blktap2-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/blktap2/control/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/blktap2/control/Makefile
+--- xen-4.8.1.orig/tools/blktap2/control/Makefile
++++ xen-4.8.1/tools/blktap2/control/Makefile
@@ -1,10 +1,7 @@
XEN_ROOT := $(CURDIR)/../../../
include $(XEN_ROOT)/tools/Rules.mk
@@ -68,8 +68,8 @@
rm -f *~
distclean: clean
---- xen-4.8.1~pre.2017.01.23.orig/tools/blktap2/vhd/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/blktap2/vhd/Makefile
+--- xen-4.8.1.orig/tools/blktap2/vhd/Makefile
++++ xen-4.8.1/tools/blktap2/vhd/Makefile
@@ -12,6 +12,7 @@ CFLAGS += -Werror
CFLAGS += -Wno-unused
CFLAGS += -I../include
@@ -78,8 +78,8 @@
ifeq ($(CONFIG_X86_64),y)
CFLAGS += -fPIC
---- xen-4.8.1~pre.2017.01.23.orig/tools/blktap2/vhd/lib/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/blktap2/vhd/lib/Makefile
+--- xen-4.8.1.orig/tools/blktap2/vhd/lib/Makefile
++++ xen-4.8.1/tools/blktap2/vhd/lib/Makefile
@@ -2,25 +2,19 @@ XEN_ROOT=$(CURDIR)/../../../..
BLKTAP_ROOT := ../..
include $(XEN_ROOT)/tools/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-console-prefix.diff xen-4.8.1/debian/patches/tools-console-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-console-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-console-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:54 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 cdeb6d3730e004fbea6063379cc6ca80f9db5788
+X-Dgit-Generated: 4.8.1-1 54721627e1abd8f67827b3383ddfa6c174b572b9
Subject: tools-console-prefix.diff
Patch-Name: tools-console-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/console/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/console/Makefile
+--- xen-4.8.1.orig/tools/console/Makefile
++++ xen-4.8.1/tools/console/Makefile
@@ -8,6 +8,7 @@ CFLAGS += $(CFLAGS_libxenstore)
LDLIBS += $(LDLIBS_libxenctrl)
LDLIBS += $(LDLIBS_libxenstore)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-include-install.diff xen-4.8.1/debian/patches/tools-include-install.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-include-install.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-include-install.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:30 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 7169aa91d7ccc325357b27120340b57561cf8438
+X-Dgit-Generated: 4.8.1-1 732acd91e545566bca164b886afa82027df7c463
Subject: tools-include-install.diff
Patch-Name: tools-include-install.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/include/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/include/Makefile
+--- xen-4.8.1.orig/tools/include/Makefile
++++ xen-4.8.1/tools/include/Makefile
@@ -14,7 +14,6 @@ xen-foreign:
xen/.dir:
@rm -rf xen
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libfsimage-abiname.diff xen-4.8.1/debian/patches/tools-libfsimage-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libfsimage-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libfsimage-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:47 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 4ad1691c46fa9bebaa95e9e29b7081e446243c9d
+X-Dgit-Generated: 4.8.1-1 2a020aa59aec69c1d00f3fb8c86b188873e802ea
Subject: tools-libfsimage-abiname.diff
Patch-Name: tools-libfsimage-abiname.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libfsimage/common/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libfsimage/common/Makefile
+--- xen-4.8.1.orig/tools/libfsimage/common/Makefile
++++ xen-4.8.1/tools/libfsimage/common/Makefile
@@ -1,9 +1,6 @@
XEN_ROOT = $(CURDIR)/../../..
include $(XEN_ROOT)/tools/libfsimage/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libfsimage-prefix.diff xen-4.8.1/debian/patches/tools-libfsimage-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libfsimage-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libfsimage-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:55 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 a88cc0796836248ff332ad1c7e176c6a0609b4fa
+X-Dgit-Generated: 4.8.1-1 0fc6ef9d31deed6668d7f18924664cbde155ea85
Subject: tools-libfsimage-prefix.diff
Patch-Name: tools-libfsimage-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libfsimage/Rules.mk
-+++ xen-4.8.1~pre.2017.01.23/tools/libfsimage/Rules.mk
+--- xen-4.8.1.orig/tools/libfsimage/Rules.mk
++++ xen-4.8.1/tools/libfsimage/Rules.mk
@@ -3,10 +3,11 @@ include $(XEN_ROOT)/tools/Rules.mk
CFLAGS += -Wno-unknown-pragmas -I$(XEN_ROOT)/tools/libfsimage/common/ -DFSIMAGE_FSDIR=\"$(FSDIR)\"
CFLAGS += -Werror -D_GNU_SOURCE
@@ -22,8 +22,8 @@
FSLIB = fsimage.so
---- xen-4.8.1~pre.2017.01.23.orig/tools/libfsimage/common/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libfsimage/common/Makefile
+--- xen-4.8.1.orig/tools/libfsimage/common/Makefile
++++ xen-4.8.1/tools/libfsimage/common/Makefile
@@ -1,6 +1,8 @@
XEN_ROOT = $(CURDIR)/../../..
include $(XEN_ROOT)/tools/libfsimage/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxc-abiname.diff xen-4.8.1/debian/patches/tools-libxc-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxc-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libxc-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:48 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 db4a464d8cb49e3c33a1bd2f74f4321f8e20df2d
+X-Dgit-Generated: 4.8.1-1 45ad000e7e61a57a78bca482c464c78badbfeab5
Subject: tools-libxc-abiname.diff
Patch-Name: tools-libxc-abiname.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libxc/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libxc/Makefile
+--- xen-4.8.1.orig/tools/libxc/Makefile
++++ xen-4.8.1/tools/libxc/Makefile
@@ -1,9 +1,6 @@
XEN_ROOT = $(CURDIR)/../..
include $(XEN_ROOT)/tools/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxl-abiname.diff xen-4.8.1/debian/patches/tools-libxl-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxl-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libxl-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:49 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 4a36b7f60eeb84d27ba814d7dba8214bdb96fb0c
+X-Dgit-Generated: 4.8.1-1 b6860f8b5e4980eedd3e75e5e81be73343d92558
Subject: tools-libxl-abiname.diff
Patch-Name: tools-libxl-abiname.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libxl/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libxl/Makefile
+--- xen-4.8.1.orig/tools/libxl/Makefile
++++ xen-4.8.1/tools/libxl/Makefile
@@ -5,12 +5,6 @@
XEN_ROOT = $(CURDIR)/../..
include $(XEN_ROOT)/tools/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxl-prefix.diff xen-4.8.1/debian/patches/tools-libxl-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxl-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libxl-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:57 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 a08a69ac537d3733ab0663d35991fa2c2c142108
+X-Dgit-Generated: 4.8.1-1 0c590f711182ecc6c0aaee5fc0bf89f384c98fce
Subject: tools-libxl-prefix.diff
Patch-Name: tools-libxl-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libxl/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libxl/Makefile
+--- xen-4.8.1.orig/tools/libxl/Makefile
++++ xen-4.8.1/tools/libxl/Makefile
@@ -12,6 +12,8 @@ CFLAGS += -I. -fPIC
ifeq ($(CONFIG_Linux),y)
LIBUUID_LIBS += -luuid
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-misc-prefix.diff xen-4.8.1/debian/patches/tools-misc-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-misc-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-misc-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:59 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 f63435d7a553a80a3490c5ef8999b5ac175bc5fe
+X-Dgit-Generated: 4.8.1-1 92bd9e6a61c01d45b42463d3097f87935167e731
Subject: tools-misc-prefix.diff
Patch-Name: tools-misc-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/misc/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/misc/Makefile
+--- xen-4.8.1.orig/tools/misc/Makefile
++++ xen-4.8.1/tools/misc/Makefile
@@ -54,12 +54,8 @@ all build: $(TARGETS_BUILD)
.PHONY: install
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-pygrub-prefix.diff xen-4.8.1/debian/patches/tools-pygrub-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-pygrub-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-pygrub-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:01 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 72f92b5bd96f0bd5726deadaeff420361fb13a0b
+X-Dgit-Generated: 4.8.1-1 e11fc351a6d75288200f781c656599ec3547c484
Subject: tools-pygrub-prefix.diff
Patch-Name: tools-pygrub-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/pygrub/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/pygrub/Makefile
+--- xen-4.8.1.orig/tools/pygrub/Makefile
++++ xen-4.8.1/tools/pygrub/Makefile
@@ -16,11 +16,6 @@ install: all
CC="$(CC)" CFLAGS="$(PY_CFLAGS)" LDFLAGS="$(PY_LDFLAGS)" $(PYTHON) \
setup.py install $(PYTHON_PREFIX_ARG) --root="$(DESTDIR)" \
@@ -21,8 +21,8 @@
.PHONY: clean
clean:
---- xen-4.8.1~pre.2017.01.23.orig/tools/pygrub/setup.py
-+++ xen-4.8.1~pre.2017.01.23/tools/pygrub/setup.py
+--- xen-4.8.1.orig/tools/pygrub/setup.py
++++ xen-4.8.1/tools/pygrub/setup.py
@@ -4,11 +4,13 @@ import os
import sys
@@ -37,8 +37,8 @@
include_dirs = [ XEN_ROOT + "/tools/libfsimage/common/" ],
library_dirs = [ XEN_ROOT + "/tools/libfsimage/common/" ],
libraries = ["fsimage"],
---- xen-4.8.1~pre.2017.01.23.orig/tools/pygrub/src/pygrub
-+++ xen-4.8.1~pre.2017.01.23/tools/pygrub/src/pygrub
+--- xen-4.8.1.orig/tools/pygrub/src/pygrub
++++ xen-4.8.1/tools/pygrub/src/pygrub
@@ -21,6 +21,8 @@ import xen.lowlevel.xc
import curses, _curses, curses.wrapper, curses.textpad, curses.ascii
import getopt
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-pygrub-remove-static-solaris-support xen-4.8.1/debian/patches/tools-pygrub-remove-static-solaris-support
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-pygrub-remove-static-solaris-support 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-pygrub-remove-static-solaris-support 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:29 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 33f2e5cf7348dc23eaac81e8ac9a9c7e6ed94f15
+X-Dgit-Generated: 4.8.1-1 00315ed8c451173d0d212d55a831023166f3b212
Subject: Remove static solaris support from pygrub
Patch-Name: tools-pygrub-remove-static-solaris-support
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/pygrub/src/pygrub
-+++ xen-4.8.1~pre.2017.01.23/tools/pygrub/src/pygrub
+--- xen-4.8.1.orig/tools/pygrub/src/pygrub
++++ xen-4.8.1/tools/pygrub/src/pygrub
@@ -16,7 +16,6 @@ import os, sys, string, struct, tempfile
import copy
import logging
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-python-prefix.diff xen-4.8.1/debian/patches/tools-python-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-python-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-python-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:02 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 20d81801d70a3d3d6517a6a2d28fd4eabcd99e07
+X-Dgit-Generated: 4.8.1-1 5ea3aead5ce755e99c1e811dc3bdf74cec9e991f
Subject: tools-python-prefix.diff
Patch-Name: tools-python-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/python/setup.py
-+++ xen-4.8.1~pre.2017.01.23/tools/python/setup.py
+--- xen-4.8.1.orig/tools/python/setup.py
++++ xen-4.8.1/tools/python/setup.py
@@ -5,6 +5,7 @@ import os, sys
XEN_ROOT = "../.."
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-rpath.diff xen-4.8.1/debian/patches/tools-rpath.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-rpath.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-rpath.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:51 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 e92e68cad6bfc85c595f35e4303932f34b985088
+X-Dgit-Generated: 4.8.1-1 31f508fde90e729a0f734dc00d0c75213f075a2e
Subject: tools-rpath.diff
Patch-Name: tools-rpath.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/Rules.mk
-+++ xen-4.8.1~pre.2017.01.23/tools/Rules.mk
+--- xen-4.8.1.orig/tools/Rules.mk
++++ xen-4.8.1/tools/Rules.mk
@@ -9,6 +9,8 @@ include $(XEN_ROOT)/Config.mk
export _INSTALL := $(INSTALL)
INSTALL = $(XEN_ROOT)/tools/cross-install
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xcutils-rpath.diff xen-4.8.1/debian/patches/tools-xcutils-rpath.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xcutils-rpath.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xcutils-rpath.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:05 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 dd796d91589bf69cf723f68c38eba61a483c2a17
+X-Dgit-Generated: 4.8.1-1 845c9126f103d039326ba1cc06575de8a2d32d39
Subject: tools-xcutils-rpath.diff
Patch-Name: tools-xcutils-rpath.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xcutils/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xcutils/Makefile
+--- xen-4.8.1.orig/tools/xcutils/Makefile
++++ xen-4.8.1/tools/xcutils/Makefile
@@ -19,6 +19,8 @@ CFLAGS += -Werror
CFLAGS_readnotes.o := $(CFLAGS_libxenevtchn) $(CFLAGS_libxenctrl) $(CFLAGS_libxenguest) -I$(XEN_ROOT)/tools/libxc $(CFLAGS_libxencall)
CFLAGS_lsevtchn.o := $(CFLAGS_libxenevtchn) $(CFLAGS_libxenctrl)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenmon-install.diff xen-4.8.1/debian/patches/tools-xenmon-install.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenmon-install.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenmon-install.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:31 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 3ddfd6f3b1e4b5acbb24ef0291eeb6edba20514d
+X-Dgit-Generated: 4.8.1-1 75dded97d0701561959c2fab12f0328058078b40
Subject: tools-xenmon-install.diff
Patch-Name: tools-xenmon-install.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenmon/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenmon/Makefile
+--- xen-4.8.1.orig/tools/xenmon/Makefile
++++ xen-4.8.1/tools/xenmon/Makefile
@@ -13,6 +13,10 @@
XEN_ROOT=$(CURDIR)/../..
include $(XEN_ROOT)/tools/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenmon-prefix.diff xen-4.8.1/debian/patches/tools-xenmon-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenmon-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenmon-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:06 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 61efe37be33cc79b1c0ec9e9f337aeb54dde9f08
+X-Dgit-Generated: 4.8.1-1 3c1dc49f92bcdb9e031a419f3c0014b57fcb96a9
Subject: tools-xenmon-prefix.diff
Patch-Name: tools-xenmon-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenmon/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenmon/Makefile
+--- xen-4.8.1.orig/tools/xenmon/Makefile
++++ xen-4.8.1/tools/xenmon/Makefile
@@ -18,6 +18,7 @@ CFLAGS += $(CFLAGS_libxenevtchn)
CFLAGS += $(CFLAGS_libxenctrl)
LDLIBS += $(LDLIBS_libxenctrl)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenpaging-prefix.diff xen-4.8.1/debian/patches/tools-xenpaging-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenpaging-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenpaging-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:08 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 6566733aebb9e3bfd448434762fe15e6de6ec927
+X-Dgit-Generated: 4.8.1-1 6b66a39ea6db832a88d94c4d8e256f77e08fe1a3
Subject: tools-xenpaging-prefix.diff
Patch-Name: tools-xenpaging-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenpaging/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenpaging/Makefile
+--- xen-4.8.1.orig/tools/xenpaging/Makefile
++++ xen-4.8.1/tools/xenpaging/Makefile
@@ -4,7 +4,7 @@ include $(XEN_ROOT)/tools/Rules.mk
# xenpaging.c and file_ops.c incorrectly use libxc internals
CFLAGS += $(CFLAGS_libxentoollog) $(CFLAGS_libxenevtchn) $(CFLAGS_libxenctrl) $(CFLAGS_libxenstore) $(PTHREAD_CFLAGS) -I$(XEN_ROOT)/tools/libxc $(CFLAGS_libxencall)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenpmd-prefix.diff xen-4.8.1/debian/patches/tools-xenpmd-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenpmd-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenpmd-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 13 Dec 2014 19:37:02 +0100
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 f26ee91fd13377ac94dfb98d99f92bd5fb7afac1
+X-Dgit-Generated: 4.8.1-1 abbd6a5b077ff2f14d6e715c7f342f02f3b78ef8
Subject: tools-xenpmd-prefix.diff
Patch-Name: tools-xenpmd-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenpmd/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenpmd/Makefile
+--- xen-4.8.1.orig/tools/xenpmd/Makefile
++++ xen-4.8.1/tools/xenpmd/Makefile
@@ -11,8 +11,8 @@ all: xenpmd
.PHONY: install
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstat-abiname.diff xen-4.8.1/debian/patches/tools-xenstat-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstat-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenstat-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:50 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 8b4f535d3fd07f59f7eeded0fb0533ece6dd03dd
+X-Dgit-Generated: 4.8.1-1 a968429393f380a5bf1eab604bd1720f31369fcd
Subject: tools-xenstat-abiname.diff
Patch-Name: tools-xenstat-abiname.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstat/libxenstat/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstat/libxenstat/Makefile
+--- xen-4.8.1.orig/tools/xenstat/libxenstat/Makefile
++++ xen-4.8.1/tools/xenstat/libxenstat/Makefile
@@ -18,18 +18,14 @@ include $(XEN_ROOT)/tools/Rules.mk
LDCONFIG=ldconfig
MAKE_LINK=ln -sf
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstat-prefix.diff xen-4.8.1/debian/patches/tools-xenstat-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstat-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenstat-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:09 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 72db0c1ccf6eae7961706b4d1ceddb7b15adf23d
+X-Dgit-Generated: 4.8.1-1 fa062a38ebfa9a8d1e52ee698c72aff4cb39e969
Subject: tools-xenstat-prefix.diff
Patch-Name: tools-xenstat-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstat/libxenstat/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstat/libxenstat/Makefile
+--- xen-4.8.1.orig/tools/xenstat/libxenstat/Makefile
++++ xen-4.8.1/tools/xenstat/libxenstat/Makefile
@@ -20,7 +20,7 @@ MAKE_LINK=ln -sf
LIB=src/libxenstat.a
@@ -31,8 +31,8 @@
PYLIB=bindings/swig/python/_xenstat.so
PYMOD=bindings/swig/python/xenstat.py
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstat/xentop/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstat/xentop/Makefile
+--- xen-4.8.1.orig/tools/xenstat/xentop/Makefile
++++ xen-4.8.1/tools/xenstat/xentop/Makefile
@@ -19,7 +19,9 @@ all install xentop:
else
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstore-compatibility.diff xen-4.8.1/debian/patches/tools-xenstore-compatibility.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstore-compatibility.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenstore-compatibility.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:36 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 cd76fd2a0e5f54534a02691aaf30ff4ed585224b
+X-Dgit-Generated: 4.8.1-1 e0deca5e873be2aeb99ad58aed95eaa9c7c8ce35
Subject: tools-xenstore-compatibility.diff
Patch-Name: tools-xenstore-compatibility.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstore/include/xenstore.h
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstore/include/xenstore.h
+--- xen-4.8.1.orig/tools/xenstore/include/xenstore.h
++++ xen-4.8.1/tools/xenstore/include/xenstore.h
@@ -25,6 +25,7 @@
#define XS_OPEN_READONLY 1UL<<0
@@ -17,8 +17,8 @@
/*
* Setting XS_UNWATCH_FILTER arranges that after xs_unwatch, no
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstore/xenstore_client.c
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstore_client.c
+--- xen-4.8.1.orig/tools/xenstore/xenstore_client.c
++++ xen-4.8.1/tools/xenstore/xenstore_client.c
@@ -636,7 +636,7 @@ main(int argc, char **argv)
max_width = ws.ws_col - 2;
}
@@ -28,8 +28,8 @@
if (xsh == NULL) err(1, "xs_open");
again:
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstore/xs.c
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstore/xs.c
+--- xen-4.8.1.orig/tools/xenstore/xs.c
++++ xen-4.8.1/tools/xenstore/xs.c
@@ -281,17 +281,19 @@ struct xs_handle *xs_daemon_open_readonl
struct xs_handle *xs_domain_open(void)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstore-prefix.diff xen-4.8.1/debian/patches/tools-xenstore-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstore-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenstore-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:12 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 cc61a6cd98de364b188ada3498d5090c84ecd079
+X-Dgit-Generated: 4.8.1-1 dda6e65fe8f36391534f781ebdf0bc9f9e58192a
Subject: tools-xenstore-prefix.diff
Patch-Name: tools-xenstore-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/helpers/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/helpers/Makefile
+--- xen-4.8.1.orig/tools/helpers/Makefile
++++ xen-4.8.1/tools/helpers/Makefile
@@ -31,7 +31,7 @@ xen-init-dom0: $(XEN_INIT_DOM0_OBJS)
$(INIT_XENSTORE_DOMAIN_OBJS): _paths.h
@@ -18,8 +18,8 @@
.PHONY: install
install: all
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstore/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstore/Makefile
+--- xen-4.8.1.orig/tools/xenstore/Makefile
++++ xen-4.8.1/tools/xenstore/Makefile
@@ -20,6 +20,8 @@ LDFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_L
CFLAGS += $(CFLAGS-y)
LDFLAGS += $(LDFLAGS-y)
@@ -29,16 +29,16 @@
CLIENTS := xenstore-exists xenstore-list xenstore-read xenstore-rm xenstore-chmod
CLIENTS += xenstore-write xenstore-ls xenstore-watch
-@@ -73,7 +75,7 @@ endif
+@@ -74,7 +76,7 @@ endif
$(XENSTORED_OBJS): CFLAGS += $(CFLAGS_libxengnttab)
xenstored: $(XENSTORED_OBJS)
-- $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
-+ $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) $(call LDFLAGS_RPATH,../lib) -o $@ $(APPEND_LDFLAGS)
+- $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(LDLIBS_xenstored) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
++ $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) $(LDLIBS_xenstored) $(call LDFLAGS_RPATH,../lib) -o $@ $(APPEND_LDFLAGS)
xenstored.a: $(XENSTORED_OBJS)
$(AR) cr $@ $^
-@@ -126,13 +128,13 @@ tarball: clean
+@@ -127,13 +129,13 @@ tarball: clean
install: all
$(INSTALL_DIR) $(DESTDIR)$(bindir)
$(INSTALL_DIR) $(DESTDIR)$(includedir)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xentrace-prefix.diff xen-4.8.1/debian/patches/tools-xentrace-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xentrace-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xentrace-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:14 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 7ccc23bf0b827cab54f38ba86e3eed55eb149436
+X-Dgit-Generated: 4.8.1-1 bded2269fb168938a662711d0a632d9d644bfc30
Subject: tools-xentrace-prefix.diff
Patch-Name: tools-xentrace-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xentrace/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xentrace/Makefile
+--- xen-4.8.1.orig/tools/xentrace/Makefile
++++ xen-4.8.1/tools/xentrace/Makefile
@@ -8,6 +8,7 @@ CFLAGS += $(CFLAGS_libxenctrl)
LDLIBS += $(LDLIBS_libxenevtchn)
LDLIBS += $(LDLIBS_libxenctrl)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/toolstestsx86_emulator-pass--no-pie--fno xen-4.8.1/debian/patches/toolstestsx86_emulator-pass--no-pie--fno
--- xen-4.8.1~pre.2017.01.23/debian/patches/toolstestsx86_emulator-pass--no-pie--fno 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/toolstestsx86_emulator-pass--no-pie--fno 2017-04-18 18:07:28.000000000 +0100
@@ -1,6 +1,6 @@
From: Ian Jackson <ian.jackson@citrix.com>
Date: Tue, 1 Nov 2016 16:20:27 +0000
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 21ae6b2deae99f127e27ea1590a3821159e4c53a
+X-Dgit-Generated: 4.8.1-1 0b669a48e4ac450fded811b1ea297d644044d179
Subject: tools/tests/x86_emulator: Pass -no-pie -fno-pic to gcc on x86_32
The current build fails with GCC6 on Debian sid i386 (unstable):
@@ -33,8 +33,8 @@
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/tests/x86_emulator/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/tests/x86_emulator/Makefile
+--- xen-4.8.1.orig/tools/tests/x86_emulator/Makefile
++++ xen-4.8.1/tools/tests/x86_emulator/Makefile
@@ -45,6 +45,10 @@ x86_emulate/x86_emulate.c x86_emulate/x8
HOSTCFLAGS += $(CFLAGS_xeninclude)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/ubuntu-tools-libs-abiname.diff xen-4.8.1/debian/patches/ubuntu-tools-libs-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/ubuntu-tools-libs-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/ubuntu-tools-libs-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,13 +1,13 @@
From: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Date: Thu, 6 Oct 2016 14:24:46 +0100
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 1c23a037b6c6db944485b6b965660123d57edf05
+X-Dgit-Generated: 4.8.1-1 a80895b1222bf96c423953a78171ca38ee847a9f
Subject: ubuntu-tools-libs-abiname
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/call/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/call/Makefile
+--- xen-4.8.1.orig/tools/libs/call/Makefile
++++ xen-4.8.1/tools/libs/call/Makefile
@@ -39,22 +39,22 @@ headers.chk: $(wildcard include/*.h)
libxencall.a: $(LIB_OBJS)
$(AR) rc $@ $^
@@ -47,8 +47,8 @@
rm -f headers.chk
.PHONY: distclean
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/evtchn/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/evtchn/Makefile
+--- xen-4.8.1.orig/tools/libs/evtchn/Makefile
++++ xen-4.8.1/tools/libs/evtchn/Makefile
@@ -39,22 +39,22 @@ headers.chk: $(wildcard include/*.h)
libxenevtchn.a: $(LIB_OBJS)
$(AR) rc $@ $^
@@ -88,8 +88,8 @@
rm -f headers.chk
.PHONY: distclean
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/foreignmemory/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/foreignmemory/Makefile
+--- xen-4.8.1.orig/tools/libs/foreignmemory/Makefile
++++ xen-4.8.1/tools/libs/foreignmemory/Makefile
@@ -39,22 +39,22 @@ headers.chk: $(wildcard include/*.h)
libxenforeignmemory.a: $(LIB_OBJS)
$(AR) rc $@ $^
@@ -129,8 +129,8 @@
rm -f headers.chk
.PHONY: distclean
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/gnttab/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/gnttab/Makefile
+--- xen-4.8.1.orig/tools/libs/gnttab/Makefile
++++ xen-4.8.1/tools/libs/gnttab/Makefile
@@ -41,22 +41,22 @@ headers.chk: $(wildcard include/*.h)
libxengnttab.a: $(LIB_OBJS)
$(AR) rc $@ $^
@@ -170,8 +170,8 @@
rm -f headers.chk
.PHONY: distclean
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/toollog/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/toollog/Makefile
+--- xen-4.8.1.orig/tools/libs/toollog/Makefile
++++ xen-4.8.1/tools/libs/toollog/Makefile
@@ -34,22 +34,22 @@ headers.chk: $(wildcard include/*.h)
libxentoollog.a: $(LIB_OBJS)
$(AR) rc $@ $^
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/version.diff xen-4.8.1/debian/patches/version.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/version.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/version.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:43 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 d4c74ba58fa9aa7bde7b1a0a61a9361ce6e55919
+X-Dgit-Generated: 4.8.1-1 adc50830f6c334569f54255310fc489d139d542f
Subject: version
Patch-Name: version.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/xen/Makefile
-+++ xen-4.8.1~pre.2017.01.23/xen/Makefile
+--- xen-4.8.1.orig/xen/Makefile
++++ xen-4.8.1/xen/Makefile
@@ -160,7 +160,7 @@ delete-unfresh-files:
@mv -f $@.tmp $@
@@ -32,8 +32,8 @@
@mv -f $@.new $@
include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
---- xen-4.8.1~pre.2017.01.23.orig/xen/common/kernel.c
-+++ xen-4.8.1~pre.2017.01.23/xen/common/kernel.c
+--- xen-4.8.1.orig/xen/common/kernel.c
++++ xen-4.8.1/xen/common/kernel.c
@@ -252,8 +252,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDL
memset(&info, 0, sizeof(info));
@@ -45,8 +45,8 @@
safe_strcpy(info.compile_date, deny ? xen_deny() : xen_compile_date());
if ( copy_to_guest(arg, &info, 1) )
return -EFAULT;
---- xen-4.8.1~pre.2017.01.23.orig/xen/common/version.c
-+++ xen-4.8.1~pre.2017.01.23/xen/common/version.c
+--- xen-4.8.1.orig/xen/common/version.c
++++ xen-4.8.1/xen/common/version.c
@@ -20,19 +20,24 @@ const char *xen_compile_time(void)
return XEN_COMPILE_TIME;
}
@@ -90,8 +90,8 @@
const char *xen_deny(void)
{
return "<denied>";
---- xen-4.8.1~pre.2017.01.23.orig/xen/drivers/char/console.c
-+++ xen-4.8.1~pre.2017.01.23/xen/drivers/char/console.c
+--- xen-4.8.1.orig/xen/drivers/char/console.c
++++ xen-4.8.1/xen/drivers/char/console.c
@@ -732,14 +732,11 @@ void __init console_init_preirq(void)
serial_set_rx_handler(sercon_handle, serial_rx);
@@ -110,8 +110,8 @@
if ( opt_sync_console )
{
---- xen-4.8.1~pre.2017.01.23.orig/xen/include/xen/compile.h.in
-+++ xen-4.8.1~pre.2017.01.23/xen/include/xen/compile.h.in
+--- xen-4.8.1.orig/xen/include/xen/compile.h.in
++++ xen-4.8.1/xen/include/xen/compile.h.in
@@ -1,8 +1,9 @@
#define XEN_COMPILE_DATE "@@date@@"
#define XEN_COMPILE_TIME "@@time@@"
@@ -130,8 +130,8 @@
#define XEN_CHANGESET "@@changeset@@"
-#define XEN_BANNER \
---- xen-4.8.1~pre.2017.01.23.orig/xen/include/xen/version.h
-+++ xen-4.8.1~pre.2017.01.23/xen/include/xen/version.h
+--- xen-4.8.1.orig/xen/include/xen/version.h
++++ xen-4.8.1/xen/include/xen/version.h
@@ -6,9 +6,10 @@
const char *xen_compile_date(void);
diff -Nru xen-4.8.1~pre.2017.01.23/docs/misc/xen-command-line.markdown xen-4.8.1/docs/misc/xen-command-line.markdown
--- xen-4.8.1~pre.2017.01.23/docs/misc/xen-command-line.markdown 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/docs/misc/xen-command-line.markdown 2017-04-10 14:21:48.000000000 +0100
@@ -1619,6 +1619,21 @@
As the virtualisation is not 100% safe, don't use the vpmu flag on
production systems (see http://xenbits.xen.org/xsa/advisory-163.html)!
+### vwfi
+> `= trap | native`
+
+> Default: `trap`
+
+WFI is the ARM instruction to "wait for interrupt". WFE is similar and
+means "wait for event". This option, which is ARM specific, changes the
+way guest WFI and WFE are implemented in Xen. By default, Xen traps both
+instructions. In the case of WFI, Xen blocks the guest vcpu; in the case
+of WFE, Xen yields the guest vcpu. When setting vwfi to `native`, Xen
+doesn't trap either instruction, running them in guest context. Setting
+vwfi to `native` reduces irq latency significantly. It can also lead to
+suboptimal scheduling decisions, but only when the system is
+oversubscribed (i.e., in total there are more vCPUs than pCPUs).
+
### watchdog
> `= force | <boolean>`
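(For illustration: on an ARM system where irq latency matters more than
scheduling behaviour under oversubscription, one would append

    vwfi=native

to the hypervisor - not dom0 - command line; the default, `vwfi=trap`, keeps
the blocking/yielding behaviour described above.)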
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxc/include/xenctrl.h xen-4.8.1/tools/libxc/include/xenctrl.h
--- xen-4.8.1~pre.2017.01.23/tools/libxc/include/xenctrl.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxc/include/xenctrl.h 2017-04-10 14:21:48.000000000 +0100
@@ -2710,6 +2710,14 @@
int xc_livepatch_unload(xc_interface *xch, char *name, uint32_t timeout);
int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
+/*
+ * Ensure cache coherency after memory modifications. A call to this function
+ * is only required on ARM as the x86 architecture provides cache coherency
+ * guarantees. Calling this function on x86 is allowed but has no effect.
+ */
+int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
+ xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
+
/* Compat shims */
#include "xenctrl_compat.h"
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxc/xc_domain.c xen-4.8.1/tools/libxc/xc_domain.c
--- xen-4.8.1~pre.2017.01.23/tools/libxc/xc_domain.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxc/xc_domain.c 2017-04-10 14:21:48.000000000 +0100
@@ -74,10 +74,10 @@
/*
* The x86 architecture provides cache coherency guarantees which prevent
* the need for this hypercall. Avoid the overhead of making a hypercall
- * just for Xen to return -ENOSYS.
+ * just for Xen to return -ENOSYS. It is safe to ignore this call on x86
+ * so we just return 0.
*/
- errno = ENOSYS;
- return -1;
+ return 0;
#else
DECLARE_DOMCTL;
domctl.cmd = XEN_DOMCTL_cacheflush;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxc/xc_private.c xen-4.8.1/tools/libxc/xc_private.c
--- xen-4.8.1~pre.2017.01.23/tools/libxc/xc_private.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxc/xc_private.c 2017-04-10 14:21:48.000000000 +0100
@@ -64,8 +64,7 @@
goto err;
xch->fmem = xenforeignmemory_open(xch->error_handler, 0);
-
- if ( xch->xcall == NULL )
+ if ( xch->fmem == NULL )
goto err;
return xch;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxc/xc_private.h xen-4.8.1/tools/libxc/xc_private.h
--- xen-4.8.1~pre.2017.01.23/tools/libxc/xc_private.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxc/xc_private.h 2017-04-10 14:21:48.000000000 +0100
@@ -366,9 +366,6 @@
/* Optionally flush file to disk and discard page cache */
void discard_file_cache(xc_interface *xch, int fd, int flush);
-int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
- xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
-
#define MAX_MMU_UPDATES 1024
struct xc_mmu {
mmu_update_t updates[MAX_MMU_UPDATES];
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxl/libxl.c xen-4.8.1/tools/libxl/libxl.c
--- xen-4.8.1~pre.2017.01.23/tools/libxl/libxl.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxl/libxl.c 2017-04-10 14:21:48.000000000 +0100
@@ -2255,7 +2255,8 @@
case LIBXL_DISK_BACKEND_QDISK:
flexarray_append(back, "params");
flexarray_append(back, GCSPRINTF("%s:%s",
- libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+ libxl__device_disk_string_of_format(disk->format),
+ disk->pdev_path ? : ""));
if (libxl_defbool_val(disk->colo_enable)) {
flexarray_append(back, "colo-host");
flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_host));
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/Makefile xen-4.8.1/tools/ocaml/xenstored/Makefile
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/Makefile 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/Makefile 2017-04-10 14:21:48.000000000 +0100
@@ -53,6 +53,7 @@
domains \
connection \
connections \
+ history \
parse_arg \
process \
xenstored
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/connection.ml xen-4.8.1/tools/ocaml/xenstored/connection.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/connection.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/connection.ml 2017-04-10 14:21:48.000000000 +0100
@@ -296,3 +296,8 @@
let domid = get_domstr con in
let watches = List.map (fun (path, token) -> Printf.sprintf "watch %s: %s %s\n" domid path token) (list_watches con) in
String.concat "" watches
+
+let decr_conflict_credit doms con =
+ match con.dom with
+ | None -> () (* It's a socket connection. We don't know which domain we're in, so treat it as if it's free to conflict *)
+ | Some dom -> Domains.decr_conflict_credit doms dom
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/connections.ml xen-4.8.1/tools/ocaml/xenstored/connections.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/connections.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/connections.ml 2017-04-10 14:21:48.000000000 +0100
@@ -44,12 +44,14 @@
| Some p -> Hashtbl.add cons.ports p con;
| None -> ()
-let select cons =
- Hashtbl.fold
- (fun _ con (ins, outs) ->
- let fd = Connection.get_fd con in
- (fd :: ins, if Connection.has_output con then fd :: outs else outs))
- cons.anonymous ([], [])
+let select ?(only_if = (fun _ -> true)) cons =
+ Hashtbl.fold (fun _ con (ins, outs) ->
+ if (only_if con) then (
+ let fd = Connection.get_fd con in
+ (fd :: ins, if Connection.has_output con then fd :: outs else outs)
+ ) else (ins, outs)
+ )
+ cons.anonymous ([], [])
let find cons =
Hashtbl.find cons.anonymous
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/define.ml xen-4.8.1/tools/ocaml/xenstored/define.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/define.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/define.ml 2017-04-10 14:21:48.000000000 +0100
@@ -29,6 +29,10 @@
let maxtransaction = ref (20)
let maxrequests = ref (-1) (* maximum requests per transaction *)
+let conflict_burst_limit = ref 5.0
+let conflict_max_history_seconds = ref 0.05
+let conflict_rate_limit_is_aggregate = ref true
+
let domid_self = 0x7FF0
exception Not_a_directory of string
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/domain.ml xen-4.8.1/tools/ocaml/xenstored/domain.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/domain.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/domain.ml 2017-04-10 14:21:48.000000000 +0100
@@ -31,8 +31,13 @@
mutable io_credit: int; (* the rounds of ring process left to do, default is 0,
usually set to 1 when there is work detected, could
also set to n to give "lazy" clients extra credit *)
+ mutable conflict_credit: float; (* Must be positive to perform writes; a commit
+ that later causes conflict with another
+ domain's transaction costs credit. *)
+ mutable caused_conflicts: int64;
}
+let is_dom0 d = d.id = 0
let get_path dom = "/local/domain/" ^ (sprintf "%u" dom.id)
let get_id domain = domain.id
let get_interface d = d.interface
@@ -48,6 +53,10 @@
let incr_io_credit domain = domain.io_credit <- domain.io_credit + 1
let decr_io_credit domain = domain.io_credit <- max 0 (domain.io_credit - 1)
+let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
+
+let is_free_to_conflict = is_dom0
+
let string_of_port = function
| None -> "None"
| Some x -> string_of_int (Xeneventchn.to_int x)
@@ -84,6 +93,12 @@
port = None;
bad_client = false;
io_credit = 0;
+ conflict_credit = !Define.conflict_burst_limit;
+ caused_conflicts = 0L;
}
-let is_dom0 d = d.id = 0
+let log_and_reset_conflict_stats logfn dom =
+ if dom.caused_conflicts > 0L then (
+ logfn dom.id dom.caused_conflicts;
+ dom.caused_conflicts <- 0L
+ )
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/domains.ml xen-4.8.1/tools/ocaml/xenstored/domains.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/domains.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/domains.ml 2017-04-10 14:21:48.000000000 +0100
@@ -15,20 +15,77 @@
*)
let debug fmt = Logging.debug "domains" fmt
+let error fmt = Logging.error "domains" fmt
+let warn fmt = Logging.warn "domains" fmt
type domains = {
eventchn: Event.t;
table: (Xenctrl.domid, Domain.t) Hashtbl.t;
+
+ (* N.B. the Queue module is not thread-safe but oxenstored is single-threaded. *)
+ (* Domains queue up to regain conflict-credit; we have a queue for
+ domains that are carrying some penalty and so are below the
+ maximum credit, and another queue for domains that have run out of
+ credit and so have had their access paused. *)
+ doms_conflict_paused: (Domain.t option ref) Queue.t;
+ doms_with_conflict_penalty: (Domain.t option ref) Queue.t;
+
+ (* A callback function to be called when we go from zero to one paused domain.
+ This will be to reset the countdown until the next unit of credit is issued. *)
+ on_first_conflict_pause: unit -> unit;
+
+ (* If config is set to use individual instead of aggregate conflict-rate-limiting,
+ we use these counts instead of the queues. The second one includes the first. *)
+ mutable n_paused: int; (* Number of domains with zero or negative credit *)
+ mutable n_penalised: int; (* Number of domains with less than maximum credit *)
}
-let init eventchn =
- { eventchn = eventchn; table = Hashtbl.create 10 }
+let init eventchn on_first_conflict_pause = {
+ eventchn = eventchn;
+ table = Hashtbl.create 10;
+ doms_conflict_paused = Queue.create ();
+ doms_with_conflict_penalty = Queue.create ();
+ on_first_conflict_pause = on_first_conflict_pause;
+ n_paused = 0;
+ n_penalised = 0;
+}
let del doms id = Hashtbl.remove doms.table id
let exist doms id = Hashtbl.mem doms.table id
let find doms id = Hashtbl.find doms.table id
let number doms = Hashtbl.length doms.table
let iter doms fct = Hashtbl.iter (fun _ b -> fct b) doms.table
+let rec is_empty_queue q =
+ Queue.is_empty q ||
+ if !(Queue.peek q) = None
+ then (
+ ignore (Queue.pop q);
+ is_empty_queue q
+ ) else false
+
+let all_at_max_credit doms =
+ if !Define.conflict_rate_limit_is_aggregate
+ then
+ (* Check both because if burst limit is 1.0 then a domain can go straight
+ * from max-credit to paused without getting into the penalty queue. *)
+ is_empty_queue doms.doms_with_conflict_penalty
+ && is_empty_queue doms.doms_conflict_paused
+ else doms.n_penalised = 0
+
+(* Functions to handle queues of domains given that the domain might be deleted while in a queue. *)
+let push dom queue =
+ Queue.push (ref (Some dom)) queue
+
+let rec pop queue =
+ match !(Queue.pop queue) with
+ | None -> pop queue
+ | Some x -> x
+
+let remove_from_queue dom queue =
+ Queue.iter (fun d -> match !d with
+ | None -> ()
+ | Some x -> if x=dom then d := None) queue
+
let cleanup xc doms =
let notify = ref false in
let dead_dom = ref [] in
@@ -52,6 +109,11 @@
let dom = Hashtbl.find doms.table id in
Domain.close dom;
Hashtbl.remove doms.table id;
+ if dom.Domain.conflict_credit <= !Define.conflict_burst_limit
+ then (
+ remove_from_queue dom doms.doms_with_conflict_penalty;
+ if (dom.Domain.conflict_credit <= 0.) then remove_from_queue dom doms.doms_conflict_paused
+ )
) !dead_dom;
!notify, !dead_dom
@@ -82,3 +144,74 @@
Domain.bind_interdomain dom;
Domain.notify dom;
dom
+
+let decr_conflict_credit doms dom =
+ dom.Domain.caused_conflicts <- Int64.add 1L dom.Domain.caused_conflicts;
+ let before = dom.Domain.conflict_credit in
+ let after = max (-1.0) (before -. 1.0) in
+ debug "decr_conflict_credit dom%d %F -> %F" (Domain.get_id dom) before after;
+ dom.Domain.conflict_credit <- after;
+ let newly_penalised =
+ before >= !Define.conflict_burst_limit
+ && after < !Define.conflict_burst_limit in
+ let newly_paused = before > 0.0 && after <= 0.0 in
+ if !Define.conflict_rate_limit_is_aggregate then (
+ if newly_penalised
+ && after > 0.0
+ then (
+ push dom doms.doms_with_conflict_penalty
+ ) else if newly_paused
+ then (
+ let first_pause = Queue.is_empty doms.doms_conflict_paused in
+ push dom doms.doms_conflict_paused;
+ if first_pause then doms.on_first_conflict_pause ()
+ ) else (
+ (* The queues are correct already: no further action needed. *)
+ )
+ ) else (
+ if newly_penalised then doms.n_penalised <- doms.n_penalised + 1;
+ if newly_paused then (
+ doms.n_paused <- doms.n_paused + 1;
+ if doms.n_paused = 1 then doms.on_first_conflict_pause ()
+ )
+ )
+
+(* Give one point of credit to one domain, and update the queues appropriately. *)
+let incr_conflict_credit_from_queue doms =
+ let process_queue q requeue_test =
+ let d = pop q in
+ let before = d.Domain.conflict_credit in (* just for debug-logging *)
+ d.Domain.conflict_credit <- min (d.Domain.conflict_credit +. 1.0) !Define.conflict_burst_limit;
+ debug "incr_conflict_credit_from_queue: dom%d: %F -> %F" (Domain.get_id d) before d.Domain.conflict_credit;
+ if requeue_test d.Domain.conflict_credit then (
+ push d q (* Make it queue up again for its next point of credit. *)
+ )
+ in
+ let paused_queue_test cred = cred <= 0.0 in
+ let penalty_queue_test cred = cred < !Define.conflict_burst_limit in
+ try process_queue doms.doms_conflict_paused paused_queue_test
+ with Queue.Empty -> (
+ try process_queue doms.doms_with_conflict_penalty penalty_queue_test
+ with Queue.Empty -> () (* Both queues are empty: nothing to do here. *)
+ )
+
+let incr_conflict_credit doms =
+ if !Define.conflict_rate_limit_is_aggregate
+ then incr_conflict_credit_from_queue doms
+ else (
+ (* Give a point of credit to every domain, subject only to the cap. *)
+ let inc dom =
+ let before = dom.Domain.conflict_credit in
+ let after = min (before +. 1.0) !Define.conflict_burst_limit in
+ dom.Domain.conflict_credit <- after;
+ debug "incr_conflict_credit dom%d: %F -> %F" (Domain.get_id dom) before after;
+
+ if before <= 0.0 && after > 0.0
+ then doms.n_paused <- doms.n_paused - 1;
+
+ if before < !Define.conflict_burst_limit
+ && after >= !Define.conflict_burst_limit
+ then doms.n_penalised <- doms.n_penalised - 1
+ in
+ if doms.n_penalised > 0 then iter doms inc
+ )
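
Because OCaml's Queue.t cannot delete an arbitrary element, the push/pop/remove_from_queue helpers above store each domain behind an option ref and "remove" it by writing None, which pop then skips. A minimal C sketch of the same tombstone idea, with hypothetical names (illustration only, not the oxenstored code):

    #include <stddef.h>

    /* Tombstone queue: removal NULLs the slot in place; pop skips NULL
     * slots. Fixed-size ring for brevity; no overflow checks. */
    #define QLEN 64
    struct tqueue {
        void *slot[QLEN];
        unsigned head, tail;              /* monotonically increasing */
    };

    static void tq_push(struct tqueue *q, void *v)
    {
        q->slot[q->tail++ % QLEN] = v;
    }

    static void *tq_pop(struct tqueue *q)
    {
        while (q->head != q->tail) {
            void *v = q->slot[q->head++ % QLEN];
            if (v)
                return v;                 /* skip tombstones */
        }
        return NULL;                      /* empty */
    }

    static void tq_remove(struct tqueue *q, const void *v)
    {
        for (unsigned i = q->head; i != q->tail; i++)
            if (q->slot[i % QLEN] == v)
                q->slot[i % QLEN] = NULL; /* leave a tombstone */
    }
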
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/history.ml xen-4.8.1/tools/ocaml/xenstored/history.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/history.ml 1970-01-01 01:00:00.000000000 +0100
+++ xen-4.8.1/tools/ocaml/xenstored/history.ml 2017-04-10 14:21:48.000000000 +0100
@@ -0,0 +1,73 @@
+(*
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *)
+
+type history_record = {
+ con: Connection.t; (* connection that made a change *)
+ tid: int; (* transaction id of the change (may be Transaction.none) *)
+ before: Store.t; (* the store before the change *)
+ after: Store.t; (* the store after the change *)
+ finish_count: int64; (* the commit-count at which the transaction finished *)
+}
+
+let history : history_record list ref = ref []
+
+(* Called from periodic_ops to ensure we don't discard symbols that are still needed. *)
+(* There is scope for optimisation here, since in consecutive commits one commit's `after`
+ * is the same thing as the next commit's `before`, but not all commits in history are
+ * consecutive. *)
+let mark_symbols () =
+ (* There are gaps where dom0's commits are missing. Otherwise we could assume that
+ * each element's `before` is the same thing as the next element's `after`
+ * since the next element is the previous commit *)
+ List.iter (fun hist_rec ->
+ Store.mark_symbols hist_rec.before;
+ Store.mark_symbols hist_rec.after;
+ )
+ !history
+
+(* Keep only enough commit-history to protect the running transactions that we are still tracking *)
+(* There is scope for optimisation here, replacing List.filter with something more efficient,
+ * probably on a different list-like structure. *)
+let trim ?txn () =
+ Transaction.trim_short_running_transactions txn;
+ history := match Transaction.oldest_short_running_transaction () with
+ | None -> [] (* We have no open transaction, so no history is needed *)
+ | Some (_, txn) -> (
+ (* keep records with finish_count recent enough to be relevant *)
+ List.filter (fun r -> r.finish_count > txn.Transaction.start_count) !history
+ )
+
+let end_transaction txn con tid commit =
+ let success = Connection.end_transaction con tid commit in
+ trim ~txn ();
+ success
+
+let push (x: history_record) =
+ let dom = x.con.Connection.dom in
+ match dom with
+ | None -> () (* treat socket connections as always free to conflict *)
+ | Some d -> if not (Domain.is_free_to_conflict d) then history := x :: !history
+
+(* Find the connections from records since commit-count [since] for which [f record] returns [true] *)
+let filter_connections ~ignore ~since ~f =
+ (* The "mem" call is an optimisation, to avoid calling f if we have picked con already. *)
+ (* Using a hash table rather than a list is to optimise the "mem" call. *)
+ List.fold_left (fun acc hist_rec ->
+ if hist_rec.finish_count > since
+ && not (hist_rec.con == ignore)
+ && not (Hashtbl.mem acc hist_rec.con)
+ && f hist_rec
+ then Hashtbl.replace acc hist_rec.con ();
+ acc
+ ) (Hashtbl.create 1023) !history
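
The trim rule above can be stated as an invariant: a commit record stays in history only while some still-tracked transaction started before that record's commit finished; once the oldest live transaction is newer than the record, the record can never again be blamed for a conflict. A one-predicate sketch in C (hypothetical types, illustration only):

    #include <stdint.h>

    /* Keep record r iff a live transaction may still conflict with it,
     * i.e. r finished after the oldest live transaction started.
     * Counts are monotonically increasing commit sequence numbers. */
    struct hist_record { int64_t finish_count; };

    static int still_relevant(const struct hist_record *r,
                              int64_t oldest_live_txn_start_count)
    {
        return r->finish_count > oldest_live_txn_start_count;
    }
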
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/oxenstored.conf.in xen-4.8.1/tools/ocaml/xenstored/oxenstored.conf.in
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/oxenstored.conf.in 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/oxenstored.conf.in 2017-04-10 14:21:48.000000000 +0100
@@ -9,6 +9,38 @@
# Activate transaction merge support
merge-activate = true
+# Limits applied to domains whose writes cause other domains' transaction
+# commits to fail. Must include decimal point.
+
+# The burst limit is the number of conflicts a domain can cause to
+# fail in a short period; this value is used for both the initial and
+# the maximum value of each domain's conflict-credit, which falls by
+# one point for each conflict caused, and when it reaches zero the
+# domain's requests are ignored.
+conflict-burst-limit = 5.0
+
+# The conflict-credit is replenished over time:
+# one point is issued after each conflict-max-history-seconds, so this
+# is the minimum pause-time during which a domain will be ignored.
+conflict-max-history-seconds = 0.05
+
+# If the conflict-rate-limit-is-aggregate flag is true then after each
+# tick one point of conflict-credit is given to just one domain: the
+# one at the front of the queue. If false, then after each tick each
+# domain gets a point of conflict-credit.
+#
+# In environments where it is known that every transaction will
+# involve a set of nodes that is writable by at most one other domain,
+# then it is safe to set this aggregate-limit flag to false for better
+# performance. (This can be determined by considering the layout of
+# the xenstore tree and permissions, together with the content of the
+# transactions that require protection.)
+#
+# A transaction which involves a set of nodes which can be modified by
+# multiple other domains can suffer conflicts caused by any of those
+# domains, so the flag must be set to true.
+conflict-rate-limit-is-aggregate = true
+
# Activate node permission system
perms-activate = true
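
Taken together, the defaults above implement a token bucket: a domain can cause 5 conflicts back to back, is then ignored, and credit is replenished one point per 0.05s tick, i.e. roughly 20 conflicts per second sustained (with the aggregate flag true, that replenish rate is shared across all penalised domains). A minimal C sketch of the accounting, with hypothetical names (the real logic is the OCaml in domains.ml above):

    /* Token-bucket conflict credit, mirroring the config defaults. */
    struct credit { double points; };      /* starts at burst_limit */

    static void on_conflict(struct credit *c)
    {
        c->points -= 1.0;
        if (c->points < -1.0)
            c->points = -1.0;              /* floor, as in domains.ml */
    }

    static void on_tick(struct credit *c, double burst_limit)
    {
        c->points += 1.0;
        if (c->points > burst_limit)
            c->points = burst_limit;       /* cap at the burst limit */
    }

    static int is_paused(const struct credit *c)
    {
        return c->points <= 0.0;           /* requests ignored while true */
    }
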
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/process.ml xen-4.8.1/tools/ocaml/xenstored/process.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/process.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/process.ml 2017-04-10 14:21:48.000000000 +0100
@@ -16,6 +16,7 @@
let error fmt = Logging.error "process" fmt
let info fmt = Logging.info "process" fmt
+let debug fmt = Logging.debug "process" fmt
open Printf
open Stdext
@@ -25,6 +26,7 @@
exception Domain_not_match
exception Invalid_Cmd_Args
+(* This controls the do_debug fn in this module, not the debug logging-function. *)
let allow_debug = ref false
let c_int_of_string s =
@@ -293,6 +295,11 @@
| Packet.Reply x -> write_answer_log ~ty ~tid ~con ~data:x
| Packet.Error e -> write_answer_log ~ty:(Xenbus.Xb.Op.Error) ~tid ~con ~data:e
+let record_commit ~con ~tid ~before ~after =
+ let inc r = r := Int64.add 1L !r in
+ let finish_count = inc Transaction.counter; !Transaction.counter in
+ History.push {History.con=con; tid=tid; before=before; after=after; finish_count=finish_count}
+
(* Replay a stored transaction against a fresh store, check the responses are
all equivalent: if so, commit the transaction. Otherwise send the abort to
the client. *)
@@ -301,25 +308,57 @@
| Transaction.No ->
error "attempted to replay a non-full transaction";
false
- | Transaction.Full(id, oldroot, cstore) ->
+ | Transaction.Full(id, oldstore, cstore) ->
let tid = Connection.start_transaction c cstore in
- let new_t = Transaction.make tid cstore in
+ let replay_t = Transaction.make ~internal:true tid cstore in
let con = sprintf "r(%d):%s" id (Connection.get_domstr c) in
- let perform_exn (request, response) =
- write_access_log ~ty:request.Packet.ty ~tid ~con ~data:request.Packet.data;
+
+ let perform_exn ~wlog txn (request, response) =
+ if wlog then write_access_log ~ty:request.Packet.ty ~tid ~con ~data:request.Packet.data;
let fct = function_of_type_simple_op request.Packet.ty in
- let response' = input_handle_error ~cons ~doms ~fct ~con:c ~t:new_t ~req:request in
- write_response_log ~ty:request.Packet.ty ~tid ~con ~response:response';
- if not(Packet.response_equal response response') then raise Transaction_again in
+ let response' = input_handle_error ~cons ~doms ~fct ~con:c ~t:txn ~req:request in
+ if wlog then write_response_log ~ty:request.Packet.ty ~tid ~con ~response:response';
+ if not(Packet.response_equal response response') then raise Transaction_again
+ in
finally
(fun () ->
try
Logging.start_transaction ~con ~tid;
- List.iter perform_exn (Transaction.get_operations t);
- Logging.end_transaction ~con ~tid;
+ List.iter (perform_exn ~wlog:true replay_t) (Transaction.get_operations t); (* May throw EAGAIN *)
- Transaction.commit ~con new_t
- with e ->
+ Logging.end_transaction ~con ~tid;
+ Transaction.commit ~con replay_t
+ with
+ | Transaction_again -> (
+ Transaction.failed_commits := Int64.add !Transaction.failed_commits 1L;
+ let victim_domstr = Connection.get_domstr c in
+ debug "Apportioning blame for EAGAIN in txn %d, domain=%s" id victim_domstr;
+ let punish guilty_con =
+ debug "Blaming domain %s for conflict with domain %s txn %d"
+ (Connection.get_domstr guilty_con) victim_domstr id;
+ Connection.decr_conflict_credit doms guilty_con
+ in
+ let judge_and_sentence hist_rec = (
+ let can_apply_on store = (
+ let store = Store.copy store in
+ let trial_t = Transaction.make ~internal:true Transaction.none store in
+ try List.iter (perform_exn ~wlog:false trial_t) (Transaction.get_operations t);
+ true
+ with Transaction_again -> false
+ ) in
+ if can_apply_on hist_rec.History.before
+ && not (can_apply_on hist_rec.History.after)
+ then (punish hist_rec.History.con; true)
+ else false
+ ) in
+ let guilty_cons = History.filter_connections ~ignore:c ~since:t.Transaction.start_count ~f:judge_and_sentence in
+ if Hashtbl.length guilty_cons = 0 then (
+ debug "Found no culprit for conflict in %s: must be self or not in history." con;
+ Transaction.failed_commits_no_culprit := Int64.add !Transaction.failed_commits_no_culprit 1L
+ );
+ false
+ )
+ | e ->
info "transaction_replay %d caught: %s" tid (Printexc.to_string e);
false
)
@@ -358,13 +397,20 @@
| x :: _ -> raise (Invalid_argument x)
| _ -> raise Invalid_Cmd_Args
in
+ let commit = commit && not (Transaction.is_read_only t) in
let success =
let commit = if commit then Some (fun con trans -> transaction_replay con trans domains cons) else None in
- Connection.end_transaction con (Transaction.get_id t) commit in
+ History.end_transaction t con (Transaction.get_id t) commit in
if not success then
raise Transaction_again;
- if commit then
- process_watch (List.rev (Transaction.get_paths t)) cons
+ if commit then begin
+ process_watch (List.rev (Transaction.get_paths t)) cons;
+ match t.Transaction.ty with
+ | Transaction.No ->
+ () (* no need to record anything *)
+ | Transaction.Full(id, oldstore, cstore) ->
+ record_commit ~con ~tid:id ~before:oldstore ~after:cstore
+ end
let do_introduce con t domains cons data =
if not (Connection.is_dom0 con)
@@ -434,6 +480,37 @@
| _ -> function_of_type_simple_op ty
(**
+ * Determines which individual (non-transactional) operations we want to retain.
+ * We only want to retain operations that have side-effects in the store since
+ * these can be the cause of transactions failing.
+ *)
+let retain_op_in_history ty =
+ match ty with
+ | Xenbus.Xb.Op.Write
+ | Xenbus.Xb.Op.Mkdir
+ | Xenbus.Xb.Op.Rm
+ | Xenbus.Xb.Op.Setperms -> true
+ | Xenbus.Xb.Op.Debug
+ | Xenbus.Xb.Op.Directory
+ | Xenbus.Xb.Op.Read
+ | Xenbus.Xb.Op.Getperms
+ | Xenbus.Xb.Op.Watch
+ | Xenbus.Xb.Op.Unwatch
+ | Xenbus.Xb.Op.Transaction_start
+ | Xenbus.Xb.Op.Transaction_end
+ | Xenbus.Xb.Op.Introduce
+ | Xenbus.Xb.Op.Release
+ | Xenbus.Xb.Op.Getdomainpath
+ | Xenbus.Xb.Op.Watchevent
+ | Xenbus.Xb.Op.Error
+ | Xenbus.Xb.Op.Isintroduced
+ | Xenbus.Xb.Op.Resume
+ | Xenbus.Xb.Op.Set_target
+ | Xenbus.Xb.Op.Restrict
+ | Xenbus.Xb.Op.Reset_watches
+ | Xenbus.Xb.Op.Invalid -> false
+
+(**
* Nothrow guarantee.
*)
let process_packet ~store ~cons ~doms ~con ~req =
@@ -448,7 +525,19 @@
else
Connection.get_transaction con tid
in
- let response = input_handle_error ~cons ~doms ~fct ~con ~t ~req in
+
+ let execute () = input_handle_error ~cons ~doms ~fct ~con ~t ~req in
+
+ let response =
+ (* Note that transactions are recorded in history separately. *)
+ if tid = Transaction.none && retain_op_in_history ty then begin
+ let before = Store.copy store in
+ let response = execute () in
+ let after = Store.copy store in
+ record_commit ~con ~tid ~before ~after;
+ response
+ end else execute ()
+ in
let response = try
if tid <> Transaction.none then
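
The judge_and_sentence logic above encodes a simple blame rule: a recorded commit is guilty of our EAGAIN exactly when our transaction's operations replay cleanly on the store as it was before that commit, but fail on the store after it. As a predicate, in C (hypothetical types, illustration only):

    /* Guilt test for one history record: the failed transaction would
     * have succeeded before the recorded commit, but not after it. */
    typedef int (*replay_fn)(const void *store);  /* 1 = replays cleanly */

    static int is_guilty(const void *store_before, const void *store_after,
                         replay_fn replays_ok)
    {
        return replays_ok(store_before) && !replays_ok(store_after);
    }
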
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/store.ml xen-4.8.1/tools/ocaml/xenstored/store.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/store.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/store.ml 2017-04-10 14:21:48.000000000 +0100
@@ -211,6 +211,7 @@
lookup rnode path fct
end
+(* The Store.t type *)
type t =
{
mutable stat_transaction_coalesce: int;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/transaction.ml xen-4.8.1/tools/ocaml/xenstored/transaction.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/transaction.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/transaction.ml 2017-04-10 14:21:48.000000000 +0100
@@ -14,6 +14,8 @@
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*)
+let error fmt = Logging.error "transaction" fmt
+
open Stdext
let none = 0
@@ -69,34 +71,73 @@
else
false
-type ty = No | Full of (int * Store.Node.t * Store.t)
+type ty = No | Full of (
+ int * (* Transaction id *)
+ Store.t * (* Original store *)
+ Store.t (* A pointer to the canonical store: its root changes on each transaction-commit *)
+)
type t = {
ty: ty;
- store: Store.t;
+ start_count: int64;
+ store: Store.t; (* This is the store that we change in write operations. *)
quota: Quota.t;
mutable paths: (Xenbus.Xb.Op.operation * Store.Path.t) list;
mutable operations: (Packet.request * Packet.response) list;
mutable read_lowpath: Store.Path.t option;
mutable write_lowpath: Store.Path.t option;
}
+let get_id t = match t.ty with No -> none | Full (id, _, _) -> id
-let make id store =
- let ty = if id = none then No else Full(id, Store.get_root store, store) in
- {
+let counter = ref 0L
+let failed_commits = ref 0L
+let failed_commits_no_culprit = ref 0L
+let reset_conflict_stats () =
+ failed_commits := 0L;
+ failed_commits_no_culprit := 0L
+
+(* Scope for optimisation: different data-structure and functions to search/filter it *)
+let short_running_txns = ref []
+
+let oldest_short_running_transaction () =
+ let rec last = function
+ | [] -> None
+ | [x] -> Some x
+ | x :: xs -> last xs
+ in last !short_running_txns
+
+let trim_short_running_transactions txn =
+ let cutoff = Unix.gettimeofday () -. !Define.conflict_max_history_seconds in
+ let keep = match txn with
+ | None -> (function (start_time, _) -> start_time >= cutoff)
+ | Some t -> (function (start_time, tx) -> start_time >= cutoff && tx != t)
+ in
+ short_running_txns := List.filter
+ keep
+ !short_running_txns
+
+let make ?(internal=false) id store =
+ let ty = if id = none then No else Full(id, Store.copy store, store) in
+ let txn = {
ty = ty;
+ start_count = !counter;
store = if id = none then store else Store.copy store;
quota = Quota.copy store.Store.quota;
paths = [];
operations = [];
read_lowpath = None;
write_lowpath = None;
- }
+ } in
+ if id <> none && not internal then (
+ let now = Unix.gettimeofday () in
+ short_running_txns := (now, txn) :: !short_running_txns
+ );
+ txn
-let get_id t = match t.ty with No -> none | Full (id, _, _) -> id
let get_store t = t.store
let get_paths t = t.paths
+let is_read_only t = t.paths = []
let add_wop t ty path = t.paths <- (ty, path) :: t.paths
let add_operation ~perm t request response =
if !Define.maxrequests >= 0
@@ -155,7 +196,7 @@
let has_commited =
match t.ty with
| No -> true
- | Full (id, oldroot, cstore) ->
+ | Full (id, oldstore, cstore) -> (* "cstore" meaning current canonical store *)
let commit_partial oldroot cstore store =
(* get the lowest path of the query and verify that it hasn't
been modified by others transactions. *)
@@ -198,7 +239,7 @@
if !test_eagain && Random.int 3 = 0 then
false
else
- try_commit oldroot cstore t.store
+ try_commit (Store.get_root oldstore) cstore t.store
in
if has_commited && has_write_ops then
Disk.write t.store;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/xenstored.ml xen-4.8.1/tools/ocaml/xenstored/xenstored.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/xenstored.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/xenstored.ml 2017-04-10 14:21:48.000000000 +0100
@@ -53,14 +53,16 @@
let process_domains store cons domains =
let do_io_domain domain =
- if not (Domain.is_bad_domain domain) then
- let io_credit = Domain.get_io_credit domain in
- if io_credit > 0 then (
- let con = Connections.find_domain cons (Domain.get_id domain) in
- Process.do_input store cons domains con;
- Process.do_output store cons domains con;
- Domain.decr_io_credit domain;
- ) in
+ if Domain.is_bad_domain domain
+ || Domain.get_io_credit domain <= 0
+ || Domain.is_paused_for_conflict domain
+ then () (* nothing to do *)
+ else (
+ let con = Connections.find_domain cons (Domain.get_id domain) in
+ Process.do_input store cons domains con;
+ Process.do_output store cons domains con;
+ Domain.decr_io_credit domain
+ ) in
Domains.iter domains do_io_domain
let sigusr1_handler store =
@@ -89,6 +91,9 @@
let pidfile = ref default_pidfile in
let options = [
("merge-activate", Config.Set_bool Transaction.do_coalesce);
+ ("conflict-burst-limit", Config.Set_float Define.conflict_burst_limit);
+ ("conflict-max-history-seconds", Config.Set_float Define.conflict_max_history_seconds);
+ ("conflict-rate-limit-is-aggregate", Config.Set_bool Define.conflict_rate_limit_is_aggregate);
("perms-activate", Config.Set_bool Perms.activate);
("quota-activate", Config.Set_bool Quota.activate);
("quota-maxwatch", Config.Set_int Define.maxwatch);
@@ -260,7 +265,23 @@
let store = Store.create () in
let eventchn = Event.init () in
- let domains = Domains.init eventchn in
+ let next_frequent_ops = ref 0. in
+ let advance_next_frequent_ops () =
+ next_frequent_ops := (Unix.gettimeofday () +. !Define.conflict_max_history_seconds)
+ in
+ let delay_next_frequent_ops_by duration =
+ next_frequent_ops := !next_frequent_ops +. duration
+ in
+ let domains = Domains.init eventchn advance_next_frequent_ops in
+
+ (* For things that need to be done periodically but more often
+ * than the periodic_ops function *)
+ let frequent_ops () =
+ if Unix.gettimeofday () > !next_frequent_ops then (
+ History.trim ();
+ Domains.incr_conflict_credit domains;
+ advance_next_frequent_ops ()
+ ) in
let cons = Connections.create () in
let quit = ref false in
@@ -356,6 +377,7 @@
let last_scan_time = ref 0. in
let periodic_ops now =
+ debug "periodic_ops starting";
(* we garbage collect the string->int dictionary after a sizeable amount of operations,
* there's no need to be really fast even if we got loose
* objects since names are often reuse.
@@ -365,6 +387,7 @@
Symbol.mark_all_as_unused ();
Store.mark_symbols store;
Connections.iter cons Connection.mark_symbols;
+ History.mark_symbols ();
Symbol.garbage ()
end;
@@ -374,7 +397,11 @@
(* make sure we don't print general stats faster than 2 min *)
if now > (!last_stat_time +. 120.) then (
+ info "Transaction conflict statistics for last %F seconds:" (now -. !last_stat_time);
last_stat_time := now;
+ Domains.iter domains (Domain.log_and_reset_conflict_stats (info "Dom%d caused %Ld conflicts"));
+ info "%Ld failed transactions; of these no culprit was found for %Ld" !Transaction.failed_commits !Transaction.failed_commits_no_culprit;
+ Transaction.reset_conflict_stats ();
let gc = Gc.stat () in
let (lanon, lanon_ops, lanon_watchs,
@@ -392,23 +419,38 @@
gc.Gc.heap_words gc.Gc.heap_chunks
gc.Gc.live_words gc.Gc.live_blocks
gc.Gc.free_words gc.Gc.free_blocks
- )
- in
+ );
+ let elapsed = Unix.gettimeofday () -. now in
+ debug "periodic_ops took %F seconds." elapsed;
+ delay_next_frequent_ops_by elapsed
+ in
- let period_ops_interval = 15. in
- let period_start = ref 0. in
+ let period_ops_interval = 15. in
+ let period_start = ref 0. in
let main_loop () =
-
+ let is_peaceful c =
+ match Connection.get_domain c with
+ | None -> true (* Treat socket-connections as exempt, and free to conflict. *)
+ | Some dom -> not (Domain.is_paused_for_conflict dom)
+ in
+ frequent_ops ();
let mw = Connections.has_more_work cons in
+ let peaceful_mw = List.filter is_peaceful mw in
List.iter
(fun c ->
match Connection.get_domain c with
| None -> () | Some d -> Domain.incr_io_credit d)
- mw;
+ peaceful_mw;
+ let start_time = Unix.gettimeofday () in
let timeout =
- if List.length mw > 0 then 0. else period_ops_interval in
- let inset, outset = Connections.select cons in
+ let until_next_activity =
+ if Domains.all_at_max_credit domains
+ then period_ops_interval
+ else min (max 0. (!next_frequent_ops -. start_time)) period_ops_interval in
+ if peaceful_mw <> [] then 0. else until_next_activity
+ in
+ let inset, outset = Connections.select ~only_if:is_peaceful cons in
let rset, wset, _ =
try
Select.select (spec_fds @ inset) outset [] timeout
@@ -418,6 +460,7 @@
List.partition (fun fd -> List.mem fd spec_fds) rset in
if List.length sfds > 0 then
process_special_fds sfds;
+
if List.length cfds > 0 || List.length wset > 0 then
process_connection_fds store cons domains cfds wset;
if timeout <> 0. then (
@@ -425,6 +468,7 @@
if now > !period_start +. period_ops_interval then
(period_start := now; periodic_ops now)
);
+
process_domains store cons domains
in
diff -Nru xen-4.8.1~pre.2017.01.23/tools/tests/x86_emulator/test_x86_emulator.c xen-4.8.1/tools/tests/x86_emulator/test_x86_emulator.c
--- xen-4.8.1~pre.2017.01.23/tools/tests/x86_emulator/test_x86_emulator.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/tests/x86_emulator/test_x86_emulator.c 2017-04-10 14:21:48.000000000 +0100
@@ -163,6 +163,18 @@
(ebx & (1U << 5)) != 0; \
})
+static int read_segment(
+ enum x86_segment seg,
+ struct segment_register *reg,
+ struct x86_emulate_ctxt *ctxt)
+{
+ if ( !is_x86_user_segment(seg) )
+ return X86EMUL_UNHANDLEABLE;
+ memset(reg, 0, sizeof(*reg));
+ reg->attr.fields.p = 1;
+ return X86EMUL_OKAY;
+}
+
static int read_cr(
unsigned int reg,
unsigned long *val,
@@ -215,6 +227,7 @@
.write = write,
.cmpxchg = cmpxchg,
.cpuid = cpuid,
+ .read_segment = read_segment,
.read_cr = read_cr,
.get_fpu = get_fpu,
};
@@ -732,6 +745,27 @@
goto fail;
printf("okay\n");
+ printf("%-40s", "Testing mov %%cr4,%%esi (bad ModRM)...");
+ /*
+ * Mod = 1, Reg = 4, R/M = 6 would normally encode a memory reference of
+ * disp8(%esi), but mov to/from cr/dr are special and behave as if they
+ * were encoded with Mod == 3.
+ */
+ instr[0] = 0x0f; instr[1] = 0x20, instr[2] = 0x66;
+ instr[3] = 0; /* Supposed disp8. */
+ regs.esi = 0;
+ regs.eip = (unsigned long)&instr[0];
+ rc = x86_emulate(&ctxt, &emulops);
+ /*
+ * We don't care precisely what gets read from %cr4 into %esi, just so
+ * long as ModRM is treated as a register operand and 0(%esi) isn't
+ * followed as a memory reference.
+ */
+ if ( (rc != X86EMUL_OKAY) ||
+ (regs.eip != (unsigned long)&instr[3]) )
+ goto fail;
+ printf("okay\n");
+
#define decl_insn(which) extern const unsigned char which[], which##_len[]
#define put_insn(which, insn) ".pushsection .test, \"ax\", @progbits\n" \
#which ": " insn "\n" \
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/Makefile xen-4.8.1/tools/xenstore/Makefile
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/Makefile 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/Makefile 2017-04-10 14:21:48.000000000 +0100
@@ -32,6 +32,7 @@
XENSTORED_OBJS_$(CONFIG_MiniOS) = xenstored_minios.o
XENSTORED_OBJS += $(XENSTORED_OBJS_y)
+LDLIBS_xenstored += -lrt
ifneq ($(XENSTORE_STATIC_CLIENTS),y)
LIBXENSTORE := libxenstore.so
@@ -73,7 +74,7 @@
$(XENSTORED_OBJS): CFLAGS += $(CFLAGS_libxengnttab)
xenstored: $(XENSTORED_OBJS)
- $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
+ $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(LDLIBS_xenstored) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
xenstored.a: $(XENSTORED_OBJS)
$(AR) cr $@ $^
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_core.c xen-4.8.1/tools/xenstore/xenstored_core.c
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_core.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_core.c 2017-04-10 14:21:48.000000000 +0100
@@ -358,6 +358,7 @@
int *ptimeout)
{
struct connection *conn;
+ struct wrl_timestampt now;
if (fds)
memset(fds, 0, sizeof(struct pollfd) * current_array_size);
@@ -377,8 +378,12 @@
xce_pollfd_idx = set_fd(xenevtchn_fd(xce_handle),
POLLIN|POLLPRI);
+ wrl_gettime_now(&now);
+ wrl_log_periodic(now);
+
list_for_each_entry(conn, &connections, list) {
if (conn->domain) {
+ wrl_check_timeout(conn->domain, now, ptimeout);
if (domain_can_read(conn) ||
(domain_can_write(conn) &&
!list_empty(&conn->out_list)))
@@ -833,6 +838,7 @@
corrupt(conn, "Could not delete '%s'", node->name);
return;
}
+
domain_entry_dec(conn, node);
}
@@ -972,6 +978,7 @@
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
send_ack(conn, XS_WRITE);
}
@@ -1003,6 +1010,7 @@
return;
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
}
send_ack(conn, XS_MKDIR);
@@ -1129,6 +1137,7 @@
if (_rm(conn, node, name)) {
add_change_node(conn->transaction, name, true);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, true);
send_ack(conn, XS_RM);
}
@@ -1205,6 +1214,7 @@
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
send_ack(conn, XS_SET_PERMS);
}
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_core.h xen-4.8.1/tools/xenstore/xenstored_core.h
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_core.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_core.h 2017-04-10 14:21:48.000000000 +0100
@@ -33,6 +33,12 @@
#include "list.h"
#include "tdb.h"
+#define MIN(a, b) (((a) < (b))? (a) : (b))
+
+typedef int32_t wrl_creditt;
+#define WRL_CREDIT_MAX (1000*1000*1000)
+/* ^ satisfies non-overflow condition for wrl_xfer_credit */
+
struct buffered_data
{
struct list_head list;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_domain.c xen-4.8.1/tools/xenstore/xenstored_domain.c
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_domain.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_domain.c 2017-04-10 14:21:48.000000000 +0100
@@ -21,6 +21,8 @@
#include <unistd.h>
#include <stdlib.h>
#include <stdarg.h>
+#include <time.h>
+#include <syslog.h>
#include "utils.h"
#include "talloc.h"
@@ -74,6 +76,11 @@
/* number of watch for this domain */
int nbwatch;
+
+ /* write rate limit */
+ wrl_creditt wrl_credit; /* [ -wrl_config_writecost, +_dburst ] */
+ struct wrl_timestampt wrl_timestamp;
+ bool wrl_delay_logged;
};
static LIST_HEAD(domains);
@@ -206,6 +213,8 @@
fire_watches(NULL, domain, "@releaseDomain", false);
+ wrl_domain_destroy(domain);
+
return 0;
}
@@ -253,6 +262,9 @@
bool domain_can_read(struct connection *conn)
{
struct xenstore_domain_interface *intf = conn->domain->interface;
+
+ if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
+ return false;
return (intf->req_cons != intf->req_prod);
}
@@ -284,6 +296,8 @@
domain->domid = domid;
domain->path = talloc_domain_path(domain, domid);
+ wrl_domain_new(domain);
+
list_add(&domain->list, &domains);
talloc_set_destructor(domain, destroy_domain);
@@ -751,6 +765,233 @@
: 0;
}
+static wrl_creditt wrl_config_writecost = WRL_FACTOR;
+static wrl_creditt wrl_config_rate = WRL_RATE * WRL_FACTOR;
+static wrl_creditt wrl_config_dburst = WRL_DBURST * WRL_FACTOR;
+static wrl_creditt wrl_config_gburst = WRL_GBURST * WRL_FACTOR;
+static wrl_creditt wrl_config_newdoms_dburst =
+ WRL_DBURST * WRL_NEWDOMS * WRL_FACTOR;
+
+long wrl_ntransactions;
+
+static long wrl_ndomains;
+static wrl_creditt wrl_reserve; /* [-wrl_config_newdoms_dburst, +_gburst ] */
+static time_t wrl_log_last_warning; /* 0: no previous warning */
+
+void wrl_gettime_now(struct wrl_timestampt *now_wt)
+{
+ struct timespec now_ts;
+ int r;
+
+ r = clock_gettime(CLOCK_MONOTONIC, &now_ts);
+ if (r)
+ barf_perror("Could not find time (clock_gettime failed)");
+
+ now_wt->sec = now_ts.tv_sec;
+ now_wt->msec = now_ts.tv_nsec / 1000000;
+}
+
+static void wrl_xfer_credit(wrl_creditt *debit, wrl_creditt debit_floor,
+ wrl_creditt *credit, wrl_creditt credit_ceil)
+ /*
+ * Transfers zero or more credit from "debit" to "credit".
+ * Transfers as much as possible while maintaining
+ * debit >= debit_floor and credit <= credit_ceil.
+ * (If that's violated already, does nothing.)
+ *
+ * Sufficient conditions to avoid overflow, either of:
+ * |every argument| <= 0x3fffffff
+ * |every argument| <= 1E9
+ * |every argument| <= WRL_CREDIT_MAX
+ * (And this condition is preserved.)
+ */
+{
+ wrl_creditt xfer = MIN( *debit - debit_floor,
+ credit_ceil - *credit );
+ if (xfer > 0) {
+ *debit -= xfer;
+ *credit += xfer;
+ }
+}
+
+void wrl_domain_new(struct domain *domain)
+{
+ domain->wrl_credit = 0;
+ wrl_gettime_now(&domain->wrl_timestamp);
+ wrl_ndomains++;
+ /* Steal up to DBURST from the reserve */
+ wrl_xfer_credit(&wrl_reserve, -wrl_config_newdoms_dburst,
+ &domain->wrl_credit, wrl_config_dburst);
+}
+
+void wrl_domain_destroy(struct domain *domain)
+{
+ wrl_ndomains--;
+ /*
+ * Don't bother recalculating domain's credit - this just
+ * means we don't give the reserve the ending domain's credit
+ * for time elapsed since last update.
+ */
+ wrl_xfer_credit(&domain->wrl_credit, 0,
+ &wrl_reserve, wrl_config_dburst);
+}
+
+void wrl_credit_update(struct domain *domain, struct wrl_timestampt now)
+{
+ /*
+ * We want to calculate
+ * credit += (now - timestamp) * RATE / ndoms;
+ * But we want it to saturate, and to avoid floating point.
+ * To avoid rounding errors from constantly adding small
+ * amounts of credit, we only add credit for whole milliseconds.
+ */
+ long seconds = now.sec - domain->wrl_timestamp.sec;
+ long milliseconds = now.msec - domain->wrl_timestamp.msec;
+ long msec;
+ int64_t denom, num;
+ wrl_creditt surplus;
+
+ seconds = MIN(seconds, 1000*1000); /* arbitrary, prevents overflow */
+ msec = seconds * 1000 + milliseconds;
+
+ if (msec < 0)
+ /* shouldn't happen with CLOCK_MONOTONIC */
+ msec = 0;
+
+ /* 32x32 -> 64 cannot overflow */
+ denom = (int64_t)msec * wrl_config_rate;
+ num = (int64_t)wrl_ndomains * 1000;
+ /* denom / num <= 1E6 * wrl_config_rate, so with
+ reasonable wrl_config_rate, denom / num << 2^64 */
+
+ /* at last! */
+ domain->wrl_credit = MIN( (int64_t)domain->wrl_credit + denom / num,
+ WRL_CREDIT_MAX );
+ /* (maybe briefly violating the DBURST cap on wrl_credit) */
+
+ /* maybe take from the reserve to make us nonnegative */
+ wrl_xfer_credit(&wrl_reserve, 0,
+ &domain->wrl_credit, 0);
+
+ /* return any surplus (over DBURST) to the reserve */
+ surplus = 0;
+ wrl_xfer_credit(&domain->wrl_credit, wrl_config_dburst,
+ &surplus, WRL_CREDIT_MAX);
+ wrl_xfer_credit(&surplus, 0,
+ &wrl_reserve, wrl_config_gburst);
+ /* surplus is now implicitly discarded */
+
+ domain->wrl_timestamp = now;
+
+ trace("wrl: dom %4d %6ld msec %9ld credit %9ld reserve"
+ " %9ld discard\n",
+ domain->domid,
+ msec,
+ (long)domain->wrl_credit, (long)wrl_reserve,
+ (long)surplus);
+}
+
+void wrl_check_timeout(struct domain *domain,
+ struct wrl_timestampt now,
+ int *ptimeout)
+{
+ uint64_t num, denom;
+ int wakeup;
+
+ wrl_credit_update(domain, now);
+
+ if (domain->wrl_credit >= 0)
+ /* not blocked */
+ return;
+
+ if (!*ptimeout)
+ /* already decided on immediate wakeup,
+ so no need to calculate our timeout */
+ return;
+
+ /* calculate wakeup = now + -credit / (RATE / ndoms); */
+
+ /* credit cannot go more -ve than one transaction,
+ * so the first multiplication cannot overflow even 32-bit */
+ num = (uint64_t)(-domain->wrl_credit * 1000) * wrl_ndomains;
+ denom = wrl_config_rate;
+
+ wakeup = MIN( num / denom /* uint64_t */, INT_MAX );
+ if (*ptimeout==-1 || wakeup < *ptimeout)
+ *ptimeout = wakeup;
+
+ trace("wrl: domain %u credit=%ld (reserve=%ld) SLEEPING for %d\n",
+ domain->domid,
+ (long)domain->wrl_credit, (long)wrl_reserve,
+ wakeup);
+}
+
+#define WRL_LOG(now, ...) \
+ (syslog(LOG_WARNING, "write rate limit: " __VA_ARGS__))
+
+void wrl_apply_debit_actual(struct domain *domain)
+{
+ struct wrl_timestampt now;
+
+ if (!domain)
+ /* sockets escape the write rate limit */
+ return;
+
+ wrl_gettime_now(&now);
+ wrl_credit_update(domain, now);
+
+ domain->wrl_credit -= wrl_config_writecost;
+ trace("wrl: domain %u credit=%ld (reserve=%ld)\n",
+ domain->domid,
+ (long)domain->wrl_credit, (long)wrl_reserve);
+
+ if (domain->wrl_credit < 0) {
+ if (!domain->wrl_delay_logged) {
+ domain->wrl_delay_logged = true;
+ WRL_LOG(now, "domain %ld is affected",
+ (long)domain->domid);
+ } else if (!wrl_log_last_warning) {
+ WRL_LOG(now, "rate limiting restarts");
+ }
+ wrl_log_last_warning = now.sec;
+ }
+}
+
+void wrl_log_periodic(struct wrl_timestampt now)
+{
+ if (wrl_log_last_warning &&
+ (now.sec - wrl_log_last_warning) > WRL_LOGEVERY) {
+ WRL_LOG(now, "not in force recently");
+ wrl_log_last_warning = 0;
+ }
+}
+
+void wrl_apply_debit_direct(struct connection *conn)
+{
+ if (!conn)
+ /* some writes are generated internally */
+ return;
+
+ if (conn->transaction)
+ /* these are accounted for when the transaction ends */
+ return;
+
+ if (!wrl_ntransactions)
+ /* we don't conflict with anyone */
+ return;
+
+ wrl_apply_debit_actual(conn->domain);
+}
+
+void wrl_apply_debit_trans_commit(struct connection *conn)
+{
+ if (wrl_ntransactions <= 1)
+ /* our own transaction appears in the counter */
+ return;
+
+ wrl_apply_debit_actual(conn->domain);
+}
+
/*
* Local variables:
* c-file-style: "linux"
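
For a feel for the numbers: with the WRL_* defaults from xenstored_domain.h below (FACTOR 1000 for fixed point, RATE 200, DBURST 10), credit accrues at 200*1000 per second shared across domains and each write costs 1000, so N busy domains sustain about 200/N writes per second each, with a burst allowance of 10 writes. A throwaway C check of that arithmetic (illustrative only):

    #include <stdio.h>

    int main(void)
    {
        const long factor = 1000;           /* WRL_FACTOR: fixed point */
        const long rate = 200 * factor;     /* credit issued per second */
        const long writecost = factor;      /* cost of one write */
        const long dburst = 10 * factor;    /* per-domain credit cap */
        long ndoms = 4;                     /* hypothetical */

        printf("sustained: %ld writes/s per domain\n",
               rate / ndoms / writecost);   /* 50 with these numbers */
        printf("burst: %ld writes\n", dburst / writecost);
        return 0;
    }
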
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_domain.h xen-4.8.1/tools/xenstore/xenstored_domain.h
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_domain.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_domain.h 2017-04-10 14:21:48.000000000 +0100
@@ -65,4 +65,31 @@
void domain_watch_dec(struct connection *conn);
int domain_watch(struct connection *conn);
+/* Write rate limiting */
+
+#define WRL_FACTOR 1000 /* for fixed-point arithmetic */
+#define WRL_RATE 200
+#define WRL_DBURST 10
+#define WRL_GBURST 1000
+#define WRL_NEWDOMS 5
+#define WRL_LOGEVERY 120 /* seconds */
+
+struct wrl_timestampt {
+ time_t sec;
+ int msec;
+};
+
+extern long wrl_ntransactions;
+
+void wrl_gettime_now(struct wrl_timestampt *now_ts);
+void wrl_domain_new(struct domain *domain);
+void wrl_domain_destroy(struct domain *domain);
+void wrl_credit_update(struct domain *domain, struct wrl_timestampt now);
+void wrl_check_timeout(struct domain *domain,
+ struct wrl_timestampt now,
+ int *ptimeout);
+void wrl_log_periodic(struct wrl_timestampt now);
+void wrl_apply_debit_direct(struct connection *conn);
+void wrl_apply_debit_trans_commit(struct connection *conn);
+
#endif /* _XENSTORED_DOMAIN_H */
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_transaction.c xen-4.8.1/tools/xenstore/xenstored_transaction.c
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_transaction.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_transaction.c 2017-04-10 14:21:48.000000000 +0100
@@ -120,6 +120,7 @@
{
struct transaction *trans = _transaction;
+ wrl_ntransactions--;
trace_destroy(trans, "transaction");
if (trans->tdb)
tdb_close(trans->tdb);
@@ -183,6 +184,7 @@
talloc_steal(conn, trans);
talloc_set_destructor(trans, destroy_transaction);
conn->transaction_started++;
+ wrl_ntransactions++;
snprintf(id_str, sizeof(id_str), "%u", trans->id);
send_reply(conn, XS_TRANSACTION_START, id_str, strlen(id_str)+1);
@@ -218,6 +220,9 @@
send_error(conn, EAGAIN);
return;
}
+
+ wrl_apply_debit_trans_commit(conn);
+
if (!replace_tdb(trans->tdb_name, trans->tdb)) {
send_error(conn, errno);
return;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/Makefile xen-4.8.1/xen/Makefile
--- xen-4.8.1~pre.2017.01.23/xen/Makefile 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/Makefile 2017-04-10 14:21:48.000000000 +0100
@@ -2,7 +2,7 @@
# All other places this is stored (eg. compile.h) should be autogenerated.
export XEN_VERSION = 4
export XEN_SUBVERSION = 8
-export XEN_EXTRAVERSION ?= .1-pre$(XEN_VENDORVERSION)
+export XEN_EXTRAVERSION ?= .1$(XEN_VENDORVERSION)
export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
-include xen-version
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/alternative.c xen-4.8.1/xen/arch/arm/alternative.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/alternative.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/alternative.c 2017-04-10 14:21:48.000000000 +0100
@@ -25,6 +25,7 @@
#include <xen/vmap.h>
#include <xen/smp.h>
#include <xen/stop_machine.h>
+#include <xen/virtual_region.h>
#include <asm/alternative.h>
#include <asm/atomic.h>
#include <asm/byteorder.h>
@@ -155,8 +156,12 @@
int ret;
struct alt_region region;
mfn_t xen_mfn = _mfn(virt_to_mfn(_start));
- unsigned int xen_order = get_order_from_bytes(_end - _start);
+ paddr_t xen_size = _end - _start;
+ unsigned int xen_order = get_order_from_bytes(xen_size);
void *xenmap;
+ struct virtual_region patch_region = {
+ .list = LIST_HEAD_INIT(patch_region.list),
+ };
BUG_ON(patched);
@@ -170,6 +175,15 @@
BUG_ON(!xenmap);
/*
+ * If we generate a new branch instruction, the target will be
+ * calculated in this re-mapped Xen region. So we have to register
+ * this re-mapped Xen region as a virtual region temporarily.
+ */
+ patch_region.start = xenmap;
+ patch_region.end = xenmap + xen_size;
+ register_virtual_region(&patch_region);
+
+ /*
* Find the virtual address of the alternative region in the new
* mapping.
* alt_instr contains relative offset, so the function
@@ -183,6 +197,8 @@
/* The patching is not expected to fail during boot. */
BUG_ON(ret != 0);
+ unregister_virtual_region(&patch_region);
+
vunmap(xenmap);
/* Barriers provided by the cache flushing */
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/domain_build.c xen-4.8.1/xen/arch/arm/domain_build.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/domain_build.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/domain_build.c 2017-04-10 14:21:48.000000000 +0100
@@ -48,20 +48,6 @@
p2m_type_t p2mt;
};
-static const struct dt_device_match dev_map_attrs[] __initconst =
-{
- {
- __DT_MATCH_COMPATIBLE("mmio-sram"),
- __DT_MATCH_PROP("no-memory-wc"),
- .data = (void *) (uintptr_t) p2m_mmio_direct_dev,
- },
- {
- __DT_MATCH_COMPATIBLE("mmio-sram"),
- .data = (void *) (uintptr_t) p2m_mmio_direct_nc,
- },
- { /* sentinel */ },
-};
-
//#define DEBUG_11_ALLOCATION
#ifdef DEBUG_11_ALLOCATION
# define D11PRINT(fmt, args...) printk(XENLOG_DEBUG fmt, ##args)
@@ -1159,21 +1145,6 @@
return 0;
}
-static p2m_type_t lookup_map_attr(struct dt_device_node *node,
- p2m_type_t parent_p2mt)
-{
- const struct dt_device_match *r;
-
- /* Search and if nothing matches, use the parent's attributes. */
- r = dt_match_node(dev_map_attrs, node);
-
- /*
- * If this node does not dictate specific mapping attributes,
- * it inherits its parent's attributes.
- */
- return r ? (uintptr_t) r->data : parent_p2mt;
-}
-
static int handle_node(struct domain *d, struct kernel_info *kinfo,
struct dt_device_node *node,
p2m_type_t p2mt)
@@ -1264,7 +1235,6 @@
"WARNING: Path %s is reserved, skip the node as we may re-use the path.\n",
path);
- p2mt = lookup_map_attr(node, p2mt);
res = handle_device(d, node, p2mt);
if ( res)
return res;
@@ -1319,7 +1289,7 @@
static int prepare_dtb(struct domain *d, struct kernel_info *kinfo)
{
- const p2m_type_t default_p2mt = p2m_mmio_direct_dev;
+ const p2m_type_t default_p2mt = p2m_mmio_direct_c;
const void *fdt;
int new_size;
int ret;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/gic.c xen-4.8.1/xen/arch/arm/gic.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/gic.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/gic.c 2017-04-10 14:21:48.000000000 +0100
@@ -205,7 +205,10 @@
*/
if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
!test_bit(_IRQ_DISABLED, &desc->status) )
+ {
+ vgic_unlock_rank(v_target, rank, flags);
return -EBUSY;
+ }
}
clear_bit(_IRQ_GUEST, &desc->status);
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/irq.c xen-4.8.1/xen/arch/arm/irq.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/irq.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/irq.c 2017-04-10 14:21:48.000000000 +0100
@@ -477,26 +477,32 @@
*/
if ( desc->action != NULL )
{
- struct domain *ad = irq_get_domain(desc);
-
- if ( test_bit(_IRQ_GUEST, &desc->status) && d == ad )
+ if ( test_bit(_IRQ_GUEST, &desc->status) )
{
- if ( irq_get_guest_info(desc)->virq != virq )
+ struct domain *ad = irq_get_domain(desc);
+
+ if ( d == ad )
+ {
+ if ( irq_get_guest_info(desc)->virq != virq )
+ {
+ printk(XENLOG_G_ERR
+ "d%u: IRQ %u is already assigned to vIRQ %u\n",
+ d->domain_id, irq, irq_get_guest_info(desc)->virq);
+ retval = -EBUSY;
+ }
+ }
+ else
{
- printk(XENLOG_G_ERR
- "d%u: IRQ %u is already assigned to vIRQ %u\n",
- d->domain_id, irq, irq_get_guest_info(desc)->virq);
+ printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
+ irq, ad->domain_id);
retval = -EBUSY;
}
- goto out;
}
-
- if ( test_bit(_IRQ_GUEST, &desc->status) )
- printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
- irq, ad->domain_id);
else
+ {
printk(XENLOG_G_ERR "IRQ %u is already used by Xen\n", irq);
- retval = -EBUSY;
+ retval = -EBUSY;
+ }
goto out;
}
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/mm.c xen-4.8.1/xen/arch/arm/mm.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/mm.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/mm.c 2017-04-10 14:21:48.000000000 +0100
@@ -390,6 +390,16 @@
clean_and_invalidate_dcache_va_range(v, PAGE_SIZE);
unmap_domain_page(v);
+
+ /*
+ * For some instruction cache types (such as VIPT), the entire I-Cache
+ * needs to be flushed to guarantee that all the aliases of a given
+ * physical address will be removed from the cache.
+ * Invalidating the I-Cache by VA highly depends on the behavior of the
+ * I-Cache (See D4.9.2 in ARM DDI 0487A.k_iss10775). Instead of using flush
+ * by VA on select platforms, we just flush the entire cache here.
+ */
+ invalidate_icache();
}
void __init arch_init_memory(void)
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/p2m.c xen-4.8.1/xen/arch/arm/p2m.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/p2m.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/p2m.c 2017-04-10 14:21:48.000000000 +0100
@@ -135,13 +135,12 @@
{
register_t hcr;
struct p2m_domain *p2m = &n->domain->arch.p2m;
+ uint8_t *last_vcpu_ran;
if ( is_idle_vcpu(n) )
return;
hcr = READ_SYSREG(HCR_EL2);
- WRITE_SYSREG(hcr & ~HCR_VM, HCR_EL2);
- isb();
WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
isb();
@@ -156,6 +155,17 @@
WRITE_SYSREG(hcr, HCR_EL2);
isb();
+
+ last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];
+
+ /*
+ * Flush local TLB for the domain to prevent wrong TLB translation
+ * when running multiple vCPU of the same domain on a single pCPU.
+ */
+ if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
+ flush_tlb_local();
+
+ *last_vcpu_ran = n->vcpu_id;
}
static void p2m_flush_tlb(struct p2m_domain *p2m)
@@ -734,6 +744,7 @@
unsigned int i;
lpae_t *table;
mfn_t mfn;
+ struct page_info *pg;
/* Nothing to do if the entry is invalid. */
if ( !p2m_valid(entry) )
@@ -771,7 +782,10 @@
mfn = _mfn(entry.p2m.base);
ASSERT(mfn_valid(mfn_x(mfn)));
- free_domheap_page(mfn_to_page(mfn_x(mfn)));
+ pg = mfn_to_page(mfn_x(mfn));
+
+ page_list_del(pg, &p2m->pages);
+ free_domheap_page(pg);
}
static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
@@ -982,9 +996,10 @@
/*
* The radix-tree can only work on 4KB. This is only used when
- * memaccess is enabled.
+ * memaccess is enabled and during shutdown.
*/
- ASSERT(!p2m->mem_access_enabled || page_order == 0);
+ ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
+ p2m->domain->is_dying);
/*
* The access type should always be p2m_access_rwx when the mapping
* is removed.
@@ -1176,7 +1191,7 @@
if ( !(nr && iomem_access_permitted(d, mfn_x(mfn), mfn_x(mfn) + nr - 1)) )
return 0;
- res = map_mmio_regions(d, gfn, nr, mfn);
+ res = p2m_insert_mapping(d, gfn, nr, mfn, p2m_mmio_direct_c);
if ( res < 0 )
{
printk(XENLOG_G_ERR "Unable to map MFNs [%#"PRI_mfn" - %#"PRI_mfn" in Dom%d\n",
@@ -1308,6 +1323,7 @@
{
struct p2m_domain *p2m = &d->arch.p2m;
int rc = 0;
+ unsigned int cpu;
rwlock_init(&p2m->lock);
INIT_PAGE_LIST_HEAD(&p2m->pages);
@@ -1336,6 +1352,17 @@
rc = p2m_alloc_table(d);
+ /*
+ * Make sure that the type chosen is able to store any vCPU ID
+ * between 0 and the maximum number of virtual CPUs supported, as
+ * well as INVALID_VCPU_ID.
+ */
+ BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < MAX_VIRT_CPUS);
+ BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0])* 8)) < INVALID_VCPU_ID);
+
+ for_each_possible_cpu(cpu)
+ p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
+
return rc;
}
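
The p2m change above fixes a real hazard: two vCPUs of one domain share a VMID, so when they time-share a physical CPU the second could hit the first's stale TLB entries. The fix records, per physical CPU, the last vCPU of the domain that ran there and flushes the local TLB on a mismatch. A compact sketch of that bookkeeping in C (hypothetical names; the real code is in p2m_restore_state above):

    enum { NR_PCPUS = 8, INVALID_ID = 0xff };    /* hypothetical sizes */

    struct p2m_sketch {
        unsigned char last_vcpu_ran[NR_PCPUS];   /* init to INVALID_ID */
    };

    static void on_vcpu_switch_in(struct p2m_sketch *p2m, unsigned pcpu,
                                  unsigned char vcpu_id,
                                  void (*flush_local_tlb)(void))
    {
        unsigned char *last = &p2m->last_vcpu_ran[pcpu];

        /* A different sibling vCPU ran here last: its translations are
         * tagged with the same VMID, so purge the local TLB. */
        if (*last != INVALID_ID && *last != vcpu_id)
            flush_local_tlb();
        *last = vcpu_id;
    }
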
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/psci.c xen-4.8.1/xen/arch/arm/psci.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/psci.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/psci.c 2017-04-10 14:21:48.000000000 +0100
@@ -147,7 +147,7 @@
psci_ver = call_smc(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
/* For the moment, we only support PSCI 0.2 and PSCI 1.x */
- if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver != 1) )
+ if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver) != 1 )
{
printk("Error: Unrecognized PSCI version %u.%u\n",
PSCI_VERSION_MAJOR(psci_ver), PSCI_VERSION_MINOR(psci_ver));
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/setup.c xen-4.8.1/xen/arch/arm/setup.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/setup.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/setup.c 2017-04-10 14:21:48.000000000 +0100
@@ -784,6 +784,8 @@
smp_init_cpus();
cpus = smp_get_max_cpus();
+ printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", cpus);
+ nr_cpu_ids = cpus;
init_xen_time();
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/traps.c xen-4.8.1/xen/arch/arm/traps.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/traps.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/traps.c 2017-04-10 14:21:48.000000000 +0100
@@ -101,6 +101,19 @@
integer_param("debug_stack_lines", debug_stack_lines);
+static enum {
+ TRAP,
+ NATIVE,
+} vwfi;
+
+static void __init parse_vwfi(const char *s)
+{
+ if ( !strcmp(s, "native") )
+ vwfi = NATIVE;
+ else
+ vwfi = TRAP;
+}
+custom_param("vwfi", parse_vwfi);
void init_traps(void)
{
@@ -127,8 +140,8 @@
/* Setup hypervisor traps */
WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
- HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,
- HCR_EL2);
+ (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
+ HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,HCR_EL2);
isb();
}
@@ -643,7 +656,7 @@
};
mode = cpsr & PSR_MODE_MASK;
- if ( mode > ARRAY_SIZE(mode_strings) )
+ if ( mode >= ARRAY_SIZE(mode_strings) )
return "Unknown";
return mode_strings[mode] ? : "Unknown";
}
@@ -2280,6 +2293,20 @@
return inject_undef64_exception(regs, hsr.len);
/*
+ * ICC_SRE_EL2.Enable = 0
+ *
+ * GIC Architecture Specification (IHI 0069C): Section 8.1.9
+ */
+ case HSR_SYSREG_ICC_SRE_EL1:
+ /*
+ * Trapped when the guest is using GICv2 whilst the platform
+ * interrupt controller is GICv3. In this case, the register
+ * should be emulated as RAZ/WI to tell the guest to use the GIC
+ * memory-mapped interface (i.e. GICv2 compatibility).
+ */
+ return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
+
+ /*
* HCR_EL2.TIDCP
*
* ARMv8 (DDI 0487A.d): D1-1501 Table D1-43
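
Among the traps.c changes, the mode_strings hunk is a textbook off-by-one:
an N-entry table has valid indices 0..N-1, so the guard must reject
index == N as well. Sketched stand-alone (table contents hypothetical):

    static const char *mode_strings[4] = { "usr", "fiq", "irq", "svc" };

    static const char *mode_string(unsigned int mode)
    {
        /* With '>' instead of '>=', mode == 4 would read past the end. */
        if ( mode >= sizeof(mode_strings) / sizeof(mode_strings[0]) )
            return "Unknown";
        return mode_strings[mode] ? mode_strings[mode] : "Unknown";
    }
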
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic-v2.c xen-4.8.1/xen/arch/arm/vgic-v2.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic-v2.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/vgic-v2.c 2017-04-10 14:21:48.000000000 +0100
@@ -79,7 +79,7 @@
offset &= ~(NR_TARGETS_PER_ITARGETSR - 1);
for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++ )
- reg |= (1 << rank->vcpu[offset]) << (i * NR_BITS_PER_TARGET);
+ reg |= (1 << read_atomic(&rank->vcpu[offset])) << (i * NR_BITS_PER_TARGET);
return reg;
}
@@ -152,7 +152,7 @@
/* The vCPU ID always starts from 0 */
new_target--;
- old_target = rank->vcpu[offset];
+ old_target = read_atomic(&rank->vcpu[offset]);
/* Only migrate the vIRQ if the target vCPU has changed */
if ( new_target != old_target )
@@ -162,7 +162,7 @@
virq);
}
- rank->vcpu[offset] = new_target;
+ write_atomic(&rank->vcpu[offset], new_target);
}
}
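
The vgic-v2.c hunks (and the matching vgic-v3.c/vgic.c ones below) drop the
rank lock for rank->vcpu[] accesses in favour of single-variable atomic
reads and writes. Xen's read_atomic()/write_atomic() are its own
primitives, but the shape of the pattern is the same as C11 relaxed
atomics, shown here as an analogy only (names hypothetical): loads and
stores of one small field cannot tear, so no lock is needed just to fetch
or update a target vCPU ID.

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint8_t target_vcpu[32]; /* one byte per vIRQ in a rank */

    static uint8_t get_target(unsigned int virq)
    {
        return atomic_load_explicit(&target_vcpu[virq], memory_order_relaxed);
    }

    static void set_target(unsigned int virq, uint8_t vcpu)
    {
        atomic_store_explicit(&target_vcpu[virq], vcpu, memory_order_relaxed);
    }
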
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic-v3.c xen-4.8.1/xen/arch/arm/vgic-v3.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic-v3.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/vgic-v3.c 2017-04-10 14:21:48.000000000 +0100
@@ -107,7 +107,7 @@
/* Get the index in the rank */
offset &= INTERRUPT_RANK_MASK;
- return vcpuid_to_vaffinity(rank->vcpu[offset]);
+ return vcpuid_to_vaffinity(read_atomic(&rank->vcpu[offset]));
}
/*
@@ -135,7 +135,7 @@
offset &= virq & INTERRUPT_RANK_MASK;
new_vcpu = vgic_v3_irouter_to_vcpu(d, irouter);
- old_vcpu = d->vcpu[rank->vcpu[offset]];
+ old_vcpu = d->vcpu[read_atomic(&rank->vcpu[offset])];
/*
* From the spec (see 8.9.13 in IHI 0069A), any write with an
@@ -153,7 +153,7 @@
if ( new_vcpu != old_vcpu )
vgic_migrate_irq(old_vcpu, new_vcpu, virq);
- rank->vcpu[offset] = new_vcpu->vcpu_id;
+ write_atomic(&rank->vcpu[offset], new_vcpu->vcpu_id);
}
static inline bool vgic_reg64_check_access(struct hsr_dabt dabt)
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic.c xen-4.8.1/xen/arch/arm/vgic.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/vgic.c 2017-04-10 14:21:48.000000000 +0100
@@ -85,7 +85,7 @@
rank->index = index;
for ( i = 0; i < NR_INTERRUPT_PER_RANK; i++ )
- rank->vcpu[i] = vcpu;
+ write_atomic(&rank->vcpu[i], vcpu);
}
int domain_vgic_register(struct domain *d, int *mmio_count)
@@ -218,28 +218,11 @@
return 0;
}
-/* The function should be called by rank lock taken. */
-static struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
-{
- struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
-
- ASSERT(spin_is_locked(&rank->lock));
-
- return v->domain->vcpu[rank->vcpu[virq & INTERRUPT_RANK_MASK]];
-}
-
-/* takes the rank lock */
struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
{
- struct vcpu *v_target;
struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
- unsigned long flags;
-
- vgic_lock_rank(v, rank, flags);
- v_target = __vgic_get_target_vcpu(v, virq);
- vgic_unlock_rank(v, rank, flags);
-
- return v_target;
+ int target = read_atomic(&rank->vcpu[virq & INTERRUPT_RANK_MASK]);
+ return v->domain->vcpu[target];
}
static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
@@ -326,7 +309,7 @@
while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
irq = i + (32 * n);
- v_target = __vgic_get_target_vcpu(v, irq);
+ v_target = vgic_get_target_vcpu(v, irq);
p = irq_to_pending(v_target, irq);
clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
gic_remove_from_queues(v_target, irq);
@@ -368,7 +351,7 @@
while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
irq = i + (32 * n);
- v_target = __vgic_get_target_vcpu(v, irq);
+ v_target = vgic_get_target_vcpu(v, irq);
p = irq_to_pending(v_target, irq);
set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/domain.c xen-4.8.1/xen/arch/x86/domain.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/domain.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/domain.c 2017-04-10 14:21:48.000000000 +0100
@@ -1315,16 +1315,24 @@
return 0;
}
- if ( seg != x86_seg_tr && !reg->attr.fields.s )
+ if ( seg == x86_seg_tr )
{
- gprintk(XENLOG_ERR,
- "System segment provided for a code or data segment\n");
- return -EINVAL;
- }
+ if ( reg->attr.fields.s )
+ {
+ gprintk(XENLOG_ERR, "Code or data segment provided for TR\n");
+ return -EINVAL;
+ }
- if ( seg == x86_seg_tr && reg->attr.fields.s )
+ if ( reg->attr.fields.type != SYS_DESC_tss_busy )
+ {
+ gprintk(XENLOG_ERR, "Non-32-bit-TSS segment provided for TR\n");
+ return -EINVAL;
+ }
+ }
+ else if ( !reg->attr.fields.s )
{
- gprintk(XENLOG_ERR, "Code or data segment provided for TR\n");
+ gprintk(XENLOG_ERR,
+ "System segment provided for a code or data segment\n");
return -EINVAL;
}
@@ -1387,7 +1395,8 @@
#define SEG(s, r) ({ \
s = (struct segment_register){ .base = (r)->s ## _base, \
.limit = (r)->s ## _limit, \
- .attr.bytes = (r)->s ## _ar }; \
+ .attr.bytes = (r)->s ## _ar | \
+ (x86_seg_##s != x86_seg_tr ? 1 : 2) }; \
check_segment(&s, x86_seg_ ## s); })
rc = SEG(cs, regs);
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/efi/efi-boot.h xen-4.8.1/xen/arch/x86/efi/efi-boot.h
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/efi/efi-boot.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/efi/efi-boot.h 2017-04-10 14:21:48.000000000 +0100
@@ -13,7 +13,11 @@
static multiboot_info_t __initdata mbi = {
.flags = MBI_MODULES | MBI_LOADERNAME
};
-static module_t __initdata mb_modules[3];
+/*
+ * The array size needs to be one larger than the number of modules we
+ * support - see __start_xen().
+ */
+static module_t __initdata mb_modules[5];
static void __init edd_put_string(u8 *dst, size_t n, const char *src)
{
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/hvm.c xen-4.8.1/xen/arch/x86/hvm/hvm.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/hvm.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/hvm.c 2017-04-10 14:21:48.000000000 +0100
@@ -387,13 +387,20 @@
}
delta_tsc = guest_tsc - tsc;
- v->arch.hvm_vcpu.msr_tsc_adjust += delta_tsc
- - v->arch.hvm_vcpu.cache_tsc_offset;
v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc;
hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, at_tsc);
}
+static void hvm_set_guest_tsc_msr(struct vcpu *v, u64 guest_tsc)
+{
+ uint64_t tsc_offset = v->arch.hvm_vcpu.cache_tsc_offset;
+
+ hvm_set_guest_tsc(v, guest_tsc);
+ v->arch.hvm_vcpu.msr_tsc_adjust += v->arch.hvm_vcpu.cache_tsc_offset
+ - tsc_offset;
+}
+
void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
{
v->arch.hvm_vcpu.cache_tsc_offset += tsc_adjust
@@ -3940,7 +3947,7 @@
break;
case MSR_IA32_TSC:
- hvm_set_guest_tsc(v, msr_content);
+ hvm_set_guest_tsc_msr(v, msr_content);
break;
case MSR_IA32_TSC_ADJUST:
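
The hvm.c hunk restores the intended split between the two TSC paths: only
a guest write to MSR_IA32_TSC may move IA32_TSC_ADJUST, and it must move it
by exactly the change in the TSC offset, while every other caller of
hvm_set_guest_tsc() leaves the adjust value alone. A toy model of the
bookkeeping (not Xen's code; the offset handling is simplified):

    #include <assert.h>
    #include <stdint.h>

    static uint64_t cache_tsc_offset, msr_tsc_adjust;

    static void set_guest_tsc(uint64_t offset)      /* adjust-neutral */
    {
        cache_tsc_offset = offset;
    }

    static void set_guest_tsc_msr(uint64_t offset)  /* guest MSR write */
    {
        uint64_t old = cache_tsc_offset;

        set_guest_tsc(offset);
        msr_tsc_adjust += cache_tsc_offset - old;   /* mirror the delta */
    }

    int main(void)
    {
        set_guest_tsc_msr(100);
        assert(msr_tsc_adjust == 100);
        set_guest_tsc(300);              /* e.g. a restore/migration path */
        assert(msr_tsc_adjust == 100);   /* untouched, as architected */
        return 0;
    }
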
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/mtrr.c xen-4.8.1/xen/arch/x86/hvm/mtrr.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/mtrr.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/mtrr.c 2017-04-10 14:21:48.000000000 +0100
@@ -776,17 +776,19 @@
if ( v->domain != d )
v = d->vcpu ? d->vcpu[0] : NULL;
- if ( !mfn_valid(mfn_x(mfn)) ||
- rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
- {
- *ipat = 1;
- return MTRR_TYPE_UNCACHABLE;
- }
-
+ /* Mask, not add, for order so it works with INVALID_MFN on unmapping */
if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+ {
+ if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+ {
+ *ipat = 1;
+ return MTRR_TYPE_UNCACHABLE;
+ }
+ /* Force invalid memory type so resolve_misconfig() will split it */
return -1;
+ }
if ( direct_mmio )
{
@@ -798,6 +800,12 @@
return MTRR_TYPE_WRBACK;
}
+ if ( !mfn_valid(mfn_x(mfn)) )
+ {
+ *ipat = 1;
+ return MTRR_TYPE_UNCACHABLE;
+ }
+
if ( !need_iommu(d) && !cache_flush_permitted(d) )
{
*ipat = 1;
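
The "Mask, not add, for order" comment in the mtrr.c hunk deserves
unpacking: the end of the range is computed with OR rather than addition so
that INVALID_MFN (all ones) stays all ones instead of wrapping around to a
small bogus value. A stand-alone check of the arithmetic (order 9 picked
arbitrarily):

    #include <assert.h>

    int main(void)
    {
        unsigned long inval = ~0UL;   /* INVALID_MFN-style sentinel */
        unsigned long order = 9;

        assert((inval | ((1UL << order) - 1)) == ~0UL); /* mask: still invalid */
        assert((inval + ((1UL << order) - 1)) == 510);  /* add: wraps around */
        return 0;
    }
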
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/svm/svm.c xen-4.8.1/xen/arch/x86/hvm/svm/svm.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/svm/svm.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/svm/svm.c 2017-04-10 14:21:48.000000000 +0100
@@ -353,7 +353,7 @@
data->msr_cstar = vmcb->cstar;
data->msr_syscall_mask = vmcb->sfmask;
data->msr_efer = v->arch.hvm_vcpu.guest_efer;
- data->msr_flags = -1ULL;
+ data->msr_flags = 0;
}
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/svm/vmcb.c xen-4.8.1/xen/arch/x86/hvm/svm/vmcb.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/svm/vmcb.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/svm/vmcb.c 2017-04-10 14:21:48.000000000 +0100
@@ -72,6 +72,9 @@
struct arch_svm_struct *arch_svm = &v->arch.hvm_svm;
struct vmcb_struct *vmcb = arch_svm->vmcb;
+ /* Build-time check of the size of VMCB AMD structure. */
+ BUILD_BUG_ON(sizeof(*vmcb) != PAGE_SIZE);
+
vmcb->_general1_intercepts =
GENERAL1_INTERCEPT_INTR | GENERAL1_INTERCEPT_NMI |
GENERAL1_INTERCEPT_SMI | GENERAL1_INTERCEPT_INIT |
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/vmx/vmcs.c xen-4.8.1/xen/arch/x86/hvm/vmx/vmcs.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/vmx/vmcs.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/vmx/vmcs.c 2017-04-10 14:21:48.000000000 +0100
@@ -552,6 +552,20 @@
local_irq_restore(flags);
}
+void vmx_vmcs_reload(struct vcpu *v)
+{
+ /*
+ * As we may be running with interrupts disabled, we can't acquire
+ * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+ * the VMCS can't be taken away from us anymore if we still own it.
+ */
+ ASSERT(v->is_running || !local_irq_is_enabled());
+ if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+ return;
+
+ vmx_load_vmcs(v);
+}
+
int vmx_cpu_up_prepare(unsigned int cpu)
{
/*
@@ -1090,6 +1104,9 @@
vmx_disable_intercept_for_msr(v, MSR_IA32_BNDCFGS, MSR_TYPE_R | MSR_TYPE_W);
}
+ /* All guest MSR state is dirty. */
+ v->arch.hvm_vmx.msr_state.flags = ((1u << VMX_MSR_COUNT) - 1);
+
/* I/O access bitmap. */
__vmwrite(IO_BITMAP_A, __pa(d->arch.hvm_domain.io_bitmap));
__vmwrite(IO_BITMAP_B, __pa(d->arch.hvm_domain.io_bitmap) + PAGE_SIZE);
@@ -1652,10 +1669,7 @@
bool_t debug_state;
if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
- {
- if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
- vmx_load_vmcs(v);
- }
+ vmx_vmcs_reload(v);
else
{
/*
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/vmx/vmx.c xen-4.8.1/xen/arch/x86/hvm/vmx/vmx.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/vmx/vmx.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/vmx/vmx.c 2017-04-10 14:21:48.000000000 +0100
@@ -739,13 +739,12 @@
static void vmx_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
{
struct vmx_msr_state *guest_state = &v->arch.hvm_vmx.msr_state;
- unsigned long guest_flags = guest_state->flags;
data->shadow_gs = v->arch.hvm_vmx.shadow_gs;
data->msr_cstar = v->arch.hvm_vmx.cstar;
/* save msrs */
- data->msr_flags = guest_flags;
+ data->msr_flags = 0;
data->msr_lstar = guest_state->msrs[VMX_INDEX_MSR_LSTAR];
data->msr_star = guest_state->msrs[VMX_INDEX_MSR_STAR];
data->msr_syscall_mask = guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK];
@@ -756,7 +755,7 @@
struct vmx_msr_state *guest_state = &v->arch.hvm_vmx.msr_state;
/* restore msrs */
- guest_state->flags = data->msr_flags & 7;
+ guest_state->flags = ((1u << VMX_MSR_COUNT) - 1);
guest_state->msrs[VMX_INDEX_MSR_LSTAR] = data->msr_lstar;
guest_state->msrs[VMX_INDEX_MSR_STAR] = data->msr_star;
guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK] = data->msr_syscall_mask;
@@ -896,6 +895,18 @@
if ( unlikely(!this_cpu(vmxon)) )
return;
+ if ( !v->is_running )
+ {
+ /*
+ * When this vCPU isn't marked as running anymore, a remote pCPU's
+ * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+ * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+ * away the VMCS from us. As we're running with interrupts disabled,
+ * we also can't call vmx_vmcs_enter().
+ */
+ vmx_vmcs_reload(v);
+ }
+
vmx_fpu_leave(v);
vmx_save_guest_msrs(v);
vmx_restore_host_msrs();
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/mm/p2m-pt.c xen-4.8.1/xen/arch/x86/mm/p2m-pt.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/mm/p2m-pt.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/mm/p2m-pt.c 2017-04-10 14:21:48.000000000 +0100
@@ -452,7 +452,7 @@
mfn |= _PAGE_PSE_PAT >> PAGE_SHIFT;
}
else
- mfn &= ~(_PAGE_PSE_PAT >> PAGE_SHIFT);
+ mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
flags |= _PAGE_PSE;
}
e = l1e_from_pfn(mfn, flags);
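
The one-character p2m-pt.c change is an integer-width fix: if _PAGE_PSE_PAT
is a 32-bit unsigned constant, the complement in
~(_PAGE_PSE_PAT >> PAGE_SHIFT) is computed in 32 bits and zero-extends,
silently clearing bits 32 and up of the MFN. Widening before inverting
preserves them. Illustrated with stand-in constants (0x1000 is bit 12,
matching _PAGE_PSE_PAT's position; assumes 64-bit unsigned long):

    #include <assert.h>

    int main(void)
    {
        unsigned long mfn = 0x1234567890UL;

        /* 32-bit complement: the mask zero-extends to 0x00000000fffffffe. */
        assert((mfn & ~(0x1000U >> 12)) == 0x34567890UL);

        /* Widened complement: 0xfffffffffffffffe, high bits survive. */
        assert((mfn & ~((unsigned long)0x1000U >> 12)) == 0x1234567890UL);
        return 0;
    }
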
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/mm/p2m.c xen-4.8.1/xen/arch/x86/mm/p2m.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/mm/p2m.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/mm/p2m.c 2017-04-10 14:21:48.000000000 +0100
@@ -2048,7 +2048,8 @@
ASSERT(page_list_empty(&p2m->pod.super));
ASSERT(page_list_empty(&p2m->pod.single));
- if ( p2m->np2m_base == P2M_BASE_EADDR )
+ /* No need to flush if it's already empty */
+ if ( p2m_is_nestedp2m(p2m) && p2m->np2m_base == P2M_BASE_EADDR )
{
p2m_unlock(p2m);
return;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/setup.c xen-4.8.1/xen/arch/x86/setup.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/setup.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/setup.c 2017-04-10 14:21:48.000000000 +0100
@@ -890,6 +890,17 @@
mod[i].reserved = 0;
}
+ if ( efi_enabled )
+ {
+ /*
+ * This needs to remain in sync with xen_in_range() and the
+ * respective reserve_e820_ram() invocation below.
+ */
+ mod[mbi->mods_count].mod_start = PFN_DOWN(mbi->mem_upper);
+ mod[mbi->mods_count].mod_end = __pa(__2M_rwdata_end) -
+ (mbi->mem_upper & PAGE_MASK);
+ }
+
modules_headroom = bzimage_headroom(bootstrap_map(mod), mod->mod_end);
bootstrap_map(NULL);
@@ -925,7 +936,7 @@
1UL << (PAGE_SHIFT + 32)) )
e = min(HYPERVISOR_VIRT_END - DIRECTMAP_VIRT_START,
1UL << (PAGE_SHIFT + 32));
-#define reloc_size ((__pa(&_end) + mask) & ~mask)
+#define reloc_size ((__pa(__2M_rwdata_end) + mask) & ~mask)
/* Is the region suitable for relocating Xen? */
if ( !xen_phys_start && e <= limit )
{
@@ -1070,8 +1081,9 @@
if ( mod[j].reserved )
continue;
- /* Don't overlap with other modules. */
- end = consider_modules(s, e, size, mod, mbi->mods_count, j);
+ /* Don't overlap with other modules (or Xen itself). */
+ end = consider_modules(s, e, size, mod,
+ mbi->mods_count + efi_enabled, j);
if ( highmem_start && end > highmem_start )
continue;
@@ -1096,9 +1108,9 @@
*/
while ( !kexec_crash_area.start )
{
- /* Don't overlap with modules. */
- e = consider_modules(s, e, PAGE_ALIGN(kexec_crash_area.size),
- mod, mbi->mods_count, -1);
+ /* Don't overlap with modules (or Xen itself). */
+ e = consider_modules(s, e, PAGE_ALIGN(kexec_crash_area.size), mod,
+ mbi->mods_count + efi_enabled, -1);
if ( s >= e )
break;
if ( e > kexec_crash_area_limit )
@@ -1122,8 +1134,10 @@
if ( !xen_phys_start )
panic("Not enough memory to relocate Xen.");
- reserve_e820_ram(&boot_e820, efi_enabled ? mbi->mem_upper : __pa(&_start),
- __pa(&_end));
+
+ /* This needs to remain in sync with xen_in_range(). */
+ reserve_e820_ram(&boot_e820, efi_enabled ? mbi->mem_upper : __pa(_stext),
+ __pa(__2M_rwdata_end));
/* Late kexec reservation (dynamic start address). */
kexec_reserve_area(&boot_e820);
@@ -1672,7 +1686,7 @@
paddr_t start, end;
int i;
- enum { region_s3, region_text, region_bss, nr_regions };
+ enum { region_s3, region_ro, region_rw, nr_regions };
static struct {
paddr_t s, e;
} xen_regions[nr_regions] __hwdom_initdata;
@@ -1683,12 +1697,20 @@
/* S3 resume code (and other real mode trampoline code) */
xen_regions[region_s3].s = bootsym_phys(trampoline_start);
xen_regions[region_s3].e = bootsym_phys(trampoline_end);
- /* hypervisor code + data */
- xen_regions[region_text].s =__pa(&_stext);
- xen_regions[region_text].e = __pa(&__init_begin);
- /* bss */
- xen_regions[region_bss].s = __pa(&__bss_start);
- xen_regions[region_bss].e = __pa(&__bss_end);
+
+ /*
+ * This needs to remain in sync with the uses of the same symbols in
+ * - __start_xen() (above)
+ * - is_xen_fixed_mfn()
+ * - tboot_shutdown()
+ */
+
+ /* hypervisor .text + .rodata */
+ xen_regions[region_ro].s = __pa(&_stext);
+ xen_regions[region_ro].e = __pa(&__2M_rodata_end);
+ /* hypervisor .data + .bss */
+ xen_regions[region_rw].s = __pa(&__2M_rwdata_start);
+ xen_regions[region_rw].e = __pa(&__2M_rwdata_end);
}
start = (paddr_t)mfn << PAGE_SHIFT;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/tboot.c xen-4.8.1/xen/arch/x86/tboot.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/tboot.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/tboot.c 2017-04-10 14:21:48.000000000 +0100
@@ -12,6 +12,7 @@
#include <asm/processor.h>
#include <asm/e820.h>
#include <asm/tboot.h>
+#include <asm/setup.h>
#include <crypto/vmac.h>
/* tboot=<physical address of shared page> */
@@ -282,7 +283,7 @@
if ( !mfn_valid(mfn) )
continue;
- if ( (mfn << PAGE_SHIFT) < __pa(&_end) )
+ if ( is_xen_fixed_mfn(mfn) )
continue; /* skip Xen */
if ( (mfn >= PFN_DOWN(g_tboot_shared->tboot_base - 3 * PAGE_SIZE))
&& (mfn < PFN_UP(g_tboot_shared->tboot_base
@@ -363,20 +364,22 @@
if ( shutdown_type == TB_SHUTDOWN_S3 )
{
/*
- * Xen regions for tboot to MAC
+ * Xen regions for tboot to MAC. This needs to remain in sync with
+ * xen_in_range().
*/
g_tboot_shared->num_mac_regions = 3;
/* S3 resume code (and other real mode trampoline code) */
g_tboot_shared->mac_regions[0].start = bootsym_phys(trampoline_start);
g_tboot_shared->mac_regions[0].size = bootsym_phys(trampoline_end) -
bootsym_phys(trampoline_start);
- /* hypervisor code + data */
+ /* hypervisor .text + .rodata */
g_tboot_shared->mac_regions[1].start = (uint64_t)__pa(&_stext);
- g_tboot_shared->mac_regions[1].size = __pa(&__init_begin) -
+ g_tboot_shared->mac_regions[1].size = __pa(&__2M_rodata_end) -
__pa(&_stext);
- /* bss */
- g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__bss_start);
- g_tboot_shared->mac_regions[2].size = __pa(&__bss_end) - __pa(&__bss_start);
+ /* hypervisor .data + .bss */
+ g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__2M_rwdata_start);
+ g_tboot_shared->mac_regions[2].size = __pa(&__2M_rwdata_end) -
+ __pa(&__2M_rwdata_start);
/*
* MAC domains and other Xen memory
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/x86_emulate/x86_emulate.c xen-4.8.1/xen/arch/x86/x86_emulate/x86_emulate.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/x86_emulate/x86_emulate.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/x86_emulate/x86_emulate.c 2017-04-10 14:21:48.000000000 +0100
@@ -331,7 +331,11 @@
#define copy_REX_VEX(ptr, rex, vex) do { \
if ( (vex).opcx != vex_none ) \
+ { \
+ if ( !mode_64bit() ) \
+ vex.reg |= 8; \
ptr[0] = 0xc4, ptr[1] = (vex).raw[0], ptr[2] = (vex).raw[1]; \
+ } \
else if ( mode_64bit() ) \
ptr[1] = rex | REX_PREFIX; \
} while (0)
@@ -870,15 +874,15 @@
put_fpu(&fic); \
} while (0)
-#define emulate_fpu_insn_stub(_bytes...) \
+#define emulate_fpu_insn_stub(bytes...) \
do { \
- uint8_t *buf = get_stub(stub); \
- unsigned int _nr = sizeof((uint8_t[]){ _bytes }); \
- struct fpu_insn_ctxt fic = { .insn_bytes = _nr }; \
- memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1); \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
- stub.func(); \
- put_fpu(&fic); \
+ unsigned int nr_ = sizeof((uint8_t[]){ bytes }); \
+ struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ }; \
+ memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1); \
+ get_fpu(X86EMUL_FPU_fpu, &fic_); \
+ asm volatile ( "call *%[stub]" : "+m" (fic_) : \
+ [stub] "rm" (stub.func) ); \
+ put_fpu(&fic_); \
put_stub(stub); \
} while (0)
@@ -893,7 +897,7 @@
"call *%[func];" \
_POST_EFLAGS("[eflags]", "[mask]", "[tmp]") \
: [eflags] "+g" (_regs.eflags), \
- [tmp] "=&r" (tmp_) \
+ [tmp] "=&r" (tmp_), "+m" (fic_) \
: [func] "rm" (stub.func), \
[mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) ); \
put_fpu(&fic_); \
@@ -1356,6 +1360,11 @@
}
memset(sreg, 0, sizeof(*sreg));
sreg->sel = sel;
+
+ /* Since CPL == SS.DPL, we need to put back DPL. */
+ if ( seg == x86_seg_ss )
+ sreg->attr.fields.dpl = sel;
+
return X86EMUL_OKAY;
}
@@ -2017,16 +2026,21 @@
default:
BUG(); /* Shouldn't be possible. */
case 2:
- if ( in_realmode(ctxt, ops) || (state->regs->eflags & EFLG_VM) )
+ if ( state->regs->eflags & EFLG_VM )
break;
/* fall through */
case 4:
- if ( modrm_mod != 3 )
+ if ( modrm_mod != 3 || in_realmode(ctxt, ops) )
break;
/* fall through */
case 8:
/* VEX / XOP / EVEX */
generate_exception_if(rex_prefix || vex.pfx, EXC_UD, -1);
+ /*
+ * With operand size override disallowed (see above), op_bytes
+ * should not have changed from its default.
+ */
+ ASSERT(op_bytes == def_op_bytes);
vex.raw[0] = modrm;
if ( b == 0xc5 )
@@ -2053,6 +2067,12 @@
op_bytes = 8;
}
}
+ else
+ {
+ /* Operand size fixed at 4 (no override via W bit). */
+ op_bytes = 4;
+ vex.b = 1;
+ }
switch ( b )
{
case 0x62:
@@ -2071,7 +2091,7 @@
break;
}
}
- if ( mode_64bit() && !vex.r )
+ if ( !vex.r )
rex_prefix |= REX_R;
ext = vex.opcx;
@@ -2113,12 +2133,21 @@
opcode |= b | MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK);
+ if ( !(d & ModRM) )
+ {
+ modrm_reg = modrm_rm = modrm_mod = modrm = 0;
+ break;
+ }
+
modrm = insn_fetch_type(uint8_t);
modrm_mod = (modrm & 0xc0) >> 6;
break;
}
+ }
+ if ( d & ModRM )
+ {
modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3);
modrm_rm = modrm & 0x07;
@@ -2182,6 +2211,17 @@
break;
}
break;
+ case 0x20: /* mov cr,reg */
+ case 0x21: /* mov dr,reg */
+ case 0x22: /* mov reg,cr */
+ case 0x23: /* mov reg,dr */
+ /*
+ * Mov to/from cr/dr ignore the encoding of Mod, and behave as
+ * if they were encoded as reg/reg instructions. No further
+ * disp/SIB bytes are fetched.
+ */
+ modrm_mod = 3;
+ break;
}
break;
@@ -4730,7 +4770,7 @@
case X86EMUL_OPC(0x0f, 0x21): /* mov dr,reg */
case X86EMUL_OPC(0x0f, 0x22): /* mov reg,cr */
case X86EMUL_OPC(0x0f, 0x23): /* mov reg,dr */
- generate_exception_if(ea.type != OP_REG, EXC_UD, -1);
+ ASSERT(ea.type == OP_REG); /* Early operand adjustment ensures this. */
generate_exception_if(!mode_ring0(), EXC_GP, 0);
modrm_reg |= lock_prefix << 3;
if ( b & 2 )
@@ -5050,6 +5090,7 @@
}
case X86EMUL_OPC(0x0f, 0xa3): bt: /* bt */
+ generate_exception_if(lock_prefix, EXC_UD, 0);
emulate_2op_SrcV_nobyte("bt", src, dst, _regs.eflags);
dst.type = OP_NONE;
break;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/x86_emulate/x86_emulate.h xen-4.8.1/xen/arch/x86/x86_emulate/x86_emulate.h
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/x86_emulate/x86_emulate.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/x86_emulate/x86_emulate.h 2017-04-10 14:21:48.000000000 +0100
@@ -71,7 +71,7 @@
* Attribute for segment selector. This is a copy of bits 40:47 & 52:55 of the
* segment descriptor. It happens to match the format of an AMD SVM VMCB.
*/
-typedef union __attribute__((__packed__)) segment_attributes {
+typedef union segment_attributes {
uint16_t bytes;
struct
{
@@ -91,7 +91,7 @@
* Full state of a segment register (visible and hidden portions).
* Again, this happens to match the format of an AMD SVM VMCB.
*/
-struct __attribute__((__packed__)) segment_register {
+struct segment_register {
uint16_t sel;
segment_attributes_t attr;
uint32_t limit;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/xen.lds.S xen-4.8.1/xen/arch/x86/xen.lds.S
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/xen.lds.S 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/xen.lds.S 2017-04-10 14:21:48.000000000 +0100
@@ -299,7 +299,7 @@
}
ASSERT(__image_base__ > XEN_VIRT_START ||
- _end <= XEN_VIRT_END - NR_CPUS * PAGE_SIZE,
+ __2M_rwdata_end <= XEN_VIRT_END - NR_CPUS * PAGE_SIZE,
"Xen image overlaps stubs area")
#ifdef CONFIG_KEXEC
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/xstate.c xen-4.8.1/xen/arch/x86/xstate.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/xstate.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/xstate.c 2017-04-10 14:21:48.000000000 +0100
@@ -92,7 +92,7 @@
if ( bsp )
{
- xstate_features = fls(xfeature_mask);
+ xstate_features = flsl(xfeature_mask);
xstate_offsets = xzalloc_array(unsigned int, xstate_features);
if ( !xstate_offsets )
return -ENOMEM;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/common/memory.c xen-4.8.1/xen/common/memory.c
--- xen-4.8.1~pre.2017.01.23/xen/common/memory.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/common/memory.c 2017-04-10 14:21:48.000000000 +0100
@@ -437,8 +437,8 @@
goto fail_early;
}
- if ( !guest_handle_okay(exch.in.extent_start, exch.in.nr_extents) ||
- !guest_handle_okay(exch.out.extent_start, exch.out.nr_extents) )
+ if ( !guest_handle_subrange_okay(exch.in.extent_start, exch.nr_exchanged,
+ exch.in.nr_extents - 1) )
{
rc = -EFAULT;
goto fail_early;
@@ -448,11 +448,27 @@
{
in_chunk_order = exch.out.extent_order - exch.in.extent_order;
out_chunk_order = 0;
+
+ if ( !guest_handle_subrange_okay(exch.out.extent_start,
+ exch.nr_exchanged >> in_chunk_order,
+ exch.out.nr_extents - 1) )
+ {
+ rc = -EFAULT;
+ goto fail_early;
+ }
}
else
{
in_chunk_order = 0;
out_chunk_order = exch.in.extent_order - exch.out.extent_order;
+
+ if ( !guest_handle_subrange_okay(exch.out.extent_start,
+ exch.nr_exchanged << out_chunk_order,
+ exch.out.nr_extents - 1) )
+ {
+ rc = -EFAULT;
+ goto fail_early;
+ }
}
d = rcu_lock_domain_by_any_id(exch.in.domid);
diff -Nru xen-4.8.1~pre.2017.01.23/xen/common/sched_credit2.c xen-4.8.1/xen/common/sched_credit2.c
--- xen-4.8.1~pre.2017.01.23/xen/common/sched_credit2.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/common/sched_credit2.c 2017-04-10 14:21:48.000000000 +0100
@@ -491,12 +491,15 @@
}
/*
- * Clear the bits of all the siblings of cpu from mask.
+ * Clear the bits of all the siblings of cpu from mask (if necessary).
*/
static inline
void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
{
- cpumask_andnot(mask, mask, per_cpu(cpu_sibling_mask, cpu));
+ const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
+
+ if ( cpumask_subset(cpu_siblings, mask) )
+ cpumask_andnot(mask, mask, cpu_siblings);
}
/*
@@ -510,24 +513,26 @@
*/
static int get_fallback_cpu(struct csched2_vcpu *svc)
{
- int cpu;
+ struct vcpu *v = svc->vcpu;
+ int cpu = v->processor;
+
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
- if ( likely(cpumask_test_cpu(svc->vcpu->processor,
- svc->vcpu->cpu_hard_affinity)) )
- return svc->vcpu->processor;
-
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
- &svc->rqd->active);
- cpu = cpumask_first(cpumask_scratch);
- if ( likely(cpu < nr_cpu_ids) )
+ if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
return cpu;
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
- cpupool_domain_cpumask(svc->vcpu->domain));
+ if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+ &svc->rqd->active)) )
+ {
+ cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
+ cpumask_scratch_cpu(cpu));
+ return cpumask_first(cpumask_scratch_cpu(cpu));
+ }
- ASSERT(!cpumask_empty(cpumask_scratch));
+ ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
- return cpumask_first(cpumask_scratch);
+ return cpumask_first(cpumask_scratch_cpu(cpu));
}
/*
@@ -898,6 +903,14 @@
void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
+static inline void
+tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
+{
+ __cpumask_set_cpu(cpu, &rqd->tickled);
+ smt_idle_mask_clear(cpu, &rqd->smt_idle);
+ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+}
+
/*
* Check what processor it is best to 'wake', for picking up a vcpu that has
* just been put (back) in the runqueue. Logic is as follows:
@@ -941,6 +954,9 @@
(unsigned char *)&d);
}
+ cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
+ cpupool_domain_cpumask(new->vcpu->domain));
+
/*
* First of all, consider idle cpus, checking if we can just
* re-use the pcpu where we were running before.
@@ -953,7 +969,7 @@
cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
else
cpumask_copy(&mask, &rqd->smt_idle);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
i = cpumask_test_or_cycle(cpu, &mask);
if ( i < nr_cpu_ids )
{
@@ -968,7 +984,7 @@
* gone through the scheduler yet.
*/
cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
i = cpumask_test_or_cycle(cpu, &mask);
if ( i < nr_cpu_ids )
{
@@ -984,7 +1000,7 @@
*/
cpumask_andnot(&mask, &rqd->active, &rqd->idle);
cpumask_andnot(&mask, &mask, &rqd->tickled);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
if ( cpumask_test_cpu(cpu, &mask) )
{
cur = CSCHED2_VCPU(curr_on_cpu(cpu));
@@ -1062,9 +1078,8 @@
sizeof(d),
(unsigned char *)&d);
}
- __cpumask_set_cpu(ipid, &rqd->tickled);
- smt_idle_mask_clear(ipid, &rqd->smt_idle);
- cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
+
+ tickle_cpu(ipid, rqd);
if ( unlikely(new->tickled_cpu != -1) )
SCHED_STAT_CRANK(tickled_cpu_overwritten);
@@ -1104,18 +1119,28 @@
list_for_each( iter, &rqd->svc )
{
+ unsigned int svc_cpu;
struct csched2_vcpu * svc;
int start_credit;
svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+ svc_cpu = svc->vcpu->processor;
ASSERT(!is_idle_vcpu(svc->vcpu));
ASSERT(svc->rqd == rqd);
+ /*
+ * If svc is running, it is our responsibility to make sure, here,
+ * that the credit it has spent so far gets accounted.
+ */
+ if ( svc->vcpu == curr_on_cpu(svc_cpu) )
+ burn_credits(rqd, svc, now);
+
start_credit = svc->credit;
- /* And add INIT * m, avoiding integer multiplication in the
- * common case. */
+ /*
+ * Add INIT * m, avoiding integer multiplication in the common case.
+ */
if ( likely(m==1) )
svc->credit += CSCHED2_CREDIT_INIT;
else
@@ -1378,7 +1403,9 @@
SCHED_STAT_CRANK(vcpu_sleep);
if ( curr_on_cpu(vc->processor) == vc )
- cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+ {
+ tickle_cpu(vc->processor, svc->rqd);
+ }
else if ( __vcpu_on_runq(svc) )
{
ASSERT(svc->rqd == RQD(ops, vc->processor));
@@ -1492,7 +1519,7 @@
csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
{
struct csched2_private *prv = CSCHED2_PRIV(ops);
- int i, min_rqi = -1, new_cpu;
+ int i, min_rqi = -1, new_cpu, cpu = vc->processor;
struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
s_time_t min_avgload = MAX_LOAD;
@@ -1512,7 +1539,7 @@
* just grab the prv lock. Instead, we'll have to trylock, and
* do something else reasonable if we fail.
*/
- ASSERT(spin_is_locked(per_cpu(schedule_data, vc->processor).schedule_lock));
+ ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
if ( !read_trylock(&prv->lock) )
{
@@ -1526,6 +1553,9 @@
goto out;
}
+ cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+ cpupool_domain_cpumask(vc->domain));
+
/*
* First check to see if we're here because someone else suggested a place
* for us to move.
@@ -1537,13 +1567,13 @@
printk(XENLOG_WARNING "%s: target runqueue disappeared!\n",
__func__);
}
- else
+ else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+ &svc->migrate_rqd->active) )
{
- cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&svc->migrate_rqd->active);
- new_cpu = cpumask_any(cpumask_scratch);
- if ( new_cpu < nr_cpu_ids )
- goto out_up;
+ new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
+ goto out_up;
}
/* Fall-through to normal cpu pick */
}
@@ -1571,12 +1601,12 @@
*/
if ( rqd == svc->rqd )
{
- if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
}
else if ( spin_trylock(&rqd->lock) )
{
- if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
rqd_avgload = rqd->b_avgload;
spin_unlock(&rqd->lock);
@@ -1598,9 +1628,9 @@
goto out_up;
}
- cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&prv->rqd[min_rqi].active);
- new_cpu = cpumask_any(cpumask_scratch);
+ new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
BUG_ON(new_cpu >= nr_cpu_ids);
out_up:
@@ -1675,6 +1705,8 @@
struct csched2_runqueue_data *trqd,
s_time_t now)
{
+ int cpu = svc->vcpu->processor;
+
if ( unlikely(tb_init_done) )
{
struct {
@@ -1696,8 +1728,8 @@
svc->migrate_rqd = trqd;
__set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
__set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
- cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
SCHED_STAT_CRANK(migrate_requested);
+ tickle_cpu(cpu, svc->rqd);
}
else
{
@@ -1711,9 +1743,11 @@
}
__runq_deassign(svc);
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+ cpupool_domain_cpumask(svc->vcpu->domain));
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&trqd->active);
- svc->vcpu->processor = cpumask_any(cpumask_scratch);
+ svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
ASSERT(svc->vcpu->processor < nr_cpu_ids);
__runq_assign(svc, trqd);
@@ -1737,8 +1771,14 @@
static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
struct csched2_runqueue_data *rqd)
{
+ struct vcpu *v = svc->vcpu;
+ int cpu = svc->vcpu->processor;
+
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
+
return !(svc->flags & CSFLAG_runq_migrate_request) &&
- cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+ cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
}
static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
@@ -1928,10 +1968,40 @@
csched2_vcpu_migrate(
const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
{
+ struct domain *d = vc->domain;
struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
struct csched2_runqueue_data *trqd;
+ s_time_t now = NOW();
+
+ /*
+ * Being passed a target pCPU which is outside of our cpupool is only
+ * valid if we are shutting down (or doing ACPI suspend), and we are
+ * moving everyone to BSP, no matter whether or not BSP is inside our
+ * cpupool.
+ *
+ * And since there indeed is the chance that it is not part of it, all
+ * we must do is remove _and_ unassign the vCPU from any runqueue, as
+ * well as updating v->processor with the target, so that the suspend
+ * process can continue.
+ *
+ * It will then be during resume that a new, meaningful, value for
+ * v->processor will be chosen, and during actual domain unpause that
+ * the vCPU will be assigned to and added to the proper runqueue.
+ */
+ if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
+ {
+ ASSERT(system_state == SYS_STATE_suspend);
+ if ( __vcpu_on_runq(svc) )
+ {
+ __runq_remove(svc);
+ update_load(ops, svc->rqd, NULL, -1, now);
+ }
+ __runq_deassign(svc);
+ vc->processor = new_cpu;
+ return;
+ }
- /* Check if new_cpu is valid */
+ /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity. */
ASSERT(cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
@@ -1946,7 +2016,7 @@
* pointing to a pcpu where we can't run any longer.
*/
if ( trqd != svc->rqd )
- migrate(ops, svc, trqd, NOW());
+ migrate(ops, svc, trqd, now);
else
vc->processor = new_cpu;
}
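
Several of the sched_credit2.c hunks above replace the single global
cpumask_scratch with cpumask_scratch_cpu(cpu). The idea: one scratch mask
per pCPU, so whoever holds that pCPU's scheduler lock can use its scratch
without any further serialisation, where a shared scratch mask could be
clobbered concurrently. A minimal sketch of the shape (sizes and names
hypothetical):

    #define NR_CPUS 64

    typedef struct { unsigned long bits[NR_CPUS / 64]; } cpumask_t;

    /*
     * One scratch mask per pCPU; safe to use while holding that pCPU's
     * scheduler lock (or any other per-pCPU exclusion).
     */
    static cpumask_t scratch_mask[NR_CPUS];
    #define cpumask_scratch_cpu(c) (&scratch_mask[c])
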
diff -Nru xen-4.8.1~pre.2017.01.23/xen/common/schedule.c xen-4.8.1/xen/common/schedule.c
--- xen-4.8.1~pre.2017.01.23/xen/common/schedule.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/common/schedule.c 2017-04-10 14:21:48.000000000 +0100
@@ -84,7 +84,27 @@
: (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
#define DOM2OP(_d) (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
-#define VCPU2OP(_v) (DOM2OP((_v)->domain))
+static inline struct scheduler *VCPU2OP(const struct vcpu *v)
+{
+ struct domain *d = v->domain;
+
+ if ( likely(d->cpupool != NULL) )
+ return d->cpupool->sched;
+
+ /*
+ * If d->cpupool is NULL, this is a vCPU of the idle domain. And this
+ * case is special because the idle domain does not really belong to
+ * a cpupool (and, hence, doesn't really have a scheduler). In fact, its
+ * vCPUs (may) run on pCPUs which are in different pools, with different
+ * schedulers.
+ *
+ * What we want, in this case, is the scheduler of the pCPU where this
+ * particular idle vCPU is running. And, since v->processor never changes
+ * for idle vCPUs, it is safe to use it, with no locks, to figure that out.
+ */
+ ASSERT(is_idle_domain(d));
+ return per_cpu(scheduler, v->processor);
+}
#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
static inline void trace_runstate_change(struct vcpu *v, int new_state)
@@ -633,8 +653,11 @@
void restore_vcpu_affinity(struct domain *d)
{
+ unsigned int cpu = smp_processor_id();
struct vcpu *v;
+ ASSERT(system_state == SYS_STATE_resume);
+
for_each_vcpu ( d, v )
{
spinlock_t *lock = vcpu_schedule_lock_irq(v);
@@ -643,18 +666,34 @@
{
cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
v->affinity_broken = 0;
+
}
- if ( v->processor == smp_processor_id() )
+ /*
+ * During suspend (in cpu_disable_scheduler()), we moved every vCPU
+ * to BSP (which, as of now, is pCPU 0), as a temporary measure to
+ * allow the nonboot processors to have their data structure freed
+ * and go to sleep. But nothing guardantees that the BSP is a valid
+ * pCPU for a particular domain.
+ *
+ * Therefore, here, before actually unpausing the domains, we should
+ * set v->processor of each of their vCPUs to something that will
+ * make sense for the scheduler of the cpupool they are in.
+ */
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
+ v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+
+ if ( v->processor == cpu )
{
set_bit(_VPF_migrating, &v->pause_flags);
- vcpu_schedule_unlock_irq(lock, v);
+ spin_unlock_irq(lock);
vcpu_sleep_nosync(v);
vcpu_migrate(v);
}
else
{
- vcpu_schedule_unlock_irq(lock, v);
+ spin_unlock_irq(lock);
}
}
diff -Nru xen-4.8.1~pre.2017.01.23/xen/drivers/passthrough/iommu.c xen-4.8.1/xen/drivers/passthrough/iommu.c
--- xen-4.8.1~pre.2017.01.23/xen/drivers/passthrough/iommu.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/drivers/passthrough/iommu.c 2017-04-10 14:21:48.000000000 +0100
@@ -244,8 +244,7 @@
if ( !iommu_enabled || !dom_iommu(d)->platform_ops )
return;
- if ( need_iommu(d) )
- iommu_teardown(d);
+ iommu_teardown(d);
arch_iommu_domain_destroy(d);
}
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/config.h xen-4.8.1/xen/include/asm-arm/config.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/config.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/config.h 2017-04-10 14:21:48.000000000 +0100
@@ -46,6 +46,8 @@
#define MAX_VIRT_CPUS 8
#endif
+#define INVALID_VCPU_ID MAX_VIRT_CPUS
+
#define asmlinkage /* Nothing needed */
#define __LINUX_ARM_ARCH__ 7
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/cpufeature.h xen-4.8.1/xen/include/asm-arm/cpufeature.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/cpufeature.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/cpufeature.h 2017-04-10 14:21:48.000000000 +0100
@@ -24,7 +24,7 @@
#define cpu_has_arm (boot_cpu_feature32(arm) == 1)
#define cpu_has_thumb (boot_cpu_feature32(thumb) >= 1)
#define cpu_has_thumb2 (boot_cpu_feature32(thumb) >= 3)
-#define cpu_has_jazelle (boot_cpu_feature32(jazelle) >= 0)
+#define cpu_has_jazelle (boot_cpu_feature32(jazelle) > 0)
#define cpu_has_thumbee (boot_cpu_feature32(thumbee) == 1)
#define cpu_has_aarch32 (cpu_has_arm || cpu_has_thumb)
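
The cpufeature.h fix is the classic unsigned tautology: the feature field
is unsigned, an unsigned value is always >= 0, so cpu_has_jazelle evaluated
to true on every CPU. Demonstrated in two lines:

    #include <assert.h>

    int main(void)
    {
        unsigned int field = 0;      /* feature absent in the ID register */

        assert((field >= 0) == 1);   /* old test: always true */
        assert((field > 0) == 0);    /* new test: correctly false */
        return 0;
    }
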
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/p2m.h xen-4.8.1/xen/include/asm-arm/p2m.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/p2m.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/p2m.h 2017-04-10 14:21:48.000000000 +0100
@@ -95,6 +95,9 @@
/* back pointer to domain */
struct domain *domain;
+
+ /* Track on which pCPU this p2m was last used, and by which vCPU */
+ uint8_t last_vcpu_ran[NR_CPUS];
};
/*
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/page.h xen-4.8.1/xen/include/asm-arm/page.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/page.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/page.h 2017-04-10 14:21:48.000000000 +0100
@@ -292,24 +292,20 @@
static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
{
- size_t off;
const void *end = p + size;
+ size_t cacheline_mask = cacheline_bytes - 1;
dsb(sy); /* So the CPU issues all writes to the range */
- off = (unsigned long)p % cacheline_bytes;
- if ( off )
+ if ( (uintptr_t)p & cacheline_mask )
{
- p -= off;
+ p = (void *)((uintptr_t)p & ~cacheline_mask);
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
p += cacheline_bytes;
- size -= cacheline_bytes - off;
}
- off = (unsigned long)end % cacheline_bytes;
- if ( off )
+ if ( (uintptr_t)end & cacheline_mask )
{
- end -= off;
- size -= off;
+ end = (void *)((uintptr_t)end & ~cacheline_mask);
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (end));
}
@@ -323,9 +319,10 @@
static inline int clean_dcache_va_range(const void *p, unsigned long size)
{
- const void *end;
+ const void *end = p + size;
dsb(sy); /* So the CPU issues all writes to the range */
- for ( end = p + size; p < end; p += cacheline_bytes )
+ p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));
+ for ( ; p < end; p += cacheline_bytes )
asm volatile (__clean_dcache_one(0) : : "r" (p));
dsb(sy); /* So we know the flushes happen before continuing */
/* ARM callers assume that dcache_* functions cannot fail. */
@@ -335,9 +332,10 @@
static inline int clean_and_invalidate_dcache_va_range
(const void *p, unsigned long size)
{
- const void *end;
+ const void *end = p + size;
dsb(sy); /* So the CPU issues all writes to the range */
- for ( end = p + size; p < end; p += cacheline_bytes )
+ p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));
+ for ( ; p < end; p += cacheline_bytes )
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
dsb(sy); /* So we know the flushes happen before continuing */
/* ARM callers assume that dcache_* functions cannot fail. */
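
The page.h hunks all revolve around rounding the start pointer down to a
cache-line boundary, so partially covered head and tail lines are
maintained exactly once and nothing before p is skipped. Since
cacheline_bytes is a power of two, the rounding is a single mask, sketched
here stand-alone:

    #include <stdint.h>

    /* Round p down to the start of its cache line (line a power of two). */
    static inline const void *cacheline_down(const void *p, uintptr_t line)
    {
        return (const void *)((uintptr_t)p & ~(line - 1));
    }

The fixed clean_dcache_va_range() above then simply strides from that
aligned pointer up to end, one line at a time.
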
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/sysregs.h xen-4.8.1/xen/include/asm-arm/sysregs.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/sysregs.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/sysregs.h 2017-04-10 14:21:48.000000000 +0100
@@ -90,6 +90,7 @@
#define HSR_SYSREG_ICC_SGI1R_EL1 HSR_SYSREG(3,0,c12,c11,5)
#define HSR_SYSREG_ICC_ASGI1R_EL1 HSR_SYSREG(3,1,c12,c11,6)
#define HSR_SYSREG_ICC_SGI0R_EL1 HSR_SYSREG(3,2,c12,c11,7)
+#define HSR_SYSREG_ICC_SRE_EL1 HSR_SYSREG(3,0,c12,c12,5)
#define HSR_SYSREG_CONTEXTIDR_EL1 HSR_SYSREG(3,0,c13,c0,1)
#define HSR_SYSREG_PMCR_EL0 HSR_SYSREG(3,3,c9,c12,0)
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/vgic.h xen-4.8.1/xen/include/asm-arm/vgic.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/vgic.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/vgic.h 2017-04-10 14:21:48.000000000 +0100
@@ -69,7 +69,7 @@
unsigned long status;
struct irq_desc *desc; /* only set if the irq corresponds to a physical irq */
unsigned int irq;
-#define GIC_INVALID_LR ~(uint8_t)0
+#define GIC_INVALID_LR (uint8_t)~0
uint8_t lr;
uint8_t priority;
/* inflight is used to append instances of pending_irq to
@@ -107,7 +107,9 @@
/*
* It's more convenient to store a target VCPU per vIRQ
- * than the register ITARGETSR/IROUTER itself
+ * than the register ITARGETSR/IROUTER itself.
+ * Use atomic operations to read/write the vcpu fields to avoid
+ * taking the rank lock.
*/
uint8_t vcpu[32];
};
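
The GIC_INVALID_LR change in the vgic.h hunk is an integer-promotion
subtlety: in ~(uint8_t)0 the cast happens first, the uint8_t is promoted
back to int before the complement, and the result is int -1, which a
uint8_t lr holding 255 never compares equal to. Inverting first and
truncating last yields the intended 255:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t lr = 0xff;          /* the sentinel as actually stored */

        assert(lr != ~(uint8_t)0);  /* old macro: compares 255 with -1 */
        assert(lr == (uint8_t)~0);  /* new macro: compares 255 with 255 */
        return 0;
    }
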
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/hvm/svm/vmcb.h xen-4.8.1/xen/include/asm-x86/hvm/svm/vmcb.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/hvm/svm/vmcb.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-x86/hvm/svm/vmcb.h 2017-04-10 14:21:48.000000000 +0100
@@ -308,7 +308,7 @@
/* Definition of segment state is borrowed by the generic HVM code. */
typedef struct segment_register svm_segment_register_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -322,7 +322,7 @@
} fields;
} eventinj_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -340,7 +340,7 @@
} fields;
} vintr_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -357,7 +357,7 @@
} fields;
} ioio_info_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -366,7 +366,7 @@
} fields;
} lbrctrl_t;
-typedef union __packed
+typedef union
{
uint32_t bytes;
struct
@@ -401,7 +401,7 @@
#define IOPM_SIZE (12 * 1024)
#define MSRPM_SIZE (8 * 1024)
-struct __packed vmcb_struct {
+struct vmcb_struct {
u32 _cr_intercepts; /* offset 0x00 - cleanbit 0 */
u32 _dr_intercepts; /* offset 0x04 - cleanbit 0 */
u32 _exception_intercepts; /* offset 0x08 - cleanbit 0 */
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/hvm/vmx/vmcs.h xen-4.8.1/xen/include/asm-x86/hvm/vmx/vmcs.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/hvm/vmx/vmcs.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-x86/hvm/vmx/vmcs.h 2017-04-10 14:21:48.000000000 +0100
@@ -238,6 +238,7 @@
void vmx_vmcs_enter(struct vcpu *v);
bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
#define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004
#define CPU_BASED_USE_TSC_OFFSETING 0x00000008
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/mm.h xen-4.8.1/xen/include/asm-x86/mm.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/mm.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-x86/mm.h 2017-04-10 14:21:48.000000000 +0100
@@ -253,8 +253,8 @@
#define is_xen_heap_mfn(mfn) \
(__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
#define is_xen_fixed_mfn(mfn) \
- ((((mfn) << PAGE_SHIFT) >= __pa(&_start)) && \
- (((mfn) << PAGE_SHIFT) <= __pa(&_end)))
+ ((((mfn) << PAGE_SHIFT) >= __pa(&_stext)) && \
+ (((mfn) << PAGE_SHIFT) <= __pa(&__2M_rwdata_end)))
#define PRtype_info "016lx"/* should only be used for printk's */
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/x86_64/uaccess.h xen-4.8.1/xen/include/asm-x86/x86_64/uaccess.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/x86_64/uaccess.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-x86/x86_64/uaccess.h 2017-04-10 14:21:48.000000000 +0100
@@ -29,8 +29,9 @@
/*
* Valid if in +ve half of 48-bit address space, or above Xen-reserved area.
* This is also valid for range checks (addr, addr+size). As long as the
- * start address is outside the Xen-reserved area then we will access a
- * non-canonical address (and thus fault) before ever reaching VIRT_START.
+ * start address is outside the Xen-reserved area, sequential accesses
+ * (starting at addr) will hit a non-canonical address (and thus fault)
+ * before ever reaching VIRT_START.
*/
#define __addr_ok(addr) \
(((unsigned long)(addr) < (1UL<<47)) || \
@@ -40,7 +41,8 @@
(__addr_ok(addr) || is_compat_arg_xlat_range(addr, size))
#define array_access_ok(addr, count, size) \
- (access_ok(addr, (count)*(size)))
+ (likely(((count) ?: 0UL) < (~0UL / (size))) && \
+ access_ok(addr, (count) * (size)))
#define __compat_addr_ok(d, addr) \
((unsigned long)(addr) < HYPERVISOR_COMPAT_VIRT_START(d))
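
The array_access_ok() change stops the count*size multiplication from
overflowing before it is checked: count must stay below ~0UL / size, and
the GNU ?: with 0UL also forces the comparison to be done at unsigned long
width. Why the unguarded multiply was dangerous, stand-alone:

    #include <assert.h>

    int main(void)
    {
        unsigned long size = 8;
        unsigned long count = ~0UL / size + 2;  /* absurdly large request */

        assert(count * size == 8);              /* wraps: looks tiny */
        assert((count < ~0UL / size) == 0);     /* the new guard rejects it */
        return 0;
    }
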
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/public/arch-x86/hvm/save.h xen-4.8.1/xen/include/public/arch-x86/hvm/save.h
--- xen-4.8.1~pre.2017.01.23/xen/include/public/arch-x86/hvm/save.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/public/arch-x86/hvm/save.h 2017-04-10 14:21:48.000000000 +0100
@@ -135,7 +135,7 @@
uint64_t shadow_gs;
/* msr content saved/restored. */
- uint64_t msr_flags;
+ uint64_t msr_flags; /* Obsolete, ignored. */
uint64_t msr_lstar;
uint64_t msr_star;
uint64_t msr_cstar;
@@ -249,7 +249,7 @@
uint64_t shadow_gs;
/* msr content saved/restored. */
- uint64_t msr_flags;
+ uint64_t msr_flags; /* Obsolete, ignored. */
uint64_t msr_lstar;
uint64_t msr_star;
uint64_t msr_cstar;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/public/memory.h xen-4.8.1/xen/include/public/memory.h
--- xen-4.8.1~pre.2017.01.23/xen/include/public/memory.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/public/memory.h 2017-04-10 14:21:48.000000000 +0100
@@ -222,9 +222,9 @@
* XENMEM_add_to_physmap_batch only. */
#define XENMAPSPACE_dev_mmio 5 /* device mmio region
ARM only; the region is mapped in
- Stage-2 using the memory attribute
- "Device-nGnRE" (previously named
- "Device" on ARMv7) */
+ Stage-2 using the Normal Memory
+ Inner/Outer Write-Back Cacheable
+ memory attribute. */
/* ` } */
/*
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/xsm/dummy.h xen-4.8.1/xen/include/xsm/dummy.h
--- xen-4.8.1~pre.2017.01.23/xen/include/xsm/dummy.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/xsm/dummy.h 2017-04-10 14:21:48.000000000 +0100
@@ -712,18 +712,13 @@
XSM_ASSERT_ACTION(XSM_OTHER);
switch ( op )
{
- case XENPMU_mode_set:
- case XENPMU_mode_get:
- case XENPMU_feature_set:
- case XENPMU_feature_get:
- return xsm_default_action(XSM_PRIV, d, current->domain);
case XENPMU_init:
case XENPMU_finish:
case XENPMU_lvtpc_set:
case XENPMU_flush:
return xsm_default_action(XSM_HOOK, d, current->domain);
default:
- return -EPERM;
+ return xsm_default_action(XSM_PRIV, d, current->domain);
}
}
--- End Message ---