Package: release.debian.org
Severity: normal
User: release.debian.org@packages.debian.org
Usertags: unblock
Please unblock package xen
unblock xen/4.8.1-1
This update includes three security fixes and a large number of other
important bugfixes.
When preparing this update I had to choose between either (i) taking
the upstream 4.8.1 stable point release and reverting any changes I
felt inappropriate, or (ii) cherry-picking the commits I felt
appropriate.
Looking at the git log [1] I concluded that the majority of the
non-security fixes were clearly bugfixes. Many of those bugfixes are
for crashes or races.
I decided that the lower risk approach would be to start with all the
commits from upstream, and revert any that ought to be excluded. This
reduces the risk of dropping an important bugfix.
Reviewing the commit log in detail, there were two changes for which
the justification for backporting seemed quite unclear to me:
"xen/arm: *: Relax hw domain mapping attributes" - two commits, one
for ACPI and one for DT; and "x86/ept: allow write-combining on
!mfn_valid() MMIO mappings again". I queried these with other
upstream developers and came to the conclusion that they ought to be
included.
There are a number of other commits which are clear bugfixes, with a
low risk of regression, but also a low impact. I think it is probably
better to include these and ship Xen 4.8.1 in stretch, than to revert
them.
[1] git-log-4.8.1-1.txt, attached.
I'm afraid the debdiff will be hard to read - not because the changes
interact so much, but because there are quite a lot of them.
In the debdiff you will see a change to Config.mk. That change has no
effect on the Debian package build and could be stripped out. I chose
not to do this because I felt that messing with things was more likely
to break things than to fix them (see above).
Thanks for your attention and I hope this approach meets with your
approval.
Regards,
Ian.
-- System Information:
Debian Release: 8.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: i386 (x86_64)
Kernel: Linux 3.16.0-4-amd64 (SMP w/8 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
commit 5ebb4de45c501ae12964a244ccd85fe1169a5f7c
Author: Jan Beulich <jbeulich@suse.com>
Date: Mon Apr 10 15:21:48 2017 +0200
update Xen version to 4.8.1
commit e1c62cdf782085605ea1186912fc419dd9464027
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Tue Mar 28 18:57:52 2017 +0100
oxenstored: trim history in the frequent_ops function
We were trimming the history of commits only at the end of each
transaction (regardless of how it ended).
Therefore if non-transactional writes were being made but no
transactions were being ended, the history would grow
indefinitely. Now we trim the history at regular intervals.
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
commit 336afa82ca86fe61f9c46f89ae6726ff94754f34
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Mon Mar 27 14:36:34 2017 +0100
oxenstored transaction conflicts: improve logging
For information related to transaction conflicts, potentially frequent
logging at "info" priority has been changed to "debug" priority, and
once per two minutes there is an "info" priority summary.
Additional detailed logging has been added at "debug" priority.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
commit 3ee0d82af271897e7e8f74949a4c50d47d460309
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Fri Mar 24 19:55:03 2017 +0000
oxenstored: don't wake to issue no conflict-credit
In the main loop, when choosing the timeout for the select function
call, we were setting it so as to wake up to issue conflict-credit to
any domains that could accept it. When xenstore is idle, this would
mean waking up every 50ms (by default) to do no work. With this
commit, we check whether any domain is below its cap, and if not then
we set the timeout for longer (the same timeout as before the
conflict-protection feature was added).
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
commit 84ee808e363887910984b3eb443466ce42e8010f
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Fri Mar 24 16:16:10 2017 +0000
oxenstored: do not commit read-only transactions
The packet telling us to end the transaction has always carried an
argument telling us whether to commit.
If the transaction made no modifications to the tree, now we ignore
that argument and do not commit: it is just a waste of effort.
This makes read-only transactions immune to conflicts, and means that
we do not need to store any of their details in the history that is
used for assigning blame for conflicts.
We count a transaction as a read-only transaction only if it contains
no operations that modified the tree.
This means that (for example) a transaction that creates a new node
then deletes it would NOT count as read-only, even though it makes no
change overall. A more sophisticated algorithm could judge the
transaction based on comparison of its initial and final states, but
this would add complexity and computational cost.
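For illustration, a minimal C-style sketch of the read-only test described
above (hypothetical names; the real oxenstored code is OCaml):

    #include <stdbool.h>

    /* Count tree-modifying operations (write, mkdir, rm, setperms, ...)
     * as the transaction runs. */
    struct transaction {
        unsigned int nr_modifying_ops;
    };

    static bool txn_is_read_only(const struct transaction *t)
    {
        /* A create-then-delete pair still counts as modifying, as
         * noted above: we do not compare initial and final states. */
        return t->nr_modifying_ops == 0;
    }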
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
commit cb778dee017504505a5f20aea1831abef31a3e97
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Thu Mar 23 19:06:54 2017 +0000
oxenstored: allow self-conflicts
We already avoid inter-domain conflicts but now allow intra-domain
conflicts. Although there are no known practical examples of a domain
that might perform operations that conflict with its own transactions,
this is conceivable, so here we avoid changing those semantics
unnecessarily.
When a transaction commit fails with a conflict and we look through
the history of commits to see which connection(s) to blame, ignore
historical commits that were made by the same connection as the
failing commit.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
commit fa0b2b9555366e5836a5fdacb62bb054cdefc3d6
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Thu Mar 23 14:28:16 2017 +0000
oxenstored: blame the connection that caused a transaction conflict
Blame each connection found to have made a commit that would cause this
transaction to fail. Each blamed connection is penalised by having its
conflict-credit decremented.
Note the change in semantics for the replay function: we no longer stop after
finding the first operation that can't be replayed. This allows us to identify
all operations that conflicted with this transaction, not just the one that
conflicted first.
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
v1 Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
Changes since v1:
* use correct log levels for informational messages
Changes since v2:
* fix the blame algorithm and improve logging
(fix was reviewed by Jonathan Davies)
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
commit 9ea503220d33b9efae45405eeac5a3a08a902833
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Mon Mar 27 08:58:29 2017 +0000
oxenstored: track commit history
Since the list of historic activity cannot grow without bound, it is safe to use
this to track commits.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Thomas Sanders <thomas.sanders@citrix.com>
commit c68276082ac2bea5caf2bff26cc89771598e0de9
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Thu Mar 23 14:25:16 2017 +0000
oxenstored: discard old commit-history on txn end
The history of commits is to be used for working out which historical
commit(s) (including atomic writes) caused conflicts with a
currently-failing commit of a transaction. Any commit that was made
before the current transaction started cannot be relevant. Therefore
we never need to keep history from before the start of the
longest-running transaction that is open at any given time: whenever a
transaction ends (with or without a commit) then if it was the
longest-running open transaction we can delete history up until the
start of the next-longest-running open transaction.
Some transactions might stay open for a very long time, so if any
transaction exceeds conflict_max_history_seconds then we remove it
from consideration in this context, and will not guarantee to keep
remembering about historical commits made during such a transaction.
We implement this by keeping a list of all open transactions that have
not been open too long. When a transaction ends, we remove it from the
list, along with any that have been open longer than the maximum; then
we delete any history from before the start of the longest-running
transaction remaining in the list.
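A minimal C-flavoured sketch of that trimming rule (hypothetical types and
names; the real oxenstored code is OCaml):

    #include <stddef.h>

    struct txn {
        double start_time;
        int still_open;
    };

    /* Return the cut-off time: history entries older than the start of
     * the oldest still-open, not-too-old transaction can be deleted. */
    static double history_cutoff(const struct txn *txns, size_t n,
                                 double now, double max_age)
    {
        double cutoff = now;
        for (size_t i = 0; i < n; i++) {
            /* Skip ended transactions and ones that exceeded
             * conflict_max_history_seconds. */
            if (!txns[i].still_open || now - txns[i].start_time > max_age)
                continue;
            if (txns[i].start_time < cutoff)
                cutoff = txns[i].start_time;
        }
        return cutoff;
    }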
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit 9a2c5b42ad29ea731ed95d7aae5b59df1c526eb3
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Thu Mar 23 14:20:33 2017 +0000
oxenstored: only record operations with side-effects in history
There is no need to record "read" operations as they will never cause another
transaction to fail.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Thomas Sanders <thomas.sanders@citrix.com>
commit 567051b61858424ec8725efe23641d12ee69791c
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Tue Mar 14 13:20:07 2017 +0000
oxenstored: support commit history tracking
Add ability to track xenstore tree operations -- either non-transactional
operations or committed transactions.
For now, the call to actually retain commits is commented out because history
can grow without bound.
For now, we call record_commit for all non-transactional operations. A
subsequent patch will make it retain only the ones with side-effects.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit 4f4596a0e90ebf7ed971b1949244e3b2cbed5d11
Author: Jonathan Davies <jonathan.davies@citrix.com>
Date: Tue Mar 14 12:17:38 2017 +0000
oxenstored: add transaction info relevant to history-tracking
Specifically:
* retain the original store (not just the root) in full transactions
* store commit count at the time of the start of the transaction
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit b795db0e3d04dff4fd31b380eb7dbc58c8926964
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Tue Mar 14 12:15:52 2017 +0000
oxenstored: ignore domains with no conflict-credit
When processing connections, skip those from domains with no remaining
conflict-credit.
Also, issue a point of conflict-credit at regular intervals, the
period being set by the configuration option
"conflict-max-history-seconds". When issuing conflict-credit, we give
a point either to
every domain at once (one each) or only to the single domain at the
front of the queue, depending on the configuration option
"conflict-rate-limit-is-aggregate".
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit 6636c70b369ada87f08bcb1810d0715687bc1fe8
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Tue Mar 14 12:15:52 2017 +0000
oxenstored: handling of domain conflict-credit
This commit gives each domain a conflict-credit variable, which will
later be used for limiting how often a domain can cause other domains'
transaction-commits to fail.
This commit also provides functions and data for manipulating domains
and their conflict-credit, and checking whether they have credit.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit f2c7ab1f47ea58b7bd397c42185e93ed1f162ac5
Author: Thomas Sanders <thomas.sanders@citrix.com>
Date: Tue Mar 14 12:15:52 2017 +0000
oxenstored: comments explaining some variables
It took a while of reading and reasoning to work out what these are
for, so here are comments to make life easier for everyone reading
this code in future.
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Sanders <thomas.sanders@citrix.com>
Reviewed-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
commit f3b7100424200938edc49c463e8aa1b8b73f2778
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Tue Mar 7 16:09:13 2017 +0000
xenstored: Log when the write transaction rate limit bites
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
plus:
xenstore: don't increment bool variable
Instead of incrementing a bool variable just set it to true.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
commit 4cd02a2513dc224e343eaa8a88418a14ade092b3
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Tue Mar 7 16:09:12 2017 +0000
xenstored: apply a write transaction rate limit
This avoids a rogue client being able to stall another client (e.g. the
toolstack) indefinitely.
This is XSA-206.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Backported to 4.8 (not entirely trivial).
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
commit e0354e65fec21a51e573bf82ef930cb97ed11c96
Author: Paul Durrant <paul.durrant@citrix.com>
Date: Wed Feb 22 13:27:34 2017 +0000
tools/libxenctrl: fix error check after opening libxenforeignmemory
Checking the value of xch->xcall is clearly incorrect. The code should be
checking xch->fmem (i.e. the return of the previously called function).
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 80a7d04f532ddc3500acd7988917708a536ae15f)
commit a085f0ca12a3db203f9dcfc96dc3722d0f0f3fbf
Author: Juergen Gross <jgross@suse.com>
Date: Wed Feb 15 12:11:12 2017 +0100
libxl: correct xenstore entry for empty cdrom
Specifying an empty cdrom device will result in a Xenstore entry
params = aio:(null)
as the physical device path doesn't exist. This makes a domain booted
via OVMF hang as OVMF is checking for "aio:" only in order to detect
the empty cdrom case.
Use an empty string for the physical device path in this case. As a
cdrom device for HVM is always backed by qdisk we only need to cover this
backend.
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
commit ec7f9e1df2aa6cf8376d26eafca554c6521d2e7c
Author: Juergen Gross <jgross@suse.com>
Date: Tue Apr 4 14:55:55 2017 +0200
x86: use 64 bit mask when masking away mfn bits
When using _PAGE_PSE_PAT as base for a negated bit mask make sure it is
propagated to 64 bits when applied to a 64 bit value.
There seems to be only one place where this is a problem, so fix this
by casting _PAGE_PSE_PAT to 64 bits there.
Not doing so will probably lead to problems on hosts with more than
16 TB of memory.
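The pitfall in miniature (illustrative stand-alone program; _PAGE_PSE_PAT
is bit 12, as in Xen, but the other values are made up):

    #include <stdint.h>
    #include <stdio.h>

    #define _PAGE_PSE_PAT 0x1000U    /* a 32-bit constant */

    int main(void)
    {
        uint64_t entry = 0x123456789000ULL;

        /* Wrong: ~_PAGE_PSE_PAT is computed in 32 bits, so the top
         * 32 bits of the mask are zero and high mfn bits are lost. */
        uint64_t bad  = entry & ~_PAGE_PSE_PAT;

        /* Right: widen to 64 bits first, then negate. */
        uint64_t good = entry & ~(uint64_t)_PAGE_PSE_PAT;

        printf("%llx vs %llx\n",
               (unsigned long long)bad, (unsigned long long)good);
        return 0;
    }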
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
master commit: 4edb1a42e3320757e3559f17edf6903bc1777de3
master date: 2017-03-30 15:11:24 +0200
commit 06403aa5f28bf697051de0435ef942f4c0d25849
Author: Jan Beulich <jbeulich@suse.com>
Date: Tue Apr 4 14:55:00 2017 +0200
memory: properly check guest memory ranges in XENMEM_exchange handling
The use of guest_handle_okay() here (as introduced by the XSA-29 fix)
is insufficient; guest_handle_subrange_okay() needs to be used
instead.
Note that the uses are okay in
- XENMEM_add_to_physmap_batch handling due to the size field being only
16 bits wide,
- livepatch_list() due to the limit of 1024 enforced on the
number-of-entries input (leaving aside the fact that this can be
called by a privileged domain only anyway),
- compat mode handling due to counts there being limited to 32 bits,
- everywhere else due to guest arrays being accessed sequentially from
index zero.
This is CVE-2017-7228 / XSA-212.
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 938fd2586eb081bcbd694f4c1f09ae6a263b0d90
master date: 2017-04-04 14:47:46 +0200
commit f3623bdbe5f7ff63e728865a8b986b2312231685
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Fri Mar 31 08:33:20 2017 +0200
xen: sched: don't call hooks of the wrong scheduler via VCPU2OP
Within context_saved(), we call the context_saved hook,
and we use VCPU2OP() to determine which scheduler it belongs to.
VCPU2OP uses DOM2OP, which uses d->cpupool, which is
NULL when d is the idle domain. And in that case,
DOM2OP just returns ops, the scheduler of cpupool0.
Therefore, if:
- cpupool0's scheduler defines context_saved (like
Credit2 and RTDS do),
- we are not in cpupool0 (i.e., our scheduler is
not ops),
- we are context switching from idle,
we call VCPU2OP(idle_vcpu), which means
DOM2OP(idle->cpupool), which is ops.
Therefore, we both:
- check if context_saved is defined in the wrong
scheduler;
- if yes, call the wrong one.
When using Credit2 at boot, and also Credit2 in
the other cpupool, this is wrong but innocuous,
because it only involves the idle vcpus.
When using Credit2 at boot, and Credit1 in the
other cpupool, this is *totally* wrong, and
it's by chance it does not explode!
When using Credit2 and other schedulers I'm
developing, I hit the following assert (in
sched_credit2.c, on a CPU inside a cpupool that
does not use Credit2):
csched2_context_saved()
{
...
ASSERT(!vcpu_on_runq(svc));
...
}
Fix this by dealing explicitly, in VCPU2OP, with
idle vcpus, returning the scheduler of the pCPU
they (always) run on.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: a3653e6a279213ba4e883b2252415dc98633106a
master date: 2017-03-27 14:28:05 +0100
commit c95bad938f77a863f46bbce6cad74012714776bb
Author: Jan Beulich <jbeulich@suse.com>
Date: Fri Mar 31 08:32:51 2017 +0200
x86/EFI: avoid Xen image when looking for module/kexec position
When booting straight from EFI, we don't further try to relocate Xen.
As a result, so far we also didn't avoid the area Xen uses when looking
for a location to put modules or the kexec area. Introduce a fake
module slot to deal with that without having to fiddle with a lot of
code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e22e1c47958a4778cd7baa3980f74e52f525ba28
master date: 2017-03-20 09:27:12 +0100
commit 4ec1cb0b01332c0bbf0e4d232c1e33390ae1a95c
Author: Jan Beulich <jbeulich@suse.com>
Date: Fri Mar 31 08:32:22 2017 +0200
x86/EFI: avoid IOMMU faults on [_end,__2M_rwdata_end)
Commit c9a4a1c419 ("x86/layout: Correct Xen's idea of its own memory
layout") didn't go far enough with the conversion, causing IOMMU faults
when memory from that range was handed to a domain. We must not make
this memory available for allocation (the change is benign to xen.gz at
this point in time).
Note that the change to tboot_shutdown() is fixing another issue at
once: As it looks, the function so far skipped all memory below the Xen
image.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: d522571a408a7dd21a06705f6dd51cdafd2db4fc
master date: 2017-03-20 09:25:36 +0100
commit 093a1f1b1c894e397f8fe82a1d69d486e4ade33f
Author: Jan Beulich <jbeulich@suse.com>
Date: Fri Mar 31 08:31:53 2017 +0200
x86/EFI: avoid overrunning mb_modules[]
Commit 436fb462ab ("x86/microcode: enable boot time (pre-Dom0)
loading") added a 4th module without providing an array slot for it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 02b37b7eff39e40828041b2fe480725ab8443258
master date: 2017-03-17 15:45:22 +0100
commit 47501b612494b98318079416a25ed6690c41deb1
Author: Roger Pau Monné <roger.pau@citrix.com>
Date: Fri Mar 31 08:31:14 2017 +0200
build/clang: fix XSM dummy policy when using clang 4.0
There seems to be some weird bug in clang 4.0 that prevents xsm_pmu_op from
working as expected, and vpmu.o ends up with a reference to
__xsm_action_mismatch_detected which makes the build fail:
[...]
ld -melf_x86_64_fbsd -T xen.lds -N prelink.o \
xen/common/symbols-dummy.o -o xen/.xen-syms.0
prelink.o: In function `xsm_default_action':
xen/include/xsm/dummy.h:80: undefined reference to `__xsm_action_mismatch_detected'
xen/xen/include/xsm/dummy.h:80: relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__xsm_action_mismatch_detected'
ld: xen/xen/.xen-syms.0: hidden symbol `__xsm_action_mismatch_detected' isn't defined
Then doing a search in the objects files:
# find xen/ -type f -name '*.o' -print0 | xargs -0 bash -c \
'for filename; do nm "$filename" | \
grep -q __xsm_action_mismatch_detected && echo "$filename"; done' bash
xen/arch/x86/prelink.o
xen/arch/x86/cpu/vpmu.o
xen/arch/x86/cpu/built_in.o
xen/arch/x86/built_in.o
The current patch is the only way I've found to fix this so far, by simply
moving the XSM_PRIV check into the default case in xsm_pmu_op. This also fixes
the behavior of do_xenpmu_op, which will now return -EINVAL for unknown
XENPMU_* operations, instead of -EPERM when called by a privileged domain.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 9e4d116faff4545a7f21c2b01008e94d68e6db58
master date: 2017-03-14 18:19:29 +0100
commit 2859b25a3ba9ba4eff6dba8d6e60dd9520ebbdb4
Author: Roger Pau Monné <roger.pau@citrix.com>
Date: Fri Mar 31 08:28:49 2017 +0200
x86: drop unneeded __packed attributes
There where a couple of unneeded packed attributes in several x86-specific
structures, that are obviously aligned. The only non-trivial one is
vmcb_struct, which has been checked to have the same layout with and without
the packed attribute using pahole. In that case add a build-time size check to
be on the safe side.
No functional change is expected as a result of this commit.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 4036e7c592905c2292cdeba8269e969959427237
master date: 2017-03-07 17:11:06 +0100
commit ca41491f0507150139fc35ff6c9f076fdbe9487b
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Wed Mar 29 11:32:34 2017 -0700
arm: xen_size should be paddr_t for consistency
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit 26dec7af0d019ea0ace95421b756235a552a7877
Author: Wei Chen <Wei.Chen@arm.com>
Date: Mon Mar 27 16:40:50 2017 +0800
xen/arm: alternative: Register re-mapped Xen area as a temporary virtual region
While I was using the alternative patching in the SErrors patch series [1],
I used a branch instruction as the alternative instruction.
ALTERNATIVE("nop",
"b skip_check",
SKIP_CHECK_PENDING_VSERROR)
Unfortunately, I got a system panic message with this code:
(XEN) build-id: f64081d86e7e88504b7d00e1486f25751c004e39
(XEN) alternatives: Patching with alt table 100b9480 -> 100b9498
(XEN) Xen BUG at alternative.c:61
(XEN) ----[ Xen-4.9-unstable arm32 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) PC: 00252b68 alternative.c#__apply_alternatives+0x128/0x1d4
(XEN) CPSR: 800000da MODE:Hypervisor
(XEN) R0: 00000000 R1: 00000000 R2: 100b9490 R3: 100b949c
(XEN) R4: eafeff84 R5: 00000000 R6: 100b949c R7: 10079290
(XEN) R8: 100792ac R9: 00000001 R10:100b948c R11:002cfe04 R12:002932c0
(XEN) HYP: SP: 002cfdc4 LR: 00239128
(XEN)
(XEN) VTCR_EL2: 80003558
(XEN) VTTBR_EL2: 0000000000000000
(XEN)
(XEN) SCTLR_EL2: 30cd187f
(XEN) HCR_EL2: 000000000038663f
(XEN) TTBR0_EL2: 00000000bff09000
(XEN)
(XEN) ESR_EL2: 00000000
(XEN) HPFAR_EL2: 0000000000000000
(XEN) HDFAR: 00000000
(XEN) HIFAR: 00000000
(XEN)
(XEN) Xen stack trace from sp=002cfdc4:
(XEN) 00000000 00294328 002e0004 00000001 10079290 002cfe14 100b9490 00000000
(XEN) 10010000 10122700 00200000 002cfe1c 00000080 00252c14 00000000 002cfe64
(XEN) 00252dd8 00000007 00000000 000bfe00 100b9480 100b9498 002cfe1c 002cfe1c
(XEN) 10010000 10122700 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 002ddf30 00000000 003113e8 0030f018 002cfe9c
(XEN) 00238914 00000002 00000000 00000000 00000000 0028b000 00000002 00293800
(XEN) 00000002 0030f238 00000002 00290640 00000001 002cfea4 002a2840 002cff54
(XEN) 002a65fc 11112131 10011142 00000000 0028d194 00000000 00000000 00000000
(XEN) bdffb000 80000000 00000000 c0000000 00000000 00000002 00000000 c0000000
(XEN) 002b8060 00002000 002b8040 00000000 c0000000 bc000000 00000000 c0000000
(XEN) 00000000 be000000 00000000 00112701 00000000 bff12701 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000018 00000000 00000001 00000000
(XEN) 9fece000 80200000 80000000 00400000 00200550 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(XEN) Xen call trace:
(XEN) [<00252b68>] alternative.c#__apply_alternatives+0x128/0x1d4 (PC)
(XEN) [<00239128>] is_active_kernel_text+0x10/0x28 (LR)
(XEN) [<00252dd8>] alternative.c#__apply_alternatives_multi_stop+0x1c4/0x204
(XEN) [<00238914>] stop_machine_run+0x1e8/0x254
(XEN) [<002a2840>] apply_alternatives_all+0x38/0x54
(XEN) [<002a65fc>] start_xen+0xcf4/0xf88
(XEN) [<00200550>] arm32/head.o#paging+0x94/0xd8
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at alternative.c:61
(XEN) ****************************************
This panic was triggered by the BUG() in branch_insn_requires_update.
That's because in this case the alternative patching needs to update the
offset of the branch instruction. But the new target address of the branch
instruction could not pass the check of is_active_kernel_text().
The reason is that when Xen is booting, it calls apply_alternatives_all
to do patching with the alternative tables. In this process, we should update
the offset of branch instructions if required. This means we should modify
the Xen text section. But the Xen text section is marked as read-only and we
configure the hardware to not allow a region to be writable and executable at
the same time. So we re-map Xen in a temporary area for writing. In this case,
the calculation of the new target address of the branch instruction is based
on this re-mapped area. The new target address will point to a value in the
re-mapped area. But we haven't registered this area as an active kernel text,
so the check of is_active_kernel_text will always return false.
We have to register the re-mapped Xen area as a virtual region temporarily to
solve this problem.
1. https://lists.xenproject.org/archives/html/xen-devel/2017-03/msg01939.html
Signed-off-by: Wei Chen <Wei.Chen@arm.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit eca97a466dc8d8f99fbff8f51a117d6e8255ecdc
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Tue Mar 21 18:44:24 2017 +0000
QEMU_TAG update
commit c75fe6473b73705c9b9f7d8ecc3d043afef55727
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Fri Feb 10 18:05:22 2017 -0800
arm: read/write rank->vcpu atomically
We don't need a lock in vgic_get_target_vcpu anymore, solving the
following lock inversion bug: the rank lock should be taken first, then
the vgic lock. However, gic_update_one_lr is called with the vgic lock
held, and it calls vgic_get_target_vcpu, which tries to obtain the rank
lock.
Coverity-ID: 1381855
Coverity-ID: 1381853
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit af18ca94f3fdbea87687c07ba532269dbb483e64
Author: Julien Grall <julien.grall@arm.com>
Date: Wed Mar 8 18:06:02 2017 +0000
xen/arm: p2m: Perform local TLB invalidation on vCPU migration
The ARM architecture allows an OS to have per-CPU page tables, as it
guarantees that TLBs never migrate from one CPU to another.
This works fine until this is done in a guest. Consider the following
scenario:
- vcpu-0 maps P to V
- vpcu-1 maps P' to V
If run on the same physical CPU, vcpu-1 can hit in TLBs generated by
vcpu-0 accesses, and access the wrong physical page.
The solution to this is to keep a per-p2m map of which vCPU ran last
on each given pCPU and invalidate local TLBs if two vCPUs from the same
VM run on the same CPU.
Unfortunately it is not possible to allocate a per-cpu variable on the
fly. So for now the size of the array is NR_CPUS; this is fine because
we still have space in struct domain. We may want to add a helper to
allocate per-cpu variables in the future.
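Sketched in C (hypothetical structure and helper names, not the actual Xen
code):

    #include <stdint.h>

    #define NR_CPUS 128
    #define INVALID_VCPU_ID 0xff   /* stand-in; see the next commit */

    struct p2m_domain {
        /* Per-pCPU: which vCPU of this VM ran here last. */
        uint8_t last_vcpu_ran[NR_CPUS];
    };

    static void flush_guest_tlb_local(void) { /* e.g. TLBI on ARM */ }

    /* On context switch in: invalidate local TLBs if a different vCPU
     * of the same VM ran last on this pCPU. */
    static void p2m_restore(struct p2m_domain *p2m, unsigned int cpu,
                            uint8_t vcpu_id)
    {
        if (p2m->last_vcpu_ran[cpu] != INVALID_VCPU_ID &&
            p2m->last_vcpu_ran[cpu] != vcpu_id)
            flush_guest_tlb_local();
        p2m->last_vcpu_ran[cpu] = vcpu_id;
    }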
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit 30c2dd762bcf938475632e28fcbd8d6592a71d5d
Author: Julien Grall <julien.grall@arm.com>
Date: Wed Mar 8 18:06:01 2017 +0000
xen/arm: Introduce INVALID_VCPU_ID
Define INVALID_VCPU_ID as MAX_VIRT_CPUS to avoid a casting problem later
on. At the moment it can always fit in uint8_t.
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit 1780ea794780cf410fcb857d83add72ee088ff6e
Author: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Date: Mon Feb 1 14:56:13 2016 +0530
xen/arm: Set nr_cpu_ids to available number of cpus
nr_cpu_ids for arm platforms is incorrectly set to NR_CPUS,
irrespective of the number of cpus supported by the platform.
Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit 42290f02715e62bfe9edf32daac1b224758b7ae4
Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Date: Thu Jan 26 14:16:02 2017 +0100
xen/arm: acpi: Relax hw domain mapping attributes to p2m_mmio_direct_c
Since the hardware domain is a trusted domain, we extend the
trust to include making final decisions on what attributes to
use when mapping memory regions.
For ACPI configured hardware domains, this patch relaxes the hardware
domains mapping attributes to p2m_mmio_direct_c. This will allow the
hardware domain to control the attributes via its S1 mappings.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
commit bd684c2d0aae7edc587f8dfd3dbffef739c853e4
Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Date: Thu Jan 26 14:16:01 2017 +0100
Revert "xen/arm: Map mmio-sram nodes as un-cached memory"
This reverts commit 1e75ed8b64bc1a9b47e540e6f100f17ec6d97f1b.
The default attribute mapping for MMIO has been relaxed and now relies on
the hardware domain to set the correct memory attributes.
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
commit 783b67073f4e0348af617a1f470f991814254ae2
Author: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Date: Thu Jan 26 14:16:00 2017 +0100
xen/arm: dt: Relax hw domain mapping attributes to p2m_mmio_direct_c
Since the hardware domain is a trusted domain, we extend the
trust to include making final decisions on what attributes to
use when mapping memory regions.
For device-tree configured hardware domains, this patch relaxes
the hardware domains mapping attributes to p2m_mmio_direct_c.
This will allow the hardware domain to control the attributes
via its S1 mappings.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
commit 07f9ddfc9abe9d25288168dfe3c4b830b416f33b
Author: Tamas K Lengyel <tamas.lengyel@zentific.com>
Date: Fri Jan 27 11:25:23 2017 -0700
xen/arm: flush icache as well when XEN_DOMCTL_cacheflush is issued
When the toolstack modifies memory of a running ARM VM it may happen
that the underlying memory of a current vCPU PC is changed. Without
flushing the icache the vCPU may continue executing stale instructions.
Also expose the xc_domain_cacheflush through xenctrl.h.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit d31d0cd810b038f4711553d07b26aee6f4b80934
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Wed Dec 21 18:15:10 2016 -0800
xen/arm: fix GIC_INVALID_LR
GIC_INVALID_LR should be 0xff, but actually, defined as ~(uint8_t)0, is
0xffffffff. Fix the problem by placing the ~ operator before the cast.
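The promotion trap in isolation (stand-alone illustration):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* ~ promotes its operand to int, so the cast must come last. */
        uint32_t wrong = ~(uint8_t)0;  /* 0 -> int, ~0 = 0xffffffff    */
        uint32_t right = (uint8_t)~0;  /* ~0 truncated to 8 bits: 0xff */

        printf("%#x %#x\n", wrong, right);
        return 0;
    }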
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit b2e678e81dd9635eb33279e2817168d13b78c1fa
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Thu Dec 8 17:17:04 2016 -0800
fix out of bound access to mode_strings
mode == ARRAY_SIZE(mode_strings) causes an out-of-bounds access to
the mode_strings array.
Coverity-ID: 1381859
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit 05946b58420c693748366b7c6f71ec2ec2456242
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Thu Dec 8 16:59:28 2016 -0800
missing vgic_unlock_rank in gic_remove_irq_from_guest
Add missing vgic_unlock_rank on the error path in
gic_remove_irq_from_guest.
Coverity-ID: 1381843
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit e020ff3fff796459399015460929edefa8c94568
Author: Artem Mygaiev <artem_mygaiev@epam.com>
Date: Tue Dec 6 16:16:45 2016 +0200
xen/arm: Fix macro for ARM Jazelle CPU feature identification
Fix macro for ARM Jazelle CPU feature identification: a value of 0 indicates
that the CPU does not support ARM Jazelle (ID_PFR0[11:8]).
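The intended test, sketched (illustrative helper, not the Xen macro itself;
ID_PFR0 bits [11:8] describe Jazelle support, with 0 meaning not
implemented):

    #include <stdbool.h>
    #include <stdint.h>

    static bool cpu_has_jazelle(uint32_t id_pfr0)
    {
        /* Extract ID_PFR0[11:8]; non-zero means Jazelle is supported. */
        return ((id_pfr0 >> 8) & 0xf) != 0;
    }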
Coverity-ID: 1381849
Signed-off-by: Artem Mygaiev <artem_mygaiev@epam.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit 308c646ee6f11fa87d67343005942a3186a69206
Author: Julien Grall <julien.grall@arm.com>
Date: Mon Dec 5 17:43:23 2016 +0000
xen/arm: traps: Emulate ICC_SRE_EL1 as RAZ/WI
Recent Linux kernels (4.4 and onwards [1]) check whether it is possible
to enable sysreg access (ICC_SRE_EL1.SRE) when the ID register
(ID_AA64PFR0_EL1.GIC) is reporting the presence of the sysreg interface.
When the guest has been configured to use GICv2, the hypervisor will
disable sysreg access for this vm (via ICC_SRE_EL2.Enable) and therefore
accesses to system registers such as ICC_SRE_EL1 are trapped in EL2.
However, ICC_SRE_EL1 is not emulated by the hypervisor. This means that
Linux will crash as soon as it tries to access ICC_SRE_EL1.
To solve this problem, Xen can implement ICC_SRE_EL1 as read-as-zero
write-ignore. The emulation will only be used when sysreg access is
disabled for EL1.
[1] 963fcd409 "arm64: cpufeatures: Check ICC_EL1_SRE.SRE before
enabling ARM64_HAS_SYSREG_GIC_CPUIF"
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit fceae911f6e7af87cd31321385d779b47eff1857
Author: Artem Mygaiev <artem_mygaiev@epam.com>
Date: Wed Nov 30 15:53:11 2016 +0200
xen/arm: Fix misplaced parentheses for PSCI version check
Fix misplaced parentheses for PSCI version check
Signed-off-by: Artem Mygaiev <artem_mygaiev@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit f66739326c9de51acc15e8b6b335b3781b4e3f48
Author: Oleksandr Tyshchenko <olekstysh@gmail.com>
Date: Fri Dec 2 18:38:16 2016 +0200
arm/irq: Reorder check when the IRQ is already used by someone
Call irq_get_domain for the IRQ we are interested in
only after making sure that it is the guest IRQ to avoid
ASSERT(test_bit(_IRQ_GUEST, &desc->status)) triggering.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
commit 768b250b31361bf8acfef4b7dca61ee37c91f3f6
Author: Jun Sun <jsun@junsun.net>
Date: Mon Oct 10 12:27:56 2016 -0700
Don't clear HCR_VM bit when updating VTTBR.
Currently function p2m_restore_state() would clear HCR_VM bit, i.e.,
disabling stage2 translation, before updating VTTBR register. After
some research and talking to ARM support, I got confirmed that this is not
necessary. We are currently working on a new platform that would need this
to be removed.
The patch is tested on FVP foundation model.
Signed-off-by: Jun Sun <jsun@junsun.net>
Acked-by: Steve Capper <steve.capper@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
commit 049b13dce84655cd73ac4acc051e7df46af00a4f
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Mar 14 12:43:25 2017 +0100
x86/emul: Correct the decoding of mov to/from cr/dr
The mov to/from cr/dr behave as if they were encoded with Mod = 3. When
encoded with Mod != 3, no displacement or SIB bytes are fetched.
Add a test with a deliberately malformed ModRM byte. (Also add the
automatically-generated simd.h to .gitignore.)
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c2e316b2f220af06dab76b1219e61441c31f6ff9
master date: 2017-03-07 17:29:16 +0000
commit e26a2a00169bad403c9dcc597218080626cee861
Author: Jan Beulich <jbeulich@suse.com>
Date: Tue Mar 14 12:42:58 2017 +0100
x86emul: correct decoding of vzero{all,upper}
These VEX encoded insns aren't followed by a ModR/M byte.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 26735f30dffe1091686bbe921aacbea8ba371cc8
master date: 2017-03-02 16:08:27 +0100
commit 866f3636f832ecae0260b04e90b8de432aaa3129
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Tue Mar 14 12:42:19 2017 +0100
xen: credit2: don't miss accounting while doing a credit reset.
A credit reset basically means going through all the
vCPUs of a runqueue and altering their credits, as a
consequence of a 'scheduling epoch' having come to an
end.
Blocked or runnable vCPUs are fine, all the credits
they've spent running so far have been accounted to
them when they were scheduled out.
But if a vCPU is running on a pCPU, when a reset event
occurs (on another pCPU), that does not get properly
accounted. Let's therefore begin to do so, for better
accuracy and fairness.
In fact, after this patch, we see this in a trace:
csched2:schedule cpu 10, rq# 1, busy, not tickled
csched2:burn_credits d1v5, credit = 9998353, delta = 202996
runstate_continue d1v5 running->running
...
csched2:schedule cpu 12, rq# 1, busy, not tickled
csched2:burn_credits d1v6, credit = -1327, delta = 9999544
csched2:reset_credits d0v13, credit_start = 10500000, credit_end = 10500000, mult = 1
csched2:reset_credits d0v14, credit_start = 10500000, credit_end = 10500000, mult = 1
csched2:reset_credits d0v7, credit_start = 10500000, credit_end = 10500000, mult = 1
csched2:burn_credits d1v5, credit = 201805, delta = 9796548
csched2:reset_credits d1v5, credit_start = 201805, credit_end = 10201805, mult = 1
csched2:burn_credits d1v6, credit = -1327, delta = 0
csched2:reset_credits d1v6, credit_start = -1327, credit_end = 9998673, mult = 1
Which shows how d1v5 actually executed for ~9.796 ms,
on pCPU 10, when reset_credit() is executed, on pCPU
12, because of d1v6's credits going below 0.
Without this patch, these 9.796ms are not accounted
to anyone. With this patch, d1v5 is charged for that,
and its credits drop down from 9796548 to 201805.
And this is important, as it means that it will
begin the new epoch with 10201805 credits, instead
of 10500000 (which it would have had, before this patch).
Basically, we were forgetting one round of accounting
in epoch x, for the vCPUs that are running at the time
the epoch ends. And this meant favouring a little bit
these same vCPUs, in epoch x+1, providing them with
the chance to execute longer than their fair share.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 4fa4f8a3cd5afd4980ad9517755d002dc316abdc
master date: 2017-03-01 16:56:34 +0000
commit 354c3e4c728b5e8f04dc8d9eabfa316e7823cbc5
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Tue Mar 14 12:41:54 2017 +0100
xen: credit2: always mark a tickled pCPU as... tickled!
In fact, whether or not a pCPU has been tickled, and is
therefore about to re-schedule, is something we look at
and base decisions on in various places.
So, let's make sure that we do that basing on accurate
information.
While there, also tweak a little bit smt_idle_mask_clear()
(used for implementing SMT support), so that it only alters
the relevant cpumask when there is an actual need for this.
(This is only for reduced overhead; behavior remains the
same.)
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: a76645240bd14e964e85dbc975a8989edea6aa27
master date: 2017-03-01 16:56:34 +0000
commit 8c2da8f4649bf5e29b6f3338132e36369e8f5700
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Mar 14 12:41:21 2017 +0100
x86/layout: Correct Xen's idea of its own memory layout
c/s b4cd59fe "x86: reorder .data and .init when linking" had an unintended
side effect, where xen_in_range() and the tboot S3 MAC were no longer correct.
In practice, it means that Xen's .data section is excluded from consideration,
which means:
1) Default IOMMU construction for the hardware domain could create mappings.
2) .data isn't included in the tboot MAC checked on resume from S3.
Adjust the comments and virtual address anchors used to define the regions.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c9a4a1c419cebac83a8fb60c4532ad8ccc973dc4
master date: 2017-02-28 16:18:38 +0000
commit 6289c3b7c4756bca341ba59e4e246706040f7919
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue Mar 14 12:40:36 2017 +0100
x86/vmx: Don't leak host syscall MSR state into HVM guests
hvm_hw_cpu->msr_flags is in fact the VMX dirty bitmap of MSRs needing to be
restored when switching into guest context. It should never have been part of
the migration state to start with, and Xen must not make any decisions based
on the value seen during restore.
Identify it as obsolete in the header files, consistently save it as zero and
ignore it on restore.
The MSRs must be considered dirty during VMCS creation to cause the proper
defaults of 0 to be visible to the guest.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 2f1add6e1c8789d979daaafa3d80ddc1bc375783
master date: 2017-02-21 11:06:39 +0000
commit 2e68fda962226d4de916d5ceab9d9d6037d94d45
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Thu Mar 2 17:15:26 2017 -0800
xen/arm: fix affected memory range by dcache clean functions
clean_dcache_va_range and clean_and_invalidate_dcache_va_range don't
calculate the range correctly when "end" is not cacheline aligned. As a
result, the last cacheline is not skipped. Fix the issue by aligning the
start address to the cacheline size.
In addition, make the code simpler and faster in
invalidate_dcache_va_range, by removing the modulo operation and using
bitmasks instead. Also remove the size adjustments in
invalidate_dcache_va_range, because the size variable is not used later
on.
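The alignment fix in outline (illustrative sketch with a fixed cacheline
size; Xen reads the real size at boot, and the actual cache-maintenance
instruction is elided):

    #include <stdint.h>

    #define CACHELINE 64UL   /* illustrative */

    static void clean_dcache_va_range(const void *p, unsigned long size)
    {
        /* Round the start down to a cacheline boundary so the first
         * (and, by stepping until past "end", the last) partially
         * covered line is included; a bitmask replaces the modulo. */
        uintptr_t va  = (uintptr_t)p & ~(CACHELINE - 1);
        uintptr_t end = (uintptr_t)p + size;

        for (; va < end; va += CACHELINE)
            (void)va;   /* placeholder for e.g. "dc cvac" per line */
    }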
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
commit f85fc979a6859541dc1bf583817ca5cce9287e1e
Author: Stefano Stabellini <sstabellini@kernel.org>
Date: Wed Mar 1 11:43:15 2017 -0800
xen/arm: introduce vwfi parameter
Introduce a new Xen command line parameter called "vwfi", which stands for
virtual wfi. The default is "trap": Xen traps guest wfi and wfe
instructions. In the case of wfi, Xen calls vcpu_block on the guest
vcpu; in the case of guest wfe, Xen calls vcpu_yield on the guest vcpu.
The behavior can be changed by setting vwfi to "native", in which case
Xen doesn't trap either wfi or wfe, running them in guest context.
The result is a strong reduction in irq latency (from 5000ns to 2000ns,
measured using https://github.com/edgarigl/tbm, the physical timer, and
1 pcpu dedicated to 1 vcpu). The downside is that the scheduler thinks
that the guest is busy when it is actually sleeping, leading to suboptimal
scheduling decisions.
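As a usage note (illustrative GRUB snippet), the parameter goes on the
hypervisor command line, e.g.:

    multiboot /boot/xen.gz ... vwfi=native

with vwfi=trap remaining the default behaviour described above.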
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
commit 9967251965a4cea19e6f69f7c5472174c4c5b971
Author: Julien Grall <julien.grall@arm.com>
Date: Fri Feb 24 10:01:59 2017 +0100
arm/p2m: remove the page from p2m->pages list before freeing it
The p2m code is using the page list field to link all the pages used
for the stage-2 page tables. The page is added into the p2m->pages
list just after the allocation but never removed from the list.
The page list field is also used by the allocator, so not removing the
page may result in a later Xen crash due to inconsistency (see [1]).
This bug was introduced by the reworking of p2m code in commit 2ef3e36ec7
"xen/arm: p2m: Introduce p2m_set_entry and __p2m_set_entry".
[1] https://lists.xenproject.org/archives/html/xen-devel/2017-02/msg00524.html
Reported-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: cf5e1a74b9687be3d146e59ab10c26be6da9d0d4
master date: 2017-02-24 09:58:50 +0100
commit 34305da2df62c67a559c20d22bdd25b549bfd1d8
Author: Ian Jackson <ian.jackson@eu.citrix.com>
Date: Wed Feb 22 16:26:41 2017 +0000
QEMU_TAG update
commit 437a8e63adb3b2f819dd11557e65d9cda331c9b1
Author: Jan Beulich <jbeulich@suse.com>
Date: Mon Feb 20 15:58:02 2017 +0100
VMX: fix VMCS race on context-switch paths
When __context_switch() is being bypassed during original context
switch handling, the vCPU "owning" the VMCS partially loses control of
it: It will appear non-running to remote CPUs, and hence their attempt
to pause the owning vCPU will have no effect on it (as it already
looks to be paused). At the same time the "owning" CPU will re-enable
interrupts eventually (at the latest when entering the idle loop) and
hence becomes subject to IPIs from other CPUs requesting access to the
VMCS. As a result, when __context_switch() finally gets run, the CPU
may no longer have the VMCS loaded, and hence any accesses to it would
fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
For consistency use the new function also in vmx_do_resume(), to
avoid leaving an open-coded incarnation of it around.
Reported-by: Kevin Mayer <Kevin.Mayer@gdata.de>
Reported-by: Anshul Makkar <anshul.makkar@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
master commit: 2f4d2198a9b3ba94c959330b5c94fe95917c364c
master date: 2017-02-17 15:49:56 +0100
commit 9028ba82efca076609d11f33ed6fa2a636ae9e58
Author: George Dunlap <george.dunlap@citrix.com>
Date: Mon Feb 20 15:57:37 2017 +0100
xen/p2m: Fix p2m_flush_table for non-nested cases
Commit 71bb7304e7a7a35ea6df4b0cedebc35028e4c159 added flushing of
nested p2m tables whenever the host p2m table changed. Unfortunately
in the process, it added a filter to the p2m_flush_table() function so
that the p2m would only be flushed if it was being used as a nested
p2m. This meant that the p2m was not being flushed at all for altp2m
callers.
Only check np2m_base if the p2m_class indicates a nested p2m.
NB that this is not a security issue: The only time this codepath is
called is in cases where either nestedp2m or altp2m is enabled, and
neither of them is in security support.
Reported-by: Matt Leinhos <matt@starlab.io>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
master commit: 6192e6378e094094906950120470a621d5b2977c
master date: 2017-02-15 17:15:56 +0000
commit 1c28394aaab9727f5ce9c5f53e8617c50687d0dc
Author: David Woodhouse <dwmw@amazon.com>
Date: Mon Feb 20 15:56:48 2017 +0100
x86/ept: allow write-combining on !mfn_valid() MMIO mappings again
For some MMIO regions, such as those high above RAM, mfn_valid() will
return false.
Since the fix for XSA-154 in commit c61a6f74f80e ("x86: enforce
consistent cachability of MMIO mappings"), guests have no longer been
able to use PAT to obtain write-combining on such regions because the
'ignore PAT' bit is set in EPT.
We probably want to err on the side of caution and preserve that
behaviour for addresses in mmio_ro_ranges, but not for normal MMIO
mappings. That necessitates a slight refactoring to check mfn_valid()
later, and let the MMIO case get through to the right code path.
Since we're not bailing out for !mfn_valid() immediately, the range
checks need to be adjusted to cope simply by masking in the low bits
to account for 'order' instead of adding, to avoid overflow when the mfn
is INVALID_MFN (which happens on unmap, since we carefully call this
function to fill in the EMT even though the PTE won't be valid).
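The overflow-safe arithmetic in miniature (illustrative, with INVALID_MFN
as all-ones):

    #include <stdint.h>

    #define INVALID_MFN (~0UL)

    /* Last mfn of an order-sized, order-aligned block starting at mfn. */
    static unsigned long block_end(unsigned long mfn, unsigned int order)
    {
        /* Adding ((1UL << order) - 1) would wrap when mfn == INVALID_MFN;
         * masking in the low bits cannot overflow. */
        return mfn | ((1UL << order) - 1);
    }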
The range checks are also slightly refactored to put only one of them in
the fast path in the common case. If it doesn't overlap, then it
*definitely* isn't contained, so we don't need both checks. And if it
overlaps and is only one page, then it definitely *is* contained.
Finally, add a comment clarifying how that 'return -1' works: it isn't
returning an error and causing the mapping to fail; it relies on
resolve_misconfig() being able to split the mapping later. So it's
*only* sane to do it where order>0 and the 'problem' will be solved by
splitting the large page. Not for blindly returning 'error', which I was
tempted to do in my first attempt.
Signed-off-by: David Woodhouse <dwmw@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 30921dc2df3665ca1b2593595aa6725ff013d386
master date: 2017-02-07 14:30:01 +0100
commit c24629612fea2d44c8f03f0a2583e44dbbfc5e05
Author: Oleksandr Tyshchenko <olekstysh@gmail.com>
Date: Wed Feb 15 12:20:48 2017 +0000
IOMMU: always call teardown callback
There is a possible scenario when (d)->need_iommu remains unset
during guest domain execution. For example, when no devices
were assigned to it. Taking into account that the teardown callback
is not called when (d)->need_iommu is unset, we might have unreleased
resources after destroying the domain.
So, always call teardown callback to roll back actions
that were performed in init callback.
This is XSA-207.
Signed-off-by: Oleksandr Tyshchenko <olekstysh@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Julien Grall <julien.grall@arm.com>
commit 10baa197d218c222f298ac5ba0d4ef5afd1401ff
Author: George Dunlap <george.dunlap@citrix.com>
Date: Thu Feb 9 10:25:58 2017 +0100
x86/emulate: don't assume that addr_size == 32 implies protected mode
Callers of x86_emulate() generally define addr_size based on the code
segment. In vm86 mode, the code segment is set by the hardware to be
16-bits; but it is entirely possible to enable protected mode, set the
CS to 32-bits, and then disable protected mode. (This is commonly
called "unreal mode".)
But the instruction decoder only checks for protected mode when
addr_size == 16. So in unreal mode, hardware will throw a #UD for VEX
prefixes, but our instruction decoder will decode them, triggering an
ASSERT() further on in _get_fpu(). (With debug=n the emulator will
incorrectly emulate the instruction rather than throwing a #UD, but
this is only a bug, not a crash, so it's not a security issue.)
Teach the instruction decoder to check that we're in protected mode,
even if addr_size is 32.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Split real mode and VM86 mode handling, as VM86 mode is strictly 16-bit
at all times. Re-base.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 05118b1596ffe4559549edbb28bd0124a7316123
master date: 2017-01-25 15:09:55 +0100
commit 4582c2b9597ff4b5be3f6b26449a3b8a0872e46e
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Thu Feb 9 10:25:33 2017 +0100
xen: credit2: fix shutdown/suspend when playing with cpupools.
In fact, during shutdown/suspend, we temporarily move all
the vCPUs to the BSP (i.e., pCPU 0, as of now). For Credit2
domains, we call csched2_vcpu_migrate(), which expects to find the
target pCPU in the domain's pool.
Therefore, if Credit2 is the default scheduler and we have
removed pCPU 0 from cpupool0, shutdown/suspend fails like
this:
RIP: e008:[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
Xen call trace:
[<ffff82d08012906d>] sched_credit2.c#migrate+0x274/0x2d1
[<ffff82d080129138>] sched_credit2.c#csched2_vcpu_migrate+0x6e/0x86
[<ffff82d08012c468>] schedule.c#vcpu_move_locked+0x69/0x6f
[<ffff82d08012ec14>] cpu_disable_scheduler+0x3d7/0x430
[<ffff82d08019669b>] __cpu_disable+0x299/0x2b0
[<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
[<ffff82d0801312d8>] stop_machine.c#stopmachine_action+0x7f/0x8d
[<ffff82d0801330b8>] tasklet.c#do_tasklet_work+0x74/0xab
[<ffff82d0801333ed>] do_tasklet+0x66/0x8b
[<ffff82d080166a73>] domain.c#idle_loop+0x3b/0x5e
****************************************
Panic on CPU 8:
Assertion 'svc->vcpu->processor < nr_cpu_ids' failed at sched_credit2.c:1729
****************************************
On the other hand, if Credit2 is the scheduler of another
pool, when trying (still during shutdown/suspend) to move
the vCPUs of the Credit2 domains to pCPU 0, it figures
out that pCPU 0 is not a Credit2 pCPU, and fails like this:
RIP: e008:[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
Xen call trace:
[<ffff82d08012916b>] sched_credit2.c#csched2_vcpu_migrate+0xa1/0x107
[<ffff82d08012c4e9>] schedule.c#vcpu_move_locked+0x69/0x6f
[<ffff82d08012edfc>] cpu_disable_scheduler+0x3d7/0x430
[<ffff82d08019687b>] __cpu_disable+0x299/0x2b0
[<ffff82d0801012f8>] cpu.c#take_cpu_down+0x2f/0x38
[<ffff82d0801314c0>] stop_machine.c#stopmachine_action+0x7f/0x8d
[<ffff82d0801332a0>] tasklet.c#do_tasklet_work+0x74/0xab
[<ffff82d0801335d5>] do_tasklet+0x66/0x8b
[<ffff82d080166c53>] domain.c#idle_loop+0x3b/0x5e
The solution is to recognise this specific situation inside
csched2_vcpu_migrate() and, considering it is something temporary
which only happens during shutdown/suspend, quickly deal with it.
Then, in the resume path, in restore_vcpu_affinity(), things
are set back to normal, and a new v->processor is chosen, for
each vCPU, from the proper set of pCPUs (i.e., the ones of
the proper cpupool).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
xen: credit2: non Credit2 pCPUs are ok during shutdown/suspend.
Commit 7478ebe1602e6 ("xen: credit2: fix shutdown/suspend
when playing with cpupools"), while doing the right thing
for actual code, forgot to update the ASSERT()s accordingly,
in csched2_vcpu_migrate().
In fact, as stated there already, during shutdown/suspend,
we must allow a Credit2 vCPU to temporarily migrate to a
non-Credit2 BSP, without any ASSERT() triggering.
Move them down, below the check for whether or not we are
shutting down, where the assumption that the pCPU is a
valid Credit2 one actually holds.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
master commit: 7478ebe1602e6bb8242a18840b15757a1d5ad18a
master date: 2017-01-24 17:02:07 +0000
master commit: ad5808d9057248e7879cf375662f0a449fff7005
master date: 2017-02-01 14:44:51 +0000
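Schematically, the fix looks something like this (a sketch following the commit message, not the verbatim patch):

    static void csched2_vcpu_migrate(const struct scheduler *ops,
                                     struct vcpu *vc, unsigned int new_cpu)
    {
        /* Sketch: while the system is going down, the target pCPU may
         * legitimately be outside our cpupool; just note the new
         * processor and let restore_vcpu_affinity() repair things on
         * resume. */
        if ( unlikely(system_state == SYS_STATE_suspend) )
        {
            vc->processor = new_cpu;
            return;
        }
        /* Only now is it safe to assert that new_cpu is a Credit2 pCPU. */
        ASSERT(cpumask_test_cpu(new_cpu, &csched2_priv(ops)->initialized));
        /* ... normal migration path ... */
    }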
commit a20300baf5714ed6098a4068e0f464d6971fe0a7
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Thu Feb 9 10:24:56 2017 +0100
xen: credit2: never consider CPUs outside of our cpupool.
In fact, relying on the mask of what pCPUs belong to
which Credit2 runqueue is not enough. If we only do that,
when Credit2 is the boot scheduler, we may ASSERT() or
panic when moving a pCPU from Pool-0 to another cpupool.
This is because pCPUs outside of any pool are considered
part of cpupool0. This puts us at risk of a crash when those
same pCPUs are added to another pool and something
other than the idle domain is found to be running
on them.
Note that, even if we prevent the above from happening (which
is the purpose of this patch), things are still pretty bad.
In fact, when we remove a pCPU from Pool-0:
- in Credit1, we do *not* update prv->ncpus and
prv->credit, which means we're considering the wrong
total credits when doing accounting;
- in Credit2, the pCPU remains part of one runqueue,
and is hence at least considered during load balancing,
even if no vCPU should really run there.
In Credit1, this "only" causes skewed accounting and
no crashes because there is a lot of `cpumask_and`ing
going on with the cpumask of the domains' cpupool
(which, BTW, comes at a price).
A quick and not too involved (and easily backportable)
solution for Credit2 is to do exactly the same.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
master commit: e7191920261d20e52ca4c06a03589a1155981b04
master date: 2017-01-24 17:02:07 +0000
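"Doing exactly the same" as Credit1 amounts to intersecting with the domain's cpupool mask before picking a pCPU; roughly (a sketch, with cpupool_domain_cpumask() standing for the pool-mask helper, surrounding names illustrative):

    /* Sketch: never consider pCPUs outside the domain's cpupool; always
     * intersect affinity with the pool's cpumask, not just the bare
     * runqueue mask. */
    cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
                cpupool_domain_cpumask(vc->domain));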
commit 23e33036f8d5f33add75d7fbecad13bcb2cb239e
Author: Dario Faggioli <dario.faggioli@citrix.com>
Date: Thu Feb 9 10:24:32 2017 +0100
xen: credit2: use the correct scratch cpumask.
In fact, there is one scratch mask for each CPU. When
you use a CPU's mask, it must be true that:
- the CPU belongs to your cpupool and scheduler,
- you own the runqueue lock (the one you take via
{v,p}cpu_schedule_lock()) for that CPU.
This was not the case within the following functions:
get_fallback_cpu(), csched2_cpu_pick(): as we can't be
sure we are either on, or hold the lock for, the CPU
in the vCPU's 'v->processor'.
migrate(): it's ok, when called from balance_load(),
because that comes from csched2_schedule(), which takes
the runqueue lock of the CPU where it executes. But it is
not ok when we come from csched2_vcpu_migrate(), which
can be called from other places.
The fix is to explicitly use the scratch space of the
CPUs for which we know we hold the runqueue lock.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 548db8742872399936a2090cbcdfd5e1b34fcbcc
master date: 2017-01-24 17:02:07 +0000
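The rule, as a sketch (assuming a per-CPU scratch accessor along the lines of cpumask_scratch_cpu(); not the literal patch):

    /* Sketch: index scratch space by a CPU whose runqueue lock we hold
     * ('cpu'), never blindly by v->processor. */
    cpumask_t *scratch = cpumask_scratch_cpu(cpu);
    cpumask_and(scratch, v->cpu_hard_affinity, online_mask);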
commit 95f1f99a7a2a7c8fbf0eeb1dc6b8473d6e09f535
Author: Joao Martins <joao.m.martins@oracle.com>
Date: Thu Feb 9 10:23:52 2017 +0100
x86/hvm: do not set msr_tsc_adjust on hvm_set_guest_tsc_fixed
Commit 6e03363 ("x86: Implement TSC adjust feature for HVM guest")
implemented the TSC_ADJUST MSR for HVM guests. However, while booting
an HVM guest, the boot CPU would have its value set to delta_tsc -
guest_tsc while the secondary CPUs would have 0. For example, one can
observe:
$ xen-hvmctx 17 | grep tsc_adjust
TSC_ADJUST: tsc_adjust ff9377dfef47fe66
TSC_ADJUST: tsc_adjust 0
TSC_ADJUST: tsc_adjust 0
TSC_ADJUST: tsc_adjust 0
Upcoming Linux 4.10 now validates whether this MSR is correct and
adjusts it under the following conditions: a value < 0 (our case for
CPU 0), a value > 0x7FFFFFFF, or values that don't match across all
CPUs. Under these conditions it will force the value to 0. If this
MSR is not correct we would see messages such as:
[Firmware Bug]: TSC ADJUST: CPU0: -30517044286984129 force to 0
And on HVM guests supporting TSC_ADJUST (which requires at least
Intel Haswell) the guest won't boot.
Our current vCPU 0 value is incorrect: the Intel SDM, in section
"Time-Stamp Counter Adjustment", states that "On RESET, the value
of the IA32_TSC_ADJUST MSR is 0", hence we should set it to 0 and be
consistent across all vCPUs. Perhaps this MSR should only be
changed by the guest, which already happens through the
hvm_set_guest_tsc_adjust(..) routines (see below). After this patch,
guests running Linux 4.10 will see a valid IA32_TSC_ADJUST MSR of
value 0 for all CPUs and are able to boot.
On the same section of the spec ("Time-Stamp Counter Adjustment") it is
also stated:
"If an execution of WRMSR to the IA32_TIME_STAMP_COUNTER MSR
adds (or subtracts) value X from the TSC, the logical processor also
adds (or subtracts) value X from the IA32_TSC_ADJUST MSR.
Unlike the TSC, the value of the IA32_TSC_ADJUST MSR changes only in
response to WRMSR (either to the MSR itself, or to the
IA32_TIME_STAMP_COUNTER MSR). Its value does not otherwise change as
time elapses. Software seeking to adjust the TSC can do so by using
WRMSR to write the same value to the IA32_TSC_ADJUST MSR on each logical
processor."
This suggests these MSR values should only be changed by the guest,
i.e. through intercepted MSR writes. We keep the IA32_TSC MSR logic
such that writes accommodate adjustments to TSC_ADJUST, hence no
functional change in msr_tsc_adjust for the IA32_TSC MSR. Though, we
do that in a separate routine, namely hvm_set_guest_tsc_msr, instead
of through hvm_set_guest_tsc(...).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 98297f09bd07bb63407909aae1d309d8adeb572e
master date: 2017-01-24 12:37:36 +0100
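The resulting behaviour, as a sketch close to (but not verbatim) the patch described above:

    /* Sketch: per the SDM, TSC_ADJUST is 0 on (v)CPU reset ... */
    v->arch.hvm_vcpu.msr_tsc_adjust = 0;

    /* ... and only guest WRMSRs move it: a write to IA32_TSC adds the
     * same delta to IA32_TSC_ADJUST, while host-side TSC updates no
     * longer touch it. */
    static void hvm_set_guest_tsc_msr(struct vcpu *v, uint64_t guest_tsc)
    {
        uint64_t tsc_offset = guest_tsc - hvm_get_guest_tsc(v);

        v->arch.hvm_vcpu.msr_tsc_adjust += tsc_offset;
        hvm_set_guest_tsc(v, guest_tsc);
    }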
commit 9b0e6d34cb8e05d9ec5e308576c559f0aac5ba55
Author: Jan Beulich <jbeulich@suse.com>
Date: Thu Feb 9 10:23:22 2017 +0100
x86emul: correct FPU stub asm() constraints
Properly inform the compiler about fic's role as both an input (its
insn_bytes field) and output (its exn_raised field).
Take the opportunity and bring emulate_fpu_insn_stub() more in line
with emulate_fpu_insn_stub_eflags().
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3dfbb8df335f12297cfc7db9d3df2b74c474921b
master date: 2017-01-24 12:35:59 +0100
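For readers less familiar with asm() constraints, the essence is this (a sketch, not the literal stub):

    /* Sketch: 'fic' is both read (insn_bytes) and written (exn_raised)
     * by the stub, so it must be a read-write operand; "+m" stops the
     * compiler caching its fields across the asm. */
    asm volatile ( "call *%[stub]"
                   : "+m" (fic)
                   : [stub] "r" (stub.func)
                   : "memory" );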
commit b843de7f541037e8ff5779a017b837c71e7804af
Author: Jan Beulich <jbeulich@suse.com>
Date: Thu Feb 9 10:22:55 2017 +0100
x86: segment attribute handling adjustments
Null selector loads into SS (possible in 64-bit mode only, and only in
rings other than ring 3) must not alter SS.DPL. (This was found to be
an issue on KVM, and fixed in Linux commit 33ab91103b.)
Further, arch_set_info_hvm_guest() didn't make sure that the ASSERT()s
in hvm_set_segment_register() wouldn't trigger: Add further checks, but
tolerate (adjust) clear accessed (CS, SS, DS, ES) and busy (TR) bits.
Finally the setting of the accessed bits for user segments was lost by
commit dd5c85e312 ("x86/hvm: Reposition the modification of raw segment
data from the VMCB/VMCS"), yet VMX requires them to be set for usable
segments. Add respective ASSERT()s (the only path not properly setting
them was arch_set_info_hvm_guest()).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 366ff5f1b3252f9069d5aedb2ffc2567bb0a37c9
master date: 2017-01-20 14:39:12 +0100
commit ba7e250cc48d068b3777ffddc2bb8b2f43d05e53
Author: Jan Beulich <jbeulich@suse.com>
Date: Thu Feb 9 10:22:28 2017 +0100
x86emul: LOCK check adjustments
BT, being encoded as DstBitBase just like BT{C,R,S}, nevertheless does
not write its (register or memory) operand and hence also doesn't allow
a LOCK prefix to be used.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f2d4f4ba80de8a03a1b0f300d271715a88a8433d
master date: 2017-01-20 14:37:33 +0100
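Conceptually, the check becomes something like this (a sketch; the predicate in the real decoder is structured differently, and is_bt is an illustrative name):

    /* Sketch: LOCK is only legal on a bit op that writes memory, i.e.
     * BTS/BTR/BTC with a memory destination; BT never qualifies. */
    if ( lock_prefix && (is_bt || dst.type != OP_MEM) )
        generate_exception(EXC_UD);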
commit 6240d926c4cfe5f83fc940e61d1c0418a8710791
Author: Jan Beulich <jbeulich@suse.com>
Date: Thu Feb 9 10:21:50 2017 +0100
x86emul: VEX.B is ignored in compatibility mode
While VEX.R and VEX.X are guaranteed to be 1 in compatibility mode
(and hence a respective mode_64bit() check can be dropped), VEX.B can
be encoded as zero, but would be ignored by the processor. Since we
emulate instructions in 64-bit mode (except possibly in the test
harness), we need to force the bit to 1 in order to not act on the
wrong {X,Y,Z}MM register (which has no bad effect on 32-bit test
harness builds, as there the bit would again be ignored by the
hardware, and would by default be expected to be 1 anyway).
We must not, however, fiddle with the high bit of VEX.VVVV in the
decode phase, as that would undermine the checking of instructions
requiring the field to be all ones independent of mode. This is
being enforced in copy_REX_VEX() instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86emul: correct VEX/XOP/EVEX operand size handling for 16-bit code
Operand size defaults to 32 bits in that case, but would not have been
set that way in the absence of an operand size override.
Reported-by: Wei Liu <wei.liu2@citrix.com> (by AFL fuzzing)
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 89c76ee7f60777b81c8fd0475a6af7c84e72a791
master date: 2017-01-17 10:32:25 +0100
master commit: beb82042447c5d6e7073d816d6afc25c5a423cde
master date: 2017-01-25 15:08:59 +0100
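A sketch of the decode-phase part (field names indicative, not verbatim):

    /* Sketch: outside 64-bit mode the CPU ignores VEX.B, so normalise
     * it to 1 to avoid selecting the wrong {X,Y,Z}MM register; VEX.VVVV
     * is deliberately left alone so copy_REX_VEX() can still check it. */
    if ( !mode_64bit() )
        vex.b = 1;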
commit b378b1f9fa4796b5048e8ac0c58cdbb6307a55c4
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu Feb 9 10:20:45 2017 +0100
x86/xstate: Fix array overrun on hardware with LWP
c/s da62246e4c "x86/xsaves: enable xsaves/xrstors/xsavec in xen" introduced
setup_xstate_features() to allocate and fill xstate_offsets[] and
xstate_sizes[].
However, fls() casts xfeature_mask to 32 bits, which truncates LWP out of the
calculation. As a result, the arrays are allocated too short, and the cpuid
infrastructure reads off the end of them when calculating xstate_size for the
guest.
On one test system, this results in 0x3fec83c0 being returned as the maximum
size of an xsave area, which surprisingly appears not to bother Windows or
Linux too much. I suspect they both use the current size based on xcr0, which Xen
forwards from real hardware.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: fe0d67576e335c02becf1cea8e67005509fa90b6
master date: 2017-01-16 17:37:26 +0000
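The root cause in miniature (a sketch; fls()/flsl() as in Xen's bitops):

    /* Sketch of the bug: fls() takes an unsigned int, so passing the
     * 64-bit xfeature_mask silently drops LWP (bit 62)... */
    unsigned int n_bad  = fls(xfeature_mask);  /* truncated to 32 bits */
    /* ...whereas a 64-bit find-last-set sizes the arrays correctly. */
    unsigned int n_good = flsl(xfeature_mask);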
commit b29aed8b0355fe9f7d49faa9aef12b2f8f983c2c
Author: Tamas K Lengyel <tamas.lengyel@zentific.com>
Date: Wed Jan 25 09:12:01 2017 -0700
arm/p2m: Fix regression during domain shutdown with active mem_access
The change in commit 438c5fe4f0c introduced a regression for domains where
mem_access is or was active. When relinquish_p2m_mapping attempts to clear
a page where the order is not 0, the following ASSERT is triggered:
ASSERT(!p2m->mem_access_enabled || page_order == 0);
This regression was unfortunately not caught during testing in preparation
for the 4.8 release.
In this patch we adjust the ASSERT so it does not trip when the domain
is being shut down.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Julien Grall <julien.grall@arm.com>
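The adjustment amounts to one extra disjunct (a sketch; the exact predicate in the patch may differ):

    /* Sketch: tolerate order > 0 clears while the domain is dying. */
    ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
           p2m->domain->is_dying);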
diff --git a/Config.mk b/Config.mk
index a83a205..d9ebcb7 100644
--- a/Config.mk
+++ b/Config.mk
@@ -277,8 +277,8 @@ SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
endif
OVMF_UPSTREAM_REVISION ?= bc54e50e0fe03c570014f363b547426913e92449
-QEMU_UPSTREAM_REVISION ?= qemu-xen-4.8.0
-MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.8.0
+QEMU_UPSTREAM_REVISION ?= qemu-xen-4.8.1
+MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.8.1
# Wed Sep 28 11:50:04 2016 +0200
# minios: fix build issue with xen_*mb defines
@@ -289,9 +289,7 @@ SEABIOS_UPSTREAM_REVISION ?= rel-1.10.0
ETHERBOOT_NICS ?= rtl8139 8086100e
-QEMU_TRADITIONAL_REVISION ?= 095261a9ad5c31b9ed431f8382e8aa223089c85b
-# Mon Nov 14 17:19:46 2016 +0000
-# qemu: ioport_read, ioport_write: be defensive about 32-bit addresses
+QEMU_TRADITIONAL_REVISION ?= xen-4.8.1
# Specify which qemu-dm to use. This may be `ioemu' to use the old
# Mercurial in-tree version, or a local directory, or a git URL.
diff --git a/debian/changelog b/debian/changelog
index fafbb7e..0e6cf0f 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,13 @@
+xen (4.8.1-1) unstable; urgency=high
+
+ * Update to upstream 4.8.1 release.
+ Changes include numerous bugfixes, including security fixes for:
+ XSA-212 / CVE-2017-7228 Closes:#859560
+ XSA-207 / no cve yet Closes:#856229
+ XSA-206 / no cve yet no Debian bug
+
+ -- Ian Jackson <ian.jackson@eu.citrix.com> Tue, 18 Apr 2017 18:05:00 +0100
+
xen (4.8.1~pre.2017.01.23-1) unstable; urgency=medium
* Update to current upstream stable-4.8 git branch (Xen 4.8.1-pre).
diff --git a/debian/control.md5sum b/debian/control.md5sum
index d2d7fcf..218cada 100644
--- a/debian/control.md5sum
+++ b/debian/control.md5sum
@@ -1,4 +1,4 @@
-d74356cd54456cb07dc4a89ff001c233 debian/changelog
+414390ca652da67ac85ebd905500eb66 debian/changelog
dc7b5d9f0538e3180af4e9aff9b0bd57 debian/bin/gencontrol.py
20e336dbea44b1641802eff0dde9569b debian/templates/control.main.in
a15fa64ce6deead28d33c1581b14dba7 debian/templates/xen-hypervisor.postinst.in
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 0138978..54acc60 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1619,6 +1619,21 @@ Note that if **watchdog** option is also specified vpmu will be turned off.
As the virtualisation is not 100% safe, don't use the vpmu flag on
production systems (see http://xenbits.xen.org/xsa/advisory-163.html)!
+### vwfi
+> `= trap | native`
+
+> Default: `trap`
+
+WFI is the ARM instruction to "wait for interrupt". WFE is similar and
+means "wait for event". This option, which is ARM specific, changes the
+way guest WFI and WFE are implemented in Xen. By default, Xen traps both
+instructions. In the case of WFI, Xen blocks the guest vcpu; in the case
+of WFE, Xen yields the guest vcpu. When setting vwfi to `native`, Xen
+doesn't trap either instruction, running them in guest context. Setting
+vwfi to `native` reduces irq latency significantly. It can also lead to
+suboptimal scheduling decisions, but only when the system is
+oversubscribed (i.e., in total there are more vCPUs than pCPUs).
+
### watchdog
> `= force | <boolean>`
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2c83544..a71e98e 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2710,6 +2710,14 @@ int xc_livepatch_revert(xc_interface *xch, char *name, uint32_t timeout);
int xc_livepatch_unload(xc_interface *xch, char *name, uint32_t timeout);
int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
+/*
+ * Ensure cache coherency after memory modifications. A call to this function
+ * is only required on ARM as the x86 architecture provides cache coherency
+ * guarantees. Calling this function on x86 is allowed but has no effect.
+ */
+int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
+ xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
+
/* Compat shims */
#include "xenctrl_compat.h"
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 296b852..98ab6ba 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -74,10 +74,10 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
/*
* The x86 architecture provides cache coherency guarantees which prevent
* the need for this hypercall. Avoid the overhead of making a hypercall
- * just for Xen to return -ENOSYS.
+ * just for Xen to return -ENOSYS. It is safe to ignore this call on x86
+ * so we just return 0.
*/
- errno = ENOSYS;
- return -1;
+ return 0;
#else
DECLARE_DOMCTL;
domctl.cmd = XEN_DOMCTL_cacheflush;
diff --git a/tools/libxc/xc_private.c b/tools/libxc/xc_private.c
index d57c39a..9ba4b73 100644
--- a/tools/libxc/xc_private.c
+++ b/tools/libxc/xc_private.c
@@ -64,8 +64,7 @@ struct xc_interface_core *xc_interface_open(xentoollog_logger *logger,
goto err;
xch->fmem = xenforeignmemory_open(xch->error_handler, 0);
-
- if ( xch->xcall == NULL )
+ if ( xch->fmem == NULL )
goto err;
return xch;
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 97445ae..fddebdc 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -366,9 +366,6 @@ void bitmap_byte_to_64(uint64_t *lp, const uint8_t *bp, int nbits);
/* Optionally flush file to disk and discard page cache */
void discard_file_cache(xc_interface *xch, int fd, int flush);
-int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
- xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
-
#define MAX_MMU_UPDATES 1024
struct xc_mmu {
mmu_update_t updates[MAX_MMU_UPDATES];
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 0386f28..acf714e 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -2255,7 +2255,8 @@ static void device_disk_add(libxl__egc *egc, uint32_t domid,
case LIBXL_DISK_BACKEND_QDISK:
flexarray_append(back, "params");
flexarray_append(back, GCSPRINTF("%s:%s",
- libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+ libxl__device_disk_string_of_format(disk->format),
+ disk->pdev_path ? : ""));
if (libxl_defbool_val(disk->colo_enable)) {
flexarray_append(back, "colo-host");
flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_host));
diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile
index 1769e55..d238836 100644
--- a/tools/ocaml/xenstored/Makefile
+++ b/tools/ocaml/xenstored/Makefile
@@ -53,6 +53,7 @@ OBJS = paths \
domains \
connection \
connections \
+ history \
parse_arg \
process \
xenstored
diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml
index 3ffd35b..a66d2f7 100644
--- a/tools/ocaml/xenstored/connection.ml
+++ b/tools/ocaml/xenstored/connection.ml
@@ -296,3 +296,8 @@ let debug con =
let domid = get_domstr con in
let watches = List.map (fun (path, token) -> Printf.sprintf "watch %s: %s %s\n" domid path token) (list_watches con) in
String.concat "" watches
+
+let decr_conflict_credit doms con =
+ match con.dom with
+ | None -> () (* It's a socket connection. We don't know which domain we're in, so treat it as if it's free to conflict *)
+ | Some dom -> Domains.decr_conflict_credit doms dom
diff --git a/tools/ocaml/xenstored/connections.ml b/tools/ocaml/xenstored/connections.ml
index f9bc225..ae76928 100644
--- a/tools/ocaml/xenstored/connections.ml
+++ b/tools/ocaml/xenstored/connections.ml
@@ -44,12 +44,14 @@ let add_domain cons dom =
| Some p -> Hashtbl.add cons.ports p con;
| None -> ()
-let select cons =
- Hashtbl.fold
- (fun _ con (ins, outs) ->
- let fd = Connection.get_fd con in
- (fd :: ins, if Connection.has_output con then fd :: outs else outs))
- cons.anonymous ([], [])
+let select ?(only_if = (fun _ -> true)) cons =
+ Hashtbl.fold (fun _ con (ins, outs) ->
+ if (only_if con) then (
+ let fd = Connection.get_fd con in
+ (fd :: ins, if Connection.has_output con then fd :: outs else outs)
+ ) else (ins, outs)
+ )
+ cons.anonymous ([], [])
let find cons =
Hashtbl.find cons.anonymous
diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml
index e9d957f..5a604d1 100644
--- a/tools/ocaml/xenstored/define.ml
+++ b/tools/ocaml/xenstored/define.ml
@@ -29,6 +29,10 @@ let maxwatch = ref (50)
let maxtransaction = ref (20)
let maxrequests = ref (-1) (* maximum requests per transaction *)
+let conflict_burst_limit = ref 5.0
+let conflict_max_history_seconds = ref 0.05
+let conflict_rate_limit_is_aggregate = ref true
+
let domid_self = 0x7FF0
exception Not_a_directory of string
diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
index ab34314..4515650 100644
--- a/tools/ocaml/xenstored/domain.ml
+++ b/tools/ocaml/xenstored/domain.ml
@@ -31,8 +31,13 @@ type t =
mutable io_credit: int; (* the rounds of ring process left to do, default is 0,
usually set to 1 when there is work detected, could
also set to n to give "lazy" clients extra credit *)
+ mutable conflict_credit: float; (* Must be positive to perform writes; a commit
+ that later causes conflict with another
+ domain's transaction costs credit. *)
+ mutable caused_conflicts: int64;
}
+let is_dom0 d = d.id = 0
let get_path dom = "/local/domain/" ^ (sprintf "%u" dom.id)
let get_id domain = domain.id
let get_interface d = d.interface
@@ -48,6 +53,10 @@ let set_io_credit ?(n=1) domain = domain.io_credit <- max 0 n
let incr_io_credit domain = domain.io_credit <- domain.io_credit + 1
let decr_io_credit domain = domain.io_credit <- max 0 (domain.io_credit - 1)
+let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
+
+let is_free_to_conflict = is_dom0
+
let string_of_port = function
| None -> "None"
| Some x -> string_of_int (Xeneventchn.to_int x)
@@ -84,6 +93,12 @@ let make id mfn remote_port interface eventchn = {
port = None;
bad_client = false;
io_credit = 0;
+ conflict_credit = !Define.conflict_burst_limit;
+ caused_conflicts = 0L;
}
-let is_dom0 d = d.id = 0
+let log_and_reset_conflict_stats logfn dom =
+ if dom.caused_conflicts > 0L then (
+ logfn dom.id dom.caused_conflicts;
+ dom.caused_conflicts <- 0L
+ )
diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
index 395f3a9..fdae298 100644
--- a/tools/ocaml/xenstored/domains.ml
+++ b/tools/ocaml/xenstored/domains.ml
@@ -15,20 +15,77 @@
*)
let debug fmt = Logging.debug "domains" fmt
+let error fmt = Logging.error "domains" fmt
+let warn fmt = Logging.warn "domains" fmt
type domains = {
eventchn: Event.t;
table: (Xenctrl.domid, Domain.t) Hashtbl.t;
+
+ (* N.B. the Queue module is not thread-safe but oxenstored is single-threaded. *)
+ (* Domains queue up to regain conflict-credit; we have a queue for
+ domains that are carrying some penalty and so are below the
+ maximum credit, and another queue for domains that have run out of
+ credit and so have had their access paused. *)
+ doms_conflict_paused: (Domain.t option ref) Queue.t;
+ doms_with_conflict_penalty: (Domain.t option ref) Queue.t;
+
+ (* A callback function to be called when we go from zero to one paused domain.
+ This will be to reset the countdown until the next unit of credit is issued. *)
+ on_first_conflict_pause: unit -> unit;
+
+ (* If config is set to use individual instead of aggregate conflict-rate-limiting,
+ we use these counts instead of the queues. The second one includes the first. *)
+ mutable n_paused: int; (* Number of domains with zero or negative credit *)
+ mutable n_penalised: int; (* Number of domains with less than maximum credit *)
}
-let init eventchn =
- { eventchn = eventchn; table = Hashtbl.create 10 }
+let init eventchn on_first_conflict_pause = {
+ eventchn = eventchn;
+ table = Hashtbl.create 10;
+ doms_conflict_paused = Queue.create ();
+ doms_with_conflict_penalty = Queue.create ();
+ on_first_conflict_pause = on_first_conflict_pause;
+ n_paused = 0;
+ n_penalised = 0;
+}
let del doms id = Hashtbl.remove doms.table id
let exist doms id = Hashtbl.mem doms.table id
let find doms id = Hashtbl.find doms.table id
let number doms = Hashtbl.length doms.table
let iter doms fct = Hashtbl.iter (fun _ b -> fct b) doms.table
+let rec is_empty_queue q =
+ Queue.is_empty q ||
+ if !(Queue.peek q) = None
+ then (
+ ignore (Queue.pop q);
+ is_empty_queue q
+ ) else false
+
+let all_at_max_credit doms =
+ if !Define.conflict_rate_limit_is_aggregate
+ then
+ (* Check both because if burst limit is 1.0 then a domain can go straight
+ * from max-credit to paused without getting into the penalty queue. *)
+ is_empty_queue doms.doms_with_conflict_penalty
+ && is_empty_queue doms.doms_conflict_paused
+ else doms.n_penalised = 0
+
+(* Functions to handle queues of domains given that the domain might be deleted while in a queue. *)
+let push dom queue =
+ Queue.push (ref (Some dom)) queue
+
+let rec pop queue =
+ match !(Queue.pop queue) with
+ | None -> pop queue
+ | Some x -> x
+
+let remove_from_queue dom queue =
+ Queue.iter (fun d -> match !d with
+ | None -> ()
+ | Some x -> if x=dom then d := None) queue
+
let cleanup xc doms =
let notify = ref false in
let dead_dom = ref [] in
@@ -52,6 +109,11 @@ let cleanup xc doms =
let dom = Hashtbl.find doms.table id in
Domain.close dom;
Hashtbl.remove doms.table id;
+ if dom.Domain.conflict_credit <= !Define.conflict_burst_limit
+ then (
+ remove_from_queue dom doms.doms_with_conflict_penalty;
+ if (dom.Domain.conflict_credit <= 0.) then remove_from_queue dom doms.doms_conflict_paused
+ )
) !dead_dom;
!notify, !dead_dom
@@ -82,3 +144,74 @@ let create0 doms =
Domain.bind_interdomain dom;
Domain.notify dom;
dom
+
+let decr_conflict_credit doms dom =
+ dom.Domain.caused_conflicts <- Int64.add 1L dom.Domain.caused_conflicts;
+ let before = dom.Domain.conflict_credit in
+ let after = max (-1.0) (before -. 1.0) in
+ debug "decr_conflict_credit dom%d %F -> %F" (Domain.get_id dom) before after;
+ dom.Domain.conflict_credit <- after;
+ let newly_penalised =
+ before >= !Define.conflict_burst_limit
+ && after < !Define.conflict_burst_limit in
+ let newly_paused = before > 0.0 && after <= 0.0 in
+ if !Define.conflict_rate_limit_is_aggregate then (
+ if newly_penalised
+ && after > 0.0
+ then (
+ push dom doms.doms_with_conflict_penalty
+ ) else if newly_paused
+ then (
+ let first_pause = Queue.is_empty doms.doms_conflict_paused in
+ push dom doms.doms_conflict_paused;
+ if first_pause then doms.on_first_conflict_pause ()
+ ) else (
+ (* The queues are correct already: no further action needed. *)
+ )
+ ) else (
+ if newly_penalised then doms.n_penalised <- doms.n_penalised + 1;
+ if newly_paused then (
+ doms.n_paused <- doms.n_paused + 1;
+ if doms.n_paused = 1 then doms.on_first_conflict_pause ()
+ )
+ )
+
+(* Give one point of credit to one domain, and update the queues appropriately. *)
+let incr_conflict_credit_from_queue doms =
+ let process_queue q requeue_test =
+ let d = pop q in
+ let before = d.Domain.conflict_credit in (* just for debug-logging *)
+ d.Domain.conflict_credit <- min (d.Domain.conflict_credit +. 1.0) !Define.conflict_burst_limit;
+ debug "incr_conflict_credit_from_queue: dom%d: %F -> %F" (Domain.get_id d) before d.Domain.conflict_credit;
+ if requeue_test d.Domain.conflict_credit then (
+ push d q (* Make it queue up again for its next point of credit. *)
+ )
+ in
+ let paused_queue_test cred = cred <= 0.0 in
+ let penalty_queue_test cred = cred < !Define.conflict_burst_limit in
+ try process_queue doms.doms_conflict_paused paused_queue_test
+ with Queue.Empty -> (
+ try process_queue doms.doms_with_conflict_penalty penalty_queue_test
+ with Queue.Empty -> () (* Both queues are empty: nothing to do here. *)
+ )
+
+let incr_conflict_credit doms =
+ if !Define.conflict_rate_limit_is_aggregate
+ then incr_conflict_credit_from_queue doms
+ else (
+ (* Give a point of credit to every domain, subject only to the cap. *)
+ let inc dom =
+ let before = dom.Domain.conflict_credit in
+ let after = min (before +. 1.0) !Define.conflict_burst_limit in
+ dom.Domain.conflict_credit <- after;
+ debug "incr_conflict_credit dom%d: %F -> %F" (Domain.get_id dom) before after;
+
+ if before <= 0.0 && after > 0.0
+ then doms.n_paused <- doms.n_paused - 1;
+
+ if before < !Define.conflict_burst_limit
+ && after >= !Define.conflict_burst_limit
+ then doms.n_penalised <- doms.n_penalised - 1
+ in
+ if doms.n_penalised > 0 then iter doms inc
+ )
diff --git a/tools/ocaml/xenstored/history.ml b/tools/ocaml/xenstored/history.ml
new file mode 100644
index 0000000..f39565b
--- /dev/null
+++ b/tools/ocaml/xenstored/history.ml
@@ -0,0 +1,73 @@
+(*
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *)
+
+type history_record = {
+ con: Connection.t; (* connection that made a change *)
+ tid: int; (* transaction id of the change (may be Transaction.none) *)
+ before: Store.t; (* the store before the change *)
+ after: Store.t; (* the store after the change *)
+ finish_count: int64; (* the commit-count at which the transaction finished *)
+}
+
+let history : history_record list ref = ref []
+
+(* Called from periodic_ops to ensure we don't discard symbols that are still needed. *)
+(* There is scope for optimisation here, since in consecutive commits one commit's `after`
+ * is the same thing as the next commit's `before`, but not all commits in history are
+ * consecutive. *)
+let mark_symbols () =
+ (* There are gaps where dom0's commits are missing. Otherwise we could assume that
+ * each element's `before` is the same thing as the next element's `after`
+ * since the next element is the previous commit *)
+ List.iter (fun hist_rec ->
+ Store.mark_symbols hist_rec.before;
+ Store.mark_symbols hist_rec.after;
+ )
+ !history
+
+(* Keep only enough commit-history to protect the running transactions that we are still tracking *)
+(* There is scope for optimisation here, replacing List.filter with something more efficient,
+ * probably on a different list-like structure. *)
+let trim ?txn () =
+ Transaction.trim_short_running_transactions txn;
+ history := match Transaction.oldest_short_running_transaction () with
+ | None -> [] (* We have no open transaction, so no history is needed *)
+ | Some (_, txn) -> (
+ (* keep records with finish_count recent enough to be relevant *)
+ List.filter (fun r -> r.finish_count > txn.Transaction.start_count) !history
+ )
+
+let end_transaction txn con tid commit =
+ let success = Connection.end_transaction con tid commit in
+ trim ~txn ();
+ success
+
+let push (x: history_record) =
+ let dom = x.con.Connection.dom in
+ match dom with
+ | None -> () (* treat socket connections as always free to conflict *)
+ | Some d -> if not (Domain.is_free_to_conflict d) then history := x :: !history
+
+(* Find the connections from records since commit-count [since] for which [f record] returns [true] *)
+let filter_connections ~ignore ~since ~f =
+ (* The "mem" call is an optimisation, to avoid calling f if we have picked con already. *)
+ (* Using a hash table rather than a list is to optimise the "mem" call. *)
+ List.fold_left (fun acc hist_rec ->
+ if hist_rec.finish_count > since
+ && not (hist_rec.con == ignore)
+ && not (Hashtbl.mem acc hist_rec.con)
+ && f hist_rec
+ then Hashtbl.replace acc hist_rec.con ();
+ acc
+ ) (Hashtbl.create 1023) !history
diff --git a/tools/ocaml/xenstored/oxenstored.conf.in b/tools/ocaml/xenstored/oxenstored.conf.in
index 82117a9..536611e 100644
--- a/tools/ocaml/xenstored/oxenstored.conf.in
+++ b/tools/ocaml/xenstored/oxenstored.conf.in
@@ -9,6 +9,38 @@ test-eagain = false
# Activate transaction merge support
merge-activate = true
+# Limits applied to domains whose writes cause other domains' transaction
+# commits to fail. Must include decimal point.
+
+# The burst limit is the number of conflicts a domain can cause to
+# fail in a short period; this value is used for both the initial and
+# the maximum value of each domain's conflict-credit, which falls by
+# one point for each conflict caused, and when it reaches zero the
+# domain's requests are ignored.
+conflict-burst-limit = 5.0
+
+# The conflict-credit is replenished over time:
+# one point is issued after each conflict-max-history-seconds, so this
+# is the minimum pause-time during which a domain will be ignored.
+conflict-max-history-seconds = 0.05
+
+# If the conflict-rate-limit-is-aggregate flag is true then after each
+# tick one point of conflict-credit is given to just one domain: the
+# one at the front of the queue. If false, then after each tick each
+# domain gets a point of conflict-credit.
+#
+# In environments where it is known that every transaction will
+# involve a set of nodes that is writable by at most one other domain,
+# then it is safe to set this aggregate-limit flag to false for better
+# performance. (This can be determined by considering the layout of
+# the xenstore tree and permissions, together with the content of the
+# transactions that require protection.)
+#
+# A transaction which involves a set of nodes which can be modified by
+# multiple other domains can suffer conflicts caused by any of those
+# domains, so the flag must be set to true.
+conflict-rate-limit-is-aggregate = true
+
# Activate node permission system
perms-activate = true
diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
index 7b60376..8a688c4 100644
--- a/tools/ocaml/xenstored/process.ml
+++ b/tools/ocaml/xenstored/process.ml
@@ -16,6 +16,7 @@
let error fmt = Logging.error "process" fmt
let info fmt = Logging.info "process" fmt
+let debug fmt = Logging.debug "process" fmt
open Printf
open Stdext
@@ -25,6 +26,7 @@ exception Transaction_nested
exception Domain_not_match
exception Invalid_Cmd_Args
+(* This controls the do_debug fn in this module, not the debug logging-function. *)
let allow_debug = ref false
let c_int_of_string s =
@@ -293,6 +295,11 @@ let write_response_log ~ty ~tid ~con ~response =
| Packet.Reply x -> write_answer_log ~ty ~tid ~con ~data:x
| Packet.Error e -> write_answer_log ~ty:(Xenbus.Xb.Op.Error) ~tid ~con ~data:e
+let record_commit ~con ~tid ~before ~after =
+ let inc r = r := Int64.add 1L !r in
+ let finish_count = inc Transaction.counter; !Transaction.counter in
+ History.push {History.con=con; tid=tid; before=before; after=after; finish_count=finish_count}
+
(* Replay a stored transaction against a fresh store, check the responses are
all equivalent: if so, commit the transaction. Otherwise send the abort to
the client. *)
@@ -301,25 +308,57 @@ let transaction_replay c t doms cons =
| Transaction.No ->
error "attempted to replay a non-full transaction";
false
- | Transaction.Full(id, oldroot, cstore) ->
+ | Transaction.Full(id, oldstore, cstore) ->
let tid = Connection.start_transaction c cstore in
- let new_t = Transaction.make tid cstore in
+ let replay_t = Transaction.make ~internal:true tid cstore in
let con = sprintf "r(%d):%s" id (Connection.get_domstr c) in
- let perform_exn (request, response) =
- write_access_log ~ty:request.Packet.ty ~tid ~con ~data:request.Packet.data;
+
+ let perform_exn ~wlog txn (request, response) =
+ if wlog then write_access_log ~ty:request.Packet.ty ~tid ~con ~data:request.Packet.data;
let fct = function_of_type_simple_op request.Packet.ty in
- let response' = input_handle_error ~cons ~doms ~fct ~con:c ~t:new_t ~req:request in
- write_response_log ~ty:request.Packet.ty ~tid ~con ~response:response';
- if not(Packet.response_equal response response') then raise Transaction_again in
+ let response' = input_handle_error ~cons ~doms ~fct ~con:c ~t:txn ~req:request in
+ if wlog then write_response_log ~ty:request.Packet.ty ~tid ~con ~response:response';
+ if not(Packet.response_equal response response') then raise Transaction_again
+ in
finally
(fun () ->
try
Logging.start_transaction ~con ~tid;
- List.iter perform_exn (Transaction.get_operations t);
- Logging.end_transaction ~con ~tid;
+ List.iter (perform_exn ~wlog:true replay_t) (Transaction.get_operations t); (* May throw EAGAIN *)
- Transaction.commit ~con new_t
- with e ->
+ Logging.end_transaction ~con ~tid;
+ Transaction.commit ~con replay_t
+ with
+ | Transaction_again -> (
+ Transaction.failed_commits := Int64.add !Transaction.failed_commits 1L;
+ let victim_domstr = Connection.get_domstr c in
+ debug "Apportioning blame for EAGAIN in txn %d, domain=%s" id victim_domstr;
+ let punish guilty_con =
+ debug "Blaming domain %s for conflict with domain %s txn %d"
+ (Connection.get_domstr guilty_con) victim_domstr id;
+ Connection.decr_conflict_credit doms guilty_con
+ in
+ let judge_and_sentence hist_rec = (
+ let can_apply_on store = (
+ let store = Store.copy store in
+ let trial_t = Transaction.make ~internal:true Transaction.none store in
+ try List.iter (perform_exn ~wlog:false trial_t) (Transaction.get_operations t);
+ true
+ with Transaction_again -> false
+ ) in
+ if can_apply_on hist_rec.History.before
+ && not (can_apply_on hist_rec.History.after)
+ then (punish hist_rec.History.con; true)
+ else false
+ ) in
+ let guilty_cons = History.filter_connections ~ignore:c ~since:t.Transaction.start_count ~f:judge_and_sentence in
+ if Hashtbl.length guilty_cons = 0 then (
+ debug "Found no culprit for conflict in %s: must be self or not in history." con;
+ Transaction.failed_commits_no_culprit := Int64.add !Transaction.failed_commits_no_culprit 1L
+ );
+ false
+ )
+ | e ->
info "transaction_replay %d caught: %s" tid (Printexc.to_string e);
false
)
@@ -358,13 +397,20 @@ let do_transaction_end con t domains cons data =
| x :: _ -> raise (Invalid_argument x)
| _ -> raise Invalid_Cmd_Args
in
+ let commit = commit && not (Transaction.is_read_only t) in
let success =
let commit = if commit then Some (fun con trans -> transaction_replay con trans domains cons) else None in
- Connection.end_transaction con (Transaction.get_id t) commit in
+ History.end_transaction t con (Transaction.get_id t) commit in
if not success then
raise Transaction_again;
- if commit then
- process_watch (List.rev (Transaction.get_paths t)) cons
+ if commit then begin
+ process_watch (List.rev (Transaction.get_paths t)) cons;
+ match t.Transaction.ty with
+ | Transaction.No ->
+ () (* no need to record anything *)
+ | Transaction.Full(id, oldstore, cstore) ->
+ record_commit ~con ~tid:id ~before:oldstore ~after:cstore
+ end
let do_introduce con t domains cons data =
if not (Connection.is_dom0 con)
@@ -434,6 +480,37 @@ let function_of_type ty =
| _ -> function_of_type_simple_op ty
(**
+ * Determines which individual (non-transactional) operations we want to retain.
+ * We only want to retain operations that have side-effects in the store since
+ * these can be the cause of transactions failing.
+ *)
+let retain_op_in_history ty =
+ match ty with
+ | Xenbus.Xb.Op.Write
+ | Xenbus.Xb.Op.Mkdir
+ | Xenbus.Xb.Op.Rm
+ | Xenbus.Xb.Op.Setperms -> true
+ | Xenbus.Xb.Op.Debug
+ | Xenbus.Xb.Op.Directory
+ | Xenbus.Xb.Op.Read
+ | Xenbus.Xb.Op.Getperms
+ | Xenbus.Xb.Op.Watch
+ | Xenbus.Xb.Op.Unwatch
+ | Xenbus.Xb.Op.Transaction_start
+ | Xenbus.Xb.Op.Transaction_end
+ | Xenbus.Xb.Op.Introduce
+ | Xenbus.Xb.Op.Release
+ | Xenbus.Xb.Op.Getdomainpath
+ | Xenbus.Xb.Op.Watchevent
+ | Xenbus.Xb.Op.Error
+ | Xenbus.Xb.Op.Isintroduced
+ | Xenbus.Xb.Op.Resume
+ | Xenbus.Xb.Op.Set_target
+ | Xenbus.Xb.Op.Restrict
+ | Xenbus.Xb.Op.Reset_watches
+ | Xenbus.Xb.Op.Invalid -> false
+
+(**
* Nothrow guarantee.
*)
let process_packet ~store ~cons ~doms ~con ~req =
@@ -448,7 +525,19 @@ let process_packet ~store ~cons ~doms ~con ~req =
else
Connection.get_transaction con tid
in
- let response = input_handle_error ~cons ~doms ~fct ~con ~t ~req in
+
+ let execute () = input_handle_error ~cons ~doms ~fct ~con ~t ~req in
+
+ let response =
+ (* Note that transactions are recorded in history separately. *)
+ if tid = Transaction.none && retain_op_in_history ty then begin
+ let before = Store.copy store in
+ let response = execute () in
+ let after = Store.copy store in
+ record_commit ~con ~tid ~before ~after;
+ response
+ end else execute ()
+ in
let response = try
if tid <> Transaction.none then
diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
index 223ee21..9f619b8 100644
--- a/tools/ocaml/xenstored/store.ml
+++ b/tools/ocaml/xenstored/store.ml
@@ -211,6 +211,7 @@ let apply rnode path fct =
lookup rnode path fct
end
+(* The Store.t type *)
type t =
{
mutable stat_transaction_coalesce: int;
diff --git a/tools/ocaml/xenstored/transaction.ml b/tools/ocaml/xenstored/transaction.ml
index 6b37fc2..23e7ccf 100644
--- a/tools/ocaml/xenstored/transaction.ml
+++ b/tools/ocaml/xenstored/transaction.ml
@@ -14,6 +14,8 @@
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*)
+let error fmt = Logging.error "transaction" fmt
+
open Stdext
let none = 0
@@ -69,34 +71,73 @@ let can_coalesce oldroot currentroot path =
else
false
-type ty = No | Full of (int * Store.Node.t * Store.t)
+type ty = No | Full of (
+ int * (* Transaction id *)
+ Store.t * (* Original store *)
+ Store.t (* A pointer to the canonical store: its root changes on each transaction-commit *)
+)
type t = {
ty: ty;
- store: Store.t;
+ start_count: int64;
+ store: Store.t; (* This is the store that we change in write operations. *)
quota: Quota.t;
mutable paths: (Xenbus.Xb.Op.operation * Store.Path.t) list;
mutable operations: (Packet.request * Packet.response) list;
mutable read_lowpath: Store.Path.t option;
mutable write_lowpath: Store.Path.t option;
}
+let get_id t = match t.ty with No -> none | Full (id, _, _) -> id
-let make id store =
- let ty = if id = none then No else Full(id, Store.get_root store, store) in
- {
+let counter = ref 0L
+let failed_commits = ref 0L
+let failed_commits_no_culprit = ref 0L
+let reset_conflict_stats () =
+ failed_commits := 0L;
+ failed_commits_no_culprit := 0L
+
+(* Scope for optimisation: different data-structure and functions to search/filter it *)
+let short_running_txns = ref []
+
+let oldest_short_running_transaction () =
+ let rec last = function
+ | [] -> None
+ | [x] -> Some x
+ | x :: xs -> last xs
+ in last !short_running_txns
+
+let trim_short_running_transactions txn =
+ let cutoff = Unix.gettimeofday () -. !Define.conflict_max_history_seconds in
+ let keep = match txn with
+ | None -> (function (start_time, _) -> start_time >= cutoff)
+ | Some t -> (function (start_time, tx) -> start_time >= cutoff && tx != t)
+ in
+ short_running_txns := List.filter
+ keep
+ !short_running_txns
+
+let make ?(internal=false) id store =
+ let ty = if id = none then No else Full(id, Store.copy store, store) in
+ let txn = {
ty = ty;
+ start_count = !counter;
store = if id = none then store else Store.copy store;
quota = Quota.copy store.Store.quota;
paths = [];
operations = [];
read_lowpath = None;
write_lowpath = None;
- }
+ } in
+ if id <> none && not internal then (
+ let now = Unix.gettimeofday () in
+ short_running_txns := (now, txn) :: !short_running_txns
+ );
+ txn
-let get_id t = match t.ty with No -> none | Full (id, _, _) -> id
let get_store t = t.store
let get_paths t = t.paths
+let is_read_only t = t.paths = []
let add_wop t ty path = t.paths <- (ty, path) :: t.paths
let add_operation ~perm t request response =
if !Define.maxrequests >= 0
@@ -155,7 +196,7 @@ let commit ~con t =
let has_commited =
match t.ty with
| No -> true
- | Full (id, oldroot, cstore) ->
+ | Full (id, oldstore, cstore) -> (* "cstore" meaning current canonical store *)
let commit_partial oldroot cstore store =
(* get the lowest path of the query and verify that it hasn't
been modified by others transactions. *)
@@ -198,7 +239,7 @@ let commit ~con t =
if !test_eagain && Random.int 3 = 0 then
false
else
- try_commit oldroot cstore t.store
+ try_commit (Store.get_root oldstore) cstore t.store
in
if has_commited && has_write_ops then
Disk.write t.store;
diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
index 2efcce6..5474ece 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -53,14 +53,16 @@ let process_connection_fds store cons domains rset wset =
let process_domains store cons domains =
let do_io_domain domain =
- if not (Domain.is_bad_domain domain) then
- let io_credit = Domain.get_io_credit domain in
- if io_credit > 0 then (
- let con = Connections.find_domain cons (Domain.get_id domain) in
- Process.do_input store cons domains con;
- Process.do_output store cons domains con;
- Domain.decr_io_credit domain;
- ) in
+ if Domain.is_bad_domain domain
+ || Domain.get_io_credit domain <= 0
+ || Domain.is_paused_for_conflict domain
+ then () (* nothing to do *)
+ else (
+ let con = Connections.find_domain cons (Domain.get_id domain) in
+ Process.do_input store cons domains con;
+ Process.do_output store cons domains con;
+ Domain.decr_io_credit domain
+ ) in
Domains.iter domains do_io_domain
let sigusr1_handler store =
@@ -89,6 +91,9 @@ let parse_config filename =
let pidfile = ref default_pidfile in
let options = [
("merge-activate", Config.Set_bool Transaction.do_coalesce);
+ ("conflict-burst-limit", Config.Set_float Define.conflict_burst_limit);
+ ("conflict-max-history-seconds", Config.Set_float Define.conflict_max_history_seconds);
+ ("conflict-rate-limit-is-aggregate", Config.Set_bool Define.conflict_rate_limit_is_aggregate);
("perms-activate", Config.Set_bool Perms.activate);
("quota-activate", Config.Set_bool Quota.activate);
("quota-maxwatch", Config.Set_int Define.maxwatch);
@@ -260,7 +265,23 @@ let _ =
let store = Store.create () in
let eventchn = Event.init () in
- let domains = Domains.init eventchn in
+ let next_frequent_ops = ref 0. in
+ let advance_next_frequent_ops () =
+ next_frequent_ops := (Unix.gettimeofday () +. !Define.conflict_max_history_seconds)
+ in
+ let delay_next_frequent_ops_by duration =
+ next_frequent_ops := !next_frequent_ops +. duration
+ in
+ let domains = Domains.init eventchn advance_next_frequent_ops in
+
+ (* For things that need to be done periodically but more often
+ * than the periodic_ops function *)
+ let frequent_ops () =
+ if Unix.gettimeofday () > !next_frequent_ops then (
+ History.trim ();
+ Domains.incr_conflict_credit domains;
+ advance_next_frequent_ops ()
+ ) in
let cons = Connections.create () in
let quit = ref false in
@@ -356,6 +377,7 @@ let _ =
let last_scan_time = ref 0. in
let periodic_ops now =
+ debug "periodic_ops starting";
(* we garbage collect the string->int dictionary after a sizeable amount of operations,
* there's no need to be really fast even if we got loose
* objects since names are often reused.
@@ -365,6 +387,7 @@ let _ =
Symbol.mark_all_as_unused ();
Store.mark_symbols store;
Connections.iter cons Connection.mark_symbols;
+ History.mark_symbols ();
Symbol.garbage ()
end;
@@ -374,7 +397,11 @@ let _ =
(* make sure we don't print general stats faster than 2 min *)
if now > (!last_stat_time +. 120.) then (
+ info "Transaction conflict statistics for last %F seconds:" (now -. !last_stat_time);
last_stat_time := now;
+ Domains.iter domains (Domain.log_and_reset_conflict_stats (info "Dom%d caused %Ld conflicts"));
+ info "%Ld failed transactions; of these no culprit was found for %Ld" !Transaction.failed_commits !Transaction.failed_commits_no_culprit;
+ Transaction.reset_conflict_stats ();
let gc = Gc.stat () in
let (lanon, lanon_ops, lanon_watchs,
@@ -392,23 +419,38 @@ let _ =
gc.Gc.heap_words gc.Gc.heap_chunks
gc.Gc.live_words gc.Gc.live_blocks
gc.Gc.free_words gc.Gc.free_blocks
- )
- in
+ );
+ let elapsed = Unix.gettimeofday () -. now in
+ debug "periodic_ops took %F seconds." elapsed;
+ delay_next_frequent_ops_by elapsed
+ in
- let period_ops_interval = 15. in
- let period_start = ref 0. in
+ let period_ops_interval = 15. in
+ let period_start = ref 0. in
let main_loop () =
-
+ let is_peaceful c =
+ match Connection.get_domain c with
+ | None -> true (* Treat socket-connections as exempt, and free to conflict. *)
+ | Some dom -> not (Domain.is_paused_for_conflict dom)
+ in
+ frequent_ops ();
let mw = Connections.has_more_work cons in
+ let peaceful_mw = List.filter is_peaceful mw in
List.iter
(fun c ->
match Connection.get_domain c with
| None -> () | Some d -> Domain.incr_io_credit d)
- mw;
+ peaceful_mw;
+ let start_time = Unix.gettimeofday () in
let timeout =
- if List.length mw > 0 then 0. else period_ops_interval in
- let inset, outset = Connections.select cons in
+ let until_next_activity =
+ if Domains.all_at_max_credit domains
+ then period_ops_interval
+ else min (max 0. (!next_frequent_ops -. start_time)) period_ops_interval in
+ if peaceful_mw <> [] then 0. else until_next_activity
+ in
+ let inset, outset = Connections.select ~only_if:is_peaceful cons in
let rset, wset, _ =
try
Select.select (spec_fds @ inset) outset [] timeout
@@ -418,6 +460,7 @@ let _ =
List.partition (fun fd -> List.mem fd spec_fds) rset in
if List.length sfds > 0 then
process_special_fds sfds;
+
if List.length cfds > 0 || List.length wset > 0 then
process_connection_fds store cons domains cfds wset;
if timeout <> 0. then (
@@ -425,6 +468,7 @@ let _ =
if now > !period_start +. period_ops_interval then
(period_start := now; periodic_ops now)
);
+
process_domains store cons domains
in
diff --git a/tools/tests/x86_emulator/test_x86_emulator.c b/tools/tests/x86_emulator/test_x86_emulator.c
index 9b31a36..7b467fe 100644
--- a/tools/tests/x86_emulator/test_x86_emulator.c
+++ b/tools/tests/x86_emulator/test_x86_emulator.c
@@ -163,6 +163,18 @@ static inline uint64_t xgetbv(uint32_t xcr)
(ebx & (1U << 5)) != 0; \
})
+static int read_segment(
+ enum x86_segment seg,
+ struct segment_register *reg,
+ struct x86_emulate_ctxt *ctxt)
+{
+ if ( !is_x86_user_segment(seg) )
+ return X86EMUL_UNHANDLEABLE;
+ memset(reg, 0, sizeof(*reg));
+ reg->attr.fields.p = 1;
+ return X86EMUL_OKAY;
+}
+
static int read_cr(
unsigned int reg,
unsigned long *val,
@@ -215,6 +227,7 @@ static struct x86_emulate_ops emulops = {
.write = write,
.cmpxchg = cmpxchg,
.cpuid = cpuid,
+ .read_segment = read_segment,
.read_cr = read_cr,
.get_fpu = get_fpu,
};
@@ -732,6 +745,27 @@ int main(int argc, char **argv)
goto fail;
printf("okay\n");
+ printf("%-40s", "Testing mov %%cr4,%%esi (bad ModRM)...");
+ /*
+ * Mod = 1, Reg = 4, R/M = 6 would normally encode a memory reference of
+ * disp8(%esi), but mov to/from cr/dr are special and behave as if they
+ * were encoded with Mod == 3.
+ */
+ instr[0] = 0x0f; instr[1] = 0x20, instr[2] = 0x66;
+ instr[3] = 0; /* Supposed disp8. */
+ regs.esi = 0;
+ regs.eip = (unsigned long)&instr[0];
+ rc = x86_emulate(&ctxt, &emulops);
+ /*
+ * We don't care precisely what gets read from %cr4 into %esi, just so
+ * long as ModRM is treated as a register operand and 0(%esi) isn't
+ * followed as a memory reference.
+ */
+ if ( (rc != X86EMUL_OKAY) ||
+ (regs.eip != (unsigned long)&instr[3]) )
+ goto fail;
+ printf("okay\n");
+
#define decl_insn(which) extern const unsigned char which[], which##_len[]
#define put_insn(which, insn) ".pushsection .test, \"ax\", @progbits\n" \
#which ": " insn "\n" \
diff --git a/tools/xenstore/Makefile b/tools/xenstore/Makefile
index f6dee14..5968f44 100644
--- a/tools/xenstore/Makefile
+++ b/tools/xenstore/Makefile
@@ -34,6 +34,7 @@ XENSTORED_OBJS_$(CONFIG_FreeBSD) = xenstored_posix.o
XENSTORED_OBJS_$(CONFIG_MiniOS) = xenstored_minios.o
XENSTORED_OBJS += $(XENSTORED_OBJS_y)
+LDLIBS_xenstored += -lrt
ifneq ($(XENSTORE_STATIC_CLIENTS),y)
LIBXENSTORE := libxenstore.so
@@ -75,7 +76,7 @@ endif
$(XENSTORED_OBJS): CFLAGS += $(CFLAGS_libxengnttab)
xenstored: $(XENSTORED_OBJS)
- $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) $(call LDFLAGS_RPATH,../lib) -o $@ $(APPEND_LDFLAGS)
+ $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) $(LDLIBS_xenstored) $(call LDFLAGS_RPATH,../lib) -o $@ $(APPEND_LDFLAGS)
xenstored.a: $(XENSTORED_OBJS)
$(AR) cr $@ $^
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 3df977b..dc9a26f 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -358,6 +358,7 @@ static void initialize_fds(int sock, int *p_sock_pollfd_idx,
int *ptimeout)
{
struct connection *conn;
+ struct wrl_timestampt now;
if (fds)
memset(fds, 0, sizeof(struct pollfd) * current_array_size);
@@ -377,8 +378,12 @@ static void initialize_fds(int sock, int *p_sock_pollfd_idx,
xce_pollfd_idx = set_fd(xenevtchn_fd(xce_handle),
POLLIN|POLLPRI);
+ wrl_gettime_now(&now);
+ wrl_log_periodic(now);
+
list_for_each_entry(conn, &connections, list) {
if (conn->domain) {
+ wrl_check_timeout(conn->domain, now, ptimeout);
if (domain_can_read(conn) ||
(domain_can_write(conn) &&
!list_empty(&conn->out_list)))
@@ -833,6 +838,7 @@ static void delete_node_single(struct connection *conn, struct node *node)
corrupt(conn, "Could not delete '%s'", node->name);
return;
}
+
domain_entry_dec(conn, node);
}
@@ -972,6 +978,7 @@ static void do_write(struct connection *conn, struct buffered_data *in)
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
send_ack(conn, XS_WRITE);
}
@@ -1003,6 +1010,7 @@ static void do_mkdir(struct connection *conn, struct buffered_data *in)
return;
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
}
send_ack(conn, XS_MKDIR);
@@ -1129,6 +1137,7 @@ static void do_rm(struct connection *conn, struct buffered_data *in)
if (_rm(conn, node, name)) {
add_change_node(conn->transaction, name, true);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, true);
send_ack(conn, XS_RM);
}
@@ -1205,6 +1214,7 @@ static void do_set_perms(struct connection *conn, struct buffered_data *in)
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
send_ack(conn, XS_SET_PERMS);
}
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index ecc614f..9e9d960 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -33,6 +33,12 @@
#include "list.h"
#include "tdb.h"
+#define MIN(a, b) (((a) < (b))? (a) : (b))
+
+typedef int32_t wrl_creditt;
+#define WRL_CREDIT_MAX (1000*1000*1000)
+/* ^ satisfies non-overflow condition for wrl_xfer_credit */
+
struct buffered_data
{
struct list_head list;
diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
index 5de93d4..18ac327 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -21,6 +21,8 @@
#include <unistd.h>
#include <stdlib.h>
#include <stdarg.h>
+#include <time.h>
+#include <syslog.h>
#include "utils.h"
#include "talloc.h"
@@ -74,6 +76,11 @@ struct domain
/* number of watch for this domain */
int nbwatch;
+
+ /* write rate limit */
+ wrl_creditt wrl_credit; /* [ -wrl_config_writecost, +_dburst ] */
+ struct wrl_timestampt wrl_timestamp;
+ bool wrl_delay_logged;
};
static LIST_HEAD(domains);
@@ -206,6 +213,8 @@ static int destroy_domain(void *_domain)
fire_watches(NULL, domain, "@releaseDomain", false);
+ wrl_domain_destroy(domain);
+
return 0;
}
@@ -253,6 +262,9 @@ void handle_event(void)
bool domain_can_read(struct connection *conn)
{
struct xenstore_domain_interface *intf = conn->domain->interface;
+
+ if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
+ return false;
return (intf->req_cons != intf->req_prod);
}
@@ -284,6 +296,8 @@ static struct domain *new_domain(void *context, unsigned int domid,
domain->domid = domid;
domain->path = talloc_domain_path(domain, domid);
+ wrl_domain_new(domain);
+
list_add(&domain->list, &domains);
talloc_set_destructor(domain, destroy_domain);
@@ -751,6 +765,233 @@ int domain_watch(struct connection *conn)
: 0;
}
+static wrl_creditt wrl_config_writecost = WRL_FACTOR;
+static wrl_creditt wrl_config_rate = WRL_RATE * WRL_FACTOR;
+static wrl_creditt wrl_config_dburst = WRL_DBURST * WRL_FACTOR;
+static wrl_creditt wrl_config_gburst = WRL_GBURST * WRL_FACTOR;
+static wrl_creditt wrl_config_newdoms_dburst =
+ WRL_DBURST * WRL_NEWDOMS * WRL_FACTOR;
+
+long wrl_ntransactions;
+
+static long wrl_ndomains;
+static wrl_creditt wrl_reserve; /* [-wrl_config_newdoms_dburst, +_gburst ] */
+static time_t wrl_log_last_warning; /* 0: no previous warning */
+
+void wrl_gettime_now(struct wrl_timestampt *now_wt)
+{
+ struct timespec now_ts;
+ int r;
+
+ r = clock_gettime(CLOCK_MONOTONIC, &now_ts);
+ if (r)
+ barf_perror("Could not find time (clock_gettime failed)");
+
+ now_wt->sec = now_ts.tv_sec;
+ now_wt->msec = now_ts.tv_nsec / 1000000;
+}
+
+static void wrl_xfer_credit(wrl_creditt *debit, wrl_creditt debit_floor,
+ wrl_creditt *credit, wrl_creditt credit_ceil)
+ /*
+ * Transfers zero or more credit from "debit" to "credit".
+ * Transfers as much as possible while maintaining
+ * debit >= debit_floor and credit <= credit_ceil.
+ * (If that's violated already, does nothing.)
+ *
+ * Sufficient conditions to avoid overflow, either of:
+ * |every argument| <= 0x3fffffff
+ * |every argument| <= 1E9
+ * |every argument| <= WRL_CREDIT_MAX
+ * (And this condition is preserved.)
+ */
+{
+ wrl_creditt xfer = MIN( *debit - debit_floor,
+ credit_ceil - *credit );
+ if (xfer > 0) {
+ *debit -= xfer;
+ *credit += xfer;
+ }
+}
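
To illustrate the clamped transfer with some made-up numbers (this
standalone sketch is for exposition only - it is not part of the patch):

    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t wrl_creditt;

    /* The same transfer rule as wrl_xfer_credit above. */
    static void xfer(wrl_creditt *debit, wrl_creditt debit_floor,
                     wrl_creditt *credit, wrl_creditt credit_ceil)
    {
        wrl_creditt spare = *debit - debit_floor;   /* what the donor may give */
        wrl_creditt room = credit_ceil - *credit;   /* what the recipient may take */
        wrl_creditt amount = spare < room ? spare : room;

        if (amount > 0) {
            *debit -= amount;
            *credit += amount;
        }
    }

    int main(void)
    {
        wrl_creditt reserve = 7000, domain = -2000;

        /* Top the domain up to at most 0, as wrl_credit_update does. */
        xfer(&reserve, 0, &domain, 0);
        printf("reserve=%d domain=%d\n", reserve, domain);  /* 5000, 0 */
        return 0;
    }

Only 2000 credit moves, because the recipient's ceiling (0) is the
binding constraint here, not the donor's floor.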
+
+void wrl_domain_new(struct domain *domain)
+{
+ domain->wrl_credit = 0;
+ wrl_gettime_now(&domain->wrl_timestamp);
+ wrl_ndomains++;
+ /* Steal up to DBURST from the reserve */
+ wrl_xfer_credit(&wrl_reserve, -wrl_config_newdoms_dburst,
+ &domain->wrl_credit, wrl_config_dburst);
+}
+
+void wrl_domain_destroy(struct domain *domain)
+{
+ wrl_ndomains--;
+ /*
+ * Don't bother recalculating domain's credit - this just
+ * means we don't give the reserve the ending domain's credit
+ * for time elapsed since last update.
+ */
+ wrl_xfer_credit(&domain->wrl_credit, 0,
+ &wrl_reserve, wrl_config_dburst);
+}
+
+void wrl_credit_update(struct domain *domain, struct wrl_timestampt now)
+{
+ /*
+ * We want to calculate
+ * credit += (now - timestamp) * RATE / ndoms;
+ * But we want it to saturate, and to avoid floating point.
+ * To avoid rounding errors from constantly adding small
+ * amounts of credit, we only add credit for whole milliseconds.
+ */
+ long seconds = now.sec - domain->wrl_timestamp.sec;
+ long milliseconds = now.msec - domain->wrl_timestamp.msec;
+ long msec;
+ int64_t denom, num;
+ wrl_creditt surplus;
+
+ seconds = MIN(seconds, 1000*1000); /* arbitrary, prevents overflow */
+ msec = seconds * 1000 + milliseconds;
+
+ if (msec < 0)
+ /* shouldn't happen with CLOCK_MONOTONIC */
+ msec = 0;
+
+ /* 32x32 -> 64 cannot overflow */
+ denom = (int64_t)msec * wrl_config_rate;
+ num = (int64_t)wrl_ndomains * 1000;
+ /* denom / num <= 1E6 * wrl_config_rate, so with
+ reasonable wrl_config_rate, denom / num << 2^64 */
+
+ /* at last! */
+ domain->wrl_credit = MIN( (int64_t)domain->wrl_credit + denom / num,
+ WRL_CREDIT_MAX );
+ /* (maybe briefly violating the DBURST cap on wrl_credit) */
+
+ /* maybe take from the reserve to make us nonnegative */
+ wrl_xfer_credit(&wrl_reserve, 0,
+ &domain->wrl_credit, 0);
+
+ /* return any surplus (over DBURST) to the reserve */
+ surplus = 0;
+ wrl_xfer_credit(&domain->wrl_credit, wrl_config_dburst,
+ &surplus, WRL_CREDIT_MAX);
+ wrl_xfer_credit(&surplus, 0,
+ &wrl_reserve, wrl_config_gburst);
+ /* surplus is now implicitly discarded */
+
+ domain->wrl_timestamp = now;
+
+ trace("wrl: dom %4d %6ld msec %9ld credit %9ld reserve"
+ " %9ld discard\n",
+ domain->domid,
+ msec,
+ (long)domain->wrl_credit, (long)wrl_reserve,
+ (long)surplus);
+}
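
To put concrete (hypothetical) numbers on the accrual: with the new
defaults in xenstored_domain.h (further down), wrl_config_rate =
WRL_RATE * WRL_FACTOR = 200000. On a system with four domains, 50ms of
elapsed time gives denom = 50 * 200000 = 10000000 and num = 4 * 1000 =
4000, so the domain gains 2500 credit. At wrl_config_writecost =
WRL_FACTOR = 1000 per write that is two and a half writes, i.e. a
steady-state allowance of 200/4 = 50 writes per second per domain.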
+
+void wrl_check_timeout(struct domain *domain,
+ struct wrl_timestampt now,
+ int *ptimeout)
+{
+ uint64_t num, denom;
+ int wakeup;
+
+ wrl_credit_update(domain, now);
+
+ if (domain->wrl_credit >= 0)
+ /* not blocked */
+ return;
+
+ if (!*ptimeout)
+ /* already decided on immediate wakeup,
+ so no need to calculate our timeout */
+ return;
+
+ /* calculate wakeup = now + -credit / (RATE / ndoms); */
+
+ /* credit cannot go more -ve than one transaction,
+ * so the first multiplication cannot overflow even 32-bit */
+ num = (uint64_t)(-domain->wrl_credit * 1000) * wrl_ndomains;
+ denom = wrl_config_rate;
+
+ wakeup = MIN( num / denom /* uint64_t */, INT_MAX );
+ if (*ptimeout == -1 || wakeup < *ptimeout)
+ *ptimeout = wakeup;
+
+ trace("wrl: domain %u credit=%ld (reserve=%ld) SLEEPING for %d\n",
+ domain->domid,
+ (long)domain->wrl_credit, (long)wrl_reserve,
+ wakeup);
+}
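
Continuing the hypothetical four-domain example: a domain exactly one
write in debt has wrl_credit = -1000, so num = 1000 * 1000 * 4 =
4000000 and denom = 200000, giving wakeup = 20 - a 20ms sleep, which is
just the time needed to accrue the missing 1000 credit at 50 credit per
millisecond.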
+
+#define WRL_LOG(now, ...) \
+ (syslog(LOG_WARNING, "write rate limit: " __VA_ARGS__))
+
+void wrl_apply_debit_actual(struct domain *domain)
+{
+ struct wrl_timestampt now;
+
+ if (!domain)
+ /* sockets escape the write rate limit */
+ return;
+
+ wrl_gettime_now(&now);
+ wrl_credit_update(domain, now);
+
+ domain->wrl_credit -= wrl_config_writecost;
+ trace("wrl: domain %u credit=%ld (reserve=%ld)\n",
+ domain->domid,
+ (long)domain->wrl_credit, (long)wrl_reserve);
+
+ if (domain->wrl_credit < 0) {
+ if (!domain->wrl_delay_logged) {
+ domain->wrl_delay_logged = true;
+ WRL_LOG(now, "domain %ld is affected",
+ (long)domain->domid);
+ } else if (!wrl_log_last_warning) {
+ WRL_LOG(now, "rate limiting restarts");
+ }
+ wrl_log_last_warning = now.sec;
+ }
+}
+
+void wrl_log_periodic(struct wrl_timestampt now)
+{
+ if (wrl_log_last_warning &&
+ (now.sec - wrl_log_last_warning) > WRL_LOGEVERY) {
+ WRL_LOG(now, "not in force recently");
+ wrl_log_last_warning = 0;
+ }
+}
+
+void wrl_apply_debit_direct(struct connection *conn)
+{
+ if (!conn)
+ /* some writes are generated internally */
+ return;
+
+ if (conn->transaction)
+ /* these are accounted for when the transaction ends */
+ return;
+
+ if (!wrl_ntransactions)
+ /* we don't conflict with anyone */
+ return;
+
+ wrl_apply_debit_actual(conn->domain);
+}
+
+void wrl_apply_debit_trans_commit(struct connection *conn)
+{
+ if (wrl_ntransactions <= 1)
+ /* our own transaction appears in the counter */
+ return;
+
+ wrl_apply_debit_actual(conn->domain);
+}
+
/*
* Local variables:
* c-file-style: "linux"
diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
index 2554423..561ab5d 100644
--- a/tools/xenstore/xenstored_domain.h
+++ b/tools/xenstore/xenstored_domain.h
@@ -65,4 +65,31 @@ void domain_watch_inc(struct connection *conn);
void domain_watch_dec(struct connection *conn);
int domain_watch(struct connection *conn);
+/* Write rate limiting */
+
+#define WRL_FACTOR 1000 /* for fixed-point arithmetic */
+#define WRL_RATE 200
+#define WRL_DBURST 10
+#define WRL_GBURST 1000
+#define WRL_NEWDOMS 5
+#define WRL_LOGEVERY 120 /* seconds */
+
+struct wrl_timestampt {
+ time_t sec;
+ int msec;
+};
+
+extern long wrl_ntransactions;
+
+void wrl_gettime_now(struct wrl_timestampt *now_ts);
+void wrl_domain_new(struct domain *domain);
+void wrl_domain_destroy(struct domain *domain);
+void wrl_credit_update(struct domain *domain, struct wrl_timestampt now);
+void wrl_check_timeout(struct domain *domain,
+ struct wrl_timestampt now,
+ int *ptimeout);
+void wrl_log_periodic(struct wrl_timestampt now);
+void wrl_apply_debit_direct(struct connection *conn);
+void wrl_apply_debit_trans_commit(struct connection *conn);
+
#endif /* _XENSTORED_DOMAIN_H */
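
As I read these defaults (WRL_FACTOR being only the fixed-point scale):
unprivileged domains share a sustained budget of 200 writes per second,
each domain may burst up to 10 writes, the shared reserve may hold up
to 1000 writes' worth of unused credit, newly created domains may
collectively drive the reserve up to 5 * 10 = 50 writes' worth into
debt, and once limiting has kicked in, a "no longer in force" notice
follows after 120 quiet seconds.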
diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
index 84cb0bf..5059a11 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -120,6 +120,7 @@ static int destroy_transaction(void *_transaction)
{
struct transaction *trans = _transaction;
+ wrl_ntransactions--;
trace_destroy(trans, "transaction");
if (trans->tdb)
tdb_close(trans->tdb);
@@ -183,6 +184,7 @@ void do_transaction_start(struct connection *conn, struct buffered_data *in)
talloc_steal(conn, trans);
talloc_set_destructor(trans, destroy_transaction);
conn->transaction_started++;
+ wrl_ntransactions++;
snprintf(id_str, sizeof(id_str), "%u", trans->id);
send_reply(conn, XS_TRANSACTION_START, id_str, strlen(id_str)+1);
@@ -218,6 +220,9 @@ void do_transaction_end(struct connection *conn, struct buffered_data *in)
send_error(conn, EAGAIN);
return;
}
+
+ wrl_apply_debit_trans_commit(conn);
+
if (!replace_tdb(trans->tdb_name, trans->tdb)) {
send_error(conn, errno);
return;
diff --git a/xen/Makefile b/xen/Makefile
index 22d1361..25bd1f3 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -2,7 +2,7 @@
# All other places this is stored (eg. compile.h) should be autogenerated.
export XEN_VERSION = 4
export XEN_SUBVERSION = 8
-export XEN_EXTRAVERSION ?= .1-pre$(XEN_VENDORVERSION)
+export XEN_EXTRAVERSION ?= .1$(XEN_VENDORVERSION)
export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
-include xen-version
diff --git a/xen/arch/arm/alternative.c b/xen/arch/arm/alternative.c
index b9c2b3a..fdf5911 100644
--- a/xen/arch/arm/alternative.c
+++ b/xen/arch/arm/alternative.c
@@ -25,6 +25,7 @@
#include <xen/vmap.h>
#include <xen/smp.h>
#include <xen/stop_machine.h>
+#include <xen/virtual_region.h>
#include <asm/alternative.h>
#include <asm/atomic.h>
#include <asm/byteorder.h>
@@ -155,8 +156,12 @@ static int __apply_alternatives_multi_stop(void *unused)
int ret;
struct alt_region region;
mfn_t xen_mfn = _mfn(virt_to_mfn(_start));
- unsigned int xen_order = get_order_from_bytes(_end - _start);
+ paddr_t xen_size = _end - _start;
+ unsigned int xen_order = get_order_from_bytes(xen_size);
void *xenmap;
+ struct virtual_region patch_region = {
+ .list = LIST_HEAD_INIT(patch_region.list),
+ };
BUG_ON(patched);
@@ -170,6 +175,15 @@ static int __apply_alternatives_multi_stop(void *unused)
BUG_ON(!xenmap);
/*
+ * If we generate a new branch instruction, the target will be
+ * calculated in this re-mapped Xen region. So we have to register
+ * this re-mapped Xen region as a virtual region temporarily.
+ */
+ patch_region.start = xenmap;
+ patch_region.end = xenmap + xen_size;
+ register_virtual_region(&patch_region);
+
+ /*
* Find the virtual address of the alternative region in the new
* mapping.
* alt_instr contains relative offset, so the function
@@ -183,6 +197,8 @@ static int __apply_alternatives_multi_stop(void *unused)
/* The patching is not expected to fail during boot. */
BUG_ON(ret != 0);
+ unregister_virtual_region(&patch_region);
+
vunmap(xenmap);
/* Barriers provided by the cache flushing */
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e8a400c..418b1cc 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -48,20 +48,6 @@ struct map_range_data
p2m_type_t p2mt;
};
-static const struct dt_device_match dev_map_attrs[] __initconst =
-{
- {
- __DT_MATCH_COMPATIBLE("mmio-sram"),
- __DT_MATCH_PROP("no-memory-wc"),
- .data = (void *) (uintptr_t) p2m_mmio_direct_dev,
- },
- {
- __DT_MATCH_COMPATIBLE("mmio-sram"),
- .data = (void *) (uintptr_t) p2m_mmio_direct_nc,
- },
- { /* sentinel */ },
-};
-
//#define DEBUG_11_ALLOCATION
#ifdef DEBUG_11_ALLOCATION
# define D11PRINT(fmt, args...) printk(XENLOG_DEBUG fmt, ##args)
@@ -1159,21 +1145,6 @@ static int handle_device(struct domain *d, struct dt_device_node *dev,
return 0;
}
-static p2m_type_t lookup_map_attr(struct dt_device_node *node,
- p2m_type_t parent_p2mt)
-{
- const struct dt_device_match *r;
-
- /* Search and if nothing matches, use the parent's attributes. */
- r = dt_match_node(dev_map_attrs, node);
-
- /*
- * If this node does not dictate specific mapping attributes,
- * it inherits its parent's attributes.
- */
- return r ? (uintptr_t) r->data : parent_p2mt;
-}
-
static int handle_node(struct domain *d, struct kernel_info *kinfo,
struct dt_device_node *node,
p2m_type_t p2mt)
@@ -1264,7 +1235,6 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
"WARNING: Path %s is reserved, skip the node as we may re-use the path.\n",
path);
- p2mt = lookup_map_attr(node, p2mt);
res = handle_device(d, node, p2mt);
if ( res)
return res;
@@ -1319,7 +1289,7 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
static int prepare_dtb(struct domain *d, struct kernel_info *kinfo)
{
- const p2m_type_t default_p2mt = p2m_mmio_direct_dev;
+ const p2m_type_t default_p2mt = p2m_mmio_direct_c;
const void *fdt;
int new_size;
int ret;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 63c744a..a5348f2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -205,7 +205,10 @@ int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
*/
if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
!test_bit(_IRQ_DISABLED, &desc->status) )
+ {
+ vgic_unlock_rank(v_target, rank, flags);
return -EBUSY;
+ }
}
clear_bit(_IRQ_GUEST, &desc->status);
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 06d4843..508028b 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -477,26 +477,32 @@ int route_irq_to_guest(struct domain *d, unsigned int virq,
*/
if ( desc->action != NULL )
{
- struct domain *ad = irq_get_domain(desc);
-
- if ( test_bit(_IRQ_GUEST, &desc->status) && d == ad )
+ if ( test_bit(_IRQ_GUEST, &desc->status) )
{
- if ( irq_get_guest_info(desc)->virq != virq )
+ struct domain *ad = irq_get_domain(desc);
+
+ if ( d == ad )
+ {
+ if ( irq_get_guest_info(desc)->virq != virq )
+ {
+ printk(XENLOG_G_ERR
+ "d%u: IRQ %u is already assigned to vIRQ %u\n",
+ d->domain_id, irq, irq_get_guest_info(desc)->virq);
+ retval = -EBUSY;
+ }
+ }
+ else
{
- printk(XENLOG_G_ERR
- "d%u: IRQ %u is already assigned to vIRQ %u\n",
- d->domain_id, irq, irq_get_guest_info(desc)->virq);
+ printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
+ irq, ad->domain_id);
retval = -EBUSY;
}
- goto out;
}
-
- if ( test_bit(_IRQ_GUEST, &desc->status) )
- printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
- irq, ad->domain_id);
else
+ {
printk(XENLOG_G_ERR "IRQ %u is already used by Xen\n", irq);
- retval = -EBUSY;
+ retval = -EBUSY;
+ }
goto out;
}
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 99588a3..596283f 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -390,6 +390,16 @@ void flush_page_to_ram(unsigned long mfn)
clean_and_invalidate_dcache_va_range(v, PAGE_SIZE);
unmap_domain_page(v);
+
+ /*
+ * For some of the instruction cache (such as VIPT), the entire I-Cache
+ * needs to be flushed to guarantee that all the aliases of a given
+ * physical address will be removed from the cache.
+ * Invalidating the I-Cache by VA highly depends on the behavior of the
+ * I-Cache (See D4.9.2 in ARM DDI 0487A.k_iss10775). Instead of using flush
+ * by VA on select platforms, we just flush the entire cache here.
+ */
+ invalidate_icache();
}
void __init arch_init_memory(void)
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index cc5634b..c7c726b 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -135,13 +135,12 @@ void p2m_restore_state(struct vcpu *n)
{
register_t hcr;
struct p2m_domain *p2m = &n->domain->arch.p2m;
+ uint8_t *last_vcpu_ran;
if ( is_idle_vcpu(n) )
return;
hcr = READ_SYSREG(HCR_EL2);
- WRITE_SYSREG(hcr & ~HCR_VM, HCR_EL2);
- isb();
WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
isb();
@@ -156,6 +155,17 @@ void p2m_restore_state(struct vcpu *n)
WRITE_SYSREG(hcr, HCR_EL2);
isb();
+
+ last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];
+
+ /*
+ * Flush local TLB for the domain to prevent wrong TLB translation
+ * when running multiple vCPUs of the same domain on a single pCPU.
+ */
+ if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
+ flush_tlb_local();
+
+ *last_vcpu_ran = n->vcpu_id;
}
static void p2m_flush_tlb(struct p2m_domain *p2m)
@@ -734,6 +744,7 @@ static void p2m_free_entry(struct p2m_domain *p2m,
unsigned int i;
lpae_t *table;
mfn_t mfn;
+ struct page_info *pg;
/* Nothing to do if the entry is invalid. */
if ( !p2m_valid(entry) )
@@ -771,7 +782,10 @@ static void p2m_free_entry(struct p2m_domain *p2m,
mfn = _mfn(entry.p2m.base);
ASSERT(mfn_valid(mfn_x(mfn)));
- free_domheap_page(mfn_to_page(mfn_x(mfn)));
+ pg = mfn_to_page(mfn_x(mfn));
+
+ page_list_del(pg, &p2m->pages);
+ free_domheap_page(pg);
}
static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
@@ -982,9 +996,10 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
/*
* The radix-tree can only work on 4KB. This is only used when
- * memaccess is enabled.
+ * memaccess is enabled and during shutdown.
*/
- ASSERT(!p2m->mem_access_enabled || page_order == 0);
+ ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
+ p2m->domain->is_dying);
/*
* The access type should always be p2m_access_rwx when the mapping
* is removed.
@@ -1176,7 +1191,7 @@ int map_dev_mmio_region(struct domain *d,
if ( !(nr && iomem_access_permitted(d, mfn_x(mfn), mfn_x(mfn) + nr - 1)) )
return 0;
- res = map_mmio_regions(d, gfn, nr, mfn);
+ res = p2m_insert_mapping(d, gfn, nr, mfn, p2m_mmio_direct_c);
if ( res < 0 )
{
printk(XENLOG_G_ERR "Unable to map MFNs [%#"PRI_mfn" - %#"PRI_mfn" in Dom%d\n",
@@ -1308,6 +1323,7 @@ int p2m_init(struct domain *d)
{
struct p2m_domain *p2m = &d->arch.p2m;
int rc = 0;
+ unsigned int cpu;
rwlock_init(&p2m->lock);
INIT_PAGE_LIST_HEAD(&p2m->pages);
@@ -1336,6 +1352,17 @@ int p2m_init(struct domain *d)
rc = p2m_alloc_table(d);
+ /*
+ * Make sure that the type chosen is able to store any vCPU ID between
+ * 0 and the maximum number of virtual CPUs supported, as well as
+ * INVALID_VCPU_ID.
+ */
+ BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < MAX_VIRT_CPUS);
+ BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < INVALID_VCPU_ID);
+
+ for_each_possible_cpu(cpu)
+ p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
+
return rc;
}
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 7966b5e..34ee97e 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -147,7 +147,7 @@ int __init psci_init_0_2(void)
psci_ver = call_smc(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
/* For the moment, we only support PSCI 0.2 and PSCI 1.x */
- if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver != 1) )
+ if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver) != 1 )
{
printk("Error: Unrecognized PSCI version %u.%u\n",
PSCI_VERSION_MAJOR(psci_ver), PSCI_VERSION_MINOR(psci_ver));
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 38eb888..861c39e 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -784,6 +784,8 @@ void __init start_xen(unsigned long boot_phys_offset,
smp_init_cpus();
cpus = smp_get_max_cpus();
+ printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", cpus);
+ nr_cpu_ids = cpus;
init_xen_time();
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 8ff73fe..90aba2a 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -101,6 +101,19 @@ static int debug_stack_lines = 40;
integer_param("debug_stack_lines", debug_stack_lines);
+static enum {
+ TRAP,
+ NATIVE,
+} vwfi;
+
+static void __init parse_vwfi(const char *s)
+{
+ if ( !strcmp(s, "native") )
+ vwfi = NATIVE;
+ else
+ vwfi = TRAP;
+}
+custom_param("vwfi", parse_vwfi);
void init_traps(void)
{
@@ -127,8 +140,8 @@ void init_traps(void)
/* Setup hypervisor traps */
WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
- HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,
- HCR_EL2);
+ (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
+ HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB, HCR_EL2);
isb();
}
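
(For testing: vwfi is a Xen command line parameter, so booting the
hypervisor with, say, vwfi=native leaves WFI/WFE untrapped, while any
other value - or omitting the option - keeps the previous behaviour of
trapping them via HCR_TWI|HCR_TWE.)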
@@ -643,7 +656,7 @@ static const char *mode_string(uint32_t cpsr)
};
mode = cpsr & PSR_MODE_MASK;
- if ( mode > ARRAY_SIZE(mode_strings) )
+ if ( mode >= ARRAY_SIZE(mode_strings) )
return "Unknown";
return mode_strings[mode] ? : "Unknown";
}
@@ -2280,6 +2293,20 @@ static void do_sysreg(struct cpu_user_regs *regs,
return inject_undef64_exception(regs, hsr.len);
/*
+ * ICC_SRE_EL2.Enable = 0
+ *
+ * GIC Architecture Specification (IHI 0069C): Section 8.1.9
+ */
+ case HSR_SYSREG_ICC_SRE_EL1:
+ /*
+ * Trapped when the guest is using GICv2 whilst the platform
+ * interrupt controller is GICv3. In this case, the register
+ * should be emulated as RAZ/WI to tell the guest to use the GIC
+ * memory mapped interface (i.e. GICv2 compatibility).
+ */
+ return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
+
+ /*
* HCR_EL2.TIDCP
*
* ARMv8 (DDI 0487A.d): D1-1501 Table D1-43
diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index c6d280e..92188a2 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -79,7 +79,7 @@ static uint32_t vgic_fetch_itargetsr(struct vgic_irq_rank *rank,
offset &= ~(NR_TARGETS_PER_ITARGETSR - 1);
for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++ )
- reg |= (1 << rank->vcpu[offset]) << (i * NR_BITS_PER_TARGET);
+ reg |= (1 << read_atomic(&rank->vcpu[offset])) << (i * NR_BITS_PER_TARGET);
return reg;
}
@@ -152,7 +152,7 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
/* The vCPU ID always starts from 0 */
new_target--;
- old_target = rank->vcpu[offset];
+ old_target = read_atomic(&rank->vcpu[offset]);
/* Only migrate the vIRQ if the target vCPU has changed */
if ( new_target != old_target )
@@ -162,7 +162,7 @@ static void vgic_store_itargetsr(struct domain *d, struct vgic_irq_rank *rank,
virq);
}
- rank->vcpu[offset] = new_target;
+ write_atomic(&rank->vcpu[offset], new_target);
}
}
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index ec038a3..2d71cac 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -107,7 +107,7 @@ static uint64_t vgic_fetch_irouter(struct vgic_irq_rank *rank,
/* Get the index in the rank */
offset &= INTERRUPT_RANK_MASK;
- return vcpuid_to_vaffinity(rank->vcpu[offset]);
+ return vcpuid_to_vaffinity(read_atomic(&rank->vcpu[offset]));
}
/*
@@ -135,7 +135,7 @@ static void vgic_store_irouter(struct domain *d, struct vgic_irq_rank *rank,
offset &= virq & INTERRUPT_RANK_MASK;
new_vcpu = vgic_v3_irouter_to_vcpu(d, irouter);
- old_vcpu = d->vcpu[rank->vcpu[offset]];
+ old_vcpu = d->vcpu[read_atomic(&rank->vcpu[offset])];
/*
* From the spec (see 8.9.13 in IHI 0069A), any write with an
@@ -153,7 +153,7 @@ static void vgic_store_irouter(struct domain *d, struct vgic_irq_rank *rank,
if ( new_vcpu != old_vcpu )
vgic_migrate_irq(old_vcpu, new_vcpu, virq);
- rank->vcpu[offset] = new_vcpu->vcpu_id;
+ write_atomic(&rank->vcpu[offset], new_vcpu->vcpu_id);
}
static inline bool vgic_reg64_check_access(struct hsr_dabt dabt)
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 0965119..d12e6f0 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -85,7 +85,7 @@ static void vgic_rank_init(struct vgic_irq_rank *rank, uint8_t index,
rank->index = index;
for ( i = 0; i < NR_INTERRUPT_PER_RANK; i++ )
- rank->vcpu[i] = vcpu;
+ write_atomic(&rank->vcpu[i], vcpu);
}
int domain_vgic_register(struct domain *d, int *mmio_count)
@@ -218,28 +218,11 @@ int vcpu_vgic_free(struct vcpu *v)
return 0;
}
-/* The function should be called by rank lock taken. */
-static struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
-{
- struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
-
- ASSERT(spin_is_locked(&rank->lock));
-
- return v->domain->vcpu[rank->vcpu[virq & INTERRUPT_RANK_MASK]];
-}
-
-/* takes the rank lock */
struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
{
- struct vcpu *v_target;
struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
- unsigned long flags;
-
- vgic_lock_rank(v, rank, flags);
- v_target = __vgic_get_target_vcpu(v, virq);
- vgic_unlock_rank(v, rank, flags);
-
- return v_target;
+ int target = read_atomic(&rank->vcpu[virq & INTERRUPT_RANK_MASK]);
+ return v->domain->vcpu[target];
}
static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
@@ -326,7 +309,7 @@ void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
irq = i + (32 * n);
- v_target = __vgic_get_target_vcpu(v, irq);
+ v_target = vgic_get_target_vcpu(v, irq);
p = irq_to_pending(v_target, irq);
clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
gic_remove_from_queues(v_target, irq);
@@ -368,7 +351,7 @@ void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
irq = i + (32 * n);
- v_target = __vgic_get_target_vcpu(v, irq);
+ v_target = vgic_get_target_vcpu(v, irq);
p = irq_to_pending(v_target, irq);
set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index eae643f..093856a 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1315,16 +1315,24 @@ static inline int check_segment(struct segment_register *reg,
return 0;
}
- if ( seg != x86_seg_tr && !reg->attr.fields.s )
+ if ( seg == x86_seg_tr )
{
- gprintk(XENLOG_ERR,
- "System segment provided for a code or data segment\n");
- return -EINVAL;
- }
+ if ( reg->attr.fields.s )
+ {
+ gprintk(XENLOG_ERR, "Code or data segment provided for TR\n");
+ return -EINVAL;
+ }
- if ( seg == x86_seg_tr && reg->attr.fields.s )
+ if ( reg->attr.fields.type != SYS_DESC_tss_busy )
+ {
+ gprintk(XENLOG_ERR, "Non-32-bit-TSS segment provided for TR\n");
+ return -EINVAL;
+ }
+ }
+ else if ( !reg->attr.fields.s )
{
- gprintk(XENLOG_ERR, "Code or data segment provided for TR\n");
+ gprintk(XENLOG_ERR,
+ "System segment provided for a code or data segment\n");
return -EINVAL;
}
@@ -1387,7 +1395,8 @@ int arch_set_info_hvm_guest(struct vcpu *v, const vcpu_hvm_context_t *ctx)
#define SEG(s, r) ({ \
s = (struct segment_register){ .base = (r)->s ## _base, \
.limit = (r)->s ## _limit, \
- .attr.bytes = (r)->s ## _ar }; \
+ .attr.bytes = (r)->s ## _ar | \
+ (x86_seg_##s != x86_seg_tr ? 1 : 2) }; \
check_segment(&s, x86_seg_ ## s); })
rc = SEG(cs, regs);
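
(The constant OR-ed into attr.bytes here is the low part of the
segment type field: for code and data segments it pre-sets the
"accessed" bit, and for TR it turns an "available TSS" type into the
busy one that the reworked check_segment() now requires - that is my
reading of the descriptor layout, at any rate.)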
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index 388c4ea..d11b9c4 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -13,7 +13,11 @@ static struct file __initdata ucode;
static multiboot_info_t __initdata mbi = {
.flags = MBI_MODULES | MBI_LOADERNAME
};
-static module_t __initdata mb_modules[3];
+/*
+ * The array size needs to be one larger than the number of modules we
+ * support - see __start_xen().
+ */
+static module_t __initdata mb_modules[5];
static void __init edd_put_string(u8 *dst, size_t n, const char *src)
{
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f8ef6e5..6c30bec 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -387,13 +387,20 @@ void hvm_set_guest_tsc_fixed(struct vcpu *v, u64 guest_tsc, u64 at_tsc)
}
delta_tsc = guest_tsc - tsc;
- v->arch.hvm_vcpu.msr_tsc_adjust += delta_tsc
- - v->arch.hvm_vcpu.cache_tsc_offset;
v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc;
hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, at_tsc);
}
+static void hvm_set_guest_tsc_msr(struct vcpu *v, u64 guest_tsc)
+{
+ uint64_t tsc_offset = v->arch.hvm_vcpu.cache_tsc_offset;
+
+ hvm_set_guest_tsc(v, guest_tsc);
+ v->arch.hvm_vcpu.msr_tsc_adjust += v->arch.hvm_vcpu.cache_tsc_offset
+ - tsc_offset;
+}
+
void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
{
v->arch.hvm_vcpu.cache_tsc_offset += tsc_adjust
@@ -3940,7 +3947,7 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content,
break;
case MSR_IA32_TSC:
- hvm_set_guest_tsc(v, msr_content);
+ hvm_set_guest_tsc_msr(v, msr_content);
break;
case MSR_IA32_TSC_ADJUST:
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
index 228dac1..cc448e7 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -776,17 +776,19 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
if ( v->domain != d )
v = d->vcpu ? d->vcpu[0] : NULL;
- if ( !mfn_valid(mfn_x(mfn)) ||
- rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
- {
- *ipat = 1;
- return MTRR_TYPE_UNCACHABLE;
- }
-
+ /* Mask, not add, for order so it works with INVALID_MFN on unmapping */
if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+ {
+ if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+ {
+ *ipat = 1;
+ return MTRR_TYPE_UNCACHABLE;
+ }
+ /* Force invalid memory type so resolve_misconfig() will split it */
return -1;
+ }
if ( direct_mmio )
{
@@ -798,6 +800,12 @@ int epte_get_entry_emt(struct domain *d, unsigned long gfn, mfn_t mfn,
return MTRR_TYPE_WRBACK;
}
+ if ( !mfn_valid(mfn_x(mfn)) )
+ {
+ *ipat = 1;
+ return MTRR_TYPE_UNCACHABLE;
+ }
+
if ( !need_iommu(d) && !cache_flush_permitted(d) )
{
*ipat = 1;
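
(The masking trick matters because INVALID_MFN is all ones:
mfn_x(mfn) | ((1UL << order) - 1) leaves it all ones, so the
mmio_ro_ranges checks stay out of the way, whereas adding
(1UL << order) - 1 would wrap it around to a small, bogus frame
number. That is my understanding of the "Mask, not add" comment, at
least.)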
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 37bd6c4..8edc846 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -353,7 +353,7 @@ static void svm_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
data->msr_cstar = vmcb->cstar;
data->msr_syscall_mask = vmcb->sfmask;
data->msr_efer = v->arch.hvm_vcpu.guest_efer;
- data->msr_flags = -1ULL;
+ data->msr_flags = 0;
}
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 9ea014f..f982fc9 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -72,6 +72,9 @@ static int construct_vmcb(struct vcpu *v)
struct arch_svm_struct *arch_svm = &v->arch.hvm_svm;
struct vmcb_struct *vmcb = arch_svm->vmcb;
+ /* Build-time check of the size of VMCB AMD structure. */
+ BUILD_BUG_ON(sizeof(*vmcb) != PAGE_SIZE);
+
vmcb->_general1_intercepts =
GENERAL1_INTERCEPT_INTR | GENERAL1_INTERCEPT_NMI |
GENERAL1_INTERCEPT_SMI | GENERAL1_INTERCEPT_INIT |
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 0995496..4646ecc 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -552,6 +552,20 @@ static void vmx_load_vmcs(struct vcpu *v)
local_irq_restore(flags);
}
+void vmx_vmcs_reload(struct vcpu *v)
+{
+ /*
+ * As we may be running with interrupts disabled, we can't acquire
+ * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+ * the VMCS can't be taken away from us anymore if we still own it.
+ */
+ ASSERT(v->is_running || !local_irq_is_enabled());
+ if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+ return;
+
+ vmx_load_vmcs(v);
+}
+
int vmx_cpu_up_prepare(unsigned int cpu)
{
/*
@@ -1090,6 +1104,9 @@ static int construct_vmcs(struct vcpu *v)
vmx_disable_intercept_for_msr(v, MSR_IA32_BNDCFGS, MSR_TYPE_R | MSR_TYPE_W);
}
+ /* All guest MSR state is dirty. */
+ v->arch.hvm_vmx.msr_state.flags = ((1u << VMX_MSR_COUNT) - 1);
+
/* I/O access bitmap. */
__vmwrite(IO_BITMAP_A, __pa(d->arch.hvm_domain.io_bitmap));
__vmwrite(IO_BITMAP_B, __pa(d->arch.hvm_domain.io_bitmap) + PAGE_SIZE);
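
(The ((1u << VMX_MSR_COUNT) - 1) expression sets one dirty bit per
guest MSR slot - with the three slots visible in this debdiff (LSTAR,
STAR, SYSCALL_MASK) that would be 0x7 - so that every entry gets
reloaded into hardware; the same mask reappears in vmx_load_cpu_state
below.)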
@@ -1652,10 +1669,7 @@ void vmx_do_resume(struct vcpu *v)
bool_t debug_state;
if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
- {
- if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
- vmx_load_vmcs(v);
- }
+ vmx_vmcs_reload(v);
else
{
/*
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7b2c50c..9a42e2e 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -739,13 +739,12 @@ static int vmx_vmcs_restore(struct vcpu *v, struct hvm_hw_cpu *c)
static void vmx_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
{
struct vmx_msr_state *guest_state = &v->arch.hvm_vmx.msr_state;
- unsigned long guest_flags = guest_state->flags;
data->shadow_gs = v->arch.hvm_vmx.shadow_gs;
data->msr_cstar = v->arch.hvm_vmx.cstar;
/* save msrs */
- data->msr_flags = guest_flags;
+ data->msr_flags = 0;
data->msr_lstar = guest_state->msrs[VMX_INDEX_MSR_LSTAR];
data->msr_star = guest_state->msrs[VMX_INDEX_MSR_STAR];
data->msr_syscall_mask = guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK];
@@ -756,7 +755,7 @@ static void vmx_load_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
struct vmx_msr_state *guest_state = &v->arch.hvm_vmx.msr_state;
/* restore msrs */
- guest_state->flags = data->msr_flags & 7;
+ guest_state->flags = ((1u << VMX_MSR_COUNT) - 1);
guest_state->msrs[VMX_INDEX_MSR_LSTAR] = data->msr_lstar;
guest_state->msrs[VMX_INDEX_MSR_STAR] = data->msr_star;
guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK] = data->msr_syscall_mask;
@@ -896,6 +895,18 @@ static void vmx_ctxt_switch_from(struct vcpu *v)
if ( unlikely(!this_cpu(vmxon)) )
return;
+ if ( !v->is_running )
+ {
+ /*
+ * When this vCPU isn't marked as running anymore, a remote pCPU's
+ * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+ * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+ * away the VMCS from us. As we're running with interrupts disabled,
+ * we also can't call vmx_vmcs_enter().
+ */
+ vmx_vmcs_reload(v);
+ }
+
vmx_fpu_leave(v);
vmx_save_guest_msrs(v);
vmx_restore_host_msrs();
diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
index 3b025d5..9e246b6 100644
--- a/xen/arch/x86/mm/p2m-pt.c
+++ b/xen/arch/x86/mm/p2m-pt.c
@@ -452,7 +452,7 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
mfn |= _PAGE_PSE_PAT >> PAGE_SHIFT;
}
else
- mfn &= ~(_PAGE_PSE_PAT >> PAGE_SHIFT);
+ mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
flags |= _PAGE_PSE;
}
e = l1e_from_pfn(mfn, flags);
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6a45185..162120c 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2048,7 +2048,8 @@ p2m_flush_table(struct p2m_domain *p2m)
ASSERT(page_list_empty(&p2m->pod.super));
ASSERT(page_list_empty(&p2m->pod.single));
- if ( p2m->np2m_base == P2M_BASE_EADDR )
+ /* No need to flush if it's already empty */
+ if ( p2m_is_nestedp2m(p2m) && p2m->np2m_base == P2M_BASE_EADDR )
{
p2m_unlock(p2m);
return;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b130671..1bfe4ce 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -890,6 +890,17 @@ void __init noreturn __start_xen(unsigned long mbi_p)
mod[i].reserved = 0;
}
+ if ( efi_enabled )
+ {
+ /*
+ * This needs to remain in sync with xen_in_range() and the
+ * respective reserve_e820_ram() invocation below.
+ */
+ mod[mbi->mods_count].mod_start = PFN_DOWN(mbi->mem_upper);
+ mod[mbi->mods_count].mod_end = __pa(__2M_rwdata_end) -
+ (mbi->mem_upper & PAGE_MASK);
+ }
+
modules_headroom = bzimage_headroom(bootstrap_map(mod), mod->mod_end);
bootstrap_map(NULL);
@@ -925,7 +936,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
1UL << (PAGE_SHIFT + 32)) )
e = min(HYPERVISOR_VIRT_END - DIRECTMAP_VIRT_START,
1UL << (PAGE_SHIFT + 32));
-#define reloc_size ((__pa(&_end) + mask) & ~mask)
+#define reloc_size ((__pa(__2M_rwdata_end) + mask) & ~mask)
/* Is the region suitable for relocating Xen? */
if ( !xen_phys_start && e <= limit )
{
@@ -1070,8 +1081,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
if ( mod[j].reserved )
continue;
- /* Don't overlap with other modules. */
- end = consider_modules(s, e, size, mod, mbi->mods_count, j);
+ /* Don't overlap with other modules (or Xen itself). */
+ end = consider_modules(s, e, size, mod,
+ mbi->mods_count + efi_enabled, j);
if ( highmem_start && end > highmem_start )
continue;
@@ -1096,9 +1108,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
*/
while ( !kexec_crash_area.start )
{
- /* Don't overlap with modules. */
- e = consider_modules(s, e, PAGE_ALIGN(kexec_crash_area.size),
- mod, mbi->mods_count, -1);
+ /* Don't overlap with modules (or Xen itself). */
+ e = consider_modules(s, e, PAGE_ALIGN(kexec_crash_area.size), mod,
+ mbi->mods_count + efi_enabled, -1);
if ( s >= e )
break;
if ( e > kexec_crash_area_limit )
@@ -1122,8 +1134,10 @@ void __init noreturn __start_xen(unsigned long mbi_p)
if ( !xen_phys_start )
panic("Not enough memory to relocate Xen.");
- reserve_e820_ram(&boot_e820, efi_enabled ? mbi->mem_upper : __pa(&_start),
- __pa(&_end));
+
+ /* This needs to remain in sync with xen_in_range(). */
+ reserve_e820_ram(&boot_e820, efi_enabled ? mbi->mem_upper : __pa(_stext),
+ __pa(__2M_rwdata_end));
/* Late kexec reservation (dynamic start address). */
kexec_reserve_area(&boot_e820);
@@ -1672,7 +1686,7 @@ int __hwdom_init xen_in_range(unsigned long mfn)
paddr_t start, end;
int i;
- enum { region_s3, region_text, region_bss, nr_regions };
+ enum { region_s3, region_ro, region_rw, nr_regions };
static struct {
paddr_t s, e;
} xen_regions[nr_regions] __hwdom_initdata;
@@ -1683,12 +1697,20 @@ int __hwdom_init xen_in_range(unsigned long mfn)
/* S3 resume code (and other real mode trampoline code) */
xen_regions[region_s3].s = bootsym_phys(trampoline_start);
xen_regions[region_s3].e = bootsym_phys(trampoline_end);
- /* hypervisor code + data */
- xen_regions[region_text].s =__pa(&_stext);
- xen_regions[region_text].e = __pa(&__init_begin);
- /* bss */
- xen_regions[region_bss].s = __pa(&__bss_start);
- xen_regions[region_bss].e = __pa(&__bss_end);
+
+ /*
+ * This needs to remain in sync with the uses of the same symbols in
+ * - __start_xen() (above)
+ * - is_xen_fixed_mfn()
+ * - tboot_shutdown()
+ */
+
+ /* hypervisor .text + .rodata */
+ xen_regions[region_ro].s = __pa(&_stext);
+ xen_regions[region_ro].e = __pa(&__2M_rodata_end);
+ /* hypervisor .data + .bss */
+ xen_regions[region_rw].s = __pa(&__2M_rwdata_start);
+ xen_regions[region_rw].e = __pa(&__2M_rwdata_end);
}
start = (paddr_t)mfn << PAGE_SHIFT;
diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
index e5d7c42..562efcd 100644
--- a/xen/arch/x86/tboot.c
+++ b/xen/arch/x86/tboot.c
@@ -12,6 +12,7 @@
#include <asm/processor.h>
#include <asm/e820.h>
#include <asm/tboot.h>
+#include <asm/setup.h>
#include <crypto/vmac.h>
/* tboot=<physical address of shared page> */
@@ -282,7 +283,7 @@ static void tboot_gen_xenheap_integrity(const uint8_t key[TB_KEY_SIZE],
if ( !mfn_valid(mfn) )
continue;
- if ( (mfn << PAGE_SHIFT) < __pa(&_end) )
+ if ( is_xen_fixed_mfn(mfn) )
continue; /* skip Xen */
if ( (mfn >= PFN_DOWN(g_tboot_shared->tboot_base - 3 * PAGE_SIZE))
&& (mfn < PFN_UP(g_tboot_shared->tboot_base
@@ -363,20 +364,22 @@ void tboot_shutdown(uint32_t shutdown_type)
if ( shutdown_type == TB_SHUTDOWN_S3 )
{
/*
- * Xen regions for tboot to MAC
+ * Xen regions for tboot to MAC. This needs to remain in sync with
+ * xen_in_range().
*/
g_tboot_shared->num_mac_regions = 3;
/* S3 resume code (and other real mode trampoline code) */
g_tboot_shared->mac_regions[0].start = bootsym_phys(trampoline_start);
g_tboot_shared->mac_regions[0].size = bootsym_phys(trampoline_end) -
bootsym_phys(trampoline_start);
- /* hypervisor code + data */
+ /* hypervisor .text + .rodata */
g_tboot_shared->mac_regions[1].start = (uint64_t)__pa(&_stext);
- g_tboot_shared->mac_regions[1].size = __pa(&__init_begin) -
+ g_tboot_shared->mac_regions[1].size = __pa(&__2M_rodata_end) -
__pa(&_stext);
- /* bss */
- g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__bss_start);
- g_tboot_shared->mac_regions[2].size = __pa(&__bss_end) - __pa(&__bss_start);
+ /* hypervisor .data + .bss */
+ g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__2M_rwdata_start);
+ g_tboot_shared->mac_regions[2].size = __pa(&__2M_rwdata_end) -
+ __pa(&__2M_rwdata_start);
/*
* MAC domains and other Xen memory
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
index b06c456..3dc6f10 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -331,7 +331,11 @@ union vex {
#define copy_REX_VEX(ptr, rex, vex) do { \
if ( (vex).opcx != vex_none ) \
+ { \
+ if ( !mode_64bit() ) \
+ vex.reg |= 8; \
ptr[0] = 0xc4, ptr[1] = (vex).raw[0], ptr[2] = (vex).raw[1]; \
+ } \
else if ( mode_64bit() ) \
ptr[1] = rex | REX_PREFIX; \
} while (0)
@@ -870,15 +874,15 @@ do{ struct fpu_insn_ctxt fic; \
put_fpu(&fic); \
} while (0)
-#define emulate_fpu_insn_stub(_bytes...) \
+#define emulate_fpu_insn_stub(bytes...) \
do { \
- uint8_t *buf = get_stub(stub); \
- unsigned int _nr = sizeof((uint8_t[]){ _bytes }); \
- struct fpu_insn_ctxt fic = { .insn_bytes = _nr }; \
- memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1); \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
- stub.func(); \
- put_fpu(&fic); \
+ unsigned int nr_ = sizeof((uint8_t[]){ bytes }); \
+ struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ }; \
+ memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1); \
+ get_fpu(X86EMUL_FPU_fpu, &fic_); \
+ asm volatile ( "call *%[stub]" : "+m" (fic_) : \
+ [stub] "rm" (stub.func) ); \
+ put_fpu(&fic_); \
put_stub(stub); \
} while (0)
@@ -893,7 +897,7 @@ do { \
"call *%[func];" \
_POST_EFLAGS("[eflags]", "[mask]", "[tmp]") \
: [eflags] "+g" (_regs.eflags), \
- [tmp] "=&r" (tmp_) \
+ [tmp] "=&r" (tmp_), "+m" (fic_) \
: [func] "rm" (stub.func), \
[mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) ); \
put_fpu(&fic_); \
@@ -1356,6 +1360,11 @@ protmode_load_seg(
}
memset(sreg, 0, sizeof(*sreg));
sreg->sel = sel;
+
+ /* Since CPL == SS.DPL, we need to put back DPL. */
+ if ( seg == x86_seg_ss )
+ sreg->attr.fields.dpl = sel;
+
return X86EMUL_OKAY;
}
@@ -2017,16 +2026,21 @@ x86_decode(
default:
BUG(); /* Shouldn't be possible. */
case 2:
- if ( in_realmode(ctxt, ops) || (state->regs->eflags & EFLG_VM) )
+ if ( state->regs->eflags & EFLG_VM )
break;
/* fall through */
case 4:
- if ( modrm_mod != 3 )
+ if ( modrm_mod != 3 || in_realmode(ctxt, ops) )
break;
/* fall through */
case 8:
/* VEX / XOP / EVEX */
generate_exception_if(rex_prefix || vex.pfx, EXC_UD, -1);
+ /*
+ * With operand size override disallowed (see above), op_bytes
+ * should not have changed from its default.
+ */
+ ASSERT(op_bytes == def_op_bytes);
vex.raw[0] = modrm;
if ( b == 0xc5 )
@@ -2053,6 +2067,12 @@ x86_decode(
op_bytes = 8;
}
}
+ else
+ {
+ /* Operand size fixed at 4 (no override via W bit). */
+ op_bytes = 4;
+ vex.b = 1;
+ }
switch ( b )
{
case 0x62:
@@ -2071,7 +2091,7 @@ x86_decode(
break;
}
}
- if ( mode_64bit() && !vex.r )
+ if ( !vex.r )
rex_prefix |= REX_R;
ext = vex.opcx;
@@ -2113,12 +2133,21 @@ x86_decode(
opcode |= b | MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK);
+ if ( !(d & ModRM) )
+ {
+ modrm_reg = modrm_rm = modrm_mod = modrm = 0;
+ break;
+ }
+
modrm = insn_fetch_type(uint8_t);
modrm_mod = (modrm & 0xc0) >> 6;
break;
}
+ }
+ if ( d & ModRM )
+ {
modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3);
modrm_rm = modrm & 0x07;
@@ -2182,6 +2211,17 @@ x86_decode(
break;
}
break;
+ case 0x20: /* mov cr,reg */
+ case 0x21: /* mov dr,reg */
+ case 0x22: /* mov reg,cr */
+ case 0x23: /* mov reg,dr */
+ /*
+ * Mov to/from cr/dr ignore the encoding of Mod, and behave as
+ * if they were encoded as reg/reg instructions. No further
+ * disp/SIB bytes are fetched.
+ */
+ modrm_mod = 3;
+ break;
}
break;
@@ -4730,7 +4770,7 @@ x86_emulate(
case X86EMUL_OPC(0x0f, 0x21): /* mov dr,reg */
case X86EMUL_OPC(0x0f, 0x22): /* mov reg,cr */
case X86EMUL_OPC(0x0f, 0x23): /* mov reg,dr */
- generate_exception_if(ea.type != OP_REG, EXC_UD, -1);
+ ASSERT(ea.type == OP_REG); /* Early operand adjustment ensures this. */
generate_exception_if(!mode_ring0(), EXC_GP, 0);
modrm_reg |= lock_prefix << 3;
if ( b & 2 )
@@ -5050,6 +5090,7 @@ x86_emulate(
}
case X86EMUL_OPC(0x0f, 0xa3): bt: /* bt */
+ generate_exception_if(lock_prefix, EXC_UD, 0);
emulate_2op_SrcV_nobyte("bt", src, dst, _regs.eflags);
dst.type = OP_NONE;
break;
diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
index 993c576..708ce78 100644
--- a/xen/arch/x86/x86_emulate/x86_emulate.h
+++ b/xen/arch/x86/x86_emulate/x86_emulate.h
@@ -71,7 +71,7 @@ enum x86_swint_emulation {
* Attribute for segment selector. This is a copy of bit 40:47 & 52:55 of the
* segment descriptor. It happens to match the format of an AMD SVM VMCB.
*/
-typedef union __attribute__((__packed__)) segment_attributes {
+typedef union segment_attributes {
uint16_t bytes;
struct
{
@@ -91,7 +91,7 @@ typedef union __attribute__((__packed__)) segment_attributes {
* Full state of a segment register (visible and hidden portions).
* Again, this happens to match the format of an AMD SVM VMCB.
*/
-struct __attribute__((__packed__)) segment_register {
+struct segment_register {
uint16_t sel;
segment_attributes_t attr;
uint32_t limit;
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 7676de9..1154996 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -299,7 +299,7 @@ SECTIONS
}
ASSERT(__image_base__ > XEN_VIRT_START ||
- _end <= XEN_VIRT_END - NR_CPUS * PAGE_SIZE,
+ __2M_rwdata_end <= XEN_VIRT_END - NR_CPUS * PAGE_SIZE,
"Xen image overlaps stubs area")
#ifdef CONFIG_KEXEC
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 85a0116..a5da858 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -92,7 +92,7 @@ static int setup_xstate_features(bool_t bsp)
if ( bsp )
{
- xstate_features = fls(xfeature_mask);
+ xstate_features = flsl(xfeature_mask);
xstate_offsets = xzalloc_array(unsigned int, xstate_features);
if ( !xstate_offsets )
return -ENOMEM;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 21797ca..17f9e1e 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -437,8 +437,8 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
goto fail_early;
}
- if ( !guest_handle_okay(exch.in.extent_start, exch.in.nr_extents) ||
- !guest_handle_okay(exch.out.extent_start, exch.out.nr_extents) )
+ if ( !guest_handle_subrange_okay(exch.in.extent_start, exch.nr_exchanged,
+ exch.in.nr_extents - 1) )
{
rc = -EFAULT;
goto fail_early;
@@ -448,11 +448,27 @@ static long memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
{
in_chunk_order = exch.out.extent_order - exch.in.extent_order;
out_chunk_order = 0;
+
+ if ( !guest_handle_subrange_okay(exch.out.extent_start,
+ exch.nr_exchanged >> in_chunk_order,
+ exch.out.nr_extents - 1) )
+ {
+ rc = -EFAULT;
+ goto fail_early;
+ }
}
else
{
in_chunk_order = 0;
out_chunk_order = exch.in.extent_order - exch.out.extent_order;
+
+ if ( !guest_handle_subrange_okay(exch.out.extent_start,
+ exch.nr_exchanged << out_chunk_order,
+ exch.out.nr_extents - 1) )
+ {
+ rc = -EFAULT;
+ goto fail_early;
+ }
}
d = rcu_lock_domain_by_any_id(exch.in.domid);
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ef8e0d8..6f7860a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -491,12 +491,15 @@ void smt_idle_mask_set(unsigned int cpu, const cpumask_t *idlers,
}
/*
- * Clear the bits of all the siblings of cpu from mask.
+ * Clear the bits of all the siblings of cpu from mask (if necessary).
*/
static inline
void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
{
- cpumask_andnot(mask, mask, per_cpu(cpu_sibling_mask, cpu));
+ const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
+
+ if ( cpumask_subset(cpu_siblings, mask) )
+ cpumask_andnot(mask, mask, per_cpu(cpu_sibling_mask, cpu));
}
/*
@@ -510,24 +513,26 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
*/
static int get_fallback_cpu(struct csched2_vcpu *svc)
{
- int cpu;
+ struct vcpu *v = svc->vcpu;
+ int cpu = v->processor;
- if ( likely(cpumask_test_cpu(svc->vcpu->processor,
- svc->vcpu->cpu_hard_affinity)) )
- return svc->vcpu->processor;
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
- &svc->rqd->active);
- cpu = cpumask_first(cpumask_scratch);
- if ( likely(cpu < nr_cpu_ids) )
+ if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
return cpu;
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
- cpupool_domain_cpumask(svc->vcpu->domain));
+ if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+ &svc->rqd->active)) )
+ {
+ cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
+ cpumask_scratch_cpu(cpu));
+ return cpumask_first(cpumask_scratch_cpu(cpu));
+ }
- ASSERT(!cpumask_empty(cpumask_scratch));
+ ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
- return cpumask_first(cpumask_scratch);
+ return cpumask_first(cpumask_scratch_cpu(cpu));
}
/*
@@ -898,6 +903,14 @@ __runq_remove(struct csched2_vcpu *svc)
void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
+static inline void
+tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
+{
+ __cpumask_set_cpu(cpu, &rqd->tickled);
+ smt_idle_mask_clear(cpu, &rqd->smt_idle);
+ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+}
+
/*
* Check what processor it is best to 'wake', for picking up a vcpu that has
* just been put (back) in the runqueue. Logic is as follows:
@@ -941,6 +954,9 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
(unsigned char *)&d);
}
+ cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
+ cpupool_domain_cpumask(new->vcpu->domain));
+
/*
* First of all, consider idle cpus, checking if we can just
* re-use the pcpu where we were running before.
@@ -953,7 +969,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
else
cpumask_copy(&mask, &rqd->smt_idle);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
i = cpumask_test_or_cycle(cpu, &mask);
if ( i < nr_cpu_ids )
{
@@ -968,7 +984,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
* gone through the scheduler yet.
*/
cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
i = cpumask_test_or_cycle(cpu, &mask);
if ( i < nr_cpu_ids )
{
@@ -984,7 +1000,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
*/
cpumask_andnot(&mask, &rqd->active, &rqd->idle);
cpumask_andnot(&mask, &mask, &rqd->tickled);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
if ( cpumask_test_cpu(cpu, &mask) )
{
cur = CSCHED2_VCPU(curr_on_cpu(cpu));
@@ -1062,9 +1078,8 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
sizeof(d),
(unsigned char *)&d);
}
- __cpumask_set_cpu(ipid, &rqd->tickled);
- smt_idle_mask_clear(ipid, &rqd->smt_idle);
- cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
+
+ tickle_cpu(ipid, rqd);
if ( unlikely(new->tickled_cpu != -1) )
SCHED_STAT_CRANK(tickled_cpu_overwritten);
@@ -1104,18 +1119,28 @@ static void reset_credit(const struct scheduler *ops, int cpu, s_time_t now,
list_for_each( iter, &rqd->svc )
{
+ unsigned int svc_cpu;
struct csched2_vcpu * svc;
int start_credit;
svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+ svc_cpu = svc->vcpu->processor;
ASSERT(!is_idle_vcpu(svc->vcpu));
ASSERT(svc->rqd == rqd);
+ /*
+ * If svc is running, it is our responsibility to make sure, here,
+ * that the credit it has spent so far gets accounted.
+ */
+ if ( svc->vcpu == curr_on_cpu(svc_cpu) )
+ burn_credits(rqd, svc, now);
+
start_credit = svc->credit;
- /* And add INIT * m, avoiding integer multiplication in the
- * common case. */
+ /*
+ * Add INIT * m, avoiding integer multiplication in the common case.
+ */
if ( likely(m==1) )
svc->credit += CSCHED2_CREDIT_INIT;
else
@@ -1378,7 +1403,9 @@ csched2_vcpu_sleep(const struct scheduler *ops, struct vcpu *vc)
SCHED_STAT_CRANK(vcpu_sleep);
if ( curr_on_cpu(vc->processor) == vc )
- cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+ {
+ tickle_cpu(vc->processor, svc->rqd);
+ }
else if ( __vcpu_on_runq(svc) )
{
ASSERT(svc->rqd == RQD(ops, vc->processor));
@@ -1492,7 +1519,7 @@ static int
csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
{
struct csched2_private *prv = CSCHED2_PRIV(ops);
- int i, min_rqi = -1, new_cpu;
+ int i, min_rqi = -1, new_cpu, cpu = vc->processor;
struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
s_time_t min_avgload = MAX_LOAD;
@@ -1512,7 +1539,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
* just grab the prv lock. Instead, we'll have to trylock, and
* do something else reasonable if we fail.
*/
- ASSERT(spin_is_locked(per_cpu(schedule_data, vc->processor).schedule_lock));
+ ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
if ( !read_trylock(&prv->lock) )
{
@@ -1526,6 +1553,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
goto out;
}
+ cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+ cpupool_domain_cpumask(vc->domain));
+
/*
* First check to see if we're here because someone else suggested a place
* for us to move.
@@ -1537,13 +1567,13 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
printk(XENLOG_WARNING "%s: target runqueue disappeared!\n",
__func__);
}
- else
+ else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+ &svc->migrate_rqd->active) )
{
- cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&svc->migrate_rqd->active);
- new_cpu = cpumask_any(cpumask_scratch);
- if ( new_cpu < nr_cpu_ids )
- goto out_up;
+ new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
+ goto out_up;
}
/* Fall-through to normal cpu pick */
}
@@ -1571,12 +1601,12 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
*/
if ( rqd == svc->rqd )
{
- if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
}
else if ( spin_trylock(&rqd->lock) )
{
- if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
rqd_avgload = rqd->b_avgload;
spin_unlock(&rqd->lock);
@@ -1598,9 +1628,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
goto out_up;
}
- cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&prv->rqd[min_rqi].active);
- new_cpu = cpumask_any(cpumask_scratch);
+ new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
BUG_ON(new_cpu >= nr_cpu_ids);
out_up:
@@ -1675,6 +1705,8 @@ static void migrate(const struct scheduler *ops,
struct csched2_runqueue_data *trqd,
s_time_t now)
{
+ int cpu = svc->vcpu->processor;
+
if ( unlikely(tb_init_done) )
{
struct {
@@ -1696,8 +1728,8 @@ static void migrate(const struct scheduler *ops,
svc->migrate_rqd = trqd;
__set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
__set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
- cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
SCHED_STAT_CRANK(migrate_requested);
+ tickle_cpu(cpu, svc->rqd);
}
else
{
@@ -1711,9 +1743,11 @@ static void migrate(const struct scheduler *ops,
}
__runq_deassign(svc);
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+ cpupool_domain_cpumask(svc->vcpu->domain));
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&trqd->active);
- svc->vcpu->processor = cpumask_any(cpumask_scratch);
+ svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
ASSERT(svc->vcpu->processor < nr_cpu_ids);
__runq_assign(svc, trqd);
@@ -1737,8 +1771,14 @@ static void migrate(const struct scheduler *ops,
static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
struct csched2_runqueue_data *rqd)
{
+ struct vcpu *v = svc->vcpu;
+ int cpu = svc->vcpu->processor;
+
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
+
return !(svc->flags & CSFLAG_runq_migrate_request) &&
- cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+ cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
}
static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
@@ -1928,10 +1968,40 @@ static void
csched2_vcpu_migrate(
const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
{
+ struct domain *d = vc->domain;
struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
struct csched2_runqueue_data *trqd;
+ s_time_t now = NOW();
+
+ /*
+ * Being passed a target pCPU which is outside of our cpupool is only
+ * valid if we are shutting down (or doing ACPI suspend), and we are
+ * moving everyone to the BSP, no matter whether or not the BSP is inside our
+ * cpupool.
+ *
+ * And since there indeed is the chance that it is not part of it, all
+ * we must do is remove _and_ unassign the vCPU from any runqueue, as
+ * well as updating v->processor with the target, so that the suspend
+ * process can continue.
+ *
+ * It will then be during resume that a new, meaningful, value for
+ * v->processor will be chosen, and during actual domain unpause that
+ * the vCPU will be assigned to and added to the proper runqueue.
+ */
+ if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
+ {
+ ASSERT(system_state == SYS_STATE_suspend);
+ if ( __vcpu_on_runq(svc) )
+ {
+ __runq_remove(svc);
+ update_load(ops, svc->rqd, NULL, -1, now);
+ }
+ __runq_deassign(svc);
+ vc->processor = new_cpu;
+ return;
+ }
- /* Check if new_cpu is valid */
+ /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity. */
ASSERT(cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
@@ -1946,7 +2016,7 @@ csched2_vcpu_migrate(
* pointing to a pcpu where we can't run any longer.
*/
if ( trqd != svc->rqd )
- migrate(ops, svc, trqd, NOW());
+ migrate(ops, svc, trqd, now);
else
vc->processor = new_cpu;
}
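(The hunks above replace the single global cpumask_scratch with a per-pCPU
scratch mask, so concurrent scheduler operations on different pCPUs cannot
clobber each other's intermediate results, and they now also intersect with
the domain's cpupool mask. A minimal sketch of the resulting pattern, reusing
names from the diff; simplified, not the literal Xen code:

    /* Sketch: pick a target pCPU via this pCPU's private scratch mask. */
    static unsigned int pick_cpu(const struct vcpu *v,
                                 const struct csched2_runqueue_data *rqd)
    {
        unsigned int cpu = v->processor;

        /* hard affinity AND the cpupool's pCPUs AND the runqueue's pCPUs */
        cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
                    cpupool_domain_cpumask(v->domain));
        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
                    &rqd->active);
        return cpumask_any(cpumask_scratch_cpu(cpu));
    }
)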
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5b444c4..47b2155 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -84,7 +84,27 @@ static struct scheduler __read_mostly ops;
: (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
#define DOM2OP(_d) (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
-#define VCPU2OP(_v) (DOM2OP((_v)->domain))
+static inline struct scheduler *VCPU2OP(const struct vcpu *v)
+{
+ struct domain *d = v->domain;
+
+ if ( likely(d->cpupool != NULL) )
+ return d->cpupool->sched;
+
+ /*
+ * If d->cpupool is NULL, this is a vCPU of the idle domain. And this
+ * case is special because the idle domain does not really belong to
+ * a cpupool and, hence, doesn't really have a scheduler. In fact, its
+ * vCPUs (may) run on pCPUs which are in different pools, with different
+ * schedulers.
+ *
+ * What we want, in this case, is the scheduler of the pCPU where this
+ * particular idle vCPU is running. And, since v->processor never changes
+ * for idle vCPUs, it is safe to use it, with no locks, to figure that out.
+ */
+ ASSERT(is_idle_domain(d));
+ return per_cpu(scheduler, v->processor);
+}
#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
static inline void trace_runstate_change(struct vcpu *v, int new_state)
@@ -633,8 +653,11 @@ void vcpu_force_reschedule(struct vcpu *v)
void restore_vcpu_affinity(struct domain *d)
{
+ unsigned int cpu = smp_processor_id();
struct vcpu *v;
+ ASSERT(system_state == SYS_STATE_resume);
+
for_each_vcpu ( d, v )
{
spinlock_t *lock = vcpu_schedule_lock_irq(v);
@@ -643,18 +666,34 @@ void restore_vcpu_affinity(struct domain *d)
{
cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
v->affinity_broken = 0;
+
}
- if ( v->processor == smp_processor_id() )
+ /*
+ * During suspend (in cpu_disable_scheduler()), we moved every vCPU
+ * to BSP (which, as of now, is pCPU 0), as a temporary measure to
+ * allow the nonboot processors to have their data structure freed
+ * and go to sleep. But nothing guarantees that the BSP is a valid
+ * pCPU for a particular domain.
+ *
+ * Therefore, here, before actually unpausing the domains, we should
+ * set v->processor of each of their vCPUs to something that will
+ * make sense for the scheduler of the cpupool they are in.
+ */
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
+ v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+
+ if ( v->processor == cpu )
{
set_bit(_VPF_migrating, &v->pause_flags);
- vcpu_schedule_unlock_irq(lock, v);
+ spin_unlock_irq(lock);
vcpu_sleep_nosync(v);
vcpu_migrate(v);
}
else
{
- vcpu_schedule_unlock_irq(lock, v);
+ spin_unlock_irq(lock);
}
}
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index d793f5d..5e81813 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -244,8 +244,7 @@ void iommu_domain_destroy(struct domain *d)
if ( !iommu_enabled || !dom_iommu(d)->platform_ops )
return;
- if ( need_iommu(d) )
- iommu_teardown(d);
+ iommu_teardown(d);
arch_iommu_domain_destroy(d);
}
diff --git a/xen/include/asm-arm/config.h b/xen/include/asm-arm/config.h
index ba61f65..6a92f53 100644
--- a/xen/include/asm-arm/config.h
+++ b/xen/include/asm-arm/config.h
@@ -46,6 +46,8 @@
#define MAX_VIRT_CPUS 8
#endif
+#define INVALID_VCPU_ID MAX_VIRT_CPUS
+
#define asmlinkage /* Nothing needed */
#define __LINUX_ARM_ARCH__ 7
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index af60fe3..c0a25ae 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -24,7 +24,7 @@
#define cpu_has_arm (boot_cpu_feature32(arm) == 1)
#define cpu_has_thumb (boot_cpu_feature32(thumb) >= 1)
#define cpu_has_thumb2 (boot_cpu_feature32(thumb) >= 3)
-#define cpu_has_jazelle (boot_cpu_feature32(jazelle) >= 0)
+#define cpu_has_jazelle (boot_cpu_feature32(jazelle) > 0)
#define cpu_has_thumbee (boot_cpu_feature32(thumbee) == 1)
#define cpu_has_aarch32 (cpu_has_arm || cpu_has_thumb)
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index fdb6b47..9e71776 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -95,6 +95,9 @@ struct p2m_domain {
/* back pointer to domain */
struct domain *domain;
+
+ /* Keep track of which CPU this p2m was used on, and by which vCPU */
+ uint8_t last_vcpu_ran[NR_CPUS];
};
/*
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index c492d6d..a0f9344 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -292,24 +292,20 @@ extern size_t cacheline_bytes;
static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
{
- size_t off;
const void *end = p + size;
+ size_t cacheline_mask = cacheline_bytes - 1;
dsb(sy); /* So the CPU issues all writes to the range */
- off = (unsigned long)p % cacheline_bytes;
- if ( off )
+ if ( (uintptr_t)p & cacheline_mask )
{
- p -= off;
+ p = (void *)((uintptr_t)p & ~cacheline_mask);
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
p += cacheline_bytes;
- size -= cacheline_bytes - off;
}
- off = (unsigned long)end % cacheline_bytes;
- if ( off )
+ if ( (uintptr_t)end & cacheline_mask )
{
- end -= off;
- size -= off;
+ end = (void *)((uintptr_t)end & ~cacheline_mask);
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (end));
}
@@ -323,9 +319,10 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
static inline int clean_dcache_va_range(const void *p, unsigned long size)
{
- const void *end;
+ const void *end = p + size;
dsb(sy); /* So the CPU issues all writes to the range */
- for ( end = p + size; p < end; p += cacheline_bytes )
+ p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));
+ for ( ; p < end; p += cacheline_bytes )
asm volatile (__clean_dcache_one(0) : : "r" (p));
dsb(sy); /* So we know the flushes happen before continuing */
/* ARM callers assume that dcache_* functions cannot fail. */
@@ -335,9 +332,10 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
static inline int clean_and_invalidate_dcache_va_range
(const void *p, unsigned long size)
{
- const void *end;
+ const void *end = p + size;
dsb(sy); /* So the CPU issues all writes to the range */
- for ( end = p + size; p < end; p += cacheline_bytes )
+ p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));
+ for ( ; p < end; p += cacheline_bytes )
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
dsb(sy); /* So we know the flushes happen before continuing */
/* ARM callers assume that dcache_* functions cannot fail. */
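(The rewritten dcache helpers above round addresses down to a cache-line
boundary with mask arithmetic rather than a modulo and a subtraction. A
standalone illustration of the rounding, assuming a hypothetical
cacheline_bytes of 64; any power of two behaves the same:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uintptr_t cacheline_bytes = 64;        /* example value */
        uintptr_t mask = cacheline_bytes - 1;  /* 0x3f */
        uintptr_t p = 0x1234;

        /* Round down to the containing cache line: 0x1234 -> 0x1200 */
        printf("%#lx -> %#lx\n", (unsigned long)p,
               (unsigned long)(p & ~mask));
        return 0;
    }
)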
diff --git a/xen/include/asm-arm/sysregs.h b/xen/include/asm-arm/sysregs.h
index 570f43e..887368e 100644
--- a/xen/include/asm-arm/sysregs.h
+++ b/xen/include/asm-arm/sysregs.h
@@ -90,6 +90,7 @@
#define HSR_SYSREG_ICC_SGI1R_EL1 HSR_SYSREG(3,0,c12,c11,5)
#define HSR_SYSREG_ICC_ASGI1R_EL1 HSR_SYSREG(3,1,c12,c11,6)
#define HSR_SYSREG_ICC_SGI0R_EL1 HSR_SYSREG(3,2,c12,c11,7)
+#define HSR_SYSREG_ICC_SRE_EL1 HSR_SYSREG(3,0,c12,c12,5)
#define HSR_SYSREG_CONTEXTIDR_EL1 HSR_SYSREG(3,0,c13,c0,1)
#define HSR_SYSREG_PMCR_EL0 HSR_SYSREG(3,3,c9,c12,0)
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 300f461..51b187f 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -69,7 +69,7 @@ struct pending_irq
unsigned long status;
struct irq_desc *desc; /* only set if the irq corresponds to a physical irq */
unsigned int irq;
-#define GIC_INVALID_LR ~(uint8_t)0
+#define GIC_INVALID_LR (uint8_t)~0
uint8_t lr;
uint8_t priority;
/* inflight is used to append instances of pending_irq to
@@ -107,7 +107,9 @@ struct vgic_irq_rank {
/*
* It's more convenient to store a target VCPU per vIRQ
- * than the register ITARGETSR/IROUTER itself
+ * than the register ITARGETSR/IROUTER itself.
+ * Use atomic operations to read/write the vcpu fields to avoid
+ * taking the rank lock.
*/
uint8_t vcpu[32];
};
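(The GIC_INVALID_LR change above is not cosmetic: because of C's integer
promotions, ~(uint8_t)0 is the int -1 (0xffffffff), while (uint8_t)~0 is
0xff, so only the latter compares equal to a uint8_t sentinel. A standalone
demonstration:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t lr = 0xff;

        printf("%d %d\n",
               lr == ~(uint8_t)0,   /* 0: compares 255 with int -1 */
               lr == (uint8_t)~0);  /* 1: compares 255 with 255 */
        return 0;
    }
)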
diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
index bad2382..a3cd1b1 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -308,7 +308,7 @@ enum VMEXIT_EXITCODE
/* Definition of segment state is borrowed by the generic HVM code. */
typedef struct segment_register svm_segment_register_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -322,7 +322,7 @@ typedef union __packed
} fields;
} eventinj_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -340,7 +340,7 @@ typedef union __packed
} fields;
} vintr_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -357,7 +357,7 @@ typedef union __packed
} fields;
} ioio_info_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -366,7 +366,7 @@ typedef union __packed
} fields;
} lbrctrl_t;
-typedef union __packed
+typedef union
{
uint32_t bytes;
struct
@@ -401,7 +401,7 @@ typedef union __packed
#define IOPM_SIZE (12 * 1024)
#define MSRPM_SIZE (8 * 1024)
-struct __packed vmcb_struct {
+struct vmcb_struct {
u32 _cr_intercepts; /* offset 0x00 - cleanbit 0 */
u32 _dr_intercepts; /* offset 0x04 - cleanbit 0 */
u32 _exception_intercepts; /* offset 0x08 - cleanbit 0 */
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 997f4f5..0dfd5f8 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -238,6 +238,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
void vmx_vmcs_enter(struct vcpu *v);
bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
#define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004
#define CPU_BASED_USE_TSC_OFFSETING 0x00000008
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 1b4d1c3..6687dbc 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -253,8 +253,8 @@ struct spage_info
#define is_xen_heap_mfn(mfn) \
(__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
#define is_xen_fixed_mfn(mfn) \
- ((((mfn) << PAGE_SHIFT) >= __pa(&_start)) && \
- (((mfn) << PAGE_SHIFT) <= __pa(&_end)))
+ ((((mfn) << PAGE_SHIFT) >= __pa(&_stext)) && \
+ (((mfn) << PAGE_SHIFT) <= __pa(&__2M_rwdata_end)))
#define PRtype_info "016lx"/* should only be used for printk's */
diff --git a/xen/include/asm-x86/x86_64/uaccess.h b/xen/include/asm-x86/x86_64/uaccess.h
index 953abe7..4275e66 100644
--- a/xen/include/asm-x86/x86_64/uaccess.h
+++ b/xen/include/asm-x86/x86_64/uaccess.h
@@ -29,8 +29,9 @@ extern void *xlat_malloc(unsigned long *xlat_page_current, size_t size);
/*
* Valid if in +ve half of 48-bit address space, or above Xen-reserved area.
* This is also valid for range checks (addr, addr+size). As long as the
- * start address is outside the Xen-reserved area then we will access a
- * non-canonical address (and thus fault) before ever reaching VIRT_START.
+ * start address is outside the Xen-reserved area, sequential accesses
+ * (starting at addr) will hit a non-canonical address (and thus fault)
+ * before ever reaching VIRT_START.
*/
#define __addr_ok(addr) \
(((unsigned long)(addr) < (1UL<<47)) || \
@@ -40,7 +41,8 @@ extern void *xlat_malloc(unsigned long *xlat_page_current, size_t size);
(__addr_ok(addr) || is_compat_arg_xlat_range(addr, size))
#define array_access_ok(addr, count, size) \
- (access_ok(addr, (count)*(size)))
+ (likely(((count) ?: 0UL) < (~0UL / (size))) && \
+ access_ok(addr, (count) * (size)))
#define __compat_addr_ok(d, addr) \
((unsigned long)(addr) < HYPERVISOR_COMPAT_VIRT_START(d))
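(The array_access_ok change above rejects ranges whose count * size product
would overflow an unsigned long; previously the wrapped product could be
small enough to pass access_ok. A standalone illustration, with hypothetical
guest-supplied values:

    #include <stdio.h>

    int main(void)
    {
        unsigned long count = 0x2000000000000000UL; /* hypothetical value */
        unsigned long size = 16;

        printf("count * size = %#lx\n", count * size);       /* wraps to 0 */
        printf("guard passes: %d\n", count < (~0UL / size)); /* 0: rejected */
        return 0;
    }
)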
diff --git a/xen/include/public/arch-x86/hvm/save.h b/xen/include/public/arch-x86/hvm/save.h
index 8d73b51..419a3b2 100644
--- a/xen/include/public/arch-x86/hvm/save.h
+++ b/xen/include/public/arch-x86/hvm/save.h
@@ -135,7 +135,7 @@ struct hvm_hw_cpu {
uint64_t shadow_gs;
/* msr content saved/restored. */
- uint64_t msr_flags;
+ uint64_t msr_flags; /* Obsolete, ignored. */
uint64_t msr_lstar;
uint64_t msr_star;
uint64_t msr_cstar;
@@ -249,7 +249,7 @@ struct hvm_hw_cpu_compat {
uint64_t shadow_gs;
/* msr content saved/restored. */
- uint64_t msr_flags;
+ uint64_t msr_flags; /* Obsolete, ignored. */
uint64_t msr_lstar;
uint64_t msr_star;
uint64_t msr_cstar;
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 5bf840f..315a4e8 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -222,9 +222,9 @@ DEFINE_XEN_GUEST_HANDLE(xen_machphys_mapping_t);
* XENMEM_add_to_physmap_batch only. */
#define XENMAPSPACE_dev_mmio 5 /* device mmio region
ARM only; the region is mapped in
- Stage-2 using the memory attribute
- "Device-nGnRE" (previously named
- "Device" on ARMv7) */
+ Stage-2 using the Normal Memory
+ Inner/Outer Write-Back Cacheable
+ memory attribute. */
/* ` } */
/*
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 95460af..edc9086 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -712,18 +712,13 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
XSM_ASSERT_ACTION(XSM_OTHER);
switch ( op )
{
- case XENPMU_mode_set:
- case XENPMU_mode_get:
- case XENPMU_feature_set:
- case XENPMU_feature_get:
- return xsm_default_action(XSM_PRIV, d, current->domain);
case XENPMU_init:
case XENPMU_finish:
case XENPMU_lvtpc_set:
case XENPMU_flush:
return xsm_default_action(XSM_HOOK, d, current->domain);
default:
- return -EPERM;
+ return xsm_default_action(XSM_PRIV, d, current->domain);
}
}
diff -Nru xen-4.8.1~pre.2017.01.23/Config.mk xen-4.8.1/Config.mk
--- xen-4.8.1~pre.2017.01.23/Config.mk 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/Config.mk 2017-04-10 14:21:48.000000000 +0100
@@ -277,8 +277,8 @@
MINIOS_UPSTREAM_URL ?= git://xenbits.xen.org/mini-os.git
endif
OVMF_UPSTREAM_REVISION ?= bc54e50e0fe03c570014f363b547426913e92449
-QEMU_UPSTREAM_REVISION ?= qemu-xen-4.8.0
-MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.8.0
+QEMU_UPSTREAM_REVISION ?= qemu-xen-4.8.1
+MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.8.1
# Wed Sep 28 11:50:04 2016 +0200
# minios: fix build issue with xen_*mb defines
@@ -289,9 +289,7 @@
ETHERBOOT_NICS ?= rtl8139 8086100e
-QEMU_TRADITIONAL_REVISION ?= 095261a9ad5c31b9ed431f8382e8aa223089c85b
-# Mon Nov 14 17:19:46 2016 +0000
-# qemu: ioport_read, ioport_write: be defensive about 32-bit addresses
+QEMU_TRADITIONAL_REVISION ?= xen-4.8.1
# Specify which qemu-dm to use. This may be `ioemu' to use the old
# Mercurial in-tree version, or a local directory, or a git URL.
diff -Nru xen-4.8.1~pre.2017.01.23/debian/changelog xen-4.8.1/debian/changelog
--- xen-4.8.1~pre.2017.01.23/debian/changelog 2017-01-23 16:23:58.000000000 +0000
+++ xen-4.8.1/debian/changelog 2017-04-18 18:05:00.000000000 +0100
@@ -1,3 +1,13 @@
+xen (4.8.1-1) unstable; urgency=high
+
+ * Update to upstream 4.8.1 release.
+ Changes include numerous bugfixes, including security fixes for:
+ XSA-212 / CVE-2017-7228 Closes:#859560
+ XSA-207 / no cve yet Closes:#856229
+ XSA-206 / no cve yet no Debian bug
+
+ -- Ian Jackson <ian.jackson@eu.citrix.com> Tue, 18 Apr 2017 18:05:00 +0100
+
xen (4.8.1~pre.2017.01.23-1) unstable; urgency=medium
* Update to current upstream stable-4.8 git branch (Xen 4.8.1-pre).
diff -Nru xen-4.8.1~pre.2017.01.23/debian/control.md5sum xen-4.8.1/debian/control.md5sum
--- xen-4.8.1~pre.2017.01.23/debian/control.md5sum 2017-01-23 16:23:58.000000000 +0000
+++ xen-4.8.1/debian/control.md5sum 2017-04-18 18:05:13.000000000 +0100
@@ -1,4 +1,4 @@
-d74356cd54456cb07dc4a89ff001c233 debian/changelog
+414390ca652da67ac85ebd905500eb66 debian/changelog
dc7b5d9f0538e3180af4e9aff9b0bd57 debian/bin/gencontrol.py
20e336dbea44b1641802eff0dde9569b debian/templates/control.main.in
a15fa64ce6deead28d33c1581b14dba7 debian/templates/xen-hypervisor.postinst.in
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/config-prefix.diff xen-4.8.1/debian/patches/config-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/config-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/config-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:45 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 3ff81ee48afd44afd4c5bc2dbd4daf2edeb0d8fc
+X-Dgit-Generated: 4.8.1-1 a376dc60f2926c349685de141c3993c7d791a494
Subject: config-prefix.diff
Patch-Name: config-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/Config.mk
-+++ xen-4.8.1~pre.2017.01.23/Config.mk
+--- xen-4.8.1.orig/Config.mk
++++ xen-4.8.1/Config.mk
@@ -82,7 +82,7 @@ EXTRA_LIB += $(EXTRA_PREFIX)/lib
endif
@@ -18,8 +18,8 @@
# The above requires that prefix contains *no spaces*. This variable is here
# to permit the user to set PYTHON_PREFIX_ARG to '' to workaround this bug:
# https://bugs.launchpad.net/ubuntu/+bug/362570
---- xen-4.8.1~pre.2017.01.23.orig/config/Paths.mk.in
-+++ xen-4.8.1~pre.2017.01.23/config/Paths.mk.in
+--- xen-4.8.1.orig/config/Paths.mk.in
++++ xen-4.8.1/config/Paths.mk.in
@@ -13,6 +13,7 @@
# http://wiki.xen.org/wiki/Category:Host_Configuration#System_wide_xen_configuration
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/rerun-autogen.sh-stretch xen-4.8.1/debian/patches/rerun-autogen.sh-stretch
--- xen-4.8.1~pre.2017.01.23/debian/patches/rerun-autogen.sh-stretch 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/rerun-autogen.sh-stretch 2017-04-18 18:07:28.000000000 +0100
@@ -1,6 +1,6 @@
From: Ian Jackson <ian.jackson@citrix.com>
Date: Fri, 28 Oct 2016 14:52:13 +0100
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 b3c8b0d4caa81fac565ec8439f33ff8677827dc5
+X-Dgit-Generated: 4.8.1-1 b1ceff30c4420ee49c49761e183b4ee2a66e3ed4
Subject: Rerun autogen.sh (stretch)
Using autoconf 2.69-10 (amd64)
@@ -9,8 +9,8 @@
---
---- xen-4.8.1~pre.2017.01.23.orig/configure
-+++ xen-4.8.1~pre.2017.01.23/configure
+--- xen-4.8.1.orig/configure
++++ xen-4.8.1/configure
@@ -641,6 +641,7 @@ infodir
docdir
oldincludedir
@@ -60,8 +60,8 @@
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
---- xen-4.8.1~pre.2017.01.23.orig/docs/configure
-+++ xen-4.8.1~pre.2017.01.23/docs/configure
+--- xen-4.8.1.orig/docs/configure
++++ xen-4.8.1/docs/configure
@@ -632,6 +632,7 @@ infodir
docdir
oldincludedir
@@ -111,8 +111,8 @@
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
---- xen-4.8.1~pre.2017.01.23.orig/stubdom/configure
-+++ xen-4.8.1~pre.2017.01.23/stubdom/configure
+--- xen-4.8.1.orig/stubdom/configure
++++ xen-4.8.1/stubdom/configure
@@ -659,6 +659,7 @@ infodir
docdir
oldincludedir
@@ -162,8 +162,8 @@
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
---- xen-4.8.1~pre.2017.01.23.orig/tools/configure
-+++ xen-4.8.1~pre.2017.01.23/tools/configure
+--- xen-4.8.1.orig/tools/configure
++++ xen-4.8.1/tools/configure
@@ -767,6 +767,7 @@ infodir
docdir
oldincludedir
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-blktap2-prefix.diff xen-4.8.1/debian/patches/tools-blktap2-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-blktap2-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-blktap2-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:53 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 a28766f0ef4d267d0af7becdca134ad5a1d669e1
+X-Dgit-Generated: 4.8.1-1 ad82a5763c9d4ebeb72fa838c4abc77b72596370
Subject: tools-blktap2-prefix.diff
Patch-Name: tools-blktap2-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/blktap2/control/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/blktap2/control/Makefile
+--- xen-4.8.1.orig/tools/blktap2/control/Makefile
++++ xen-4.8.1/tools/blktap2/control/Makefile
@@ -1,10 +1,7 @@
XEN_ROOT := $(CURDIR)/../../../
include $(XEN_ROOT)/tools/Rules.mk
@@ -68,8 +68,8 @@
rm -f *~
distclean: clean
---- xen-4.8.1~pre.2017.01.23.orig/tools/blktap2/vhd/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/blktap2/vhd/Makefile
+--- xen-4.8.1.orig/tools/blktap2/vhd/Makefile
++++ xen-4.8.1/tools/blktap2/vhd/Makefile
@@ -12,6 +12,7 @@ CFLAGS += -Werror
CFLAGS += -Wno-unused
CFLAGS += -I../include
@@ -78,8 +78,8 @@
ifeq ($(CONFIG_X86_64),y)
CFLAGS += -fPIC
---- xen-4.8.1~pre.2017.01.23.orig/tools/blktap2/vhd/lib/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/blktap2/vhd/lib/Makefile
+--- xen-4.8.1.orig/tools/blktap2/vhd/lib/Makefile
++++ xen-4.8.1/tools/blktap2/vhd/lib/Makefile
@@ -2,25 +2,19 @@ XEN_ROOT=$(CURDIR)/../../../..
BLKTAP_ROOT := ../..
include $(XEN_ROOT)/tools/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-console-prefix.diff xen-4.8.1/debian/patches/tools-console-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-console-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-console-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:54 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 cdeb6d3730e004fbea6063379cc6ca80f9db5788
+X-Dgit-Generated: 4.8.1-1 54721627e1abd8f67827b3383ddfa6c174b572b9
Subject: tools-console-prefix.diff
Patch-Name: tools-console-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/console/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/console/Makefile
+--- xen-4.8.1.orig/tools/console/Makefile
++++ xen-4.8.1/tools/console/Makefile
@@ -8,6 +8,7 @@ CFLAGS += $(CFLAGS_libxenstore)
LDLIBS += $(LDLIBS_libxenctrl)
LDLIBS += $(LDLIBS_libxenstore)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-include-install.diff xen-4.8.1/debian/patches/tools-include-install.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-include-install.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-include-install.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:30 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 7169aa91d7ccc325357b27120340b57561cf8438
+X-Dgit-Generated: 4.8.1-1 732acd91e545566bca164b886afa82027df7c463
Subject: tools-include-install.diff
Patch-Name: tools-include-install.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/include/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/include/Makefile
+--- xen-4.8.1.orig/tools/include/Makefile
++++ xen-4.8.1/tools/include/Makefile
@@ -14,7 +14,6 @@ xen-foreign:
xen/.dir:
@rm -rf xen
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libfsimage-abiname.diff xen-4.8.1/debian/patches/tools-libfsimage-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libfsimage-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libfsimage-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:47 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 4ad1691c46fa9bebaa95e9e29b7081e446243c9d
+X-Dgit-Generated: 4.8.1-1 2a020aa59aec69c1d00f3fb8c86b188873e802ea
Subject: tools-libfsimage-abiname.diff
Patch-Name: tools-libfsimage-abiname.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libfsimage/common/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libfsimage/common/Makefile
+--- xen-4.8.1.orig/tools/libfsimage/common/Makefile
++++ xen-4.8.1/tools/libfsimage/common/Makefile
@@ -1,9 +1,6 @@
XEN_ROOT = $(CURDIR)/../../..
include $(XEN_ROOT)/tools/libfsimage/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libfsimage-prefix.diff xen-4.8.1/debian/patches/tools-libfsimage-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libfsimage-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libfsimage-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:55 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 a88cc0796836248ff332ad1c7e176c6a0609b4fa
+X-Dgit-Generated: 4.8.1-1 0fc6ef9d31deed6668d7f18924664cbde155ea85
Subject: tools-libfsimage-prefix.diff
Patch-Name: tools-libfsimage-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libfsimage/Rules.mk
-+++ xen-4.8.1~pre.2017.01.23/tools/libfsimage/Rules.mk
+--- xen-4.8.1.orig/tools/libfsimage/Rules.mk
++++ xen-4.8.1/tools/libfsimage/Rules.mk
@@ -3,10 +3,11 @@ include $(XEN_ROOT)/tools/Rules.mk
CFLAGS += -Wno-unknown-pragmas -I$(XEN_ROOT)/tools/libfsimage/common/ -DFSIMAGE_FSDIR=\"$(FSDIR)\"
CFLAGS += -Werror -D_GNU_SOURCE
@@ -22,8 +22,8 @@
FSLIB = fsimage.so
---- xen-4.8.1~pre.2017.01.23.orig/tools/libfsimage/common/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libfsimage/common/Makefile
+--- xen-4.8.1.orig/tools/libfsimage/common/Makefile
++++ xen-4.8.1/tools/libfsimage/common/Makefile
@@ -1,6 +1,8 @@
XEN_ROOT = $(CURDIR)/../../..
include $(XEN_ROOT)/tools/libfsimage/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxc-abiname.diff xen-4.8.1/debian/patches/tools-libxc-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxc-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libxc-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:48 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 db4a464d8cb49e3c33a1bd2f74f4321f8e20df2d
+X-Dgit-Generated: 4.8.1-1 45ad000e7e61a57a78bca482c464c78badbfeab5
Subject: tools-libxc-abiname.diff
Patch-Name: tools-libxc-abiname.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libxc/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libxc/Makefile
+--- xen-4.8.1.orig/tools/libxc/Makefile
++++ xen-4.8.1/tools/libxc/Makefile
@@ -1,9 +1,6 @@
XEN_ROOT = $(CURDIR)/../..
include $(XEN_ROOT)/tools/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxl-abiname.diff xen-4.8.1/debian/patches/tools-libxl-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxl-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libxl-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:49 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 4a36b7f60eeb84d27ba814d7dba8214bdb96fb0c
+X-Dgit-Generated: 4.8.1-1 b6860f8b5e4980eedd3e75e5e81be73343d92558
Subject: tools-libxl-abiname.diff
Patch-Name: tools-libxl-abiname.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libxl/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libxl/Makefile
+--- xen-4.8.1.orig/tools/libxl/Makefile
++++ xen-4.8.1/tools/libxl/Makefile
@@ -5,12 +5,6 @@
XEN_ROOT = $(CURDIR)/../..
include $(XEN_ROOT)/tools/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxl-prefix.diff xen-4.8.1/debian/patches/tools-libxl-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-libxl-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-libxl-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:57 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 a08a69ac537d3733ab0663d35991fa2c2c142108
+X-Dgit-Generated: 4.8.1-1 0c590f711182ecc6c0aaee5fc0bf89f384c98fce
Subject: tools-libxl-prefix.diff
Patch-Name: tools-libxl-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libxl/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libxl/Makefile
+--- xen-4.8.1.orig/tools/libxl/Makefile
++++ xen-4.8.1/tools/libxl/Makefile
@@ -12,6 +12,8 @@ CFLAGS += -I. -fPIC
ifeq ($(CONFIG_Linux),y)
LIBUUID_LIBS += -luuid
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-misc-prefix.diff xen-4.8.1/debian/patches/tools-misc-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-misc-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-misc-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:59 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 f63435d7a553a80a3490c5ef8999b5ac175bc5fe
+X-Dgit-Generated: 4.8.1-1 92bd9e6a61c01d45b42463d3097f87935167e731
Subject: tools-misc-prefix.diff
Patch-Name: tools-misc-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/misc/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/misc/Makefile
+--- xen-4.8.1.orig/tools/misc/Makefile
++++ xen-4.8.1/tools/misc/Makefile
@@ -54,12 +54,8 @@ all build: $(TARGETS_BUILD)
.PHONY: install
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-pygrub-prefix.diff xen-4.8.1/debian/patches/tools-pygrub-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-pygrub-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-pygrub-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:01 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 72f92b5bd96f0bd5726deadaeff420361fb13a0b
+X-Dgit-Generated: 4.8.1-1 e11fc351a6d75288200f781c656599ec3547c484
Subject: tools-pygrub-prefix.diff
Patch-Name: tools-pygrub-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/pygrub/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/pygrub/Makefile
+--- xen-4.8.1.orig/tools/pygrub/Makefile
++++ xen-4.8.1/tools/pygrub/Makefile
@@ -16,11 +16,6 @@ install: all
CC="$(CC)" CFLAGS="$(PY_CFLAGS)" LDFLAGS="$(PY_LDFLAGS)" $(PYTHON) \
setup.py install $(PYTHON_PREFIX_ARG) --root="$(DESTDIR)" \
@@ -21,8 +21,8 @@
.PHONY: clean
clean:
---- xen-4.8.1~pre.2017.01.23.orig/tools/pygrub/setup.py
-+++ xen-4.8.1~pre.2017.01.23/tools/pygrub/setup.py
+--- xen-4.8.1.orig/tools/pygrub/setup.py
++++ xen-4.8.1/tools/pygrub/setup.py
@@ -4,11 +4,13 @@ import os
import sys
@@ -37,8 +37,8 @@
include_dirs = [ XEN_ROOT + "/tools/libfsimage/common/" ],
library_dirs = [ XEN_ROOT + "/tools/libfsimage/common/" ],
libraries = ["fsimage"],
---- xen-4.8.1~pre.2017.01.23.orig/tools/pygrub/src/pygrub
-+++ xen-4.8.1~pre.2017.01.23/tools/pygrub/src/pygrub
+--- xen-4.8.1.orig/tools/pygrub/src/pygrub
++++ xen-4.8.1/tools/pygrub/src/pygrub
@@ -21,6 +21,8 @@ import xen.lowlevel.xc
import curses, _curses, curses.wrapper, curses.textpad, curses.ascii
import getopt
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-pygrub-remove-static-solaris-support xen-4.8.1/debian/patches/tools-pygrub-remove-static-solaris-support
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-pygrub-remove-static-solaris-support 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-pygrub-remove-static-solaris-support 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:29 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 33f2e5cf7348dc23eaac81e8ac9a9c7e6ed94f15
+X-Dgit-Generated: 4.8.1-1 00315ed8c451173d0d212d55a831023166f3b212
Subject: Remove static solaris support from pygrub
Patch-Name: tools-pygrub-remove-static-solaris-support
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/pygrub/src/pygrub
-+++ xen-4.8.1~pre.2017.01.23/tools/pygrub/src/pygrub
+--- xen-4.8.1.orig/tools/pygrub/src/pygrub
++++ xen-4.8.1/tools/pygrub/src/pygrub
@@ -16,7 +16,6 @@ import os, sys, string, struct, tempfile
import copy
import logging
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-python-prefix.diff xen-4.8.1/debian/patches/tools-python-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-python-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-python-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:02 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 20d81801d70a3d3d6517a6a2d28fd4eabcd99e07
+X-Dgit-Generated: 4.8.1-1 5ea3aead5ce755e99c1e811dc3bdf74cec9e991f
Subject: tools-python-prefix.diff
Patch-Name: tools-python-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/python/setup.py
-+++ xen-4.8.1~pre.2017.01.23/tools/python/setup.py
+--- xen-4.8.1.orig/tools/python/setup.py
++++ xen-4.8.1/tools/python/setup.py
@@ -5,6 +5,7 @@ import os, sys
XEN_ROOT = "../.."
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-rpath.diff xen-4.8.1/debian/patches/tools-rpath.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-rpath.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-rpath.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:51 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 e92e68cad6bfc85c595f35e4303932f34b985088
+X-Dgit-Generated: 4.8.1-1 31f508fde90e729a0f734dc00d0c75213f075a2e
Subject: tools-rpath.diff
Patch-Name: tools-rpath.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/Rules.mk
-+++ xen-4.8.1~pre.2017.01.23/tools/Rules.mk
+--- xen-4.8.1.orig/tools/Rules.mk
++++ xen-4.8.1/tools/Rules.mk
@@ -9,6 +9,8 @@ include $(XEN_ROOT)/Config.mk
export _INSTALL := $(INSTALL)
INSTALL = $(XEN_ROOT)/tools/cross-install
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xcutils-rpath.diff xen-4.8.1/debian/patches/tools-xcutils-rpath.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xcutils-rpath.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xcutils-rpath.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:05 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 dd796d91589bf69cf723f68c38eba61a483c2a17
+X-Dgit-Generated: 4.8.1-1 845c9126f103d039326ba1cc06575de8a2d32d39
Subject: tools-xcutils-rpath.diff
Patch-Name: tools-xcutils-rpath.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xcutils/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xcutils/Makefile
+--- xen-4.8.1.orig/tools/xcutils/Makefile
++++ xen-4.8.1/tools/xcutils/Makefile
@@ -19,6 +19,8 @@ CFLAGS += -Werror
CFLAGS_readnotes.o := $(CFLAGS_libxenevtchn) $(CFLAGS_libxenctrl) $(CFLAGS_libxenguest) -I$(XEN_ROOT)/tools/libxc $(CFLAGS_libxencall)
CFLAGS_lsevtchn.o := $(CFLAGS_libxenevtchn) $(CFLAGS_libxenctrl)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenmon-install.diff xen-4.8.1/debian/patches/tools-xenmon-install.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenmon-install.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenmon-install.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:31 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 3ddfd6f3b1e4b5acbb24ef0291eeb6edba20514d
+X-Dgit-Generated: 4.8.1-1 75dded97d0701561959c2fab12f0328058078b40
Subject: tools-xenmon-install.diff
Patch-Name: tools-xenmon-install.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenmon/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenmon/Makefile
+--- xen-4.8.1.orig/tools/xenmon/Makefile
++++ xen-4.8.1/tools/xenmon/Makefile
@@ -13,6 +13,10 @@
XEN_ROOT=$(CURDIR)/../..
include $(XEN_ROOT)/tools/Rules.mk
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenmon-prefix.diff xen-4.8.1/debian/patches/tools-xenmon-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenmon-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenmon-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:06 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 61efe37be33cc79b1c0ec9e9f337aeb54dde9f08
+X-Dgit-Generated: 4.8.1-1 3c1dc49f92bcdb9e031a419f3c0014b57fcb96a9
Subject: tools-xenmon-prefix.diff
Patch-Name: tools-xenmon-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenmon/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenmon/Makefile
+--- xen-4.8.1.orig/tools/xenmon/Makefile
++++ xen-4.8.1/tools/xenmon/Makefile
@@ -18,6 +18,7 @@ CFLAGS += $(CFLAGS_libxenevtchn)
CFLAGS += $(CFLAGS_libxenctrl)
LDLIBS += $(LDLIBS_libxenctrl)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenpaging-prefix.diff xen-4.8.1/debian/patches/tools-xenpaging-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenpaging-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenpaging-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:08 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 6566733aebb9e3bfd448434762fe15e6de6ec927
+X-Dgit-Generated: 4.8.1-1 6b66a39ea6db832a88d94c4d8e256f77e08fe1a3
Subject: tools-xenpaging-prefix.diff
Patch-Name: tools-xenpaging-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenpaging/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenpaging/Makefile
+--- xen-4.8.1.orig/tools/xenpaging/Makefile
++++ xen-4.8.1/tools/xenpaging/Makefile
@@ -4,7 +4,7 @@ include $(XEN_ROOT)/tools/Rules.mk
# xenpaging.c and file_ops.c incorrectly use libxc internals
CFLAGS += $(CFLAGS_libxentoollog) $(CFLAGS_libxenevtchn) $(CFLAGS_libxenctrl) $(CFLAGS_libxenstore) $(PTHREAD_CFLAGS) -I$(XEN_ROOT)/tools/libxc $(CFLAGS_libxencall)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenpmd-prefix.diff xen-4.8.1/debian/patches/tools-xenpmd-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenpmd-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenpmd-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 13 Dec 2014 19:37:02 +0100
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 f26ee91fd13377ac94dfb98d99f92bd5fb7afac1
+X-Dgit-Generated: 4.8.1-1 abbd6a5b077ff2f14d6e715c7f342f02f3b78ef8
Subject: tools-xenpmd-prefix.diff
Patch-Name: tools-xenpmd-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenpmd/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenpmd/Makefile
+--- xen-4.8.1.orig/tools/xenpmd/Makefile
++++ xen-4.8.1/tools/xenpmd/Makefile
@@ -11,8 +11,8 @@ all: xenpmd
.PHONY: install
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstat-abiname.diff xen-4.8.1/debian/patches/tools-xenstat-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstat-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenstat-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:50 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 8b4f535d3fd07f59f7eeded0fb0533ece6dd03dd
+X-Dgit-Generated: 4.8.1-1 a968429393f380a5bf1eab604bd1720f31369fcd
Subject: tools-xenstat-abiname.diff
Patch-Name: tools-xenstat-abiname.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstat/libxenstat/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstat/libxenstat/Makefile
+--- xen-4.8.1.orig/tools/xenstat/libxenstat/Makefile
++++ xen-4.8.1/tools/xenstat/libxenstat/Makefile
@@ -18,18 +18,14 @@ include $(XEN_ROOT)/tools/Rules.mk
LDCONFIG=ldconfig
MAKE_LINK=ln -sf
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstat-prefix.diff xen-4.8.1/debian/patches/tools-xenstat-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstat-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenstat-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:09 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 72db0c1ccf6eae7961706b4d1ceddb7b15adf23d
+X-Dgit-Generated: 4.8.1-1 fa062a38ebfa9a8d1e52ee698c72aff4cb39e969
Subject: tools-xenstat-prefix.diff
Patch-Name: tools-xenstat-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstat/libxenstat/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstat/libxenstat/Makefile
+--- xen-4.8.1.orig/tools/xenstat/libxenstat/Makefile
++++ xen-4.8.1/tools/xenstat/libxenstat/Makefile
@@ -20,7 +20,7 @@ MAKE_LINK=ln -sf
LIB=src/libxenstat.a
@@ -31,8 +31,8 @@
PYLIB=bindings/swig/python/_xenstat.so
PYMOD=bindings/swig/python/xenstat.py
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstat/xentop/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstat/xentop/Makefile
+--- xen-4.8.1.orig/tools/xenstat/xentop/Makefile
++++ xen-4.8.1/tools/xenstat/xentop/Makefile
@@ -19,7 +19,9 @@ all install xentop:
else
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstore-compatibility.diff xen-4.8.1/debian/patches/tools-xenstore-compatibility.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstore-compatibility.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenstore-compatibility.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:36 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 cd76fd2a0e5f54534a02691aaf30ff4ed585224b
+X-Dgit-Generated: 4.8.1-1 e0deca5e873be2aeb99ad58aed95eaa9c7c8ce35
Subject: tools-xenstore-compatibility.diff
Patch-Name: tools-xenstore-compatibility.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstore/include/xenstore.h
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstore/include/xenstore.h
+--- xen-4.8.1.orig/tools/xenstore/include/xenstore.h
++++ xen-4.8.1/tools/xenstore/include/xenstore.h
@@ -25,6 +25,7 @@
#define XS_OPEN_READONLY 1UL<<0
@@ -17,8 +17,8 @@
/*
* Setting XS_UNWATCH_FILTER arranges that after xs_unwatch, no
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstore/xenstore_client.c
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstore_client.c
+--- xen-4.8.1.orig/tools/xenstore/xenstore_client.c
++++ xen-4.8.1/tools/xenstore/xenstore_client.c
@@ -636,7 +636,7 @@ main(int argc, char **argv)
max_width = ws.ws_col - 2;
}
@@ -28,8 +28,8 @@
if (xsh == NULL) err(1, "xs_open");
again:
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstore/xs.c
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstore/xs.c
+--- xen-4.8.1.orig/tools/xenstore/xs.c
++++ xen-4.8.1/tools/xenstore/xs.c
@@ -281,17 +281,19 @@ struct xs_handle *xs_daemon_open_readonl
struct xs_handle *xs_domain_open(void)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstore-prefix.diff xen-4.8.1/debian/patches/tools-xenstore-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xenstore-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xenstore-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:12 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 cc61a6cd98de364b188ada3498d5090c84ecd079
+X-Dgit-Generated: 4.8.1-1 dda6e65fe8f36391534f781ebdf0bc9f9e58192a
Subject: tools-xenstore-prefix.diff
Patch-Name: tools-xenstore-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/helpers/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/helpers/Makefile
+--- xen-4.8.1.orig/tools/helpers/Makefile
++++ xen-4.8.1/tools/helpers/Makefile
@@ -31,7 +31,7 @@ xen-init-dom0: $(XEN_INIT_DOM0_OBJS)
$(INIT_XENSTORE_DOMAIN_OBJS): _paths.h
@@ -18,8 +18,8 @@
.PHONY: install
install: all
---- xen-4.8.1~pre.2017.01.23.orig/tools/xenstore/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xenstore/Makefile
+--- xen-4.8.1.orig/tools/xenstore/Makefile
++++ xen-4.8.1/tools/xenstore/Makefile
@@ -20,6 +20,8 @@ LDFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_L
CFLAGS += $(CFLAGS-y)
LDFLAGS += $(LDFLAGS-y)
@@ -29,16 +29,16 @@
CLIENTS := xenstore-exists xenstore-list xenstore-read xenstore-rm xenstore-chmod
CLIENTS += xenstore-write xenstore-ls xenstore-watch
-@@ -73,7 +75,7 @@ endif
+@@ -74,7 +76,7 @@ endif
$(XENSTORED_OBJS): CFLAGS += $(CFLAGS_libxengnttab)
xenstored: $(XENSTORED_OBJS)
-- $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
-+ $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) $(call LDFLAGS_RPATH,../lib) -o $@ $(APPEND_LDFLAGS)
+- $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(LDLIBS_xenstored) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
++ $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) $(LDLIBS_xenstored) $(call LDFLAGS_RPATH,../lib) -o $@ $(APPEND_LDFLAGS)
xenstored.a: $(XENSTORED_OBJS)
$(AR) cr $@ $^
-@@ -126,13 +128,13 @@ tarball: clean
+@@ -127,13 +129,13 @@ tarball: clean
install: all
$(INSTALL_DIR) $(DESTDIR)$(bindir)
$(INSTALL_DIR) $(DESTDIR)$(includedir)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/tools-xentrace-prefix.diff xen-4.8.1/debian/patches/tools-xentrace-prefix.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/tools-xentrace-prefix.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/tools-xentrace-prefix.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:47:14 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 7ccc23bf0b827cab54f38ba86e3eed55eb149436
+X-Dgit-Generated: 4.8.1-1 bded2269fb168938a662711d0a632d9d644bfc30
Subject: tools-xentrace-prefix.diff
Patch-Name: tools-xentrace-prefix.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/xentrace/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/xentrace/Makefile
+--- xen-4.8.1.orig/tools/xentrace/Makefile
++++ xen-4.8.1/tools/xentrace/Makefile
@@ -8,6 +8,7 @@ CFLAGS += $(CFLAGS_libxenctrl)
LDLIBS += $(LDLIBS_libxenevtchn)
LDLIBS += $(LDLIBS_libxenctrl)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/toolstestsx86_emulator-pass--no-pie--fno xen-4.8.1/debian/patches/toolstestsx86_emulator-pass--no-pie--fno
--- xen-4.8.1~pre.2017.01.23/debian/patches/toolstestsx86_emulator-pass--no-pie--fno 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/toolstestsx86_emulator-pass--no-pie--fno 2017-04-18 18:07:28.000000000 +0100
@@ -1,6 +1,6 @@
From: Ian Jackson <ian.jackson@citrix.com>
Date: Tue, 1 Nov 2016 16:20:27 +0000
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 21ae6b2deae99f127e27ea1590a3821159e4c53a
+X-Dgit-Generated: 4.8.1-1 0b669a48e4ac450fded811b1ea297d644044d179
Subject: tools/tests/x86_emulator: Pass -no-pie -fno-pic to gcc on x86_32
The current build fails with GCC6 on Debian sid i386 (unstable):
@@ -33,8 +33,8 @@
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/tests/x86_emulator/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/tests/x86_emulator/Makefile
+--- xen-4.8.1.orig/tools/tests/x86_emulator/Makefile
++++ xen-4.8.1/tools/tests/x86_emulator/Makefile
@@ -45,6 +45,10 @@ x86_emulate/x86_emulate.c x86_emulate/x8
HOSTCFLAGS += $(CFLAGS_xeninclude)
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/ubuntu-tools-libs-abiname.diff xen-4.8.1/debian/patches/ubuntu-tools-libs-abiname.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/ubuntu-tools-libs-abiname.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/ubuntu-tools-libs-abiname.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,13 +1,13 @@
From: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Date: Thu, 6 Oct 2016 14:24:46 +0100
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 1c23a037b6c6db944485b6b965660123d57edf05
+X-Dgit-Generated: 4.8.1-1 a80895b1222bf96c423953a78171ca38ee847a9f
Subject: ubuntu-tools-libs-abiname
---
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/call/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/call/Makefile
+--- xen-4.8.1.orig/tools/libs/call/Makefile
++++ xen-4.8.1/tools/libs/call/Makefile
@@ -39,22 +39,22 @@ headers.chk: $(wildcard include/*.h)
libxencall.a: $(LIB_OBJS)
$(AR) rc $@ $^
@@ -47,8 +47,8 @@
rm -f headers.chk
.PHONY: distclean
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/evtchn/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/evtchn/Makefile
+--- xen-4.8.1.orig/tools/libs/evtchn/Makefile
++++ xen-4.8.1/tools/libs/evtchn/Makefile
@@ -39,22 +39,22 @@ headers.chk: $(wildcard include/*.h)
libxenevtchn.a: $(LIB_OBJS)
$(AR) rc $@ $^
@@ -88,8 +88,8 @@
rm -f headers.chk
.PHONY: distclean
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/foreignmemory/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/foreignmemory/Makefile
+--- xen-4.8.1.orig/tools/libs/foreignmemory/Makefile
++++ xen-4.8.1/tools/libs/foreignmemory/Makefile
@@ -39,22 +39,22 @@ headers.chk: $(wildcard include/*.h)
libxenforeignmemory.a: $(LIB_OBJS)
$(AR) rc $@ $^
@@ -129,8 +129,8 @@
rm -f headers.chk
.PHONY: distclean
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/gnttab/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/gnttab/Makefile
+--- xen-4.8.1.orig/tools/libs/gnttab/Makefile
++++ xen-4.8.1/tools/libs/gnttab/Makefile
@@ -41,22 +41,22 @@ headers.chk: $(wildcard include/*.h)
libxengnttab.a: $(LIB_OBJS)
$(AR) rc $@ $^
@@ -170,8 +170,8 @@
rm -f headers.chk
.PHONY: distclean
---- xen-4.8.1~pre.2017.01.23.orig/tools/libs/toollog/Makefile
-+++ xen-4.8.1~pre.2017.01.23/tools/libs/toollog/Makefile
+--- xen-4.8.1.orig/tools/libs/toollog/Makefile
++++ xen-4.8.1/tools/libs/toollog/Makefile
@@ -34,22 +34,22 @@ headers.chk: $(wildcard include/*.h)
libxentoollog.a: $(LIB_OBJS)
$(AR) rc $@ $^
diff -Nru xen-4.8.1~pre.2017.01.23/debian/patches/version.diff xen-4.8.1/debian/patches/version.diff
--- xen-4.8.1~pre.2017.01.23/debian/patches/version.diff 2017-01-23 16:28:33.000000000 +0000
+++ xen-4.8.1/debian/patches/version.diff 2017-04-18 18:07:28.000000000 +0100
@@ -1,14 +1,14 @@
From: Bastian Blank <waldi@debian.org>
Date: Sat, 5 Jul 2014 11:46:43 +0200
-X-Dgit-Generated: 4.8.1~pre.2017.01.23-1 d4c74ba58fa9aa7bde7b1a0a61a9361ce6e55919
+X-Dgit-Generated: 4.8.1-1 adc50830f6c334569f54255310fc489d139d542f
Subject: version
Patch-Name: version.diff
---
---- xen-4.8.1~pre.2017.01.23.orig/xen/Makefile
-+++ xen-4.8.1~pre.2017.01.23/xen/Makefile
+--- xen-4.8.1.orig/xen/Makefile
++++ xen-4.8.1/xen/Makefile
@@ -160,7 +160,7 @@ delete-unfresh-files:
@mv -f $@.tmp $@
@@ -32,8 +32,8 @@
@mv -f $@.new $@
include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
---- xen-4.8.1~pre.2017.01.23.orig/xen/common/kernel.c
-+++ xen-4.8.1~pre.2017.01.23/xen/common/kernel.c
+--- xen-4.8.1.orig/xen/common/kernel.c
++++ xen-4.8.1/xen/common/kernel.c
@@ -252,8 +252,8 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDL
memset(&info, 0, sizeof(info));
@@ -45,8 +45,8 @@
safe_strcpy(info.compile_date, deny ? xen_deny() : xen_compile_date());
if ( copy_to_guest(arg, &info, 1) )
return -EFAULT;
---- xen-4.8.1~pre.2017.01.23.orig/xen/common/version.c
-+++ xen-4.8.1~pre.2017.01.23/xen/common/version.c
+--- xen-4.8.1.orig/xen/common/version.c
++++ xen-4.8.1/xen/common/version.c
@@ -20,19 +20,24 @@ const char *xen_compile_time(void)
return XEN_COMPILE_TIME;
}
@@ -90,8 +90,8 @@
const char *xen_deny(void)
{
return "<denied>";
---- xen-4.8.1~pre.2017.01.23.orig/xen/drivers/char/console.c
-+++ xen-4.8.1~pre.2017.01.23/xen/drivers/char/console.c
+--- xen-4.8.1.orig/xen/drivers/char/console.c
++++ xen-4.8.1/xen/drivers/char/console.c
@@ -732,14 +732,11 @@ void __init console_init_preirq(void)
serial_set_rx_handler(sercon_handle, serial_rx);
@@ -110,8 +110,8 @@
if ( opt_sync_console )
{
---- xen-4.8.1~pre.2017.01.23.orig/xen/include/xen/compile.h.in
-+++ xen-4.8.1~pre.2017.01.23/xen/include/xen/compile.h.in
+--- xen-4.8.1.orig/xen/include/xen/compile.h.in
++++ xen-4.8.1/xen/include/xen/compile.h.in
@@ -1,8 +1,9 @@
#define XEN_COMPILE_DATE "@@date@@"
#define XEN_COMPILE_TIME "@@time@@"
@@ -130,8 +130,8 @@
#define XEN_CHANGESET "@@changeset@@"
-#define XEN_BANNER \
---- xen-4.8.1~pre.2017.01.23.orig/xen/include/xen/version.h
-+++ xen-4.8.1~pre.2017.01.23/xen/include/xen/version.h
+--- xen-4.8.1.orig/xen/include/xen/version.h
++++ xen-4.8.1/xen/include/xen/version.h
@@ -6,9 +6,10 @@
const char *xen_compile_date(void);
diff -Nru xen-4.8.1~pre.2017.01.23/docs/misc/xen-command-line.markdown xen-4.8.1/docs/misc/xen-command-line.markdown
--- xen-4.8.1~pre.2017.01.23/docs/misc/xen-command-line.markdown 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/docs/misc/xen-command-line.markdown 2017-04-10 14:21:48.000000000 +0100
@@ -1619,6 +1619,21 @@
As the virtualisation is not 100% safe, don't use the vpmu flag on
production systems (see http://xenbits.xen.org/xsa/advisory-163.html)!
+### vwfi
+> `= trap | native`
+
+> Default: `trap`
+
+WFI is the ARM instruction to "wait for interrupt". WFE is similar and
+means "wait for event". This option, which is ARM specific, changes the
+way guest WFI and WFE are implemented in Xen. By default, Xen traps both
+instructions. In the case of WFI, Xen blocks the guest vcpu; in the case
+of WFE, Xen yields the guest vcpu. When setting vwfi to `native`, Xen
+doesn't trap either instruction, running them in guest context. Setting
+vwfi to `native` reduces irq latency significantly. It can also lead to
+suboptimal scheduling decisions, but only when the system is
+oversubscribed (i.e., in total there are more vCPUs than pCPUs).
+
### watchdog
> `= force | <boolean>`
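(For illustration: on an ARM system where irq latency matters more than
scheduling behaviour under oversubscription, one would append

    vwfi=native

to the hypervisor - not dom0 - command line; the default, `vwfi=trap`, keeps
the blocking/yielding behaviour described above.)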
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxc/include/xenctrl.h xen-4.8.1/tools/libxc/include/xenctrl.h
--- xen-4.8.1~pre.2017.01.23/tools/libxc/include/xenctrl.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxc/include/xenctrl.h 2017-04-10 14:21:48.000000000 +0100
@@ -2710,6 +2710,14 @@
int xc_livepatch_unload(xc_interface *xch, char *name, uint32_t timeout);
int xc_livepatch_replace(xc_interface *xch, char *name, uint32_t timeout);
+/*
+ * Ensure cache coherency after memory modifications. A call to this function
+ * is only required on ARM as the x86 architecture provides cache coherency
+ * guarantees. Calling this function on x86 is allowed but has no effect.
+ */
+int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
+ xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
+
/* Compat shims */
#include "xenctrl_compat.h"
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxc/xc_domain.c xen-4.8.1/tools/libxc/xc_domain.c
--- xen-4.8.1~pre.2017.01.23/tools/libxc/xc_domain.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxc/xc_domain.c 2017-04-10 14:21:48.000000000 +0100
@@ -74,10 +74,10 @@
/*
* The x86 architecture provides cache coherency guarantees which prevent
* the need for this hypercall. Avoid the overhead of making a hypercall
- * just for Xen to return -ENOSYS.
+ * just for Xen to return -ENOSYS. It is safe to ignore this call on x86
+ * so we just return 0.
*/
- errno = ENOSYS;
- return -1;
+ return 0;
#else
DECLARE_DOMCTL;
domctl.cmd = XEN_DOMCTL_cacheflush;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxc/xc_private.c xen-4.8.1/tools/libxc/xc_private.c
--- xen-4.8.1~pre.2017.01.23/tools/libxc/xc_private.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxc/xc_private.c 2017-04-10 14:21:48.000000000 +0100
@@ -64,8 +64,7 @@
goto err;
xch->fmem = xenforeignmemory_open(xch->error_handler, 0);
-
- if ( xch->xcall == NULL )
+ if ( xch->fmem == NULL )
goto err;
return xch;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxc/xc_private.h xen-4.8.1/tools/libxc/xc_private.h
--- xen-4.8.1~pre.2017.01.23/tools/libxc/xc_private.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxc/xc_private.h 2017-04-10 14:21:48.000000000 +0100
@@ -366,9 +366,6 @@
/* Optionally flush file to disk and discard page cache */
void discard_file_cache(xc_interface *xch, int fd, int flush);
-int xc_domain_cacheflush(xc_interface *xch, uint32_t domid,
- xen_pfn_t start_pfn, xen_pfn_t nr_pfns);
-
#define MAX_MMU_UPDATES 1024
struct xc_mmu {
mmu_update_t updates[MAX_MMU_UPDATES];
diff -Nru xen-4.8.1~pre.2017.01.23/tools/libxl/libxl.c xen-4.8.1/tools/libxl/libxl.c
--- xen-4.8.1~pre.2017.01.23/tools/libxl/libxl.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/libxl/libxl.c 2017-04-10 14:21:48.000000000 +0100
@@ -2255,7 +2255,8 @@
case LIBXL_DISK_BACKEND_QDISK:
flexarray_append(back, "params");
flexarray_append(back, GCSPRINTF("%s:%s",
- libxl__device_disk_string_of_format(disk->format), disk->pdev_path));
+ libxl__device_disk_string_of_format(disk->format),
+ disk->pdev_path ? : ""));
if (libxl_defbool_val(disk->colo_enable)) {
flexarray_append(back, "colo-host");
flexarray_append(back, libxl__sprintf(gc, "%s", disk->colo_host));
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/Makefile xen-4.8.1/tools/ocaml/xenstored/Makefile
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/Makefile 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/Makefile 2017-04-10 14:21:48.000000000 +0100
@@ -53,6 +53,7 @@
domains \
connection \
connections \
+ history \
parse_arg \
process \
xenstored
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/connection.ml xen-4.8.1/tools/ocaml/xenstored/connection.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/connection.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/connection.ml 2017-04-10 14:21:48.000000000 +0100
@@ -296,3 +296,8 @@
let domid = get_domstr con in
let watches = List.map (fun (path, token) -> Printf.sprintf "watch %s: %s %s\n" domid path token) (list_watches con) in
String.concat "" watches
+
+let decr_conflict_credit doms con =
+ match con.dom with
+ | None -> () (* It's a socket connection. We don't know which domain we're in, so treat it as if it's free to conflict *)
+ | Some dom -> Domains.decr_conflict_credit doms dom
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/connections.ml xen-4.8.1/tools/ocaml/xenstored/connections.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/connections.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/connections.ml 2017-04-10 14:21:48.000000000 +0100
@@ -44,12 +44,14 @@
| Some p -> Hashtbl.add cons.ports p con;
| None -> ()
-let select cons =
- Hashtbl.fold
- (fun _ con (ins, outs) ->
- let fd = Connection.get_fd con in
- (fd :: ins, if Connection.has_output con then fd :: outs else outs))
- cons.anonymous ([], [])
+let select ?(only_if = (fun _ -> true)) cons =
+ Hashtbl.fold (fun _ con (ins, outs) ->
+ if (only_if con) then (
+ let fd = Connection.get_fd con in
+ (fd :: ins, if Connection.has_output con then fd :: outs else outs)
+ ) else (ins, outs)
+ )
+ cons.anonymous ([], [])
let find cons =
Hashtbl.find cons.anonymous
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/define.ml xen-4.8.1/tools/ocaml/xenstored/define.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/define.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/define.ml 2017-04-10 14:21:48.000000000 +0100
@@ -29,6 +29,10 @@
let maxtransaction = ref (20)
let maxrequests = ref (-1) (* maximum requests per transaction *)
+let conflict_burst_limit = ref 5.0
+let conflict_max_history_seconds = ref 0.05
+let conflict_rate_limit_is_aggregate = ref true
+
let domid_self = 0x7FF0
exception Not_a_directory of string
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/domain.ml xen-4.8.1/tools/ocaml/xenstored/domain.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/domain.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/domain.ml 2017-04-10 14:21:48.000000000 +0100
@@ -31,8 +31,13 @@
mutable io_credit: int; (* the rounds of ring process left to do, default is 0,
usually set to 1 when there is work detected, could
also set to n to give "lazy" clients extra credit *)
+ mutable conflict_credit: float; (* Must be positive to perform writes; a commit
+ that later causes conflict with another
+ domain's transaction costs credit. *)
+ mutable caused_conflicts: int64;
}
+let is_dom0 d = d.id = 0
let get_path dom = "/local/domain/" ^ (sprintf "%u" dom.id)
let get_id domain = domain.id
let get_interface d = d.interface
@@ -48,6 +53,10 @@
let incr_io_credit domain = domain.io_credit <- domain.io_credit + 1
let decr_io_credit domain = domain.io_credit <- max 0 (domain.io_credit - 1)
+let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
+
+let is_free_to_conflict = is_dom0
+
let string_of_port = function
| None -> "None"
| Some x -> string_of_int (Xeneventchn.to_int x)
@@ -84,6 +93,12 @@
port = None;
bad_client = false;
io_credit = 0;
+ conflict_credit = !Define.conflict_burst_limit;
+ caused_conflicts = 0L;
}
-let is_dom0 d = d.id = 0
+let log_and_reset_conflict_stats logfn dom =
+ if dom.caused_conflicts > 0L then (
+ logfn dom.id dom.caused_conflicts;
+ dom.caused_conflicts <- 0L
+ )
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/domains.ml xen-4.8.1/tools/ocaml/xenstored/domains.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/domains.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/domains.ml 2017-04-10 14:21:48.000000000 +0100
@@ -15,20 +15,77 @@
*)
let debug fmt = Logging.debug "domains" fmt
+let error fmt = Logging.error "domains" fmt
+let warn fmt = Logging.warn "domains" fmt
type domains = {
eventchn: Event.t;
table: (Xenctrl.domid, Domain.t) Hashtbl.t;
+
+ (* N.B. the Queue module is not thread-safe but oxenstored is single-threaded. *)
+ (* Domains queue up to regain conflict-credit; we have a queue for
+ domains that are carrying some penalty and so are below the
+ maximum credit, and another queue for domains that have run out of
+ credit and so have had their access paused. *)
+ doms_conflict_paused: (Domain.t option ref) Queue.t;
+ doms_with_conflict_penalty: (Domain.t option ref) Queue.t;
+
+ (* A callback function to be called when we go from zero to one paused domain.
+ This will be to reset the countdown until the next unit of credit is issued. *)
+ on_first_conflict_pause: unit -> unit;
+
+ (* If config is set to use individual instead of aggregate conflict-rate-limiting,
+ we use these counts instead of the queues. The second one includes the first. *)
+ mutable n_paused: int; (* Number of domains with zero or negative credit *)
+ mutable n_penalised: int; (* Number of domains with less than maximum credit *)
}
-let init eventchn =
- { eventchn = eventchn; table = Hashtbl.create 10 }
+let init eventchn on_first_conflict_pause = {
+ eventchn = eventchn;
+ table = Hashtbl.create 10;
+ doms_conflict_paused = Queue.create ();
+ doms_with_conflict_penalty = Queue.create ();
+ on_first_conflict_pause = on_first_conflict_pause;
+ n_paused = 0;
+ n_penalised = 0;
+}
let del doms id = Hashtbl.remove doms.table id
let exist doms id = Hashtbl.mem doms.table id
let find doms id = Hashtbl.find doms.table id
let number doms = Hashtbl.length doms.table
let iter doms fct = Hashtbl.iter (fun _ b -> fct b) doms.table
+let rec is_empty_queue q =
+ Queue.is_empty q ||
+ if !(Queue.peek q) = None
+ then (
+ ignore (Queue.pop q);
+ is_empty_queue q
+ ) else false
+
+let all_at_max_credit doms =
+ if !Define.conflict_rate_limit_is_aggregate
+ then
+ (* Check both because if burst limit is 1.0 then a domain can go straight
+ * from max-credit to paused without getting into the penalty queue. *)
+ is_empty_queue doms.doms_with_conflict_penalty
+ && is_empty_queue doms.doms_conflict_paused
+ else doms.n_penalised = 0
+
+(* Functions to handle queues of domains given that the domain might be deleted while in a queue. *)
+let push dom queue =
+ Queue.push (ref (Some dom)) queue
+
+let rec pop queue =
+ match !(Queue.pop queue) with
+ | None -> pop queue
+ | Some x -> x
+
+let remove_from_queue dom queue =
+ Queue.iter (fun d -> match !d with
+ | None -> ()
+ | Some x -> if x=dom then d := None) queue
+
let cleanup xc doms =
let notify = ref false in
let dead_dom = ref [] in
@@ -52,6 +109,11 @@
let dom = Hashtbl.find doms.table id in
Domain.close dom;
Hashtbl.remove doms.table id;
+ if dom.Domain.conflict_credit <= !Define.conflict_burst_limit
+ then (
+ remove_from_queue dom doms.doms_with_conflict_penalty;
+ if (dom.Domain.conflict_credit <= 0.) then remove_from_queue dom doms.doms_conflict_paused
+ )
) !dead_dom;
!notify, !dead_dom
@@ -82,3 +144,74 @@
Domain.bind_interdomain dom;
Domain.notify dom;
dom
+
+let decr_conflict_credit doms dom =
+ dom.Domain.caused_conflicts <- Int64.add 1L dom.Domain.caused_conflicts;
+ let before = dom.Domain.conflict_credit in
+ let after = max (-1.0) (before -. 1.0) in
+ debug "decr_conflict_credit dom%d %F -> %F" (Domain.get_id dom) before after;
+ dom.Domain.conflict_credit <- after;
+ let newly_penalised =
+ before >= !Define.conflict_burst_limit
+ && after < !Define.conflict_burst_limit in
+ let newly_paused = before > 0.0 && after <= 0.0 in
+ if !Define.conflict_rate_limit_is_aggregate then (
+ if newly_penalised
+ && after > 0.0
+ then (
+ push dom doms.doms_with_conflict_penalty
+ ) else if newly_paused
+ then (
+ let first_pause = Queue.is_empty doms.doms_conflict_paused in
+ push dom doms.doms_conflict_paused;
+ if first_pause then doms.on_first_conflict_pause ()
+ ) else (
+ (* The queues are correct already: no further action needed. *)
+ )
+ ) else (
+ if newly_penalised then doms.n_penalised <- doms.n_penalised + 1;
+ if newly_paused then (
+ doms.n_paused <- doms.n_paused + 1;
+ if doms.n_paused = 1 then doms.on_first_conflict_pause ()
+ )
+ )
+
+(* Give one point of credit to one domain, and update the queues appropriately. *)
+let incr_conflict_credit_from_queue doms =
+ let process_queue q requeue_test =
+ let d = pop q in
+ let before = d.Domain.conflict_credit in (* just for debug-logging *)
+ d.Domain.conflict_credit <- min (d.Domain.conflict_credit +. 1.0) !Define.conflict_burst_limit;
+ debug "incr_conflict_credit_from_queue: dom%d: %F -> %F" (Domain.get_id d) before d.Domain.conflict_credit;
+ if requeue_test d.Domain.conflict_credit then (
+ push d q (* Make it queue up again for its next point of credit. *)
+ )
+ in
+ let paused_queue_test cred = cred <= 0.0 in
+ let penalty_queue_test cred = cred < !Define.conflict_burst_limit in
+ try process_queue doms.doms_conflict_paused paused_queue_test
+ with Queue.Empty -> (
+ try process_queue doms.doms_with_conflict_penalty penalty_queue_test
+ with Queue.Empty -> () (* Both queues are empty: nothing to do here. *)
+ )
+
+let incr_conflict_credit doms =
+ if !Define.conflict_rate_limit_is_aggregate
+ then incr_conflict_credit_from_queue doms
+ else (
+ (* Give a point of credit to every domain, subject only to the cap. *)
+ let inc dom =
+ let before = dom.Domain.conflict_credit in
+ let after = min (before +. 1.0) !Define.conflict_burst_limit in
+ dom.Domain.conflict_credit <- after;
+ debug "incr_conflict_credit dom%d: %F -> %F" (Domain.get_id dom) before after;
+
+ if before <= 0.0 && after > 0.0
+ then doms.n_paused <- doms.n_paused - 1;
+
+ if before < !Define.conflict_burst_limit
+ && after >= !Define.conflict_burst_limit
+ then doms.n_penalised <- doms.n_penalised - 1
+ in
+ if doms.n_penalised > 0 then iter doms inc
+ )
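
Because OCaml's Queue.t cannot delete an arbitrary element, the push/pop/remove_from_queue helpers above store each domain behind an option ref and "remove" it by writing None, which pop then skips. A minimal C sketch of the same tombstone idea, with hypothetical names (illustration only, not the oxenstored code):

    #include <stddef.h>

    /* Tombstone queue: removal NULLs the slot in place; pop skips NULL
     * slots. Fixed-size ring for brevity; no overflow checks. */
    #define QLEN 64
    struct tqueue {
        void *slot[QLEN];
        unsigned head, tail;              /* monotonically increasing */
    };

    static void tq_push(struct tqueue *q, void *v)
    {
        q->slot[q->tail++ % QLEN] = v;
    }

    static void *tq_pop(struct tqueue *q)
    {
        while (q->head != q->tail) {
            void *v = q->slot[q->head++ % QLEN];
            if (v)
                return v;                 /* skip tombstones */
        }
        return NULL;                      /* empty */
    }

    static void tq_remove(struct tqueue *q, const void *v)
    {
        for (unsigned i = q->head; i != q->tail; i++)
            if (q->slot[i % QLEN] == v)
                q->slot[i % QLEN] = NULL; /* leave a tombstone */
    }
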
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/history.ml xen-4.8.1/tools/ocaml/xenstored/history.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/history.ml 1970-01-01 01:00:00.000000000 +0100
+++ xen-4.8.1/tools/ocaml/xenstored/history.ml 2017-04-10 14:21:48.000000000 +0100
@@ -0,0 +1,73 @@
+(*
+ * Copyright (c) 2017 Citrix Systems Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *)
+
+type history_record = {
+ con: Connection.t; (* connection that made a change *)
+ tid: int; (* transaction id of the change (may be Transaction.none) *)
+ before: Store.t; (* the store before the change *)
+ after: Store.t; (* the store after the change *)
+ finish_count: int64; (* the commit-count at which the transaction finished *)
+}
+
+let history : history_record list ref = ref []
+
+(* Called from periodic_ops to ensure we don't discard symbols that are still needed. *)
+(* There is scope for optimisation here, since in consecutive commits one commit's `after`
+ * is the same thing as the next commit's `before`, but not all commits in history are
+ * consecutive. *)
+let mark_symbols () =
+ (* There are gaps where dom0's commits are missing. Otherwise we could assume that
+ * each element's `before` is the same thing as the next element's `after`
+ * since the next element is the previous commit *)
+ List.iter (fun hist_rec ->
+ Store.mark_symbols hist_rec.before;
+ Store.mark_symbols hist_rec.after;
+ )
+ !history
+
+(* Keep only enough commit-history to protect the running transactions that we are still tracking *)
+(* There is scope for optimisation here, replacing List.filter with something more efficient,
+ * probably on a different list-like structure. *)
+let trim ?txn () =
+ Transaction.trim_short_running_transactions txn;
+ history := match Transaction.oldest_short_running_transaction () with
+ | None -> [] (* We have no open transaction, so no history is needed *)
+ | Some (_, txn) -> (
+ (* keep records with finish_count recent enough to be relevant *)
+ List.filter (fun r -> r.finish_count > txn.Transaction.start_count) !history
+ )
+
+let end_transaction txn con tid commit =
+ let success = Connection.end_transaction con tid commit in
+ trim ~txn ();
+ success
+
+let push (x: history_record) =
+ let dom = x.con.Connection.dom in
+ match dom with
+ | None -> () (* treat socket connections as always free to conflict *)
+ | Some d -> if not (Domain.is_free_to_conflict d) then history := x :: !history
+
+(* Find the connections from records since commit-count [since] for which [f record] returns [true] *)
+let filter_connections ~ignore ~since ~f =
+ (* The "mem" call is an optimisation, to avoid calling f if we have picked con already. *)
+ (* Using a hash table rather than a list is to optimise the "mem" call. *)
+ List.fold_left (fun acc hist_rec ->
+ if hist_rec.finish_count > since
+ && not (hist_rec.con == ignore)
+ && not (Hashtbl.mem acc hist_rec.con)
+ && f hist_rec
+ then Hashtbl.replace acc hist_rec.con ();
+ acc
+ ) (Hashtbl.create 1023) !history
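
The trim rule above can be stated as an invariant: a commit record stays in history only while some still-tracked transaction started before that record's commit finished; once the oldest live transaction is newer than the record, the record can never again be blamed for a conflict. A one-predicate sketch in C (hypothetical types, illustration only):

    #include <stdint.h>

    /* Keep record r iff a live transaction may still conflict with it,
     * i.e. r finished after the oldest live transaction started.
     * Counts are monotonically increasing commit sequence numbers. */
    struct hist_record { int64_t finish_count; };

    static int still_relevant(const struct hist_record *r,
                              int64_t oldest_live_txn_start_count)
    {
        return r->finish_count > oldest_live_txn_start_count;
    }
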
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/oxenstored.conf.in xen-4.8.1/tools/ocaml/xenstored/oxenstored.conf.in
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/oxenstored.conf.in 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/oxenstored.conf.in 2017-04-10 14:21:48.000000000 +0100
@@ -9,6 +9,38 @@
# Activate transaction merge support
merge-activate = true
+# Limits applied to domains whose writes cause other domains' transaction
+# commits to fail. Must include decimal point.
+
+# The burst limit is the number of conflicts a domain can cause to
+# fail in a short period; this value is used for both the initial and
+# the maximum value of each domain's conflict-credit, which falls by
+# one point for each conflict caused, and when it reaches zero the
+# domain's requests are ignored.
+conflict-burst-limit = 5.0
+
+# The conflict-credit is replenished over time:
+# one point is issued after each conflict-max-history-seconds, so this
+# is the minimum pause-time during which a domain will be ignored.
+conflict-max-history-seconds = 0.05
+
+# If the conflict-rate-limit-is-aggregate flag is true then after each
+# tick one point of conflict-credit is given to just one domain: the
+# one at the front of the queue. If false, then after each tick each
+# domain gets a point of conflict-credit.
+#
+# In environments where it is known that every transaction will
+# involve a set of nodes that is writable by at most one other domain,
+# then it is safe to set this aggregate-limit flag to false for better
+# performance. (This can be determined by considering the layout of
+# the xenstore tree and permissions, together with the content of the
+# transactions that require protection.)
+#
+# A transaction which involves a set of nodes which can be modified by
+# multiple other domains can suffer conflicts caused by any of those
+# domains, so the flag must be set to true.
+conflict-rate-limit-is-aggregate = true
+
# Activate node permission system
perms-activate = true
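
Taken together, the defaults above implement a token bucket: a domain can cause 5 conflicts back to back, is then ignored, and credit is replenished one point per 0.05s tick, i.e. roughly 20 conflicts per second sustained (with the aggregate flag true, that replenish rate is shared across all penalised domains). A minimal C sketch of the accounting, with hypothetical names (the real logic is the OCaml in domains.ml above):

    /* Token-bucket conflict credit, mirroring the config defaults. */
    struct credit { double points; };      /* starts at burst_limit */

    static void on_conflict(struct credit *c)
    {
        c->points -= 1.0;
        if (c->points < -1.0)
            c->points = -1.0;              /* floor, as in domains.ml */
    }

    static void on_tick(struct credit *c, double burst_limit)
    {
        c->points += 1.0;
        if (c->points > burst_limit)
            c->points = burst_limit;       /* cap at the burst limit */
    }

    static int is_paused(const struct credit *c)
    {
        return c->points <= 0.0;           /* requests ignored while true */
    }
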
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/process.ml xen-4.8.1/tools/ocaml/xenstored/process.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/process.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/process.ml 2017-04-10 14:21:48.000000000 +0100
@@ -16,6 +16,7 @@
let error fmt = Logging.error "process" fmt
let info fmt = Logging.info "process" fmt
+let debug fmt = Logging.debug "process" fmt
open Printf
open Stdext
@@ -25,6 +26,7 @@
exception Domain_not_match
exception Invalid_Cmd_Args
+(* This controls the do_debug fn in this module, not the debug logging-function. *)
let allow_debug = ref false
let c_int_of_string s =
@@ -293,6 +295,11 @@
| Packet.Reply x -> write_answer_log ~ty ~tid ~con ~data:x
| Packet.Error e -> write_answer_log ~ty:(Xenbus.Xb.Op.Error) ~tid ~con ~data:e
+let record_commit ~con ~tid ~before ~after =
+ let inc r = r := Int64.add 1L !r in
+ let finish_count = inc Transaction.counter; !Transaction.counter in
+ History.push {History.con=con; tid=tid; before=before; after=after; finish_count=finish_count}
+
(* Replay a stored transaction against a fresh store, check the responses are
all equivalent: if so, commit the transaction. Otherwise send the abort to
the client. *)
@@ -301,25 +308,57 @@
| Transaction.No ->
error "attempted to replay a non-full transaction";
false
- | Transaction.Full(id, oldroot, cstore) ->
+ | Transaction.Full(id, oldstore, cstore) ->
let tid = Connection.start_transaction c cstore in
- let new_t = Transaction.make tid cstore in
+ let replay_t = Transaction.make ~internal:true tid cstore in
let con = sprintf "r(%d):%s" id (Connection.get_domstr c) in
- let perform_exn (request, response) =
- write_access_log ~ty:request.Packet.ty ~tid ~con ~data:request.Packet.data;
+
+ let perform_exn ~wlog txn (request, response) =
+ if wlog then write_access_log ~ty:request.Packet.ty ~tid ~con ~data:request.Packet.data;
let fct = function_of_type_simple_op request.Packet.ty in
- let response' = input_handle_error ~cons ~doms ~fct ~con:c ~t:new_t ~req:request in
- write_response_log ~ty:request.Packet.ty ~tid ~con ~response:response';
- if not(Packet.response_equal response response') then raise Transaction_again in
+ let response' = input_handle_error ~cons ~doms ~fct ~con:c ~t:txn ~req:request in
+ if wlog then write_response_log ~ty:request.Packet.ty ~tid ~con ~response:response';
+ if not(Packet.response_equal response response') then raise Transaction_again
+ in
finally
(fun () ->
try
Logging.start_transaction ~con ~tid;
- List.iter perform_exn (Transaction.get_operations t);
- Logging.end_transaction ~con ~tid;
+ List.iter (perform_exn ~wlog:true replay_t) (Transaction.get_operations t); (* May throw EAGAIN *)
- Transaction.commit ~con new_t
- with e ->
+ Logging.end_transaction ~con ~tid;
+ Transaction.commit ~con replay_t
+ with
+ | Transaction_again -> (
+ Transaction.failed_commits := Int64.add !Transaction.failed_commits 1L;
+ let victim_domstr = Connection.get_domstr c in
+ debug "Apportioning blame for EAGAIN in txn %d, domain=%s" id victim_domstr;
+ let punish guilty_con =
+ debug "Blaming domain %s for conflict with domain %s txn %d"
+ (Connection.get_domstr guilty_con) victim_domstr id;
+ Connection.decr_conflict_credit doms guilty_con
+ in
+ let judge_and_sentence hist_rec = (
+ let can_apply_on store = (
+ let store = Store.copy store in
+ let trial_t = Transaction.make ~internal:true Transaction.none store in
+ try List.iter (perform_exn ~wlog:false trial_t) (Transaction.get_operations t);
+ true
+ with Transaction_again -> false
+ ) in
+ if can_apply_on hist_rec.History.before
+ && not (can_apply_on hist_rec.History.after)
+ then (punish hist_rec.History.con; true)
+ else false
+ ) in
+ let guilty_cons = History.filter_connections ~ignore:c ~since:t.Transaction.start_count ~f:judge_and_sentence in
+ if Hashtbl.length guilty_cons = 0 then (
+ debug "Found no culprit for conflict in %s: must be self or not in history." con;
+ Transaction.failed_commits_no_culprit := Int64.add !Transaction.failed_commits_no_culprit 1L
+ );
+ false
+ )
+ | e ->
info "transaction_replay %d caught: %s" tid (Printexc.to_string e);
false
)
@@ -358,13 +397,20 @@
| x :: _ -> raise (Invalid_argument x)
| _ -> raise Invalid_Cmd_Args
in
+ let commit = commit && not (Transaction.is_read_only t) in
let success =
let commit = if commit then Some (fun con trans -> transaction_replay con trans domains cons) else None in
- Connection.end_transaction con (Transaction.get_id t) commit in
+ History.end_transaction t con (Transaction.get_id t) commit in
if not success then
raise Transaction_again;
- if commit then
- process_watch (List.rev (Transaction.get_paths t)) cons
+ if commit then begin
+ process_watch (List.rev (Transaction.get_paths t)) cons;
+ match t.Transaction.ty with
+ | Transaction.No ->
+ () (* no need to record anything *)
+ | Transaction.Full(id, oldstore, cstore) ->
+ record_commit ~con ~tid:id ~before:oldstore ~after:cstore
+ end
let do_introduce con t domains cons data =
if not (Connection.is_dom0 con)
@@ -434,6 +480,37 @@
| _ -> function_of_type_simple_op ty
(**
+ * Determines which individual (non-transactional) operations we want to retain.
+ * We only want to retain operations that have side-effects in the store since
+ * these can be the cause of transactions failing.
+ *)
+let retain_op_in_history ty =
+ match ty with
+ | Xenbus.Xb.Op.Write
+ | Xenbus.Xb.Op.Mkdir
+ | Xenbus.Xb.Op.Rm
+ | Xenbus.Xb.Op.Setperms -> true
+ | Xenbus.Xb.Op.Debug
+ | Xenbus.Xb.Op.Directory
+ | Xenbus.Xb.Op.Read
+ | Xenbus.Xb.Op.Getperms
+ | Xenbus.Xb.Op.Watch
+ | Xenbus.Xb.Op.Unwatch
+ | Xenbus.Xb.Op.Transaction_start
+ | Xenbus.Xb.Op.Transaction_end
+ | Xenbus.Xb.Op.Introduce
+ | Xenbus.Xb.Op.Release
+ | Xenbus.Xb.Op.Getdomainpath
+ | Xenbus.Xb.Op.Watchevent
+ | Xenbus.Xb.Op.Error
+ | Xenbus.Xb.Op.Isintroduced
+ | Xenbus.Xb.Op.Resume
+ | Xenbus.Xb.Op.Set_target
+ | Xenbus.Xb.Op.Restrict
+ | Xenbus.Xb.Op.Reset_watches
+ | Xenbus.Xb.Op.Invalid -> false
+
+(**
* Nothrow guarantee.
*)
let process_packet ~store ~cons ~doms ~con ~req =
@@ -448,7 +525,19 @@
else
Connection.get_transaction con tid
in
- let response = input_handle_error ~cons ~doms ~fct ~con ~t ~req in
+
+ let execute () = input_handle_error ~cons ~doms ~fct ~con ~t ~req in
+
+ let response =
+ (* Note that transactions are recorded in history separately. *)
+ if tid = Transaction.none && retain_op_in_history ty then begin
+ let before = Store.copy store in
+ let response = execute () in
+ let after = Store.copy store in
+ record_commit ~con ~tid ~before ~after;
+ response
+ end else execute ()
+ in
let response = try
if tid <> Transaction.none then
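
The judge_and_sentence logic above encodes a simple blame rule: a recorded commit is guilty of our EAGAIN exactly when our transaction's operations replay cleanly on the store as it was before that commit, but fail on the store after it. As a predicate, in C (hypothetical types, illustration only):

    /* Guilt test for one history record: the failed transaction would
     * have succeeded before the recorded commit, but not after it. */
    typedef int (*replay_fn)(const void *store);  /* 1 = replays cleanly */

    static int is_guilty(const void *store_before, const void *store_after,
                         replay_fn replays_ok)
    {
        return replays_ok(store_before) && !replays_ok(store_after);
    }
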
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/store.ml xen-4.8.1/tools/ocaml/xenstored/store.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/store.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/store.ml 2017-04-10 14:21:48.000000000 +0100
@@ -211,6 +211,7 @@
lookup rnode path fct
end
+(* The Store.t type *)
type t =
{
mutable stat_transaction_coalesce: int;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/transaction.ml xen-4.8.1/tools/ocaml/xenstored/transaction.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/transaction.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/transaction.ml 2017-04-10 14:21:48.000000000 +0100
@@ -14,6 +14,8 @@
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*)
+let error fmt = Logging.error "transaction" fmt
+
open Stdext
let none = 0
@@ -69,34 +71,73 @@
else
false
-type ty = No | Full of (int * Store.Node.t * Store.t)
+type ty = No | Full of (
+ int * (* Transaction id *)
+ Store.t * (* Original store *)
+ Store.t (* A pointer to the canonical store: its root changes on each transaction-commit *)
+)
type t = {
ty: ty;
- store: Store.t;
+ start_count: int64;
+ store: Store.t; (* This is the store that we change in write operations. *)
quota: Quota.t;
mutable paths: (Xenbus.Xb.Op.operation * Store.Path.t) list;
mutable operations: (Packet.request * Packet.response) list;
mutable read_lowpath: Store.Path.t option;
mutable write_lowpath: Store.Path.t option;
}
+let get_id t = match t.ty with No -> none | Full (id, _, _) -> id
-let make id store =
- let ty = if id = none then No else Full(id, Store.get_root store, store) in
- {
+let counter = ref 0L
+let failed_commits = ref 0L
+let failed_commits_no_culprit = ref 0L
+let reset_conflict_stats () =
+ failed_commits := 0L;
+ failed_commits_no_culprit := 0L
+
+(* Scope for optimisation: different data-structure and functions to search/filter it *)
+let short_running_txns = ref []
+
+let oldest_short_running_transaction () =
+ let rec last = function
+ | [] -> None
+ | [x] -> Some x
+ | x :: xs -> last xs
+ in last !short_running_txns
+
+let trim_short_running_transactions txn =
+ let cutoff = Unix.gettimeofday () -. !Define.conflict_max_history_seconds in
+ let keep = match txn with
+ | None -> (function (start_time, _) -> start_time >= cutoff)
+ | Some t -> (function (start_time, tx) -> start_time >= cutoff && tx != t)
+ in
+ short_running_txns := List.filter
+ keep
+ !short_running_txns
+
+let make ?(internal=false) id store =
+ let ty = if id = none then No else Full(id, Store.copy store, store) in
+ let txn = {
ty = ty;
+ start_count = !counter;
store = if id = none then store else Store.copy store;
quota = Quota.copy store.Store.quota;
paths = [];
operations = [];
read_lowpath = None;
write_lowpath = None;
- }
+ } in
+ if id <> none && not internal then (
+ let now = Unix.gettimeofday () in
+ short_running_txns := (now, txn) :: !short_running_txns
+ );
+ txn
-let get_id t = match t.ty with No -> none | Full (id, _, _) -> id
let get_store t = t.store
let get_paths t = t.paths
+let is_read_only t = t.paths = []
let add_wop t ty path = t.paths <- (ty, path) :: t.paths
let add_operation ~perm t request response =
if !Define.maxrequests >= 0
@@ -155,7 +196,7 @@
let has_commited =
match t.ty with
| No -> true
- | Full (id, oldroot, cstore) ->
+ | Full (id, oldstore, cstore) -> (* "cstore" meaning current canonical store *)
let commit_partial oldroot cstore store =
(* get the lowest path of the query and verify that it hasn't
been modified by others transactions. *)
@@ -198,7 +239,7 @@
if !test_eagain && Random.int 3 = 0 then
false
else
- try_commit oldroot cstore t.store
+ try_commit (Store.get_root oldstore) cstore t.store
in
if has_commited && has_write_ops then
Disk.write t.store;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/xenstored.ml xen-4.8.1/tools/ocaml/xenstored/xenstored.ml
--- xen-4.8.1~pre.2017.01.23/tools/ocaml/xenstored/xenstored.ml 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/ocaml/xenstored/xenstored.ml 2017-04-10 14:21:48.000000000 +0100
@@ -53,14 +53,16 @@
let process_domains store cons domains =
let do_io_domain domain =
- if not (Domain.is_bad_domain domain) then
- let io_credit = Domain.get_io_credit domain in
- if io_credit > 0 then (
- let con = Connections.find_domain cons (Domain.get_id domain) in
- Process.do_input store cons domains con;
- Process.do_output store cons domains con;
- Domain.decr_io_credit domain;
- ) in
+ if Domain.is_bad_domain domain
+ || Domain.get_io_credit domain <= 0
+ || Domain.is_paused_for_conflict domain
+ then () (* nothing to do *)
+ else (
+ let con = Connections.find_domain cons (Domain.get_id domain) in
+ Process.do_input store cons domains con;
+ Process.do_output store cons domains con;
+ Domain.decr_io_credit domain
+ ) in
Domains.iter domains do_io_domain
let sigusr1_handler store =
@@ -89,6 +91,9 @@
let pidfile = ref default_pidfile in
let options = [
("merge-activate", Config.Set_bool Transaction.do_coalesce);
+ ("conflict-burst-limit", Config.Set_float Define.conflict_burst_limit);
+ ("conflict-max-history-seconds", Config.Set_float Define.conflict_max_history_seconds);
+ ("conflict-rate-limit-is-aggregate", Config.Set_bool Define.conflict_rate_limit_is_aggregate);
("perms-activate", Config.Set_bool Perms.activate);
("quota-activate", Config.Set_bool Quota.activate);
("quota-maxwatch", Config.Set_int Define.maxwatch);
@@ -260,7 +265,23 @@
let store = Store.create () in
let eventchn = Event.init () in
- let domains = Domains.init eventchn in
+ let next_frequent_ops = ref 0. in
+ let advance_next_frequent_ops () =
+ next_frequent_ops := (Unix.gettimeofday () +. !Define.conflict_max_history_seconds)
+ in
+ let delay_next_frequent_ops_by duration =
+ next_frequent_ops := !next_frequent_ops +. duration
+ in
+ let domains = Domains.init eventchn advance_next_frequent_ops in
+
+ (* For things that need to be done periodically but more often
+ * than the periodic_ops function *)
+ let frequent_ops () =
+ if Unix.gettimeofday () > !next_frequent_ops then (
+ History.trim ();
+ Domains.incr_conflict_credit domains;
+ advance_next_frequent_ops ()
+ ) in
let cons = Connections.create () in
let quit = ref false in
@@ -356,6 +377,7 @@
let last_scan_time = ref 0. in
let periodic_ops now =
+ debug "periodic_ops starting";
(* we garbage collect the string->int dictionary after a sizeable amount of operations,
* there's no need to be really fast even if we got loose
* objects since names are often reuse.
@@ -365,6 +387,7 @@
Symbol.mark_all_as_unused ();
Store.mark_symbols store;
Connections.iter cons Connection.mark_symbols;
+ History.mark_symbols ();
Symbol.garbage ()
end;
@@ -374,7 +397,11 @@
(* make sure we don't print general stats faster than 2 min *)
if now > (!last_stat_time +. 120.) then (
+ info "Transaction conflict statistics for last %F seconds:" (now -. !last_stat_time);
last_stat_time := now;
+ Domains.iter domains (Domain.log_and_reset_conflict_stats (info "Dom%d caused %Ld conflicts"));
+ info "%Ld failed transactions; of these no culprit was found for %Ld" !Transaction.failed_commits !Transaction.failed_commits_no_culprit;
+ Transaction.reset_conflict_stats ();
let gc = Gc.stat () in
let (lanon, lanon_ops, lanon_watchs,
@@ -392,23 +419,38 @@
gc.Gc.heap_words gc.Gc.heap_chunks
gc.Gc.live_words gc.Gc.live_blocks
gc.Gc.free_words gc.Gc.free_blocks
- )
- in
+ );
+ let elapsed = Unix.gettimeofday () -. now in
+ debug "periodic_ops took %F seconds." elapsed;
+ delay_next_frequent_ops_by elapsed
+ in
- let period_ops_interval = 15. in
- let period_start = ref 0. in
+ let period_ops_interval = 15. in
+ let period_start = ref 0. in
let main_loop () =
-
+ let is_peaceful c =
+ match Connection.get_domain c with
+ | None -> true (* Treat socket-connections as exempt, and free to conflict. *)
+ | Some dom -> not (Domain.is_paused_for_conflict dom)
+ in
+ frequent_ops ();
let mw = Connections.has_more_work cons in
+ let peaceful_mw = List.filter is_peaceful mw in
List.iter
(fun c ->
match Connection.get_domain c with
| None -> () | Some d -> Domain.incr_io_credit d)
- mw;
+ peaceful_mw;
+ let start_time = Unix.gettimeofday () in
let timeout =
- if List.length mw > 0 then 0. else period_ops_interval in
- let inset, outset = Connections.select cons in
+ let until_next_activity =
+ if Domains.all_at_max_credit domains
+ then period_ops_interval
+ else min (max 0. (!next_frequent_ops -. start_time)) period_ops_interval in
+ if peaceful_mw <> [] then 0. else until_next_activity
+ in
+ let inset, outset = Connections.select ~only_if:is_peaceful cons in
let rset, wset, _ =
try
Select.select (spec_fds @ inset) outset [] timeout
@@ -418,6 +460,7 @@
List.partition (fun fd -> List.mem fd spec_fds) rset in
if List.length sfds > 0 then
process_special_fds sfds;
+
if List.length cfds > 0 || List.length wset > 0 then
process_connection_fds store cons domains cfds wset;
if timeout <> 0. then (
@@ -425,6 +468,7 @@
if now > !period_start +. period_ops_interval then
(period_start := now; periodic_ops now)
);
+
process_domains store cons domains
in
diff -Nru xen-4.8.1~pre.2017.01.23/tools/tests/x86_emulator/test_x86_emulator.c xen-4.8.1/tools/tests/x86_emulator/test_x86_emulator.c
--- xen-4.8.1~pre.2017.01.23/tools/tests/x86_emulator/test_x86_emulator.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/tests/x86_emulator/test_x86_emulator.c 2017-04-10 14:21:48.000000000 +0100
@@ -163,6 +163,18 @@
(ebx & (1U << 5)) != 0; \
})
+static int read_segment(
+ enum x86_segment seg,
+ struct segment_register *reg,
+ struct x86_emulate_ctxt *ctxt)
+{
+ if ( !is_x86_user_segment(seg) )
+ return X86EMUL_UNHANDLEABLE;
+ memset(reg, 0, sizeof(*reg));
+ reg->attr.fields.p = 1;
+ return X86EMUL_OKAY;
+}
+
static int read_cr(
unsigned int reg,
unsigned long *val,
@@ -215,6 +227,7 @@
.write = write,
.cmpxchg = cmpxchg,
.cpuid = cpuid,
+ .read_segment = read_segment,
.read_cr = read_cr,
.get_fpu = get_fpu,
};
@@ -732,6 +745,27 @@
goto fail;
printf("okay\n");
+ printf("%-40s", "Testing mov %%cr4,%%esi (bad ModRM)...");
+ /*
+ * Mod = 1, Reg = 4, R/M = 6 would normally encode a memory reference of
+ * disp8(%esi), but mov to/from cr/dr are special and behave as if they
+ * were encoded with Mod == 3.
+ */
+ instr[0] = 0x0f; instr[1] = 0x20, instr[2] = 0x66;
+ instr[3] = 0; /* Supposed disp8. */
+ regs.esi = 0;
+ regs.eip = (unsigned long)&instr[0];
+ rc = x86_emulate(&ctxt, &emulops);
+ /*
+ * We don't care precisely what gets read from %cr4 into %esi, just so
+ * long as ModRM is treated as a register operand and 0(%esi) isn't
+ * followed as a memory reference.
+ */
+ if ( (rc != X86EMUL_OKAY) ||
+ (regs.eip != (unsigned long)&instr[3]) )
+ goto fail;
+ printf("okay\n");
+
#define decl_insn(which) extern const unsigned char which[], which##_len[]
#define put_insn(which, insn) ".pushsection .test, \"ax\", @progbits\n" \
#which ": " insn "\n" \
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/Makefile xen-4.8.1/tools/xenstore/Makefile
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/Makefile 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/Makefile 2017-04-10 14:21:48.000000000 +0100
@@ -32,6 +32,7 @@
XENSTORED_OBJS_$(CONFIG_MiniOS) = xenstored_minios.o
XENSTORED_OBJS += $(XENSTORED_OBJS_y)
+LDLIBS_xenstored += -lrt
ifneq ($(XENSTORE_STATIC_CLIENTS),y)
LIBXENSTORE := libxenstore.so
@@ -73,7 +74,7 @@
$(XENSTORED_OBJS): CFLAGS += $(CFLAGS_libxengnttab)
xenstored: $(XENSTORED_OBJS)
- $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
+ $(CC) $^ $(LDFLAGS) $(LDLIBS_libxenevtchn) $(LDLIBS_libxengnttab) $(LDLIBS_libxenctrl) $(LDLIBS_xenstored) $(SOCKET_LIBS) -o $@ $(APPEND_LDFLAGS)
xenstored.a: $(XENSTORED_OBJS)
$(AR) cr $@ $^
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_core.c xen-4.8.1/tools/xenstore/xenstored_core.c
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_core.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_core.c 2017-04-10 14:21:48.000000000 +0100
@@ -358,6 +358,7 @@
int *ptimeout)
{
struct connection *conn;
+ struct wrl_timestampt now;
if (fds)
memset(fds, 0, sizeof(struct pollfd) * current_array_size);
@@ -377,8 +378,12 @@
xce_pollfd_idx = set_fd(xenevtchn_fd(xce_handle),
POLLIN|POLLPRI);
+ wrl_gettime_now(&now);
+ wrl_log_periodic(now);
+
list_for_each_entry(conn, &connections, list) {
if (conn->domain) {
+ wrl_check_timeout(conn->domain, now, ptimeout);
if (domain_can_read(conn) ||
(domain_can_write(conn) &&
!list_empty(&conn->out_list)))
@@ -833,6 +838,7 @@
corrupt(conn, "Could not delete '%s'", node->name);
return;
}
+
domain_entry_dec(conn, node);
}
@@ -972,6 +978,7 @@
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
send_ack(conn, XS_WRITE);
}
@@ -1003,6 +1010,7 @@
return;
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
}
send_ack(conn, XS_MKDIR);
@@ -1129,6 +1137,7 @@
if (_rm(conn, node, name)) {
add_change_node(conn->transaction, name, true);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, true);
send_ack(conn, XS_RM);
}
@@ -1205,6 +1214,7 @@
}
add_change_node(conn->transaction, name, false);
+ wrl_apply_debit_direct(conn);
fire_watches(conn, in, name, false);
send_ack(conn, XS_SET_PERMS);
}
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_core.h xen-4.8.1/tools/xenstore/xenstored_core.h
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_core.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_core.h 2017-04-10 14:21:48.000000000 +0100
@@ -33,6 +33,12 @@
#include "list.h"
#include "tdb.h"
+#define MIN(a, b) (((a) < (b))? (a) : (b))
+
+typedef int32_t wrl_creditt;
+#define WRL_CREDIT_MAX (1000*1000*1000)
+/* ^ satisfies non-overflow condition for wrl_xfer_credit */
+
struct buffered_data
{
struct list_head list;
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_domain.c xen-4.8.1/tools/xenstore/xenstored_domain.c
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_domain.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_domain.c 2017-04-10 14:21:48.000000000 +0100
@@ -21,6 +21,8 @@
#include <unistd.h>
#include <stdlib.h>
#include <stdarg.h>
+#include <time.h>
+#include <syslog.h>
#include "utils.h"
#include "talloc.h"
@@ -74,6 +76,11 @@
/* number of watch for this domain */
int nbwatch;
+
+ /* write rate limit */
+ wrl_creditt wrl_credit; /* [ -wrl_config_writecost, +_dburst ] */
+ struct wrl_timestampt wrl_timestamp;
+ bool wrl_delay_logged;
};
static LIST_HEAD(domains);
@@ -206,6 +213,8 @@
fire_watches(NULL, domain, "@releaseDomain", false);
+ wrl_domain_destroy(domain);
+
return 0;
}
@@ -253,6 +262,9 @@
bool domain_can_read(struct connection *conn)
{
struct xenstore_domain_interface *intf = conn->domain->interface;
+
+ if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
+ return false;
return (intf->req_cons != intf->req_prod);
}
@@ -284,6 +296,8 @@
domain->domid = domid;
domain->path = talloc_domain_path(domain, domid);
+ wrl_domain_new(domain);
+
list_add(&domain->list, &domains);
talloc_set_destructor(domain, destroy_domain);
@@ -751,6 +765,233 @@
: 0;
}
+static wrl_creditt wrl_config_writecost = WRL_FACTOR;
+static wrl_creditt wrl_config_rate = WRL_RATE * WRL_FACTOR;
+static wrl_creditt wrl_config_dburst = WRL_DBURST * WRL_FACTOR;
+static wrl_creditt wrl_config_gburst = WRL_GBURST * WRL_FACTOR;
+static wrl_creditt wrl_config_newdoms_dburst =
+ WRL_DBURST * WRL_NEWDOMS * WRL_FACTOR;
+
+long wrl_ntransactions;
+
+static long wrl_ndomains;
+static wrl_creditt wrl_reserve; /* [-wrl_config_newdoms_dburst, +_gburst ] */
+static time_t wrl_log_last_warning; /* 0: no previous warning */
+
+void wrl_gettime_now(struct wrl_timestampt *now_wt)
+{
+ struct timespec now_ts;
+ int r;
+
+ r = clock_gettime(CLOCK_MONOTONIC, &now_ts);
+ if (r)
+ barf_perror("Could not find time (clock_gettime failed)");
+
+ now_wt->sec = now_ts.tv_sec;
+ now_wt->msec = now_ts.tv_nsec / 1000000;
+}
+
+static void wrl_xfer_credit(wrl_creditt *debit, wrl_creditt debit_floor,
+ wrl_creditt *credit, wrl_creditt credit_ceil)
+ /*
+ * Transfers zero or more credit from "debit" to "credit".
+ * Transfers as much as possible while maintaining
+ * debit >= debit_floor and credit <= credit_ceil.
+ * (If that's violated already, does nothing.)
+ *
+ * Sufficient conditions to avoid overflow, either of:
+ * |every argument| <= 0x3fffffff
+ * |every argument| <= 1E9
+ * |every argument| <= WRL_CREDIT_MAX
+ * (And this condition is preserved.)
+ */
+{
+ wrl_creditt xfer = MIN( *debit - debit_floor,
+ credit_ceil - *credit );
+ if (xfer > 0) {
+ *debit -= xfer;
+ *credit += xfer;
+ }
+}
+
+void wrl_domain_new(struct domain *domain)
+{
+ domain->wrl_credit = 0;
+ wrl_gettime_now(&domain->wrl_timestamp);
+ wrl_ndomains++;
+ /* Steal up to DBURST from the reserve */
+ wrl_xfer_credit(&wrl_reserve, -wrl_config_newdoms_dburst,
+ &domain->wrl_credit, wrl_config_dburst);
+}
+
+void wrl_domain_destroy(struct domain *domain)
+{
+ wrl_ndomains--;
+ /*
+ * Don't bother recalculating domain's credit - this just
+ * means we don't give the reserve the ending domain's credit
+ * for time elapsed since last update.
+ */
+ wrl_xfer_credit(&domain->wrl_credit, 0,
+ &wrl_reserve, wrl_config_dburst);
+}
+
+void wrl_credit_update(struct domain *domain, struct wrl_timestampt now)
+{
+ /*
+ * We want to calculate
+ * credit += (now - timestamp) * RATE / ndoms;
+ * But we want it to saturate, and to avoid floating point.
+ * To avoid rounding errors from constantly adding small
+ * amounts of credit, we only add credit for whole milliseconds.
+ */
+ long seconds = now.sec - domain->wrl_timestamp.sec;
+ long milliseconds = now.msec - domain->wrl_timestamp.msec;
+ long msec;
+ int64_t denom, num;
+ wrl_creditt surplus;
+
+ seconds = MIN(seconds, 1000*1000); /* arbitrary, prevents overflow */
+ msec = seconds * 1000 + milliseconds;
+
+ if (msec < 0)
+ /* shouldn't happen with CLOCK_MONOTONIC */
+ msec = 0;
+
+ /* 32x32 -> 64 cannot overflow */
+ denom = (int64_t)msec * wrl_config_rate;
+ num = (int64_t)wrl_ndomains * 1000;
+ /* denom / num <= 1E6 * wrl_config_rate, so with
+ reasonable wrl_config_rate, denom / num << 2^64 */
+
+ /* at last! */
+ domain->wrl_credit = MIN( (int64_t)domain->wrl_credit + denom / num,
+ WRL_CREDIT_MAX );
+ /* (maybe briefly violating the DBURST cap on wrl_credit) */
+
+ /* maybe take from the reserve to make us nonnegative */
+ wrl_xfer_credit(&wrl_reserve, 0,
+ &domain->wrl_credit, 0);
+
+ /* return any surplus (over DBURST) to the reserve */
+ surplus = 0;
+ wrl_xfer_credit(&domain->wrl_credit, wrl_config_dburst,
+ &surplus, WRL_CREDIT_MAX);
+ wrl_xfer_credit(&surplus, 0,
+ &wrl_reserve, wrl_config_gburst);
+ /* surplus is now implicitly discarded */
+
+ domain->wrl_timestamp = now;
+
+ trace("wrl: dom %4d %6ld msec %9ld credit %9ld reserve"
+ " %9ld discard\n",
+ domain->domid,
+ msec,
+ (long)domain->wrl_credit, (long)wrl_reserve,
+ (long)surplus);
+}
+
+void wrl_check_timeout(struct domain *domain,
+ struct wrl_timestampt now,
+ int *ptimeout)
+{
+ uint64_t num, denom;
+ int wakeup;
+
+ wrl_credit_update(domain, now);
+
+ if (domain->wrl_credit >= 0)
+ /* not blocked */
+ return;
+
+ if (!*ptimeout)
+ /* already decided on immediate wakeup,
+ so no need to calculate our timeout */
+ return;
+
+ /* calculate wakeup = now + -credit / (RATE / ndoms); */
+
+ /* credit cannot go more -ve than one transaction,
+ * so the first multiplication cannot overflow even 32-bit */
+ num = (uint64_t)(-domain->wrl_credit * 1000) * wrl_ndomains;
+ denom = wrl_config_rate;
+
+ wakeup = MIN( num / denom /* uint64_t */, INT_MAX );
+ if (*ptimeout==-1 || wakeup < *ptimeout)
+ *ptimeout = wakeup;
+
+ trace("wrl: domain %u credit=%ld (reserve=%ld) SLEEPING for %d\n",
+ domain->domid,
+ (long)domain->wrl_credit, (long)wrl_reserve,
+ wakeup);
+}
+
+#define WRL_LOG(now, ...) \
+ (syslog(LOG_WARNING, "write rate limit: " __VA_ARGS__))
+
+void wrl_apply_debit_actual(struct domain *domain)
+{
+ struct wrl_timestampt now;
+
+ if (!domain)
+ /* sockets escape the write rate limit */
+ return;
+
+ wrl_gettime_now(&now);
+ wrl_credit_update(domain, now);
+
+ domain->wrl_credit -= wrl_config_writecost;
+ trace("wrl: domain %u credit=%ld (reserve=%ld)\n",
+ domain->domid,
+ (long)domain->wrl_credit, (long)wrl_reserve);
+
+ if (domain->wrl_credit < 0) {
+ if (!domain->wrl_delay_logged) {
+ domain->wrl_delay_logged = true;
+ WRL_LOG(now, "domain %ld is affected",
+ (long)domain->domid);
+ } else if (!wrl_log_last_warning) {
+ WRL_LOG(now, "rate limiting restarts");
+ }
+ wrl_log_last_warning = now.sec;
+ }
+}
+
+void wrl_log_periodic(struct wrl_timestampt now)
+{
+ if (wrl_log_last_warning &&
+ (now.sec - wrl_log_last_warning) > WRL_LOGEVERY) {
+ WRL_LOG(now, "not in force recently");
+ wrl_log_last_warning = 0;
+ }
+}
+
+void wrl_apply_debit_direct(struct connection *conn)
+{
+ if (!conn)
+ /* some writes are generated internally */
+ return;
+
+ if (conn->transaction)
+ /* these are accounted for when the transaction ends */
+ return;
+
+ if (!wrl_ntransactions)
+ /* we don't conflict with anyone */
+ return;
+
+ wrl_apply_debit_actual(conn->domain);
+}
+
+void wrl_apply_debit_trans_commit(struct connection *conn)
+{
+ if (wrl_ntransactions <= 1)
+ /* our own transaction appears in the counter */
+ return;
+
+ wrl_apply_debit_actual(conn->domain);
+}
+
/*
* Local variables:
* c-file-style: "linux"
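
For a feel for the numbers: with the WRL_* defaults from xenstored_domain.h below (FACTOR 1000 for fixed point, RATE 200, DBURST 10), credit accrues at 200*1000 per second shared across domains and each write costs 1000, so N busy domains sustain about 200/N writes per second each, with a burst allowance of 10 writes. A throwaway C check of that arithmetic (illustrative only):

    #include <stdio.h>

    int main(void)
    {
        const long factor = 1000;           /* WRL_FACTOR: fixed point */
        const long rate = 200 * factor;     /* credit issued per second */
        const long writecost = factor;      /* cost of one write */
        const long dburst = 10 * factor;    /* per-domain credit cap */
        long ndoms = 4;                     /* hypothetical */

        printf("sustained: %ld writes/s per domain\n",
               rate / ndoms / writecost);   /* 50 with these numbers */
        printf("burst: %ld writes\n", dburst / writecost);
        return 0;
    }
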
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_domain.h xen-4.8.1/tools/xenstore/xenstored_domain.h
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_domain.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_domain.h 2017-04-10 14:21:48.000000000 +0100
@@ -65,4 +65,31 @@
void domain_watch_dec(struct connection *conn);
int domain_watch(struct connection *conn);
+/* Write rate limiting */
+
+#define WRL_FACTOR 1000 /* for fixed-point arithmetic */
+#define WRL_RATE 200
+#define WRL_DBURST 10
+#define WRL_GBURST 1000
+#define WRL_NEWDOMS 5
+#define WRL_LOGEVERY 120 /* seconds */
+
+struct wrl_timestampt {
+ time_t sec;
+ int msec;
+};
+
+extern long wrl_ntransactions;
+
+void wrl_gettime_now(struct wrl_timestampt *now_ts);
+void wrl_domain_new(struct domain *domain);
+void wrl_domain_destroy(struct domain *domain);
+void wrl_credit_update(struct domain *domain, struct wrl_timestampt now);
+void wrl_check_timeout(struct domain *domain,
+ struct wrl_timestampt now,
+ int *ptimeout);
+void wrl_log_periodic(struct wrl_timestampt now);
+void wrl_apply_debit_direct(struct connection *conn);
+void wrl_apply_debit_trans_commit(struct connection *conn);
+
#endif /* _XENSTORED_DOMAIN_H */
diff -Nru xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_transaction.c xen-4.8.1/tools/xenstore/xenstored_transaction.c
--- xen-4.8.1~pre.2017.01.23/tools/xenstore/xenstored_transaction.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/tools/xenstore/xenstored_transaction.c 2017-04-10 14:21:48.000000000 +0100
@@ -120,6 +120,7 @@
{
struct transaction *trans = _transaction;
+ wrl_ntransactions--;
trace_destroy(trans, "transaction");
if (trans->tdb)
tdb_close(trans->tdb);
@@ -183,6 +184,7 @@
talloc_steal(conn, trans);
talloc_set_destructor(trans, destroy_transaction);
conn->transaction_started++;
+ wrl_ntransactions++;
snprintf(id_str, sizeof(id_str), "%u", trans->id);
send_reply(conn, XS_TRANSACTION_START, id_str, strlen(id_str)+1);
@@ -218,6 +220,9 @@
send_error(conn, EAGAIN);
return;
}
+
+ wrl_apply_debit_trans_commit(conn);
+
if (!replace_tdb(trans->tdb_name, trans->tdb)) {
send_error(conn, errno);
return;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/Makefile xen-4.8.1/xen/Makefile
--- xen-4.8.1~pre.2017.01.23/xen/Makefile 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/Makefile 2017-04-10 14:21:48.000000000 +0100
@@ -2,7 +2,7 @@
# All other places this is stored (eg. compile.h) should be autogenerated.
export XEN_VERSION = 4
export XEN_SUBVERSION = 8
-export XEN_EXTRAVERSION ?= .1-pre$(XEN_VENDORVERSION)
+export XEN_EXTRAVERSION ?= .1$(XEN_VENDORVERSION)
export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
-include xen-version
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/alternative.c xen-4.8.1/xen/arch/arm/alternative.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/alternative.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/alternative.c 2017-04-10 14:21:48.000000000 +0100
@@ -25,6 +25,7 @@
#include <xen/vmap.h>
#include <xen/smp.h>
#include <xen/stop_machine.h>
+#include <xen/virtual_region.h>
#include <asm/alternative.h>
#include <asm/atomic.h>
#include <asm/byteorder.h>
@@ -155,8 +156,12 @@
int ret;
struct alt_region region;
mfn_t xen_mfn = _mfn(virt_to_mfn(_start));
- unsigned int xen_order = get_order_from_bytes(_end - _start);
+ paddr_t xen_size = _end - _start;
+ unsigned int xen_order = get_order_from_bytes(xen_size);
void *xenmap;
+ struct virtual_region patch_region = {
+ .list = LIST_HEAD_INIT(patch_region.list),
+ };
BUG_ON(patched);
@@ -170,6 +175,15 @@
BUG_ON(!xenmap);
/*
+ * If we generate a new branch instruction, the target will be
+ * calculated in this re-mapped Xen region. So we have to register
+ * this re-mapped Xen region as a virtual region temporarily.
+ */
+ patch_region.start = xenmap;
+ patch_region.end = xenmap + xen_size;
+ register_virtual_region(&patch_region);
+
+ /*
* Find the virtual address of the alternative region in the new
* mapping.
* alt_instr contains relative offset, so the function
@@ -183,6 +197,8 @@
/* The patching is not expected to fail during boot. */
BUG_ON(ret != 0);
+ unregister_virtual_region(&patch_region);
+
vunmap(xenmap);
/* Barriers provided by the cache flushing */
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/domain_build.c xen-4.8.1/xen/arch/arm/domain_build.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/domain_build.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/domain_build.c 2017-04-10 14:21:48.000000000 +0100
@@ -48,20 +48,6 @@
p2m_type_t p2mt;
};
-static const struct dt_device_match dev_map_attrs[] __initconst =
-{
- {
- __DT_MATCH_COMPATIBLE("mmio-sram"),
- __DT_MATCH_PROP("no-memory-wc"),
- .data = (void *) (uintptr_t) p2m_mmio_direct_dev,
- },
- {
- __DT_MATCH_COMPATIBLE("mmio-sram"),
- .data = (void *) (uintptr_t) p2m_mmio_direct_nc,
- },
- { /* sentinel */ },
-};
-
//#define DEBUG_11_ALLOCATION
#ifdef DEBUG_11_ALLOCATION
# define D11PRINT(fmt, args...) printk(XENLOG_DEBUG fmt, ##args)
@@ -1159,21 +1145,6 @@
return 0;
}
-static p2m_type_t lookup_map_attr(struct dt_device_node *node,
- p2m_type_t parent_p2mt)
-{
- const struct dt_device_match *r;
-
- /* Search and if nothing matches, use the parent's attributes. */
- r = dt_match_node(dev_map_attrs, node);
-
- /*
- * If this node does not dictate specific mapping attributes,
- * it inherits its parent's attributes.
- */
- return r ? (uintptr_t) r->data : parent_p2mt;
-}
-
static int handle_node(struct domain *d, struct kernel_info *kinfo,
struct dt_device_node *node,
p2m_type_t p2mt)
@@ -1264,7 +1235,6 @@
"WARNING: Path %s is reserved, skip the node as we may re-use the path.\n",
path);
- p2mt = lookup_map_attr(node, p2mt);
res = handle_device(d, node, p2mt);
if ( res)
return res;
@@ -1319,7 +1289,7 @@
static int prepare_dtb(struct domain *d, struct kernel_info *kinfo)
{
- const p2m_type_t default_p2mt = p2m_mmio_direct_dev;
+ const p2m_type_t default_p2mt = p2m_mmio_direct_c;
const void *fdt;
int new_size;
int ret;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/gic.c xen-4.8.1/xen/arch/arm/gic.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/gic.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/gic.c 2017-04-10 14:21:48.000000000 +0100
@@ -205,7 +205,10 @@
*/
if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
!test_bit(_IRQ_DISABLED, &desc->status) )
+ {
+ vgic_unlock_rank(v_target, rank, flags);
return -EBUSY;
+ }
}
clear_bit(_IRQ_GUEST, &desc->status);
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/irq.c xen-4.8.1/xen/arch/arm/irq.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/irq.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/irq.c 2017-04-10 14:21:48.000000000 +0100
@@ -477,26 +477,32 @@
*/
if ( desc->action != NULL )
{
- struct domain *ad = irq_get_domain(desc);
-
- if ( test_bit(_IRQ_GUEST, &desc->status) && d == ad )
+ if ( test_bit(_IRQ_GUEST, &desc->status) )
{
- if ( irq_get_guest_info(desc)->virq != virq )
+ struct domain *ad = irq_get_domain(desc);
+
+ if ( d == ad )
+ {
+ if ( irq_get_guest_info(desc)->virq != virq )
+ {
+ printk(XENLOG_G_ERR
+ "d%u: IRQ %u is already assigned to vIRQ %u\n",
+ d->domain_id, irq, irq_get_guest_info(desc)->virq);
+ retval = -EBUSY;
+ }
+ }
+ else
{
- printk(XENLOG_G_ERR
- "d%u: IRQ %u is already assigned to vIRQ %u\n",
- d->domain_id, irq, irq_get_guest_info(desc)->virq);
+ printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
+ irq, ad->domain_id);
retval = -EBUSY;
}
- goto out;
}
-
- if ( test_bit(_IRQ_GUEST, &desc->status) )
- printk(XENLOG_G_ERR "IRQ %u is already used by domain %u\n",
- irq, ad->domain_id);
else
+ {
printk(XENLOG_G_ERR "IRQ %u is already used by Xen\n", irq);
- retval = -EBUSY;
+ retval = -EBUSY;
+ }
goto out;
}
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/mm.c xen-4.8.1/xen/arch/arm/mm.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/mm.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/mm.c 2017-04-10 14:21:48.000000000 +0100
@@ -390,6 +390,16 @@
clean_and_invalidate_dcache_va_range(v, PAGE_SIZE);
unmap_domain_page(v);
+
+ /*
+ * For some instruction cache types (such as VIPT), the entire I-Cache
+ * needs to be flushed to guarantee that all the aliases of a given
+ * physical address will be removed from the cache.
+ * Invalidating the I-Cache by VA highly depends on the behavior of the
+ * I-Cache (See D4.9.2 in ARM DDI 0487A.k_iss10775). Instead of using flush
+ * by VA on select platforms, we just flush the entire cache here.
+ */
+ invalidate_icache();
}
void __init arch_init_memory(void)
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/p2m.c xen-4.8.1/xen/arch/arm/p2m.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/p2m.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/p2m.c 2017-04-10 14:21:48.000000000 +0100
@@ -135,13 +135,12 @@
{
register_t hcr;
struct p2m_domain *p2m = &n->domain->arch.p2m;
+ uint8_t *last_vcpu_ran;
if ( is_idle_vcpu(n) )
return;
hcr = READ_SYSREG(HCR_EL2);
- WRITE_SYSREG(hcr & ~HCR_VM, HCR_EL2);
- isb();
WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
isb();
@@ -156,6 +155,17 @@
WRITE_SYSREG(hcr, HCR_EL2);
isb();
+
+ last_vcpu_ran = &p2m->last_vcpu_ran[smp_processor_id()];
+
+ /*
+ * Flush local TLB for the domain to prevent wrong TLB translation
+ * when running multiple vCPU of the same domain on a single pCPU.
+ */
+ if ( *last_vcpu_ran != INVALID_VCPU_ID && *last_vcpu_ran != n->vcpu_id )
+ flush_tlb_local();
+
+ *last_vcpu_ran = n->vcpu_id;
}
static void p2m_flush_tlb(struct p2m_domain *p2m)
@@ -734,6 +744,7 @@
unsigned int i;
lpae_t *table;
mfn_t mfn;
+ struct page_info *pg;
/* Nothing to do if the entry is invalid. */
if ( !p2m_valid(entry) )
@@ -771,7 +782,10 @@
mfn = _mfn(entry.p2m.base);
ASSERT(mfn_valid(mfn_x(mfn)));
- free_domheap_page(mfn_to_page(mfn_x(mfn)));
+ pg = mfn_to_page(mfn_x(mfn));
+
+ page_list_del(pg, &p2m->pages);
+ free_domheap_page(pg);
}
static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
@@ -982,9 +996,10 @@
/*
* The radix-tree can only work on 4KB. This is only used when
- * memaccess is enabled.
+ * memaccess is enabled and during shutdown.
*/
- ASSERT(!p2m->mem_access_enabled || page_order == 0);
+ ASSERT(!p2m->mem_access_enabled || page_order == 0 ||
+ p2m->domain->is_dying);
/*
* The access type should always be p2m_access_rwx when the mapping
* is removed.
@@ -1176,7 +1191,7 @@
if ( !(nr && iomem_access_permitted(d, mfn_x(mfn), mfn_x(mfn) + nr - 1)) )
return 0;
- res = map_mmio_regions(d, gfn, nr, mfn);
+ res = p2m_insert_mapping(d, gfn, nr, mfn, p2m_mmio_direct_c);
if ( res < 0 )
{
printk(XENLOG_G_ERR "Unable to map MFNs [%#"PRI_mfn" - %#"PRI_mfn" in Dom%d\n",
@@ -1308,6 +1323,7 @@
{
struct p2m_domain *p2m = &d->arch.p2m;
int rc = 0;
+ unsigned int cpu;
rwlock_init(&p2m->lock);
INIT_PAGE_LIST_HEAD(&p2m->pages);
@@ -1336,6 +1352,17 @@
rc = p2m_alloc_table(d);
+ /*
+ * Make sure that the type chosen is able to store any vCPU ID
+ * between 0 and the maximum number of virtual CPUs supported, as
+ * well as INVALID_VCPU_ID.
+ */
+ BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0]) * 8)) < MAX_VIRT_CPUS);
+ BUILD_BUG_ON((1 << (sizeof(p2m->last_vcpu_ran[0])* 8)) < INVALID_VCPU_ID);
+
+ for_each_possible_cpu(cpu)
+ p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
+
return rc;
}
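
The p2m change above fixes a real hazard: two vCPUs of one domain share a VMID, so when they time-share a physical CPU the second could hit the first's stale TLB entries. The fix records, per physical CPU, the last vCPU of the domain that ran there and flushes the local TLB on a mismatch. A compact sketch of that bookkeeping in C (hypothetical names; the real code is in p2m_restore_state above):

    enum { NR_PCPUS = 8, INVALID_ID = 0xff };    /* hypothetical sizes */

    struct p2m_sketch {
        unsigned char last_vcpu_ran[NR_PCPUS];   /* init to INVALID_ID */
    };

    static void on_vcpu_switch_in(struct p2m_sketch *p2m, unsigned pcpu,
                                  unsigned char vcpu_id,
                                  void (*flush_local_tlb)(void))
    {
        unsigned char *last = &p2m->last_vcpu_ran[pcpu];

        /* A different sibling vCPU ran here last: its translations are
         * tagged with the same VMID, so purge the local TLB. */
        if (*last != INVALID_ID && *last != vcpu_id)
            flush_local_tlb();
        *last = vcpu_id;
    }
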
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/psci.c xen-4.8.1/xen/arch/arm/psci.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/psci.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/psci.c 2017-04-10 14:21:48.000000000 +0100
@@ -147,7 +147,7 @@
psci_ver = call_smc(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
/* For the moment, we only support PSCI 0.2 and PSCI 1.x */
- if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver != 1) )
+ if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver) != 1 )
{
printk("Error: Unrecognized PSCI version %u.%u\n",
PSCI_VERSION_MAJOR(psci_ver), PSCI_VERSION_MINOR(psci_ver));
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/setup.c xen-4.8.1/xen/arch/arm/setup.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/setup.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/setup.c 2017-04-10 14:21:48.000000000 +0100
@@ -784,6 +784,8 @@
smp_init_cpus();
cpus = smp_get_max_cpus();
+ printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", cpus);
+ nr_cpu_ids = cpus;
init_xen_time();
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/traps.c xen-4.8.1/xen/arch/arm/traps.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/traps.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/traps.c 2017-04-10 14:21:48.000000000 +0100
@@ -101,6 +101,19 @@
integer_param("debug_stack_lines", debug_stack_lines);
+static enum {
+ TRAP,
+ NATIVE,
+} vwfi;
+
+static void __init parse_vwfi(const char *s)
+{
+ if ( !strcmp(s, "native") )
+ vwfi = NATIVE;
+ else
+ vwfi = TRAP;
+}
+custom_param("vwfi", parse_vwfi);
void init_traps(void)
{
@@ -127,8 +140,8 @@
/* Setup hypervisor traps */
WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
- HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,
- HCR_EL2);
+ (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
+ HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,HCR_EL2);
isb();
}
@@ -643,7 +656,7 @@
};
mode = cpsr & PSR_MODE_MASK;
- if ( mode > ARRAY_SIZE(mode_strings) )
+ if ( mode >= ARRAY_SIZE(mode_strings) )
return "Unknown";
return mode_strings[mode] ? : "Unknown";
}
@@ -2280,6 +2293,20 @@
return inject_undef64_exception(regs, hsr.len);
/*
+ * ICC_SRE_EL2.Enable = 0
+ *
+ * GIC Architecture Specification (IHI 0069C): Section 8.1.9
+ */
+ case HSR_SYSREG_ICC_SRE_EL1:
+ /*
+ * Trapped when the guest is using GICv2 whilst the platform
+ * interrupt controller is GICv3. In this case, the register
+ * should be emulated as RAZ/WI to tell the guest to use the GIC
+ * memory-mapped interface (i.e. GICv2 compatibility).
+ */
+ return handle_raz_wi(regs, regidx, hsr.sysreg.read, hsr, 1);
+
+ /*
* HCR_EL2.TIDCP
*
* ARMv8 (DDI 0487A.d): D1-1501 Table D1-43
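
Among the traps.c changes, the mode_strings hunk is a textbook off-by-one:
an N-entry table has valid indices 0..N-1, so the guard must reject
index == N as well. Sketched stand-alone (table contents hypothetical):

    static const char *mode_strings[4] = { "usr", "fiq", "irq", "svc" };

    static const char *mode_string(unsigned int mode)
    {
        /* With '>' instead of '>=', mode == 4 would read past the end. */
        if ( mode >= sizeof(mode_strings) / sizeof(mode_strings[0]) )
            return "Unknown";
        return mode_strings[mode] ? mode_strings[mode] : "Unknown";
    }
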
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic-v2.c xen-4.8.1/xen/arch/arm/vgic-v2.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic-v2.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/vgic-v2.c 2017-04-10 14:21:48.000000000 +0100
@@ -79,7 +79,7 @@
offset &= ~(NR_TARGETS_PER_ITARGETSR - 1);
for ( i = 0; i < NR_TARGETS_PER_ITARGETSR; i++, offset++ )
- reg |= (1 << rank->vcpu[offset]) << (i * NR_BITS_PER_TARGET);
+ reg |= (1 << read_atomic(&rank->vcpu[offset])) << (i * NR_BITS_PER_TARGET);
return reg;
}
@@ -152,7 +152,7 @@
/* The vCPU ID always starts from 0 */
new_target--;
- old_target = rank->vcpu[offset];
+ old_target = read_atomic(&rank->vcpu[offset]);
/* Only migrate the vIRQ if the target vCPU has changed */
if ( new_target != old_target )
@@ -162,7 +162,7 @@
virq);
}
- rank->vcpu[offset] = new_target;
+ write_atomic(&rank->vcpu[offset], new_target);
}
}
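
The vgic-v2.c hunks (and the matching vgic-v3.c/vgic.c ones below) drop the
rank lock for rank->vcpu[] accesses in favour of single-variable atomic
reads and writes. Xen's read_atomic()/write_atomic() are its own
primitives, but the shape of the pattern is the same as C11 relaxed
atomics, shown here as an analogy only (names hypothetical): loads and
stores of one small field cannot tear, so no lock is needed just to fetch
or update a target vCPU ID.

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint8_t target_vcpu[32]; /* one byte per vIRQ in a rank */

    static uint8_t get_target(unsigned int virq)
    {
        return atomic_load_explicit(&target_vcpu[virq], memory_order_relaxed);
    }

    static void set_target(unsigned int virq, uint8_t vcpu)
    {
        atomic_store_explicit(&target_vcpu[virq], vcpu, memory_order_relaxed);
    }
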
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic-v3.c xen-4.8.1/xen/arch/arm/vgic-v3.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic-v3.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/vgic-v3.c 2017-04-10 14:21:48.000000000 +0100
@@ -107,7 +107,7 @@
/* Get the index in the rank */
offset &= INTERRUPT_RANK_MASK;
- return vcpuid_to_vaffinity(rank->vcpu[offset]);
+ return vcpuid_to_vaffinity(read_atomic(&rank->vcpu[offset]));
}
/*
@@ -135,7 +135,7 @@
offset &= virq & INTERRUPT_RANK_MASK;
new_vcpu = vgic_v3_irouter_to_vcpu(d, irouter);
- old_vcpu = d->vcpu[rank->vcpu[offset]];
+ old_vcpu = d->vcpu[read_atomic(&rank->vcpu[offset])];
/*
* From the spec (see 8.9.13 in IHI 0069A), any write with an
@@ -153,7 +153,7 @@
if ( new_vcpu != old_vcpu )
vgic_migrate_irq(old_vcpu, new_vcpu, virq);
- rank->vcpu[offset] = new_vcpu->vcpu_id;
+ write_atomic(&rank->vcpu[offset], new_vcpu->vcpu_id);
}
static inline bool vgic_reg64_check_access(struct hsr_dabt dabt)
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic.c xen-4.8.1/xen/arch/arm/vgic.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/arm/vgic.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/arm/vgic.c 2017-04-10 14:21:48.000000000 +0100
@@ -85,7 +85,7 @@
rank->index = index;
for ( i = 0; i < NR_INTERRUPT_PER_RANK; i++ )
- rank->vcpu[i] = vcpu;
+ write_atomic(&rank->vcpu[i], vcpu);
}
int domain_vgic_register(struct domain *d, int *mmio_count)
@@ -218,28 +218,11 @@
return 0;
}
-/* The function should be called by rank lock taken. */
-static struct vcpu *__vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
-{
- struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
-
- ASSERT(spin_is_locked(&rank->lock));
-
- return v->domain->vcpu[rank->vcpu[virq & INTERRUPT_RANK_MASK]];
-}
-
-/* takes the rank lock */
struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int virq)
{
- struct vcpu *v_target;
struct vgic_irq_rank *rank = vgic_rank_irq(v, virq);
- unsigned long flags;
-
- vgic_lock_rank(v, rank, flags);
- v_target = __vgic_get_target_vcpu(v, virq);
- vgic_unlock_rank(v, rank, flags);
-
- return v_target;
+ int target = read_atomic(&rank->vcpu[virq & INTERRUPT_RANK_MASK]);
+ return v->domain->vcpu[target];
}
static int vgic_get_virq_priority(struct vcpu *v, unsigned int virq)
@@ -326,7 +309,7 @@
while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
irq = i + (32 * n);
- v_target = __vgic_get_target_vcpu(v, irq);
+ v_target = vgic_get_target_vcpu(v, irq);
p = irq_to_pending(v_target, irq);
clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
gic_remove_from_queues(v_target, irq);
@@ -368,7 +351,7 @@
while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
irq = i + (32 * n);
- v_target = __vgic_get_target_vcpu(v, irq);
+ v_target = vgic_get_target_vcpu(v, irq);
p = irq_to_pending(v_target, irq);
set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/domain.c xen-4.8.1/xen/arch/x86/domain.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/domain.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/domain.c 2017-04-10 14:21:48.000000000 +0100
@@ -1315,16 +1315,24 @@
return 0;
}
- if ( seg != x86_seg_tr && !reg->attr.fields.s )
+ if ( seg == x86_seg_tr )
{
- gprintk(XENLOG_ERR,
- "System segment provided for a code or data segment\n");
- return -EINVAL;
- }
+ if ( reg->attr.fields.s )
+ {
+ gprintk(XENLOG_ERR, "Code or data segment provided for TR\n");
+ return -EINVAL;
+ }
- if ( seg == x86_seg_tr && reg->attr.fields.s )
+ if ( reg->attr.fields.type != SYS_DESC_tss_busy )
+ {
+ gprintk(XENLOG_ERR, "Non-32-bit-TSS segment provided for TR\n");
+ return -EINVAL;
+ }
+ }
+ else if ( !reg->attr.fields.s )
{
- gprintk(XENLOG_ERR, "Code or data segment provided for TR\n");
+ gprintk(XENLOG_ERR,
+ "System segment provided for a code or data segment\n");
return -EINVAL;
}
@@ -1387,7 +1395,8 @@
#define SEG(s, r) ({ \
s = (struct segment_register){ .base = (r)->s ## _base, \
.limit = (r)->s ## _limit, \
- .attr.bytes = (r)->s ## _ar }; \
+ .attr.bytes = (r)->s ## _ar | \
+ (x86_seg_##s != x86_seg_tr ? 1 : 2) }; \
check_segment(&s, x86_seg_ ## s); })
rc = SEG(cs, regs);
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/efi/efi-boot.h xen-4.8.1/xen/arch/x86/efi/efi-boot.h
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/efi/efi-boot.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/efi/efi-boot.h 2017-04-10 14:21:48.000000000 +0100
@@ -13,7 +13,11 @@
static multiboot_info_t __initdata mbi = {
.flags = MBI_MODULES | MBI_LOADERNAME
};
-static module_t __initdata mb_modules[3];
+/*
+ * The array size needs to be one larger than the number of modules we
+ * support - see __start_xen().
+ */
+static module_t __initdata mb_modules[5];
static void __init edd_put_string(u8 *dst, size_t n, const char *src)
{
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/hvm.c xen-4.8.1/xen/arch/x86/hvm/hvm.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/hvm.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/hvm.c 2017-04-10 14:21:48.000000000 +0100
@@ -387,13 +387,20 @@
}
delta_tsc = guest_tsc - tsc;
- v->arch.hvm_vcpu.msr_tsc_adjust += delta_tsc
- - v->arch.hvm_vcpu.cache_tsc_offset;
v->arch.hvm_vcpu.cache_tsc_offset = delta_tsc;
hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset, at_tsc);
}
+static void hvm_set_guest_tsc_msr(struct vcpu *v, u64 guest_tsc)
+{
+ uint64_t tsc_offset = v->arch.hvm_vcpu.cache_tsc_offset;
+
+ hvm_set_guest_tsc(v, guest_tsc);
+ v->arch.hvm_vcpu.msr_tsc_adjust += v->arch.hvm_vcpu.cache_tsc_offset
+ - tsc_offset;
+}
+
void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
{
v->arch.hvm_vcpu.cache_tsc_offset += tsc_adjust
@@ -3940,7 +3947,7 @@
break;
case MSR_IA32_TSC:
- hvm_set_guest_tsc(v, msr_content);
+ hvm_set_guest_tsc_msr(v, msr_content);
break;
case MSR_IA32_TSC_ADJUST:
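
The hvm.c hunk restores the intended split between the two TSC paths: only
a guest write to MSR_IA32_TSC may move IA32_TSC_ADJUST, and it must move it
by exactly the change in the TSC offset, while every other caller of
hvm_set_guest_tsc() leaves the adjust value alone. A toy model of the
bookkeeping (not Xen's code; the offset handling is simplified):

    #include <assert.h>
    #include <stdint.h>

    static uint64_t cache_tsc_offset, msr_tsc_adjust;

    static void set_guest_tsc(uint64_t offset)      /* adjust-neutral */
    {
        cache_tsc_offset = offset;
    }

    static void set_guest_tsc_msr(uint64_t offset)  /* guest MSR write */
    {
        uint64_t old = cache_tsc_offset;

        set_guest_tsc(offset);
        msr_tsc_adjust += cache_tsc_offset - old;   /* mirror the delta */
    }

    int main(void)
    {
        set_guest_tsc_msr(100);
        assert(msr_tsc_adjust == 100);
        set_guest_tsc(300);              /* e.g. a restore/migration path */
        assert(msr_tsc_adjust == 100);   /* untouched, as architected */
        return 0;
    }
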
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/mtrr.c xen-4.8.1/xen/arch/x86/hvm/mtrr.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/mtrr.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/mtrr.c 2017-04-10 14:21:48.000000000 +0100
@@ -776,17 +776,19 @@
if ( v->domain != d )
v = d->vcpu ? d->vcpu[0] : NULL;
- if ( !mfn_valid(mfn_x(mfn)) ||
- rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
- {
- *ipat = 1;
- return MTRR_TYPE_UNCACHABLE;
- }
-
+ /* Mask, not add, for order so it works with INVALID_MFN on unmapping */
if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
- mfn_x(mfn) + (1UL << order) - 1) )
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+ {
+ if ( !order || rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
+ mfn_x(mfn) | ((1UL << order) - 1)) )
+ {
+ *ipat = 1;
+ return MTRR_TYPE_UNCACHABLE;
+ }
+ /* Force invalid memory type so resolve_misconfig() will split it */
return -1;
+ }
if ( direct_mmio )
{
@@ -798,6 +800,12 @@
return MTRR_TYPE_WRBACK;
}
+ if ( !mfn_valid(mfn_x(mfn)) )
+ {
+ *ipat = 1;
+ return MTRR_TYPE_UNCACHABLE;
+ }
+
if ( !need_iommu(d) && !cache_flush_permitted(d) )
{
*ipat = 1;
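
The "Mask, not add, for order" comment in the mtrr.c hunk deserves
unpacking: the end of the range is computed with OR rather than addition so
that INVALID_MFN (all ones) stays all ones instead of wrapping around to a
small bogus value. A stand-alone check of the arithmetic (order 9 picked
arbitrarily):

    #include <assert.h>

    int main(void)
    {
        unsigned long inval = ~0UL;   /* INVALID_MFN-style sentinel */
        unsigned long order = 9;

        assert((inval | ((1UL << order) - 1)) == ~0UL); /* mask: still invalid */
        assert((inval + ((1UL << order) - 1)) == 510);  /* add: wraps around */
        return 0;
    }
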
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/svm/svm.c xen-4.8.1/xen/arch/x86/hvm/svm/svm.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/svm/svm.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/svm/svm.c 2017-04-10 14:21:48.000000000 +0100
@@ -353,7 +353,7 @@
data->msr_cstar = vmcb->cstar;
data->msr_syscall_mask = vmcb->sfmask;
data->msr_efer = v->arch.hvm_vcpu.guest_efer;
- data->msr_flags = -1ULL;
+ data->msr_flags = 0;
}
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/svm/vmcb.c xen-4.8.1/xen/arch/x86/hvm/svm/vmcb.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/svm/vmcb.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/svm/vmcb.c 2017-04-10 14:21:48.000000000 +0100
@@ -72,6 +72,9 @@
struct arch_svm_struct *arch_svm = &v->arch.hvm_svm;
struct vmcb_struct *vmcb = arch_svm->vmcb;
+ /* Build-time check of the size of VMCB AMD structure. */
+ BUILD_BUG_ON(sizeof(*vmcb) != PAGE_SIZE);
+
vmcb->_general1_intercepts =
GENERAL1_INTERCEPT_INTR | GENERAL1_INTERCEPT_NMI |
GENERAL1_INTERCEPT_SMI | GENERAL1_INTERCEPT_INIT |
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/vmx/vmcs.c xen-4.8.1/xen/arch/x86/hvm/vmx/vmcs.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/vmx/vmcs.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/vmx/vmcs.c 2017-04-10 14:21:48.000000000 +0100
@@ -552,6 +552,20 @@
local_irq_restore(flags);
}
+void vmx_vmcs_reload(struct vcpu *v)
+{
+ /*
+ * As we may be running with interrupts disabled, we can't acquire
+ * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+ * the VMCS can't be taken away from us anymore if we still own it.
+ */
+ ASSERT(v->is_running || !local_irq_is_enabled());
+ if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+ return;
+
+ vmx_load_vmcs(v);
+}
+
int vmx_cpu_up_prepare(unsigned int cpu)
{
/*
@@ -1090,6 +1104,9 @@
vmx_disable_intercept_for_msr(v, MSR_IA32_BNDCFGS, MSR_TYPE_R | MSR_TYPE_W);
}
+ /* All guest MSR state is dirty. */
+ v->arch.hvm_vmx.msr_state.flags = ((1u << VMX_MSR_COUNT) - 1);
+
/* I/O access bitmap. */
__vmwrite(IO_BITMAP_A, __pa(d->arch.hvm_domain.io_bitmap));
__vmwrite(IO_BITMAP_B, __pa(d->arch.hvm_domain.io_bitmap) + PAGE_SIZE);
@@ -1652,10 +1669,7 @@
bool_t debug_state;
if ( v->arch.hvm_vmx.active_cpu == smp_processor_id() )
- {
- if ( v->arch.hvm_vmx.vmcs_pa != this_cpu(current_vmcs) )
- vmx_load_vmcs(v);
- }
+ vmx_vmcs_reload(v);
else
{
/*
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/vmx/vmx.c xen-4.8.1/xen/arch/x86/hvm/vmx/vmx.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/hvm/vmx/vmx.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/hvm/vmx/vmx.c 2017-04-10 14:21:48.000000000 +0100
@@ -739,13 +739,12 @@
static void vmx_save_cpu_state(struct vcpu *v, struct hvm_hw_cpu *data)
{
struct vmx_msr_state *guest_state = &v->arch.hvm_vmx.msr_state;
- unsigned long guest_flags = guest_state->flags;
data->shadow_gs = v->arch.hvm_vmx.shadow_gs;
data->msr_cstar = v->arch.hvm_vmx.cstar;
/* save msrs */
- data->msr_flags = guest_flags;
+ data->msr_flags = 0;
data->msr_lstar = guest_state->msrs[VMX_INDEX_MSR_LSTAR];
data->msr_star = guest_state->msrs[VMX_INDEX_MSR_STAR];
data->msr_syscall_mask = guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK];
@@ -756,7 +755,7 @@
struct vmx_msr_state *guest_state = &v->arch.hvm_vmx.msr_state;
/* restore msrs */
- guest_state->flags = data->msr_flags & 7;
+ guest_state->flags = ((1u << VMX_MSR_COUNT) - 1);
guest_state->msrs[VMX_INDEX_MSR_LSTAR] = data->msr_lstar;
guest_state->msrs[VMX_INDEX_MSR_STAR] = data->msr_star;
guest_state->msrs[VMX_INDEX_MSR_SYSCALL_MASK] = data->msr_syscall_mask;
@@ -896,6 +895,18 @@
if ( unlikely(!this_cpu(vmxon)) )
return;
+ if ( !v->is_running )
+ {
+ /*
+ * When this vCPU isn't marked as running anymore, a remote pCPU's
+ * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+ * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+ * away the VMCS from us. As we're running with interrupts disabled,
+ * we also can't call vmx_vmcs_enter().
+ */
+ vmx_vmcs_reload(v);
+ }
+
vmx_fpu_leave(v);
vmx_save_guest_msrs(v);
vmx_restore_host_msrs();
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/mm/p2m-pt.c xen-4.8.1/xen/arch/x86/mm/p2m-pt.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/mm/p2m-pt.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/mm/p2m-pt.c 2017-04-10 14:21:48.000000000 +0100
@@ -452,7 +452,7 @@
mfn |= _PAGE_PSE_PAT >> PAGE_SHIFT;
}
else
- mfn &= ~(_PAGE_PSE_PAT >> PAGE_SHIFT);
+ mfn &= ~((unsigned long)_PAGE_PSE_PAT >> PAGE_SHIFT);
flags |= _PAGE_PSE;
}
e = l1e_from_pfn(mfn, flags);
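
The one-character p2m-pt.c change is an integer-width fix: if _PAGE_PSE_PAT
is a 32-bit unsigned constant, the complement in
~(_PAGE_PSE_PAT >> PAGE_SHIFT) is computed in 32 bits and zero-extends,
silently clearing bits 32 and up of the MFN. Widening before inverting
preserves them. Illustrated with stand-in constants (0x1000 is bit 12,
matching _PAGE_PSE_PAT's position; assumes 64-bit unsigned long):

    #include <assert.h>

    int main(void)
    {
        unsigned long mfn = 0x1234567890UL;

        /* 32-bit complement: the mask zero-extends to 0x00000000fffffffe. */
        assert((mfn & ~(0x1000U >> 12)) == 0x34567890UL);

        /* Widened complement: 0xfffffffffffffffe, high bits survive. */
        assert((mfn & ~((unsigned long)0x1000U >> 12)) == 0x1234567890UL);
        return 0;
    }
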
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/mm/p2m.c xen-4.8.1/xen/arch/x86/mm/p2m.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/mm/p2m.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/mm/p2m.c 2017-04-10 14:21:48.000000000 +0100
@@ -2048,7 +2048,8 @@
ASSERT(page_list_empty(&p2m->pod.super));
ASSERT(page_list_empty(&p2m->pod.single));
- if ( p2m->np2m_base == P2M_BASE_EADDR )
+ /* No need to flush if it's already empty */
+ if ( p2m_is_nestedp2m(p2m) && p2m->np2m_base == P2M_BASE_EADDR )
{
p2m_unlock(p2m);
return;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/setup.c xen-4.8.1/xen/arch/x86/setup.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/setup.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/setup.c 2017-04-10 14:21:48.000000000 +0100
@@ -890,6 +890,17 @@
mod[i].reserved = 0;
}
+ if ( efi_enabled )
+ {
+ /*
+ * This needs to remain in sync with xen_in_range() and the
+ * respective reserve_e820_ram() invocation below.
+ */
+ mod[mbi->mods_count].mod_start = PFN_DOWN(mbi->mem_upper);
+ mod[mbi->mods_count].mod_end = __pa(__2M_rwdata_end) -
+ (mbi->mem_upper & PAGE_MASK);
+ }
+
modules_headroom = bzimage_headroom(bootstrap_map(mod), mod->mod_end);
bootstrap_map(NULL);
@@ -925,7 +936,7 @@
1UL << (PAGE_SHIFT + 32)) )
e = min(HYPERVISOR_VIRT_END - DIRECTMAP_VIRT_START,
1UL << (PAGE_SHIFT + 32));
-#define reloc_size ((__pa(&_end) + mask) & ~mask)
+#define reloc_size ((__pa(__2M_rwdata_end) + mask) & ~mask)
/* Is the region suitable for relocating Xen? */
if ( !xen_phys_start && e <= limit )
{
@@ -1070,8 +1081,9 @@
if ( mod[j].reserved )
continue;
- /* Don't overlap with other modules. */
- end = consider_modules(s, e, size, mod, mbi->mods_count, j);
+ /* Don't overlap with other modules (or Xen itself). */
+ end = consider_modules(s, e, size, mod,
+ mbi->mods_count + efi_enabled, j);
if ( highmem_start && end > highmem_start )
continue;
@@ -1096,9 +1108,9 @@
*/
while ( !kexec_crash_area.start )
{
- /* Don't overlap with modules. */
- e = consider_modules(s, e, PAGE_ALIGN(kexec_crash_area.size),
- mod, mbi->mods_count, -1);
+ /* Don't overlap with modules (or Xen itself). */
+ e = consider_modules(s, e, PAGE_ALIGN(kexec_crash_area.size), mod,
+ mbi->mods_count + efi_enabled, -1);
if ( s >= e )
break;
if ( e > kexec_crash_area_limit )
@@ -1122,8 +1134,10 @@
if ( !xen_phys_start )
panic("Not enough memory to relocate Xen.");
- reserve_e820_ram(&boot_e820, efi_enabled ? mbi->mem_upper : __pa(&_start),
- __pa(&_end));
+
+ /* This needs to remain in sync with xen_in_range(). */
+ reserve_e820_ram(&boot_e820, efi_enabled ? mbi->mem_upper : __pa(_stext),
+ __pa(__2M_rwdata_end));
/* Late kexec reservation (dynamic start address). */
kexec_reserve_area(&boot_e820);
@@ -1672,7 +1686,7 @@
paddr_t start, end;
int i;
- enum { region_s3, region_text, region_bss, nr_regions };
+ enum { region_s3, region_ro, region_rw, nr_regions };
static struct {
paddr_t s, e;
} xen_regions[nr_regions] __hwdom_initdata;
@@ -1683,12 +1697,20 @@
/* S3 resume code (and other real mode trampoline code) */
xen_regions[region_s3].s = bootsym_phys(trampoline_start);
xen_regions[region_s3].e = bootsym_phys(trampoline_end);
- /* hypervisor code + data */
- xen_regions[region_text].s =__pa(&_stext);
- xen_regions[region_text].e = __pa(&__init_begin);
- /* bss */
- xen_regions[region_bss].s = __pa(&__bss_start);
- xen_regions[region_bss].e = __pa(&__bss_end);
+
+ /*
+ * This needs to remain in sync with the uses of the same symbols in
+ * - __start_xen() (above)
+ * - is_xen_fixed_mfn()
+ * - tboot_shutdown()
+ */
+
+ /* hypervisor .text + .rodata */
+ xen_regions[region_ro].s = __pa(&_stext);
+ xen_regions[region_ro].e = __pa(&__2M_rodata_end);
+ /* hypervisor .data + .bss */
+ xen_regions[region_rw].s = __pa(&__2M_rwdata_start);
+ xen_regions[region_rw].e = __pa(&__2M_rwdata_end);
}
start = (paddr_t)mfn << PAGE_SHIFT;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/tboot.c xen-4.8.1/xen/arch/x86/tboot.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/tboot.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/tboot.c 2017-04-10 14:21:48.000000000 +0100
@@ -12,6 +12,7 @@
#include <asm/processor.h>
#include <asm/e820.h>
#include <asm/tboot.h>
+#include <asm/setup.h>
#include <crypto/vmac.h>
/* tboot=<physical address of shared page> */
@@ -282,7 +283,7 @@
if ( !mfn_valid(mfn) )
continue;
- if ( (mfn << PAGE_SHIFT) < __pa(&_end) )
+ if ( is_xen_fixed_mfn(mfn) )
continue; /* skip Xen */
if ( (mfn >= PFN_DOWN(g_tboot_shared->tboot_base - 3 * PAGE_SIZE))
&& (mfn < PFN_UP(g_tboot_shared->tboot_base
@@ -363,20 +364,22 @@
if ( shutdown_type == TB_SHUTDOWN_S3 )
{
/*
- * Xen regions for tboot to MAC
+ * Xen regions for tboot to MAC. This needs to remain in sync with
+ * xen_in_range().
*/
g_tboot_shared->num_mac_regions = 3;
/* S3 resume code (and other real mode trampoline code) */
g_tboot_shared->mac_regions[0].start = bootsym_phys(trampoline_start);
g_tboot_shared->mac_regions[0].size = bootsym_phys(trampoline_end) -
bootsym_phys(trampoline_start);
- /* hypervisor code + data */
+ /* hypervisor .text + .rodata */
g_tboot_shared->mac_regions[1].start = (uint64_t)__pa(&_stext);
- g_tboot_shared->mac_regions[1].size = __pa(&__init_begin) -
+ g_tboot_shared->mac_regions[1].size = __pa(&__2M_rodata_end) -
__pa(&_stext);
- /* bss */
- g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__bss_start);
- g_tboot_shared->mac_regions[2].size = __pa(&__bss_end) - __pa(&__bss_start);
+ /* hypervisor .data + .bss */
+ g_tboot_shared->mac_regions[2].start = (uint64_t)__pa(&__2M_rwdata_start);
+ g_tboot_shared->mac_regions[2].size = __pa(&__2M_rwdata_end) -
+ __pa(&__2M_rwdata_start);
/*
* MAC domains and other Xen memory
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/x86_emulate/x86_emulate.c xen-4.8.1/xen/arch/x86/x86_emulate/x86_emulate.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/x86_emulate/x86_emulate.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/x86_emulate/x86_emulate.c 2017-04-10 14:21:48.000000000 +0100
@@ -331,7 +331,11 @@
#define copy_REX_VEX(ptr, rex, vex) do { \
if ( (vex).opcx != vex_none ) \
+ { \
+ if ( !mode_64bit() ) \
+ vex.reg |= 8; \
ptr[0] = 0xc4, ptr[1] = (vex).raw[0], ptr[2] = (vex).raw[1]; \
+ } \
else if ( mode_64bit() ) \
ptr[1] = rex | REX_PREFIX; \
} while (0)
@@ -870,15 +874,15 @@
put_fpu(&fic); \
} while (0)
-#define emulate_fpu_insn_stub(_bytes...) \
+#define emulate_fpu_insn_stub(bytes...) \
do { \
- uint8_t *buf = get_stub(stub); \
- unsigned int _nr = sizeof((uint8_t[]){ _bytes }); \
- struct fpu_insn_ctxt fic = { .insn_bytes = _nr }; \
- memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1); \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
- stub.func(); \
- put_fpu(&fic); \
+ unsigned int nr_ = sizeof((uint8_t[]){ bytes }); \
+ struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ }; \
+ memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1); \
+ get_fpu(X86EMUL_FPU_fpu, &fic_); \
+ asm volatile ( "call *%[stub]" : "+m" (fic_) : \
+ [stub] "rm" (stub.func) ); \
+ put_fpu(&fic_); \
put_stub(stub); \
} while (0)
@@ -893,7 +897,7 @@
"call *%[func];" \
_POST_EFLAGS("[eflags]", "[mask]", "[tmp]") \
: [eflags] "+g" (_regs.eflags), \
- [tmp] "=&r" (tmp_) \
+ [tmp] "=&r" (tmp_), "+m" (fic_) \
: [func] "rm" (stub.func), \
[mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) ); \
put_fpu(&fic_); \
@@ -1356,6 +1360,11 @@
}
memset(sreg, 0, sizeof(*sreg));
sreg->sel = sel;
+
+ /* Since CPL == SS.DPL, we need to put back DPL. */
+ if ( seg == x86_seg_ss )
+ sreg->attr.fields.dpl = sel;
+
return X86EMUL_OKAY;
}
@@ -2017,16 +2026,21 @@
default:
BUG(); /* Shouldn't be possible. */
case 2:
- if ( in_realmode(ctxt, ops) || (state->regs->eflags & EFLG_VM) )
+ if ( state->regs->eflags & EFLG_VM )
break;
/* fall through */
case 4:
- if ( modrm_mod != 3 )
+ if ( modrm_mod != 3 || in_realmode(ctxt, ops) )
break;
/* fall through */
case 8:
/* VEX / XOP / EVEX */
generate_exception_if(rex_prefix || vex.pfx, EXC_UD, -1);
+ /*
+ * With operand size override disallowed (see above), op_bytes
+ * should not have changed from its default.
+ */
+ ASSERT(op_bytes == def_op_bytes);
vex.raw[0] = modrm;
if ( b == 0xc5 )
@@ -2053,6 +2067,12 @@
op_bytes = 8;
}
}
+ else
+ {
+ /* Operand size fixed at 4 (no override via W bit). */
+ op_bytes = 4;
+ vex.b = 1;
+ }
switch ( b )
{
case 0x62:
@@ -2071,7 +2091,7 @@
break;
}
}
- if ( mode_64bit() && !vex.r )
+ if ( !vex.r )
rex_prefix |= REX_R;
ext = vex.opcx;
@@ -2113,12 +2133,21 @@
opcode |= b | MASK_INSR(vex.pfx, X86EMUL_OPC_PFX_MASK);
+ if ( !(d & ModRM) )
+ {
+ modrm_reg = modrm_rm = modrm_mod = modrm = 0;
+ break;
+ }
+
modrm = insn_fetch_type(uint8_t);
modrm_mod = (modrm & 0xc0) >> 6;
break;
}
+ }
+ if ( d & ModRM )
+ {
modrm_reg = ((rex_prefix & 4) << 1) | ((modrm & 0x38) >> 3);
modrm_rm = modrm & 0x07;
@@ -2182,6 +2211,17 @@
break;
}
break;
+ case 0x20: /* mov cr,reg */
+ case 0x21: /* mov dr,reg */
+ case 0x22: /* mov reg,cr */
+ case 0x23: /* mov reg,dr */
+ /*
+ * Mov to/from cr/dr ignore the encoding of Mod, and behave as
+ * if they were encoded as reg/reg instructions. No further
+ * disp/SIB bytes are fetched.
+ */
+ modrm_mod = 3;
+ break;
}
break;
@@ -4730,7 +4770,7 @@
case X86EMUL_OPC(0x0f, 0x21): /* mov dr,reg */
case X86EMUL_OPC(0x0f, 0x22): /* mov reg,cr */
case X86EMUL_OPC(0x0f, 0x23): /* mov reg,dr */
- generate_exception_if(ea.type != OP_REG, EXC_UD, -1);
+ ASSERT(ea.type == OP_REG); /* Early operand adjustment ensures this. */
generate_exception_if(!mode_ring0(), EXC_GP, 0);
modrm_reg |= lock_prefix << 3;
if ( b & 2 )
@@ -5050,6 +5090,7 @@
}
case X86EMUL_OPC(0x0f, 0xa3): bt: /* bt */
+ generate_exception_if(lock_prefix, EXC_UD, 0);
emulate_2op_SrcV_nobyte("bt", src, dst, _regs.eflags);
dst.type = OP_NONE;
break;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/x86_emulate/x86_emulate.h xen-4.8.1/xen/arch/x86/x86_emulate/x86_emulate.h
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/x86_emulate/x86_emulate.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/x86_emulate/x86_emulate.h 2017-04-10 14:21:48.000000000 +0100
@@ -71,7 +71,7 @@
* Attribute for segment selector. This is a copy of bits 40:47 & 52:55 of the
* segment descriptor. It happens to match the format of an AMD SVM VMCB.
*/
-typedef union __attribute__((__packed__)) segment_attributes {
+typedef union segment_attributes {
uint16_t bytes;
struct
{
@@ -91,7 +91,7 @@
* Full state of a segment register (visible and hidden portions).
* Again, this happens to match the format of an AMD SVM VMCB.
*/
-struct __attribute__((__packed__)) segment_register {
+struct segment_register {
uint16_t sel;
segment_attributes_t attr;
uint32_t limit;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/xen.lds.S xen-4.8.1/xen/arch/x86/xen.lds.S
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/xen.lds.S 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/xen.lds.S 2017-04-10 14:21:48.000000000 +0100
@@ -299,7 +299,7 @@
}
ASSERT(__image_base__ > XEN_VIRT_START ||
- _end <= XEN_VIRT_END - NR_CPUS * PAGE_SIZE,
+ __2M_rwdata_end <= XEN_VIRT_END - NR_CPUS * PAGE_SIZE,
"Xen image overlaps stubs area")
#ifdef CONFIG_KEXEC
diff -Nru xen-4.8.1~pre.2017.01.23/xen/arch/x86/xstate.c xen-4.8.1/xen/arch/x86/xstate.c
--- xen-4.8.1~pre.2017.01.23/xen/arch/x86/xstate.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/arch/x86/xstate.c 2017-04-10 14:21:48.000000000 +0100
@@ -92,7 +92,7 @@
if ( bsp )
{
- xstate_features = fls(xfeature_mask);
+ xstate_features = flsl(xfeature_mask);
xstate_offsets = xzalloc_array(unsigned int, xstate_features);
if ( !xstate_offsets )
return -ENOMEM;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/common/memory.c xen-4.8.1/xen/common/memory.c
--- xen-4.8.1~pre.2017.01.23/xen/common/memory.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/common/memory.c 2017-04-10 14:21:48.000000000 +0100
@@ -437,8 +437,8 @@
goto fail_early;
}
- if ( !guest_handle_okay(exch.in.extent_start, exch.in.nr_extents) ||
- !guest_handle_okay(exch.out.extent_start, exch.out.nr_extents) )
+ if ( !guest_handle_subrange_okay(exch.in.extent_start, exch.nr_exchanged,
+ exch.in.nr_extents - 1) )
{
rc = -EFAULT;
goto fail_early;
@@ -448,11 +448,27 @@
{
in_chunk_order = exch.out.extent_order - exch.in.extent_order;
out_chunk_order = 0;
+
+ if ( !guest_handle_subrange_okay(exch.out.extent_start,
+ exch.nr_exchanged >> in_chunk_order,
+ exch.out.nr_extents - 1) )
+ {
+ rc = -EFAULT;
+ goto fail_early;
+ }
}
else
{
in_chunk_order = 0;
out_chunk_order = exch.in.extent_order - exch.out.extent_order;
+
+ if ( !guest_handle_subrange_okay(exch.out.extent_start,
+ exch.nr_exchanged << out_chunk_order,
+ exch.out.nr_extents - 1) )
+ {
+ rc = -EFAULT;
+ goto fail_early;
+ }
}
d = rcu_lock_domain_by_any_id(exch.in.domid);
diff -Nru xen-4.8.1~pre.2017.01.23/xen/common/sched_credit2.c xen-4.8.1/xen/common/sched_credit2.c
--- xen-4.8.1~pre.2017.01.23/xen/common/sched_credit2.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/common/sched_credit2.c 2017-04-10 14:21:48.000000000 +0100
@@ -491,12 +491,15 @@
}
/*
- * Clear the bits of all the siblings of cpu from mask.
+ * Clear the bits of all the siblings of cpu from mask (if necessary).
*/
static inline
void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
{
- cpumask_andnot(mask, mask, per_cpu(cpu_sibling_mask, cpu));
+ const cpumask_t *cpu_siblings = per_cpu(cpu_sibling_mask, cpu);
+
+ if ( cpumask_subset(cpu_siblings, mask) )
+ cpumask_andnot(mask, mask, cpu_siblings);
}
/*
@@ -510,24 +513,26 @@
*/
static int get_fallback_cpu(struct csched2_vcpu *svc)
{
- int cpu;
+ struct vcpu *v = svc->vcpu;
+ int cpu = v->processor;
+
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
- if ( likely(cpumask_test_cpu(svc->vcpu->processor,
- svc->vcpu->cpu_hard_affinity)) )
- return svc->vcpu->processor;
-
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
- &svc->rqd->active);
- cpu = cpumask_first(cpumask_scratch);
- if ( likely(cpu < nr_cpu_ids) )
+ if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
return cpu;
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
- cpupool_domain_cpumask(svc->vcpu->domain));
+ if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+ &svc->rqd->active)) )
+ {
+ cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
+ cpumask_scratch_cpu(cpu));
+ return cpumask_first(cpumask_scratch_cpu(cpu));
+ }
- ASSERT(!cpumask_empty(cpumask_scratch));
+ ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
- return cpumask_first(cpumask_scratch);
+ return cpumask_first(cpumask_scratch_cpu(cpu));
}
/*
@@ -898,6 +903,14 @@
void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
+static inline void
+tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
+{
+ __cpumask_set_cpu(cpu, &rqd->tickled);
+ smt_idle_mask_clear(cpu, &rqd->smt_idle);
+ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+}
+
/*
* Check what processor it is best to 'wake', for picking up a vcpu that has
* just been put (back) in the runqueue. Logic is as follows:
@@ -941,6 +954,9 @@
(unsigned char *)&d);
}
+ cpumask_and(cpumask_scratch_cpu(cpu), new->vcpu->cpu_hard_affinity,
+ cpupool_domain_cpumask(new->vcpu->domain));
+
/*
* First of all, consider idle cpus, checking if we can just
* re-use the pcpu where we were running before.
@@ -953,7 +969,7 @@
cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
else
cpumask_copy(&mask, &rqd->smt_idle);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
i = cpumask_test_or_cycle(cpu, &mask);
if ( i < nr_cpu_ids )
{
@@ -968,7 +984,7 @@
* gone through the scheduler yet.
*/
cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
i = cpumask_test_or_cycle(cpu, &mask);
if ( i < nr_cpu_ids )
{
@@ -984,7 +1000,7 @@
*/
cpumask_andnot(&mask, &rqd->active, &rqd->idle);
cpumask_andnot(&mask, &mask, &rqd->tickled);
- cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
+ cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
if ( cpumask_test_cpu(cpu, &mask) )
{
cur = CSCHED2_VCPU(curr_on_cpu(cpu));
@@ -1062,9 +1078,8 @@
sizeof(d),
(unsigned char *)&d);
}
- __cpumask_set_cpu(ipid, &rqd->tickled);
- smt_idle_mask_clear(ipid, &rqd->smt_idle);
- cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
+
+ tickle_cpu(ipid, rqd);
if ( unlikely(new->tickled_cpu != -1) )
SCHED_STAT_CRANK(tickled_cpu_overwritten);
@@ -1104,18 +1119,28 @@
list_for_each( iter, &rqd->svc )
{
+ unsigned int svc_cpu;
struct csched2_vcpu * svc;
int start_credit;
svc = list_entry(iter, struct csched2_vcpu, rqd_elem);
+ svc_cpu = svc->vcpu->processor;
ASSERT(!is_idle_vcpu(svc->vcpu));
ASSERT(svc->rqd == rqd);
+ /*
+ * If svc is running, it is our responsibility to make sure, here,
+ * that the credit it has spent so far gets accounted.
+ */
+ if ( svc->vcpu == curr_on_cpu(svc_cpu) )
+ burn_credits(rqd, svc, now);
+
start_credit = svc->credit;
- /* And add INIT * m, avoiding integer multiplication in the
- * common case. */
+ /*
+ * Add INIT * m, avoiding integer multiplication in the common case.
+ */
if ( likely(m==1) )
svc->credit += CSCHED2_CREDIT_INIT;
else
@@ -1378,7 +1403,9 @@
SCHED_STAT_CRANK(vcpu_sleep);
if ( curr_on_cpu(vc->processor) == vc )
- cpu_raise_softirq(vc->processor, SCHEDULE_SOFTIRQ);
+ {
+ tickle_cpu(vc->processor, svc->rqd);
+ }
else if ( __vcpu_on_runq(svc) )
{
ASSERT(svc->rqd == RQD(ops, vc->processor));
@@ -1492,7 +1519,7 @@
csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
{
struct csched2_private *prv = CSCHED2_PRIV(ops);
- int i, min_rqi = -1, new_cpu;
+ int i, min_rqi = -1, new_cpu, cpu = vc->processor;
struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
s_time_t min_avgload = MAX_LOAD;
@@ -1512,7 +1539,7 @@
* just grab the prv lock. Instead, we'll have to trylock, and
* do something else reasonable if we fail.
*/
- ASSERT(spin_is_locked(per_cpu(schedule_data, vc->processor).schedule_lock));
+ ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
if ( !read_trylock(&prv->lock) )
{
@@ -1526,6 +1553,9 @@
goto out;
}
+ cpumask_and(cpumask_scratch_cpu(cpu), vc->cpu_hard_affinity,
+ cpupool_domain_cpumask(vc->domain));
+
/*
* First check to see if we're here because someone else suggested a place
* for us to move.
@@ -1537,13 +1567,13 @@
printk(XENLOG_WARNING "%s: target runqueue disappeared!\n",
__func__);
}
- else
+ else if ( cpumask_intersects(cpumask_scratch_cpu(cpu),
+ &svc->migrate_rqd->active) )
{
- cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&svc->migrate_rqd->active);
- new_cpu = cpumask_any(cpumask_scratch);
- if ( new_cpu < nr_cpu_ids )
- goto out_up;
+ new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
+ goto out_up;
}
/* Fall-through to normal cpu pick */
}
@@ -1571,12 +1601,12 @@
*/
if ( rqd == svc->rqd )
{
- if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
rqd_avgload = max_t(s_time_t, rqd->b_avgload - svc->avgload, 0);
}
else if ( spin_trylock(&rqd->lock) )
{
- if ( cpumask_intersects(vc->cpu_hard_affinity, &rqd->active) )
+ if ( cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active) )
rqd_avgload = rqd->b_avgload;
spin_unlock(&rqd->lock);
@@ -1598,9 +1628,9 @@
goto out_up;
}
- cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&prv->rqd[min_rqi].active);
- new_cpu = cpumask_any(cpumask_scratch);
+ new_cpu = cpumask_any(cpumask_scratch_cpu(cpu));
BUG_ON(new_cpu >= nr_cpu_ids);
out_up:
@@ -1675,6 +1705,8 @@
struct csched2_runqueue_data *trqd,
s_time_t now)
{
+ int cpu = svc->vcpu->processor;
+
if ( unlikely(tb_init_done) )
{
struct {
@@ -1696,8 +1728,8 @@
svc->migrate_rqd = trqd;
__set_bit(_VPF_migrating, &svc->vcpu->pause_flags);
__set_bit(__CSFLAG_runq_migrate_request, &svc->flags);
- cpu_raise_softirq(svc->vcpu->processor, SCHEDULE_SOFTIRQ);
SCHED_STAT_CRANK(migrate_requested);
+ tickle_cpu(cpu, svc->rqd);
}
else
{
@@ -1711,9 +1743,11 @@
}
__runq_deassign(svc);
- cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
+ cpumask_and(cpumask_scratch_cpu(cpu), svc->vcpu->cpu_hard_affinity,
+ cpupool_domain_cpumask(svc->vcpu->domain));
+ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
&trqd->active);
- svc->vcpu->processor = cpumask_any(cpumask_scratch);
+ svc->vcpu->processor = cpumask_any(cpumask_scratch_cpu(cpu));
ASSERT(svc->vcpu->processor < nr_cpu_ids);
__runq_assign(svc, trqd);
@@ -1737,8 +1771,14 @@
static bool_t vcpu_is_migrateable(struct csched2_vcpu *svc,
struct csched2_runqueue_data *rqd)
{
+ struct vcpu *v = svc->vcpu;
+ int cpu = svc->vcpu->processor;
+
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
+
return !(svc->flags & CSFLAG_runq_migrate_request) &&
- cpumask_intersects(svc->vcpu->cpu_hard_affinity, &rqd->active);
+ cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active);
}
static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
@@ -1928,10 +1968,40 @@
csched2_vcpu_migrate(
const struct scheduler *ops, struct vcpu *vc, unsigned int new_cpu)
{
+ struct domain *d = vc->domain;
struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
struct csched2_runqueue_data *trqd;
+ s_time_t now = NOW();
+
+ /*
+ * Being passed a target pCPU which is outside of our cpupool is only
+ * valid if we are shutting down (or doing ACPI suspend), and we are
+ * moving everyone to BSP, no matter whether or not BSP is inside our
+ * cpupool.
+ *
+ * And since there indeed is the chance that it is not part of it, all
+ * we must do is remove _and_ unassign the vCPU from any runqueue, as
+ * well as updating v->processor with the target, so that the suspend
+ * process can continue.
+ *
+ * It will then be during resume that a new, meaningful, value for
+ * v->processor will be chosen, and during actual domain unpause that
+ * the vCPU will be assigned to and added to the proper runqueue.
+ */
+ if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) )
+ {
+ ASSERT(system_state == SYS_STATE_suspend);
+ if ( __vcpu_on_runq(svc) )
+ {
+ __runq_remove(svc);
+ update_load(ops, svc->rqd, NULL, -1, now);
+ }
+ __runq_deassign(svc);
+ vc->processor = new_cpu;
+ return;
+ }
- /* Check if new_cpu is valid */
+ /* If here, new_cpu must be a valid Credit2 pCPU, and in our affinity. */
ASSERT(cpumask_test_cpu(new_cpu, &CSCHED2_PRIV(ops)->initialized));
ASSERT(cpumask_test_cpu(new_cpu, vc->cpu_hard_affinity));
@@ -1946,7 +2016,7 @@
* pointing to a pcpu where we can't run any longer.
*/
if ( trqd != svc->rqd )
- migrate(ops, svc, trqd, NOW());
+ migrate(ops, svc, trqd, now);
else
vc->processor = new_cpu;
}
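
Several of the sched_credit2.c hunks above replace the single global
cpumask_scratch with cpumask_scratch_cpu(cpu). The idea: one scratch mask
per pCPU, so whoever holds that pCPU's scheduler lock can use its scratch
without any further serialisation, where a shared scratch mask could be
clobbered concurrently. A minimal sketch of the shape (sizes and names
hypothetical):

    #define NR_CPUS 64

    typedef struct { unsigned long bits[NR_CPUS / 64]; } cpumask_t;

    /*
     * One scratch mask per pCPU; safe to use while holding that pCPU's
     * scheduler lock (or any other per-pCPU exclusion).
     */
    static cpumask_t scratch_mask[NR_CPUS];
    #define cpumask_scratch_cpu(c) (&scratch_mask[c])
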
diff -Nru xen-4.8.1~pre.2017.01.23/xen/common/schedule.c xen-4.8.1/xen/common/schedule.c
--- xen-4.8.1~pre.2017.01.23/xen/common/schedule.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/common/schedule.c 2017-04-10 14:21:48.000000000 +0100
@@ -84,7 +84,27 @@
: (typeof((opsptr)->fn(opsptr, ##__VA_ARGS__)))0 )
#define DOM2OP(_d) (((_d)->cpupool == NULL) ? &ops : ((_d)->cpupool->sched))
-#define VCPU2OP(_v) (DOM2OP((_v)->domain))
+static inline struct scheduler *VCPU2OP(const struct vcpu *v)
+{
+ struct domain *d = v->domain;
+
+ if ( likely(d->cpupool != NULL) )
+ return d->cpupool->sched;
+
+ /*
+ * If d->cpupool is NULL, this is a vCPU of the idle domain. And this
+ * case is special because the idle domain does not really belong to
+ * a cpupool (and, hence, doesn't really have a scheduler). In fact, its
+ * vCPUs (may) run on pCPUs which are in different pools, with different
+ * schedulers.
+ *
+ * What we want, in this case, is the scheduler of the pCPU where this
+ * particular idle vCPU is running. And, since v->processor never changes
+ * for idle vCPUs, it is safe to use it, with no locks, to figure that out.
+ */
+ ASSERT(is_idle_domain(d));
+ return per_cpu(scheduler, v->processor);
+}
#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain)
static inline void trace_runstate_change(struct vcpu *v, int new_state)
@@ -633,8 +653,11 @@
void restore_vcpu_affinity(struct domain *d)
{
+ unsigned int cpu = smp_processor_id();
struct vcpu *v;
+ ASSERT(system_state == SYS_STATE_resume);
+
for_each_vcpu ( d, v )
{
spinlock_t *lock = vcpu_schedule_lock_irq(v);
@@ -643,18 +666,34 @@
{
cpumask_copy(v->cpu_hard_affinity, v->cpu_hard_affinity_saved);
v->affinity_broken = 0;
+
}
- if ( v->processor == smp_processor_id() )
+ /*
+ * During suspend (in cpu_disable_scheduler()), we moved every vCPU
+ * to BSP (which, as of now, is pCPU 0), as a temporary measure to
+ * allow the nonboot processors to have their data structure freed
+ * and go to sleep. But nothing guardantees that the BSP is a valid
+ * pCPU for a particular domain.
+ *
+ * Therefore, here, before actually unpausing the domains, we should
+ * set v->processor of each of their vCPUs to something that will
+ * make sense for the scheduler of the cpupool they are in.
+ */
+ cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
+ cpupool_domain_cpumask(v->domain));
+ v->processor = cpumask_any(cpumask_scratch_cpu(cpu));
+
+ if ( v->processor == cpu )
{
set_bit(_VPF_migrating, &v->pause_flags);
- vcpu_schedule_unlock_irq(lock, v);
+ spin_unlock_irq(lock);
vcpu_sleep_nosync(v);
vcpu_migrate(v);
}
else
{
- vcpu_schedule_unlock_irq(lock, v);
+ spin_unlock_irq(lock);
}
}
diff -Nru xen-4.8.1~pre.2017.01.23/xen/drivers/passthrough/iommu.c xen-4.8.1/xen/drivers/passthrough/iommu.c
--- xen-4.8.1~pre.2017.01.23/xen/drivers/passthrough/iommu.c 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/drivers/passthrough/iommu.c 2017-04-10 14:21:48.000000000 +0100
@@ -244,8 +244,7 @@
if ( !iommu_enabled || !dom_iommu(d)->platform_ops )
return;
- if ( need_iommu(d) )
- iommu_teardown(d);
+ iommu_teardown(d);
arch_iommu_domain_destroy(d);
}
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/config.h xen-4.8.1/xen/include/asm-arm/config.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/config.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/config.h 2017-04-10 14:21:48.000000000 +0100
@@ -46,6 +46,8 @@
#define MAX_VIRT_CPUS 8
#endif
+#define INVALID_VCPU_ID MAX_VIRT_CPUS
+
#define asmlinkage /* Nothing needed */
#define __LINUX_ARM_ARCH__ 7
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/cpufeature.h xen-4.8.1/xen/include/asm-arm/cpufeature.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/cpufeature.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/cpufeature.h 2017-04-10 14:21:48.000000000 +0100
@@ -24,7 +24,7 @@
#define cpu_has_arm (boot_cpu_feature32(arm) == 1)
#define cpu_has_thumb (boot_cpu_feature32(thumb) >= 1)
#define cpu_has_thumb2 (boot_cpu_feature32(thumb) >= 3)
-#define cpu_has_jazelle (boot_cpu_feature32(jazelle) >= 0)
+#define cpu_has_jazelle (boot_cpu_feature32(jazelle) > 0)
#define cpu_has_thumbee (boot_cpu_feature32(thumbee) == 1)
#define cpu_has_aarch32 (cpu_has_arm || cpu_has_thumb)
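
The cpufeature.h fix is the classic unsigned tautology: the feature field
is unsigned, an unsigned value is always >= 0, so cpu_has_jazelle evaluated
to true on every CPU. Demonstrated in two lines:

    #include <assert.h>

    int main(void)
    {
        unsigned int field = 0;      /* feature absent in the ID register */

        assert((field >= 0) == 1);   /* old test: always true */
        assert((field > 0) == 0);    /* new test: correctly false */
        return 0;
    }
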
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/p2m.h xen-4.8.1/xen/include/asm-arm/p2m.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/p2m.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/p2m.h 2017-04-10 14:21:48.000000000 +0100
@@ -95,6 +95,9 @@
/* back pointer to domain */
struct domain *domain;
+
+ /* Track on which pCPU this p2m was last used, and by which vCPU */
+ uint8_t last_vcpu_ran[NR_CPUS];
};
/*
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/page.h xen-4.8.1/xen/include/asm-arm/page.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/page.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/page.h 2017-04-10 14:21:48.000000000 +0100
@@ -292,24 +292,20 @@
static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
{
- size_t off;
const void *end = p + size;
+ size_t cacheline_mask = cacheline_bytes - 1;
dsb(sy); /* So the CPU issues all writes to the range */
- off = (unsigned long)p % cacheline_bytes;
- if ( off )
+ if ( (uintptr_t)p & cacheline_mask )
{
- p -= off;
+ p = (void *)((uintptr_t)p & ~cacheline_mask);
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
p += cacheline_bytes;
- size -= cacheline_bytes - off;
}
- off = (unsigned long)end % cacheline_bytes;
- if ( off )
+ if ( (uintptr_t)end & cacheline_mask )
{
- end -= off;
- size -= off;
+ end = (void *)((uintptr_t)end & ~cacheline_mask);
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (end));
}
@@ -323,9 +319,10 @@
static inline int clean_dcache_va_range(const void *p, unsigned long size)
{
- const void *end;
+ const void *end = p + size;
dsb(sy); /* So the CPU issues all writes to the range */
- for ( end = p + size; p < end; p += cacheline_bytes )
+ p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));
+ for ( ; p < end; p += cacheline_bytes )
asm volatile (__clean_dcache_one(0) : : "r" (p));
dsb(sy); /* So we know the flushes happen before continuing */
/* ARM callers assume that dcache_* functions cannot fail. */
@@ -335,9 +332,10 @@
static inline int clean_and_invalidate_dcache_va_range
(const void *p, unsigned long size)
{
- const void *end;
+ const void *end = p + size;
dsb(sy); /* So the CPU issues all writes to the range */
- for ( end = p + size; p < end; p += cacheline_bytes )
+ p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1));
+ for ( ; p < end; p += cacheline_bytes )
asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
dsb(sy); /* So we know the flushes happen before continuing */
/* ARM callers assume that dcache_* functions cannot fail. */
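
The page.h hunks all revolve around rounding the start pointer down to a
cache-line boundary, so partially covered head and tail lines are
maintained exactly once and nothing before p is skipped. Since
cacheline_bytes is a power of two, the rounding is a single mask, sketched
here stand-alone:

    #include <stdint.h>

    /* Round p down to the start of its cache line (line a power of two). */
    static inline const void *cacheline_down(const void *p, uintptr_t line)
    {
        return (const void *)((uintptr_t)p & ~(line - 1));
    }

The fixed clean_dcache_va_range() above then simply strides from that
aligned pointer up to end, one line at a time.
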
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/sysregs.h xen-4.8.1/xen/include/asm-arm/sysregs.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/sysregs.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/sysregs.h 2017-04-10 14:21:48.000000000 +0100
@@ -90,6 +90,7 @@
#define HSR_SYSREG_ICC_SGI1R_EL1 HSR_SYSREG(3,0,c12,c11,5)
#define HSR_SYSREG_ICC_ASGI1R_EL1 HSR_SYSREG(3,1,c12,c11,6)
#define HSR_SYSREG_ICC_SGI0R_EL1 HSR_SYSREG(3,2,c12,c11,7)
+#define HSR_SYSREG_ICC_SRE_EL1 HSR_SYSREG(3,0,c12,c12,5)
#define HSR_SYSREG_CONTEXTIDR_EL1 HSR_SYSREG(3,0,c13,c0,1)
#define HSR_SYSREG_PMCR_EL0 HSR_SYSREG(3,3,c9,c12,0)
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/vgic.h xen-4.8.1/xen/include/asm-arm/vgic.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-arm/vgic.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-arm/vgic.h 2017-04-10 14:21:48.000000000 +0100
@@ -69,7 +69,7 @@
unsigned long status;
struct irq_desc *desc; /* only set if the irq corresponds to a physical irq */
unsigned int irq;
-#define GIC_INVALID_LR ~(uint8_t)0
+#define GIC_INVALID_LR (uint8_t)~0
uint8_t lr;
uint8_t priority;
/* inflight is used to append instances of pending_irq to
@@ -107,7 +107,9 @@
/*
* It's more convenient to store a target VCPU per vIRQ
- * than the register ITARGETSR/IROUTER itself
+ * than the register ITARGETSR/IROUTER itself.
+ * Use atomic operations to read/write the vcpu fields to avoid
+ * taking the rank lock.
*/
uint8_t vcpu[32];
};
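
The GIC_INVALID_LR change in the vgic.h hunk is an integer-promotion
subtlety: in ~(uint8_t)0 the cast happens first, the uint8_t is promoted
back to int before the complement, and the result is int -1, which a
uint8_t lr holding 255 never compares equal to. Inverting first and
truncating last yields the intended 255:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint8_t lr = 0xff;          /* the sentinel as actually stored */

        assert(lr != ~(uint8_t)0);  /* old macro: compares 255 with -1 */
        assert(lr == (uint8_t)~0);  /* new macro: compares 255 with 255 */
        return 0;
    }
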
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/hvm/svm/vmcb.h xen-4.8.1/xen/include/asm-x86/hvm/svm/vmcb.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/hvm/svm/vmcb.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-x86/hvm/svm/vmcb.h 2017-04-10 14:21:48.000000000 +0100
@@ -308,7 +308,7 @@
/* Definition of segment state is borrowed by the generic HVM code. */
typedef struct segment_register svm_segment_register_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -322,7 +322,7 @@
} fields;
} eventinj_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -340,7 +340,7 @@
} fields;
} vintr_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -357,7 +357,7 @@
} fields;
} ioio_info_t;
-typedef union __packed
+typedef union
{
u64 bytes;
struct
@@ -366,7 +366,7 @@
} fields;
} lbrctrl_t;
-typedef union __packed
+typedef union
{
uint32_t bytes;
struct
@@ -401,7 +401,7 @@
#define IOPM_SIZE (12 * 1024)
#define MSRPM_SIZE (8 * 1024)
-struct __packed vmcb_struct {
+struct vmcb_struct {
u32 _cr_intercepts; /* offset 0x00 - cleanbit 0 */
u32 _dr_intercepts; /* offset 0x04 - cleanbit 0 */
u32 _exception_intercepts; /* offset 0x08 - cleanbit 0 */
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/hvm/vmx/vmcs.h xen-4.8.1/xen/include/asm-x86/hvm/vmx/vmcs.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/hvm/vmx/vmcs.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-x86/hvm/vmx/vmcs.h 2017-04-10 14:21:48.000000000 +0100
@@ -238,6 +238,7 @@
void vmx_vmcs_enter(struct vcpu *v);
bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
#define CPU_BASED_VIRTUAL_INTR_PENDING 0x00000004
#define CPU_BASED_USE_TSC_OFFSETING 0x00000008
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/mm.h xen-4.8.1/xen/include/asm-x86/mm.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/mm.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-x86/mm.h 2017-04-10 14:21:48.000000000 +0100
@@ -253,8 +253,8 @@
#define is_xen_heap_mfn(mfn) \
(__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
#define is_xen_fixed_mfn(mfn) \
- ((((mfn) << PAGE_SHIFT) >= __pa(&_start)) && \
- (((mfn) << PAGE_SHIFT) <= __pa(&_end)))
+ ((((mfn) << PAGE_SHIFT) >= __pa(&_stext)) && \
+ (((mfn) << PAGE_SHIFT) <= __pa(&__2M_rwdata_end)))
#define PRtype_info "016lx"/* should only be used for printk's */
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/x86_64/uaccess.h xen-4.8.1/xen/include/asm-x86/x86_64/uaccess.h
--- xen-4.8.1~pre.2017.01.23/xen/include/asm-x86/x86_64/uaccess.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/asm-x86/x86_64/uaccess.h 2017-04-10 14:21:48.000000000 +0100
@@ -29,8 +29,9 @@
/*
* Valid if in +ve half of 48-bit address space, or above Xen-reserved area.
* This is also valid for range checks (addr, addr+size). As long as the
- * start address is outside the Xen-reserved area then we will access a
- * non-canonical address (and thus fault) before ever reaching VIRT_START.
+ * start address is outside the Xen-reserved area, sequential accesses
+ * (starting at addr) will hit a non-canonical address (and thus fault)
+ * before ever reaching VIRT_START.
*/
#define __addr_ok(addr) \
(((unsigned long)(addr) < (1UL<<47)) || \
@@ -40,7 +41,8 @@
(__addr_ok(addr) || is_compat_arg_xlat_range(addr, size))
#define array_access_ok(addr, count, size) \
- (access_ok(addr, (count)*(size)))
+ (likely(((count) ?: 0UL) < (~0UL / (size))) && \
+ access_ok(addr, (count) * (size)))
#define __compat_addr_ok(d, addr) \
((unsigned long)(addr) < HYPERVISOR_COMPAT_VIRT_START(d))
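
The array_access_ok() change stops the count*size multiplication from
overflowing before it is checked: count must stay below ~0UL / size, and
the GNU ?: with 0UL also forces the comparison to be done at unsigned long
width. Why the unguarded multiply was dangerous, stand-alone:

    #include <assert.h>

    int main(void)
    {
        unsigned long size = 8;
        unsigned long count = ~0UL / size + 2;  /* absurdly large request */

        assert(count * size == 8);              /* wraps: looks tiny */
        assert((count < ~0UL / size) == 0);     /* the new guard rejects it */
        return 0;
    }
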
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/public/arch-x86/hvm/save.h xen-4.8.1/xen/include/public/arch-x86/hvm/save.h
--- xen-4.8.1~pre.2017.01.23/xen/include/public/arch-x86/hvm/save.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/public/arch-x86/hvm/save.h 2017-04-10 14:21:48.000000000 +0100
@@ -135,7 +135,7 @@
uint64_t shadow_gs;
/* msr content saved/restored. */
- uint64_t msr_flags;
+ uint64_t msr_flags; /* Obsolete, ignored. */
uint64_t msr_lstar;
uint64_t msr_star;
uint64_t msr_cstar;
@@ -249,7 +249,7 @@
uint64_t shadow_gs;
/* msr content saved/restored. */
- uint64_t msr_flags;
+ uint64_t msr_flags; /* Obsolete, ignored. */
uint64_t msr_lstar;
uint64_t msr_star;
uint64_t msr_cstar;
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/public/memory.h xen-4.8.1/xen/include/public/memory.h
--- xen-4.8.1~pre.2017.01.23/xen/include/public/memory.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/public/memory.h 2017-04-10 14:21:48.000000000 +0100
@@ -222,9 +222,9 @@
* XENMEM_add_to_physmap_batch only. */
#define XENMAPSPACE_dev_mmio 5 /* device mmio region
ARM only; the region is mapped in
- Stage-2 using the memory attribute
- "Device-nGnRE" (previously named
- "Device" on ARMv7) */
+ Stage-2 using the Normal Memory
+ Inner/Outer Write-Back Cacheable
+ memory attribute. */
/* ` } */
/*
diff -Nru xen-4.8.1~pre.2017.01.23/xen/include/xsm/dummy.h xen-4.8.1/xen/include/xsm/dummy.h
--- xen-4.8.1~pre.2017.01.23/xen/include/xsm/dummy.h 2017-01-20 17:37:46.000000000 +0000
+++ xen-4.8.1/xen/include/xsm/dummy.h 2017-04-10 14:21:48.000000000 +0100
@@ -712,18 +712,13 @@
XSM_ASSERT_ACTION(XSM_OTHER);
switch ( op )
{
- case XENPMU_mode_set:
- case XENPMU_mode_get:
- case XENPMU_feature_set:
- case XENPMU_feature_get:
- return xsm_default_action(XSM_PRIV, d, current->domain);
case XENPMU_init:
case XENPMU_finish:
case XENPMU_lvtpc_set:
case XENPMU_flush:
return xsm_default_action(XSM_HOOK, d, current->domain);
default:
- return -EPERM;
+ return xsm_default_action(XSM_PRIV, d, current->domain);
}
}
--- End Message ---