Bug#1112282: trixie-pu: package watcher/14.0.0-2
Package: release.debian.org
Severity: normal
Tags: trixie
X-Debbugs-Cc: watcher@packages.debian.org
Control: affects -1 + src:watcher
User: release.debian.org@packages.debian.org
Usertags: pu
Hi,
[ Reason ]
I'd like to fix: https://bugs.debian.org/1111692
in Trixie. This is a vulnerability where an OpenStack volume
may end up attached to the wrong VM.
[ Impact ]
Someone could access the volume of another tenant in an
OpenStack deployment.
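For illustration only (not part of the fix): a minimal sketch, assuming
the python3-openstacksdk client and a clouds.yaml entry named "mycloud",
of how an operator could list a volume's attachments and spot an
unexpected server_id:

  import openstack

  # Credentials come from clouds.yaml; the cloud name is a placeholder.
  conn = openstack.connect(cloud="mycloud")

  # "VOLUME_UUID" is a placeholder for the volume to inspect.
  volume = conn.block_storage.get_volume("VOLUME_UUID")
  for attachment in volume.attachments:
      # Each attachment records the server the volume is wired to; a
      # server_id outside the volume's project would indicate exposure.
      print(attachment["server_id"], attachment.get("device"))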
[ Tests ]
Upstream has extensive unit and functional tests. I also exercise
the packaged version myself, on top of the unit tests run at build
time and in autopkgtest.
[ Risks ]
Low risk, thanks to the test coverage described above.
[ Checklist ]
[x] *all* changes are documented in the d/changelog
[x] I reviewed all changes and I approve them
[x] attach debdiff against the package in (old)stable
[x] the issue is verified as fixed in unstable
Please allow me to upload watcher/14.0.0-2+deb13u1 to trixie
proposed-updates, as per the attached debdiff.
Cheers,
Thomas Goirand (zigo)
P.S.: I'm following up with the same request for Nova, as both
packages carry fixes for OSSN-0094.
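For context before the (rather long) debdiff: the patched code path boils
down to a single call to cinder's volume migrate API, instead of watcher
creating a temporary admin user and swapping attachments itself. A rough
sketch using python-cinderclient, with placeholder credentials and hosts
(watcher's own cinder_helper wraps the same client):

  from cinderclient import client as cinder_client
  from keystoneauth1.identity import v3
  from keystoneauth1 import session as ks_session

  # Placeholder credentials; watcher reads these from watcher_clients_auth.
  auth = v3.Password(auth_url="https://keystone.example.org/v3",
                     username="admin", password="secret",
                     project_name="admin",
                     user_domain_name="Default",
                     project_domain_name="Default")
  sess = ks_session.Session(auth=auth)
  cinder = cinder_client.Client('3', session=sess)

  # The swap/migrate path now reduces to cinder's volume migrate API;
  # cinder (or nova, for in-use volumes) moves the data and cleans up.
  cinder.volumes.migrate_volume("VOLUME_UUID", "dest-host@backend#pool",
                                force_host_copy=False, lock_volume=False)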
diff -Nru watcher-14.0.0/debian/changelog watcher-14.0.0/debian/changelog
--- watcher-14.0.0/debian/changelog 2025-07-11 14:45:24.000000000 +0200
+++ watcher-14.0.0/debian/changelog 2025-08-21 10:27:37.000000000 +0200
@@ -1,3 +1,15 @@
+watcher (14.0.0-2+deb13u1) trixie; urgency=high
+
+ * A vulnerability has been identified in OpenStack Nova and OpenStack Watcher
+ in conjunction with volume swap operations performed by the Watcher
+ service. Under specific circumstances, this can lead to a situation where
+ two Nova libvirt instances could reference the same block device, allowing
+ accidental information disclosure to the unauthorized instance. Added
+ upstream patch: OSSN-0094_use_cinder_migrate_for_swap_volume.patch.
+ (Closes: #1111692).
+
+ -- Thomas Goirand <zigo@debian.org> Thu, 21 Aug 2025 10:27:37 +0200
+
watcher (14.0.0-2) unstable; urgency=medium
* Add export OS_OSLO_MESSAGING_RABBIT__PROCESSNAME for all daemons.
diff -Nru watcher-14.0.0/debian/patches/OSSN-0094_use_cinder_migrate_for_swap_volume.patch watcher-14.0.0/debian/patches/OSSN-0094_use_cinder_migrate_for_swap_volume.patch
--- watcher-14.0.0/debian/patches/OSSN-0094_use_cinder_migrate_for_swap_volume.patch 1970-01-01 01:00:00.000000000 +0100
+++ watcher-14.0.0/debian/patches/OSSN-0094_use_cinder_migrate_for_swap_volume.patch 2025-08-21 10:27:37.000000000 +0200
@@ -0,0 +1,521 @@
+Author: Sean Mooney <work@seanmooney.info>
+Date: Tue, 10 Jun 2025 20:13:49 +0100
+Description: use cinder migrate for swap volume
+ This change removes watcher's in-tree functionality
+ for swapping instance volumes and defines swap as an alias
+ of cinder volume migrate.
+ .
+ The watcher native implementation was missing error handling
+ which could lead to irretrievable data loss.
+ .
+ The removed code also forged project user credentials in order to
+ perform admin requests as if they were made by a member of the project.
+ This was unsafe and posed a security risk due to how it was
+ implemented. The code has been removed without replacement.
+ .
+ While some effort has been made to keep existing audit
+ definitions working, any reduction in functionality resulting
+ from this security hardening is intentional.
+Bug: https://launchpad.net/bugs/2112187
+Bug-Debian: https://bugs.debian.org/1111692
+Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0
+Signed-off-by: Sean Mooney <work@seanmooney.info>
+Origin: upstream, https://review.opendev.org/c/openstack/watcher/+/957770
+Last-Update: 2025-08-21
+
+diff --git a/releasenotes/notes/bug-2112187-763bae283e0b736d.yaml b/releasenotes/notes/bug-2112187-763bae283e0b736d.yaml
+new file mode 100644
+index 0000000..630fbcf
+--- /dev/null
++++ b/releasenotes/notes/bug-2112187-763bae283e0b736d.yaml
+@@ -0,0 +1,47 @@
++---
++security:
++ - |
++ Watcher no longer forges requests on behalf of a tenant when
++ swapping volumes. Prior to this release watcher had two implementations
++ for moving a volume: it could use cinder's volume migrate API or its own
++ internal implementation that directly calls nova's volume attachment
++ update API. The former is safe and the recommended way to move volumes
++ between cinder storage backends; the internal implementation was insecure,
++ fragile due to a lack of error handling, and capable of deleting user data.
++
++ Insecure: the internal volume migration operation created a new keystone
++ user with a weak name and password and added it to the tenant's project
++ with the admin role. It then used that user to forge requests on behalf
++ of the tenant with admin rights to swap the volume. If the applier was
++ restarted during the execution of this operation, the user would never
++ be cleaned up.
++
++ Fragile: the error handling was minimal. The swap volume API is async,
++ so watcher has to poll for completion, and there was no support to resume
++ polling if it was interrupted or the timeout was exceeded.
++
++ Data-loss: while the internal polling logic returned success or failure,
++ watcher did not check the result; once the function returned it
++ unconditionally deleted the source volume. For larger volumes this
++ could result in irretrievable data loss.
++
++ Finally, if a volume was swapped using the internal workflow it left
++ the nova instance in an out-of-sync state. If the VM was then live
++ migrated after the swap completed but before a hard reboot, the
++ migration would either fail or succeed while breaking tenant isolation.
++
++ see: https://bugs.launchpad.net/nova/+bug/2112187 for details.
++fixes:
++ - |
++ All code related to creating a keystone user and granting roles has been
++ removed. The internal swap volume implementation has been removed and
++ replaced by cinder's volume migrate API. Note that as part of this change
++ Watcher will no longer attempt volume migrations or retypes if the
++ instance is in the `Verify Resize` task state. This resolves several
++ issues related to volume migration in the zone migration and
++ Storage capacity balance strategies. While efforts have been made
++ to maintain backward compatibility, these changes are required to
++ address a security weakness in watcher's prior approach.
++
++ see: https://bugs.launchpad.net/nova/+bug/2112187 for more context.
++
+diff --git a/watcher/applier/actions/volume_migration.py b/watcher/applier/actions/volume_migration.py
+index aa2466f..8575eea 100644
+--- a/watcher/applier/actions/volume_migration.py
++++ b/watcher/applier/actions/volume_migration.py
+@@ -17,14 +17,11 @@
+
+ from oslo_log import log
+
+-from cinderclient import client as cinder_client
+ from watcher._i18n import _
+ from watcher.applier.actions import base
+ from watcher.common import cinder_helper
+ from watcher.common import exception
+-from watcher.common import keystone_helper
+ from watcher.common import nova_helper
+-from watcher.common import utils
+ from watcher import conf
+
+ CONF = conf.CONF
+@@ -70,8 +67,6 @@
+
+ def __init__(self, config, osc=None):
+ super(VolumeMigrate, self).__init__(config)
+- self.temp_username = utils.random_string(10)
+- self.temp_password = utils.random_string(10)
+ self.cinder_util = cinder_helper.CinderHelper(osc=self.osc)
+ self.nova_util = nova_helper.NovaHelper(osc=self.osc)
+
+@@ -134,83 +129,42 @@
+
+ def _can_swap(self, volume):
+ """Judge volume can be swapped"""
++ # TODO(sean-k-mooney): rename this to _can_migrate and update
++ # tests to reflect that.
+
++ # cinder volume migration can migrate volumes that are not
++ # attached to instances or nova can migrate the data for cinder
++ # if the volume is in-use. If the volume has no attachments
++ # allow cinder to decide if it can be migrated.
+ if not volume.attachments:
+- return False
+- instance_id = volume.attachments[0]['server_id']
+- instance_status = self.nova_util.find_instance(instance_id).status
+-
+- if (volume.status == 'in-use' and
+- instance_status in ('ACTIVE', 'PAUSED', 'RESIZED')):
++ LOG.debug(f"volume: {volume.id} has no attachments")
+ return True
+
+- return False
+-
+- def _create_user(self, volume, user):
+- """Create user with volume attribute and user information"""
+- keystone_util = keystone_helper.KeystoneHelper(osc=self.osc)
+- project_id = getattr(volume, 'os-vol-tenant-attr:tenant_id')
+- user['project'] = project_id
+- user['domain'] = keystone_util.get_project(project_id).domain_id
+- user['roles'] = ['admin']
+- return keystone_util.create_user(user)
+-
+- def _get_cinder_client(self, session):
+- """Get cinder client by session"""
+- return cinder_client.Client(
+- CONF.cinder_client.api_version,
+- session=session,
+- endpoint_type=CONF.cinder_client.endpoint_type)
+-
+- def _swap_volume(self, volume, dest_type):
+- """Swap volume to dest_type
+-
+- Limitation note: only for compute libvirt driver
+- """
+- if not dest_type:
+- raise exception.Invalid(
+- message=(_("destination type is required when "
+- "migration type is swap")))
+-
+- if not self._can_swap(volume):
+- raise exception.Invalid(
+- message=(_("Invalid state for swapping volume")))
+-
+- user_info = {
+- 'name': self.temp_username,
+- 'password': self.temp_password}
+- user = self._create_user(volume, user_info)
+- keystone_util = keystone_helper.KeystoneHelper(osc=self.osc)
+- try:
+- session = keystone_util.create_session(
+- user.id, self.temp_password)
+- temp_cinder = self._get_cinder_client(session)
+-
+- # swap volume
+- new_volume = self.cinder_util.create_volume(
+- temp_cinder, volume, dest_type)
+- self.nova_util.swap_volume(volume, new_volume)
+-
+- # delete old volume
+- self.cinder_util.delete_volume(volume)
+-
+- finally:
+- keystone_util.delete_user(user)
+-
+- return True
++ # since it has attachments we need to validate nova's constraints
++ instance_id = volume.attachments[0]['server_id']
++ instance_status = self.nova_util.find_instance(instance_id).status
++ LOG.debug(
++ f"volume: {volume.id} is attached to instance: {instance_id} "
++ f"in instance status: {instance_status}")
++ # NOTE(sean-k-mooney): This used to allow RESIZED which
++ # is the resize_verify task state, that is not an acceptable time
++ # to migrate volumes, if nova does not block this in the API
++ # today that is probably a bug. PAUSED is also questionable but
++ # it should generally be safe.
++ return (volume.status == 'in-use' and
++ instance_status in ('ACTIVE', 'PAUSED'))
+
+ def _migrate(self, volume_id, dest_node, dest_type):
+-
+ try:
+ volume = self.cinder_util.get_volume(volume_id)
+- if self.migration_type == self.SWAP:
+- if dest_node:
+- LOG.warning("dest_node is ignored")
+- return self._swap_volume(volume, dest_type)
++ # for backward compatibility map swap to migrate.
++ if self.migration_type in (self.SWAP, self.MIGRATE):
++ if not self._can_swap(volume):
++ raise exception.Invalid(
++ message=(_("Invalid state for swapping volume")))
++ return self.cinder_util.migrate(volume, dest_node)
+ elif self.migration_type == self.RETYPE:
+ return self.cinder_util.retype(volume, dest_type)
+- elif self.migration_type == self.MIGRATE:
+- return self.cinder_util.migrate(volume, dest_node)
+ else:
+ raise exception.Invalid(
+ message=(_("Migration of type '%(migration_type)s' is not "
+diff --git a/watcher/common/keystone_helper.py b/watcher/common/keystone_helper.py
+index 53ff8e2..3733fd1 100644
+--- a/watcher/common/keystone_helper.py
++++ b/watcher/common/keystone_helper.py
+@@ -15,8 +15,6 @@
+ from oslo_log import log
+
+ from keystoneauth1.exceptions import http as ks_exceptions
+-from keystoneauth1 import loading
+-from keystoneauth1 import session
+ from watcher._i18n import _
+ from watcher.common import clients
+ from watcher.common import exception
+@@ -90,35 +88,3 @@
+ message=(_("Domain name seems ambiguous: %s") %
+ name_or_id))
+ return domains[0]
+-
+- def create_session(self, user_id, password):
+- user = self.get_user(user_id)
+- loader = loading.get_plugin_loader('password')
+- auth = loader.load_from_options(
+- auth_url=CONF.watcher_clients_auth.auth_url,
+- password=password,
+- user_id=user_id,
+- project_id=user.default_project_id)
+- return session.Session(auth=auth)
+-
+- def create_user(self, user):
+- project = self.get_project(user['project'])
+- domain = self.get_domain(user['domain'])
+- _user = self.keystone.users.create(
+- user['name'],
+- password=user['password'],
+- domain=domain,
+- project=project,
+- )
+- for role in user['roles']:
+- role = self.get_role(role)
+- self.keystone.roles.grant(
+- role.id, user=_user.id, project=project.id)
+- return _user
+-
+- def delete_user(self, user):
+- try:
+- user = self.get_user(user)
+- self.keystone.users.delete(user)
+- except exception.Invalid:
+- pass
+diff --git a/watcher/common/utils.py b/watcher/common/utils.py
+index 5780e81..0d50761 100644
+--- a/watcher/common/utils.py
++++ b/watcher/common/utils.py
+@@ -19,9 +19,7 @@
+ import asyncio
+ import datetime
+ import inspect
+-import random
+ import re
+-import string
+
+ from croniter import croniter
+ import eventlet
+@@ -160,14 +158,10 @@
+ StrictDefaultValidatingDraft4Validator = extend_with_default(
+ extend_with_strict_schema(validators.Draft4Validator))
+
++
+ Draft4Validator = validators.Draft4Validator
+
+
+-def random_string(n):
+- return ''.join([random.choice(
+- string.ascii_letters + string.digits) for i in range(n)])
+-
+-
+ # Some clients (e.g. MAAS) use asyncio, which isn't compatible with Eventlet.
+ # As a workaround, we're delegating such calls to a native thread.
+ def async_compat_call(f, *args, **kwargs):
+diff --git a/watcher/decision_engine/strategy/strategies/zone_migration.py b/watcher/decision_engine/strategy/strategies/zone_migration.py
+index 16dd2f0..908f71e 100644
+--- a/watcher/decision_engine/strategy/strategies/zone_migration.py
++++ b/watcher/decision_engine/strategy/strategies/zone_migration.py
+@@ -57,8 +57,19 @@
+ self.planned_cold_count = 0
+ self.volume_count = 0
+ self.planned_volume_count = 0
+- self.volume_update_count = 0
+- self.planned_volume_update_count = 0
++
++ # TODO(sean-k-mooney): This is backward compatibility
++ # for calling the swap code paths. Swap is now an alias
++ # for migrate, we should clean this up in a future
++ # cycle.
++ @property
++ def volume_update_count(self):
++ return self.volume_count
++
++ # same as above clean up later.
++ @property
++ def planned_volume_update_count(self):
++ return self.planned_volume_count
+
+ @classmethod
+ def get_name(cls):
+@@ -312,8 +323,8 @@
+ planned_cold_migrate_instance_count=self.planned_cold_count,
+ volume_migrate_count=self.volume_count,
+ planned_volume_migrate_count=self.planned_volume_count,
+- volume_update_count=self.volume_update_count,
+- planned_volume_update_count=self.planned_volume_update_count
++ volume_update_count=self.volume_count,
++ planned_volume_update_count=self.planned_volume_count
+ )
+
+ def set_migration_count(self, targets):
+@@ -328,10 +339,7 @@
+ elif self.is_cold(instance):
+ self.cold_count += 1
+ for volume in targets.get('volume', []):
+- if self.is_available(volume):
+- self.volume_count += 1
+- elif self.is_in_use(volume):
+- self.volume_update_count += 1
++ self.volume_count += 1
+
+ def is_live(self, instance):
+ status = getattr(instance, 'status')
+@@ -404,19 +412,16 @@
+ LOG.debug(src_type)
+ LOG.debug("%s %s", dst_pool, dst_type)
+
+- if self.is_available(volume):
+- if src_type == dst_type:
+- self._volume_migrate(volume, dst_pool)
+- else:
+- self._volume_retype(volume, dst_type)
+- elif self.is_in_use(volume):
+- self._volume_update(volume, dst_type)
++ if src_type == dst_type:
++ self._volume_migrate(volume, dst_pool)
++ else:
++ self._volume_retype(volume, dst_type)
+
+- # if with_attached_volume is True, migrate attaching instances
+- if self.with_attached_volume:
+- instances = [self.nova.find_instance(dic.get('server_id'))
+- for dic in volume.attachments]
+- self.instances_migration(instances, action_counter)
++ # if with_attached_volume is True, migrate attaching instances
++ if self.with_attached_volume:
++ instances = [self.nova.find_instance(dic.get('server_id'))
++ for dic in volume.attachments]
++ self.instances_migration(instances, action_counter)
+
+ action_counter.add_pool(pool)
+
+@@ -464,16 +469,6 @@
+ input_parameters=parameters)
+ self.planned_cold_count += 1
+
+- def _volume_update(self, volume, dst_type):
+- parameters = {"migration_type": "swap",
+- "destination_type": dst_type,
+- "resource_name": volume.name}
+- self.solution.add_action(
+- action_type="volume_migrate",
+- resource_id=volume.id,
+- input_parameters=parameters)
+- self.planned_volume_update_count += 1
+-
+ def _volume_migrate(self, volume, dst_pool):
+ parameters = {"migration_type": "migrate",
+ "destination_node": dst_pool,
+diff --git a/watcher/tests/applier/actions/test_volume_migration.py b/watcher/tests/applier/actions/test_volume_migration.py
+index ae77e55..d97957c 100644
+--- a/watcher/tests/applier/actions/test_volume_migration.py
++++ b/watcher/tests/applier/actions/test_volume_migration.py
+@@ -22,7 +22,6 @@
+ from watcher.common import clients
+ from watcher.common import keystone_helper
+ from watcher.common import nova_helper
+-from watcher.common import utils as w_utils
+ from watcher.tests import base
+
+
+@@ -102,12 +101,15 @@
+
+ @staticmethod
+ def fake_volume(**kwargs):
++ # FIXME(sean-k-mooney): we should be using real objects in this
++ # test or at least something more representative of the real data
+ volume = mock.MagicMock()
+ volume.id = kwargs.get('id', TestMigration.VOLUME_UUID)
+ volume.size = kwargs.get('size', '1')
+ volume.status = kwargs.get('status', 'available')
+ volume.snapshot_id = kwargs.get('snapshot_id', None)
+ volume.availability_zone = kwargs.get('availability_zone', 'nova')
++ volume.attachments = kwargs.get('attachments', [])
+ return volume
+
+ @staticmethod
+@@ -175,42 +177,14 @@
+ "storage1-typename",
+ )
+
+- def test_swap_success(self):
+- volume = self.fake_volume(
+- status='in-use', attachments=[{'server_id': 'server_id'}])
+- self.m_n_helper.find_instance.return_value = self.fake_instance()
+-
+- new_volume = self.fake_volume(id=w_utils.generate_uuid())
+- user = mock.Mock()
+- session = mock.MagicMock()
+- self.m_k_helper.create_user.return_value = user
+- self.m_k_helper.create_session.return_value = session
+- self.m_c_helper.get_volume.return_value = volume
+- self.m_c_helper.create_volume.return_value = new_volume
+-
+- result = self.action_swap.execute()
+- self.assertTrue(result)
+-
+- self.m_n_helper.swap_volume.assert_called_once_with(
+- volume,
+- new_volume
+- )
+- self.m_k_helper.delete_user.assert_called_once_with(user)
+-
+- def test_swap_fail(self):
+- # _can_swap fail
+- instance = self.fake_instance(status='STOPPED')
+- self.m_n_helper.find_instance.return_value = instance
+-
+- result = self.action_swap.execute()
+- self.assertFalse(result)
+-
+ def test_can_swap_success(self):
+ volume = self.fake_volume(
+- status='in-use', attachments=[{'server_id': 'server_id'}])
+- instance = self.fake_instance()
++ status='in-use', attachments=[
++ {'server_id': TestMigration.INSTANCE_UUID}])
+
++ instance = self.fake_instance()
+ self.m_n_helper.find_instance.return_value = instance
++
+ result = self.action_swap._can_swap(volume)
+ self.assertTrue(result)
+
+@@ -219,16 +193,33 @@
+ result = self.action_swap._can_swap(volume)
+ self.assertTrue(result)
+
+- instance = self.fake_instance(status='RESIZED')
+- self.m_n_helper.find_instance.return_value = instance
+- result = self.action_swap._can_swap(volume)
+- self.assertTrue(result)
+-
+ def test_can_swap_fail(self):
+
+ volume = self.fake_volume(
+- status='in-use', attachments=[{'server_id': 'server_id'}])
++ status='in-use', attachments=[
++ {'server_id': TestMigration.INSTANCE_UUID}])
+ instance = self.fake_instance(status='STOPPED')
+ self.m_n_helper.find_instance.return_value = instance
+ result = self.action_swap._can_swap(volume)
+ self.assertFalse(result)
++
++ instance = self.fake_instance(status='RESIZED')
++ self.m_n_helper.find_instance.return_value = instance
++ result = self.action_swap._can_swap(volume)
++ self.assertFalse(result)
++
++ def test_swap_success(self):
++ volume = self.fake_volume(
++ status='in-use', attachments=[
++ {'server_id': TestMigration.INSTANCE_UUID}])
++ self.m_c_helper.get_volume.return_value = volume
++
++ instance = self.fake_instance()
++ self.m_n_helper.find_instance.return_value = instance
++
++ result = self.action_swap.execute()
++ self.assertTrue(result)
++ self.m_c_helper.migrate.assert_called_once_with(
++ volume,
++ "storage1-poolname"
++ )
+diff --git a/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py b/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py
+index 9bd4c39..0326451 100644
+--- a/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py
++++ b/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py
+@@ -320,7 +320,10 @@
+ migration_types = collections.Counter(
+ [action.get('input_parameters')['migration_type']
+ for action in solution.actions])
+- self.assertEqual(1, migration_types.get("swap", 0))
++ # watcher no longer implements swap. it is now an
++ # alias for migrate.
++ self.assertEqual(0, migration_types.get("swap", 0))
++ self.assertEqual(1, migration_types.get("migrate", 1))
+ global_efficacy_value = solution.global_efficacy[3].get('value', 0)
+ self.assertEqual(100, global_efficacy_value)
+
diff -Nru watcher-14.0.0/debian/patches/series watcher-14.0.0/debian/patches/series
--- watcher-14.0.0/debian/patches/series 2025-07-11 14:45:24.000000000 +0200
+++ watcher-14.0.0/debian/patches/series 2025-08-21 10:27:37.000000000 +0200
@@ -1,3 +1,4 @@
also-package-alembic-migration.patch
remove-sphinxcontrib.rsvgconverter.patch
removed-sphinxcontrib.httpdomain-from-sphinx-ext.patch
+OSSN-0094_use_cinder_migrate_for_swap_volume.patch