Bug#1112282: trixie-pu: package watcher/14.0.0-2

Package: release.debian.org
Severity: normal
Tags: trixie
X-Debbugs-Cc: watcher@packages.debian.org
Control: affects -1 + src:watcher
User: release.debian.org@packages.debian.org
Usertags: pu

Hi,

[ Reason ]
I'd like to fix https://bugs.debian.org/1111692 in Trixie. This is
a vulnerability where an OpenStack volume may end up attached to
the wrong VM.

[ Impact ]
Someone could access the volume of another tenant in an
OpenStack deployment.
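
For illustration, here is a minimal sketch (not part of the attached
debdiff) of the safe path the patch switches Watcher to: letting
Cinder drive the move via python-cinderclient instead of forging
tenant credentials. All endpoint, credential, host and UUID values
below are placeholders.

  # Illustrative sketch only; all values below are placeholders.
  from cinderclient import client as cinder_client
  from keystoneauth1 import loading, session

  loader = loading.get_plugin_loader('password')
  auth = loader.load_from_options(
      auth_url='https://keystone.example.org/v3',
      username='watcher', password='secret',
      project_name='services',
      user_domain_name='Default',
      project_domain_name='Default')
  cinder = cinder_client.Client('3', session=session.Session(auth=auth))

  volume = cinder.volumes.get('VOLUME_UUID')
  # Cinder (with Nova's help for in-use volumes) performs the move and
  # keeps the attachment consistent; no admin user is forged.
  cinder.volumes.migrate_volume(volume, 'dest-host@backend#pool',
                                force_host_copy=False, lock_volume=True)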

[ Tests ]
Upstream has extensive unit and functional tests. I also exercise
the packaged version myself, on top of the unit tests that run at
build time and in autopkgtest.
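
To make the behaviour change concrete, the following stand-alone
sketch restates the logic that the patched unit tests pin down
(can_swap below is a hypothetical, stripped-down version of the
patched VolumeMigrate._can_swap(), not code from the package):

  import unittest
  from unittest import mock


  def can_swap(volume, find_instance):
      # Unattached volumes are now allowed (cinder decides);
      # RESIZED instances are now rejected.
      if not volume.attachments:
          return True
      server_id = volume.attachments[0]['server_id']
      status = find_instance(server_id).status
      return volume.status == 'in-use' and status in ('ACTIVE', 'PAUSED')


  class TestCanSwap(unittest.TestCase):
      def _volume(self, status='in-use', attachments=()):
          vol = mock.MagicMock()
          vol.status = status
          vol.attachments = list(attachments)
          return vol

      def test_resized_instance_rejected(self):
          instance = mock.MagicMock(status='RESIZED')
          vol = self._volume(attachments=[{'server_id': 'uuid'}])
          self.assertFalse(can_swap(vol, lambda _id: instance))

      def test_unattached_volume_allowed(self):
          vol = self._volume(status='available')
          self.assertTrue(can_swap(vol, lambda _id: None))


  if __name__ == '__main__':
      unittest.main()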

[ Risks ]
Not much risk, given the test coverage described above.

[ Checklist ]
  [x] *all* changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in (old)stable
  [x] the issue is verified as fixed in unstable

Please allow me to upload watcher/14.0.0-2+deb13u1 to Trixie
proposed-updates as per the attached debdiff.

Cheers,

Thomas Goirand (zigo)

P.S: I'm following up with the same request for Nova, as
both have fixes for OSSN-0094.

diff -Nru watcher-14.0.0/debian/changelog watcher-14.0.0/debian/changelog
--- watcher-14.0.0/debian/changelog	2025-07-11 14:45:24.000000000 +0200
+++ watcher-14.0.0/debian/changelog	2025-08-21 10:27:37.000000000 +0200
@@ -1,3 +1,15 @@
+watcher (14.0.0-2+deb13u1) trixie; urgency=high
+
+  * A vulnerability has been identified in OpenStack Nova and OpenStack Watcher
+    in conjunction with volume swap operations performed by the Watcher
+    service. Under specific circumstances, this can lead to a situation where
+    two Nova libvirt instances could reference the same block device, allowing
+    accidental information disclosure to the unauthorized instance. Added
+    upstream patch: OSSN-0094_use_cinder_migrate_for_swap_volume.patch.
+    (Closes: #1111692).
+
+ -- Thomas Goirand <zigo@debian.org>  Thu, 21 Aug 2025 10:27:37 +0200
+
 watcher (14.0.0-2) unstable; urgency=medium
 
   * Add export OS_OSLO_MESSAGING_RABBIT__PROCESSNAME for all daemons.
diff -Nru watcher-14.0.0/debian/patches/OSSN-0094_use_cinder_migrate_for_swap_volume.patch watcher-14.0.0/debian/patches/OSSN-0094_use_cinder_migrate_for_swap_volume.patch
--- watcher-14.0.0/debian/patches/OSSN-0094_use_cinder_migrate_for_swap_volume.patch	1970-01-01 01:00:00.000000000 +0100
+++ watcher-14.0.0/debian/patches/OSSN-0094_use_cinder_migrate_for_swap_volume.patch	2025-08-21 10:27:37.000000000 +0200
@@ -0,0 +1,521 @@
+Author: Sean Mooney <work@seanmooney.info>
+Date: Tue, 10 Jun 2025 20:13:49 +0100
+Description: use cinder migrate for swap volume
+ This change removes Watcher's in-tree functionality
+ for swapping instance volumes and defines swap as an alias
+ of Cinder volume migrate.
+ .
+ The Watcher-native implementation was missing error handling,
+ which could lead to irretrievable data loss.
+ .
+ The removed code also forged project user credentials to
+ perform admin requests as if they were made by a member of a
+ project. This was unsafe and posed a security risk due to how
+ it was implemented. This code has been removed without replacement.
+ .
+ While some effort has been made to keep previously defined
+ audits working, any reduction of functionality as a result
+ of this security hardening is intentional.
+Bug: https://launchpad.net/bugs/2112187
+Bug-Debian: https://bugs.debian.org/1111692
+Change-Id: Ic3b6bfd164e272d70fe86d7b182478dd962f8ac0
+Signed-off-by: Sean Mooney <work@seanmooney.info>
+Origin: upstream, https://review.opendev.org/c/openstack/watcher/+/957770
+Last-Update: 2025-08-21
+
+diff --git a/releasenotes/notes/bug-2112187-763bae283e0b736d.yaml b/releasenotes/notes/bug-2112187-763bae283e0b736d.yaml
+new file mode 100644
+index 0000000..630fbcf
+--- /dev/null
++++ b/releasenotes/notes/bug-2112187-763bae283e0b736d.yaml
+@@ -0,0 +1,47 @@
++---
++security:
++  - |
++    Watcher no longer forges requests on behalf of a tenant when
++    swapping volumes. Prior to this release, Watcher had two ways of
++    moving a volume: Cinder's volume migrate API, or its own internal
++    implementation that directly called Nova's volume attachment
++    update API. The former is safe and the recommended way to move
++    volumes between Cinder backends; the latter was insecure, fragile
++    due to a lack of error handling, and capable of deleting user data.
++
++    Insecure: the internal volume migration operation created a new
++    keystone user with a weak name and password and added it to the
++    tenant's project with the admin role. It then used that user to
++    forge requests on behalf of the tenant, with admin rights, to swap
++    the volume. If the applier was restarted during this operation,
++    the temporary user was never cleaned up.
++
++    Fragile: the error handling was minimal. The swap volume API is
++    async, so Watcher has to poll for completion, with no support for
++    resuming that polling if it was interrupted or timed out.
++
++    Data loss: although the internal polling logic returned success
++    or failure, Watcher did not check the result; once the function
++    returned, it unconditionally deleted the source volume. For larger
++    volumes this could result in irretrievable data loss.
++
++    Finally, if a volume was swapped using the internal workflow, the
++    Nova instance was left in an out-of-sync state. If the VM was live
++    migrated after the swap but prior to a hard reboot, the migration
++    would either fail, or succeed and break tenant isolation.
++
++    See https://bugs.launchpad.net/nova/+bug/2112187 for details.
++fixes:
++  - |
++    All code related to creating keystone users and granting roles
++    has been removed. The internal swap volume implementation has
++    been replaced by Cinder's volume migrate API. Note that as part
++    of this change, Watcher no longer attempts volume migrations or
++    retypes if the instance is in the `Verify Resize` task state.
++    This resolves several issues with volume migration in the zone
++    migration and storage capacity balance strategies. While efforts
++    have been made to maintain backward compatibility, these changes
++    are needed to address a security weakness in the prior approach.
++
++    See https://bugs.launchpad.net/nova/+bug/2112187 for more context.
++
+diff --git a/watcher/applier/actions/volume_migration.py b/watcher/applier/actions/volume_migration.py
+index aa2466f..8575eea 100644
+--- a/watcher/applier/actions/volume_migration.py
++++ b/watcher/applier/actions/volume_migration.py
+@@ -17,14 +17,11 @@
+ 
+ from oslo_log import log
+ 
+-from cinderclient import client as cinder_client
+ from watcher._i18n import _
+ from watcher.applier.actions import base
+ from watcher.common import cinder_helper
+ from watcher.common import exception
+-from watcher.common import keystone_helper
+ from watcher.common import nova_helper
+-from watcher.common import utils
+ from watcher import conf
+ 
+ CONF = conf.CONF
+@@ -70,8 +67,6 @@
+ 
+     def __init__(self, config, osc=None):
+         super(VolumeMigrate, self).__init__(config)
+-        self.temp_username = utils.random_string(10)
+-        self.temp_password = utils.random_string(10)
+         self.cinder_util = cinder_helper.CinderHelper(osc=self.osc)
+         self.nova_util = nova_helper.NovaHelper(osc=self.osc)
+ 
+@@ -134,83 +129,42 @@
+ 
+     def _can_swap(self, volume):
+         """Judge volume can be swapped"""
++        # TODO(sean-k-mooney): rename this to _can_migrate and update
++        # tests to reflect that.
+ 
++        # cinder volume migration can move volumes that are not
++        # attached to instances, and nova can migrate the data for
++        # cinder if the volume is in-use. If the volume has no
++        # attachments, allow cinder to decide if it can be migrated.
+         if not volume.attachments:
+-            return False
+-        instance_id = volume.attachments[0]['server_id']
+-        instance_status = self.nova_util.find_instance(instance_id).status
+-
+-        if (volume.status == 'in-use' and
+-                instance_status in ('ACTIVE', 'PAUSED', 'RESIZED')):
++            LOG.debug(f"volume: {volume.id} has no attachments")
+             return True
+ 
+-        return False
+-
+-    def _create_user(self, volume, user):
+-        """Create user with volume attribute and user information"""
+-        keystone_util = keystone_helper.KeystoneHelper(osc=self.osc)
+-        project_id = getattr(volume, 'os-vol-tenant-attr:tenant_id')
+-        user['project'] = project_id
+-        user['domain'] = keystone_util.get_project(project_id).domain_id
+-        user['roles'] = ['admin']
+-        return keystone_util.create_user(user)
+-
+-    def _get_cinder_client(self, session):
+-        """Get cinder client by session"""
+-        return cinder_client.Client(
+-            CONF.cinder_client.api_version,
+-            session=session,
+-            endpoint_type=CONF.cinder_client.endpoint_type)
+-
+-    def _swap_volume(self, volume, dest_type):
+-        """Swap volume to dest_type
+-
+-           Limitation note: only for compute libvirt driver
+-        """
+-        if not dest_type:
+-            raise exception.Invalid(
+-                message=(_("destination type is required when "
+-                           "migration type is swap")))
+-
+-        if not self._can_swap(volume):
+-            raise exception.Invalid(
+-                message=(_("Invalid state for swapping volume")))
+-
+-        user_info = {
+-            'name': self.temp_username,
+-            'password': self.temp_password}
+-        user = self._create_user(volume, user_info)
+-        keystone_util = keystone_helper.KeystoneHelper(osc=self.osc)
+-        try:
+-            session = keystone_util.create_session(
+-                user.id, self.temp_password)
+-            temp_cinder = self._get_cinder_client(session)
+-
+-            # swap volume
+-            new_volume = self.cinder_util.create_volume(
+-                temp_cinder, volume, dest_type)
+-            self.nova_util.swap_volume(volume, new_volume)
+-
+-            # delete old volume
+-            self.cinder_util.delete_volume(volume)
+-
+-        finally:
+-            keystone_util.delete_user(user)
+-
+-        return True
++        # since it has attachments we need to validate nova's constraints
++        instance_id = volume.attachments[0]['server_id']
++        instance_status = self.nova_util.find_instance(instance_id).status
++        LOG.debug(
++            f"volume: {volume.id} is attached to instance: {instance_id} "
++            f"in instance status: {instance_status}")
++        # NOTE(sean-k-mooney): This used to allow RESIZED, which
++        # is the resize_verify task state; that is not an acceptable
++        # time to migrate volumes, and if nova does not block this in
++        # the API today, that is probably a bug. PAUSED is also
++        # questionable, but should generally be safe.
++        return (volume.status == 'in-use' and
++                instance_status in ('ACTIVE', 'PAUSED'))
+ 
+     def _migrate(self, volume_id, dest_node, dest_type):
+-
+         try:
+             volume = self.cinder_util.get_volume(volume_id)
+-            if self.migration_type == self.SWAP:
+-                if dest_node:
+-                    LOG.warning("dest_node is ignored")
+-                return self._swap_volume(volume, dest_type)
++            # for backward compatibility map swap to migrate.
++            if self.migration_type in (self.SWAP, self.MIGRATE):
++                if not self._can_swap(volume):
++                    raise exception.Invalid(
++                        message=(_("Invalid state for swapping volume")))
++                return self.cinder_util.migrate(volume, dest_node)
+             elif self.migration_type == self.RETYPE:
+                 return self.cinder_util.retype(volume, dest_type)
+-            elif self.migration_type == self.MIGRATE:
+-                return self.cinder_util.migrate(volume, dest_node)
+             else:
+                 raise exception.Invalid(
+                     message=(_("Migration of type '%(migration_type)s' is not "
+diff --git a/watcher/common/keystone_helper.py b/watcher/common/keystone_helper.py
+index 53ff8e2..3733fd1 100644
+--- a/watcher/common/keystone_helper.py
++++ b/watcher/common/keystone_helper.py
+@@ -15,8 +15,6 @@
+ from oslo_log import log
+ 
+ from keystoneauth1.exceptions import http as ks_exceptions
+-from keystoneauth1 import loading
+-from keystoneauth1 import session
+ from watcher._i18n import _
+ from watcher.common import clients
+ from watcher.common import exception
+@@ -90,35 +88,3 @@
+                     message=(_("Domain name seems ambiguous: %s") %
+                              name_or_id))
+             return domains[0]
+-
+-    def create_session(self, user_id, password):
+-        user = self.get_user(user_id)
+-        loader = loading.get_plugin_loader('password')
+-        auth = loader.load_from_options(
+-            auth_url=CONF.watcher_clients_auth.auth_url,
+-            password=password,
+-            user_id=user_id,
+-            project_id=user.default_project_id)
+-        return session.Session(auth=auth)
+-
+-    def create_user(self, user):
+-        project = self.get_project(user['project'])
+-        domain = self.get_domain(user['domain'])
+-        _user = self.keystone.users.create(
+-            user['name'],
+-            password=user['password'],
+-            domain=domain,
+-            project=project,
+-        )
+-        for role in user['roles']:
+-            role = self.get_role(role)
+-            self.keystone.roles.grant(
+-                role.id, user=_user.id, project=project.id)
+-        return _user
+-
+-    def delete_user(self, user):
+-        try:
+-            user = self.get_user(user)
+-            self.keystone.users.delete(user)
+-        except exception.Invalid:
+-            pass
+diff --git a/watcher/common/utils.py b/watcher/common/utils.py
+index 5780e81..0d50761 100644
+--- a/watcher/common/utils.py
++++ b/watcher/common/utils.py
+@@ -19,9 +19,7 @@
+ import asyncio
+ import datetime
+ import inspect
+-import random
+ import re
+-import string
+ 
+ from croniter import croniter
+ import eventlet
+@@ -160,14 +158,10 @@
+ StrictDefaultValidatingDraft4Validator = extend_with_default(
+     extend_with_strict_schema(validators.Draft4Validator))
+ 
++
+ Draft4Validator = validators.Draft4Validator
+ 
+ 
+-def random_string(n):
+-    return ''.join([random.choice(
+-        string.ascii_letters + string.digits) for i in range(n)])
+-
+-
+ # Some clients (e.g. MAAS) use asyncio, which isn't compatible with Eventlet.
+ # As a workaround, we're delegating such calls to a native thread.
+ def async_compat_call(f, *args, **kwargs):
+diff --git a/watcher/decision_engine/strategy/strategies/zone_migration.py b/watcher/decision_engine/strategy/strategies/zone_migration.py
+index 16dd2f0..908f71e 100644
+--- a/watcher/decision_engine/strategy/strategies/zone_migration.py
++++ b/watcher/decision_engine/strategy/strategies/zone_migration.py
+@@ -57,8 +57,19 @@
+         self.planned_cold_count = 0
+         self.volume_count = 0
+         self.planned_volume_count = 0
+-        self.volume_update_count = 0
+-        self.planned_volume_update_count = 0
++
++    # TODO(sean-n-mooney) This is backward compatibility
++    # for calling the swap code paths. Swap is now an alias
++    # for migrate, we should clean this up in a future
++    # cycle.
++    @property
++    def volume_update_count(self):
++        return self.volume_count
++
++    # same as above clean up later.
++    @property
++    def planned_volume_update_count(self):
++        return self.planned_volume_count
+ 
+     @classmethod
+     def get_name(cls):
+@@ -312,8 +323,8 @@
+             planned_cold_migrate_instance_count=self.planned_cold_count,
+             volume_migrate_count=self.volume_count,
+             planned_volume_migrate_count=self.planned_volume_count,
+-            volume_update_count=self.volume_update_count,
+-            planned_volume_update_count=self.planned_volume_update_count
++            volume_update_count=self.volume_count,
++            planned_volume_update_count=self.planned_volume_count
+         )
+ 
+     def set_migration_count(self, targets):
+@@ -328,10 +339,7 @@
+             elif self.is_cold(instance):
+                 self.cold_count += 1
+         for volume in targets.get('volume', []):
+-            if self.is_available(volume):
+-                self.volume_count += 1
+-            elif self.is_in_use(volume):
+-                self.volume_update_count += 1
++            self.volume_count += 1
+ 
+     def is_live(self, instance):
+         status = getattr(instance, 'status')
+@@ -404,19 +412,16 @@
+             LOG.debug(src_type)
+             LOG.debug("%s %s", dst_pool, dst_type)
+ 
+-            if self.is_available(volume):
+-                if src_type == dst_type:
+-                    self._volume_migrate(volume, dst_pool)
+-                else:
+-                    self._volume_retype(volume, dst_type)
+-            elif self.is_in_use(volume):
+-                self._volume_update(volume, dst_type)
++            if src_type == dst_type:
++                self._volume_migrate(volume, dst_pool)
++            else:
++                self._volume_retype(volume, dst_type)
+ 
+-                # if with_attached_volume is True, migrate attaching instances
+-                if self.with_attached_volume:
+-                    instances = [self.nova.find_instance(dic.get('server_id'))
+-                                 for dic in volume.attachments]
+-                    self.instances_migration(instances, action_counter)
++            # if with_attached_volume is True, migrate attaching instances
++            if self.with_attached_volume:
++                instances = [self.nova.find_instance(dic.get('server_id'))
++                             for dic in volume.attachments]
++                self.instances_migration(instances, action_counter)
+ 
+             action_counter.add_pool(pool)
+ 
+@@ -464,16 +469,6 @@
+             input_parameters=parameters)
+         self.planned_cold_count += 1
+ 
+-    def _volume_update(self, volume, dst_type):
+-        parameters = {"migration_type": "swap",
+-                      "destination_type": dst_type,
+-                      "resource_name": volume.name}
+-        self.solution.add_action(
+-            action_type="volume_migrate",
+-            resource_id=volume.id,
+-            input_parameters=parameters)
+-        self.planned_volume_update_count += 1
+-
+     def _volume_migrate(self, volume, dst_pool):
+         parameters = {"migration_type": "migrate",
+                       "destination_node": dst_pool,
+diff --git a/watcher/tests/applier/actions/test_volume_migration.py b/watcher/tests/applier/actions/test_volume_migration.py
+index ae77e55..d97957c 100644
+--- a/watcher/tests/applier/actions/test_volume_migration.py
++++ b/watcher/tests/applier/actions/test_volume_migration.py
+@@ -22,7 +22,6 @@
+ from watcher.common import clients
+ from watcher.common import keystone_helper
+ from watcher.common import nova_helper
+-from watcher.common import utils as w_utils
+ from watcher.tests import base
+ 
+ 
+@@ -102,12 +101,15 @@
+ 
+     @staticmethod
+     def fake_volume(**kwargs):
++        # FIXME(sean-k-mooney): we should be using real objects in this
++        # test, or at least something more representative of the real data
+         volume = mock.MagicMock()
+         volume.id = kwargs.get('id', TestMigration.VOLUME_UUID)
+         volume.size = kwargs.get('size', '1')
+         volume.status = kwargs.get('status', 'available')
+         volume.snapshot_id = kwargs.get('snapshot_id', None)
+         volume.availability_zone = kwargs.get('availability_zone', 'nova')
++        volume.attachments = kwargs.get('attachments', [])
+         return volume
+ 
+     @staticmethod
+@@ -175,42 +177,14 @@
+             "storage1-typename",
+         )
+ 
+-    def test_swap_success(self):
+-        volume = self.fake_volume(
+-            status='in-use', attachments=[{'server_id': 'server_id'}])
+-        self.m_n_helper.find_instance.return_value = self.fake_instance()
+-
+-        new_volume = self.fake_volume(id=w_utils.generate_uuid())
+-        user = mock.Mock()
+-        session = mock.MagicMock()
+-        self.m_k_helper.create_user.return_value = user
+-        self.m_k_helper.create_session.return_value = session
+-        self.m_c_helper.get_volume.return_value = volume
+-        self.m_c_helper.create_volume.return_value = new_volume
+-
+-        result = self.action_swap.execute()
+-        self.assertTrue(result)
+-
+-        self.m_n_helper.swap_volume.assert_called_once_with(
+-            volume,
+-            new_volume
+-        )
+-        self.m_k_helper.delete_user.assert_called_once_with(user)
+-
+-    def test_swap_fail(self):
+-        # _can_swap fail
+-        instance = self.fake_instance(status='STOPPED')
+-        self.m_n_helper.find_instance.return_value = instance
+-
+-        result = self.action_swap.execute()
+-        self.assertFalse(result)
+-
+     def test_can_swap_success(self):
+         volume = self.fake_volume(
+-            status='in-use', attachments=[{'server_id': 'server_id'}])
+-        instance = self.fake_instance()
++            status='in-use', attachments=[
++                {'server_id': TestMigration.INSTANCE_UUID}])
+ 
++        instance = self.fake_instance()
+         self.m_n_helper.find_instance.return_value = instance
++
+         result = self.action_swap._can_swap(volume)
+         self.assertTrue(result)
+ 
+@@ -219,16 +193,33 @@
+         result = self.action_swap._can_swap(volume)
+         self.assertTrue(result)
+ 
+-        instance = self.fake_instance(status='RESIZED')
+-        self.m_n_helper.find_instance.return_value = instance
+-        result = self.action_swap._can_swap(volume)
+-        self.assertTrue(result)
+-
+     def test_can_swap_fail(self):
+ 
+         volume = self.fake_volume(
+-            status='in-use', attachments=[{'server_id': 'server_id'}])
++            status='in-use', attachments=[
++                {'server_id': TestMigration.INSTANCE_UUID}])
+         instance = self.fake_instance(status='STOPPED')
+         self.m_n_helper.find_instance.return_value = instance
+         result = self.action_swap._can_swap(volume)
+         self.assertFalse(result)
++
++        instance = self.fake_instance(status='RESIZED')
++        self.m_n_helper.find_instance.return_value = instance
++        result = self.action_swap._can_swap(volume)
++        self.assertFalse(result)
++
++    def test_swap_success(self):
++        volume = self.fake_volume(
++            status='in-use', attachments=[
++                {'server_id': TestMigration.INSTANCE_UUID}])
++        self.m_c_helper.get_volume.return_value = volume
++
++        instance = self.fake_instance()
++        self.m_n_helper.find_instance.return_value = instance
++
++        result = self.action_swap.execute()
++        self.assertTrue(result)
++        self.m_c_helper.migrate.assert_called_once_with(
++            volume,
++            "storage1-poolname"
++        )
+diff --git a/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py b/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py
+index 9bd4c39..0326451 100644
+--- a/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py
++++ b/watcher/tests/decision_engine/strategy/strategies/test_zone_migration.py
+@@ -320,7 +320,10 @@
+         migration_types = collections.Counter(
+             [action.get('input_parameters')['migration_type']
+              for action in solution.actions])
+-        self.assertEqual(1, migration_types.get("swap", 0))
++        # watcher no longer implements swap; it is now an
++        # alias for migrate.
++        self.assertEqual(0, migration_types.get("swap", 0))
++        self.assertEqual(1, migration_types.get("migrate", 1))
+         global_efficacy_value = solution.global_efficacy[3].get('value', 0)
+         self.assertEqual(100, global_efficacy_value)
+ 
diff -Nru watcher-14.0.0/debian/patches/series watcher-14.0.0/debian/patches/series
--- watcher-14.0.0/debian/patches/series	2025-07-11 14:45:24.000000000 +0200
+++ watcher-14.0.0/debian/patches/series	2025-08-21 10:27:37.000000000 +0200
@@ -1,3 +1,4 @@
 also-package-alembic-migration.patch
 remove-sphinxcontrib.rsvgconverter.patch
 removed-sphinxcontrib.httpdomain-from-sphinx-ext.patch
+OSSN-0094_use_cinder_migrate_for_swap_volume.patch
