OpenStack Source Code Study Notes 5

I ran into a bizarre problem today: after evacuating a faulty compute node, the root disks of some of the VMs had simply... disappeared?! One thing is certain: Ceph does not delete images on its own, so something must have triggered a root-disk deletion.

If this happened in production it would be an extremely serious problem, so while investigating I took the opportunity to walk through nova's host evacuation flow.

The code below is from the Newton release, but the overall flow should not differ much across releases.

Following the usual routine (see the earlier posts in this series if you are not familiar with it), the evacuation entry point is the _evacuate function in api/openstack/compute/evacuate.py. The important part there is determining whether shared storage is used; it then calls the evacuate function in compute/api.py:

def evacuate(self, context, instance, host, on_shared_storage,
                admin_password=None, force=None):
    ......
    migration = objects.Migration(context,
                                    source_compute=instance.host,
                                    source_node=instance.node,
                                    instance_uuid=instance.uuid,
                                    status='accepted',
                                    migration_type='evacuation')
    if host:
        migration.dest_compute = host
    migration.create()
    ......
    return self.compute_task_api.rebuild_instance(context,
                    instance=instance,
                    new_pass=admin_password,
                    injected_files=None,
                    image_ref=None,
                    orig_image_ref=None,
                    orig_sys_metadata=None,
                    bdms=None,
                    recreate=True,
                    on_shared_storage=on_shared_storage,
                    host=host,
                    request_spec=request_spec,
                    )

This function first creates a Migration record with status 'accepted' and migration_type 'evacuation', then checks whether a destination host was specified (in our usage scenario this is always None). Also note the recreate parameter, which is True when called for an evacuation.
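These evacuation Migration records matter again later in this post, because init_host filters on exactly these fields. As a side note, here is a minimal sketch of how such records could be inspected with the same objects.MigrationList call that appears further down; it assumes it runs on a node with nova installed and configured, and the host name is hypothetical:

from nova import context as nova_context
from nova import objects

objects.register_all()
ctx = nova_context.get_admin_context()

filters = {
    'source_compute': 'compute-01',          # hypothetical failed host name
    'migration_type': 'evacuation',
    'status': ['accepted', 'done'],
}
for mig in objects.MigrationList.get_by_filters(ctx, filters):
    print(mig.instance_uuid, mig.status, mig.dest_compute)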

After several layers of wrapping, we end up in the _do_rebuild_instance function in compute/manager.py, which updates host state, network, and disk related data, builds the arguments, and then enters _rebuild_default_impl:

def _rebuild_default_impl(self, context, instance, image_meta,
                            injected_files, admin_password, bdms,
                            detach_block_devices, attach_block_devices,
                            network_info=None,
                            recreate=False, block_device_info=None,
                            preserve_ephemeral=False):
    if preserve_ephemeral:
        # The default code path does not support preserving ephemeral
        # partitions.
        raise exception.PreserveEphemeralNotSupported()

    if recreate:
        detach_block_devices(context, bdms)
    else:
        self._power_off_instance(context, instance, clean_shutdown=True)
        detach_block_devices(context, bdms)
        self.driver.destroy(context, instance,
                            network_info=network_info,
                            block_device_info=block_device_info)

    instance.task_state = task_states.REBUILD_BLOCK_DEVICE_MAPPING
    instance.save(expected_task_state=[task_states.REBUILDING])

    new_block_device_info = attach_block_devices(context, instance, bdms)

    instance.task_state = task_states.REBUILD_SPAWNING
    instance.save(
        expected_task_state=[task_states.REBUILD_BLOCK_DEVICE_MAPPING])

    with instance.mutated_migration_context():
        self.driver.spawn(context, instance, image_meta, injected_files,
                            admin_password, network_info=network_info,
                            block_device_info=new_block_device_info)

Under normal circumstances, once this completes control returns to _do_rebuild_instance and the evacuation flow is finished.

Since the recreate value passed in is True, there seems to be no way this could trigger a disk deletion. Still, the driver.destroy call in the else branch looks suspicious, so let's follow it into virt/libvirt/driver.py:

def destroy(self, context, instance, network_info, block_device_info=None,
            destroy_disks=True, migrate_data=None):
    self._destroy(instance)
    self.cleanup(context, instance, network_info, block_device_info,
                 destroy_disks, migrate_data)

The _destroy function mainly tears down the guest itself; read it if you are interested. The code that actually deletes the disks lives in the cleanup function:

def cleanup(self, context, instance, network_info, block_device_info=None,
            destroy_disks=True, migrate_data=None, destroy_vifs=True):
    ......      
        try:
            self._disconnect_volume(connection_info, disk_dev)
        except Exception as exc:
            with excutils.save_and_reraise_exception() as ctxt:
                if destroy_disks:
                    # Don't block on Volume errors if we're trying to
                    # delete the instance as we may be partially created
                    # or deleted
                    ctxt.reraise = False
                    LOG.warning(
                        _LW("Ignoring Volume Error on vol %(vol_id)s "
                            "during delete %(exc)s"),
                        {'vol_id': vol.get('volume_id'), 'exc': exc},
                        instance=instance)
    if destroy_disks:
        # NOTE(haomai): destroy volumes if needed
        if CONF.libvirt.images_type == 'lvm':
            self._cleanup_lvm(instance, block_device_info)
        if CONF.libvirt.images_type == 'rbd':
            self._cleanup_rbd(instance)
    ......
def _cleanup_rbd(self, instance):
    # NOTE(nic): On revert_resize, the cleanup steps for the root
    # volume are handled with an "rbd snap rollback" command,
    # and none of this is needed (and is, in fact, harmful) so
    # filter out non-ephemerals from the list
    if instance.task_state == task_states.RESIZE_REVERTING:
        filter_fn = lambda disk: (disk.startswith(instance.uuid) and
                                    disk.endswith('disk.local'))
    else:
        filter_fn = lambda disk: disk.startswith(instance.uuid)
    LibvirtDriver._get_rbd_driver().cleanup_volumes(filter_fn)

# Roy's note: the function below lives in virt/libvirt/storage/rbd_utils.py
def cleanup_volumes(self, filter_fn):
    with RADOSClient(self, self.pool) as client:
        volumes = RbdProxy().list(client.ioctx)
        for volume in filter(filter_fn, volumes):
            self._destroy_volume(client, volume)
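To make the effect of the two filters concrete, here is a standalone illustration (not nova code), assuming nova's usual RBD image naming where the root disk is "<uuid>_disk" and the ephemeral disk is "<uuid>_disk.local":

uuid = '9b3c1d2e-0000-4000-8000-000000000000'   # hypothetical instance uuid
volumes = [uuid + '_disk', uuid + '_disk.local', 'another-uuid_disk']

evacuate_filter = lambda disk: disk.startswith(uuid)
revert_filter = lambda disk: (disk.startswith(uuid) and
                              disk.endswith('disk.local'))

print(list(filter(evacuate_filter, volumes)))   # both images match: the root disk goes too
print(list(filter(revert_filter, volumes)))     # only the disk.local image; the root disk is spared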

At this point we have pinned down the code that can delete the root disk. Now let's think about it the other way around: if the evacuation itself does not delete the disks, then something else must be calling this function. After some digging, there are four call sites:

  1. evacuate
  2. revert_resize
  3. shelve_offloading
  4. init_host

Call sites 1, 2, and 3 can all be ruled out, so let's see what happens in init_host:

def init_host(self):
    """Initialization for a standalone compute service."""
    ......
    try:
        # checking that instance was not already evacuated to other host
        self._destroy_evacuated_instances(context)
        for instance in instances:
            self._init_instance(context, instance)
    finally:
        if CONF.defer_iptables_apply:
            self.driver.filter_defer_apply_off()
        self._update_scheduler_instance_info(context, instances)


def _destroy_evacuated_instances(self, context):
    """Destroys evacuated instances.

    While nova-compute was down, the instances running on it could be
    evacuated to another host. Check that the instances reported
    by the driver are still associated with this host.  If they are
    not, destroy them, with the exception of instances which are in
    the MIGRATING, RESIZE_MIGRATING, RESIZE_MIGRATED, RESIZE_FINISH
    task state or RESIZED vm state.
    """
    filters = {
        'source_compute': self.host,
        'status': ['accepted', 'done'],
        'migration_type': 'evacuation',
    }
    evacuations = objects.MigrationList.get_by_filters(context, filters)
    if not evacuations:
        return
    evacuations = {mig.instance_uuid: mig for mig in evacuations}

    filters = {'deleted': False}
    local_instances = self._get_instances_on_driver(context, filters)
    evacuated = [inst for inst in local_instances
                    if inst.uuid in evacuations]
    for instance in evacuated:
        migration = evacuations[instance.uuid]
        LOG.info(_LI('Deleting instance as it has been evacuated from '
                        'this host'), instance=instance)
        try:
            network_info = self.network_api.get_instance_nw_info(
                context, instance)
            bdi = self._get_instance_block_device_info(context,
                                                        instance)
            destroy_disks = not (self._is_instance_storage_shared(
                context, instance))
        except exception.InstanceNotFound:
            network_info = network_model.NetworkInfo()
            bdi = {}
            LOG.info(_LI('Instance has been marked deleted already, '
                            'removing it from the hypervisor.'),
                        instance=instance)
            # always destroy disks if the instance was deleted
            destroy_disks = True
        self.driver.destroy(context, instance,
                            network_info,
                            bdi, destroy_disks)
        migration.status = 'completed'
        migration.save()

Time for a bold hypothesis: because this compute node was faulty, a colleague evacuated it and then rebooted it. When the nova-compute service starts up it calls init_host, which cleans up any local instances whose evacuation migrations are in the accepted or done status. Normally this flow is fine, but in certain special cases, if the evacuation fails and the node reboot then triggers an InstanceNotFound exception, that instance's root disk gets deleted.
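To restate the decision path as a sketch (this only condenses the code quoted above, it is not nova source): init_host ends up deleting the disks whenever the instance lookup blows up, no matter whether the storage is shared:

def should_destroy_disks(storage_is_shared, instance_lookup_failed):
    # InstanceNotFound (or anything wrapped into it) forces disk
    # destruction, even when the disks live on shared storage such as
    # Ceph RBD.
    if instance_lookup_failed:
        return True
    return not storage_is_shared

# Normal post-evacuation cleanup on an RBD-backed node: disks survive.
assert should_destroy_disks(storage_is_shared=True,
                            instance_lookup_failed=False) is False
# The suspected failure mode here: the lookup raised, disks get deleted.
assert should_destroy_disks(storage_is_shared=True,
                            instance_lookup_failed=True) is True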

Digging into what the try block calls: because of a decorator, any exception raised by get_instance_nw_info gets wrapped into InstanceNotFound, yet the function itself merely queries the database. My guess is that the faulty node hit a database timeout or a similar exception at that moment, but since none of the logs from that time window were written successfully (including the system logs), it is unfortunately impossible to reconstruct exactly what happened.
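To illustrate the wrapping pattern described above, here is a generic sketch of the idea (not nova's actual decorator): any exception from the wrapped call surfaces as InstanceNotFound, so a transient database failure looks exactly like a deleted instance to the caller.

import functools


class InstanceNotFound(Exception):
    """Stand-in for nova's exception.InstanceNotFound."""


def wrap_any_error_as_not_found(fn):
    # Hypothetical decorator showing the behaviour described above.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            raise InstanceNotFound()
    return wrapper


@wrap_any_error_as_not_found
def get_instance_nw_info(context, instance):
    raise TimeoutError('database unreachable')   # simulated DB timeout


try:
    get_instance_nw_info(None, None)
except InstanceNotFound:
    print('DB timeout surfaced as InstanceNotFound -> destroy_disks=True')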

At least the scenario can be reproduced in a test environment by following this line of reasoning.
