Bi-VAEGAN

  • Paper: Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

  • Authors: Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He

  • Code: GitHub

  • Framework: Bi-VAEGAN

Zero-shot Learning (ZSL)

Hugo Larochelle, Dumitru Erhan, and Yoshua Bengio. Zero-data learning of new tasks. In AAAI, volume 1, page 3, 2008.

  • Goal

    Address the lack of examples or labels for some classes at training time

  • Conventional ZSL / Inductive ZSL

    • Core challenge

      Given some form of Class Relevance, enable the classifier to transfer information extracted from the Seen Classes to the Unseen Classes

    • Class Relevance is usually provided as Auxiliary Data

    • Auxiliary Data can be manual annotations, text descriptions, knowledge graphs, or a Formal Description of Knowledge (e.g., embedding vectors)

    • Domain Shift Problem

      Learning only from the Auxiliary Data easily leads to a gap between the true distribution of the Unseen Classes and the distribution modeled for them

  • Proposed: Transductive ZSL (TZSL)

    • Additionally allows unlabeled examples collected for the target classes to be used during training

Generative Models

Role

  • Synthesize examples
  • Learn the unseen data distribution

Categories

  • Unconditional Generation
  • Conditional Generation: here the auxiliary information is the more informative class label. Using the auxiliary information as the condition, the generator can learn the joint data-auxiliary distribution, which bridges the visual space and the auxiliary space and gives the generator the ability to transfer information (see the sketch below)
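As a minimal sketch of conditional generation (module names and sizes are illustrative, not from the paper), the generator simply concatenates a noise vector with the class attribute vector:

import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Noise z + class attribute a -> synthetic visual feature."""
    def __init__(self, z_dim, attr_dim, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + attr_dim, 4096),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(4096, feat_dim),
            nn.ReLU(inplace=True),  # CNN features (e.g. ResNet) are non-negative
        )

    def forward(self, z, attr):
        # conditioning = concatenate noise with the auxiliary attribute vector
        return self.net(torch.cat([z, attr], dim=1))

g = ConditionalGenerator(z_dim=64, attr_dim=85, feat_dim=2048)
z = torch.randn(32, 64)
attr = torch.rand(32, 85)  # attributes of an unseen class
fake_feat = g(z, attr)     # [32, 2048] synthetic features for that class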

Challenge

Transferring what is learned on the seen classes to the unseen classes

f-VAEGAN

Proposed methods

  1. Transductive Regressor
  2. Normalization
  3. Class Prior Estimation (CPE)

Architecture

  1. A VAE encoder that produces the latent representation vector
  2. A conditional generator that, conditioned on the class attributes, maps a vector sampled from a normal distribution to a visual feature
  3. A Wasserstein GAN (WGAN) discriminator for the seen classes
  4. A WGAN discriminator for the unseen classes
  5. A regressor that maps the visual space to the attribute (feature) space
  6. A WGAN discriminator for discriminating the regressed features (a minimal sketch of how the modules connect follows this list)
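A minimal sketch of how these modules fit together (single linear layers stand in for the real MLPs; dimensions are illustrative, see the official code for the actual implementation):

import torch
import torch.nn as nn

feat_dim, attr_dim, z_dim = 2048, 85, 64               # illustrative sizes

encoder   = nn.Linear(feat_dim + attr_dim, 2 * z_dim)  # 1. VAE encoder -> (mu, logvar)
generator = nn.Linear(z_dim + attr_dim, feat_dim)      # 2. conditional generator
D_seen    = nn.Linear(feat_dim + attr_dim, 1)          # 3. WGAN critic, seen classes (conditional)
D_unseen  = nn.Linear(feat_dim, 1)                     # 4. WGAN critic, unseen classes (unconditional)
regressor = nn.Linear(feat_dim, attr_dim)              # 5. visual -> attribute regressor
D_attr    = nn.Linear(attr_dim, 1)                     # 6. WGAN critic on regressed features

x, a = torch.randn(8, feat_dim), torch.rand(8, attr_dim)
mu, logvar = encoder(torch.cat([x, a], dim=1)).chunk(2, dim=1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
x_fake = generator(torch.cat([z, a], dim=1))           # synthetic visual feature
a_pred = regressor(x_fake)                             # back to the attribute space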

Workflow

  1. Level-1

    Adversarial training

  2. Level-2

    Adversarial training

PyTorch Source Code

__init__

Processing logic

  1. Detect the current runtime environment and load the required library files
  2. Define basic utilities: typename, is_tensor, ...
  3. Define numeric constants: e, inf, nan, pi
  4. Define the Storage and Tensor classes

The ctypes library

  • A package that lets Python call DLLs / shared libraries written in and exported from C or C++
  • ctypes.CDLL('vcruntime140.dll') loads vcruntime140.dll, a library written in C/C++ (example below)
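A minimal example of calling a C function through ctypes, assuming a Unix-like system (on Windows one would load a library such as msvcrt instead):

import ctypes
import ctypes.util

# locate and load the C standard library (platform-dependent)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# declare the C signature of int abs(int) before calling it
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # -> 42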

.pyi files

  • Python type-hint files, also called stub files
  • Provide static type information for code and can also describe a public interface
  • A .pyi file declares the static types of variables and functions; this is how the interfaces of Python's C/C++ bindings are exposed to type checkers (example below)
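A hypothetical stub file fastmath.pyi for an imagined C-implemented module fastmath, showing what such a file contains (declarations only, no implementations):

# fastmath.pyi -- stub file: static types only
from typing import overload

PI: float

def clamp(x: float, lo: float, hi: float) -> float: ...

@overload
def norm(v: list[float]) -> float: ...
@overload
def norm(v: tuple[float, ...]) -> float: ...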

References

[1] Pytorch 底层源码解读(一)概览

ANCL

Auxiliary Network Continual Learning (ANCL)

  • Paper: Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

  • Authors: Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann

  • Code: https://github.com/kim-sanghwan/ANCL

  • Framework:

Continual Learning (CL)

  • Notation

    • PT: Previous Task
    • CT: Current Task
  • Meaning

    Continue learning on the CT while retaining the information learned on the PT

  • Difficulty: Catastrophic Forgetting

    A model trained by gradient updates tends, while learning the CT, to overwrite the parameters learned for the PT

    In other words, the Stability-Plasticity Dilemma

    Martial Mermillod, Aurélia Bugaiska, and Patrick Bonin. The stability-plasticity dilemma: Investigating the continuum from catastrophic forgetting to age-limited learning effects, 2013.

    • Stability: generalize well on the PT
    • Plasticity: learn new concepts on the CT

    How to balance Stability and Plasticity is therefore the central question of this line of research

  • Task settings

    See [1]: 类别增量学习(Class-Incremental Learning)的前世今生、开源工具包

    • Task Incremental Learning (TIL): the task identity is given to the model during both training and testing
    • Domain Incremental Learning (DIL): the task identity is not given at test time
    • Class Incremental Learning (CIL): at test time the model must infer the task identity on its own and classify accordingly

    The settings become progressively harder in this order; ANCL is evaluated in the TIL and CIL settings

Related work

Adding an Auxiliary Network or an Extra Module

Active Forgetting with synaptic Expansion-Convergence (AFEC): a hyperparameter controls how the old and new parameters are fused

This work

Frames CL with an Auxiliary Network so that the Auxiliary Network becomes a pluggable component,

achieved by adjusting the regularization term (a sketch follows)
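A sketch of the idea, assuming an EWC-style quadratic penalty (the names lam_old, lam_aux and the plain squared-distance regularizer are illustrative, not the paper's exact formulation): the usual term pulls the weights toward the old network for stability, and the added term pulls them toward an auxiliary network trained only on the CT for plasticity.

import torch

def ancl_style_loss(task_loss, params, old_params, aux_params, lam_old, lam_aux):
    """task loss + stability pull toward theta_old + plasticity pull toward theta_aux."""
    reg_old = sum(((p - po) ** 2).sum() for p, po in zip(params, old_params))
    reg_aux = sum(((p - pa) ** 2).sum() for p, pa in zip(params, aux_params))
    return task_loss + lam_old * reg_old + lam_aux * reg_aux

The balance between the two hyperparameters is exactly the Stability-Plasticity trade-off the paper studies.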

Limitations

  • Different methods rely on different hyperparameters

References

[1] 类别增量学习(Class-Incremental Learning)的前世今生、开源工具包

Code of Pixel-to-Prototype Contrast

Generate CAMs

  • Feature map
  • Class feature map
  • Score of class
  • CAMs (written out below)
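Written out (a standard CAM formulation; the concrete formulas were elided above, so the symbols here are generic): with feature map $f(x) \in \mathbb{R}^{C \times H \times W}$ and classifier weight $w_c$ (the 1x1 convolution fc8 in the code below),

$$ s_c = \frac{1}{HW} \sum_{i,j} w_c^{\top} f_{ij}(x), \qquad \mathrm{CAM}_c(i,j) = \mathrm{ReLU}\big(w_c^{\top} f_{ij}(x)\big) $$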

Pixel-to-Prototype Contrast

  • Pseudo mask
  • Pixel-wise projected feature
  • Pixel-to-prototype contrast
    • Prototype set
    • Temperature
    • Contrast: the similarity between pixel features and prototypes (loss written out below)
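The contrast is an InfoNCE-style loss between a pixel embedding $f_i$ and the class prototypes (a standard formulation consistent with the implementation shown later; $c(i)$ is the pseudo label of pixel $i$ and $\tau$ the temperature):

$$ \mathcal{L}_{\mathrm{pc}}(f_i) = -\log \frac{\exp\!\big(f_i \cdot p_{c(i)} / \tau\big)}{\sum_{c' \in \mathcal{C}} \exp\!\big(f_i \cdot p_{c'} / \tau\big)} $$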

Prototype Estimation in Batch

  • Top K pixels of class c
    • CAM as confidences
    • Estimate prototypes from pixel-wise feature embeddings that are with the top K confidences
  • Prototype (estimation formula below)
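In formula form (matching the top-K weighted average computed in the code below):

$$ p_c = \operatorname{normalize}\!\left( \frac{\sum_{i \in \mathrm{TopK}_c} \mathrm{CAM}_c(i)\, f_i}{\sum_{i \in \mathrm{TopK}_c} \mathrm{CAM}_c(i)} \right) $$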

Loss

  • Cross Prototype Contrast

  • Cross CAM Contrast

  • Intra-view Contrast

    • Strategy to solve the problem of inaccurate pseudo labels [50] (a sketch of hard pixel sampling follows this list)
      • Semi-hard prototype mining
      • Hard pixel sampling
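A hedged sketch of hard pixel sampling (my reading of the strategy, not the reference implementation): rank the pixels assigned to a class by their per-pixel contrastive loss and keep only the hardest fraction.

import torch

def hard_pixel_sample(per_pixel_loss, pseudo_label, cls, ratio=0.5):
    """Keep the hardest `ratio` of the pixels assigned to class `cls`."""
    idx = (pseudo_label == cls).nonzero(as_tuple=True)[0]
    if idx.numel() == 0:
        return idx
    k = max(1, int(ratio * idx.numel()))
    hard = torch.topk(per_pixel_loss[idx], k).indices  # largest loss = hardest
    return idx[hard]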

Code

Normalization

L1 Normalization

  • Effect
    • Ensures all elements sum to 1
    • Turns a vector into a probability distribution (example below)
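For example (assuming non-negative entries, as with CAM scores):

import torch

v = torch.rand(4, 21)
v = v / (v.sum(dim=1, keepdim=True) + 1e-5)  # each row now sums to ~1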

L2 Normalization

# L2-normalize along the channel dimension
v = v / (torch.norm(v, dim=1, keepdim=True) + 1e-5)
# or
v = torch.nn.functional.normalize(v, dim=1)
  • Effect
    • Direction invariance: the direction of the vector is unchanged while its length becomes 1, so the representation no longer depends on magnitude
    • Numerical stability: keeps the vector's magnitude within a relatively small range
    • Reduces differences in feature scale
    • Convenient for similarity measurement

Max Normalization

  • After normalization the maximum of the vector is 1 (example below)
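For example (per channel, mirroring the cam_d_max pattern used later):

import torch

v = torch.rand(4, 21, 256)
v = v / (v.amax(dim=-1, keepdim=True) + 1e-5)  # per-channel maximum becomes ~1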

Max-Min Normalization

  • After normalization the values of the vector lie in [0, 1] (example below)
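For example:

import torch

v = torch.rand(4, 21, 256)
vmax = v.amax(dim=-1, keepdim=True)
vmin = v.amin(dim=-1, keepdim=True)
v = (v - vmin) / (vmax - vmin + 1e-5)  # values now lie in [0, 1]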

Forward

  • cam

    # fea is the feature map output by the last layer
    self.fc8 = nn.Conv2d(4096, 21, 1, bias=False)
    cam = self.fc8(fea)
    cam = torch.nn.functional.interpolate(cam, (H, W), mode='bilinear', align_corners=True)
  • cam_rv_down

    • Clean the CAM

      with torch.no_grad():
          cam_d = torch.nn.functional.relu(cam.detach())
          # max normalization
          cam_d_max = torch.max(cam_d.view(n, c, -1), dim=-1)[0].view(n, c, 1, 1) + 1e-5
          cam_d_norm = torch.nn.functional.relu(cam_d - 1e-5) / cam_d_max
          # keep, per pixel, only the foreground class with the highest probability;
          # its complement becomes the background probability, other classes are zeroed
          cam_d_norm[:, 0, :, :] = 1 - torch.max(cam_d_norm[:, 1:, :, :], dim=1)[0]
          cam_max = torch.max(cam_d_norm[:, 1:, :, :], dim=1, keepdim=True)[0]
          cam_d_norm[:, 1:, :, :][cam_d_norm[:, 1:, :, :] < cam_max] = 0

    • Refine the CAM

      # Refine the CAM according to pixel-level similarity
      cam_rv_down = self.PCM(cam_d_norm, f)

      # PCM
      def PCM(self, cam, f):
          n, c, h, w = f.size()
          cam = torch.nn.functional.interpolate(cam, (h, w), mode='bilinear', align_corners=True).view(n, -1, h * w)
          # multi-scale feature fusion
          f = self.f9(f)
          f = f.view(n, -1, h * w)
          # L2-normalize the features along the channel dimension
          f = f / (torch.norm(f, dim=1, keepdim=True) + 1e-5)
          # pixel-to-pixel similarity (affinity) matrix
          aff = torch.nn.functional.relu(torch.matmul(f.transpose(1, 2), f), inplace=True)
          # L1-normalize the affinity matrix
          aff = aff / (torch.sum(aff, dim=1, keepdim=True) + 1e-5)
          # reweight the CAM with the affinity matrix
          cam_rv = torch.matmul(cam, aff).view(n, -1, h, w)

          return cam_rv
  • cam_rv

    cam_rv = torch.nn.functional.interpolate(cam_rv_down, (H,W), mode='bilinear', align_corners=True)
  • f_proj

    self.fc_proj = torch.nn.Conv2d(4096, 128, 1, bias=False)
    f_proj = torch.nn.functional.relu(self.fc_proj(fea), inplace=True)
  • prototype

    f_proj1 = torch.nn.functional.interpolate(f_proj1, size=(128 // 8, 128 // 8), mode='bilinear', align_corners=True)
    cam_rv1_down = torch.nn.functional.interpolate(cam_rv1_down, size=(128 // 8, 128 // 8), mode='bilinear', align_corners=True)
    cam_rv2_down = cam_rv2_down

    with torch.no_grad():
        fea1 = f_proj1.detach()
        c_fea1 = fea1.shape[1]
        cam_rv1_down = torch.nn.functional.relu(cam_rv1_down.detach())
        # max-min normalization of the CAM
        n1, c1, h1, w1 = cam_rv1_down.shape
        max1 = torch.max(cam_rv1_down.view(n1, c1, -1), dim=-1)[0].view(n1, c1, 1, 1)
        min1 = torch.min(cam_rv1_down.view(n1, c1, -1), dim=-1)[0].view(n1, c1, 1, 1)
        cam_rv1_down[cam_rv1_down < min1 + 1e-5] = 0.
        norm_cam1 = (cam_rv1_down - min1 - 1e-5) / (max1 - min1 + 1e-5)
        cam_rv1_down = norm_cam1
        # set the background threshold
        cam_rv1_down[:, 0, :, :] = args.bg_threshold
        # keep only the classes present in the image-level label
        scores1 = torch.nn.functional.softmax(cam_rv1_down * label, dim=1)

        # compute the pseudo labels
        pseudo_label1 = scores1.argmax(dim=1, keepdim=True)
        n_sc1, c_sc1, h_sc1, w_sc1 = scores1.shape
        scores1 = scores1.transpose(0, 1)
        fea1 = fea1.permute(0, 2, 3, 1).reshape(-1, c_fea1)

        # per class, take the values and indices of the top-K CAM responses
        top_values, top_indices = torch.topk(cam_rv1_down.transpose(0, 1).reshape(c_sc1, -1), k=h_sc1 * w_sc1 // 8, dim=-1)
        prototypes1 = torch.zeros(c_sc1, c_fea1).cuda()  # [21, 128]
        # loop over the classes
        for i in range(c_sc1):
            # features of the top-K pixels for class i
            top_fea = fea1[top_indices[i]]
            # CAM-weighted average of the K features gives the class prototype
            prototypes1[i] = torch.sum(top_values[i].unsqueeze(-1) * top_fea, dim=0) / torch.sum(top_values[i])
        # L2-normalize each prototype
        prototypes1 = torch.nn.functional.normalize(prototypes1, dim=-1)
  • prototype similarity

    n_f, c_f, h_f, w_f = f_proj1.shape
    # [N, H, W, C] -> [N x H x W, C]
    f_proj1 = f_proj1.permute(0, 2, 3, 1).reshape(n_f * h_f * w_f, c_f)
    # L2-normalize the features
    f_proj1 = torch.nn.functional.normalize(f_proj1, dim=-1)
    pseudo_label1 = pseudo_label1.reshape(-1)
    positives1 = prototypes2[pseudo_label1]
    negitives1 = prototypes2

    # for target
    n_f, c_f, h_f, w_f = f_proj2.shape
    f_proj2 = f_proj2.permute(0, 2, 3, 1).reshape(n_f * h_f * w_f, c_f)
    f_proj2 = torch.nn.functional.normalize(f_proj2, dim=-1)
    pseudo_label2 = pseudo_label2.reshape(-1)
    positives2 = prototypes1[pseudo_label2]
    negitives2 = prototypes1
    A1 = torch.exp(torch.sum(f_proj1 * positives1, dim=-1) / 0.1)
    A2 = torch.sum(torch.exp(torch.matmul(f_proj1, negitives1.transpose(0, 1)) / 0.1), dim=-1)
    loss_nce1 = torch.mean(-1 * torch.log(A1 / A2))

    A3 = torch.exp(torch.sum(f_proj2 * positives2, dim=-1) / 0.1)
    A4 = torch.sum(torch.exp(torch.matmul(f_proj2, negitives2.transpose(0, 1)) / 0.1), dim=-1)
    loss_nce2 = torch.mean(-1 * torch.log(A3 / A4))

    loss_cross_nce = 0.1 * (loss_nce1 + loss_nce2) / 2
