Bi-VAEGAN

  • Paper: Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

  • Authors: Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He

  • Code: GitHub

  • Framework: Bi-VAEGAN

Zero-shot Learning (ZSL)

Hugo Larochelle, Dumitru Erhan, and Yoshua Bengio. Zero-data learning of new tasks. In AAAI, volume 1, page 3, 2008.

  • Goal

    Address the lack of examples or labels for some classes at training time

  • Conventional ZSL / Inductive ZSL

    • Core challenge

      Given some form of Class Relevance, enable the classifier to transfer information extracted from the Seen Classes to the Unseen Classes

    • Class Relevance is usually provided as Auxiliary Data

    • Auxiliary Data can be manual annotations, text descriptions, knowledge graphs, or a Formal Description of Knowledge (e.g., embedding vectors)

    • Domain Shift Problem

      Learning only from the Auxiliary Data easily leads to a gap between the true distribution of the Unseen Classes and the distribution modeled for them

  • Proposed: Transductive ZSL (TZSL)

    • Additionally allows unlabeled examples collected for the target classes to be used during training

Generative Models

Role

  • Synthesize examples
  • Learn the unseen data distribution

Categories

  • Unconditional Generation
  • Conditional Generation: here the auxiliary information is the more informative class label. Using the auxiliary information as the condition, the generator can learn the joint data-auxiliary distribution, which bridges the visual space and the auxiliary space and gives the generator the ability to transfer information (see the sketch below)
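As a minimal sketch of conditional generation (module names and sizes are illustrative, not from the paper), the generator simply concatenates a noise vector with the class attribute vector:

import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Noise z + class attribute a -> synthetic visual feature."""
    def __init__(self, z_dim, attr_dim, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + attr_dim, 4096),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(4096, feat_dim),
            nn.ReLU(inplace=True),  # CNN features (e.g. ResNet) are non-negative
        )

    def forward(self, z, attr):
        # conditioning = concatenate noise with the auxiliary attribute vector
        return self.net(torch.cat([z, attr], dim=1))

g = ConditionalGenerator(z_dim=64, attr_dim=85, feat_dim=2048)
z = torch.randn(32, 64)
attr = torch.rand(32, 85)  # attributes of an unseen class
fake_feat = g(z, attr)     # [32, 2048] synthetic features for that class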

Challenge

Transferring what is learned on the seen classes to the unseen classes

f-VAEGAN

Proposed methods

  1. Transductive Regressor
  2. Normalization
  3. Class Prior Estimation (CPE)

Architecture

  1. A VAE encoder that produces the latent representation vector
  2. A conditional generator that, conditioned on the class attributes, maps a vector sampled from a normal distribution to a visual feature
  3. A Wasserstein GAN (WGAN) discriminator for the seen classes
  4. A WGAN discriminator for the unseen classes
  5. A regressor that maps the visual space to the attribute (feature) space
  6. A WGAN discriminator for discriminating the regressed features (a minimal sketch of how the modules connect follows this list)
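A minimal sketch of how these modules fit together (single linear layers stand in for the real MLPs; dimensions are illustrative, see the official code for the actual implementation):

import torch
import torch.nn as nn

feat_dim, attr_dim, z_dim = 2048, 85, 64               # illustrative sizes

encoder   = nn.Linear(feat_dim + attr_dim, 2 * z_dim)  # 1. VAE encoder -> (mu, logvar)
generator = nn.Linear(z_dim + attr_dim, feat_dim)      # 2. conditional generator
D_seen    = nn.Linear(feat_dim + attr_dim, 1)          # 3. WGAN critic, seen classes (conditional)
D_unseen  = nn.Linear(feat_dim, 1)                     # 4. WGAN critic, unseen classes (unconditional)
regressor = nn.Linear(feat_dim, attr_dim)              # 5. visual -> attribute regressor
D_attr    = nn.Linear(attr_dim, 1)                     # 6. WGAN critic on regressed features

x, a = torch.randn(8, feat_dim), torch.rand(8, attr_dim)
mu, logvar = encoder(torch.cat([x, a], dim=1)).chunk(2, dim=1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
x_fake = generator(torch.cat([z, a], dim=1))           # synthetic visual feature
a_pred = regressor(x_fake)                             # back to the attribute space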

Workflow

  1. Level-1

    Adversarial training

  2. Level-2

    Adversarial training

PyTorch Source Code

__init__

Processing logic

  1. Detect the current runtime environment and load the required library files
  2. Define basic utilities: typename, is_tensor, ...
  3. Define numeric constants: e, inf, nan, pi
  4. Define the Storage and Tensor classes

The ctypes library

  • A package that lets Python call DLLs / shared libraries written in and exported from C or C++
  • ctypes.CDLL('vcruntime140.dll') loads vcruntime140.dll, a library written in C/C++ (example below)
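A minimal example of calling a C function through ctypes, assuming a Unix-like system (on Windows one would load a library such as msvcrt instead):

import ctypes
import ctypes.util

# locate and load the C standard library (platform-dependent)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# declare the C signature of int abs(int) before calling it
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # -> 42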

.pyi files

  • Python type-hint files, also called stub files
  • Provide static type information for code and can also describe a public interface
  • A .pyi file declares the static types of variables and functions; this is how the interfaces of Python's C/C++ bindings are exposed to type checkers (example below)
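A hypothetical stub file fastmath.pyi for an imagined C-implemented module fastmath, showing what such a file contains (declarations only, no implementations):

# fastmath.pyi -- stub file: static types only
from typing import overload

PI: float

def clamp(x: float, lo: float, hi: float) -> float: ...

@overload
def norm(v: list[float]) -> float: ...
@overload
def norm(v: tuple[float, ...]) -> float: ...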

References

[1] Pytorch 底层源码解读(一)概览

ANCL

Auxiliary Network Continual Learning (ANCL)

  • Paper: Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

  • Authors: Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann

  • Code: https://github.com/kim-sanghwan/ANCL

  • Framework:

Continual Learning (CL)

  • Notation

    • PT: Previous Task
    • CT: Current Task
  • Meaning

    Continue learning on the CT while retaining the information learned on the PT

  • Difficulty: Catastrophic Forgetting

    A model trained by gradient updates tends, while learning the CT, to overwrite the parameters learned for the PT

    In other words, the Stability-Plasticity Dilemma

    Martial Mermillod, Aurélia Bugaiska, and Patrick Bonin. The stability-plasticity dilemma: Investigating the continuum from catastrophic forgetting to age-limited learning effects, 2013.

    • Stability: generalize well on the PT
    • Plasticity: learn new concepts on the CT

    How to balance Stability and Plasticity is therefore the central question of this line of research

  • Task settings

    See [1]: 类别增量学习(Class-Incremental Learning)的前世今生、开源工具包

    • Task Incremental Learning (TIL): the task identity is given to the model during both training and testing
    • Domain Incremental Learning (DIL): the task identity is not given at test time
    • Class Incremental Learning (CIL): at test time the model must infer the task identity on its own and classify accordingly

    The settings become progressively harder in this order; ANCL is evaluated in the TIL and CIL settings

Related work

Adding an Auxiliary Network or an Extra Module

Active Forgetting with synaptic Expansion-Convergence (AFEC): a hyperparameter controls how the old and new parameters are fused

This work

Frames CL with an Auxiliary Network so that the Auxiliary Network becomes a pluggable component,

achieved by adjusting the regularization term (a sketch follows)
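A sketch of the idea, assuming an EWC-style quadratic penalty (the names lam_old, lam_aux and the plain squared-distance regularizer are illustrative, not the paper's exact formulation): the usual term pulls the weights toward the old network for stability, and the added term pulls them toward an auxiliary network trained only on the CT for plasticity.

import torch

def ancl_style_loss(task_loss, params, old_params, aux_params, lam_old, lam_aux):
    """task loss + stability pull toward theta_old + plasticity pull toward theta_aux."""
    reg_old = sum(((p - po) ** 2).sum() for p, po in zip(params, old_params))
    reg_aux = sum(((p - pa) ** 2).sum() for p, pa in zip(params, aux_params))
    return task_loss + lam_old * reg_old + lam_aux * reg_aux

The balance between the two hyperparameters is exactly the Stability-Plasticity trade-off the paper studies.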

Limitations

  • Different methods rely on different hyperparameters

References

[1] 类别增量学习(Class-Incremental Learning)的前世今生、开源工具包

Code of Pixel-to-Prototype Contrast

Generate CAMs

  • Feature map
  • Class feature map
  • Score of class
  • CAMs (written out below)
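Written out (a standard CAM formulation; the concrete formulas were elided above, so the symbols here are generic): with feature map $f(x) \in \mathbb{R}^{C \times H \times W}$ and classifier weight $w_c$ (the 1x1 convolution fc8 in the code below),

$$ s_c = \frac{1}{HW} \sum_{i,j} w_c^{\top} f_{ij}(x), \qquad \mathrm{CAM}_c(i,j) = \mathrm{ReLU}\big(w_c^{\top} f_{ij}(x)\big) $$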

Pixel-to-Prototype Contrast

  • Pseudo mask
  • Pixel-wise projected feature
  • Pixel-to-prototype contrast
    • Prototype set
    • Temperature
    • Contrast: the similarity between pixel features and prototypes (loss written out below)
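The contrast is an InfoNCE-style loss between a pixel embedding $f_i$ and the class prototypes (a standard formulation consistent with the implementation shown later; $c(i)$ is the pseudo label of pixel $i$ and $\tau$ the temperature):

$$ \mathcal{L}_{\mathrm{pc}}(f_i) = -\log \frac{\exp\!\big(f_i \cdot p_{c(i)} / \tau\big)}{\sum_{c' \in \mathcal{C}} \exp\!\big(f_i \cdot p_{c'} / \tau\big)} $$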

Prototype Estimation in Batch

  • Top K pixels of class c
    • CAM as confidences
    • Estimate prototypes from pixel-wise feature embeddings that are with the top K confidences
  • Prototype (estimation formula below)
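In formula form (matching the top-K weighted average computed in the code below):

$$ p_c = \operatorname{normalize}\!\left( \frac{\sum_{i \in \mathrm{TopK}_c} \mathrm{CAM}_c(i)\, f_i}{\sum_{i \in \mathrm{TopK}_c} \mathrm{CAM}_c(i)} \right) $$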

Loss

  • Cross Prototype Contrast

  • Cross CAM Contrast

  • Intra-view Contrast

    • Strategy to solve the problem of inaccurate pseudo labels [50] (a sketch of hard pixel sampling follows this list)
      • Semi-hard prototype mining
      • Hard pixel sampling
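A hedged sketch of hard pixel sampling (my reading of the strategy, not the reference implementation): rank the pixels assigned to a class by their per-pixel contrastive loss and keep only the hardest fraction.

import torch

def hard_pixel_sample(per_pixel_loss, pseudo_label, cls, ratio=0.5):
    """Keep the hardest `ratio` of the pixels assigned to class `cls`."""
    idx = (pseudo_label == cls).nonzero(as_tuple=True)[0]
    if idx.numel() == 0:
        return idx
    k = max(1, int(ratio * idx.numel()))
    hard = torch.topk(per_pixel_loss[idx], k).indices  # largest loss = hardest
    return idx[hard]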

Code

Normalization

L1 Normalization

  • Effect
    • Ensures all elements sum to 1
    • Turns a vector into a probability distribution (example below)
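For example (assuming non-negative entries, as with CAM scores):

import torch

v = torch.rand(4, 21)
v = v / (v.sum(dim=1, keepdim=True) + 1e-5)  # each row now sums to ~1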

L2 Normalization

# L2-normalize along the channel dimension
v = v / (torch.norm(v, dim=1, keepdim=True) + 1e-5)
# or
v = torch.nn.functional.normalize(v, dim=1)
  • Effect
    • Direction invariance: the direction of the vector is unchanged while its length becomes 1, so the representation no longer depends on magnitude
    • Numerical stability: keeps the vector's magnitude within a relatively small range
    • Reduces differences in feature scale
    • Convenient for similarity measurement

Max Normalization

  • After normalization the maximum of the vector is 1 (example below)
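For example (per channel, mirroring the cam_d_max pattern used later):

import torch

v = torch.rand(4, 21, 256)
v = v / (v.amax(dim=-1, keepdim=True) + 1e-5)  # per-channel maximum becomes ~1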

Max-Min Normalization

  • After normalization the values of the vector lie in [0, 1] (example below)
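For example:

import torch

v = torch.rand(4, 21, 256)
vmax = v.amax(dim=-1, keepdim=True)
vmin = v.amin(dim=-1, keepdim=True)
v = (v - vmin) / (vmax - vmin + 1e-5)  # values now lie in [0, 1]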

Forward

  • cam

    # fea is the feature map output by the last layer
    self.fc8 = nn.Conv2d(4096, 21, 1, bias=False)
    cam = self.fc8(fea)
    cam = torch.nn.functional.interpolate(cam, (H, W), mode='bilinear', align_corners=True)
  • cam_rv_down

    • Clean the CAM

      with torch.no_grad():
          cam_d = torch.nn.functional.relu(cam.detach())
          # max normalization
          cam_d_max = torch.max(cam_d.view(n, c, -1), dim=-1)[0].view(n, c, 1, 1) + 1e-5
          cam_d_norm = torch.nn.functional.relu(cam_d - 1e-5) / cam_d_max
          # keep, per pixel, only the foreground class with the highest probability;
          # its complement becomes the background probability, other classes are zeroed
          cam_d_norm[:, 0, :, :] = 1 - torch.max(cam_d_norm[:, 1:, :, :], dim=1)[0]
          cam_max = torch.max(cam_d_norm[:, 1:, :, :], dim=1, keepdim=True)[0]
          cam_d_norm[:, 1:, :, :][cam_d_norm[:, 1:, :, :] < cam_max] = 0

    • Refine the CAM

      # Refine the CAM according to pixel-level similarity
      cam_rv_down = self.PCM(cam_d_norm, f)

      # PCM
      def PCM(self, cam, f):
          n, c, h, w = f.size()
          cam = torch.nn.functional.interpolate(cam, (h, w), mode='bilinear', align_corners=True).view(n, -1, h * w)
          # multi-scale feature fusion
          f = self.f9(f)
          f = f.view(n, -1, h * w)
          # L2-normalize the features along the channel dimension
          f = f / (torch.norm(f, dim=1, keepdim=True) + 1e-5)
          # pixel-to-pixel similarity (affinity) matrix
          aff = torch.nn.functional.relu(torch.matmul(f.transpose(1, 2), f), inplace=True)
          # L1-normalize the affinity matrix
          aff = aff / (torch.sum(aff, dim=1, keepdim=True) + 1e-5)
          # reweight the CAM with the affinity matrix
          cam_rv = torch.matmul(cam, aff).view(n, -1, h, w)

          return cam_rv
  • cam_rv

    cam_rv = torch.nn.functional.interpolate(cam_rv_down, (H,W), mode='bilinear', align_corners=True)
  • f_proj

    self.fc_proj = torch.nn.Conv2d(4096, 128, 1, bias=False)
    f_proj = torch.nn.functional.relu(self.fc_proj(fea), inplace=True)
  • prototype

    f_proj1 = torch.nn.functional.interpolate(f_proj1, size=(128 // 8, 128 // 8), mode='bilinear', align_corners=True)
    cam_rv1_down = torch.nn.functional.interpolate(cam_rv1_down, size=(128 // 8, 128 // 8), mode='bilinear', align_corners=True)
    cam_rv2_down = cam_rv2_down

    with torch.no_grad():
        fea1 = f_proj1.detach()
        c_fea1 = fea1.shape[1]
        cam_rv1_down = torch.nn.functional.relu(cam_rv1_down.detach())
        # max-min normalization of the CAM
        n1, c1, h1, w1 = cam_rv1_down.shape
        max1 = torch.max(cam_rv1_down.view(n1, c1, -1), dim=-1)[0].view(n1, c1, 1, 1)
        min1 = torch.min(cam_rv1_down.view(n1, c1, -1), dim=-1)[0].view(n1, c1, 1, 1)
        cam_rv1_down[cam_rv1_down < min1 + 1e-5] = 0.
        norm_cam1 = (cam_rv1_down - min1 - 1e-5) / (max1 - min1 + 1e-5)
        cam_rv1_down = norm_cam1
        # set the background threshold
        cam_rv1_down[:, 0, :, :] = args.bg_threshold
        # keep only the classes present in the image-level label
        scores1 = torch.nn.functional.softmax(cam_rv1_down * label, dim=1)

        # compute the pseudo labels
        pseudo_label1 = scores1.argmax(dim=1, keepdim=True)
        n_sc1, c_sc1, h_sc1, w_sc1 = scores1.shape
        scores1 = scores1.transpose(0, 1)
        fea1 = fea1.permute(0, 2, 3, 1).reshape(-1, c_fea1)

        # per class, take the values and indices of the top-K CAM responses
        top_values, top_indices = torch.topk(cam_rv1_down.transpose(0, 1).reshape(c_sc1, -1), k=h_sc1 * w_sc1 // 8, dim=-1)
        prototypes1 = torch.zeros(c_sc1, c_fea1).cuda()  # [21, 128]
        # loop over the classes
        for i in range(c_sc1):
            # features of the top-K pixels for class i
            top_fea = fea1[top_indices[i]]
            # CAM-weighted average of the K features gives the class prototype
            prototypes1[i] = torch.sum(top_values[i].unsqueeze(-1) * top_fea, dim=0) / torch.sum(top_values[i])
        # L2-normalize each prototype
        prototypes1 = torch.nn.functional.normalize(prototypes1, dim=-1)
  • prototype similarity

    n_f, c_f, h_f, w_f = f_proj1.shape
    # [N, H, W, C] -> [N x H x W, C]
    f_proj1 = f_proj1.permute(0, 2, 3, 1).reshape(n_f * h_f * w_f, c_f)
    # L2-normalize the features
    f_proj1 = torch.nn.functional.normalize(f_proj1, dim=-1)
    pseudo_label1 = pseudo_label1.reshape(-1)
    positives1 = prototypes2[pseudo_label1]
    negitives1 = prototypes2

    # for target
    n_f, c_f, h_f, w_f = f_proj2.shape
    f_proj2 = f_proj2.permute(0, 2, 3, 1).reshape(n_f * h_f * w_f, c_f)
    f_proj2 = torch.nn.functional.normalize(f_proj2, dim=-1)
    pseudo_label2 = pseudo_label2.reshape(-1)
    positives2 = prototypes1[pseudo_label2]
    negitives2 = prototypes1
    A1 = torch.exp(torch.sum(f_proj1 * positives1, dim=-1) / 0.1)
    A2 = torch.sum(torch.exp(torch.matmul(f_proj1, negitives1.transpose(0, 1)) / 0.1), dim=-1)
    loss_nce1 = torch.mean(-1 * torch.log(A1 / A2))

    A3 = torch.exp(torch.sum(f_proj2 * positives2, dim=-1) / 0.1)
    A4 = torch.sum(torch.exp(torch.matmul(f_proj2, negitives2.transpose(0, 1)) / 0.1), dim=-1)
    loss_nce2 = torch.mean(-1 * torch.log(A3 / A4))

    loss_cross_nce = 0.1 * (loss_nce1 + loss_nce2) / 2
