Posted 2024-06-26 | Updated 2024-06-26 | paper

Paper: BiFormer: Vision Transformer with Bi-Level Routing Attention
Authors: Lei Zhu, Xinjiang Wang, Zhanghan Ke, Wayne Zhang, Rynson Lau
Code: GitHub
Framework:

Transformer

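Advantages

long-range dependency
inductive-bias-free
high parallelism

Disadvantages

High computational cost
Large memory footprint
Existing approach: introduce sparsity
- Local windows
- Axial attention
- Dilated attention
Remaining problem
- Key/value pairs are selected without distinguishing between queries (the selection is query-agnostic)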

Bi-level Routing Attention (BRA)

Sparsity

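Exploits sparsity to save computation and memory, while involving only GPU-friendly dense matrix multiplications

Query-aware

For each query, selects the most semantically relevant key-value pairs

Pseudocode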

# input: features (H, W, C). Assume H == W.
# output: features (H, W, C).
# S: square root of the number of regions.
# k: number of regions to attend to.

# patchify input (H, W, C) -> (S^2, HW/S^2, C)
x = patchify(input, patch_size=H//S)

# linear projection of query, key, value
query, key, value = linear_qkv(x).chunk(3, dim=-1)

# regional query and key (S^2, C)
query_r, key_r = query.mean(dim=1), key.mean(dim=1)

# adjacency matrix for regional graph (S^2, S^2)
A_r = mm(query_r, key_r.transpose(-1, -2))

# compute index matrix of routed regions (S^2, k)
I_r = topk(A_r, k).index

# gather key-value pairs (S^2, kHW/S^2, C)
key_g = gather(key, I_r)
value_g = gather(value, I_r)

# token-to-token attention
A = bmm(query, key_g.transpose(-2, -1))
A = softmax(A, dim=-1)
output = bmm(A, value_g) + dwconv(value)

# recover to (H, W, C) shape
output = unpatchify(output, patch_size=H//S)
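
The pseudocode above can be turned into a small runnable sketch. The version below uses NumPy rather than the authors' framework, and it is an illustration under stated assumptions, not the reference implementation: the function name `bi_level_routing_attention` and the single projection matrix `w_qkv` are hypothetical, there is one attention head, and the `dwconv(value)` term is left out to keep the example short.

```python
import numpy as np

def bi_level_routing_attention(x, S, k, w_qkv):
    """Single-head BRA sketch. x: (H, W, C) with H == W divisible by S; w_qkv: (C, 3C)."""
    H, W, C = x.shape
    p = H // S  # side length of one region
    # patchify: (H, W, C) -> (S^2, p*p, C)
    regions = x.reshape(S, p, S, p, C).transpose(0, 2, 1, 3, 4).reshape(S * S, p * p, C)
    # linear projection of query, key, value
    q, key, v = np.split(regions @ w_qkv, 3, axis=-1)
    # region-level routing: mean-pooled queries and keys, (S^2, C)
    q_r, k_r = q.mean(axis=1), key.mean(axis=1)
    A_r = q_r @ k_r.T                         # regional adjacency, (S^2, S^2)
    I_r = np.argsort(-A_r, axis=-1)[:, :k]    # top-k routed regions, (S^2, k)
    # gather key-value pairs of the routed regions: (S^2, k*p*p, C)
    k_g = key[I_r].reshape(S * S, k * p * p, C)
    v_g = v[I_r].reshape(S * S, k * p * p, C)
    # token-to-token attention inside the routed regions
    logits = q @ k_g.transpose(0, 2, 1)       # (S^2, p*p, k*p*p)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(logits)
    A = A / A.sum(axis=-1, keepdims=True)
    out = A @ v_g                             # (S^2, p*p, C); dwconv term omitted
    # unpatchify back to (H, W, C)
    return out.reshape(S, S, p, p, C).transpose(0, 2, 1, 3, 4).reshape(H, W, C)
```

A useful sanity check for a sketch like this: with k equal to S^2 every region is routed to every other region, so the result should match ordinary full attention over all tokens.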

Posted 2024-06-21 | Updated 2024-06-21 | paper

Bi-VAEGAN

Paper: Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
Authors: Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He
Code: GitHub
Framework:

Zero-shot Learning (ZSL)

Hugo Larochelle, Dumitru Erhan, and Yoshua Bengio. Zero-data learning of new tasks. In AAAI, volume 1, page 3, 2008.

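Goal

Handle classes for which examples or labels are missing at training time
Conventional ZSL / Inductive ZSL
- Core challenge
  
  Given class relevance, enable the classifier to transfer the information extracted from seen classes to unseen classes
- Class relevance is usually provided as auxiliary data
- Auxiliary data can be manual annotations, text descriptions, knowledge graphs, or a formal description of knowledge (e.g., embedding vectors)
- Domain shift problem
  
  Learning from auxiliary data alone easily leads to a gap between the true distribution of the unseen classes and the modeled distribution
Proposed: Transductive ZSL (TZSL)
- Allows unlabeled examples collected for the target classes to be additionally used during training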
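
To make the role of auxiliary data concrete, here is a minimal sketch of attribute-based inference on unseen classes. It is a generic inductive ZSL baseline, not the method of this paper. The function `predict_unseen`, the projection matrix `W` (assumed to be learned on seen classes) and the cosine-similarity rule are all assumptions made for illustration.

```python
import numpy as np

def predict_unseen(features, W, unseen_attributes):
    """Assign each visual feature to the unseen class with the closest attribute vector.

    features:          (N, D) visual features
    W:                 (D, A) visual-to-attribute projection, assumed learned on seen classes
    unseen_attributes: (U, A) one attribute vector per unseen class (the auxiliary data)
    """
    projected = features @ W
    # cosine similarity between projected features and class attribute vectors
    p = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    a = unseen_attributes / np.linalg.norm(unseen_attributes, axis=1, keepdims=True)
    return (p @ a.T).argmax(axis=1)
```

Because `W` is fitted only on seen classes, projected features of unseen classes may land away from their attribute vectors, which is one way to picture the domain shift problem listed above.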

Generative Models

Role

Synthesize examples
Learn the unseen data distribution

Challenge

Transferring what is learned on seen classes to unseen classes

f-VAEGAN

Proposed Method

Transductive Regressor
Normalization
Class Prior Estimation (CPE)

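Architecture

VAE encoder, which produces a latent representation vector
Conditional generator, which is conditioned on class attributes and samples a latent vector from a normal distribution to generate visual features
Wasserstein GAN (WGAN) discriminator for seen classes
WGAN discriminator for unseen classes
Regressor mapping the visual space to the feature space
WGAN discriminator for feature discrimination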
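
The conditional generator in the list above can be pictured with a toy sketch: noise drawn from a normal distribution is concatenated with a class attribute vector and mapped to a visual feature. The single linear layer with ReLU, the function name `synthesize_features`, and the parameters `W`, `b` and `latent_dim` are placeholders chosen for illustration; the paper's generator is a trained network whose details are not reproduced here.

```python
import numpy as np

def synthesize_features(attributes, n_per_class, W, b, latent_dim, rng):
    """Toy conditional generator: ReLU(concat(attribute, z) @ W + b).

    attributes: (num_classes, A) class attribute vectors
    W:          (A + latent_dim, D) weights, b: (D,) bias (both hypothetical)
    Returns synthetic features (num_classes * n_per_class, D) and their class labels.
    """
    feats, labels = [], []
    for c, a in enumerate(attributes):
        z = rng.standard_normal((n_per_class, latent_dim))   # sample from N(0, I)
        inp = np.concatenate([np.tile(a, (n_per_class, 1)), z], axis=1)
        feats.append(np.maximum(inp @ W + b, 0.0))
        labels.append(np.full(n_per_class, c))
    return np.concatenate(feats), np.concatenate(labels)
```

Once such a generator is available for unseen-class attributes, the synthetic features can be used to train an ordinary supervised classifier for those classes.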

Workflow

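Level-1

Adversarial training of the regressor and the feature discriminator
Level-2

Adversarial training of the generator and the two visual discriminators (seen and unseen classes)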

Posted 2024-06-20 | Updated 2024-06-20 | paper

ANCL

Auxiliary Network Continual Learning (ANCL)

Paper: Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
Authors: Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann
Code: https://github.com/kim-sanghwan/ANCL
Framework:

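Continual Learning (CL)

Notation
- PT: Previous Task
- CT: Current Task
Definition

Keep learning on CT while retaining the information from PT
Challenge: Catastrophic Forgetting

Models trained by gradient updates tend to overwrite the parameters learned on PT while learning CT

In other words, the Stability-Plasticity Dilemma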

Martial Mermillod, Aurélia Bugaiska, and Patrick Bonin. The stability-plasticity dilemma: Investigating the continuum from catastrophic forgetting to age-limited learning effects, 2013.
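- Stability: generalizing well on PT
- Plasticity: learning new concepts on CT
Balancing stability and plasticity is therefore the central research question
Task Categories

See [1] for an overview of class-incremental learning and open-source toolkits
- Task Incremental Learning (TIL): the task identity is provided to the model in both training and testing
- Domain Incremental Learning (DIL): the task identity is not provided at test time
- Class Incremental Learning (CIL): at test time the model must infer the task identity itself and classify
Difficulty increases in this order; ANCL is evaluated in the TIL and CIL settings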
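
The practical difference between TIL and CIL at test time can be sketched as follows, assuming a single classifier head whose logits cover the classes of all tasks. The function `predict` and its `task_classes` argument are hypothetical names introduced for this example.

```python
import numpy as np

def predict(logits, task_classes=None):
    """logits: (N, num_classes) float scores over the classes of all tasks.

    task_classes: indices of the classes in the known task (TIL),
                  or None when no task identity is given (CIL).
    """
    if task_classes is None:
        # CIL: choose among every class seen so far
        return logits.argmax(axis=1)
    # TIL: restrict the choice to the classes of the given task
    masked = np.full_like(logits, -np.inf)
    masked[:, task_classes] = logits[:, task_classes]
    return masked.argmax(axis=1)
```

With the task identity the model only has to separate classes within one task, which is why TIL is the easier setting and why forgetting is usually more visible under CIL.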

Current Work

Formalizes CL with an auxiliary network as a framework, so that the auxiliary network becomes a plug-in

The regularization terms are balanced through two weighting coefficients (λ and λ_a in the paper)
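
As a rough picture of how two regularization weights can trade stability against plasticity, the sketch below adds two plain L2 penalties to the current-task loss: one pulls the weights toward the model trained on PT, the other toward an auxiliary network assumed to be trained only on CT. The L2 form and the names `ancl_style_loss`, `lam` and `lam_a` are assumptions for illustration; the actual regularizers depend on the base CL method the auxiliary network is plugged into.

```python
import numpy as np

def ancl_style_loss(task_loss, theta, theta_old, theta_aux, lam, lam_a):
    """Current-task loss plus two hypothetical L2 regularizers.

    theta:     current weights (flattened)
    theta_old: weights after training on PT (stability anchor)
    theta_aux: weights of the auxiliary network (plasticity anchor, assumed trained on CT only)
    lam, lam_a: weights of the two regularization terms
    """
    stability = np.sum((theta - theta_old) ** 2)
    plasticity = np.sum((theta - theta_aux) ** 2)
    return task_loss + lam * stability + lam_a * plasticity
```

Ignoring the task loss, the minimizer of the two penalties is the weighted average (lam * theta_old + lam_a * theta_aux) / (lam + lam_a), so the ratio of the two weights moves the solution between the old model and the auxiliary one.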

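Limitations

Different methods rely on different hyperparameters

References

[1] 类别增量学习(Class-Incremental Learning)的前世今生、开源工具包 (The Past and Present of Class-Incremental Learning, with Open-Source Toolkits)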
