Attention / Conv Medley
Self-Attention and Convolution
Natural Gradient Descent
Fisher Information Matrix