An Adaptive Learning Method for Solving the Extreme Learning Rate Problem of Transformer
Lecture Notes in Computer Science (2023)
doi: 10.1007/978-3-031-44693-1_29  issn: 0302-9743, 1611-3349

Jianbang Ding, Xuancheng Ren, Ruixuan Luo