transformer

[Paper Review] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)

2025.06.16

https://arxiv.org/abs/2010.11929

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to rep..

Introduction

Self-attention..

