Fine-Grained Classification via Class-Attentive Contrastive Representation Learning
Main Article Content
Abstract
Fine-grained classification faces the fundamental challenge of learning representations that can discern subtle differences among closely related classes while maintaining intra-class compactness. We propose a novel class-attentive contrastive learning framework that integrates a class-attentive embedding module with a modified contrastive loss to emphasize class-discriminative features. Under mild assumptions, we derive a theoretical bound linking the proposed loss minimization to the reduction of classification error, providing theoretical guarantees on representation quality. Experiments on a toy dataset demonstrate improved classification accuracy and superior embedding separation compared to baseline contrastive learning methods.
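The abstract does not specify the embedding module or the exact form of the modified loss, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical gate that attends over learnable class prototypes (the names `class_attentive_embed` and `prototypes` are invented here), and it uses a standard supervised contrastive loss as a stand-in for the paper's modified loss.

```python
import numpy as np

def class_attentive_embed(x, prototypes):
    """Hypothetical class-attentive gate (an assumption, not the paper's module).

    x: (n, d) features; prototypes: (C, d) class prototype vectors.
    Each sample attends over the prototypes, and the attended context
    gates its feature dimensions before L2 normalization.
    """
    logits = x @ prototypes.T                        # (n, C) similarity to each class
    logits = logits - logits.max(axis=1, keepdims=True)
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=1, keepdims=True)    # softmax over classes
    context = attn @ prototypes                      # (n, d) attended class context
    gate = 1.0 / (1.0 + np.exp(-context))            # sigmoid gate in (0, 1)
    z = x * gate
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalized embeddings z (n >= 2).

    Used here as a stand-in; the paper's modified loss is not given in the abstract.
    """
    n = z.shape[0]
    sim = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)
    # Log-sum-exp over all other samples, excluding self-similarity.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom
    # Positives are other samples that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0                            # skip anchors with no positive
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 4))                      # toy features
    labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
    prototypes = rng.normal(size=(4, 4))             # one hypothetical prototype per class
    z = class_attentive_embed(x, prototypes)
    print(supervised_contrastive_loss(z, labels))
```

In this sketch the loss is small when same-class embeddings are closer to each other than to other classes, which is the intra-class compactness and inter-class separation the abstract describes. In practice the prototypes and the encoder producing `x` would be trained jointly, which is omitted here.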
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/