Diabetic Retinopathy Detection from Retinal Images Using the Pyramid Vision Transformer Method

Document Type: Research Article

Authors

1 Master of Science, Faculty of Electrical Engineering, Sahand University of Technology, Tabriz, Iran

2 Computer Vision Res. Lab., Faculty of Electrical Engineering, Sahand University of Technology, Tabriz, Iran

Abstract

Automated diagnostic tools are essential for analyzing medical data efficiently, especially for conditions such as diabetic retinopathy, a leading cause of vision impairment and blindness in adults. The APTOS 2019 blindness detection dataset, with its comprehensive collection of retinal images, is a key resource for developing such tools. This study applies the Pyramid Vision Transformer (PVT) to improve the accuracy and efficiency of diabetic retinopathy detection. Unlike the Vision Transformer (ViT), whose single-scale structure incurs high computational cost and yields low-resolution outputs, PVT’s pyramid architecture produces multi-scale feature representations efficiently. This makes large feature maps manageable and preserves higher-resolution features, both of which are essential for precise image-based diagnosis. In extensive experiments, the PVT-based approach outperforms traditional CNN methods in detection and classification accuracy while using resources more efficiently, achieving 92.38% accuracy and an AUC of 99.58%. These results suggest that it could be a valuable tool for clinical applications. Future research will focus on optimizing the model and exploring clinical integration.
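
The contrast between a single-scale and a pyramid backbone can be illustrated with a few lines of arithmetic. The sketch below is not the authors' implementation. It assumes a 224x224 input, the stage strides 4, 8, 16 and 32 commonly used for PVT, and a ViT patch size of 16, none of which are stated in the abstract. The function name feature_map_sizes is hypothetical.

```python
# Illustrative only: feature-map sizes for a pyramid vs. a single-scale backbone.
# Assumed (not stated in the abstract): 224x224 input, PVT stage strides
# 4/8/16/32, and a ViT patch size of 16.

def feature_map_sizes(image_size, strides):
    """Return a (stride, side, tokens) tuple for each stage of a backbone."""
    stages = []
    for stride in strides:
        side = image_size // stride          # spatial side length of the feature map
        stages.append((stride, side, side * side))
    return stages

IMAGE_SIZE = 224                 # assumed input resolution
PVT_STRIDES = (4, 8, 16, 32)     # four pyramid stages, coarser at each step
VIT_STRIDES = (16,)              # one scale, fixed by the patch size

pvt_stages = feature_map_sizes(IMAGE_SIZE, PVT_STRIDES)
vit_stages = feature_map_sizes(IMAGE_SIZE, VIT_STRIDES)

for name, stages in (("PVT", pvt_stages), ("ViT", vit_stages)):
    for stride, side, tokens in stages:
        print(f"{name}: stride {stride:>2} -> {side}x{side} map, {tokens} tokens")
```

Under these assumptions the finest PVT stage keeps a 56x56 map, 16 times as many spatial positions as the single 14x14 map of ViT, while the later stages shrink the map so that deeper layers remain inexpensive.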
