Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation

Document Type : Research Article

Authors

1 M.Sc. in Computer Engineering, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

2 Professor of Artificial Intelligence, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

Abstract

Recent advances in Weakly Supervised Semantic Segmentation (WSSS) have highlighted the use of image-level class labels as a form of supervision. To compensate for the limited spatial information in class labels, many methods derive pseudo-labels from class activation maps (CAMs). However, CAMs generated by Convolutional Neural Networks (CNNs) tend to focus on the most discriminative features, making it difficult to separate foreground objects from their backgrounds. While recent studies show that features from Vision Transformers (ViTs) capture scene layout more effectively than those from CNNs, the use of hierarchical ViTs has not been widely explored in WSSS. This work introduces "SWTformer" and examines how the Swin Transformer's local-to-global view improves the accuracy of initial seed CAMs. SWTformer-V1 generates CAMs using only patch tokens as input features. SWTformer-V2 extends this process with a multi-scale feature fusion mechanism and a background-aware mechanism that refines the localization maps, yielding better differentiation between objects. Experiments on the Pascal VOC 2012 dataset demonstrate that, compared to state-of-the-art models, SWTformer-V1 achieves 0.98% higher localization mAP and produces initial localization maps that are 0.82% higher in mIoU while relying solely on the classification network. SWTformer-V2 further improves the accuracy of the seed CAMs by 5.32% mIoU. Code is available at: https://github.com/RozhanAhmadi/SWTformer
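
To illustrate the idea of producing CAMs directly from patch tokens, as SWTformer-V1 does, the following is a minimal sketch, not the authors' implementation: it assumes patch tokens from the last stage of a hierarchical backbone such as Swin are already available as a tensor, and the class names, shapes, and head design (a 1x1 convolution followed by global average pooling) are illustrative assumptions.

```python
# Hypothetical sketch: class activation maps from transformer patch tokens.
# Assumes tokens of shape (B, N, C) from the final stage of a Swin-like backbone.
import torch
import torch.nn as nn


class PatchTokenCAMHead(nn.Module):
    """A 1x1 conv over reshaped patch tokens yields per-class activation maps;
    global average pooling over those maps yields image-level logits."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1, bias=False)

    def forward(self, tokens: torch.Tensor, h: int, w: int):
        b, n, c = tokens.shape
        assert n == h * w, "token count must match the spatial grid"
        feats = tokens.transpose(1, 2).reshape(b, c, h, w)  # (B, C, H, W)
        cams = self.classifier(feats)                       # (B, K, H, W) raw CAMs
        logits = cams.flatten(2).mean(dim=2)                # GAP -> image-level logits
        return logits, torch.relu(cams)


# Example: a Swin-Tiny final stage (768 channels) on a 224x224 input gives a
# 7x7 grid of patch tokens; 20 classes corresponds to Pascal VOC foreground.
head = PatchTokenCAMHead(embed_dim=768, num_classes=20)
tokens = torch.randn(2, 49, 768)
logits, cams = head(tokens, h=7, w=7)
```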
