Convolutional neural networks (CNNs) have been applied widely in various fields. However, their use is hindered by their lack of interpretability: users cannot tell why a CNN-based model produces a particular recognition result, which is also a vulnerability from a security perspective. To alleviate this problem, this study first analyzes in detail the three existing feature visualization methods for CNNs and presents a unified visualization framework for interpreting CNN recognition results, in which the class activation weight (CAW) is identified as the most important factor. The different types of CAWs are then analyzed further, and it is concluded that a linear correlation exists between them. On this basis, a spatial-channel attention-based class activation mapping (SCA-CAM) method is proposed. This method uses the different types of CAWs as attention weights and combines spatial and channel attention to generate class activation maps, enabling richer features to be used when interpreting CNN results. Experiments on four different networks verify the linear correlation between the different CAWs and show that, compared with existing methods, SCA-CAM effectively improves the visual quality of the class activation maps while offering greater flexibility with respect to network structure.
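The abstract does not give the exact SCA-CAM formulation, but the general idea of combining per-channel class activation weights with a spatial attention map can be illustrated with a minimal sketch. The snippet below is a generic Grad-CAM-style stand-in, not the paper's method: the function name, the gradient-based channel weights, and the sigmoid-normalized spatial attention are all assumptions made for illustration.

```python
# Hypothetical sketch: combining channel-wise class activation weights (CAWs)
# with a spatial attention map to form a class activation heat map.
# This is NOT the paper's SCA-CAM definition, only an assumed analogue.
import torch
import torch.nn.functional as F

def cam_with_spatial_channel_attention(feature_maps, gradients):
    """feature_maps, gradients: tensors of shape (C, H, W) taken from the
    target convolutional layer for one input image and one target class."""
    # Channel attention: global-average-pooled gradients serve as per-channel
    # class activation weights (the CAW role described in the abstract).
    channel_weights = gradients.mean(dim=(1, 2))              # shape (C,)

    # Spatial attention: aggregate the feature response over channels and
    # squash it to (0, 1) so informative locations are emphasized
    # (assumed form, not the paper's exact definition).
    spatial = feature_maps.sum(dim=0)                          # shape (H, W)
    spatial_attention = torch.sigmoid(spatial - spatial.mean())

    # Combine: weight each channel, sum over channels, apply spatial
    # attention, and keep only positively contributing regions.
    cam = (channel_weights[:, None, None] * feature_maps).sum(dim=0)
    cam = F.relu(cam * spatial_attention)

    # Normalize to [0, 1] for display as a heat map overlay.
    cam -= cam.min()
    cam /= cam.max().clamp(min=1e-8)
    return cam
```

In practice, `feature_maps` and `gradients` would be captured with forward and backward hooks on the chosen convolutional layer; the paper's contribution lies in how the different CAW types are chosen and combined, which the sketch does not reproduce.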