Fashion Retrieval via Graph Reasoning Networks on a Similarity Pyramid

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7019-7034. doi: 10.1109/TPAMI.2020.3025062. Epub 2023 May 5.

Abstract

Matching clothing images between customers and online shopping stores has rich applications in e-commerce. Existing algorithms mostly encode an image as a global feature vector and perform retrieval via global representation matching. However, distinctive local information on clothing is submerged in this global representation, resulting in suboptimal performance. To address this issue, we propose a novel graph reasoning network (GRNet) on a similarity pyramid, which learns similarities between a query and a gallery clothing item by using both initial pairwise multi-scale feature representations and matching propagation for unaligned representations. The query's local representations at each scale are aligned with those of the gallery via an adaptive window pooling module. The similarity pyramid is represented by a similarity graph, where nodes represent similarities between clothing components at different scales, and the final matching score is obtained by message propagation along edges. In GRNet, graph reasoning is solved by training a graph convolutional network, enabling the alignment of salient clothing components to improve clothing retrieval. To facilitate future research, we introduce a new benchmark, i.e. FindFashion, containing rich annotations of bounding boxes, views, occlusions, and cropping. Extensive experiments show that GRNet obtains new state-of-the-art results on three challenging benchmarks, e.g. pushing the top-1, top-20, and top-50 accuracies on DeepFashion to 27, 66, and 75 percent (i.e. 6, 12, and 10 percent absolute improvements), outperforming competitors by large margins. On FindFashion, GRNet achieves considerable improvements in all empirical settings.
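The core reasoning step described above (local similarities as graph nodes, a matching score read out after message propagation along edges) can be illustrated with a minimal GCN-style sketch. This is not the authors' implementation: the node features, adjacency structure, weights, and two-layer readout below are all hypothetical stand-ins for the learned similarity-pyramid graph.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: symmetrically normalized neighbor
    aggregation, then a linear map and ReLU (Kipf-Welling style)."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(0)
num_nodes, dim = 6, 4           # e.g. 6 pyramid cells (illustrative size)

# Each node would hold a local query-gallery similarity at some scale;
# here we use random features in place of the learned representations.
H = rng.random((num_nodes, dim))

# Random symmetric adjacency standing in for the similarity-graph edges.
A = (rng.random((num_nodes, num_nodes)) > 0.5).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0.0)

W1 = rng.random((dim, dim))     # hypothetical learned weights
W2 = rng.random((dim, 1))

H_out = gcn_layer(H, A, W1)                 # message propagation
score = float(gcn_layer(H_out, A, W2).mean())  # pooled matching score
print(score)
```

In the actual GRNet, the graph weights are trained end-to-end so that propagation aligns salient clothing components before the final score is produced; the sketch only shows the mechanics of normalized message passing and score readout.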