MS2A: Memory Storage-to-Adaptation for Cross-domain Few-annotation Object Detection

Our online Demo.

Abstract

Cross-domain few-annotation object detection (CFOD) aims to transfer a detector trained on a source domain to a target domain with minimal annotations, a critical challenge in industrial applications. This paper proposes a novel memory storage-to-adaptation (MS2A) mechanism to address this challenge. Unlike prior methods that focus on data imbalance and feature misalignment using only the limited labeled data, MS2A mines prior knowledge from abundant unlabeled data, which is more representative of the entire target domain. Specifically, MS2A comprises memory storage and memory adaptation modules: the former captures the prior knowledge and stores it as memory information, while the latter efficiently integrates the memory into feature alignment, yielding more discriminative features and significant improvement. We have collected a more challenging dataset of industrial scenes and evaluated MS2A on both the constructed and public datasets. MS2A achieves new state-of-the-art results, outperforming previous methods by 10.4% on the industrial dataset under the 10-annotation setting.

Method


Overall pipeline. The proposed MS2A consists of two core parts. In the memory storage module, we first use the source data to train a base detector (we use YOLOX in practice and call this process prior learning), then use the base detector to extract prior knowledge from both the source data and the unlabeled target data. The prior knowledge is refined through clustering and momentum updates and stored as memory. In the memory adaptation module, we introduce an efficient adaptation module that uses the memory to adaptively align the source data with the target data.
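The clustering-and-momentum refinement above can be sketched as follows. This is a minimal illustration, not the released implementation: `build_memory` runs a few plain k-means iterations over detector features to produce the memory slots, and `momentum_update` refreshes them with an exponential moving average as new features arrive; the function names, the iteration count, and the momentum value are all illustrative assumptions.

```python
import numpy as np

def build_memory(features, k, seed=0, iters=10):
    """Cluster prior-knowledge features into k memory slots (plain k-means).
    features: (N, D) array of base-detector features from source + unlabeled
    target data. Returns (k, D) cluster centers used as the initial memory."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # assign each feature to its nearest center
        dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        # recompute centers; keep the old center if a cluster is empty
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(0)
    return centers

def momentum_update(memory, new_centers, m=0.99):
    """EMA-style refresh of the stored memory with freshly clustered centers."""
    return m * memory + (1.0 - m) * new_centers
```

In this sketch the memory is a small (k, D) matrix, so storing and updating it is cheap compared with keeping the raw unlabeled features.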

Detail of efficient adaptation module.



After obtaining the memory, the memory adaptation module transfers it into feature alignment for both domains adaptively. To this end, we propose a novel efficient adaptation module that associates the memory with the multi-scale features of both the source and target domains.

The efficient adaptation module integrates the memory feature into the multi-scale features from the neck of the detection network. We partition these features into windows and conduct window attention, which keeps the computation efficient.
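The window-attention integration can be sketched as below. This is a hedged simplification of the idea rather than the actual module: window-partitioned neck features act as queries and the memory slots as keys/values, with no learned projections and a single head (the function name, window size, and scaling are assumptions for illustration).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_memory_attention(feat, memory, win=4):
    """Cross-attention between window-partitioned features and the memory.
    feat: (H, W, D) neck feature map; memory: (K, D) memory slots.
    Each window attends only to the K memory slots, so the cost is
    O(H * W * K) instead of O((H * W)^2) for global self-attention."""
    H, W, D = feat.shape
    out = np.empty_like(feat)
    for i in range(0, H, win):
        for j in range(0, W, win):
            block = feat[i:i + win, j:j + win]
            h, w, _ = block.shape
            q = block.reshape(-1, D)                     # window queries
            attn = softmax(q @ memory.T / np.sqrt(D))    # (h*w, K) weights
            out[i:i + win, j:j + win] = (attn @ memory).reshape(h, w, D)
    return out
```

Because each window only attends to a small, fixed number of memory slots, the attention cost grows linearly with the feature-map size, which is what makes the adaptation efficient at multiple scales.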

Detection Results

Other Results

BibTeX

Our code is coming soon!