Abstract: The presence of noise in acquired data invariably leads to performance degradation in cross-modal matching. Unfortunately, obtaining precise annotations in the multimodal field is expensive, ...