Zero-shot learning for image recognition identifies unseen classes by learning representations that transfer from seen to unseen classes, from either the visual or the semantic perspective. Recently, several works have focused on learning the transferability of both the visual and semantic modalities, conducting adaptation between the two modalities to achieve more reliable transfer. However, their performance depends on the transferability being consistent across both modalities: when the transferability of one modality is insufficient, the adaptation yields sparse, indistinguishable representations of unseen classes. To this end, we propose a zero-shot method with visual-semantic mutual reinforcement, in which visual transferability and semantic transferability reinforce each other, so that the transferable representations of the two modalities complement one another and become more discriminative. Specifically, in visual transferability learning, local semantics are used to reinforce the key region representations, enriching the key regions and excluding ambiguous regions from the image representation. In semantic transferability learning, visual class prototypes are used to reinforce the semantic representations, increasing their class discriminability. Finally, the image representations and the attribute class prototypes of the unseen classes are combined to recognize unseen samples. Experimental results on multiple datasets show that our method outperforms the state of the art in both zero-shot and generalized zero-shot recognition.
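To make the pipeline concrete, the following is a minimal, hypothetical PyTorch sketch of the three stages the abstract describes: attribute-guided weighting of region features (visual reinforcement), prototype-guided refinement of the semantic representations (semantic reinforcement), and cosine-similarity recognition against attribute class prototypes. All module names, dimensions, and the batch-mean stand-in for visual class prototypes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualReinforcementSketch(nn.Module):
    """Toy sketch of visual-semantic mutual reinforcement (not the paper's code).

    Assumed inputs:
      regions:    (B, N, d_visual) local region features, e.g. a 7x7 CNN grid
      attributes: (K, d_attr)      per-class attribute vectors (semantic prototypes)
    """

    def __init__(self, d_visual=2048, d_attr=312, d_common=512):
        super().__init__()
        self.vis_proj = nn.Linear(d_visual, d_common)  # regions -> common space
        self.att_proj = nn.Linear(d_attr, d_common)    # attributes -> common space

    def forward(self, regions, attributes):
        v = F.normalize(self.vis_proj(regions), dim=-1)     # (B, N, d)
        s = F.normalize(self.att_proj(attributes), dim=-1)  # (K, d)

        # Visual reinforcement: score each region against every local semantic;
        # regions aligning strongly with some attribute are up-weighted,
        # ambiguous regions are suppressed.
        region_attr_sim = torch.einsum('bnd,kd->bnk', v, s)                  # (B, N, K)
        region_weight = region_attr_sim.max(dim=-1).values.softmax(dim=-1)   # (B, N)
        image_repr = torch.einsum('bn,bnd->bd', region_weight, v)            # (B, d)

        # Semantic reinforcement: nudge each semantic prototype toward the
        # visual evidence. A real system would use per-class visual prototypes
        # (mean features of seen-class samples); the batch mean is a stand-in.
        visual_proto = image_repr.mean(dim=0, keepdim=True)         # (1, d)
        s_reinforced = F.normalize(s + 0.1 * visual_proto, dim=-1)  # (K, d)

        # Recognition: cosine similarity between image representations and the
        # reinforced attribute class prototypes of the (unseen) classes.
        return F.normalize(image_repr, dim=-1) @ s_reinforced.t()   # (B, K)

# Usage with random stand-in tensors:
model = MutualReinforcementSketch()
regions = torch.randn(4, 49, 2048)   # 4 images, 7x7 region grid, ResNet-like dims
attributes = torch.randn(10, 312)    # 10 classes, CUB-style 312-dim attributes
logits = model(regions, attributes)  # (4, 10) class scores
```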