A comparative study for multiple visual concepts detection in images and videos

Hamadi, Abdelkader ; Mulhem, Philippe ; Quénot, Georges

Multimedia Tools and Applications, 2016, Vol.75(15), pp.8973-8997 [Peer Reviewed Journal]

  • Title:
    A comparative study for multiple visual concepts detection in images and videos
  • Author/Creator: Hamadi, Abdelkader ; Mulhem, Philippe ; Quénot, Georges
  • Language: English
  • Subjects: Semantic indexing ; Multimedia ; Fusion ; Multiple concepts ; Multi-concept ; Concept pairs ; Triplet of concepts ; Bi-concept ; Tri-concept ; Image ; Video ; Pascal VOC ; TRECVid
  • Is Part Of: Multimedia Tools and Applications, 2016, Vol.75(15), pp.8973-8997
  • Description: Automatic indexing of images and videos is a highly relevant and important research area in multimedia information retrieval. The difficulty of this task no longer needs to be proven. In the past, most efforts of the research community focused on the detection of single concepts in images/videos, which is already a hard task. With the evolution of information retrieval systems, users’ needs have become more abstract, leading to queries composed of a larger number of words. It is important to think about indexing multimedia documents with more than just individual concepts, to help retrieval systems answer such complex queries. Few studies have specifically addressed the problem of detecting multiple concepts (multi-concept) in images and videos. Most of them concern the detection of concept pairs. These studies showed that this challenge is even greater than that of single-concept detection. In this work, we address the problem of multi-concept detection in images/videos through a detailed comparative study. Three types of approaches are considered: 1) building detectors for multi-concepts, 2) fusing single-concept detectors and 3) exploiting detectors of a set of single concepts in a stacking scheme. We conducted our evaluations on the PASCAL VOC’12 collection for the detection of pairs and triplets of concepts. We extended the evaluation to the TRECVid 2013 dataset for the detection of infrequent concept pairs. Our results show that the three types of approaches give globally comparable results for images, but they differ for specific kinds of pairs/triplets. In the case of videos, late fusion of detectors seems to be more effective and efficient when single-concept detectors perform well. Otherwise, directly building bi-concept detectors remains the best alternative, especially if a well-annotated dataset is available. The third approach did not bring additional gain or efficiency.
  • Identifier: ISSN: 1380-7501 ; E-ISSN: 1573-7721 ; DOI: 10.1007/s11042-015-2730-2