Abstract

One of the challenges in multimedia event retrieval is the integration of data from multiple modalities. A modality is defined as a single channel of sensory input, such as visual or audio; we also refer to this as a data source. Previous research has shown that integrating different data sources can improve performance compared to using only one source, but clear insight into the success factors of alternative fusion methods is still lacking. We introduce several new blind late fusion methods based on inversions and ratios of the state-of-the-art blind fusion methods, and compare their performance both in simulations and on TRECVID MED, an international benchmark data set for multimedia event retrieval. The results show that five of the proposed methods outperform the state-of-the-art methods when sufficient training examples are available (100 examples). The novel fusion method JRER is not only the best method with dependent data sources, but is also robust in all simulations with sufficient training examples.
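To make the idea of blind late fusion concrete, the sketch below combines per-source detection scores with three widely used rules (average, product, maximum) that need no trained weights. It is only an illustration under stated assumptions: the abstract does not define the proposed methods (including JRER), so none of them is reproduced here, and the function names, video ids, and scores are all hypothetical.

```python
import math


def fuse_average(scores):
    """Arithmetic mean of the per-source scores."""
    return sum(scores) / len(scores)


def fuse_product(scores):
    """Product of the per-source scores (a joint probability if sources are independent)."""
    return math.prod(scores)


def fuse_max(scores):
    """Most confident single source decides."""
    return max(scores)


def rank_videos(per_video_scores, fuse):
    """Return video ids sorted by fused score, highest first."""
    fused = {vid: fuse(s) for vid, s in per_video_scores.items()}
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical scores in [0, 1] from two data sources (visual, audio).
scores = {
    "video_a": [0.9, 0.2],
    "video_b": [0.6, 0.6],
    "video_c": [0.1, 0.3],
}

for rule in (fuse_average, fuse_product, fuse_max):
    print(rule.__name__, rank_videos(scores, rule))
```

With these toy scores the average and product rules prefer the video on which both sources agree, while the maximum rule prefers the video with one very confident source. This shows how the choice of fusion rule alone can change the ranking.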

August 2016, Volume 75, Issue 15, pp 9025–9043

Knowledge based query expansion in complex multimedia event detection

Maaike de Boer
Klamer Schutte
Wessel Kraaij

Open Access Article

First Online: 12 July 2015

Received: 28 November 2014
Revised: 28 April 2015
Accepted: 16 June 2015

DOI: 10.1007/s11042-015-2757-4

Cite this article as: de Boer, M., Schutte, K. & Kraaij, W. Multimed Tools Appl (2016) 75: 9025. doi:10.1007/s11042-015-2757-4

A common approach in content-based video information retrieval is to perform automatic shot annotation with semantic labels using pre-trained classifiers. The visual vocabulary of state-of-the-art automatic annotation systems is limited to a few thousand concepts, which creates a semantic gap between the semantic labels and the natural language query. One method to bridge this semantic gap is to expand the original user query using knowledge bases. Both common knowledge bases such as Wikipedia and expert knowledge bases such as a manually created ontology can be used for this purpose. Expert knowledge bases have the highest performance, but are only available in closed domains: only in a closed domain can all necessary information, including structure and disambiguation, be made available in a knowledge base. Common knowledge bases are often used in open domains because they cover a lot of general information. In this research, query expansion using the common knowledge bases ConceptNet and Wikipedia is compared to an expert description of the topic, applied to content-based information retrieval of complex events. We run experiments on the test set of TRECVID MED 2014. Results show that 1) query expansion can improve performance compared to using no query expansion when the main noun of the query cannot be matched to a concept detector; 2) query expansion using expert knowledge is not necessarily better than query expansion using common knowledge; 3) ConceptNet performs slightly better than Wikipedia; 4) late fusion can slightly improve performance. To conclude, query expansion has potential in complex event detection.
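The general idea of knowledge-based query expansion can be sketched as follows. This is a minimal illustration, not the pipeline used in the article. The `related_terms` dictionary is a hypothetical stand-in for a lookup in a knowledge base such as ConceptNet or Wikipedia, and the detector names, weights, and scoring rule (a weighted average) are assumptions made for the example.

```python
def expand_query(query_terms, detectors, related_terms):
    """Map query terms to available concept detectors.

    A term that matches a detector directly gets weight 1.0. A term without
    a detector is expanded through `related_terms`, a hypothetical stand-in
    for a knowledge base: term -> list of (related term, relatedness weight).
    """
    weights = {}
    for term in query_terms:
        if term in detectors:
            weights[term] = max(weights.get(term, 0.0), 1.0)
            continue
        for related, weight in related_terms.get(term, []):
            if related in detectors:  # keep only concepts we can detect
                weights[related] = max(weights.get(related, 0.0), weight)
    return weights


def score_video(detector_scores, weights):
    """Weighted average of detector outputs for the expanded query."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * detector_scores.get(c, 0.0) for c, w in weights.items()) / total


# Hypothetical data: no detector exists for the main noun "marathon".
detectors = {"running", "crowd", "road"}
related_terms = {"marathon": [("running", 0.8), ("crowd", 0.5), ("medal", 0.4)]}

weights = expand_query(["marathon"], detectors, related_terms)
video_score = score_video({"running": 0.9, "crowd": 0.4}, weights)
print(weights, round(video_score, 3))
```

In this toy case the query noun has no matching detector, which is the situation in which the abstract reports that expansion helps. The related term "medal" is dropped because no detector exists for it.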

Wessel Kraaij

Leiden Institute for Advanced Computer Science

Tag Archives: video

Blind late fusion in multimedia event retrieval
