
VICAR Video Navigator:

Content-Based Video Search Engines Become a Reality

Peter van der Putten

pvdputten@smr.nl

Sentient Machine Research

VICAR is an innovative search-engine technology for cataloguing and retrieving large volumes of video. VICAR stands for Video Indexing, Classification, Annotation and Retrieval. It offers advanced video search and analysis features, including face recognition, camera movement detection and query-by-example retrieval based on shape, texture and colour. The first VICAR product, the VICAR Video Navigator, is currently being evaluated by major European public-service broadcasters: SWR (Germany), ORF (Austria), SVT (Sweden) and NAA (the Dutch National Audiovisual Archive).

Using the VICAR Video Navigator to extract high level content for archiving

The Video Navigator has two main groups of users: the documentalist who archives video material, and the producer or journalist who searches for suitable footage to re-use.

Figure 1. VICAR Video Navigator User Interface (movie clip source: ORF)

The Video Navigator automates part of the routine cataloguing work that documentalists still do manually today. Information is extracted automatically from the video stream in three main steps. First, the video is segmented into individual shots, and a number of representative keyframes are selected from each shot (see figure 1).
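The article does not say which algorithm VICAR uses for shot segmentation. As a minimal sketch of the general idea, assuming each frame is available as a flat sequence of grey levels, a common textbook approach compares the brightness histograms of consecutive frames and declares a cut where they differ strongly. The names `histogram`, `segment_shots` and `select_keyframes`, the threshold value, and the choice of one middle keyframe per shot are all illustrative assumptions, not part of the product.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a flat sequence of grey levels in the range 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalised brightness histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def segment_shots(frames: Sequence[Frame], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return one (start, end) pair of frame indices per shot, end inclusive.

    A cut is declared wherever the L1 distance between the histograms of two
    consecutive frames exceeds `threshold` (a hypothetical tuning value).
    """
    if not frames:
        return []
    shots = []
    start = 0
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            shots.append((start, i - 1))
            start = i
        previous = current
    shots.append((start, len(frames) - 1))
    return shots


def select_keyframes(shots: Sequence[Tuple[int, int]]) -> List[int]:
    """Simplification: pick the middle frame of each shot as its keyframe."""
    return [(start + end) // 2 for start, end in shots]


# Three dark frames followed by four bright frames give two shots.
frames = [[10] * 16] * 3 + [[200] * 16] * 4
shots = segment_shots(frames)
keyframes = select_keyframes(shots)
```

A real system would likely select several keyframes per shot and also handle gradual transitions such as fades and dissolves, which a simple threshold on consecutive frames can miss.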

Second, the Video Navigator computes an index for each keyframe. The index is based on a wide variety of features, such as frame colour, the distribution of brightness over a frame, camera movements, textures and changes in contrast over a shot. Together, these features form a rich index structure that is linked to the time-code of the original video tape. If a user presents an example image or movie clip, the index is used to search for similar footage.
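To illustrate how an index linked to time-codes can support a search by example, here is a minimal sketch that uses a single feature, a coarse colour histogram, and ranks keyframes by Euclidean distance to the example. The real index combines many more features, and its structure is not described in this article; `colour_feature`, `build_index`, `query_by_example` and the time-code strings are hypothetical names and values chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a pixel is a (red, green, blue) triple, each value in 0..255.
Pixel = Tuple[int, int, int]


def colour_feature(keyframe: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Normalised per-channel colour histogram, concatenated into one vector."""
    feature = [0.0] * (3 * bins)
    for pixel in keyframe:
        for channel, value in enumerate(pixel):
            feature[channel * bins + min(value * bins // 256, bins - 1)] += 1.0
    total = len(keyframe) or 1
    return [v / total for v in feature]


def build_index(keyframes: Dict[str, Sequence[Pixel]]) -> Dict[str, List[float]]:
    """Map each time-code to the feature vector of its keyframe."""
    return {timecode: colour_feature(frame) for timecode, frame in keyframes.items()}


def query_by_example(index: Dict[str, List[float]],
                     example: Sequence[Pixel],
                     top_k: int = 3) -> List[str]:
    """Return the time-codes of the keyframes most similar to the example image."""
    target = colour_feature(example)

    def distance(feature: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(feature, target)) ** 0.5

    ranked = sorted(index, key=lambda timecode: distance(index[timecode]))
    return ranked[:top_k]


# Tiny synthetic "images" of four pixels each.
red = [(250, 5, 5)] * 4
blue = [(5, 5, 250)] * 4
dark_red = [(200, 10, 10)] * 4

index = build_index({"00:00:10:00": red, "00:01:30:00": blue})
best = query_by_example(index, dark_red, top_k=1)
```

Because the result is a list of time-codes, a match can be traced back to the position on the original tape without storing the video itself in the index.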

In the third step, the indices are classified: each keyframe is labelled with a class that is relevant to the user. The standard Video Navigator is equipped to classify, recognize and annotate VIPs (‘Clinton’, ‘Kohl’), settings (‘interior’, ‘forest’, ‘city’, ‘mountain scenery’), objects (‘3 cars’) and camera movements (‘left’, ‘pan’, ‘tilt’, ‘zoom out/in’).

The documentalist retains control over this process: annotations proposed by the system can be overridden or completed. Semantic interpretations that go beyond VICAR's knowledge, such as the historical significance of events, can be added manually. Finally, the documentalist can edit the resulting folder structure of shots and keyframes to create a personalized archive structure that minimizes research time (with folders like ‘most popular stock shots’).
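The classification step and the documentalist's ability to override it can be sketched as follows. This is only an illustration under stated assumptions: it uses a 1-nearest-neighbour rule over labelled example features, which is a standard technique and not necessarily the classifier VICAR uses, and `propose_label`, `annotate` and the feature values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption: a labelled example is a (feature vector, class label) pair.
LabelledExample = Tuple[Sequence[float], str]


def propose_label(feature: Sequence[float], examples: Sequence[LabelledExample]) -> str:
    """Propose the label of the closest labelled example (1-nearest-neighbour)."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(examples, key=lambda example: squared_distance(feature, example[0]))
    return label


def annotate(features: Dict[str, Sequence[float]],
             examples: Sequence[LabelledExample],
             overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Label every keyframe; a manual override always wins over the proposal."""
    overrides = overrides or {}
    return {
        timecode: overrides.get(timecode, propose_label(feature, examples))
        for timecode, feature in features.items()
    }


examples: List[LabelledExample] = [([0.9, 0.1], "forest"), ([0.1, 0.9], "interior")]
features = {"00:00:10:00": [0.8, 0.2], "00:01:30:00": [0.2, 0.7]}

# The documentalist corrects the second proposal by hand.
annotations = annotate(features, examples, overrides={"00:01:30:00": "city"})
```

Keeping the manual overrides separate from the automatic proposals means the classifier can be rerun later without losing the documentalist's corrections.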

Using the VICAR Video Navigator for flexible search and retrieval

The producer or documentalist can query by text, as usual, and by visual example, using stills or video sequences. In response to a query, the system delivers a set of matching sequences, shots or frames. Depending on the available storage capacity, the user can either play a video instantly in the browser or retrieve it near-line from a tape store (see figure 2).

Figure 2. Face Recognition and Query-by-Example Search in the Video Navigator (movie clip source: ORF)

In addition, the interface guides the user through a type of query that goes beyond conventional querying. Given some search results, the user can mark the results he or she likes best and run a second query. This kind of relevance feedback enables a user to search without explicitly formulating what he or she is looking for. For instance, a user can search for ‘Helmut Kohl’ in an interior shot and then select those search results that show him wearing a dark blue jacket against a light blue background. The Video Navigator then retrieves only clips matching these criteria.
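The article does not describe how the Video Navigator implements relevance feedback. One standard technique from information retrieval is a Rocchio-style update, in which the query vector is moved towards the average of the results the user marked as good and the index is then ranked again. The sketch below assumes that approach with positive feedback only; `refine_query`, `rank`, the weights `alpha` and `beta`, and the clip names are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def refine_query(query: Sequence[float],
                 liked: Sequence[Sequence[float]],
                 alpha: float = 1.0,
                 beta: float = 0.75) -> List[float]:
    """Rocchio-style update: shift the query towards the centroid of liked results."""
    if not liked:
        return list(query)
    centroid = [sum(vector[d] for vector in liked) / len(liked) for d in range(len(query))]
    return [alpha * q + beta * c for q, c in zip(query, centroid)]


def rank(index: Dict[str, Sequence[float]], query: Sequence[float]) -> List[str]:
    """Order clips by cosine similarity to the query, best match first."""
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    return sorted(index, key=lambda clip: cosine(index[clip], query), reverse=True)


# Hypothetical two-dimensional features for three clips.
index = {"clip_a": [1.0, 0.0], "clip_b": [0.6, 0.8], "clip_c": [0.0, 1.0]}
query = [1.0, 0.0]

first_pass = rank(index, query)

# The user marks clip_b and clip_c as good answers and searches again.
refined = refine_query(query, [index["clip_b"], index["clip_c"]])
second_pass = rank(index, refined)
```

In this toy example the first pass ranks `clip_a` highest, while the second pass promotes `clip_b`, which lies between the original query and the clips the user selected. The user never has to state in words what the selected clips have in common.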

In short, all of the Video Navigator's functions can be controlled through an intuitive and flexible user interface that is geared towards television archive practice and, in particular, towards promoting the re-use of archive material.

Open VICAR Video Navigator architecture facilitates integration and deployment

To facilitate integration into existing workflows and technical infrastructure, VICAR's design is based on standard PC hardware, client-server software and open networking protocols (see figure 3).

Figure 3. VICAR Solution Architecture

The video content can either reside on a video server, in any appropriate open format, or be encoded from tape to MPEG-1 or MPEG-2, the data formats that VICAR applications use internally for video analysis and retrieval. The video indexing server performs all the compute-intensive tasks: video analysis, indexing and retrieval. The server software has been parallelised for maximum scalability and runs on standalone office PCs, multiprocessor PCs and PC clusters, all under Windows NT. The graphical user interface client is Java-based, platform-independent and web-enabled, which allows easy company-wide or even external access to the VICAR video indexing server. Depending on personal preference, the user can switch the look and feel of the GUI between Windows, Java and UNIX/Motif with a single click.

Bridging the Gap between Traditional and Digital Archives

The benefits of using the VICAR Video Navigator are clear: better, faster and more flexible research, resulting in a substantial increase in the re-use of archive material. Istar Buscher, the scientific documentalist responsible for VICAR at the German broadcaster SWR, says: “With a tool like VICAR, the current process of radical change in the working routine of a broadcaster will be accelerated. An application of VICAR gives us the chance to strengthen the quality of TV production by supporting the basic characteristics of film-making, like creativity and artistic value.”

Although clearly a technology of the digital future, VICAR might be the key to bridging the gap between legacy tape archives and archives with full digital storage. “VICAR enables us to get the best of both the digital and the analogue worlds,” says Istar Buscher. “At the moment we have to keep the original video on tape because digital storage is expensive and there are no internationally accepted, long-lasting standards for storage. But now we can run it once through the VICAR Video Navigator, maybe with some additional manual annotation, without the need to store the entire digital video file itself. Then we have optimal access to the content through the storyboard, keyframes, annotation or, in addition, the video clip. The original remains in the tape store.

“Today, we are already able to build up a semi-digital video archive covering thousands of hours of video material very quickly, with all the digital retrieval and navigation features one could dream of, for a fraction of the storage requirements and costs it would take to build and maintain a fully digital archive.”

For more information, visit the VICAR website at http://ppc210.joanneum.ac.at/vicar/.