VICAR Video Navigator: Content Based Video Search Engines Become a Reality
Peter van der Putten
pvdputten@smr.nl
Sentient Machine Research
VICAR is an innovative search-engine technology
for cataloguing and retrieving large volumes of video. VICAR stands for Video
Indexing, Classification, Annotation and Retrieval. It offers advanced video
search and analysis features, including face recognition, camera movement
detection and query-by-example retrieval based on shape, texture and colour.
The first VICAR product, the VICAR Video Navigator, is currently being
evaluated by major European public broadcasters and archives: SWR (Germany), ORF
(Austria), SVT (Sweden) and NAA (Dutch National Audiovisual Archive).
Using the VICAR Video Navigator to extract high level content for archiving
The Video Navigator has two main groups of potential users:
the documentalist, who archives video material, and the
producer or journalist, who searches for suitable footage to re-use.
Figure 1. VICAR Video Navigator User Interface (movie clip source: ORF)
Replacing many routine actions, the Video
Navigator automates part of the cataloguing work that today is still done
manually by documentalists. Information is extracted automatically from the
video stream in three main steps. First, the video is segmented into its
individual shots. From each shot a number of representative keyframes are
selected (see Figure 1).
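The article does not say how the Video Navigator detects shot boundaries or chooses keyframes. As an illustration only, the sketch below uses one common textbook approach: compare the grey-level histograms of consecutive frames and declare a cut when they differ by more than a threshold. The function names, the threshold value and the choice of the middle frame as keyframe are all assumptions made for this example, and frames are represented as plain lists of grey values rather than decoded video.

```python
from typing import List, Sequence, Tuple

# Toy stand-in for a decoded frame: flattened grey-level pixels in 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalised grey-level histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def segment_into_shots(frames: Sequence[Frame],
                       threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive.

    A cut is declared wherever consecutive histograms differ by more than
    `threshold` (an assumed value; a real system would tune it).
    """
    if not frames:
        return []
    shots = []
    start = 0
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        if histogram_distance(previous, current) > threshold:
            shots.append((start, i))
            start = i
        previous = current
    shots.append((start, len(frames)))
    return shots


def pick_keyframe(shot: Tuple[int, int]) -> int:
    """Choose the middle frame of a shot as its representative keyframe."""
    start, end = shot
    return (start + end) // 2


# Three dark frames, two bright frames, three mid-grey frames.
frames = [[10] * 16] * 3 + [[240] * 16] * 2 + [[128] * 16] * 3
shots = segment_into_shots(frames)
keyframes = [pick_keyframe(s) for s in shots]
```

Hard cuts are the easy case for this kind of test; gradual transitions such as fades and dissolves usually need a more careful comparison over several frames.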
Second, the Video Navigator computes an index
for each keyframe. The index is based on a wide variety of features, such as
frame colour, distribution of brightness over a frame, camera movements,
textures and changes in contrast over a shot. Taken together, the features are
integrated into a rich index structure that is linked to the time-code of the
original video tape. If a user presents an example image or movie clip, the
index will be used to search for similar footage.
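To make the idea of a time-coded feature index concrete, here is a minimal sketch of query by example. It is not VICAR's index structure, which the article does not describe: the feature vector here is deliberately tiny (mean red, green, blue and brightness), the time codes are invented, and the names IndexEntry, extract_features, build_index and query_by_example are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


@dataclass
class IndexEntry:
    timecode: str          # position of the keyframe on the original tape
    features: List[float]  # feature vector computed from the keyframe


def extract_features(pixels: Sequence[Pixel]) -> List[float]:
    """Mean red, green, blue and overall brightness, each scaled to 0..1."""
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / (255 * n)
    mean_g = sum(p[1] for p in pixels) / (255 * n)
    mean_b = sum(p[2] for p in pixels) / (255 * n)
    brightness = (mean_r + mean_g + mean_b) / 3
    return [mean_r, mean_g, mean_b, brightness]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_index(keyframes: Sequence[Tuple[str, Sequence[Pixel]]]) -> List[IndexEntry]:
    """Compute one index entry per (timecode, keyframe pixels) pair."""
    return [IndexEntry(tc, extract_features(px)) for tc, px in keyframes]


def query_by_example(index: Sequence[IndexEntry],
                     example: Sequence[Pixel],
                     top_k: int = 3) -> List[str]:
    """Return the time codes of the keyframes most similar to the example."""
    query = extract_features(example)
    ranked = sorted(index, key=lambda entry: distance(entry.features, query))
    return [entry.timecode for entry in ranked[:top_k]]


# Three single-colour toy keyframes with made-up time codes.
index = build_index([
    ("00:00:05:00", [(200, 30, 30)] * 4),   # mostly red
    ("00:01:10:12", [(30, 30, 200)] * 4),   # mostly blue
    ("00:02:00:00", [(30, 200, 30)] * 4),   # mostly green
])
best = query_by_example(index, [(220, 40, 40)] * 4, top_k=1)
```

Because every entry carries its time code, a match can be traced straight back to the right position on the original tape, which is the property the index described above relies on.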
In the third step the indices are classified:
keyframes are labelled with classes that are relevant to the user. The
standard Video Navigator is equipped to classify, recognize and annotate VIPs
(‘Clinton’, ‘Kohl’), settings (‘interior’, ‘forest’, ‘city’, ‘mountain
scenery’), objects (‘3 cars’) and camera movements (‘left’, ‘pan’, ‘tilt’,
‘zoom out/in’).
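The classification method used by the Video Navigator is not given in the article. Purely as an illustration of labelling a keyframe from its feature vector, the sketch below uses a nearest-centroid classifier: each class is summarised by the mean of a few labelled example vectors, and a new keyframe receives the label of the closest mean. The feature values, labels and function names are invented for this example.

```python
from math import dist  # Euclidean distance, available from Python 3.8
from typing import Dict, List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labelled: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Summarise each class by the centroid of its example vectors."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}


def classify(model: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid lies closest to the feature vector."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical (red, green, blue) mean-colour features for two settings.
model = train({
    "forest": [[0.10, 0.80, 0.10], [0.20, 0.70, 0.20]],
    "interior": [[0.60, 0.50, 0.40], [0.70, 0.50, 0.30]],
})
proposed_label = classify(model, [0.15, 0.75, 0.10])
```

A label produced this way is only a proposal: it is the kind of annotation that a documentalist would confirm, correct or extend.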
The documentalist retains control over this
process; annotations proposed by the system can be overridden or completed.
Semantic interpretations that go beyond VICAR's knowledge, such as the
historical significance of events, can be added manually by the documentalist.
Finally, it is possible to edit the resulting folder structure representing the
shots and keyframes; this enables the documentalist to create a
personalized file archive structure that minimizes research time (with folders
like ‘most popular stock shots’).
Using the VICAR Video Navigator for flexible search and retrieval
The producer or documentalist can query by
text, as usual, and by visual example, using stills or video sequences. In
response to a query, the system delivers a set of matching sequences, shots or
frames. Depending on the available storage capacity, the user can
play a video instantly in the browser or retrieve it near-line from a tape
store (see Figure 2).
Figure 2. Face Recognition and Query by Example search in the Video Navigator (movie clip source: ORF)
In addition, the interface guides the user through a
type of query that goes beyond conventional querying. Given some search results, the user can
mark the answers he or she likes best and run a second query. This
kind of relevance feedback enables a user to search without explicitly
formulating what he or she is looking for. For instance, a user can search for
‘Helmut Kohl’ in an interior shot and then select those search results that
show him wearing a dark blue jacket against a light blue background. The Video Navigator will then retrieve only
clips matching these criteria.
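Relevance feedback of this kind can be sketched in a few lines. The version below is a simplified, Rocchio-style update, a standard information-retrieval technique and not necessarily what the Video Navigator does: the query vector is moved towards the average of the results the user marked, and the search is run again. The clip names, two-dimensional feature vectors and the blending weight alpha are assumptions made for the example.

```python
from math import dist  # Euclidean distance, available from Python 3.8
from typing import Dict, List, Sequence


def search(index: Dict[str, List[float]],
           query: Sequence[float],
           top_k: int = 2) -> List[str]:
    """Return the names of the top_k clips closest to the query vector."""
    return sorted(index, key=lambda clip: dist(index[clip], query))[:top_k]


def refine_query(query: Sequence[float],
                 liked: Sequence[Sequence[float]],
                 alpha: float = 0.5) -> List[float]:
    """Blend the original query with the centroid of the liked results.

    alpha = 1.0 keeps the original query; alpha = 0.0 uses only the feedback.
    """
    if not liked:
        return list(query)
    n = len(liked)
    centre = [sum(column) / n for column in zip(*liked)]
    return [alpha * q + (1 - alpha) * c for q, c in zip(query, centre)]


# Hypothetical clips with made-up two-dimensional feature vectors.
index = {
    "clip_a": [0.10, 0.10],
    "clip_b": [0.90, 0.90],
    "clip_c": [0.80, 0.95],
    "clip_d": [0.40, 0.50],
}

first_results = search(index, [0.5, 0.5])                 # initial query
refined = refine_query([0.5, 0.5], [index["clip_c"]])     # user marks clip_c
second_results = search(index, refined)                   # second query
```

In this toy run the first query returns clip_d and clip_c; after the user marks clip_c, the second query ranks clip_c first and brings in clip_b, which resembles it, without the user ever describing what the two clips have in common.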
In conclusion, the different functions of VICAR’s Video Navigator
can be controlled through an intuitive and
flexible user interface that is geared towards television archive
practice and, in particular, towards promoting re-use of archive material.
Open VICAR Video Navigator architecture facilitates integration and deployment
To facilitate integration into
existing workflows and technical infrastructure, VICAR's design is based on
standard PC hardware, client-server software and open networking protocols (see
Figure 3).
Figure 3. VICAR Solution Architecture
The video content can either reside on a
video server, in any appropriate open format, or it can be encoded from tape to
MPEG-1 or MPEG-2, the data formats that VICAR applications use internally for
video analysis and retrieval. The video indexing server performs all the
compute-intensive tasks: video analysis, indexing and retrieval. The server
software has been parallelised for maximal scalability and runs on standalone
office PCs, multiprocessor PCs and PC clusters, all under Windows NT. The
graphical user interface client is Java-based, platform-independent and
web-enabled, which allows for easy company-wide or even external access to the
VICAR video index server. Depending on personal preferences, the user can
change the look and feel of the GUI from Windows to Java or UNIX/Motif with
one click of a button.
Bridging the Gap between Traditional and Digital Archives
The benefits of using the VICAR Video Navigator
are clear: higher-quality, quicker and more flexible research, resulting
in a substantial increase in the re-use of archive material. Istar Buscher,
the scientific documentalist responsible for VICAR at the German broadcaster SWR:
“With a tool like VICAR, the current process of radical change in the
working routine of a broadcaster will be accelerated. An application of VICAR
gives us the chance to strengthen the quality of TV production by supporting
the basic characteristics of film-making, like creativity and artistic value.”
Although clearly a technology of the digital future, VICAR may be the key to bridging the gap between legacy tape archives and archives with full digital storage. “VICAR enables us to get the best of both the digital and the analogue worlds”, says Istar Buscher. “At the moment we have to keep the original video on tape because digital storage is expensive and there are no internationally accepted, long-lasting standards for storage. But now we can run it once through the VICAR Video Navigator - maybe with some additional manual annotation - without the need to store the entire digital video file itself. Then we have optimal access to the content through the storyboard, keyframes, annotation or, in addition, the video clip. The original remains in the tape store.
“Today, we are already able to build up a semi-digital video archive covering thousands of hours of video material very quickly, with all the digital retrieval and navigation features one could dream of, for a fraction of the storage requirements and costs it would take to build and maintain a fully digital archive.”
For more information, please feel free to visit
the VICAR website at
http://ppc210.joanneum.ac.at/vicar/.