Video Search Basics

Abstract

The explosion of online video, fueled by a proliferation of video sharing sites like YouTube, has stimulated the development of a growing number of video search sites. These search sites follow the search paradigm established by the video sharing sites: user-generated keywords, a model borrowed from the success of Flickr, the photo-sharing site.

This method of search disregards a very important dimension of video, an actual dimension: time. However, a new class of online video services is emerging that allows video publishers to work with the time component of video, creating time-based tags of all types.

These video tags are unique, because they present an entire searchable phrase to the search engines for each segment of video represented, whereas the video sharing sites only allow individual keywords that apply in general to the entire video.

The importance of online video will only increase, and with it the importance of and opportunity in video search. With the eventual predominance of index-based video tagging, video search optimization will be a function of the linguistic skill used to create the index tags, which in turn will depend upon a clear understanding of the intended use of the video. Both of these are opportunities for a new cadre of online video professionals to distinguish themselves.

Video Search – A Misnomer

Within the new generation of video sharing and video search sites, there is one common thread - none of them actually searches video.  In fact, there are no applications that search video.  That’s because video itself is not searchable, at least not in ways which are compatible with any current form of search engine.


Video is a stream of pixel data, interwoven with a stream of digital audio data. This data is only useful to the media player on the computer (Windows Media, QuickTime, Flash Video, Real, etc.), which sends it directly to the screen and the sound card. Search engines cannot process actual video or audio data.


What video search sites are searching is the video “metadata”, the set of text-based data that is either embedded in the video file or accompanies it. The video sharing sites were created by essentially cloning the Flickr photo-sharing model, keyword tagging included.


The critical issue here is that no matter how sophisticated or simple the techniques for tagging video, it will continue to be tagging, that is to say, it will continue to be a function of web search technology.  Whatever information we have in our videos, we want the Internet or Intranet search engines to find it.


Video search engines are essentially identical to other web search engines. They simply filter all content other than video out of their search results, and in some cases provide different levels of categorization.
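
As a rough illustration of that filtering step, the Python sketch below keeps only the video entries from a list of ordinary search results. The result dictionaries, the `content_type` field, and the `video_only` function are assumptions made up for this example; they do not describe how any particular search engine stores or classifies its results.

```python
def video_only(results):
    """Keep only results whose (assumed) content type marks them as video."""
    return [r for r in results if r["content_type"].startswith("video/")]


# Invented sample results, for illustration only.
results = [
    {"title": "Company home page", "content_type": "text/html"},
    {"title": "Product demo", "content_type": "video/mp4"},
    {"title": "Logo", "content_type": "image/png"},
]

print([r["title"] for r in video_only(results)])
```

The text matching itself is unchanged; only the final list of results is narrowed.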

Metadata – The Human Element

Because video is not directly searchable, video search depends upon the creation of text to represent the video in various ways.

Keywords

Video sharing sites let their users edit metadata during the video upload process, just as photo sharing sites do. This metadata generally takes the form of keywords created by the video’s publisher.

Embedded Metadata

Digital video and audio editing software systems have provided metadata authoring and editing capabilities for many years, generally embedding the metadata directly in the file. This type of metadata has very limited use, because the embedded data is not accessible to anything but a compatible media player. It is possible to extract this data programmatically and use it, even to expose it to search engines, but this is rarely done.

Closed Captioning

For many years now, the broadcast and cable television industries have had an accepted practice for creating video metadata, although it is not thought of as metadata authoring. Closed captioning was established many years ago to give the deaf access to video programming. It essentially adds a “living” transcript to the video, one where the words are synchronized to the video. This transcript, which contains both dialogue and descriptions of audio-related activity, is created “on the fly” by highly skilled typists, and currently most cable and television stations have 24/7 transcription of their programming. This data gets embedded in the video. (Note here that closed caption data does not contain any descriptions of video activity, a notable omission.)

A number of enterprise-level video search platforms exist that use closed caption text as the basis for their video search, generally combined with “scene detecting” to create video index information.  There is a special class of video web site that also uses closed caption data for user search.  Since the CC data is embedded in the video and tied to the timeline, it represents the most direct form of video search currently available. 
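
To show why caption text tied to the timeline lends itself to direct search, here is a minimal Python sketch that searches a list of caption cues and reports where in the video each match begins. The `CaptionCue` type, the sample cues, and `search_captions` are hypothetical and invented for illustration; real closed caption data would first have to be decoded from the video, a step not shown here.

```python
from dataclasses import dataclass


@dataclass
class CaptionCue:
    """One caption line tied to the video timeline (hypothetical structure)."""
    start_seconds: float
    text: str


def search_captions(cues, query):
    """Return the cues whose text contains every query word as a substring."""
    words = query.lower().split()
    return [cue for cue in cues if all(w in cue.text.lower() for w in words)]


# Invented sample data, for illustration only.
cues = [
    CaptionCue(0.0, "Good evening and welcome to the program."),
    CaptionCue(12.5, "[applause]"),
    CaptionCue(95.0, "Tonight we look at online video search."),
]

for cue in search_captions(cues, "video search"):
    print(f"{cue.start_seconds:.1f}s: {cue.text}")
```

Because every cue carries its own start time, a text match immediately identifies a position in the video rather than just the video as a whole.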

Transcription

Transcription is very similar to closed captioning, with the notable addition of descriptions of video activity where required, since audio alone is not always going to provide a complete picture, so to speak.


Transcription services have also been around for many years, but until recently the documents they produced were generally paper-based and often contained no time-stamp information. That has changed in the last decade or so, and it is now standard practice for transcriptions to include time-synchronization data, sometimes offered in electronic form, which makes programmatic access to the video possible.
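
As a sketch of what programmatic access to a time-stamped transcript might look like, the Python below converts transcript lines into (seconds, text) pairs. The `[HH:MM:SS] text` line format, the sample transcript, and the `parse_transcript` function are assumptions for this example; actual transcription services use a variety of formats.

```python
import re

# Assumed line format: "[HH:MM:SS] text". Real transcript formats vary.
LINE_PATTERN = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$")


def parse_transcript(text):
    """Convert time-stamped transcript lines into (seconds, text) pairs."""
    entries = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that carry no time stamp
        hours, minutes, seconds, spoken = match.groups()
        offset = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        entries.append((offset, spoken))
    return entries


# Invented sample transcript, for illustration only.
sample = """\
[00:00:05] Welcome to the demonstration.
[00:01:30] (Presenter points to the chart on screen.)
[01:02:03] Thank you for watching.
"""

for offset, spoken in parse_transcript(sample):
    print(offset, spoken)
```

Once the time stamps are expressed as offsets in seconds, each line of the transcript can be matched to a position on the video timeline.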

Deep Tagging/Indexing

All of the major media formats contain a timeline that is accessible programmatically and an API (Application Programming Interface) that lets a media player embedded on a web page, such as Windows Media, talk back and forth to the web page. You can send JavaScript requests to the player, such as time-seek requests, allowing you to jump to any part of the video at will.

Using this functionality, you can create a database of time points in the video, each time point being what we can call an index entry. Each entry carries an index tag, a phrase describing the index region (the part of the video represented by the index tag), which can be used both as a navigation label and as text for the search engines.
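
One way to picture such an index is as a sorted list of entries, where each index region runs from one entry's start time to the next. The Python sketch below is a minimal illustration under that assumption; the `IndexEntry` type, the `region_at` function, and the sample tags are hypothetical and not taken from any particular tagging service.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class IndexEntry:
    """A time point in the video plus the phrase that describes its region."""
    start_seconds: float
    tag: str


def region_at(index, position_seconds):
    """Return the entry whose region contains the playback position.

    Assumes `index` is sorted by start_seconds and that a region runs from
    one entry's start to the next entry's start. Returns None if the
    position falls before the first entry.
    """
    starts = [entry.start_seconds for entry in index]
    i = bisect_right(starts, position_seconds) - 1
    return index[i] if i >= 0 else None


# Invented sample index, for illustration only.
index = [
    IndexEntry(0.0, "Introduction to the product line"),
    IndexEntry(120.0, "Demonstration of the upload process"),
    IndexEntry(305.0, "Questions from the audience"),
]

print(region_at(index, 130.0).tag)
```

The same list serves both purposes described above: the tags can be shown as navigation labels, and the start times tell the player where to seek.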


There is now a new class of video service site that lets users create these time-based tags. These tags are then exposed to the search engines, which treat them exactly like any other searchable text, with the exception that the search results take the viewer directly to the time point in the video represented by the video tag. This is not a function of the search engines, by the way. This functionality must be written into the player application for the video site in question.
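
A minimal sketch of how a site might expose its tags: each tag is written out as an ordinary HTML link whose URL carries a time offset, so that a search engine sees plain text and a link, and the site's player does the seeking. The example URL, the `t` query parameter, and the `index_links` function are assumptions for illustration; each site defines its own convention, and its player code must read the parameter and seek to that time.

```python
from html import escape
from urllib.parse import urlencode


def index_links(page_url, entries):
    """Build one HTML list item per (start_seconds, tag) pair.

    The `t` query parameter is an assumed convention, not a standard.
    """
    items = []
    for start_seconds, tag in entries:
        url = f"{page_url}?{urlencode({'t': int(start_seconds)})}"
        items.append(f'<li><a href="{escape(url)}">{escape(tag)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Invented sample page and tags, for illustration only.
print(index_links(
    "https://video.example.com/watch/42",
    [(0, "Opening remarks"), (95, "Demo of time-based tags & search")],
))
```

The search engine indexes the tag phrase like any other link text; the jump to the right moment happens only because the player application honors the offset in the URL.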


The most obvious advantage of this approach is the ability to work with longer-format videos. An index could be described as simply a playlist of clips, something that can ostensibly be accomplished on the current keyword-driven video search sites. However, continuity brings a different feeling to the deep tagging process, and the requirement to create an index tag phrase produces a truly different class of video tag than the keywords of the video sharing sites.


Interestingly, the aforementioned digital video and audio editing systems have always had a feature for adding time-based markers, actual markers, to the video file, along with the other metadata. This has the advantage that the index is self-contained in the video. That is also its disadvantage, because the data must be merged into the file by a separate, memory- and processor-intensive process. This capability has been eclipsed by the various external tagging processes inherent in current video search, but implementations of internal markers may yet find their way into the metadata processing universe, especially considering the possibility of embedding indices in portable media, such as the video iPod.

Summary

Until the next major breakthrough in video processing technology, video metadata must be created manually. There is a plethora of techniques for this, some of which provide time-based access to the video, but all video search techniques now have one thing in common: they are designed to expose the information to search engines, using the same technology as all other web-based search.

The opportunities for video search optimization will be dramatic.
