LipTracker™ Lip Sync Analyzer

LipTracker™ is a non-invasive measurement tool for in-service lip sync analysis. It operates using the same principle as the human brain by comparing the timing of sounds in the audio and mouth shapes in the video to measure the lip sync error. LipTracker™ improves productivity by supplementing an operator's subjective and time-consuming analysis of lip sync with rapid, objective results measured in real time from the program material. This easy-to-use measurement tool provides numeric and graphic displays of the lip sync error, a history graph, status indicators and event logging. LipTracker™ increases efficiency in systems design and installation, daily operation and program quality assurance.

Audio offsets of up to ± 5 video frames can be measured in standard mode or up to ± 20 frames in extended range mode. This unique approach of analyzing real time video and audio content does not require the insertion of cues, codes or watermarks into the program stream. Therefore, since the program material is untouched, LipTracker™ can be used at any point in the transmission path.
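The frame-based ranges can be expressed in milliseconds with simple arithmetic. The sketch below assumes that a "video frame" means one full interlaced frame (two fields) of the input format, so 1080i59.94 is counted at 29.97 frames per second. The `FRAME_RATES` table and the `frames_to_ms` helper are hypothetical names used only for illustration.

```python
# Full-frame rates (frames per second) for the interlaced input formats in
# the specifications. Assumption: one "video frame" = one interlaced frame.
FRAME_RATES = {
    "480i59.94": 30000 / 1001,
    "576i50": 25.0,
    "1080i59.94": 30000 / 1001,
    "1080i50": 25.0,
}


def frames_to_ms(frames: float, video_format: str) -> float:
    """Convert an audio offset in video frames to milliseconds."""
    return frames * 1000.0 / FRAME_RATES[video_format]


if __name__ == "__main__":
    for fmt in ("480i59.94", "576i50"):
        standard = frames_to_ms(5, fmt)   # standard mode range
        extended = frames_to_ms(20, fmt)  # extended range mode
        print(f"{fmt}: standard ±{standard:.1f} ms, extended ±{extended:.1f} ms")
```

At 25 frames per second, ±5 frames is ±200 ms and ±20 frames is ±800 ms; at 29.97 frames per second the same ranges are about ±167 ms and ±667 ms.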

Depending on the program content, the first result can be displayed in as little as 4 seconds after a face is detected. The result is then updated every 2 seconds until the current face is lost or a new face is detected. The history graph charts the most recent error profile and event logging saves the results for scene-by-scene analysis. The Audio Offset Status indicator is a visual warning of the current offset. User programmable thresholds determine whether the indicator is Green, Yellow or Red at any given offset reading.
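The threshold logic behind a three-state indicator can be sketched as follows. This is an illustration, not LipTracker™'s configuration interface: the `StatusThresholds` and `offset_status` names are hypothetical, and the sketch assumes symmetric limits (the same thresholds whether audio leads or lags video) expressed in video frames.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusThresholds:
    """Hypothetical user-programmable limits, in video frames of absolute offset."""
    yellow: float  # |offset| >= yellow (and < red) shows Yellow
    red: float     # |offset| >= red shows Red

    def __post_init__(self) -> None:
        if not 0 <= self.yellow <= self.red:
            raise ValueError("thresholds must satisfy 0 <= yellow <= red")


def offset_status(offset_frames: float, limits: StatusThresholds) -> str:
    """Map an audio offset reading to a Green/Yellow/Red indicator state.

    Assumption: the same limits apply whether audio leads or lags video.
    """
    magnitude = abs(offset_frames)
    if magnitude >= limits.red:
        return "Red"
    if magnitude >= limits.yellow:
        return "Yellow"
    return "Green"


if __name__ == "__main__":
    limits = StatusThresholds(yellow=1.0, red=2.0)
    for reading in (0.5, -1.5, 2.0):
        print(reading, offset_status(reading, limits))
```

A real installation might use separate limits for audio leading and audio lagging, since late audio is usually less objectionable than early audio; the symmetric version keeps the example short.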


Non-invasive analysis of lip sync errors up to ± 20 video frames by comparing video and audio Mutual Events (MuEvs)

Language independent

Displays current lip sync error with numeric and graphic displays

First result displayed in as little as 4 seconds and updated every 2 seconds

Measurement offset parameter is used to compensate for known fixed delays in the video or audio being analyzed

Automatic face detection with point and click manual override for scenes with multiple faces

Audio Offset Status indicator provides a visual warning of the current offset

History display shows most recent error profile

Event logging for scene-by-scene analysis and archiving

Operates with SD or HD SDI video and either AES-3id audio or audio de-embedded internally from the SDI input

Digital and analog video and audio monitoring outputs

Face Detection

LipTracker™ searches frame by frame for a face in the input video. After finding a face, LipTracker™ automatically locks onto it and maintains lock during typical camera pans, tilts, zooms, and through the normal range of head motion. Minimum face height (from the top of the head to the bottom of the chin) is one quarter of the overall picture height.

In scenes with multiple faces, if LipTracker™ selects a non-speaking face for analysis, you can override the face selection by pointing to the correct face with the cursor and double clicking the mouse.
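The minimum face height rule is simple arithmetic, shown below; the face detection itself is not modeled. The helper name and the row convention (row 0 at the top of the picture) are assumptions made for this sketch.

```python
def face_is_large_enough(head_top_row: int, chin_bottom_row: int, picture_height: int) -> bool:
    """Check the minimum face height rule: top of head to bottom of chin must be
    at least one quarter of the picture height.

    Hypothetical convention: rows are counted from the top of the picture
    (row 0), so the chin row is numerically greater than the head-top row.
    """
    face_height = chin_bottom_row - head_top_row
    # Integer comparison avoids rounding: face_height >= picture_height / 4
    return 4 * face_height >= picture_height


if __name__ == "__main__":
    for height in (480, 576, 1080):
        print(f"{height}-line picture: minimum face height {height // 4} lines")
```

For 480-, 576- and 1080-line pictures the minimum face heights work out to 120, 144 and 270 lines respectively.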

Determining The MuEv Offset

The sounds and mouth shapes that are used for MuEv analysis are commonly found in the natural speech patterns of many languages. When a face is detected, the input video is processed by locating the upper and lower lips within the face and extracting the mouth shape characteristics to generate a field by field stream of video MuEvs.

LipTracker™ does not need to be "trained" in advance to recognize any particular voice. The input audio is normalized and processed with LipTracker™'s proprietary technology to generate a stream of audio MuEvs that are speaker independent.

The audio and video MuEv streams are then correlated to determine the measurement of the lip sync error that is displayed on the screen. The silence segments that occur in the audio input are also identified to provide additional cues for measuring the lip sync error.
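LipTracker™'s MuEv extraction and correlator are proprietary. As a generic stand-in, the sketch below assumes each stream has already been reduced to one value per video field (for example 1.0 where an event was detected, 0.0 elsewhere) and searches for the lag at which a plain cross-correlation of the two streams peaks. Silence cues are not modeled, and the function names are hypothetical.

```python
from typing import Sequence


def estimate_offset_fields(
    video_events: Sequence[float],
    audio_events: Sequence[float],
    max_lag: int,
) -> int:
    """Return the lag, in fields, at which the two event streams agree best.

    A positive result means each audio event arrives that many fields after
    the matching video event (audio late); negative means audio early.
    """
    n = min(len(video_events), len(audio_events))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Unnormalized cross-correlation of the two streams at this lag.
        score = sum(
            video_events[i] * audio_events[i + lag]
            for i in range(n)
            if 0 <= i + lag < n
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def fields_to_frames(lag_fields: int) -> float:
    """Two fields make one interlaced frame, giving half-frame resolution."""
    return lag_fields / 2


if __name__ == "__main__":
    video = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
    audio = [0, 0] + video[:-2]  # same events, delayed by two fields
    lag = estimate_offset_fields(video, audio, max_lag=5)
    print(lag, "fields =", fields_to_frames(lag), "frame(s), audio late")
```

Working in fields rather than frames is one way to obtain the half-frame resolution that the measurement offset setting also uses; whether LipTracker™ itself works this way is not stated here.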

Event Logging

LipTracker™'s results can be archived for scene-by-scene analysis. When logging is enabled, the audio offset measurements are written to an HTML file and/or a comma-delimited (.csv) file. The .csv files can be imported into a spreadsheet or other application for further analysis.

For each program segment that is analyzed, a thumbnail of the first frame is stored along with the segment start time, the time of each measurement and the audio offset at that time. The system clock and/or VITC from the video input signal can be selected as the log file time reference.

Longitudinal time code (LTC) can also be recorded in the log files via LipTracker™'s 9-pin serial port. An external converter is required to translate baseband LTC to Sony™ serial protocol.
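As an example of further analysis, the sketch below reads a .csv log and averages the offset readings of each program segment. The column names (`segment_start`, `measurement_time`, `audio_offset`), the frame units and the sample rows are all made up for illustration; check the header of an actual LipTracker™ log file before adapting it.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Mapping

# Synthetic example rows with hypothetical column names; not real LipTracker output.
SAMPLE_LOG = """segment_start,measurement_time,audio_offset
10:00:00:00,10:00:04:00,1.0
10:00:00:00,10:00:06:00,2.0
10:00:30:00,10:00:34:00,-0.5
"""


def mean_offset_per_segment(rows: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Average the audio offset readings of each program segment."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        segment = row["segment_start"]
        totals[segment] += float(row["audio_offset"])
        counts[segment] += 1
    return {segment: totals[segment] / counts[segment] for segment in totals}


if __name__ == "__main__":
    reader = csv.DictReader(io.StringIO(SAMPLE_LOG))
    for segment, mean in mean_offset_per_segment(reader).items():
        print(f"segment starting {segment}: mean offset {mean:+.2f} frames")
```

To read a file from disk, replace the `io.StringIO` object with `open(path, newline="")`, as the Python `csv` documentation recommends.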

Language Independence

LipTracker™ analysis uses a number of key sounds that have the same distinct mouth shapes in virtually all languages. Examples include the EE sound (street in English, Paris in French), the OO sound (moon in English, fruta in Spanish) and the AA sound (palm in English, nacht in German). Therefore, LipTracker™ is not limited to operating with English speakers; it is language independent.

Scene Change Processing

In the normal mode of operation, analysis automatically restarts when a new face is detected. Each newly detected face is assumed to come from a different source and therefore could have a different audio offset than the face that preceded it.

However, for those applications where consecutive scenes are known to have the same audio offset, the "continuous" mode of scene change processing can be used. In this mode, the video and audio MuEv streams from consecutive scenes are combined and averaged to produce the audio offset results. This mode is also useful when the individual scenes are too short to generate measurements.
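The difference between the two modes can be illustrated with a generic cross-correlation model (a stand-in for LipTracker™'s proprietary processing). In this sketch, normal mode picks a lag for each scene separately, while continuous mode sums the per-lag correlation scores of consecutive scenes before picking one common lag; summing gives the same peak as averaging. `min_score` is a hypothetical evidence threshold below which no measurement is reported, and all names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

# One scene: (video_events, audio_events), one value per video field.
Scene = Tuple[Sequence[float], Sequence[float]]


def lag_scores(video: Sequence[float], audio: Sequence[float], max_lag: int) -> List[float]:
    """Cross-correlation score for every lag from -max_lag to +max_lag."""
    n = min(len(video), len(audio))
    return [
        float(sum(video[i] * audio[i + lag] for i in range(n) if 0 <= i + lag < n))
        for lag in range(-max_lag, max_lag + 1)
    ]


def pick_lag(scores: List[float], max_lag: int, min_score: float) -> Optional[int]:
    """Best lag in fields, or None when the peak is too weak to report."""
    peak = max(scores)
    if peak < min_score:
        return None
    return scores.index(peak) - max_lag


def normal_mode(scenes: Sequence[Scene], max_lag: int, min_score: float) -> List[Optional[int]]:
    """Each scene is measured on its own (analysis restarts at every new face)."""
    return [pick_lag(lag_scores(v, a, max_lag), max_lag, min_score) for v, a in scenes]


def continuous_mode(scenes: Sequence[Scene], max_lag: int, min_score: float) -> Optional[int]:
    """Scores from consecutive scenes are pooled before picking one common lag."""
    total = [0.0] * (2 * max_lag + 1)
    for v, a in scenes:
        for k, score in enumerate(lag_scores(v, a, max_lag)):
            total[k] += score
    return pick_lag(total, max_lag, min_score)


if __name__ == "__main__":
    # Two very short scenes, each with a single event and audio one field late.
    short_a = ([0, 1, 0, 0, 0], [0, 0, 1, 0, 0])
    short_b = ([0, 0, 0, 1, 0], [0, 0, 0, 0, 1])
    print(normal_mode([short_a, short_b], max_lag=2, min_score=2.0))
    print(continuous_mode([short_a, short_b], max_lag=2, min_score=2.0))
```

Neither short scene carries enough evidence on its own, so normal mode reports nothing for either, while the pooled scores in continuous mode yield a one-field offset.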

Measurement Offset

The LipTracker™ measurement window can be offset by up to ± 5 video frames in half-frame increments. This offset parameter is used when there is a known fixed delay in either the video path or the audio path feeding LipTracker™.

For example, an HD to SD downconverter will add delay to the video path, or a digital audio processor can add delay to the audio path. Using the appropriate value of measurement offset ensures that LipTracker™ operates in the center of its measurement range.
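One way to reason about choosing the setting is sketched below. It assumes the offset simply shifts the measurement window, and it uses a hypothetical sign convention (positive means audio arrives late relative to video, so a video-path delay is entered as a negative value). The helper names and the 1-frame downconverter delay in the usage example are illustrative, not product values.

```python
import math
from typing import Tuple

STANDARD_HALF_RANGE = 5.0     # ± frames covered in standard mode
MAX_MEASUREMENT_OFFSET = 5.0  # ± frames, adjustable in half-frame steps


def measurement_offset_for(known_delay_frames: float) -> float:
    """Pick a measurement offset that centers the window on a known fixed delay.

    Assumed sign convention: positive = audio late relative to video.
    The value is rounded to the nearest half frame and clamped to ±5 frames.
    """
    half_frames = math.floor(known_delay_frames * 2 + 0.5)  # round half up
    offset = half_frames / 2
    return max(-MAX_MEASUREMENT_OFFSET, min(MAX_MEASUREMENT_OFFSET, offset))


def measurement_window(offset: float, half_range: float = STANDARD_HALF_RANGE) -> Tuple[float, float]:
    """Range of audio offsets covered once the window is shifted by `offset`."""
    return (offset - half_range, offset + half_range)


if __name__ == "__main__":
    # Hypothetical: a downconverter delays video by 1 frame, so audio appears 1 frame early.
    offset = measurement_offset_for(-1.0)
    print(offset, measurement_window(offset))
```

With the window shifted to -6 to +4 frames, an audio offset of -1 frame caused by the downconverter sits at the center of the range instead of near one end.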

Measurement Response Time

LipTracker™ provides two modes of measurement response time - Normal and Fast. Normal mode is appropriate for most applications where the audio offset does not change significantly during a single speaker segment. Fast mode is used when the audio offset may have significant changes that occur frequently during single speaker segments.
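How LipTracker™ implements its Normal and Fast modes is not described here. A common way to trade responsiveness against stability is exponential smoothing of successive readings, sketched below as a swapped-in technique; the smoothing factors in `RESPONSE_ALPHA` are hypothetical.

```python
from typing import Iterable, List

# Hypothetical smoothing factors; not LipTracker's actual Normal/Fast settings.
RESPONSE_ALPHA = {"normal": 0.25, "fast": 0.75}


def smooth_readings(readings: Iterable[float], mode: str = "normal") -> List[float]:
    """Exponentially smooth successive offset readings.

    A larger alpha weights the newest reading more heavily, so the output
    follows changes in audio offset more quickly (Fast) at the cost of more
    reading-to-reading jitter.
    """
    alpha = RESPONSE_ALPHA[mode]
    smoothed: List[float] = []
    for value in readings:
        if not smoothed:
            smoothed.append(value)  # first reading is taken as-is
        else:
            previous = smoothed[-1]
            smoothed.append(previous + alpha * (value - previous))
    return smoothed


if __name__ == "__main__":
    step = [0.0, 0.0, 4.0, 4.0, 4.0]  # offset jumps by 4 frames mid-segment
    print("normal:", smooth_readings(step, "normal"))
    print("fast:  ", smooth_readings(step, "fast"))
```

After a sudden 4-frame change, the Fast setting reports 3.0 frames on the next reading while the Normal setting reports 1.0 frame, which is the behavior the two modes are meant to trade off.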

LipTracker Configuration
Each LipTracker™ 1RU frame is shipped with a breakout cable (see below), a keyboard and a mouse. Simply add a standard XGA monitor for a fully operational system.

Digital I/O
Video Input (SD mode): Standard Definition SDI video (SMPTE 259M-C)
Input Formats (SD mode): 480i59.94, 576i50
Video Input (HD mode): High Definition SDI video (SMPTE 292M-C)
Input Formats (HD mode): 720p59.94, 720p50, 1080i59.94, 1080i50
Input Connector: BNC (75 Ω) - rear of LipTracker™ frame

Audio Input: AES-3id unbalanced digital audio (SMPTE 276M)
Input Format: 1 AES pair
Input Connector: BNC (75 Ω) - breakout cable
Embedded Audio: The audio for analysis is internally de-embedded from the SDI input

Video Monitoring Outputs: 2 copies of the SDI video input
Output Connector: 2x BNC (75 Ω) - rear of LipTracker™ frame
Audio Monitoring Output: 1 copy of the AES-3id audio input
Output Connector: BNC (75 Ω) - breakout cable

Analog Monitoring Output
Video Output: Selectable between composite NTSC or PAL; Y, R-Y, B-Y (Betacam™ or SMPTE)
Output Connectors: 3 x BNC (75 Ω) - breakout cable
Audio Output: 1 balanced stereo pair
Output Connectors: 2 x XLR - breakout cable

LTC Input
Longitudinal time code can be used as a logging reference. An external converter must be used to convert the baseband time code to Sony™ serial format (RS-422).
Input Format: Sony™ serial (RS-422)
Input Connector: 9 pin D - breakout cable

Rear Panel Interface (option)
The breakout cable connections can be replaced with the optional 1RU Rear Panel Interface.


LipTracker™ and Pixel Instruments are trademarks of Pixel Instruments Corporation. Sony and Betacam are trademarks of Sony Corporation.

Features and specifications subject to change without notice. U.S. Patent Applications 20040227856, 20070153125, 20070153089 and other patents applied for.