LipTracker™ is a non-invasive measurement tool for in-service lip sync analysis. It operates
on the same principle as the human brain, comparing the timing of
sounds in the audio with mouth shapes in the video to measure the lip sync
error. LipTracker™ improves productivity by supplementing
an operator's subjective and time-consuming analysis of lip sync with
rapid, objective results measured in real time from the program material.
This easy-to-use measurement tool provides numeric and graphic displays
of the lip sync error, a history graph, status indicators and event logging.
LipTracker™ increases efficiency in systems design
and installation, daily operation and program quality assurance.
Audio offsets of up to ± 5 video frames can be measured in standard mode, or up to ± 20 frames in extended range mode. This unique approach of analyzing real-time video and audio content does not require the insertion of cues, codes or watermarks into the program stream. Because the program material is untouched, LipTracker™ can be used at any point in the transmission path.
Depending on the program content, the first result can be displayed in as little as 4 seconds after a face is detected. The result is then updated every 2 seconds until the current face is lost or a new face is detected. The history graph charts the most recent error profile and event logging saves the results for scene by scene analysis. The Audio Offset Status indicator is a visual warning of the current offset. User programmable thresholds determine whether the indicator is Green, Yellow or Red at any given offset reading.
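The threshold logic behind the Audio Offset Status indicator can be sketched as follows. This is a minimal illustration: the function name and the default threshold values are placeholders, not the product's actual settings, which are user programmable.

```python
def offset_status(offset_frames, yellow_threshold=1.0, red_threshold=2.0):
    """Map a measured audio offset (in video frames) to an indicator color.

    The threshold defaults here are illustrative only; in LipTracker the
    Green/Yellow/Red boundaries are set by the user.
    """
    magnitude = abs(offset_frames)  # offsets may be positive or negative
    if magnitude >= red_threshold:
        return "Red"
    if magnitude >= yellow_threshold:
        return "Yellow"
    return "Green"
```

With the illustrative thresholds above, a reading of half a frame stays Green, while an offset beyond two frames in either direction turns the indicator Red.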
Non-invasive analysis of lip sync errors up to ± 20 video frames by comparing video and audio Mutual Events (MuEvs)
Displays current lip sync error with numeric and graphic displays
First result displayed in as little as 4 seconds and updated every 2 seconds
Measurement offset parameter is used to compensate for known fixed delays in the video or audio being analyzed
Automatic face detection with point and click manual override for scenes with multiple faces
Audio Offset Status indicator provides a visual warning of the current offset
History display shows most recent error profile
Event logging for scene by scene analysis and archiving
Operates with SD or HD SDI video and AES-3id audio, or audio can be internally de-embedded from the SDI input
Digital and analog video and audio monitoring outputs
LipTracker™ searches frame by frame for a face in the input video. After finding a face, LipTracker™ automatically locks onto it and maintains lock during typical camera pans, tilts, zooms, and through the normal range of head motion. Minimum face height (from the top of the head to the bottom of the chin) is one quarter of the overall picture height.
In scenes with multiple faces, if LipTracker™ selects a non-speaking face for analysis, you can override the face selection by pointing to the correct face with the cursor and double clicking the mouse.
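The quarter-of-picture-height rule above amounts to a simple geometric check. As a sketch (the function name and pixel-based interface are illustrative, not part of the product):

```python
def face_trackable(face_height_px, picture_height_px):
    """Illustrative check of LipTracker's stated minimum face size:
    the face (top of head to bottom of chin) must span at least one
    quarter of the overall picture height."""
    return face_height_px >= picture_height_px / 4
```

For 1080-line HD material, for example, the face would need to span at least 270 lines to be tracked.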
Determining The MuEv Offset
The sounds and mouth shapes that are used for MuEv analysis are commonly found in the natural speech patterns of many languages. When a face is detected, the input video is processed by locating the upper and lower lips within the face and extracting the mouth shape characteristics to generate a field by field stream of video MuEvs.
LipTracker™ does not need to be "trained" in advance to recognize any particular voice. The input audio is normalized and processed with LipTracker™'s proprietary technology to generate a stream of audio MuEvs that are speaker independent.
The audio and video MuEv streams are then correlated to determine the measurement of the lip sync error that is displayed on the screen. The silence segments that occur in
the audio input are also identified to provide additional cues for measuring the lip sync error.
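The correlation step can be illustrated with a generic event-stream cross-correlation. This is only a sketch of the general idea, not Pixel Instruments' proprietary MuEv algorithm; representing the MuEv streams as field-rate sequences of 0/1 event flags is an assumption made for the example.

```python
def estimate_offset(video_muevs, audio_muevs, max_lag):
    """Return the lag (in fields) at which the audio event stream best
    aligns with the video event stream, via simple cross-correlation.

    Both streams are equal-length sequences of 0/1 event flags, one per
    field. A positive result means the audio events trail the video.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(video_muevs)
    for lag in range(-max_lag, max_lag + 1):
        # Count coincident events when the audio stream is shifted by `lag`.
        score = sum(
            video_muevs[i] * audio_muevs[i + lag]
            for i in range(n)
            if 0 <= i + lag < n
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Shifting a test pattern by two fields and correlating recovers the two-field offset, which is the essence of how matching sound and mouth-shape events yields the lip sync error.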
LipTracker™'s results can be archived for scene by scene analysis. When logging is enabled, the audio offset measurements are written to an HTML file and/or a comma delimited (.csv) file. The .csv files can be imported into a spreadsheet or other application for further analysis.
For each program segment that is analyzed, a thumbnail of the first frame is stored along with the segment start time, the time of each measurement and the audio offset at that
time. The system clock and/or VITC from the video input signal can be selected as the logfile time reference.
Longitudinal time code (LTC) can also be recorded in the log files via LipTracker™'s 9-pin serial port. An external converter is required to translate baseband LTC to Sony™ serial protocol.
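Because the .csv logs are designed for import into spreadsheets and other tools, they are straightforward to post-process. The sketch below assumes hypothetical column names and timecode values; the actual layout of LipTracker™'s log files may differ.

```python
import csv
from io import StringIO

# Hypothetical log excerpt -- column names and values are illustrative,
# not taken from an actual LipTracker log file.
SAMPLE_LOG = """\
segment_start,measurement_time,audio_offset_frames
01:00:00:00,01:00:00:04,+1.5
01:00:00:00,01:00:00:06,+1.5
01:00:12:10,01:00:12:14,-0.5
"""

def worst_offset(csv_text):
    """Scan a LipTracker-style log and return the largest-magnitude
    audio offset recorded, in video frames."""
    reader = csv.DictReader(StringIO(csv_text))
    return max((float(row["audio_offset_frames"]) for row in reader), key=abs)
```

A scan like this could flag the worst-offending scene in a long recording without stepping through the log by hand.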
LipTracker™ analysis uses a number of key sounds that have the same distinct mouth shapes in virtually all languages. Examples include the EE sound (street in English, Paris in
French), the OO sound (moon in English, fruta in Spanish) and the AA sound (palm in English, nacht in German). Therefore, LipTracker™ is not limited to operating with English speakers but is language independent.
Scene Change Processing
In the normal mode of operation, analysis automatically restarts when a new face is detected. Each newly detected face is assumed to come from a different source and therefore could have a different audio offset than the face that preceded it.
However, for those applications where consecutive scenes are known to have the same audio offset, the "continuous" mode of scene change processing can be used. In this mode, the video and audio MuEv streams from consecutive scenes are combined and averaged to produce the audio offset results. This mode is also useful when the individual scenes are too short to generate measurements.
The LipTracker™ measurement window can be offset by up to ± 5 video frames in half frame increments. This offset parameter is used when there is a known fixed delay in either the video path or the audio path feeding LipTracker™.
For example, an HD to SD downconverter will add delay to the video path, or a digital audio processor can add delay to the audio path. Using the appropriate value of measurement offset ensures that LipTracker™ operates in the center of its measurement range.
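The compensation arithmetic can be sketched as follows. The ± 5 frame range and half-frame steps come from the description above; the function itself, and its sign convention, are illustrative assumptions.

```python
def recenter_offset(video_path_delay_frames, audio_path_delay_frames):
    """Suggest a measurement-offset setting (in frames) to compensate a
    known fixed delay in the path feeding LipTracker, quantized to the
    half-frame steps and clamped to the +/-5-frame adjustment range.

    Sign convention is an assumption: a positive result compensates
    extra delay in the video path.
    """
    net = video_path_delay_frames - audio_path_delay_frames
    quantized = round(net * 2) / 2          # half-frame increments
    return max(-5.0, min(5.0, quantized))   # +/-5 frame adjustment range
```

For instance, if a downconverter adds two frames of video delay, setting the measurement offset to compensate those two frames keeps LipTracker™ operating at the center of its measurement range.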
Measurement Response Time
LipTracker™ provides two measurement response modes: Normal and Fast. Normal mode is appropriate for most applications, where the audio offset does not change significantly during a single speaker segment. Fast mode is used when the audio offset may change significantly and frequently during single speaker segments.
Each LipTracker™ 1RU frame is shipped with a breakout cable (see below), a keyboard and a mouse. Simply add a standard XGA monitor for a fully operational system.
Video Input (SD mode): Standard Definition SDI video (SMPTE 259M-C)
Input Formats (SD mode):
Video Input (HD mode):
Input Formats (HD mode):
Video Monitoring Outputs:
Audio Monitoring Output:
Analog Monitoring Outputs:
Video: Composite NTSC or PAL; YC NTSC or PAL; Y, R-Y, B-Y (Betacam™ or SMPTE) - 3 x BNC (75 Ω), breakout cable
Audio: 1 balanced stereo pair - 2 x XLR, breakout cable
Longitudinal time code can be used as a logging reference. An
external converter must be used to convert the baseband time code
to Sony™ serial format (RS-422).
Rear Panel Interface (option)
The breakout cable connections can be replaced with the optional 1RU Rear Panel Interface.
LipTracker™ and Pixel Instruments are trademarks of Pixel Instruments Corporation. Sony and Betacam are trademarks of Sony Corporation.
Features and specifications subject to change without notice. U.S. Patent Applications 20040227856, 20070153125, 20070153089 and other patents applied for.