The VAD Model in Emotion Recognition: Why Dimensions Beat Labels | EchoDepth Insights

Published: 10 April 2026 | Category: Emotion AI | Reading time: ~7 min

Most emotion recognition systems classify faces into one of six or seven discrete categories. The canonical list (happiness, sadness, anger, fear, disgust, surprise, and sometimes contempt) derives from Paul Ekman's cross-cultural research in the 1970s on basic emotions. That research was valuable for establishing that facial expressions have universal elements, but discrete emotion classification is a poor fit for operational environments. Real emotional states are blended, suppressed, and context-modified in ways that a single categorical label cannot capture. The VAD model (Valence-Arousal-Dominance) offers a better framework.

What Is the VAD Model?

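The Valence-Arousal-Dominance model was developed by psychologists Albert Mehrabian and James A. Russell in the 1970s, first in their 1974 book An Approach to Environmental Psychology and then in their 1977 paper "Evidence for a Three-Factor Theory of Emotions". They originally called it the Pleasure-Arousal-Dominance (PAD) model; "valence" is the later name for the pleasure axis. It proposes that any emotional state can be located in a three-dimensional space defined by three largely independent axes.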

Valence is the positivity–negativity dimension. It runs from very negative (distress, anger, grief) to very positive (joy, contentment, excitement). Valence captures whether an emotional state feels good or bad to the person experiencing it.

Arousal is the activation dimension. It runs from calm, low-energy states (sleep, boredom, contentment) to highly activated states (excitement, anxiety, panic). Arousal captures the intensity of an emotional state rather than its direction.

Dominance is the control dimension. It runs from submissive, powerless states (fear, helplessness) to dominant, in-control states (confidence, authority). Dominance captures the person's perceived agency in a situation.

Together, these three dimensions can represent the full emotional space without committing to discrete labels. With each axis scaled from −1 to +1, a VAD coordinate of (−0.7, +0.8, −0.3), meaning strongly negative valence, high arousal, and slightly low dominance, describes a state consistent with fear or acute stress. The same coordinate system can distinguish fear (low dominance) from anger (high dominance) even though the two have similar valence and arousal values.
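As a minimal sketch of that last point, the snippet below stores a VAD coordinate and separates two negative, high-arousal states by dominance alone. The class name, the function, and every threshold are illustrative assumptions rather than EchoDepth's implementation, and the axes are assumed to be normalised to the range −1 to +1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VAD:
    """A point in VAD space; each axis is assumed normalised to [-1.0, +1.0]."""
    valence: float
    arousal: float
    dominance: float


def fear_or_anger(state: VAD, dominance_split: float = 0.0) -> str:
    """Toy rule: separate negative, high-arousal states by dominance alone.

    The cut-offs (valence < 0, arousal > 0.5, dominance_split) are
    illustrative assumptions, not calibrated values.
    """
    if state.valence >= 0.0 or state.arousal <= 0.5:
        return "other"
    return "fear-like" if state.dominance < dominance_split else "anger-like"


acute_stress = VAD(valence=-0.7, arousal=0.8, dominance=-0.3)
confrontation = VAD(valence=-0.7, arousal=0.8, dominance=0.6)

print(fear_or_anger(acute_stress))   # fear-like
print(fear_or_anger(confrontation))  # anger-like
```

The two example states share the same valence and arousal; only the dominance axis tells them apart, which is the information a two-dimensional valence-arousal model would lose.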

Why Discrete Labels Fail in Operational Environments

The six basic emotions model works well in laboratory conditions where subjects are asked to pose expressions, or in consumer applications where high-level emotional categories are sufficient. In defence, intelligence, and security environments — interview rooms, SOC analyst stations, control rooms, training facilities — three specific problems make discrete labels inadequate.

First, operational emotional states are blended. A person under credibility assessment is simultaneously anxious about the process, concentrating on their answers, and potentially suppressing specific responses. This produces a facial configuration that does not map cleanly to any single basic emotion. A discrete classifier forces an assignment — "this frame is classified as fear" — that discards the information in the expression's actual complexity.

Second, discrete labels are not auditable. If a system outputs "angry" or "deceptive", that output cannot be decomposed, reviewed, or challenged. Which muscles activated? At what intensity? For how long? A VAD score such as V: −0.52, A: +0.81, D: +0.14, recorded alongside the AU evidence behind it, is a reproducible, queryable datum that documents exactly what the system measured.

Third, discrete classifiers cannot detect suppression. A person actively suppressing an emotional response will partially activate and then rapidly neutralise specific Action Units. This produces a VAD trajectory — a transient excursion into high arousal, high negative valence space, followed by rapid return — that is invisible to a discrete classifier but detectable in VAD time-series analysis.
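A transient excursion of this kind can be searched for with a simple scan over the per-frame series. The sketch below is one hedged way to do it, not EchoDepth's detector: the function name, the region thresholds, and the 15-frame limit (roughly half a second at an assumed 30 frames per second) are all hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[float, float]  # (valence, arousal) for one frame


def find_transient_excursions(
    frames: List[Frame],
    valence_max: float = -0.4,
    arousal_min: float = 0.6,
    max_frames: int = 15,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of short negative, high-arousal runs.

    A run is reported only if it lasts at most max_frames frames and the
    signal leaves the region again before the series ends. All thresholds
    are illustrative assumptions.
    """
    excursions = []
    start = None
    for i, (valence, arousal) in enumerate(frames):
        in_region = valence <= valence_max and arousal >= arousal_min
        if in_region and start is None:
            start = i  # excursion begins
        elif not in_region and start is not None:
            if i - start <= max_frames:
                excursions.append((start, i - 1))  # short run: transient
            start = None  # longer runs are sustained states, not transients
    return excursions


# Ten calm frames, a four-frame excursion, then ten calm frames.
series = [(0.0, 0.2)] * 10 + [(-0.6, 0.8)] * 4 + [(0.0, 0.2)] * 10
print(find_transient_excursions(series))  # [(10, 13)]
```

A per-frame discrete classifier would at most emit a few isolated labels here; the time-series view keeps the onset, duration, and return to baseline as one reviewable event.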

"The advantages of a dimensional over a categorical description of affect are well established in the psychological literature. Dimensions provide more information, handle blended states, and avoid the forced-choice problem of categorical classification."

— Russell & Barrett, Core Affect, Prototypical Emotional Episodes, and Other Things Called Emotion (1999)

How EchoDepth Maps FACS AUs to VAD Space

EchoDepth's pipeline connects the two frameworks: FACS Action Units provide the measurement layer; the VAD model provides the representational framework. Specific AU combinations map to known VAD regions through a learned mapping trained across 14 cultural cohorts and 6 countries.

The AU-to-VAD mapping is not a simple lookup table. It is a learned model that accounts for AU combinations, their relative intensities, temporal sequencing, and cultural calibration. AU12 (lip corner puller) alone does not map reliably to positive valence — context matters. AU12 in combination with AU6 (cheek raiser) — the Duchenne smile — maps with high confidence to positive valence and moderate positive arousal. AU12 without AU6, the social or posed smile, maps to a different VAD region.
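The learned mapping itself is not described here, so it cannot be reproduced. The toy sketch below only illustrates why AU12 on its own is ambiguous: it separates the two smile patterns named above using a hypothetical activation threshold and AU intensities assumed to be scaled from 0 to 1.

```python
from typing import Dict


def smile_type(au_intensity: Dict[str, float], active_at: float = 0.3) -> str:
    """Label a frame by its AU12/AU6 pattern.

    au_intensity maps AU names to intensities assumed scaled to [0, 1];
    active_at is an illustrative activation threshold, not a calibrated one.
    """
    au12_active = au_intensity.get("AU12", 0.0) >= active_at  # lip corner puller
    au6_active = au_intensity.get("AU6", 0.0) >= active_at    # cheek raiser
    if au12_active and au6_active:
        return "duchenne"  # AU12 + AU6
    if au12_active:
        return "social"    # AU12 without AU6
    return "no-smile"


print(smile_type({"AU12": 0.8, "AU6": 0.7}))  # duchenne
print(smile_type({"AU12": 0.8}))              # social
```

A real mapping would replace the fixed threshold with a model that also weighs relative intensities, timing, and cultural calibration, as the paragraph above describes.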

The output is a per-frame VAD triple, timestamped to ISO 8601 standard, with a confidence weighting reflecting the strength of the AU evidence. Downstream, this feeds into anomaly detection (sustained negative valence and high arousal relative to baseline), readiness scoring (arousal and dominance relative to mission-optimal range), and suppression detection (VAD trajectory analysis for transient negative excursions).
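One plausible shape for such a per-frame record is sketched below using only the Python standard library. The field names and the helper function are assumptions made for illustration; the article does not publish EchoDepth's actual output schema.

```python
import json
from datetime import datetime, timezone


def vad_record(frame_time: datetime, valence: float, arousal: float,
               dominance: float, confidence: float) -> str:
    """Serialise one frame's VAD triple as a JSON string.

    Field names are hypothetical. The timestamp is written in ISO 8601
    format with millisecond precision.
    """
    record = {
        "timestamp": frame_time.isoformat(timespec="milliseconds"),
        "valence": valence,
        "arousal": arousal,
        "dominance": dominance,
        "confidence": confidence,
    }
    return json.dumps(record)


frame_time = datetime(2026, 4, 10, 9, 30, 0, 250000, tzinfo=timezone.utc)
print(vad_record(frame_time, -0.52, 0.81, 0.14, 0.9))
```

Using a timezone-aware datetime keeps the UTC offset in the serialised timestamp, which matters when records from several sites are merged downstream.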

VAD Output in Practice: Three Use Cases

In credibility assessment, VAD time-series analysis identifies question-level stress and suppression patterns. A question that produces a transient spike in arousal and negative valence, followed by rapid neutralisation, is flagged for secondary review. The output is not "this person is lying" — it is "this question produced a statistically significant VAD excursion of magnitude X, duration Y, with AU combination Z."

In operator readiness monitoring, VAD arousal serves as the primary fatigue indicator. Sustained low arousal combined with increased eyelid droop AU activations signals fatigue onset. The system provides a continuous readiness score — not a binary pass/fail — enabling intervention before critical thresholds are breached.
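A continuous score of this kind could be built in many ways. The sketch below is one hypothetical construction, assuming that readiness is highest when arousal and dominance both sit inside a target band and falls off linearly outside it. The band limits, the falloff width, and both function names are invented for illustration.

```python
from typing import Tuple


def band_score(value: float, low: float, high: float, falloff: float = 0.5) -> float:
    """Return 1.0 inside [low, high], dropping linearly to 0.0 at `falloff` outside."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return max(0.0, 1.0 - distance / falloff)


def readiness(
    arousal: float,
    dominance: float,
    arousal_band: Tuple[float, float] = (0.2, 0.6),    # hypothetical target range
    dominance_band: Tuple[float, float] = (0.0, 0.7),  # hypothetical target range
) -> float:
    """Continuous readiness in [0, 1]; the weaker of the two axes dominates."""
    return min(band_score(arousal, *arousal_band),
               band_score(dominance, *dominance_band))


print(readiness(arousal=0.4, dominance=0.3))   # inside both bands
print(readiness(arousal=0.1, dominance=0.3))   # arousal slightly low
print(readiness(arousal=-0.5, dominance=0.3))  # arousal far below the band
```

Taking the minimum of the two axis scores means a drop on either axis lowers readiness immediately, so a gradual slide in arousal shows up as a declining score well before it reaches zero.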

In insider threat detection, VAD baseline deviation scoring surfaces individuals whose emotional state profiles deviate significantly from their established pattern. A person whose typical VAD readings sit in calm-neutral space, but who begins showing sustained negative valence and elevated arousal during routine access events, triggers an anomaly flag: not an accusation, but a signal for additional monitoring or review.
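Baseline deviation can be illustrated with a plain z-score on a single axis. This is a simplified stand-in, not the scoring method EchoDepth uses: the function names, the sample values, and the threshold of three standard deviations are all assumptions.

```python
from statistics import mean, stdev
from typing import List


def baseline_z_score(baseline: List[float], recent: List[float]) -> float:
    """Distance of the recent mean from the baseline mean, in baseline standard deviations."""
    spread = stdev(baseline)  # needs at least two baseline samples
    if spread == 0.0:
        raise ValueError("baseline has no variation; z-score is undefined")
    return (mean(recent) - mean(baseline)) / spread


def is_anomalous(baseline: List[float], recent: List[float],
                 threshold: float = 3.0) -> bool:
    """Flag when the recent window deviates by at least `threshold` deviations."""
    return abs(baseline_z_score(baseline, recent)) >= threshold


# Hypothetical arousal readings for one person during routine access events.
baseline_arousal = [0.10, 0.15, 0.12, 0.08, 0.11, 0.14]
recent_arousal = [0.55, 0.60, 0.58]

print(is_anomalous(baseline_arousal, recent_arousal))    # True
print(is_anomalous(baseline_arousal, baseline_arousal))  # False
```

Because the comparison is against the individual's own history, the same arousal reading can be unremarkable for one person and a flag for another.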

Related capability

See VAD output in EchoDepth's processing pipeline

44 FACS Action Units per frame. Real-time VAD scoring. SIEM-ready structured JSON output.
