Emotion recognition systems face a fundamental representational choice. Assign a single category label — angry, fearful, surprised — or represent emotional state as a point in continuous multidimensional space. The former is simpler to communicate; the latter is more accurate, more auditable, and operationally more useful. The Valence-Arousal-Dominance model is the established scientific framework for dimensional emotional representation, and it underpins EchoDepth's output architecture.
The three dimensions of the VAD model
The VAD model locates any emotional state in a three-dimensional space. Each dimension is an independent axis measured on a continuous scale, conventionally from −1 to +1. The axes are conceptually orthogonal — no dimension can be derived from the other two — so the three together span the emotional space without redundancy. (Observed ratings do show some statistical correlation between axes; independence here is a property of the model's definitions, not a claim that real scores never co-vary.)
Valence. The positive-to-negative quality of the emotional experience. Positive valence: joy, contentment, excitement. Negative valence: grief, anger, disgust, fear. Valence captures whether the state feels good or bad, not how intense it is.
Arousal. The activation or energy level of the emotional state. Low arousal: sleep, boredom, contentment. High arousal: excitement, anxiety, panic, rage. Arousal is independent of valence — sadness and contentment share low arousal despite opposite valence.
Dominance. The perceived control or power in the situation. Low dominance: fear, helplessness, awe. High dominance: confidence, authority, contempt. Dominance is the dimension most often omitted — and the one most critical for distinguishing operationally important states.
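The three axes can be made concrete with a small sketch. The following Python dataclass is a hypothetical illustration — the class name and bounds-checking behaviour are assumptions, not part of EchoDepth's API — showing a state as a VAD coordinate bounded to the conventional [−1, +1] range:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VADPoint:
    """An emotional state as a coordinate in VAD space.

    Each axis is conventionally bounded to [-1.0, +1.0]; values
    outside that range are rejected rather than silently clamped.
    """
    valence: float    # negative = unpleasant, positive = pleasant
    arousal: float    # low = calm/deactivated, high = activated
    dominance: float  # low = overwhelmed/submissive, high = in control

    def __post_init__(self):
        for name in ("valence", "arousal", "dominance"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} out of range [-1, 1]: {value}")

# Example coordinates, loosely drawn from the descriptions above
panic = VADPoint(valence=-0.7, arousal=0.9, dominance=-0.6)
contentment = VADPoint(valence=0.6, arousal=-0.4, dominance=0.3)
```

Note how panic and contentment differ on all three axes — a separation a single category label cannot express as a measurable distance.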
Origins: Mehrabian, Russell, and the dimensional tradition
The three-dimensional model of affect originates with psychologist Albert Mehrabian, who, with James Russell, characterised environmental and emotional responses along Pleasure, Arousal, and Dominance axes in the 1974 book An Approach to Environmental Psychology. Mehrabian's 1980 book Basic Dimensions for a General Psychological Theory formalised the framework as a general model of affect, now variously referred to as PAD (Pleasure-Arousal-Dominance) or VAD (Valence-Arousal-Dominance) — the frameworks are equivalent with minor terminological variation.
Russell subsequently developed the two-dimensional circumplex model of affect in his 1980 paper in the Journal of Personality and Social Psychology. Russell's circumplex uses Valence and Arousal axes to organise emotional concepts in a circular arrangement — calm sits at low arousal/positive valence, anxiety at high arousal/negative valence, and so on. The circumplex is elegant and well-validated but lacks the Dominance dimension, which limits its discriminative power for states that share valence and arousal coordinates.
The full VAD model incorporating Dominance has been validated across multiple cross-cultural research programmes. Bradley and Lang's Affective Norms for English Words (ANEW) dataset, widely used in affective computing, provides empirically derived VAD ratings for roughly 1,000 words. Warriner, Kuperman and Brysbaert extended this approach to nearly 14,000 English lemmas in 2013, producing the most comprehensive VAD lexicon currently available.
"The advantages of a dimensional over a categorical description of affect are well established in the psychological literature. Dimensions provide more information, handle blended states, and avoid the forced-choice problem inherent in categorical classification."
— Russell & Barrett, Core Affect, Prototypical Emotional Episodes, and Other Things Called Emotion (1999)
Why categorical models fail in security environments
The canonical discrete emotion framework derives from Paul Ekman's cross-cultural research in the 1960s and 1970s, which proposed six universal basic emotions: happiness, sadness, anger, fear, disgust, and surprise. Ekman's contribution to establishing that facial expressions have cross-cultural components is scientifically significant. As a basis for operational emotion recognition in defence and security contexts, however, discrete labelling has three specific failure modes.
The blending problem
Operational emotional states are routinely blended. A person under credibility assessment may be anxious about the process, concentrating on their answers, and actively managing their presentation simultaneously. The resulting facial configuration does not map to any single basic emotion. A discrete classifier must force an assignment — producing output that is both less accurate and less honest about that inaccuracy. VAD representation records the same expression as a coordinate: V: −0.48, A: +0.71, D: +0.22 — negative, activated, slightly in control — without committing to a label the evidence does not support.
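The forced-choice problem can be demonstrated directly. In the sketch below, the six prototype coordinates are illustrative values chosen for this example, not empirical norms; the point is the residual distance a nearest-label classifier silently discards:

```python
import math

# Hypothetical VAD prototypes for Ekman's six basic emotions.
# Coordinates are illustrative placements, not fitted values.
PROTOTYPES = {
    "happiness": ( 0.8,  0.5,  0.4),
    "sadness":   (-0.7, -0.3, -0.4),
    "anger":     (-0.6,  0.7,  0.6),
    "fear":      (-0.7,  0.7, -0.5),
    "disgust":   (-0.6,  0.3,  0.2),
    "surprise":  ( 0.1,  0.8,  0.0),
}

def forced_choice(v, a, d):
    """Assign the nearest basic-emotion label and report the residual
    distance -- the information a discrete classifier throws away."""
    def dist(p):
        return math.sqrt((v - p[0])**2 + (a - p[1])**2 + (d - p[2])**2)
    label = min(PROTOTYPES, key=lambda k: dist(PROTOTYPES[k]))
    return label, round(dist(PROTOTYPES[label]), 3)

# The blended state from the text: anxious, concentrating, self-managing
label, residual = forced_choice(-0.48, 0.71, 0.22)
```

Under these illustrative prototypes the blended coordinate is labelled "anger" despite sitting almost 0.4 units from that prototype — the classifier reports the label and nothing about how poorly it fits.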
The suppression problem
Individuals in high-stakes environments frequently suppress emotional responses. Suppression does not eliminate the emotional state; it produces a characteristic pattern of partial AU activation followed by rapid neutralisation — what Ekman termed a micro-expression. This transient VAD excursion — a brief spike into high arousal, negative valence space, followed by rapid return to baseline — is detectable in VAD time-series analysis but invisible to a discrete classifier operating frame-by-frame. The temporal trajectory is the signal; discrete labels discard it.
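A minimal sketch of the trajectory idea follows. The threshold and window values are arbitrary illustrations, not EchoDepth parameters, and the detector itself is a simplification of time-series excursion analysis:

```python
def find_suppression_events(arousal, baseline, spike=0.5, window=3):
    """Flag transient arousal excursions: a sample that jumps at least
    `spike` above baseline and returns to within `spike` of baseline
    within the next `window` samples. A frame-by-frame classifier sees
    isolated frames; here the trajectory itself is the signal."""
    events = []
    for i, a in enumerate(arousal):
        if a - baseline >= spike:
            tail = arousal[i + 1 : i + 1 + window]
            if tail and all(x - baseline < spike for x in tail):
                events.append(i)
    return events

# Synthetic per-frame arousal trace: calm, one brief spike, calm again
trace = [0.05, 0.02, 0.04, 0.78, 0.12, 0.06, 0.03, 0.05]
events = find_suppression_events(trace, baseline=0.0)
```

On this synthetic trace the detector flags frame 3: a spike that neutralises within three frames, exactly the excursion-and-return shape a per-frame discrete label cannot represent.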
The auditability problem
A system that outputs "deceptive" or "angry" provides no basis for review, challenge, or escalation. What evidence produced that output? At what intensity? For how long? VAD coordinates are reproducible, timestamped data. A finding of sustained negative valence and elevated arousal relative to individual baseline across a defined question sequence is a documentable, reviewable observation. This is not merely a compliance consideration — it is the difference between intelligence and assertion.
How EchoDepth maps FACS Action Units to VAD space
EchoDepth's processing pipeline connects two established frameworks: FACS (Facial Action Coding System) provides the measurement layer; VAD provides the representational output layer. The system measures 44 Action Units per video frame at up to 30fps, derives AU combination patterns with intensity ratings, and produces a per-frame VAD triple.
The AU-to-VAD mapping is a learned model, not a lookup table. It accounts for AU combinations, relative intensities, temporal sequencing, and cultural calibration across 14 cohorts. Key mappings illustrate how the model works:
| AU combination | Expression type | Approximate VAD region |
|---|---|---|
| AU6 + AU12 | Duchenne (genuine) smile | V: +0.8, A: +0.4, D: +0.5 |
| AU12 alone | Social (posed) smile | V: +0.3, A: +0.1, D: +0.4 |
| AU1 + AU4 + AU15 | Sadness / distress | V: −0.7, A: −0.2, D: −0.4 |
| AU4 + AU5 + AU7 + AU23 | Anger | V: −0.6, A: +0.7, D: +0.6 |
| AU1 + AU2 + AU4 + AU5 + AU20 + AU26 | Fear | V: −0.7, A: +0.7, D: −0.5 |
| AU4 + AU7 + AU23 + AU24 (suppressed) | Controlled anger / suppression | V: −0.4, A: +0.5, D: +0.3 (transient) |
The Dominance dimension is what separates fear from anger in this mapping. Both share high negative valence and high arousal. But AU5 (upper lid raiser) + AU20 (lip stretcher) in the fear configuration produces low dominance, while AU23 (lip tightener) + AU24 (lip pressor) in the anger configuration produces high dominance. This distinction is operationally critical and systematically lost in discrete six-emotion classification.
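Since the production mapping is a learned model, the table can only be mocked, but a frozen lookup of its illustrative rows is useful for experimenting with the coordinates — the dictionary below is a sketch under that assumption, not the system's mapping:

```python
# Toy exact-match lookup mirroring the illustrative table rows above.
AU_TO_VAD = {
    frozenset({6, 12}):              ( 0.8,  0.4,  0.5),  # Duchenne smile
    frozenset({12}):                 ( 0.3,  0.1,  0.4),  # social smile
    frozenset({1, 4, 15}):           (-0.7, -0.2, -0.4),  # sadness/distress
    frozenset({4, 5, 7, 23}):        (-0.6,  0.7,  0.6),  # anger
    frozenset({1, 2, 4, 5, 20, 26}): (-0.7,  0.7, -0.5),  # fear
    frozenset({4, 7, 23, 24}):       (-0.4,  0.5,  0.3),  # suppressed anger
}

def lookup_vad(active_aus):
    """Return the (V, A, D) region for an AU set, or None if unmapped."""
    return AU_TO_VAD.get(frozenset(active_aus))

fear = lookup_vad([1, 2, 4, 5, 20, 26])
anger = lookup_vad([4, 5, 7, 23])
# Fear and anger share negative valence and high arousal;
# only the dominance axis (index 2) separates them.
```

Comparing `fear[2]` and `anger[2]` makes the point in code: two states a six-emotion classifier must treat as entirely distinct categories differ, in this representation, along exactly one measurable axis.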
VAD output format and downstream applications
EchoDepth produces per-frame VAD triples timestamped to ISO 8601 standard, with a confidence weighting reflecting the strength of AU evidence. Typical output structure:
```json
{
  "timestamp": "2026-05-07T14:32:01.083Z",
  "frame": 4127,
  "valence": -0.52,
  "arousal": 0.81,
  "dominance": 0.14,
  "confidence": 0.91,
  "aus_active": ["AU4", "AU5", "AU7", "AU17", "AU20"],
  "baseline_deviation": {
    "valence_delta": -0.68,
    "arousal_delta": 0.54
  }
}
```
This structured output feeds three downstream analytical functions. Anomaly detection compares per-frame and windowed VAD coordinates against individual baselines, flagging sustained or acute deviations that exceed configurable thresholds. Readiness scoring tracks arousal and dominance within mission-optimal ranges, providing continuous crew or operator state monitoring. Trajectory analysis processes VAD time-series across question or event sequences to detect suppression patterns — transient excursions that indicate an emotional response was experienced and deliberately neutralised.
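A hypothetical downstream consumer of the per-frame record might look like the following — the threshold values and function name are illustrative assumptions, not product defaults:

```python
import json

# The per-frame record structure shown earlier in this article
RECORD = """{
  "timestamp": "2026-05-07T14:32:01.083Z",
  "frame": 4127,
  "valence": -0.52,
  "arousal": 0.81,
  "dominance": 0.14,
  "confidence": 0.91,
  "aus_active": ["AU4", "AU5", "AU7", "AU17", "AU20"],
  "baseline_deviation": {"valence_delta": -0.68, "arousal_delta": 0.54}
}"""

def flag_anomaly(record, v_thresh=-0.5, a_thresh=0.5, min_conf=0.8):
    """Flag a frame whose deviation from individual baseline exceeds
    configurable thresholds, skipping low-confidence frames."""
    if record["confidence"] < min_conf:
        return False
    dev = record["baseline_deviation"]
    return dev["valence_delta"] <= v_thresh and dev["arousal_delta"] >= a_thresh

frame = json.loads(RECORD)
flagged = flag_anomaly(frame)
```

Because every flag traces back to timestamped coordinates and explicit thresholds, the decision is reviewable — the auditability property discussed above, expressed as a code path.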
VAD in defence contexts: three operational applications
Personnel security and credibility assessment
In credibility assessment interviews, VAD time-series analysis identifies question-level stress and suppression patterns. A question that produces a transient spike in arousal (+0.6 delta) and negative valence (−0.5 delta) followed by rapid neutralisation within 400ms is flagged for review. The output documents what was observed — not a deception finding, but a quantified emotional response that warrants follow-up.
Operator readiness monitoring
In control room and SOC environments, VAD arousal serves as the primary continuous readiness indicator. Sustained low arousal combined with a rising AU45 (blink) rate signals fatigue onset before performance degradation occurs. The Dominance dimension adds a secondary indicator: declining dominance scores during routine tasks may reflect growing cognitive load or confidence loss.
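A readiness check of this kind reduces to band tests on the relevant axes. The band boundaries below are hypothetical values for illustration, not mission-calibrated defaults:

```python
# Illustrative mission-optimal bands (assumed values, not product defaults)
AROUSAL_BAND = (-0.1, 0.6)   # below: fatigue risk; above: stress risk
DOMINANCE_FLOOR = 0.0        # sustained readings below this warrant review

def readiness_flags(arousal, dominance):
    """Return human-readable flags for one operator VAD sample."""
    flags = []
    if arousal < AROUSAL_BAND[0]:
        flags.append("low-arousal: possible fatigue onset")
    elif arousal > AROUSAL_BAND[1]:
        flags.append("high-arousal: possible stress")
    if dominance < DOMINANCE_FLOOR:
        flags.append("low-dominance: possible overload or confidence loss")
    return flags

flags = readiness_flags(arousal=-0.35, dominance=-0.1)
```

In practice these checks would run over windowed averages rather than single samples, so that a momentary dip does not trigger a flag.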
Insider threat and behavioural baseline monitoring
In insider threat programmes, individual VAD baselines are established over time and deviation scoring surfaces anomalous emotional state patterns during access events. An individual whose typical VAD profile is calm-neutral (V: +0.1, A: −0.2, D: +0.3) who begins showing sustained elevated arousal and negative valence during standard system access warrants review — not accusation, but proportionate attention.
44 Action Units. Real-time VAD scoring. SIEM-ready output.
See how EchoDepth's processing pipeline moves from raw video to structured VAD time-series — and how that output integrates with existing defence and security infrastructure.