Deepfake technology has reached the point where individual video frames of synthetic faces are visually indistinguishable from genuine photographs, both to humans and to many detection systems. The weakness of deepfake generators is not in individual frames but in the temporal relationships between frames. Genuine facial expressions follow physiologically constrained timing patterns that current diffusion models and GANs cannot faithfully replicate. This is the basis of EchoDepth's POKERFACE deepfake detection module.
Why Individual-Frame Detection Is No Longer Sufficient
Early deepfake detection methods focused on artefacts visible in individual frames: inconsistent lighting, blurring at face boundaries, texture anomalies, and eye blinking anomalies. As deepfake generation technology improved — particularly with the adoption of diffusion models — many of these frame-level artefacts became harder to detect reliably.
Research from multiple academic groups, including MIT CSAIL, has documented the rapid decline in frame-level detection accuracy as generation models improve. A detection method that achieves 95% accuracy against 2022-era deepfakes may perform substantially worse against 2025-era generations.
Temporal analysis addresses this limitation by examining properties that are constrained not by the sophistication of the generation model, but by the underlying physiology of the human face — constraints that AI generation models do not currently model.
The Neuromuscular Constraints Deepfakes Cannot Fake
Genuine facial expressions are produced by specific muscles, and those muscles have characteristic activation properties: onset rates, peak durations, offset rates, and fatigue patterns. The Facial Action Coding System (FACS) catalogues these at the level of individual Action Units.
A genuine Duchenne smile (AU6 + AU12) has characteristic temporal properties: AU12 (lip corner puller) activates faster than AU6 (cheek raiser); the peak is held for a duration correlated with the social context; and the offset follows a characteristic decay pattern. These timing relationships are governed by neuromuscular physiology — the speed at which specific muscle fibres contract and relax, the interplay between agonist and antagonist muscles, and the modulation of expression intensity over time.
Current deepfake generators operate primarily in the spatial domain — they produce spatially plausible frames. They do not model the temporal dynamics of individual AU activations with physiological accuracy. The result is that even visually plausible deepfake video tends to show AU activation patterns with temporal signatures that deviate from genuine expressions in measurable ways: abrupt onset/offset, implausible AU co-activation patterns, and temporal jitter in AU intensity that differs from genuine neuromuscular activation.
"Temporal analysis of facial action units provides detection robustness that is inherently tied to the physiology of the human face rather than to the specific failure modes of current generation models — making it more generalisable to future generation technologies."
— EchoDepth POKERFACE Technical Documentation, 2026
How POKERFACE Implements Temporal AU Analysis
EchoDepth's POKERFACE module processes video frame sequences rather than individual frames. For each sequence, it extracts 44 FACS-compliant Action Units per frame, builds temporal activation profiles for each AU, and analyses the resulting time-series against known distributions of genuine expression temporal dynamics.
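As a rough illustration of the per-AU profile analysis described above (this is a sketch, not the POKERFACE implementation; the function names, reference statistics, and 3-sigma cutoff are all assumptions), one detection signal — onset-rate deviation — might be computed like this:

```python
# Hypothetical sketch: building per-AU temporal activation profiles from
# per-frame AU intensity estimates and flagging AUs whose onset rate
# deviates from a reference distribution of genuine expressions.
import numpy as np

def onset_rate(profile: np.ndarray, fps: float) -> float:
    """Maximum frame-to-frame increase in AU intensity, in units/second."""
    return float(np.max(np.diff(profile)) * fps)

def flag_deviant_aus(au_profiles: dict[str, np.ndarray], fps: float,
                     reference: dict[str, tuple[float, float]]) -> list[str]:
    """Flag AUs whose onset rate falls outside a (mean, std) reference band."""
    flagged = []
    for au, profile in au_profiles.items():
        mean, std = reference[au]
        z = abs(onset_rate(profile, fps) - mean) / std
        if z > 3.0:  # illustrative 3-sigma cutoff, not a documented threshold
            flagged.append(au)
    return flagged
```

In a real pipeline the reference bands would be estimated per AU from a corpus of genuine expressions, and the same structure extends to offset rates and peak durations.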
Key detection signals include:
- AU onset/offset rate deviation: abrupt or implausibly smooth activation changes that differ from genuine muscle contraction dynamics
- AU co-activation implausibility: combinations of AUs that rarely co-occur in genuine expressions, such as AU12 (lip corner puller) and AU20 (lip stretcher) activating simultaneously
- AU intensity jitter: the characteristic micro-variations in muscle activation intensity that reflect genuine neuromuscular control, which are often absent in generated video
- Inter-AU timing relationships: the relative timing of different AU activations in compound expressions, which follow consistent physiological patterns in genuine faces
- Blink pattern analysis: blinking frequency and duration patterns that deviate from genuine baseline
POKERFACE produces a per-sequence confidence score with per-AU temporal deviation flagging — output that enables an analyst to understand not just whether a video is flagged, but which specific temporal patterns triggered the detection.
Applications in Defence and Intelligence
The primary defence and intelligence applications for deepfake detection include: verification of source video in intelligence assessments, screening of communications and video evidence for synthetic generation, counter-OSINT analysis, and verification of remotely-presented identities in credentialing and briefing contexts.
POKERFACE deploys fully on-premise with no cloud dependency — suitable for SCIF and air-gapped environments. Output is structured JSON with ISO 8601 timestamps, confidence weightings, and per-AU temporal deviation data — compatible with intelligence document formats and legal proceedings.
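To illustrate the shape such output might take (the field names and values below are invented for illustration and are not the documented POKERFACE schema):

```json
{
  "sequence_id": "example-0001",
  "analysed_at": "2026-01-15T09:30:00Z",
  "confidence": 0.93,
  "flagged_aus": [
    {"au": "AU06", "signal": "onset_rate_deviation", "deviation_sigma": 4.2},
    {"au": "AU12", "signal": "intensity_jitter_low", "deviation_sigma": 3.6}
  ]
}
```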
The system integrates with cyber security workflows via REST API and WebSocket, enabling automated deepfake screening of incoming video materials as part of existing SIEM and SOAR pipelines.
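A client-side sketch of such an integration might look like the following. The endpoint URL, payload fields, and module selector are assumptions for illustration; the actual POKERFACE API is not described here:

```python
# Hypothetical sketch of submitting a video for automated screening via a
# REST endpoint, using only the Python standard library. The URL, payload
# shape, and "au_temporal" selector are invented for this example.
import json
import urllib.request

def build_screening_request(video_url: str, callback: str) -> urllib.request.Request:
    payload = json.dumps({
        "video_url": video_url,
        "callback_url": callback,     # where the SOAR pipeline receives results
        "analysis": ["au_temporal"],  # illustrative module selector
    }).encode()
    return urllib.request.Request(
        "https://pokerface.internal/api/v1/screen",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A SOAR playbook would send the request with `urllib.request.urlopen` (or an async HTTP client) and route the callback payload into the alerting pipeline.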
Deepfake detection for defence, intelligence, and cyber security
POKERFACE — AU temporal coherence analysis. SCIF-compatible. Structured evidential output. Integrates with SIEM and SOAR platforms.