Time Series Analysis for Transaction Monitoring
Using LSTM and Transformer models to detect temporal patterns and anomalies in transaction sequences, identifying money laundering schemes that evolve over time.
The Temporal Dimension of Money Laundering
Money laundering is inherently sequential. Placement, layering, and integration occur over time, often spanning days or weeks. Traditional AML systems analyze transactions in isolation, missing patterns that only emerge when viewing the temporal sequence. Time series models address this blind spot.
Why Time Series Models?
Certain money laundering patterns are fundamentally temporal:
- Structuring Over Time: Small deposits made daily to stay below thresholds
- Layering Sequences: Funds moved through accounts in specific order
- Velocity Changes: Sudden bursts of activity after dormancy
- Cyclic Patterns: Repeating schemes with predictable timing
- Sequential Dependencies: Transaction N depends on transaction N-1
LSTM for Sequence Modeling
Long Short-Term Memory networks excel at learning patterns in sequences. They maintain internal state that captures long-term dependencies.
Architecture
Our LSTM Configuration
Input: Last 180 days of transactions (variable length)
  ↓
Embedding Layer: Convert transaction features to 128-dim vectors
  ↓
LSTM Layer 1: 256 hidden units, return sequences
  ↓
Dropout: 0.3
  ↓
LSTM Layer 2: 128 hidden units, return sequences
  ↓
Dropout: 0.3
  ↓
LSTM Layer 3: 64 hidden units, return final state
  ↓
Dense Layer: 32 units, ReLU
  ↓
Output Layer: Risk score (0-1)
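As a rough illustration of this stack, the PyTorch sketch below wires up the same layer sizes. The input width (`n_features`) and other details are assumptions for the example, not our production code.

```python
import torch
import torch.nn as nn

class TransactionLSTM(nn.Module):
    """Sketch of the three-layer LSTM risk scorer outlined above."""

    def __init__(self, n_features: int = 32):   # n_features is an assumed input width
        super().__init__()
        # "Embedding Layer": project raw transaction features to 128 dims.
        self.embed = nn.Linear(n_features, 128)
        self.lstm1 = nn.LSTM(128, 256, batch_first=True)
        self.lstm2 = nn.LSTM(256, 128, batch_first=True)
        self.lstm3 = nn.LSTM(128, 64, batch_first=True)
        self.drop = nn.Dropout(0.3)
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):                          # x: (batch, seq_len, n_features)
        h = self.embed(x)
        h, _ = self.lstm1(h)                       # return sequences
        h = self.drop(h)
        h, _ = self.lstm2(h)                       # return sequences
        h = self.drop(h)
        _, (h_n, _) = self.lstm3(h)                # keep only the final state
        return torch.sigmoid(self.head(h_n[-1]))  # risk score in (0, 1)
```

Returning only the final hidden state of the last LSTM layer mirrors the "return final state" step before the dense head.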
What LSTM Learns
- Normal Sequences: Typical transaction ordering for different entity types
- Anomalous Patterns: Deviations from learned sequences
- Temporal Dependencies: How current transaction relates to previous ones
- Long-Range Effects: Events weeks apart that are connected
Example: Structuring Detection
Sequence: Customer makes deposits of $9,800, $9,900, $9,700, $9,850 over 4 days
LSTM Analysis:
- Recognizes amounts consistently just below $10K threshold
- Detects regular timing (daily pattern)
- Compares to customer's historical sequence (normally 1-2 deposits/month)
- Flags entire sequence as high-risk structuring
Transformer Models for Transactions
Transformers use attention mechanisms to capture relationships between any two points in a sequence, not just adjacent ones.
Advantages Over LSTM
- Parallel Processing: Transformers process the entire sequence simultaneously (faster training)
- Long-Range Dependencies: Attention can connect transactions months apart
- Interpretability: Attention weights show which past transactions influenced current prediction
Transformer Architecture
Input: Transaction sequence (up to 512 transactions)
  ↓
Positional Encoding: Add temporal position information
  ↓
Multi-Head Attention: 8 attention heads, 128-dim each
  ↓
Feed-Forward Network: 512 hidden units
  ↓
Layer Norm + Residual
  ↓
[Repeat above block 6 times]
  ↓
Global Average Pooling
  ↓
Classification Head: Risk score
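A minimal PyTorch sketch of this encoder follows. It assumes a 128-dim model width shared across the 8 heads, a learned positional embedding, and an assumed input width; the exact head dimensions, dropout, and pooling in production may differ.

```python
import torch
import torch.nn as nn

class TransactionTransformer(nn.Module):
    """Sketch of the 6-block encoder outlined above (sizes are illustrative)."""

    def __init__(self, n_features: int = 32, d_model: int = 128, max_len: int = 512):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        # Learned positional embedding standing in for the positional-encoding step.
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, dim_feedforward=512,
            dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=6)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, pad_mask=None):           # x: (batch, seq_len, n_features)
        h = self.embed(x) + self.pos[:, : x.size(1)]
        h = self.encoder(h, src_key_padding_mask=pad_mask)
        h = h.mean(dim=1)                          # global average pooling
        return torch.sigmoid(self.head(h))         # risk score
```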
Attention Visualization
When analyzing Transaction T, the model pays attention to the following earlier transactions (a sketch of how such weights can be inspected follows the list):
- Transaction T-1 (yesterday): 0.42 weight - immediate predecessor
- Transaction T-7 (last week): 0.28 weight - similar amount and counterparty
- Transaction T-30 (last month): 0.18 weight - start of suspicious pattern
- Other transactions: 0.12 weight combined
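To see how such weights can be read out, the snippet below queries a single PyTorch attention layer and prints the distribution for the most recent transaction; the layer and shapes are illustrative, not our deployed model.

```python
import torch
import torch.nn as nn

# One attention layer over an already-embedded sequence of 30 transactions.
attn = nn.MultiheadAttention(embed_dim=128, num_heads=8, batch_first=True)
seq = torch.randn(1, 30, 128)                  # (batch, seq_len, embed_dim)

out, weights = attn(seq, seq, seq,
                    need_weights=True, average_attn_weights=True)
# weights has shape (batch, seq_len, seq_len); the last row shows how much
# the most recent transaction attends to each earlier one.
print(weights[0, -1])
```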
Feature Engineering for Time Series
What features do we feed into these models? The two lists below cover transaction-level and sequence-level features; a short feature-engineering sketch follows them.
Transaction-Level Features
- Amount: Log-transformed, normalized
- Transaction Type: Deposit, withdrawal, transfer (one-hot encoded)
- Counterparty: Hashed entity ID
- Time of Day: Hour (0-23), business hours flag
- Day of Week: Cyclic encoding (sin/cos)
Sequence-Level Features
- Time Gaps: Seconds between transactions
- Cumulative Amounts: Running total over window
- Velocity: Transaction count in past 24h, 7d, 30d
- Direction Changes: Deposits to withdrawals ratio
- Counterparty Diversity: Unique entities in sequence
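As a hedged illustration, the pandas sketch below derives several of these features for a single entity's history. The column names (timestamp, amount, counterparty_id) and window sizes are assumptions for the example.

```python
import numpy as np
import pandas as pd

def build_sequence_features(txns: pd.DataFrame) -> pd.DataFrame:
    """Derive example features for one entity's transaction history.

    Assumes columns: timestamp (datetime), amount, counterparty_id (numeric hash).
    """
    df = txns.sort_values("timestamp").copy()

    # Transaction-level features
    df["log_amount"] = np.log1p(df["amount"])
    df["hour"] = df["timestamp"].dt.hour
    df["business_hours"] = df["hour"].between(9, 17).astype(int)
    dow = df["timestamp"].dt.dayofweek
    df["dow_sin"] = np.sin(2 * np.pi * dow / 7)   # cyclic day-of-week encoding
    df["dow_cos"] = np.cos(2 * np.pi * dow / 7)

    # Sequence-level features
    df["gap_seconds"] = df["timestamp"].diff().dt.total_seconds().fillna(0)
    df["cum_amount_30d"] = df.rolling("30D", on="timestamp")["amount"].sum()
    df["velocity_24h"] = df.rolling("24h", on="timestamp")["amount"].count()
    df["counterparty_diversity"] = (
        df["counterparty_id"].expanding().apply(lambda s: s.nunique())
    )
    return df
```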
Anomaly Detection in Sequences
LSTM Autoencoders learn to reconstruct normal transaction sequences. High reconstruction error indicates anomalous patterns.
LSTM Autoencoder
Encoder: Transaction sequence → Compressed representation (32-dim)
Decoder: Compressed representation → Reconstructed sequence
Training: Minimize reconstruction error on normal sequences
Inference: High error = anomalous sequence

Example:
Normal sequence reconstruction error: 0.03
Anomalous sequence error: 0.47 → FLAG
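A minimal PyTorch sketch of such an autoencoder, assuming a 32-dim latent state and per-sequence mean squared error as the anomaly score:

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """Minimal LSTM autoencoder: compress a sequence, then reconstruct it."""

    def __init__(self, n_features: int = 32, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, n_features, batch_first=True)

    def forward(self, x):                          # x: (batch, seq_len, n_features)
        _, (h_n, _) = self.encoder(x)              # compress to the final hidden state
        latent = h_n[-1]                           # (batch, latent_dim)
        # Repeat the compressed representation at every timestep and decode.
        repeated = latent.unsqueeze(1).expand(-1, x.size(1), -1)
        recon, _ = self.decoder(repeated)
        return recon

def reconstruction_error(model, x):
    """Per-sequence mean squared error; high values flag anomalous sequences."""
    with torch.no_grad():
        recon = model(x)
    return ((recon - x) ** 2).mean(dim=(1, 2))
```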
Real-World Use Cases
Use Case 1: Layering Detection
Pattern: Funds deposited → immediately split across 5 accounts → consolidated in an offshore account 3 days later
Detection: LSTM recognizes unusual split-consolidate temporal pattern
Result: Flagged $2.3M layering scheme, 7 linked accounts identified
Use Case 2: Dormant Account Reactivation
Pattern: Account inactive for 2 years → suddenly receives $500K → funds dispersed in 3 days
Detection: Transformer attention mechanism identifies abrupt change in account behavior
Result: Detected money mule account takeover
Training Strategies
Handling Variable-Length Sequences
- Padding: Pad short sequences to max length
- Masking: Mask padded positions so model ignores them
- Bucketing: Group sequences by similar length for efficient batching
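The PyTorch snippet below sketches padding and masking for a toy batch, plus sequence packing for LSTMs; the shapes and lengths are illustrative.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Three variable-length sequences of 32-dim transaction features.
seqs = [torch.randn(30, 32), torch.randn(120, 32), torch.randn(75, 32)]
lengths = torch.tensor([s.size(0) for s in seqs])

# Padding: align every sequence in the batch to the longest one.
padded = pad_sequence(seqs, batch_first=True)            # (3, 120, 32)

# Masking: True at padded positions so attention and loss can ignore them.
pad_mask = torch.arange(padded.size(1))[None, :] >= lengths[:, None]

# For LSTMs, packing skips the padded timesteps entirely.
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
```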
Addressing Class Imbalance
- Oversampling: Replicate rare suspicious sequences
- SMOTE for Sequences: Generate synthetic suspicious sequences
- Weighted Loss: Penalize false negatives more heavily
- Focal Loss: Focus learning on hard-to-classify sequences
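As an example of the loss-side options, here is a standard binary focal-loss sketch alongside a weighted BCE; the alpha, gamma, and class-ratio values are placeholders, not our tuned settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples, focuses on hard ones."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                     # model's probability for the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

# Weighted-loss alternative: penalize missed suspicious sequences more heavily.
pos_weight = torch.tensor([20.0])             # assumed ~1:20 suspicious-to-normal ratio
weighted_bce = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```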
Performance Metrics
LSTM Model
- Sequence length: up to 180 days
- Training time: 12 hours (100M sequences)
- Inference: 15ms per sequence
- AUC-ROC: 0.96
Transformer Model
- Sequence length: up to 512 transactions
- Training time: 8 hours (GPU parallelization)
- Inference: 25ms per sequence
- AUC-ROC: 0.97
Ensemble with Other Models
Time series models work best in combination with other approaches:
- LSTM: Captures temporal patterns in individual entity sequences
- GNN: Analyzes network structure and entity relationships
- Isolation Forest: Detects statistical outliers in aggregated features
- Rule-Based: Catches known typologies
The final risk score is a weighted ensemble of all models, with the weights optimized for maximum precision at 95% recall.
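A minimal sketch of such a weighted combination is shown below; the weights are placeholders rather than the tuned production values.

```python
import numpy as np

def ensemble_risk_score(lstm_score, gnn_score, iforest_score, rule_score,
                        weights=(0.35, 0.30, 0.15, 0.20)):
    """Weighted combination of the four model scores; weights are placeholders."""
    scores = np.array([lstm_score, gnn_score, iforest_score, rule_score])
    return float(np.dot(weights, scores))
```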
Implementation Considerations
- Computational Cost: LSTMs and Transformers require GPUs for real-time inference at scale
- Data Storage: Need to store full transaction history (180 days+)
- Retraining Frequency: Monthly retraining as patterns evolve
- Explainability: Attention weights and sequence visualizations for analysts
Conclusion
Money laundering is a sequential process, and time series models capture this temporal dimension that traditional approaches miss. At nerous.ai—where our name embodies the ingenuity and brilliance of Finnish innovation—we've deployed LSTM and Transformer models that detect sophisticated schemes evolving over weeks or months.
The result: 40% improvement in detecting layering schemes, 60% reduction in false positives for velocity-based rules, and analyst tools that visualize exactly how patterns evolved over time.
Dr. Sarah Chen
Chief AI Scientist at nerous.ai
Sarah leads ML research at nerous.ai, specializing in time series models and sequential pattern detection for financial crime prevention.
Detect Temporal Patterns with AI
See how time series models catch schemes that evolve over weeks and months.
Schedule Demo →