Time Series Analysis for Transaction Monitoring
Using LSTM and Transformer models to detect temporal patterns and anomalies in transaction sequences, identifying money laundering schemes that evolve over time.
The Temporal Dimension of Money Laundering
Money laundering is inherently sequential. Placement, layering, and integration occur over time, often spanning days or weeks. Traditional AML systems analyze transactions in isolation, missing patterns that only emerge when viewing the temporal sequence. Time series models address this blind spot.
Why Time Series Models?
Certain money laundering patterns are fundamentally temporal:
- Structuring Over Time: Small deposits made daily to stay below thresholds
- Layering Sequences: Funds moved through accounts in specific order
- Velocity Changes: Sudden bursts of activity after dormancy
- Cyclic Patterns: Repeating schemes with predictable timing
- Sequential Dependencies: Transaction N depends on transaction N-1
LSTM for Sequence Modeling
Long Short-Term Memory networks excel at learning patterns in sequences. They maintain internal state that captures long-term dependencies.
Architecture
Our LSTM Configuration
Input: Last 180 days of transactions (variable length)
  ↓
Embedding Layer: Convert transaction features to 128-dim vectors
  ↓
LSTM Layer 1: 256 hidden units, return sequences
  ↓
Dropout: 0.3
  ↓
LSTM Layer 2: 128 hidden units, return sequences
  ↓
Dropout: 0.3
  ↓
LSTM Layer 3: 64 hidden units, return final state
  ↓
Dense Layer: 32 units, ReLU
  ↓
Output Layer: Risk score (0-1)
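As a rough illustration of this stack, the PyTorch sketch below wires up the same layer sizes. The input width (`n_features`) and other details are assumptions for the example, not our production code.

```python
import torch
import torch.nn as nn

class TransactionLSTM(nn.Module):
    """Sketch of the three-layer LSTM risk scorer outlined above."""

    def __init__(self, n_features: int = 32):   # n_features is an assumed input width
        super().__init__()
        # "Embedding Layer": project raw transaction features to 128 dims.
        self.embed = nn.Linear(n_features, 128)
        self.lstm1 = nn.LSTM(128, 256, batch_first=True)
        self.lstm2 = nn.LSTM(256, 128, batch_first=True)
        self.lstm3 = nn.LSTM(128, 64, batch_first=True)
        self.drop = nn.Dropout(0.3)
        self.head = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):                          # x: (batch, seq_len, n_features)
        h = self.embed(x)
        h, _ = self.lstm1(h)                       # return sequences
        h = self.drop(h)
        h, _ = self.lstm2(h)                       # return sequences
        h = self.drop(h)
        _, (h_n, _) = self.lstm3(h)                # keep only the final state
        return torch.sigmoid(self.head(h_n[-1]))  # risk score in (0, 1)
```

Returning only the final hidden state of the last LSTM layer mirrors the "return final state" step before the dense head.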
What LSTM Learns
- Normal Sequences: Typical transaction ordering for different entity types
- Anomalous Patterns: Deviations from learned sequences
- Temporal Dependencies: How current transaction relates to previous ones
- Long-Range Effects: Events weeks apart that are connected
Example: Structuring Detection
Sequence: Customer makes deposits of $9,800, $9,900, $9,700, $9,850 over 4 days
LSTM Analysis:
- Recognizes amounts consistently just below $10K threshold
- Detects regular timing (daily pattern)
- Compares to customer's historical sequence (normally 1-2 deposits/month)
- Flags entire sequence as high-risk structuring
Transformer Models for Transactions
Transformers use attention mechanisms to capture relationships between any two points in a sequence, not just adjacent ones.
Advantages Over LSTM
- Parallel Processing: Transformers process the entire sequence simultaneously (faster training)
- Long-Range Dependencies: Attention can connect transactions months apart
- Interpretability: Attention weights show which past transactions influenced current prediction
Transformer Architecture
Input: Transaction sequence (up to 512 transactions)
  ↓
Positional Encoding: Add temporal position information
  ↓
Multi-Head Attention: 8 attention heads, 128-dim each
  ↓
Feed-Forward Network: 512 hidden units
  ↓
Layer Norm + Residual
  ↓
[Repeat above block 6 times]
  ↓
Global Average Pooling
  ↓
Classification Head: Risk score
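A minimal PyTorch sketch of this encoder follows. It assumes a 128-dim model width shared across the 8 heads, a learned positional embedding, and an assumed input width; the exact head dimensions, dropout, and pooling in production may differ.

```python
import torch
import torch.nn as nn

class TransactionTransformer(nn.Module):
    """Sketch of the 6-block encoder outlined above (sizes are illustrative)."""

    def __init__(self, n_features: int = 32, d_model: int = 128, max_len: int = 512):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        # Learned positional embedding standing in for the positional-encoding step.
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=8, dim_feedforward=512,
            dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=6)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, pad_mask=None):           # x: (batch, seq_len, n_features)
        h = self.embed(x) + self.pos[:, : x.size(1)]
        h = self.encoder(h, src_key_padding_mask=pad_mask)
        h = h.mean(dim=1)                          # global average pooling
        return torch.sigmoid(self.head(h))         # risk score
```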
Attention Visualization
When analyzing Transaction T, the model pays attention to the following earlier transactions (a sketch of how such weights can be inspected follows the list):
- Transaction T-1 (yesterday): 0.42 weight - immediate predecessor
- Transaction T-7 (last week): 0.28 weight - similar amount and counterparty
- Transaction T-30 (last month): 0.18 weight - start of suspicious pattern
- Other transactions: 0.12 weight combined
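To see how such weights can be read out, the snippet below queries a single PyTorch attention layer and prints the distribution for the most recent transaction; the layer and shapes are illustrative, not our deployed model.

```python
import torch
import torch.nn as nn

# One attention layer over an already-embedded sequence of 30 transactions.
attn = nn.MultiheadAttention(embed_dim=128, num_heads=8, batch_first=True)
seq = torch.randn(1, 30, 128)                  # (batch, seq_len, embed_dim)

out, weights = attn(seq, seq, seq,
                    need_weights=True, average_attn_weights=True)
# weights has shape (batch, seq_len, seq_len); the last row shows how much
# the most recent transaction attends to each earlier one.
print(weights[0, -1])
```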
Feature Engineering for Time Series
What features do we feed into these models? The two lists below cover transaction-level and sequence-level features; a short feature-engineering sketch follows them.
Transaction-Level Features
- Amount: Log-transformed, normalized
- Transaction Type: Deposit, withdrawal, transfer (one-hot encoded)
- Counterparty: Hashed entity ID
- Time of Day: Hour (0-23), business hours flag
- Day of Week: Cyclic encoding (sin/cos)
Sequence-Level Features
- Time Gaps: Seconds between transactions
- Cumulative Amounts: Running total over window
- Velocity: Transaction count in past 24h, 7d, 30d
- Direction Changes: Deposits to withdrawals ratio
- Counterparty Diversity: Unique entities in sequence
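As a hedged illustration, the pandas sketch below derives several of these features for a single entity's history. The column names (timestamp, amount, counterparty_id) and window sizes are assumptions for the example.

```python
import numpy as np
import pandas as pd

def build_sequence_features(txns: pd.DataFrame) -> pd.DataFrame:
    """Derive example features for one entity's transaction history.

    Assumes columns: timestamp (datetime), amount, counterparty_id (numeric hash).
    """
    df = txns.sort_values("timestamp").copy()

    # Transaction-level features
    df["log_amount"] = np.log1p(df["amount"])
    df["hour"] = df["timestamp"].dt.hour
    df["business_hours"] = df["hour"].between(9, 17).astype(int)
    dow = df["timestamp"].dt.dayofweek
    df["dow_sin"] = np.sin(2 * np.pi * dow / 7)   # cyclic day-of-week encoding
    df["dow_cos"] = np.cos(2 * np.pi * dow / 7)

    # Sequence-level features
    df["gap_seconds"] = df["timestamp"].diff().dt.total_seconds().fillna(0)
    df["cum_amount_30d"] = df.rolling("30D", on="timestamp")["amount"].sum()
    df["velocity_24h"] = df.rolling("24h", on="timestamp")["amount"].count()
    df["counterparty_diversity"] = (
        df["counterparty_id"].expanding().apply(lambda s: s.nunique())
    )
    return df
```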
Anomaly Detection in Sequences
LSTM Autoencoders learn to reconstruct normal transaction sequences. High reconstruction error indicates anomalous patterns.
LSTM Autoencoder
Encoder: Transaction sequence → Compressed representation (32-dim)
Decoder: Compressed representation → Reconstructed sequence
Training: Minimize reconstruction error on normal sequences
Inference: High error = anomalous sequence

Example:
Normal sequence reconstruction error: 0.03
Anomalous sequence error: 0.47 → FLAG
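A minimal PyTorch sketch of such an autoencoder, assuming a 32-dim latent state and per-sequence mean squared error as the anomaly score:

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """Minimal LSTM autoencoder: compress a sequence, then reconstruct it."""

    def __init__(self, n_features: int = 32, latent_dim: int = 32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, latent_dim, batch_first=True)
        self.decoder = nn.LSTM(latent_dim, n_features, batch_first=True)

    def forward(self, x):                          # x: (batch, seq_len, n_features)
        _, (h_n, _) = self.encoder(x)              # compress to the final hidden state
        latent = h_n[-1]                           # (batch, latent_dim)
        # Repeat the compressed representation at every timestep and decode.
        repeated = latent.unsqueeze(1).expand(-1, x.size(1), -1)
        recon, _ = self.decoder(repeated)
        return recon

def reconstruction_error(model, x):
    """Per-sequence mean squared error; high values flag anomalous sequences."""
    with torch.no_grad():
        recon = model(x)
    return ((recon - x) ** 2).mean(dim=(1, 2))
```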
Real-World Use Cases
Use Case 1: Layering Detection
Pattern: Funds deposited → immediately split across 5 accounts → consolidated in an offshore account 3 days later
Detection: LSTM recognizes unusual split-consolidate temporal pattern
Result: Flagged $2.3M layering scheme, 7 linked accounts identified
Use Case 2: Dormant Account Reactivation
Pattern: Account inactive for 2 years → suddenly receives $500K → funds dispersed in 3 days
Detection: Transformer attention mechanism identifies abrupt change in account behavior
Result: Detected money mule account takeover
Training Strategies
Handling Variable-Length Sequences
- Padding: Pad short sequences to max length
- Masking: Mask padded positions so model ignores them
- Bucketing: Group sequences by similar length for efficient batching
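The PyTorch snippet below sketches padding and masking for a toy batch, plus sequence packing for LSTMs; the shapes and lengths are illustrative.

```python
import torch
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Three variable-length sequences of 32-dim transaction features.
seqs = [torch.randn(30, 32), torch.randn(120, 32), torch.randn(75, 32)]
lengths = torch.tensor([s.size(0) for s in seqs])

# Padding: align every sequence in the batch to the longest one.
padded = pad_sequence(seqs, batch_first=True)            # (3, 120, 32)

# Masking: True at padded positions so attention and loss can ignore them.
pad_mask = torch.arange(padded.size(1))[None, :] >= lengths[:, None]

# For LSTMs, packing skips the padded timesteps entirely.
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)
```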
Addressing Class Imbalance
- Oversampling: Replicate rare suspicious sequences
- SMOTE for Sequences: Generate synthetic suspicious sequences
- Weighted Loss: Penalize false negatives more heavily
- Focal Loss: Focus learning on hard-to-classify sequences
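As an example of the loss-side options, here is a standard binary focal-loss sketch alongside a weighted BCE; the alpha, gamma, and class-ratio values are placeholders, not our tuned settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples, focuses on hard ones."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                     # model's probability for the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

# Weighted-loss alternative: penalize missed suspicious sequences more heavily.
pos_weight = torch.tensor([20.0])             # assumed ~1:20 suspicious-to-normal ratio
weighted_bce = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```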
Performance Metrics
LSTM Model
- Sequence length: up to 180 days
- Training time: 12 hours (100M sequences)
- Inference: 15ms per sequence
- AUC-ROC: 0.96
Transformer Model
- Sequence length: up to 512 transactions
- Training time: 8 hours (GPU parallelization)
- Inference: 25ms per sequence
- AUC-ROC: 0.97
Ensemble with Other Models
Time series models work best in combination with other approaches:
- LSTM: Captures temporal patterns in individual entity sequences
- GNN: Analyzes network structure and entity relationships
- Isolation Forest: Detects statistical outliers in aggregated features
- Rule-Based: Catches known typologies
The final risk score is a weighted ensemble of all models, with the weights optimized for maximum precision at 95% recall.
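A minimal sketch of such a weighted combination is shown below; the weights are placeholders rather than the tuned production values.

```python
import numpy as np

def ensemble_risk_score(lstm_score, gnn_score, iforest_score, rule_score,
                        weights=(0.35, 0.30, 0.15, 0.20)):
    """Weighted combination of the four model scores; weights are placeholders."""
    scores = np.array([lstm_score, gnn_score, iforest_score, rule_score])
    return float(np.dot(weights, scores))
```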
Implementation Considerations
- Computational Cost: LSTMs and Transformers require GPUs for real-time inference at scale
- Data Storage: Need to store full transaction history (180 days+)
- Retraining Frequency: Monthly retraining as patterns evolve
- Explainability: Attention weights and sequence visualizations for analysts
Conclusion
Money laundering is a sequential process, and time series models capture this temporal dimension that traditional approaches miss. At nerous.ai—where our name embodies the ingenuity and brilliance of Finnish innovation—we've deployed LSTM and Transformer models that detect sophisticated schemes evolving over weeks or months.
The result: 40% improvement in detecting layering schemes, 60% reduction in false positives for velocity-based rules, and analyst tools that visualize exactly how patterns evolved over time.
Dr. Sarah Chen
Chief AI Scientist at nerous.ai
Sarah leads ML research at nerous.ai, specializing in time series models and sequential pattern detection for financial crime prevention.
Detect Temporal Patterns with AI
See how time series models catch schemes that evolve over weeks and months.
Schedule Demo →