Privacy-preserving, fully synthetic financial datasets and ML demos. A deterministic engine generates balanced double-entry general ledgers, multi-entity group consolidations, AML/banking transactions, and OCEL process logs. All of them carry ground-truth fraud and anomaly labels and are grounded in real accounting standards (IFRS · US/French/German GAAP · ISA) and the ISO 21378 audit-data model.
Everything here is synthetic — no client or real-world data — so it can be used freely to train, benchmark, and stress-test audit, fraud-detection, and graph-ML systems.
| Space | What it does |
|---|---|
| 🔍 Inverse-Audit Detector | Label-free anomaly detection on a synthetic GL: fit the normal-system manifold, then flag journal entries by how far they deviate from it. Two residual arms, each fit on the ledger itself (per-JE density and a relational account-flow graph), are routed into one risk score. Pick a fraud scenario to see per-arm ROC, recall at a given audit budget, and the top suspicious entries. |
| 🔀 Counterfactual GL Explorer | Seed-locked baseline vs counterfactual ledgers from a causal-DAG intervention — pick a scenario (control-stress / SoD breakdown), see the effect-field distribution shift, the intervention trace, and the exact changed lines. Byte-deterministic generation, so the diff is signal, not noise. |
| 🛡️ Fraud-GNN Demo | Graph-neural-network fraud detection on the JE network — edge fraud predictor, node anomaly explorer, and a live check with confusion matrix + ROC. |
| 🔗 Accounting Network Explorer | Interactive ISO 21378 account-class flow graph — filter by business process, fraud, anomaly, amount, top-N; drill from Level-2 classes into Level-3 sub-classes. |
| 📊 Process Mining Demo | pm4py directly-follows graphs, variants, and statistics on the supply-chain OCEL 2.0 event log. |
| 🗂️ Data Explorer | Browse and inspect the VynFi synthetic datasets. |
| 🕵️ Perfect Audit Crime Challenge | Two-track community leaderboard — flag the planted fraud in synthetic GLs and help map the detectability frontier. Track A (ledger only): the mimetic perfect crime — fraud drawn from the ledger's own normal distribution — is provably uncatchable. Track B (ledger + ISA-520/505 evidence): it becomes catchable. Upload a submission → PR-AUC + per-observability recall on held-out labels. |
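
The two ranking metrics named in the table above (PR-AUC and recall at an audit budget) can be illustrated with a small, dependency-free sketch. It assumes PR-AUC is estimated as average precision and that an audit budget is the number of entries a reviewer can inspect. The function names are hypothetical, and the actual scoring code behind the Spaces may differ.

```python
def _ranked(scores, labels):
    # Highest score first; ties keep their input order (a simplification).
    return sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)


def recall_at_budget(scores, labels, budget):
    """Share of all true frauds found among the `budget` highest-scored entries."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("labels contain no positives")
    hits = sum(label for _, label in _ranked(scores, labels)[:budget])
    return hits / total_pos


def average_precision(scores, labels):
    """Mean of precision@k taken at the rank k of each true fraud."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("labels contain no positives")
    hits, running = 0, 0.0
    for rank, (_, label) in enumerate(_ranked(scores, labels), start=1):
        if label:
            hits += 1
            running += hits / rank
    return running / total_pos


# Toy example: 6 journal entries, 3 of them fraudulent (label 1).
scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 1]
recall_top3 = recall_at_budget(scores, labels, budget=3)
ap = average_precision(scores, labels)
```

With a budget of 3 the detector recovers two of the three planted frauds, so recall at budget is 2/3 even though the third fraud is ranked last.
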
Group audit & consolidation
| Dataset | Highlights |
|---|---|
| vynfi-group-audit-enterprise-2000 | End-to-end 2 000-entity group: matched intercompany pairs, eliminations, IFRS-consolidated financial statements + schedules + notes + CTA/NCI/equity-method rollforwards. |
| vynfi-group-audit-3yr-medium | Multi-period (3-year) group-audit bundle — period N+1 opens from period N's closing trial balance. |
| vynfi-je-network-2k | 68.5 M-edge consolidated journal-entry network from the 2 000-entity group — drop-in for GNN training (PyG / DGL), with is_fraud, ic_pair_id, is_eliminated. |
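
The multi-period property described above (period N+1 opens from period N's closing trial balance) lends itself to a simple roll-forward check. The sketch below assumes each trial balance is available as a mapping from account code to signed balance and that the closing balances are post-closing. The function name and the account codes are hypothetical and are not taken from the datasets.

```python
from decimal import Decimal


def rollforward_breaks(closing_tb, opening_tb):
    """Return accounts whose opening balance differs from the prior closing balance."""
    zero = Decimal("0")
    breaks = {}
    for account in sorted(set(closing_tb) | set(opening_tb)):
        closing = closing_tb.get(account, zero)
        opening = opening_tb.get(account, zero)
        if closing != opening:
            breaks[account] = (closing, opening)
    return breaks


# Toy balances: account 3000 does not roll forward cleanly.
closing_n = {"1000": Decimal("500.00"), "2000": Decimal("-300.00"), "3000": Decimal("-200.00")}
opening_n1 = {"1000": Decimal("500.00"), "2000": Decimal("-300.00"), "3000": Decimal("-199.00")}
breaks = rollforward_breaks(closing_n, opening_n1)
```

An empty result means every account carried forward unchanged.
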
General ledger / journal entries
| Dataset | Highlights |
|---|---|
| vynfi-journal-entries-1m | ~1 M-entry manufacturing GL with ISA 240 manual flags, fraud labels, and chart of accounts. |
| vynfi-journal-entries-10m | Research-scale ~10.9 M-entry synthetic GL. |
| vynfi-audit-p2p | Procure-to-Pay document chain (PO/GR/VI/Payment) with fraud labels — audit-engagement grade. |
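
A balanced double-entry ledger means that debits equal credits within every journal entry. A minimal sketch of that check follows. The field names (`je_id`, `debit`, `credit`) are assumptions made for illustration, so map them to the actual column names of whichever dataset is loaded.

```python
from collections import defaultdict
from decimal import Decimal


def unbalanced_entries(lines):
    """Return the ids of journal entries whose debits and credits do not net to zero."""
    net = defaultdict(lambda: Decimal("0"))
    for line in lines:
        # Decimal avoids the rounding noise that float arithmetic adds to currency.
        net[line["je_id"]] += Decimal(line["debit"]) - Decimal(line["credit"])
    return sorted(je_id for je_id, total in net.items() if total != 0)


# Toy ledger: JE-2 is off by one cent.
lines = [
    {"je_id": "JE-1", "account": "1200", "debit": "100.00", "credit": "0"},
    {"je_id": "JE-1", "account": "4000", "debit": "0", "credit": "100.00"},
    {"je_id": "JE-2", "account": "6000", "debit": "50.00", "credit": "0"},
    {"je_id": "JE-2", "account": "2000", "debit": "0", "credit": "49.99"},
]
bad = unbalanced_entries(lines)
```
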
Causal / counterfactual
| Dataset | Highlights |
|---|---|
| vynfi-counterfactual-gl | Seed-locked, byte-deterministic baseline ↔ counterfactual GL pairs under named causal-DAG interventions (control-environment, SoD) — each pair differs only by the intervention's effect; the diff split isolates the changed lines. A clean treatment/control substrate for causal ML, treatment-effect estimation, and residual-based audit analytics. |
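
Because each baseline and counterfactual pair is generated deterministically, a plain keyed comparison is enough to isolate the lines an intervention touched. The sketch below assumes every ledger line carries a stable identifier, here called `line_id`. That field name and the helper are hypothetical, and the published diff split may be organized differently.

```python
def diff_ledgers(baseline, counterfactual):
    """Compare two ledgers keyed by line id; return (changed, added, removed) ids."""
    base = {row["line_id"]: row for row in baseline}
    cf = {row["line_id"]: row for row in counterfactual}
    changed = sorted(k for k in base.keys() & cf.keys() if base[k] != cf[k])
    added = sorted(cf.keys() - base.keys())
    removed = sorted(base.keys() - cf.keys())
    return changed, added, removed


# Toy pair: the intervention alters one amount and introduces one new line.
baseline = [
    {"line_id": "L1", "account": "1200", "amount": "100.00"},
    {"line_id": "L2", "account": "4000", "amount": "-100.00"},
]
counterfactual = [
    {"line_id": "L1", "account": "1200", "amount": "100.00"},
    {"line_id": "L2", "account": "4000", "amount": "-250.00"},
    {"line_id": "L3", "account": "1200", "amount": "150.00"},
]
changed, added, removed = diff_ledgers(baseline, counterfactual)
```

Untouched lines compare equal, so anything the function reports can be attributed to the intervention rather than to generation noise.
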
AML / banking
| Dataset | Highlights |
|---|---|
| vynfi-aml-100k | 748 K banking transactions with AML/SAR-style labels and velocity features. |
| vynfi-sar-narratives | 156 K transactions paired with suspicious-activity-report narratives + AML labels. |
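
Velocity features usually summarize how quickly an account transacts. The datasets' exact definition is not stated here, so the sketch below shows one common variant as an assumption: for each transaction, the number of earlier transactions by the same account inside a trailing time window. The tuple layout and the function name are hypothetical.

```python
from collections import defaultdict, deque


def velocity_counts(transactions, window_seconds):
    """For each (account, timestamp), count the account's prior transactions in the window."""
    recent = defaultdict(deque)  # account -> timestamps still inside the window
    out = []
    for account, ts in sorted(transactions, key=lambda t: t[1]):
        queue = recent[account]
        # Drop timestamps that have fallen out of the trailing window.
        while queue and ts - queue[0] > window_seconds:
            queue.popleft()
        out.append((account, ts, len(queue)))
        queue.append(ts)
    return out


# Toy stream: account A bursts three times within a minute, then goes quiet.
txs = [("A", 0), ("A", 30), ("A", 50), ("B", 55), ("A", 200)]
features = velocity_counts(txs, window_seconds=60)
```
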
Process mining (OCEL 2.0)
| Dataset | Highlights |
|---|---|
| vynfi-ocel-manufacturing | Manufacturing OCEL 2.0 event log — production-order lifecycle + quality inspections. |
| vynfi-supply-chain-ocel | 5-company manufacturing supply-chain OCEL 2.0 event log for cross-process mining. |
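
A directly-follows graph counts how often one activity is immediately followed by another within the same case. The dependency-free sketch below assumes the OCEL log has already been flattened to `(case_id, activity, timestamp)` tuples for a single object type. The tuple layout and the activity names are illustrative only, and pm4py offers its own discovery functions for real logs.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count (activity, next_activity) pairs within each case, ordered by timestamp."""
    traces = defaultdict(list)
    for case_id, activity, ts in events:
        traces[case_id].append((ts, activity))
    dfg = Counter()
    for trace in traces.values():
        ordered = [activity for _, activity in sorted(trace)]
        dfg.update(zip(ordered, ordered[1:]))
    return dfg


# Toy log: PO2 skips the goods receipt.
events = [
    ("PO1", "Create PO", 1),
    ("PO1", "Goods Receipt", 2),
    ("PO1", "Invoice", 3),
    ("PO2", "Create PO", 1),
    ("PO2", "Invoice", 2),
]
dfg = directly_follows(events)
```

The edge from "Create PO" straight to "Invoice" is the kind of process deviation that variant analysis would surface.
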
Challenge
| Dataset | Highlights |
|---|---|
| perfect-audit-crime-data | The ledgers behind the Perfect Audit Crime Challenge — 3 multi-entity GLs across two tracks (ledger / ledger + ISA-520/505 evidence) with a planted mimetic perfect-crime family; labels held out for scoring. |

Models
| Model | What it is |
|---|---|
| je-fraud-gnn | 2-layer GraphSAGE journal-entry fraud classifier (test AUC 0.914) plus an unsupervised attribute-reconstruction graph autoencoder (GAE) node-anomaly scorer (per-edge AUC 0.654). Includes weights, preprocessor, and full metrics. |
All datasets and demos are synthetic and contain no client or real-world data.