DeFi Fraud Analytics from On-Chain Transaction Databases
Abstract
Decentralized finance (DeFi) has expanded rapidly since 2020, but the pseudonymous and permissionless character of blockchain networks has also created conditions that favour financial fraud at an unprecedented scale. This article presents DATAMIND-OnChain, a structured on-chain transaction database purpose-built to support reproducible fraud analytics across five categories: money laundering through layered wallets, pump-and-dump token manipulation, rug-pull contract abandonment, flash-loan-enabled arbitrage attacks, and smart-contract exploits. The database integrates six core relational tables covering transactions, wallets, smart contracts, token transfers, blacklisted entities, and fraud alerts, complemented by a graph database layer for wallet-network analysis and a vector store for contract-embedding search. Data were collected from Ethereum mainnet and Binance Smart Chain over a 36-month period spanning January 2020 to December 2022, yielding 214 million transaction records, 8.9 million unique wallet addresses, and 1.3 million smart-contract deployments. A three-stage analytical pipeline combining graph embedding, community detection, and temporal pattern mining is applied to these data, and a DATAMIND-GNN model is benchmarked against Isolation Forest and GraphSAGE baselines across all five fraud categories. DATAMIND-GNN achieves a macro-averaged F1 score of 0.86, outperforming the baselines by 6 to 19 percentage points. Early-warning lead times range from 0.3 days for flash-loan events to 6.8 days for pump-and-dump schemes. Fund-flow tracing coverage reaches 91 percent at five transaction hops. The database schema, indexing strategy, data-quality statistics, and open-access protocols are documented to support reproducible experimentation and automated fraud-intelligence pipelines.
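
For readers unfamiliar with the headline metric, the following minimal sketch shows how a macro-averaged F1 score is conventionally computed: F1 is calculated separately for each fraud category and the per-category values are averaged with equal weight, so rare categories count as much as common ones. The function names, category keys, and confusion counts are hypothetical and chosen for illustration only; they are not the article's evaluation code or results.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for a single category from true positives, false positives,
    and false negatives. Returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-category F1 scores."""
    if not counts:
        raise ValueError("need at least one category")
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Hypothetical (tp, fp, fn) counts per fraud category, for illustration
# only; these are NOT figures reported in the article.
example_counts = {
    "money_laundering": (80, 20, 20),
    "pump_and_dump": (90, 10, 10),
    "rug_pull": (50, 50, 50),
    "flash_loan": (100, 0, 0),
    "contract_exploit": (0, 5, 5),
}

if __name__ == "__main__":
    for name, c in example_counts.items():
        print(f"{name}: F1 = {f1_score(*c):.2f}")
    print(f"macro-averaged F1 = {macro_f1(example_counts):.2f}")
```

Because the average is unweighted, a single poorly detected category pulls the macro score down regardless of how few events it contains, which is why the metric suits a benchmark spanning fraud types of very different frequency.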
