2023

Self-Supervised Graph Neural Networks for Blockchain Analytics

Built end-to-end GNN pipelines for Cardano (500M+ nodes) and Ethereum (3B+ nodes), producing unsupervised node embeddings that power entity resolution, Sybil detection, wallet profiling, and token analysis, all without labeled data.

Graph Neural Networks · PyTorch Geometric · Rust · Self-Supervised Learning · Cardano · Ethereum

The Challenge

Blockchain analytics at scale demands understanding billions of interconnected transactions, wallets, and contracts. Traditional heuristics fail to capture the complex relational patterns needed for entity resolution, fraud detection, and behavioral analysis. Labeled training data is scarce or nonexistent, yet the graph structure itself encodes rich behavioral signals waiting to be extracted.

The Approach

Developed two full pipelines: one for Cardano's UTXO model and one for Ethereum's account model. Both produce dense vector embeddings for every node (wallet, transaction, contract, block) using self-supervised learning, so the models learn entirely from graph structure without any labeled data.

On Cardano, UTXO flow graphs trace fund movements from the three founding entities (IOG, CF, Emurgo) to map where those flows converge and mix.

On Ethereum, a custom Rust ETL (streaming, O(1) memory) builds heterogeneous graphs with 4 node types and 23 edge types. The edge types cover DeFi primitives (DEX swaps, liquidity events, token transfers), and EOA (externally owned account) nodes carry 17 bot-detection features.

Three self-supervised GNN architectures train on this graph: BGRL predicts one augmented view's node embeddings from another without negative samples, GraphMAE reconstructs masked node features, and HGOT aligns semantic meta-path views via optimal transport. All three produce 128-dimensional embeddings that capture behavioral fingerprints, so wallets that behave similarly end up close together in embedding space.
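BGRL's negative-free objective is compact enough to sketch without a deep-learning framework. The snippet below is a minimal, dependency-free illustration that uses plain Python lists in place of PyTorch Geometric tensors and is modeled on the loss described in the BGRL paper: each node's online prediction under one augmented view is pulled toward the target encoder's embedding of the same node under the other view by cosine similarity, and the target weights track the online weights through an exponential moving average. The function names and the default decay `tau=0.99` are illustrative assumptions, not details of this project's pipeline; the GNN encoders, augmentations, and gradient computation are omitted.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-12) -> float:
    """Cosine similarity of two equal-length vectors (eps guards zero norms)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def bgrl_loss(pred_1: List[Vector], target_2: List[Vector],
              pred_2: List[Vector], target_1: List[Vector]) -> float:
    """Symmetric, negative-free BGRL-style loss over N nodes.

    pred_v[i]:   online encoder + predictor output for node i under view v
    target_v[i]: target encoder embedding for node i under view v; in a real
                 training loop this side is detached (receives no gradient).
    Each prediction from one view is matched to the target embedding of the
    other view; no negative pairs are involved.
    """
    n = len(pred_1)
    sim_12 = sum(cosine(p, t) for p, t in zip(pred_1, target_2))
    sim_21 = sum(cosine(p, t) for p, t in zip(pred_2, target_1))
    # -2/N * sum of cosine similarities, applied in both directions
    return -2.0 / n * (sim_12 + sim_21)


def ema_update(target: Vector, online: Vector, tau: float = 0.99) -> Vector:
    """Target weights drift toward online weights: phi <- tau*phi + (1 - tau)*theta."""
    return [tau * phi + (1.0 - tau) * theta for phi, theta in zip(target, online)]


if __name__ == "__main__":
    view = [[1.0, 0.0], [0.0, 2.0]]
    # Perfectly aligned predictions and targets reach the minimum value
    print(bgrl_loss(view, view, view, view))  # -4.0
```

Because the target side receives no gradient and only drifts slowly via the moving average, the model has no direct path to a trivial solution by matching against itself; this asymmetry is commonly credited with letting BGRL skip the negative samples that contrastive methods rely on.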

The Results

The resulting embeddings unlock downstream tasks that would be impractical with rule-based methods: entity resolution (clustering wallets that belong to the same actor), Sybil detection (identifying coordinated bot networks via same-block activity patterns), hot wallet profiling (scoring traders by realized PnL and win rate), token appreciation prediction (surfacing tokens held by smart money), rug-pull detection (flagging serial deployers and early insider dumps), and CEX identification (detecting exchange consolidation patterns).

The Cardano pipeline processes 88GB+ of UTXO data with deadlock-free multithreading. The Ethereum pipeline targets the full chain (3B+ nodes, 15B+ edges) with a streaming Rust ETL that cuts memory use by 60% by storing hashed addresses.
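The same-block co-activity signal behind the Sybil-detection task can be illustrated with a simple counting heuristic. The sketch below is hypothetical and is not the project's detector: it assumes the input is a mapping from block number to the wallet addresses active in that block, counts how many blocks each wallet pair shares, links pairs that share at least `min_shared_blocks` blocks, and returns the connected components (via union-find) as candidate coordinated clusters. The function names, input format, and threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def co_activity_counts(blocks: Dict[int, Iterable[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for every wallet pair, the number of blocks in which both were active."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for wallets in blocks.values():
        # Deduplicate and sort so each unordered pair gets one canonical key
        for a, b in combinations(sorted(set(wallets)), 2):
            counts[(a, b)] += 1
    return counts


def sybil_candidates(blocks: Dict[int, Iterable[str]],
                     min_shared_blocks: int = 3) -> List[Set[str]]:
    """Group wallets linked by >= min_shared_blocks shared blocks (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for (a, b), shared in co_activity_counts(blocks).items():
        if shared >= min_shared_blocks:
            union(a, b)

    # Every wallet in `parent` took part in at least one qualifying pair,
    # so each returned group contains two or more wallets.
    groups: Dict[str, Set[str]] = defaultdict(set)
    for wallet in parent:
        groups[find(wallet)].add(wallet)
    return list(groups.values())


if __name__ == "__main__":
    blocks = {
        100: ["0xa", "0xb", "0xc"],
        101: ["0xa", "0xb"],
        102: ["0xa", "0xb", "0xd"],
        103: ["0xc", "0xd"],
    }
    print([sorted(g) for g in sybil_candidates(blocks, min_shared_blocks=3)])  # [['0xa', '0xb']]
```

Enumerating every pair within a block is quadratic in the number of active wallets per block, so at full-chain scale a real implementation would need bucketing, sampling, or an embedding-based similarity search instead; the sketch only shows the shape of the signal.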