GNN-based Jet Classification Using Public LHC Data
June 9, 2025
Ongoing Project — June 2025 to Present
🎯 Objective
This project applies Graph Neural Networks (GNNs) to classify top-quark jets against QCD background using public LHC data. GNNs model each jet as a particle-level graph, which can capture fine-grained substructure patterns that traditional CNNs or BDTs may miss.
🔍 Motivation
In high-energy collisions at the LHC, jets from hadronization carry rich internal structure. Standard classifiers often rely on image-like calorimeter data or handcrafted observables.
Here, jets are treated as graphs $G = (V, E)$, where:
- $V$: particles with features $\{p_T, \eta, \phi, E\}$
- $E$: kinematic or spatial proximity ($\Delta R < 0.4$)
This representation preserves the natural geometry of collider events and handles a variable number of particles per jet.
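As a concrete illustration of the $\Delta R$ edge rule, the sketch below connects every pair of constituents with $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2} < 0.4$. It is a minimal NumPy example, not the project's actual code; the function name `delta_r_edges` and the toy values are assumptions made for illustration.

```python
import numpy as np

def delta_r_edges(eta, phi, r_max=0.4):
    """Return a (2, n_edges) array of directed edges i -> j with Delta R < r_max."""
    eta = np.asarray(eta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    d_eta = eta[:, None] - eta[None, :]
    # Wrap the azimuthal difference into [-pi, pi) so that particles on
    # either side of phi = +/-pi are still treated as close.
    d_phi = (phi[:, None] - phi[None, :] + np.pi) % (2 * np.pi) - np.pi
    delta_r = np.sqrt(d_eta**2 + d_phi**2)
    mask = delta_r < r_max
    np.fill_diagonal(mask, False)  # no self-loops
    src, dst = np.nonzero(mask)
    return np.stack([src, dst])

# Toy jet (hypothetical values): the first two constituents are close,
# the third is far away in eta.
eta = [0.00, 0.10, 1.50]
phi = [3.10, -3.10, 0.00]
edges = delta_r_edges(eta, phi)
```

The first two toy constituents sit on either side of $\phi = \pm\pi$; because of the wrap-around they end up connected, while the third constituent stays isolated.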
📁 Dataset
Top Tagging Reference Dataset
Zenodo: DOI 10.5281/zenodo.2603256
- 1.2M simulated training jets from LHC-like events, with separate validation and test sets of 400k jets each
- Labels: `1` = top-quark jets, `0` = QCD background
- Each jet contains up to 200 constituents
- Features per particle: $p_T, \eta, \phi, E$
- Format: HDF5
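If the file stores each constituent as a Cartesian four-momentum $(E, p_x, p_y, p_z)$, which is my understanding of this dataset's layout but should be checked against the Zenodo record, the per-particle features listed above have to be derived. The sketch below shows one way to do that with NumPy; the function name `to_detector_coords` and the `eps` guard are assumptions, and file loading is deliberately left out.

```python
import numpy as np

def to_detector_coords(four_momenta, eps=1e-12):
    """Convert (..., 4) arrays of (E, px, py, pz) into (pT, eta, phi, E)."""
    p4 = np.asarray(four_momenta, dtype=float)
    e, px, py, pz = p4[..., 0], p4[..., 1], p4[..., 2], p4[..., 3]
    pt = np.hypot(px, py)             # transverse momentum
    phi = np.arctan2(py, px)          # azimuthal angle in (-pi, pi]
    # Pseudorapidity: eta = arcsinh(pz / pT). The eps term keeps
    # zero-padded constituents at eta = 0 instead of producing NaN.
    eta = np.arcsinh(pz / (pt + eps))
    return np.stack([pt, eta, phi, e], axis=-1)

# Toy jet (hypothetical values): one central particle, one particle with
# eta = 1, and one zero-padded slot.
toy = np.array([
    [5.0, 3.0, 4.0, 0.0],
    [5.0 * np.cosh(1.0), 3.0, 4.0, 5.0 * np.sinh(1.0)],
    [0.0, 0.0, 0.0, 0.0],
])
features = to_detector_coords(toy)
```

Zero-padded slots map to all-zero feature rows, so a simple `pT > 0` mask can still separate real constituents from padding after the conversion.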
🧠 Methodology
Graph Construction
- Build edges using $k$-nearest neighbors or the $\Delta R$ metric in $\eta$-$\phi$ space
- Normalize features (e.g., $p_T$) and truncate/pad particles
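The graph construction steps above can be sketched as follows. This is a NumPy-only illustration under stated assumptions: column 0 of the particle array is taken to be $p_T$, truncation keeps the highest-$p_T$ constituents, and $p_T$ is normalized by the jet's scalar $p_T$ sum. All three are plausible choices, not necessarily the ones used in this project, and the helper names are hypothetical.

```python
import numpy as np

def pad_or_truncate(particles, max_particles=200):
    """Keep the highest-pT rows of an (n, F) array and zero-pad to max_particles."""
    particles = np.asarray(particles, dtype=float)
    order = np.argsort(-particles[:, 0], kind="stable")  # column 0 assumed to be pT
    particles = particles[order][:max_particles]
    out = np.zeros((max_particles, particles.shape[1]))
    out[: len(particles)] = particles
    mask = np.arange(max_particles) < len(particles)     # True for real particles
    return out, mask

def normalize_pt(particles, mask):
    """Divide pT by the scalar pT sum of the real particles (one possible choice)."""
    out = particles.copy()
    total = out[mask, 0].sum()
    if total > 0:
        out[:, 0] /= total
    return out

def knn_edges(eta, phi, k=2):
    """Return (2, n*k) edges src -> dst linking each node to its k nearest neighbours."""
    eta = np.asarray(eta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    n = len(eta)
    if n < 2:
        return np.zeros((2, 0), dtype=int)
    k = min(k, n - 1)
    d_eta = eta[:, None] - eta[None, :]
    d_phi = (phi[:, None] - phi[None, :] + np.pi) % (2 * np.pi) - np.pi
    dist2 = d_eta**2 + d_phi**2
    np.fill_diagonal(dist2, np.inf)                # exclude self-matches
    neighbours = np.argsort(dist2, axis=1)[:, :k]  # (n, k) nearest indices
    dst = np.repeat(np.arange(n), k)               # node receiving the message
    src = neighbours.reshape(-1)                   # its neighbours
    return np.stack([src, dst])

# Toy jet with columns (pT, eta, phi, E); values are made up.
jet = [[10.0, 0.0, 0.0, 12.0], [50.0, 0.1, 0.0, 55.0], [5.0, 1.0, 1.0, 9.0]]
padded, mask = pad_or_truncate(jet, max_particles=4)
normalized = normalize_pt(padded, mask)
edges = knn_edges(padded[mask, 1], padded[mask, 2], k=1)
```

Edges are built only over the unmasked rows, so padding never enters the neighbour search.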
GNN Models
- Message Passing Neural Networks (MPNN)
- EdgeConv Layers from DGCNN
- Trained using PyTorch Geometric
- In my assessment, ParticleNet is the strongest of these models. Read the complete documentation here.
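To make the EdgeConv idea concrete, here is a single EdgeConv-style update written in plain NumPy: each node $i$ takes the element-wise maximum over its neighbours $j$ of $\mathrm{ReLU}(W\,[x_i,\; x_j - x_i] + b)$. This is a didactic sketch with a one-layer "MLP" and hand-picked weights, not the PyTorch Geometric implementation (which provides `torch_geometric.nn.EdgeConv`) and not ParticleNet itself; the function name `edge_conv` and the toy inputs are assumptions.

```python
import numpy as np

def edge_conv(x, edges, weight, bias):
    """One EdgeConv-style update with max aggregation.

    x:      (n_nodes, F) node features
    edges:  (2, n_edges) array of (src, dst); messages flow src -> dst
    weight: (2F, F_out) weights applied to [x_i, x_j - x_i]
    bias:   (F_out,) bias
    """
    src, dst = edges
    # Edge feature: the receiving node's own features and the relative offset.
    edge_feat = np.concatenate([x[dst], x[src] - x[dst]], axis=1)
    messages = np.maximum(edge_feat @ weight + bias, 0.0)  # linear layer + ReLU
    out = np.full((x.shape[0], weight.shape[1]), -np.inf)
    np.maximum.at(out, dst, messages)  # per-node max over incoming messages
    out[np.isinf(out)] = 0.0           # nodes without neighbours get zeros
    return out

# Toy graph (made-up values): node 0 hears from nodes 1 and 2,
# node 1 hears from node 0, node 2 has no incoming edges.
x = np.array([[0.0], [1.0], [3.0]])
edges = np.array([[1, 2, 0],
                  [0, 0, 1]])
weight = np.array([[1.0], [1.0]])
bias = np.array([0.0])
updated = edge_conv(x, edges, weight, bias)
```

In a trained model the weights are learned and several such layers are stacked, typically with the neighbour graph recomputed in feature space between layers, as in DGCNN.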
Evaluation
- ROC AUC, accuracy, precision-recall
- Compare against CNN baselines (jet images)
- Interpret embeddings via t-SNE/UMAP
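The two headline metrics above can be computed from classifier scores as sketched below. The AUC uses the rank-sum (Mann-Whitney) identity with average ranks for ties; in practice `sklearn.metrics.roc_auc_score` from the tech stack would be the usual choice, and this NumPy version is only meant to show what the number measures. The labels, scores, and the 0.5 accuracy threshold are made-up illustrative values.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC: probability that a random signal jet outscores a random background jet."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    # Average 1-based rank of every distinct score value (handles ties).
    _, inverse, counts = np.unique(scores, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)
    avg_rank = last_rank - (counts - 1) / 2.0
    ranks = avg_rank[inverse]
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)

# Toy predictions: 1 = top-quark jet, 0 = QCD background.
labels = np.array([1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.4, 0.1])

auc = roc_auc(labels, scores)
accuracy = np.mean((scores > 0.5) == (labels == 1))
```

Unlike accuracy, the AUC does not depend on a score threshold, which is why the two can rank models differently.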
🛠️ Tech Stack
Python | PyTorch Geometric | NumPy | JetNet | h5py | matplotlib | scikit-learn
Phase | Milestone | Status |
---|---|---|
Phase 1 | Literature review on GNNs in jet physics | ✅ Completed |
Phase 2 | Dataset acquisition & preprocessing | ✅ Completed |
Phase 3 | Jet graph construction & baseline model | 🟡 In Progress |
Phase 4 | GNN tuning & evaluation | 🔲 Upcoming |
Phase 5 | Final report & documentation | 🔲 Upcoming |
📌 Goals
- Demonstrate the utility of GNNs in jet tagging
- Compare with traditional ML and image-based methods
- Develop a clear, reproducible pipeline for collider physicists interested in ML
📚 References & Related Work
🔹 Jet Flavor Identification
- Bols et al. (2020) – Jet Flavour Classification Using DeepJet
  Developed DeepJet, a deep learning-based classifier for jet flavor identification using low-level and high-level features.
  arXiv:2008.10519
- Li & Smith (2024) – ParticleNet and its Application on CEPC Jet Flavor Tagging
  Explores the GNN-based ParticleNet model for identifying heavy-flavor jets in future collider scenarios, achieving strong performance.
  EPJC Article
🔹 Graph Neural Networks in Particle Physics
- Genovese et al. (2025) – Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection
  Introduces interpretable MoE-Graph Transformers applied to high-energy physics classification tasks.
  arXiv:2501.03432
- Tripathy et al. (2025) – Scaling GNNs for Particle Track Reconstruction
  Enhances the Exa.TrkX GNN pipeline for better scalability and efficiency in dense LHC environments.
  arXiv:2504.04670
🔹 Jet Substructure and Heavy-Flavor Tagging
- Hammad & Nojiri (2024) – Transformer Networks for Heavy Flavor Jet Tagging
  Evaluates transformer-based models for identifying jets from heavy quarks, showing competitive results.
  arXiv:2411.11519
- EPJC (2022) – Jet Flavour Tagging for Future Colliders with Fast Simulation
  Discusses the use of GNNs and machine learning for jet tagging in future collider detectors like FCC-ee.
  Springer Link