# Model Card for mlpf-cms-v2.1.0

This model reconstructs particles in the CMS detector from the tracks and calorimeter clusters recorded by the detector.
## Model Details
The performance is measured with respect to generator-level jets and MET computed from Pythia particles, i.e. the truth-level jets and MET.
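As the plot filenames indicate, the resolution figures below show the interquartile range (IQR) of the response (reconstructed value divided by the generator-level value), divided by its median, as a function of pT. A minimal numpy sketch of that quantity follows; the function name and the toy inputs are illustrative and not taken from the repository.

```python
import numpy as np

def response_iqr_over_median(reco, gen):
    """Resolution metric: interquartile range of the response reco/gen,
    divided by its median (illustrative re-implementation)."""
    response = np.asarray(reco) / np.asarray(gen)
    q25, q50, q75 = np.percentile(response, [25, 50, 75])
    return (q75 - q25) / q50

# toy example: matched generator-level and reconstructed jet pTs in GeV
gen_pt = np.array([50.0, 80.0, 120.0, 200.0])
reco_pt = np.array([48.0, 83.0, 115.0, 210.0])
print(response_iqr_over_median(reco_pt, gen_pt))
```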
### Jet performance
![QCD jet resolution](/jpata/particleflow/resolve/main/cms/v2.1.0/pyg-cms_20241101_090645_682892/plots_checkpoint-18-2.778778/cms_pf_qcd/jet_response_iqr_over_med_pt.png)
![ttbar jet resolution](/jpata/particleflow/resolve/main/cms/v2.1.0/pyg-cms_20241101_090645_682892/plots_checkpoint-18-2.778778/cms_pf_ttbar/jet_response_iqr_over_med_pt.png)
![Z→ττ jet resolution](/jpata/particleflow/resolve/main/cms/v2.1.0/pyg-cms_20241101_090645_682892/plots_checkpoint-18-2.778778/cms_pf_ztt/jet_response_iqr_over_med_pt.png)
### MET performance
![QCD MET resolution](/jpata/particleflow/resolve/main/cms/v2.1.0/pyg-cms_20241101_090645_682892/plots_checkpoint-18-2.778778/cms_pf_qcd/met_response_iqr_over_med.png)
![ttbar MET resolution](/jpata/particleflow/resolve/main/cms/v2.1.0/pyg-cms_20241101_090645_682892/plots_checkpoint-18-2.778778/cms_pf_ttbar/met_response_iqr_over_med.png)
![Z→ττ MET resolution](/jpata/particleflow/resolve/main/cms/v2.1.0/pyg-cms_20241101_090645_682892/plots_checkpoint-18-2.778778/cms_pf_ztt/met_response_iqr_over_med.png)
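For reference, MET (missing transverse energy) is the magnitude of the negative vector sum of the transverse momenta of all particles in the event, computed from either the reconstructed or the generator-level particles. A minimal numpy sketch, with illustrative names and toy inputs:

```python
import numpy as np

def met(pt, phi):
    """Missing transverse energy: magnitude of the (negative) vector sum
    of the particles' transverse momenta."""
    px = np.sum(np.asarray(pt) * np.cos(phi))
    py = np.sum(np.asarray(pt) * np.sin(phi))
    return np.hypot(px, py)

# toy event with three particles (pT in GeV, phi in radians)
print(met([30.0, 45.0, 10.0], [0.1, 2.5, -1.3]))
```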
### Model Description
- Developed by: CMS MLPF Team
- Model type: transformer
- License: Apache License
### Model Sources

- Repository: https://github.com/jpata/particleflow
- Weights: https://huggingface.co/jpata/particleflow
## Uses

### Direct Use
This model may be used to study the physics and computational performance of ML-based reconstruction in simulation within the CMS collaboration.
### Out-of-Scope Use
This model is not intended for physics measurements on real data or for use outside the CMS collaboration.
## Bias, Risks, and Limitations
The model has only been trained on simulation data and has not been validated against real data. The model has not been peer reviewed or published in a peer-reviewed journal.
## How to Get Started with the Model

Use the code below to get started with the model.

```bash
# get the code
git clone https://github.com/jpata/particleflow
cd particleflow
git checkout v2.1.0

# get the models
git clone https://huggingface.co/jpata/particleflow models
```
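The downloaded weights are PyTorch checkpoint files. A minimal sketch for inspecting one is shown below; the exact path inside the cloned `models` repository is an assumption (adjust it to the checkpoint you actually downloaded), and the file is assumed to be a standard `torch.save` dictionary.

```python
import torch

# Hypothetical path inside the cloned "models" repository; point this at
# whichever .pth checkpoint you actually want to inspect.
ckpt_path = "models/cms/v2.1.0/pyg-cms_20241101_090645_682892/checkpoints/checkpoint-08-2.986092.pth"

# Load on CPU and list the stored keys (e.g. model / optimizer state dicts).
checkpoint = torch.load(ckpt_path, map_location="cpu")
print(list(checkpoint.keys()))
```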
## Training Details

The model was trained on 8x AMD MI250X GPUs for 18 epochs over approximately 26 days. Training was resumed from checkpoints multiple times due to the 24h time limit.
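Checkpointing and resumption are handled inside the training pipeline (note `--checkpoint-freq 1` in the training script below); the sketch here only illustrates the generic PyTorch save/resume pattern and is not the project's actual implementation.

```python
import torch

def save_checkpoint(path, model, optimizer, epoch):
    # Persist everything needed to continue training after the job ends.
    torch.save(
        {"epoch": epoch, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
        path,
    )

def load_checkpoint(path, model, optimizer):
    # Restore model and optimizer state; return the next epoch to run.
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1
```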
### Training Data
The following datasets were used:

```
179G  /local/joosep/mlpf/tensorflow_datasets/cms/cms_pf_qcd/2.5.0
 84G  /local/joosep/mlpf/tensorflow_datasets/cms/cms_pf_qcd_nopu/2.5.0
179G  /local/joosep/mlpf/tensorflow_datasets/cms/cms_pf_ttbar/2.5.0
 86G  /local/joosep/mlpf/tensorflow_datasets/cms/cms_pf_ttbar_nopu/2.5.0
173G  /local/joosep/mlpf/tensorflow_datasets/cms/cms_pf_ztt/2.5.0
 57G  /local/joosep/mlpf/tensorflow_datasets/cms/cms_pf_ztt_nopu/2.5.0
```
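The datasets are prepared with `tensorflow_datasets`. A minimal sketch for inspecting one prepared dataset is shown below; it assumes you have a local copy at the listed path and that a `train` split exists, both of which depend on your setup.

```python
import tensorflow_datasets as tfds

# Path to one prepared dataset version; adjust to your local copy
# (the path below is one of those listed above).
builder_dir = "/local/joosep/mlpf/tensorflow_datasets/cms/cms_pf_ttbar/2.5.0"

# builder_from_directory reads an already-prepared TFDS dataset without
# needing the original dataset-builder code on the Python path.
builder = tfds.builder_from_directory(builder_dir)
print(builder.info)

# Look at one example (assumes a "train" split exists in the prepared data).
ds = builder.as_dataset(split="train")
for example in ds.take(1):
    print(list(example.keys()))
```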
### Training Procedure

The model was trained on LUMI with the following SLURM script:

```bash
#!/bin/bash
#SBATCH --job-name=mlpf-train
#SBATCH --account=project_465000301
#SBATCH --time=3-00:00:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --mem=400G
#SBATCH --gpus-per-task=8
#SBATCH --partition=small-g
#SBATCH --no-requeue
#SBATCH -o logs/slurm-%x-%j-%N.out

cd /scratch/project_465000301/particleflow

module load LUMI/24.03 partition/G

export IMG=/scratch/project_465000301/pytorch-rocm6.2.simg
export PYTHONPATH=`pwd`
export TFDS_DATA_DIR=/scratch/project_465000301/tensorflow_datasets
#export MIOPEN_DISABLE_CACHE=true
export MIOPEN_USER_DB_PATH=/tmp/${USER}-${SLURM_JOB_ID}-miopen-cache
export MIOPEN_CUSTOM_CACHE_DIR=${MIOPEN_USER_DB_PATH}
export TF_CPP_MAX_VLOG_LEVEL=-1  # suppress the "ROCm fusion is enabled" messages
export ROCM_PATH=/opt/rocm
#export NCCL_DEBUG=INFO
#export MIOPEN_ENABLE_LOGGING=1
#export MIOPEN_ENABLE_LOGGING_CMD=1
#export MIOPEN_LOG_LEVEL=4
export KERAS_BACKEND=torch

env

# PyTorch training
singularity exec \
    --rocm \
    -B /scratch/project_465000301 \
    -B /tmp \
    --env LD_LIBRARY_PATH=/opt/rocm/lib/ \
    --env CUDA_VISIBLE_DEVICES=$ROCR_VISIBLE_DEVICES \
    $IMG python3 mlpf/pipeline.py --gpus 8 \
    --data-dir $TFDS_DATA_DIR --config parameters/pytorch/pyg-cms.yaml \
    --train --gpu-batch-multiplier 5 --num-workers 8 --prefetch-factor 50 \
    --checkpoint-freq 1 --conv-type attention --dtype bfloat16 --lr 0.0001
```
## Evaluation

The evaluation and plots were produced per dataset with the following script (the dataset name, e.g. cms_pf_qcd, is passed as the first argument):

```bash
#!/bin/bash
#SBATCH --partition gpu
#SBATCH --gres gpu:mig:1
#SBATCH --mem-per-gpu 100G
#SBATCH -o logs/slurm-%x-%j-%N.out

IMG=/home/software/singularity/pytorch.simg:2024-08-18
cd ~/particleflow

WEIGHTS=experiments/pyg-cms_20241101_090645_682892/checkpoints/checkpoint-08-2.986092.pth
DATASET=$1

env

singularity exec -B /scratch/persistent --nv \
    --env PYTHONPATH=`pwd` \
    --env KERAS_BACKEND=torch \
    $IMG python mlpf/pipeline.py --gpus 1 \
    --data-dir /scratch/persistent/joosep/tensorflow_datasets --config parameters/pytorch/pyg-cms-nopu.yaml \
    --test --make-plots --gpu-batch-multiplier 2 --load $WEIGHTS --ntest 50000 \
    --dtype bfloat16 --num-workers 8 --prefetch-factor 10 --test-datasets $DATASET
```
## Citation
## Glossary
- PF: particle flow reconstruction
- MLPF: machine learning for particle flow
- CMS: Compact Muon Solenoid
## Model Card Contact
Joosep Pata, joosep.pata@cern.ch