
Validation

Compression is only useful if we can say clearly what it preserves and what it changes.

For that reason, step2point separates validation into two categories.

Quantities that should stay unchanged

These are the observables that define whether the compressed shower still behaves like the original shower.

Total shower energy

A natural diagnostic is the per-shower ratio:

E_post / E_pre

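Using a ratio is usually more informative than comparing raw energies, because it factors out the wide range of shower energies in the dataset. Each shower is compared with itself, so a compression that conserves energy should give a ratio of 1 up to floating-point rounding.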
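A minimal sketch of the per-shower ratio follows. It assumes each shower is available as a plain list of point energies before and after compression. The function name `energy_ratio` is hypothetical and not part of step2point's API.

```python
import math


def energy_ratio(pre_energies, post_energies):
    """Return E_post / E_pre for one shower, given its point energies before and after compression."""
    e_pre = sum(pre_energies)
    if e_pre == 0:
        raise ValueError("shower has no deposited energy before compression")
    return sum(post_energies) / e_pre


# Toy shower: the last two points are merged into one point carrying their summed energy.
pre = [0.5, 0.25, 0.125, 0.125]
post = [0.5, 0.25, 0.25]
ratio = energy_ratio(pre, post)
print(f"E_post / E_pre = {ratio:.6f}")  # E_post / E_pre = 1.000000

# Summing many floats in a different order can change the last few bits,
# so compare with a tolerance rather than with exact equality.
assert math.isclose(ratio, 1.0, rel_tol=1e-9)
```

On real showers the ratio is best shown as a histogram over the dataset, where any spread away from 1 is easy to spot.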

Shower profiles

The compressed shower should preserve the broad shape of the shower:

  • longitudinal profile
  • radial profile
  • phi profile

Shower moments

Useful compact summaries are:

  • first longitudinal moment
  • second longitudinal moment
  • first radial moment
  • second radial moment
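A minimal sketch of the energy-weighted moments follows. It assumes the "second moment" is reported as a width, meaning the square root of the energy-weighted variance. The project may instead use the variance or the raw second moment. `weighted_moments` is a hypothetical name. The same function applies to longitudinal or radial coordinates.

```python
import math


def weighted_moments(values, energies):
    """Energy-weighted first moment and width (sqrt of the second central moment) of `values`."""
    e_tot = sum(energies)
    if e_tot <= 0:
        raise ValueError("total energy must be positive")
    first = sum(e * x for x, e in zip(values, energies)) / e_tot
    mean_sq = sum(e * x * x for x, e in zip(values, energies)) / e_tot
    # max() guards against tiny negative values caused by rounding.
    width = math.sqrt(max(mean_sq - first * first, 0.0))
    return first, width


# Longitudinal coordinates (mm) and energies of a toy shower before compression ...
pre_l, pre_e = [0.0, 2.0, 10.0], [1.0, 1.0, 2.0]
# ... and after merging the first two points into one at their energy-weighted centroid.
post_l, post_e = [1.0, 10.0], [2.0, 2.0]

print(weighted_moments(pre_l, pre_e))
print(weighted_moments(post_l, post_e))  # (5.5, 4.5)
```

The toy illustrates a general property. Suppose merged points are placed at the energy-weighted centroid, which is an assumption about the merging strategy. Then the first moment of any coordinate that is linear in position, such as the longitudinal one, is preserved exactly. The width can only shrink, because the spread inside each merged group is lost. Radial moments are not linear in position, so they are not guaranteed to be preserved exactly.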

Detector-aware quantities

If cell_id is present, detector-aware checks become especially important:

  • distribution of log(cell_energy)
  • ratio of the number of hit cells after and before compression, N_cells_post / N_cells_pre
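Both checks start from cell energies, and a cell energy is the sum of the point energies sharing a `cell_id`. The sketch below assumes base-10 logarithms, although the plots may use a different base. The helper names are hypothetical. Merging only within a cell leaves every cell energy unchanged, so for such a strategy the cell-count ratio is 1 and the two log-energy lists are identical.

```python
import math
from collections import defaultdict


def cell_energies(cell_ids, energies):
    """Sum point energies per cell_id and return {cell_id: cell_energy}."""
    totals = defaultdict(float)
    for cell_id, e in zip(cell_ids, energies):
        totals[cell_id] += e
    return dict(totals)


def detector_checks(pre_ids, pre_e, post_ids, post_e):
    """Return (N_cells_post / N_cells_pre, sorted log10 cell energies before, after)."""
    pre = cell_energies(pre_ids, pre_e)
    post = cell_energies(post_ids, post_e)
    n_cells_ratio = len(post) / len(pre)
    log_pre = sorted(math.log10(e) for e in pre.values() if e > 0)
    log_post = sorted(math.log10(e) for e in post.values() if e > 0)
    return n_cells_ratio, log_pre, log_post


# Toy shower: two points share cell 101 and are merged within that cell.
pre_ids, pre_e = [101, 101, 102, 103], [0.5, 0.25, 0.125, 0.125]
post_ids, post_e = [101, 102, 103], [0.75, 0.125, 0.125]
ratio, log_pre, log_post = detector_checks(pre_ids, pre_e, post_ids, post_e)
print(ratio)                # 1.0
print(log_pre == log_post)  # True
```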

Quantities that are expected to change

Some changes are not only acceptable but are the whole point of compression.

Point-energy spectrum

The distribution of individual point energies will change because points are being merged.
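The toy below shows why the spectrum shifts. `merge_within_cell_toy` is a hypothetical stand-in and not step2point's implementation. It collapses all points in a cell into one point carrying their summed energy. The total energy is unchanged, but the points become fewer and more energetic.

```python
from statistics import mean


def merge_within_cell_toy(cell_ids, energies):
    """Hypothetical toy merge: all points sharing a cell_id become one point with the summed energy."""
    merged = {}
    for cell_id, e in zip(cell_ids, energies):
        merged[cell_id] = merged.get(cell_id, 0.0) + e
    return list(merged.values())


cell_ids = [7, 7, 7, 7, 8, 9]
point_e = [0.01, 0.02, 0.03, 0.04, 0.5, 0.4]
merged_e = merge_within_cell_toy(cell_ids, point_e)

# Same total energy, fewer points: the low-energy tail disappears and the
# point-energy spectrum moves towards higher values.
print(len(point_e), len(merged_e))
print(min(point_e), min(merged_e))
print(mean(point_e), mean(merged_e))
```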

Number of points

The ratio

N_points_post / N_points_pre

is one of the central performance indicators of a compression algorithm.
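A minimal sketch for summarizing this ratio over a dataset follows. It assumes the point counts per shower are already available before and after compression. The helper names are hypothetical.

```python
from statistics import mean


def point_ratios(n_pre, n_post):
    """Per-shower N_points_post / N_points_pre from paired point counts."""
    if len(n_pre) != len(n_post):
        raise ValueError("need one (pre, post) pair of counts per shower")
    return [post / pre for pre, post in zip(n_pre, n_post)]


def summarize(ratios):
    """Dataset-level summary of the per-shower point ratios (lower means stronger compression)."""
    return {"mean": mean(ratios), "min": min(ratios), "max": max(ratios)}


ratios = point_ratios([1000, 800, 400], [250, 400, 100])
print(ratios)  # [0.25, 0.5, 0.25]
print(summarize(ratios))
```

The inverse of the ratio is the compression factor. For example, a ratio of 0.25 means four times fewer points to store and simulate.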

Example inspection workflow

For shower inspection, use examples/inspect_showers.py. The script always produces dataset-level observable plots, and it also produces single-shower plots if --shower-index is given.

Note: PYTHONPATH=src is only needed when running directly from a source checkout without installing the package first. If you already ran pip install -e .[dev], you can drop that prefix and use python ... directly.

Dataset-level only:

PYTHONPATH=src python examples/inspect_showers.py \
  --input tests/data/ODD_gamma_10ev_theta90deg_phi0deg_posX0mmY1250mmZ0mm_10GeV.h5 \
  --axis 0 1 0 \
  --outdir outputs/inspect_gamma

Dataset plus single-shower plots:

PYTHONPATH=src python examples/inspect_showers.py \
  --input tests/data/ODD_gamma_10ev_theta90deg_phi0deg_posX0mmY1250mmZ0mm_10GeV.h5 \
  --shower-index 0 \
  --axis 0 1 0 \
  --outdir outputs/inspect_gamma

For these front-face ODD samples, the --axis 0 1 0 override (the y direction) used in both commands above is the recommended axis setting.

Expected outputs:

  • dataset_observables.png
  • shower_<id>_projections.png
  • shower_<id>_distributions.png
  • shower_<id>_overview.png

Detector cell inspection

For detector-aware debugging of merging strategies, use examples/plot_detector_cells.py.

This workflow reads the DD4hep compact XML and factory-derived barrel geometry (Open Data Detector-like PolyhedraBarrel), then optionally overlays a shower from HDF5 or EDM4hep ROOT on top of:

  • module envelopes
  • layer outlines
  • cell footprints

Typical use, to see the entire detector:

PYTHONPATH=src python examples/plot_detector_cells.py \
  --compact-xml ../OpenDataDetector/xml/OpenDataDetector.xml \
  --collection ECalBarrelCollection \
  --draw-modules \
  --overlay-input tests/data/ODD_gamma_10ev_theta90deg_phi0deg_posX0mmY1250mmZ0mm_10GeV.h5 \
  --overlay-shower-index 0 \
  --outdir outputs/detector_cells

Zoomed cell view for one module:

PYTHONPATH=src python examples/plot_detector_cells.py \
  --compact-xml ../OpenDataDetector/xml/OpenDataDetector.xml \
  --collection ECalBarrelCollection \
  --draw-cells \
  --overlay-input tests/data/ODD_gamma_10ev_theta90deg_phi0deg_posX0mmY1250mmZ0mm_10GeV.h5 \
  --overlay-shower-index 0 \
  --outdir outputs/detector_cells \
  --module 10

Zoomed view with cells spanning only over the sensitive material:

PYTHONPATH=src python examples/plot_detector_cells.py \
  --compact-xml ../OpenDataDetector/xml/OpenDataDetector.xml \
  --collection ECalBarrelCollection \
  --draw-cells \
  --sensitive-only \
  --overlay-input tests/data/ODD_gamma_10ev_theta90deg_phi0deg_posX0mmY1250mmZ0mm_10GeV.h5 \
  --overlay-shower-index 0 \
  --outdir outputs/detector_cells \
  --module 10

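Manual ranges can be controlled separately for the drawn axes and for the overlaid points. Each option takes a minimum and a maximum:

  • axes only (crops the view): --xlim-axis, --ylim-axis, --zlim-axis
  • overlay-point selection only (filters which shower points are drawn): --xlim-points, --ylim-points, --zlim-points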
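The point-selection options can be sketched as a box cut on the overlay points. The axis options, by contrast, only change the visible range. This sketch assumes inclusive (min, max) limits, where an omitted limit means no cut. The script's exact boundary handling may differ, and the helper names are hypothetical.

```python
def within_limits(point, xlim=None, ylim=None, zlim=None):
    """True if an (x, y, z) point lies inside every given (min, max) range; None means no cut."""
    for value, lim in zip(point, (xlim, ylim, zlim)):
        if lim is not None and not (lim[0] <= value <= lim[1]):
            return False
    return True


def select_points(points, xlim=None, ylim=None, zlim=None):
    """Keep only the points passing all limits (the role played by --*lim-points)."""
    return [p for p in points if within_limits(p, xlim, ylim, zlim)]


points = [(0.0, 1310.0, 0.0), (10.0, 1310.0, 0.0), (0.0, 1310.0, 20.0)]
kept = select_points(points, xlim=(-7.65, 7.65), ylim=(1307, 1319), zlim=(-7.65, 7.65))
print(kept)  # [(0.0, 1310.0, 0.0)]
```

The z cut matters even in an XY projection. The third point differs from the first only in z, and without the cut it would be drawn on top of the same cells.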

Example with separate view crop and point filtering:

PYTHONPATH=src python examples/plot_detector_cells.py \
  --compact-xml ../OpenDataDetector/xml/OpenDataDetector.xml \
  --collection ECalBarrelCollection \
  --draw-cells \
  --sensitive-only \
  --overlay-input tests/data/ODD_gamma_10ev_theta90deg_phi0deg_posX0mmY1250mmZ0mm_10GeV.h5 \
  --overlay-shower-index 0 \
  --outdir outputs/detector_cells \
  --module 10 \
  --xlim-axis -7.65 7.65 \
  --ylim-axis 1307 1319 \
  --zlim-axis -7.65 7.65  \
  --xlim-points -7.65 7.65 \
  --ylim-points 1307 1319 \
  --zlim-points -7.65 7.65

The --debug flag prints the decoded cell ID bitfields, which helps when visually inspecting what a compression algorithm did. The example below uses the merge_within_cell reference output:

PYTHONPATH=src python examples/plot_detector_cells.py \
  --compact-xml ../OpenDataDetector/xml/OpenDataDetector.xml \
  --collection ECalBarrelCollection \
  --draw-cells \
  --sensitive-only \
  --overlay-input tests/data/ODD_gamma_10ev_theta90deg_phi0deg_posX0mmY1250mmZ0mm_10GeV_merge_within_cell_reference.h5 \
  --overlay-shower-index 0 \
  --outdir outputs/detector_cells \
  --module 10 \
  --xlim-axis -2.55 2.55 \
  --ylim-axis 1307 1319 \
  --zlim-axis -2.55 2.55  \
  --xlim-points -2.55 2.55 \
  --ylim-points 1307 1319 \
  --zlim-points -2.55 2.55 \
  --debug
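Decoding a cell ID means extracting fixed-width bit ranges, as in the sketch below. The sketch assumes the usual DD4hep descriptor convention. Fields are written as "name:width" or "name:offset:width", and a negative width marks a signed field. The descriptor string is purely illustrative and is not the ODD ECalBarrelCollection readout, which is defined by the readout's `<id>` element in the compact XML. The function names are hypothetical.

```python
def parse_descriptor(descriptor):
    """Parse a DD4hep-style ID descriptor into (name, offset, width, signed) tuples.

    Each comma-separated field is "name:width" (packed right after the previous
    field) or "name:offset:width"; a negative width marks a signed field.
    """
    fields = []
    offset = 0
    for token in descriptor.split(","):
        parts = token.strip().split(":")
        if len(parts) == 3:
            offset = int(parts[1])
        width = int(parts[-1])
        fields.append((parts[0], offset, abs(width), width < 0))
        offset += abs(width)
    return fields


def decode_cell_id(cell_id, fields):
    """Extract every field from a 64-bit cell ID, sign-extending signed fields."""
    values = {}
    for name, offset, width, signed in fields:
        raw = (cell_id >> offset) & ((1 << width) - 1)
        if signed and raw >= 1 << (width - 1):
            raw -= 1 << width
        values[name] = raw
    return values


def encode_cell_id(values, fields):
    """Inverse of decode_cell_id, handy for building test IDs."""
    cell_id = 0
    for name, offset, width, _signed in fields:
        cell_id |= (values[name] & ((1 << width) - 1)) << offset
    return cell_id


# Illustrative descriptor only, not the real ODD readout.
fields = parse_descriptor("system:8,module:4,layer:6,x:32:-16,y:-16")
cid = encode_cell_id({"system": 4, "module": 10, "layer": 3, "x": -2, "y": 5}, fields)
print(decode_cell_id(cid, fields))
# {'system': 4, 'module': 10, 'layer': 3, 'x': -2, 'y': 5}
```

Two points with identical decoded fields fall in the same cell. For a within-cell merging strategy, merged points should therefore never combine different field values.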

The views below are all XY projections:

Module envelopes with shower overlay

detector modules xy

Module 10 cell view with shower overlay

detector cells xy

Module 10 sensitive-only cell view (zoomed-in) for the input file

detector cells sensitive xy

Module 10 sensitive-only cell view (zoomed-in) for merge_within_cell compression

detector cells sensitive xy

Units used in the plots

The example HDF5 files in this repository are produced by the step2point dataset repository, which preserves the EDM4hep values directly:

  • deposited energy is plotted in GeV
  • time is plotted in ns
  • positions and shower-shape coordinates are plotted in mm

NOTE: This matters only for axis labels and the interpretation of the plots. The reclustering/compression code itself does not assume any particular unit system: it preserves the units present in the input arrays. If an input dataset used different but internally consistent units, the compressed output would remain in the same units.
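The unit independence can be illustrated with a hypothetical energy-weighted merge, `merge_points`, which is not step2point's code. Rescaling the energies from GeV to MeV leaves the merged position unchanged. It scales the merged energy by the same factor, and no conversion is needed anywhere.

```python
def merge_points(points):
    """Hypothetical merge of (x, y, z, E) points into one point at the energy-weighted centroid.

    No unit conversion happens: positions keep the input length unit and the
    merged energy is a plain sum in the input energy unit.
    """
    e_tot = sum(p[3] for p in points)
    centroid = tuple(sum(p[i] * p[3] for p in points) / e_tot for i in range(3))
    return (*centroid, e_tot)


in_gev = [(0.0, 1310.0, 0.0, 0.5), (2.0, 1312.0, 0.0, 0.5)]
in_mev = [(x, y, z, 1000.0 * e) for x, y, z, e in in_gev]
print(merge_points(in_gev))  # (1.0, 1311.0, 0.0, 1.0)
print(merge_points(in_mev))  # (1.0, 1311.0, 0.0, 1000.0)
```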

Validation results

EM showers

The animation below shows an example of a single electromagnetic shower:

gamma

Single-shower inspection outputs produced with --shower-index 0 on the gamma sample:

Projections

gamma projections

Distributions

gamma distributions

Overview

gamma overview

Dataset observables matrix

gamma matrix

Hadronic showers

The animation below shows an example of a single hadronic shower:

pion

Single-shower inspection outputs produced with --shower-index 0 --axis 0 1 0 on the pion sample:

Projections

pion projections

Distributions

pion distributions

Overview

pion overview

Dataset observables

pion matrix