[Halld-offline] REST Results Non-deterministic
pmatt at jlab.org
Fri Mar 7 00:43:12 EST 2014
The results of the danarest plugin are sometimes non-deterministic. I was studying the small-REST-file problem, so I picked an hdgeant_smeared.hddm file that had previously led to a small REST file. I then submitted a batch of jobs running danarest on this file with compression enabled, and none of the resulting REST files were small. In addition, the sizes of both the REST and the ROOT files differed slightly from job to job, and the number of reconstructed objects varied slightly from file to file (e.g. +/- 1-3 tracks, or +/- 1-3 track-shower matches, in a 5k-event file). This happens with both the dc-2 branch and the trunk.
I think we need to track this down soon. Beyond the obvious reasons, fixing it may also let us reproduce the small REST files more consistently. I would appreciate some help on this; the file I've been testing this on is at (on ifarm1102):
With new histograms in the monitoring_hists plugin, you can see the # of reconstructed tracks, hits, showers, and track-shower matches. Between executions I only see differences in the # of time-based tracks (not wire-based), the # of neutral showers (not # bcal or fcal showers), and the # of matches between tracks and TOF/BCAL/FCAL/SC hits. This suggests (at least for this file) that the problem is in the time-based tracking or in the track-shower/TOF/SC matching.
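The localization argument above amounts to diffing per-category object counts between two executions on the same input. A minimal sketch of that comparison follows; it assumes the counts have already been read out of the histograms into a map, and the `Counts` alias and `differing_categories` function are hypothetical names, not part of monitoring_hists.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical container: category name -> number of reconstructed objects
// in one execution (e.g. filled by hand from the monitoring histograms).
using Counts = std::map<std::string, long>;

// Return the names of all categories whose counts differ between two runs.
// A category present in only one run is also reported as differing.
std::vector<std::string> differing_categories(const Counts& run_a,
                                              const Counts& run_b) {
    std::vector<std::string> diff;
    for (const auto& kv : run_a) {
        auto it = run_b.find(kv.first);
        if (it == run_b.end() || it->second != kv.second) {
            diff.push_back(kv.first);
        }
    }
    for (const auto& kv : run_b) {
        if (run_a.find(kv.first) == run_a.end()) {
            diff.push_back(kv.first);
        }
    }
    return diff;
}
```

Categories that never show up in the returned list (here, wire-based tracks and the BCAL/FCAL shower counts) sit upstream of the problem; the ones that do show up point at the stage where the results first diverge.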
While searching for this, I fixed a problem in the track-shower matching code that was leading to undefined behavior. That improved things, but the REST results are still slightly non-deterministic. Details:
Every now and then a track gets reconstructed with completely unphysical values (e.g. with a track position closer to infinity than the beamline). After track reconstruction the track gets swum through the magnetic field, but the swimming terminates when the track leaves the physical volume (whose x/y/z/r values are hard-coded, by the way...). For these unphysical tracks, this results in a DReferenceTrajectory object with only one swim step.
The problem occurred when the reconstruction code attempted to match BCAL showers, TOF hits, etc. to these tracks. It called the match functions with the path length and flight time passed in by reference as output variables, but in some cases those variables were uninitialized at the call. For these bogus tracks the functions returned without setting them, so the caller went on to read uninitialized values, which is undefined behavior.
The solution (checked into trunk):
1) Prior to calling these functions, the path length and flight time are initialized to 9E9.
2) When the bogus track case is detected, the DReferenceTrajectory::DistToRTwithTime() method sets these variables anyway (to 1E6).
3) Tracks with only one swim step are no longer saved: they get rejected prior to registering them with the factories.
4) In the monitoring_hists plugin, the # of reconstructed tracks, showers, hits, etc. are histogrammed.
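Items 1) through 3) can be sketched in isolation as follows. This is a minimal, self-contained illustration and not the code checked into trunk: the `SwimStep`, `Trajectory`, and `MatchResult` types and the `dist_to_rt_with_time`, `is_usable`, and `try_match` functions are hypothetical stand-ins (the real DReferenceTrajectory::DistToRTwithTime() has a different signature). Only the sentinel values 9E9 and 1E6 and the one-swim-step criterion come from the list above.

```cpp
#include <vector>

// Hypothetical stand-in for one swim step of a reference trajectory.
struct SwimStep {
    double s;  // path length along the track
    double t;  // flight time
};

// Hypothetical stand-in for DReferenceTrajectory.
struct Trajectory {
    std::vector<SwimStep> steps;
};

// Hypothetical stand-in for a match function with by-reference outputs.
// Item 2: the outputs are always written, even on the bogus-track path.
bool dist_to_rt_with_time(const Trajectory& rt,
                          double& path_length, double& flight_time) {
    if (rt.steps.size() < 2) {   // bogus track: only one swim step
        path_length = 1.0e6;
        flight_time = 1.0e6;
        return false;
    }
    path_length = rt.steps.back().s;
    flight_time = rt.steps.back().t;
    return true;
}

// Item 3: tracks with only one swim step are rejected before being saved.
bool is_usable(const Trajectory& rt) {
    return rt.steps.size() > 1;
}

struct MatchResult {
    bool matched;
    double path_length;
    double flight_time;
};

MatchResult try_match(const Trajectory& rt) {
    // Item 1: initialize the outputs before the call, so they hold a
    // well-defined value no matter which path the callee takes.
    double path_length = 9.0e9;
    double flight_time = 9.0e9;
    const bool ok = dist_to_rt_with_time(rt, path_length, flight_time);
    return MatchResult{ok, path_length, flight_time};
}
```

Any one of the three measures would remove the uninitialized read on its own; applying all of them means a later change to the caller or the callee cannot quietly reintroduce it.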
But like I said, I'm still getting inconsistent results, even with this fix (as demonstrated by the updated monitoring_hists plugin).