[Halld-offline] non-reproducibility study

Paul Mattione pmatt at jlab.org
Tue Mar 11 17:32:10 EDT 2014


Beni, Sean, and I ran a number of studies over the weekend. I checked in some changes to the trunk that improve things, but there are still issues. What we've done and determined so far:

1) Fixed bugs: Variables were sometimes left uninitialized because functions (such as DReferenceTrajectory::DistToRT()) returned early.

2) Fixed bugs: In some places we weren't checking that the value returned by the DReferenceTrajectory methods was "NOERROR".

3) It looks like everything through wire-based tracking is OK, except DTrackWireBased::t0() is non-deterministic (the rest of the tracking parameters are fine).  This was discovered by saving the DTrackWireBased results to REST instead of DTrackTimeBased in our local test builds, then running hddm-xml and diff on the output (I agree this is the best way to go).  

4) The track-BCAL/FCAL/TOF/SC matching now looks deterministic (it wasn't before), but there still seem to be occasional differences when running multithreaded. Perhaps this is due to the problems with 3)?

5) Time-based tracking still seems to have issues, which are worse when running multithreaded. Perhaps these are related to 3)?
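
To illustrate the kind of fix described in 1) and 2), here is a minimal sketch. It is not the actual DReferenceTrajectory code: DistToLine(), UseDistance(), and the Status enum are hypothetical names made up for this example, with kNoError standing in for "NOERROR". The output variable gets a defined value before any early return, and the caller checks the status before using the result.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for an error-code type with a NOERROR-like value.
enum Status { kNoError = 0, kValueOutOfRange };

// Hypothetical function (NOT the real DReferenceTrajectory interface).
// Computes the distance from the point (x, 0) to the line y = slope * x.
Status DistToLine(double x, double slope, double &dist)
{
    // Give the output a defined value up front, so that an early return
    // never leaves the caller holding whatever happened to be in memory.
    dist = std::numeric_limits<double>::quiet_NaN();

    if (!std::isfinite(x) || !std::isfinite(slope))
        return kValueOutOfRange; // early return: dist is NaN, not garbage

    dist = std::fabs(slope * x) / std::sqrt(1.0 + slope * slope);
    return kNoError;
}

// Hypothetical caller: checks the status before trusting the output.
bool UseDistance(double x, double slope, double &result)
{
    double dist = 0.0;
    if (DistToLine(x, slope, dist) != kNoError)
        return false; // skip this candidate; result is left untouched
    result = dist;
    return true;
}
```

Reading an uninitialized variable is undefined behavior in C++, so what happens can legitimately change from run to run and from compiler to compiler.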

There may be other problems, but it's hard to tell for certain. For example, some of these bugs seem to behave differently when compiled with different versions of gcc, though I'm not sure.

I tried to fix all the bugs I could find that were outside of the Kalman Filter, but it's hard to tell if I got them all.  We got Simon up to date so he's looking at things now.  

 - Paul

On Mar 11, 2014, at 5:14 PM, Mark Ito wrote:

> Folks,
> 
> Ran 1000 short jobs using the latest DC2 tags.
> 
>   tags/hdds-dc-2.1
>   tags/sim-recon-dc-2.4
> 
> No EM background, 1000 events per job, saving the smeared raw data 
> (hdgeant_smeared.hddm). Then I ran the same hd_root again against the 
> saved smeared raw data.
> 
> Doing a diff on the dana_rest output:
> 
> differ = 994
> identical = 6
> 
> I then dumped the hddm into xml form and diffed the output for a dozen 
> pairs. Find the results at
> 
>   https://halldweb1.jlab.org/talks/2014-1Q/diffhddm.txt
> 
> You can see that the differences are not that big. You can tell that the 
> same event was analyzed in iteration 1 and iteration 2. Close but no 
> cigar. I could look at more files if that is of interest to anyone.
> 
> The rest files and the root files are at JLab in
> 
>   /volatile/halld/home/gluex/proj/dc_02_1
>   /volatile/halld/home/gluex/proj/dc_02_2
> 
> Each of these directories has a rest directory and a hd_root directory. 
> 1 and 2 indicate the original run and the re-run respectively, of 
> course. Compression was off and there are no short rest files.
> 
> Hope this helps someone. Not sure what my next step should be. 
> Suggestions welcome.
> 
>   -- Mark




