[Halld-offline] non-reproducibility study

Paul Mattione pmatt at jlab.org
Wed Mar 12 13:46:40 EDT 2014


I just checked DTrackWireBased::t0() single-threaded (by saving to REST, then running hddm-xml and diff on the output).  The results are still sometimes inconsistent, and the affected event numbers change from run to run.

Sometimes the only difference is a t0 of 0 versus -0, but in other cases the values genuinely differ.  In every case, however, the t0err is 10.0, which means the t0 is coming from the DTrackFitterKalmanSIMD::mT0MinimumDriftTime variable.  Also, most (but not all) of the events with these discrepancies have different values for t0det (CDC or FDC).  This means that DTrackFitterKalmanSIMD::mMinDriftID is inconsistent as well.
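
For anyone wondering why a t0 of 0 versus -0 shows up in the diff at all: the two compare equal as doubles but print differently, so a text diff flags them.  Below is a minimal C++ sketch of that effect.  The "%g" format and the helper names (to_text, canonical_zero, differs_only_as_text) are made up for illustration; I have not checked how hddm-xml actually formats floats.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Render a double as text. "%g" is an assumed format for illustration,
// not necessarily what hddm-xml uses.
std::string to_text(double x) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%g", x);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf, n);
}

// Hypothetical helper: map -0 to +0 so both render identically.
// Every other value (including NaN) passes through unchanged.
double canonical_zero(double x) {
    return (x == 0.0 && std::signbit(x)) ? 0.0 : x;
}

// True when two values compare equal numerically but differ as text,
// which is the signature of a pure signed-zero discrepancy.
bool differs_only_as_text(double a, double b) {
    return a == b && to_text(a) != to_text(b);
}
```

Something like differs_only_as_text(0.0, -0.0) could be used to separate the harmless signed-zero cases from the events where t0 really changes.  Note that this would only tidy up the diff output; it does not explain why the sign of the zero varies between runs.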

I've attached a log file showing the results of running the diff command on one of the file pairs.  Hopefully this helps narrow it down some more.  

 - Paul

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: log.txt
URL: <https://mailman.jlab.org/pipermail/halld-offline/attachments/20140312/d61319f5/attachment-0002.txt>
-------------- next part --------------



On Mar 12, 2014, at 11:56 AM, Paul Mattione wrote:

> It looks like calling TObject::SetObjectStat(kFALSE) (say, in the DApplication constructor) will avoid using the TObjectTable for all TObjects, but I don't know what the side effects of this will be.  
> 
> - Paul
> 
> On Mar 12, 2014, at 9:35 AM, Simon Taylor wrote:
> 
>> On 03/11/2014 05:32 PM, Paul Mattione wrote:
>>> 3) It looks like everything through wire-based tracking is OK, except DTrackWireBased::t0() is non-deterministic (the rest of the tracking parameters are fine).  This was discovered by saving the DTrackWireBased results to REST instead of DTrackTimeBased in our local test builds, then running hddm-xml and diff on the output (I agree this is the best way to go).
>> 
>> I have checked in a change to DTrackFitterKalmanSIMD that I believe may address this issue.  I slightly changed how and where this t0 is computed, although I do not understand why the previous method caused non-deterministic behavior.  Please let me know if the new version helps.
>> 
>>> 4) It looks like the track-BCAL/FCAL/TOF/SC matching is now deterministic (it wasn't before).  However, it looks like there are occasional differences when multithreading.   Perhaps this is due to problems with 3)??
>>> 
>>> 5) It looks like time-based tracking still has issues, which are worse when running multithreaded.  Perhaps these are related to 3)??
>>> 
>> 
>> I have been looking into the multi-threading issues with valgrind. There are some indications that our use of TVector2, TVector3, and TMatrix in various parts of the tracking code is not thread safe because of an apparent use of global memory by TObject (from which all of the above classes inherit) through something called TObjectTable.   I have developed my own DVector2 and DVector3 classes to avoid this TObject dependency, but it will take more effort to move away from TMatrix if this is what we want to do.
>> 
>> Simon
>> 
>> 
> 
> 
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline


