[Halld-offline] non-reproducibility study

Mark Ito marki at jlab.org
Wed Mar 12 17:37:08 EDT 2014


Re-did yesterday's exercise with tag 2.5. I still see differences, as others do.

But I realized that I am not running on the same machine each time on 
the batch farm. Should that matter? All nodes have the same OS and are 
all 64-bit. Intel vs AMD? Intel A vs. Intel B? Don't know.

The diffs for three files are at 
https://halldweb1.jlab.org/talks/2014-1Q/diffhddm_2.5.txt FWIW. Note 
that I filtered out differences that amount only to 0 vs. -0.
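
For anyone who wants to apply the same kind of 0 vs. -0 filtering, here is a minimal Python sketch of one way to do it. It is not the script used to produce the file above. It assumes the two hddm-xml dumps have been read into lists of lines that correspond one-to-one, and that a negative zero appears as a token such as -0 or -0.0; the function names and the regular expression are hypothetical and may need adjusting to the actual output format.

```python
import re

# Hypothetical pattern for a negative-zero token: -0, -0.0, -0.000, -0.0e+00.
# The lookarounds keep it from touching values like -0.5 or exponents like 1e-05.
_NEG_ZERO = re.compile(r"(?<![\w.])-(0+(?:\.0*)?(?:[eE][+-]?\d+)?)(?![\w.])")


def normalize_signed_zeros(line):
    """Rewrite every negative-zero token in a line as its unsigned form."""
    return _NEG_ZERO.sub(r"\1", line)


def significant_diffs(lines_a, lines_b):
    """Return (line_number, a, b) for line pairs that differ beyond 0 vs. -0.

    Assumes both inputs have the same number of lines; zip() silently
    stops at the shorter one, so check the lengths separately if needed.
    """
    diffs = []
    for n, (a, b) in enumerate(zip(lines_a, lines_b), start=1):
        if normalize_signed_zeros(a) != normalize_signed_zeros(b):
            diffs.append((n, a, b))
    return diffs
```

Passing the line lists from two runs to significant_diffs() leaves only the pairs that still differ after every negative zero has been rewritten as a plain zero.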

On 03/12/2014 01:46 PM, Paul Mattione wrote:
> I just checked DTrackWireBased::t0() (single-threaded via saving to REST, hddm-xml, and diff).  The results are still sometimes inconsistent, and it happens on inconsistent event numbers.
>
> Sometimes it's the difference between a t0 of 0 and -0, although sometimes it's not.  However, in every case the t0err is 10.0, which means it's coming from the DTrackFitterKalmanSIMD::mT0MinimumDriftTime variable.  Also, most (but not all) of the events with these discrepancies have different values for t0det (CDC or FDC).  This means that DTrackFitterKalmanSIMD::mMinDriftID is inconsistent as well.
>
> I've attached a log file showing the results of running the diff command on one of the file pairs.  Hopefully this helps narrow it down some more.
>
>   - Paul
>
> On Mar 12, 2014, at 11:56 AM, Paul Mattione wrote:
>
>> It looks like calling TObject::SetObjectStat(kFALSE) (say, in the DApplication constructor) will avoid using the TObjectTable for all TObjects, but I don't know what the side effects of this will be.
>>
>> - Paul
>>
>> On Mar 12, 2014, at 9:35 AM, Simon Taylor wrote:
>>
>>> On 03/11/2014 05:32 PM, Paul Mattione wrote:
>>>> 3) It looks like everything through wire-based tracking is OK, except DTrackWireBased::t0() is non-deterministic (the rest of the tracking parameters are fine).  This was discovered by saving the DTrackWireBased results to REST instead of DTrackTimeBased in our local test builds, then running hddm-xml and diff on the output (I agree this is the best way to go).
>>> I have checked in a change to DTrackFitterKalmanSIMD that I believe may address this issue.  I changed how and where this t0 is computed slightly,
>>> although I do not understand why the previous method caused non-deterministic behavior.  Please let me know if the new version helps.
>>>
>>>> 4) It looks like the track-BCAL/FCAL/TOF/SC matching is now deterministic (it wasn't before).  However, it looks like there are occasional differences when multithreading.   Perhaps this is due to problems with 3)??
>>>>
>>>> 5) It looks like time-based tracking still has issues, which are worse when running multithreaded.  Perhaps these are related to 3)??
>>>>
>>> I have been looking into the multi-threading issues with valgrind. There are some indications that our use of TVector2, TVector3, and TMatrix in various parts of the tracking code is not thread safe because of an apparent use of global memory by TObject (from which all of the above classes inherit) through something called TObjectTable. I have developed my own DVector2 and DVector3 classes to avoid this TObject dependency, but
>>> it will take more effort to move away from TMatrix if this is what we want to do.
>>>
>>> Simon
>>>
>>>
>>
>> _______________________________________________
>> Halld-offline mailing list
>> Halld-offline at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/halld-offline
>

-- 
Mark M. Ito, Jefferson Lab, marki at jlab.org, (757)269-5295
