[Halld-offline] Fwd: r7587 - trunk/sim-recon/src/libraries/TRACKING

David Lawrence davidl at jlab.org
Thu Mar 17 15:24:53 EDT 2011


Hi Richard,

     Let's put this on the agenda (and by us, I mean Mark) for the 
offline meeting next week.

Note that we currently have around 150k lines of code (+30k for JANA). 
With 5 brave volunteers, they would need to review about 30k lines each. 
In the 150k lines, we have about 290 uses of STL sort.

Regards,
-David

On 3/17/11 2:47 PM, Richard Jones wrote:
> David and Curtis,
>
> I agree with David's alternative fix to the SortIntersections() 
> code.  The point of my example was to clarify what was causing the 
> problem, not to recommend the best fix.  Maurizio Ungaro, who also saw 
> my post, suggested an elegant alternative: replace the "double" 
> declarations of the local variables that hold intermediate results in 
> SortIntersections() with "long double".  This should work on both i686 
> and x86_64 hardware, and it neatly avoids the entropy problem by keeping 
> the full 80-bit x87 value (64-bit mantissa) when an intermediate is 
> stored to memory, instead of rounding it to a 64-bit double.  (In memory 
> the value is actually padded to 96 bits for -m32 and 128 bits for -m64.)
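The "long double" suggestion can be sketched roughly as follows. This is a minimal illustration, not the actual SortIntersections() code, which the thread does not show: `Point3`, `ByPerpLongDouble()` and `SortPointsByPerp()` are hypothetical names, and the sort key is assumed to be the transverse distance, as with DVector3::Perp(). It also assumes a GCC-style x86 build where long double is the x87 extended type; on compilers where long double is simply a double (MSVC, for example), it gains nothing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the objects being sorted.
struct Point3 {
  double x, y, z;
};

// Intermediate results are held in long double locals. On an x87 build
// this is the 80-bit register format, so an intermediate that the compiler
// spills to memory keeps its full precision instead of being rounded to a
// 64-bit double, and both operands are compared at the same precision.
bool ByPerpLongDouble(const Point3& a, const Point3& b) {
  const long double ra = std::sqrt(static_cast<long double>(a.x) * a.x +
                                   static_cast<long double>(a.y) * a.y);
  const long double rb = std::sqrt(static_cast<long double>(b.x) * b.x +
                                   static_cast<long double>(b.y) * b.y);
  return ra < rb;
}

// Sort points by increasing transverse distance from the z axis.
void SortPointsByPerp(std::vector<Point3>& pts) {
  std::sort(pts.begin(), pts.end(), ByPerpLongDouble);
}
```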
>
> More broadly, I agree with Curtis that this issue is not limited to 
> STL sort.  We need to look at all instances in the code 
> where comparisons are being made between doubles, and make sure that 
> fuzzy compares are being done, or the logic is tolerant of a little 
> entropy.  Maybe we can divide up the work between us.  Between 4 or 5 
> of us, we could do this before the May collaboration meeting.
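A fuzzy compare of the kind described above might look like this. The helper names and default tolerances are illustrative assumptions, not values from the GlueX code; suitable tolerances depend on the quantities being compared.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: a and b count as equal when they differ by no more
// than an absolute tolerance, or by no more than a relative tolerance
// scaled by the larger magnitude. The defaults are illustrative only.
bool FuzzyEqual(double a, double b, double abs_tol = 1e-12,
                double rel_tol = 1e-9) {
  const double diff = std::fabs(a - b);
  if (diff <= abs_tol) return true;
  return diff <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}

// Fuzzy "less than": true only when a is below b by more than the
// tolerances, so a last-bit rounding difference cannot flip the result.
bool FuzzyLess(double a, double b) {
  return a < b && !FuzzyEqual(a, b);
}
```

Fuzzy equality is not transitive (a can be "equal" to b and b to c while a is not "equal" to c), so FuzzyLess() suits if-tests in the reconstruction logic but is not, in general, a safe std::sort comparator; for sorting, comparing stable precomputed keys is the more robust choice.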
>
> -Richard J.
>
> On 3/17/2011 12:20 PM, David Lawrence wrote:
>> Hi Curtis,
>>
>>      I have created a new issue in Mantis to review the existing STL 
>> sort calls in our reconstruction code and make sure none of them can 
>> trigger a similar bug.
>>
>> https://halldnew.jlab.org/mantisbt/view.php?id=50
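For a review like the one tracked in that Mantis issue, a brute-force checker can confirm on sample data that a comparator passed to std::sort is a strict weak ordering, which std::sort requires; violating it is undefined behavior and in practice can make std::sort read past the end of its range. `IsStrictWeakOrdering()` is a hypothetical helper, not part of the existing code, and since it is O(n^3) it is only meant for small samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical audit helper: returns false if comp violates any strict
// weak ordering axiom on the given sample. A comparator that can return
// true for comp(x, x), e.g. because one operand stays in an 80-bit
// register while the other was rounded to 64 bits, fails the first test.
template <typename T, typename Compare>
bool IsStrictWeakOrdering(const std::vector<T>& v, Compare comp) {
  const std::size_t n = v.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (comp(v[i], v[i])) return false;  // irreflexivity
    for (std::size_t j = 0; j < n; ++j) {
      if (comp(v[i], v[j]) && comp(v[j], v[i])) return false;  // asymmetry
      for (std::size_t k = 0; k < n; ++k) {
        // Transitivity of comp.
        if (comp(v[i], v[j]) && comp(v[j], v[k]) && !comp(v[i], v[k]))
          return false;
        // Transitivity of equivalence (neither element precedes the other).
        const bool eq_ij = !comp(v[i], v[j]) && !comp(v[j], v[i]);
        const bool eq_jk = !comp(v[j], v[k]) && !comp(v[k], v[j]);
        const bool eq_ik = !comp(v[i], v[k]) && !comp(v[k], v[i]);
        if (eq_ij && eq_jk && !eq_ik) return false;
      }
    }
  }
  return true;
}
```

Passing on a sample does not prove a comparator correct for all inputs, and register-versus-memory precision effects may not reproduce in a test build, so this complements a code review rather than replacing it.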
>>
>> Regards,
>> -David
>>
>> On 3/17/11 12:14 PM, Curtis A. Meyer wrote:
>> Hi David -
>>
>>      does it make sense to open a "general ticket" in Mantis about 
>> the broader effects of this issue?
>>
>>    Curtis
>> On 3/17/11 11:23 AM, David Lawrence wrote:
>>
>> Hi All,
>>
>>      I've just committed a fix to the seg. fault/hang problem based 
>> on Richard's analysis. This is slightly different from the fix 
>> Richard suggested. It avoids the 80bit/64bit comparison issue by 
>> pre-calculating the values to be compared rather than doing it in the 
>> sort algorithm itself. This should also speed things up a little 
>> since DVector3::Perp() is not being called repeatedly during the sort 
>> for the same object.
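The precalculation approach can be sketched like this, assuming the sort key is the transverse distance returned by a Perp()-style method. `Hit` and `SortByPerp()` are hypothetical stand-ins; the actual DTrackCandidate_factory_CDC.cc change is not reproduced in the thread. Each key is computed once and stored in memory as a double, so std::sort always compares the same 64-bit value for a given object.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the objects being sorted.
struct Hit {
  double x, y;
  double Perp() const { return std::sqrt(x * x + y * y); }
};

// Sort by transverse distance, calling Perp() exactly once per object.
// Each key is stored as a double inside the keyed vector, i.e. already
// rounded to 64 bits in memory, so repeated comparisons of the same
// object always see the same value.
std::vector<Hit> SortByPerp(const std::vector<Hit>& hits) {
  std::vector<std::pair<double, std::size_t>> keyed;
  keyed.reserve(hits.size());
  for (std::size_t i = 0; i < hits.size(); ++i)
    keyed.emplace_back(hits[i].Perp(), i);

  // Compare only the stored keys; equal keys fall back to the index.
  std::sort(keyed.begin(), keyed.end());

  std::vector<Hit> sorted;
  sorted.reserve(hits.size());
  for (const auto& kv : keyed) sorted.push_back(hits[kv.second]);
  return sorted;
}
```

Sorting (key, index) pairs rather than the objects themselves also keeps ties in a deterministic order. A NaN key would still break the ordering, so inputs that can produce one need a separate check.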
>>
>>      I have been able to run through my one reliably-problematic 
>> event using the new code without any problem. If anyone notices an 
>> issue, please let me know.
>>
>>      This problem has been marked as resolved in Mantis.
>>
>> Regards,
>> -David
>>
>> -------- Original Message --------
>> Subject:        r7587 - trunk/sim-recon/src/libraries/TRACKING
>> Date:   Thu, 17 Mar 2011 11:16:40 -0400
>> From:   Hall-D.SVN.Repository at jlab.org
>> To:     davidl at jlab.org, brash at pcs.cnu.edu, wolin at jlab.org, 
>> zisis at uregina.ca, mashephe at indiana.edu, remitche at indiana.edu, 
>> zihlmann at jlab.org, somov at jlab.org, staylor at jlab.org
>>
>>
>>
>> Author: davidl
>> Date: 2011-03-17 11:16:39 -0400 (Thu, 17 Mar 2011)
>> New Revision: 7587
>>
>> Modified:
>>    trunk/sim-recon/src/libraries/TRACKING/DTrackCandidate_factory_CDC.cc
>> Log:
>> This is a fix for the seg. fault/hang problem that has been plaguing us
>> for the last ~4 months. It precalculates the values used in the
>> comparison in the SortIntersections routine to avoid issues with values
>> calculated with 80bit precision being compared with values having been
>> copied to and from a 64bit register. See the report on the GlueX wiki here:
>>
>> http://www.jlab.org/Hall-D/software/wiki/index.php/Diagnosing_segmentation_faults_in_reconstruction_software 
>>
>> _______________________________________________
>> Halld-offline mailing list
>> Halld-offline at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/halld-offline
>>
>>
>>
>> -- 
>> Prof. Curtis A. Meyer           Department of Physics
>> Phone:  (412) 268-2745          Carnegie Mellon University
>> Fax:    (412) 681-0648          Pittsburgh PA 15213-3890
>> cmeyer at ernest.phys.cmu.edu
>> http://www.curtismeyer.com/
>>
>>