<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
Hi Richard,<br>
<br>
Let's put this on the agenda (and by us, I mean Mark) for the
offline meeting next week. <br>
<br>
Note that we currently have around 150k lines of code (+30k for
JANA). With 5 brave volunteers, they would need to review about 30k
lines each. In the 150k lines, we have about 290 uses of STL sort.<br>
<br>
Regards,<br>
-David<br>
<br>
On 3/17/11 2:47 PM, Richard Jones wrote:
<blockquote cite="mid:4D82573A.3020600@uconn.edu" type="cite">David
and Curtis,
<br>
<br>
I agree with David's alternative fix to the SortIntersections()
code. The point of my example was to clarify what was causing the
problem, not to recommend the best fix. Maurizio Ungaro, who also
saw my post, suggested an elegant alternative: replacing the
"double" declarations of the local variables that hold intermediate
results in SortIntersections() with "long double". This should
work on both i686 and x86_64 hardware, and neatly avoids the
entropy problem by keeping the full 80-bit extended-precision value
when the variable is stored in memory (the storage is actually
padded to 96 bits for -m32 and 128 bits for -m64).
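<br>
<br>
A minimal sketch of the "long double" idea, not the actual code in the
repository: Vec3 is a hypothetical stand-in for DVector3, PerpLess is an
illustrative name, and the transverse radius is computed by hand. It
assumes a compiler where long double is the x87 80-bit extended type, as
with gcc on i686 and x86_64; where long double is the same as double,
nothing is gained.
<br>
<br>
```cpp
#include <cmath>

// Hypothetical stand-in for DVector3; only the fields the sketch needs.
struct Vec3 {
    double x, y, z;
};

// Comparator that keeps both intermediate radii in long double, so that
// if the compiler spills one of them to memory it is stored at extended
// precision instead of being rounded to a 64-bit double.
inline bool PerpLess(const Vec3& a, const Vec3& b) {
    const long double ra = std::sqrt(static_cast<long double>(a.x) * a.x +
                                     static_cast<long double>(a.y) * a.y);
    const long double rb = std::sqrt(static_cast<long double>(b.x) * b.x +
                                     static_cast<long double>(b.y) * b.y);
    return ra < rb;
}
```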
<br>
<br>
More generally, I agree with Curtis that this issue is not limited
to STL sort. We need to look at all instances in the
code where comparisons are being made between doubles, and make
sure that fuzzy compares are being done, or the logic is tolerant
of a little entropy. Maybe we can divide up the work between us.
Between 4 or 5 of us, we could do this before the May
collaboration meeting.
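<br>
<br>
As a rough illustration of what a "fuzzy compare" could look like, here
is a small sketch. NearlyEqual and its default tolerances are
hypothetical choices for this example, not something already in our
code, and suitable tolerances would depend on the quantity being
compared.
<br>
<br>
```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true when a and b agree to within a relative
// tolerance (scaled by the larger magnitude) or an absolute tolerance,
// whichever is looser.
inline bool NearlyEqual(double a, double b,
                        double rel_tol = 1e-12, double abs_tol = 0.0) {
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(rel_tol * scale, abs_tol);
}
```
<br>
A tolerance test like this suits equality checks. It is not a drop-in
replacement for the comparator passed to std::sort, because a tolerant
"less than" is not transitive and so does not give the strict weak
ordering that std::sort requires.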
<br>
<br>
-Richard J.
<br>
<br>
<br>
<br>
<br>
<br>
On 3/17/2011 12:20 PM, David Lawrence wrote:
<br>
<blockquote type="cite">Hi Curtis,
<br>
<br>
I have created a new issue in Mantis to review the existing
STL sort calls in our reconstruction to ensure they do not have
the potential of having a similar bug.
<br>
<br>
<a class="moz-txt-link-freetext" href="https://halldnew.jlab.org/mantisbt/view.php?id=50">https://halldnew.jlab.org/mantisbt/view.php?id=50</a>
<br>
<br>
Regards,
<br>
-David
<br>
<br>
On 3/17/11 12:14 PM, Curtis A. Meyer wrote:
<br>
Hi David -
<br>
<br>
Does it make sense to open a "general ticket" in Mantis
about the more general effects of this issue?
<br>
<br>
Curtis
<br>
On 3/17/11 11:23 AM, David Lawrence wrote:
<br>
<br>
Hi All,
<br>
<br>
I've just committed a fix to the seg. fault/hang problem
based on Richard's analysis. This is slightly different than the
fix Richard suggested. It avoids the 80bit/64bit comparison
issue by pre-calculating the values to be compared rather than
doing it in the sort algorithm itself. This should also speed
things up a little since DVector3::Perp() is not being called
repeatedly during the sort for the same object.
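<br>
<br>
The precalculation approach described above can be sketched roughly as
follows. This is not the committed code: Vec3 is a hypothetical stand-in
for DVector3, SortByPerp is an illustrative name, and the sketch uses
C++11 syntax for brevity.
<br>
<br>
```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for DVector3; only what the sketch needs.
struct Vec3 {
    double x, y, z;
    double Perp() const { return std::sqrt(x * x + y * y); }
};

// Sort points by transverse radius using keys computed once, up front.
// Every key is written to memory as a 64-bit double before the sort
// starts, so the comparator only ever compares two stored doubles.
inline void SortByPerp(std::vector<Vec3>& points) {
    std::vector<std::pair<double, Vec3>> keyed;
    keyed.reserve(points.size());
    for (const Vec3& p : points) {
        keyed.emplace_back(p.Perp(), p);  // Perp() is called once per point
    }

    std::sort(keyed.begin(), keyed.end(),
              [](const std::pair<double, Vec3>& a,
                 const std::pair<double, Vec3>& b) {
                  return a.first < b.first;
              });

    // Copy the points back in sorted order.
    for (std::size_t i = 0; i < points.size(); ++i) {
        points[i] = keyed[i].second;
    }
}
```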
<br>
<br>
I have been able to run through my one reliably-problematic
event using the new code without any problem. If anyone notices
an issue, please let me know.
<br>
<br>
This problem has been marked as resolved in Mantis.
<br>
<br>
Regards,
<br>
-David
<br>
<br>
-------- Original Message --------
<br>
Subject: r7587 - trunk/sim-recon/src/libraries/TRACKING
<br>
Date: Thu, 17 Mar 2011 11:16:40 -0400
<br>
From:
<a class="moz-txt-link-abbreviated" href="mailto:Hall-D.SVN.Repository@jlab.org">Hall-D.SVN.Repository@jlab.org</a><br>
To: <a class="moz-txt-link-abbreviated" href="mailto:davidl@jlab.org">davidl@jlab.org</a>,
<a class="moz-txt-link-abbreviated" href="mailto:brash@pcs.cnu.edu">brash@pcs.cnu.edu</a>,
<a class="moz-txt-link-abbreviated" href="mailto:wolin@jlab.org">wolin@jlab.org</a>,
<a class="moz-txt-link-abbreviated" href="mailto:zisis@uregina.ca">zisis@uregina.ca</a>,
<a class="moz-txt-link-abbreviated" href="mailto:mashephe@indiana.edu">mashephe@indiana.edu</a>,
<a class="moz-txt-link-abbreviated" href="mailto:remitche@indiana.edu">remitche@indiana.edu</a>,
<a class="moz-txt-link-abbreviated" href="mailto:zihlmann@jlab.org">zihlmann@jlab.org</a>,
<a class="moz-txt-link-abbreviated" href="mailto:somov@jlab.org">somov@jlab.org</a>,
<a class="moz-txt-link-abbreviated" href="mailto:staylor@jlab.org">staylor@jlab.org</a>
<br>
<br>
<br>
<br>
Author: davidl
<br>
Date: 2011-03-17 11:16:39 -0400 (Thu, 17 Mar 2011)
<br>
New Revision: 7587
<br>
<br>
Modified:
<br>
trunk/sim-recon/src/libraries/TRACKING/DTrackCandidate_factory_CDC.cc
<br>
Log:
<br>
This is a fix for the seg. fault/hang problem that has been
plaguing us
<br>
for the last ~4 months. It precalculates the values used in the
comparison
<br>
in the SortIntersections routine to avoid issues with values
calculated
<br>
with 80bit precision being compared with values having been
copied to and
<br>
from a 64bit register. See the report on the GlueX wiki here:
<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.jlab.org/Hall-D/software/wiki/index.php/Diagnosing_segmentation_faults_in_reconstruction_software">http://www.jlab.org/Hall-D/software/wiki/index.php/Diagnosing_segmentation_faults_in_reconstruction_software</a>
<br>
<br>
<br>
<br>
<br>
<br>
_______________________________________________
<br>
Halld-offline mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:Halld-offline@jlab.org">Halld-offline@jlab.org</a>
<br>
<a class="moz-txt-link-freetext" href="https://mailman.jlab.org/mailman/listinfo/halld-offline">https://mailman.jlab.org/mailman/listinfo/halld-offline</a>
<br>
<br>
<br>
<br>
--
<br>
Prof. Curtis A. Meyer Department of Physics
<br>
Phone: (412) 268-2745 Carnegie Mellon University
<br>
Fax: (412) 681-0648 Pittsburgh PA 15213-3890
<br>
<a class="moz-txt-link-abbreviated" href="mailto:cmeyer@ernest.phys.cmu.edu">cmeyer@ernest.phys.cmu.edu</a>
<a class="moz-txt-link-freetext" href="http://www.curtismeyer.com/">http://www.curtismeyer.com/</a>
<br>
<br>
<br>
</blockquote>
<br>
<br>
</blockquote>
</body>
</html>