<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
Hi Richard,<br>
<br>
This is a nice piece of detective work. I believe there is a
violation of the Wolin principle of least astonishment in there
somewhere, since one of the main advantages of working in a high-level
language is NOT having to worry about the hardware details
underneath. This does seem to indicate that we can't remain so blissfully
ignorant of those details.<br>
<br>
Thanks for tracking this down.<br>
<br>
Regards,<br>
-David<br>
<br>
On 3/16/11 11:02 PM, Richard Jones wrote:
<blockquote cite="mid:4D8179C6.9010707@uconn.edu" type="cite">Dear
colleagues,
<br>
<br>
I have reproduced and diagnosed the segfaults that occur in
the current GlueX reconstruction code when it is compiled for the i686
platform. Note that they also occur on 64-bit hardware when
running the 32-bit executable, so it is not just a 32-bit hardware issue.
The explanation is a bit too long for email, so I have written it
up in the form of a wiki page. Please see it at the following
URL.
<br>
<br>
<a class="moz-txt-link-freetext" href="http://www.jlab.org/Hall-D/software/wiki/index.php/Diagnosing_segmentation_faults_in_reconstruction_software">http://www.jlab.org/Hall-D/software/wiki/index.php/Diagnosing_segmentation_faults_in_reconstruction_software</a>
<br>
<br>
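Because the floating-point evaluation mode is fixed when an executable is compiled, not by the CPU it later runs on, a 32-bit build that uses x87 math behaves the same way on 64-bit hardware, which is consistent with the observation above. One quick way to check which mode a given build uses is the standard FLT_EVAL_METHOD macro from the cfloat header (C++11 and later). The sketch below is only illustrative: the helper names are hypothetical, and the values quoted in the comments are what g++ on Linux typically reports, not a guarantee.
<br>
```cpp
#include <cassert>
#include <cfloat>   // FLT_EVAL_METHOD (C++11 and later)
#include <string>

// Hypothetical helper: translate a FLT_EVAL_METHOD value into words.
// The meanings of 0, 1 and 2 come from the C standard, which C++ inherits.
std::string describe_eval_method(int method) {
    switch (method) {
        case 0:  return "each type evaluated in its own precision";
        case 1:  return "float and double evaluated as double";
        case 2:  return "float and double evaluated as long double";
        default: return "indeterminable or implementation-defined";
    }
}

// Hypothetical helper: report the mode of the compiler that built this file.
// With g++ this is typically 0 for an x86-64 (SSE2) build and 2 for an
// -m32 build that uses the x87 unit, regardless of the CPU it later runs on.
std::string this_build_eval_method() {
    return describe_eval_method(FLT_EVAL_METHOD);
}
```
<br>
Compiling the same file with and without -m32 and printing this_build_eval_method() is a cheap way to see whether a particular build is exposed to x87 excess precision at all.
<br>
<br>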
In that wiki page, I also explain why this should not be
considered a compiler optimization bug, but rather a bug in
our user code, in the context of x87 math. That holds in spite of the
fact that recompiling with -O0 seemed to solve it! In fact,
turning off optimization is not a reliable solution, and the
current bug will probably break out again in -O0 code in the near
future as g++ continues to evolve. What is more, when it comes to
the impact of this bug, the segfault is really only the tip of the
iceberg. I would expect this problem to occur much more
often in -m32 builds, but to show up as a segfault only in the
(rare?) case that the memory between the valid data and the end of
the valid data segment contains all zeros. In what might be the
more common manifestation of this bug, we could be getting bogus
results from the tracking without knowing it. In other words, the
segfault is your friend.
<br>
<br>
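The general mechanism behind such silent discrepancies can be emulated deterministically in a short program, using long double (the 80-bit x87 extended format with g++ on x86 Linux) to stand in for a value still held in an x87 register, and double for the same value after it has been stored to memory. This is a hypothetical illustration of x87 excess precision in general, not a reproduction of the GlueX bug documented on the wiki page; the function names are made up for the example, and on platforms where long double is no wider than double the two values simply agree.
<br>
```cpp
#include <cassert>
#include <limits>

// Hypothetical illustration of x87 excess precision.
// `in_register` plays the role of a quotient still held in an 80-bit x87
// register; `in_memory` is the same quotient after a store to a 64-bit double.
// In real optimized x87 code, which of the two a later comparison sees
// depends on register allocation, i.e. on optimization level and compiler.
bool quotient_survives_spill(double a, double b) {
    assert(b != 0.0);
    long double in_register = static_cast<long double>(a) / b;
    double in_memory = static_cast<double>(in_register);
    return in_register == in_memory;
}

// True when long double carries more mantissa bits than double
// (64 vs 53 for the x87 extended format used by g++ on x86).
bool has_extended_precision() {
    return std::numeric_limits<long double>::digits >
           std::numeric_limits<double>::digits;
}
```
<br>
For example, quotient_survives_spill(1.0, 3.0) is false whenever long double is wider than double, because 1/3 has a non-terminating binary expansion and loses bits when rounded to 53, while quotient_survives_spill(1.0, 4.0) is always true. Any branch or array index that hinges on such a comparison can then change with the optimization level.
<br>
<br>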
Besides this, there is the more serious issue of how robust the
rest of the code is against what I might call the "x87 entropy
problem": least-significant bits in doubles that fluctuate
unpredictably, depending on whether a value is held in an 80-bit
x87 register or rounded to 64 bits when it is stored to memory.
This probably warrants a broader discussion, beyond the
resolution of this particular bug.
<br>
<br>
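One common way to make code robust against this kind of last-bit fluctuation is to never let a result depend on the exact low bits of a double: compare with a tolerance, and clamp any array index computed from floating-point values so it cannot point past the end of the data. The helpers below are a hypothetical sketch of that pattern; the names, the default relative tolerance, and the binning scheme are assumptions, not taken from the GlueX code or the wiki page.
<br>
```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: equality test that tolerates differences in the
// least-significant bits, scaled to the larger operand magnitude
// (with an absolute floor of rel_tol for values near zero).
bool nearly_equal(double a, double b, double rel_tol = 1e-12) {
    double scale = std::max({1.0, std::fabs(a), std::fabs(b)});
    return std::fabs(a - b) <= rel_tol * scale;
}

// Hypothetical helper: map x to a bin index in [0, nbins - 1].
// Clamping keeps the index valid even if rounding pushes (x - xmin) / width
// across a bin boundary, so it can never be used to read beyond an array.
int safe_bin(double x, double xmin, double width, int nbins) {
    assert(nbins > 0 && width > 0.0);
    double pos = std::floor((x - xmin) / width);
    if (!(pos >= 0.0)) return 0;        // negative positions and NaN
    if (pos >= nbins) return nbins - 1;
    return static_cast<int>(pos);
}
```
<br>
For example, safe_bin(1.0, 0.0, 0.25, 4) returns 3 rather than an out-of-range 4. Clamping silently absorbs genuinely out-of-range inputs, so code that needs to detect them should check the range explicitly before binning; likewise, the tolerance in nearly_equal has to be chosen to suit the quantities being compared.
<br>
<br>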
-Richard J.
<br>
<br>
<br>
<pre wrap="">
_______________________________________________
Halld-offline mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Halld-offline@jlab.org">Halld-offline@jlab.org</a>
<a class="moz-txt-link-freetext" href="https://mailman.jlab.org/mailman/listinfo/halld-offline">https://mailman.jlab.org/mailman/listinfo/halld-offline</a></pre>
</blockquote>
</body>
</html>