[Halld-offline] diagnosis of cause for segfaults in DTrackCandidate_factory_CDC::FindThetaZRegression()

Richard Jones richard.t.jones at uconn.edu
Thu Mar 17 09:04:02 EDT 2011


Beni,

It can hit you any time you compare two doubles in a way that needs to respect some kind of strict [in]equality constraint.  For example, suppose you have an array of doubles, you somehow know one of its values, and you want to find which element of the array holds that value.  Then you need to do fuzzy compares to find it.  Think of it as entropy in the least significant digit of every double in the code.
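
To make the array example concrete, here is a minimal sketch of what I mean by a fuzzy compare.  The helper names (nearly_equal, find_nearly_equal) and the default tolerances are hypothetical, chosen only for illustration; they are not taken from the GlueX code, and the right tolerance depends on the scale of the quantities being compared.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when a and b agree to within a relative
// tolerance, with an absolute floor so values near zero still compare
// sensibly. The default tolerances are illustrative assumptions.
inline bool nearly_equal(double a, double b,
                         double rel_tol = 1e-9, double abs_tol = 1e-12)
{
    const double diff = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}

// Hypothetical helper: index of the first element that matches target
// within tolerance, or -1 if no element does.
inline long find_nearly_equal(const std::vector<double>& values, double target)
{
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (nearly_equal(values[i], target)) {
            return static_cast<long>(i);
        }
    }
    return -1;  // not found: the caller must handle this case explicitly
}
```

The search is bounded by the size of the container and reports "not found" explicitly, so a comparison that fails because of a flipped least-significant bit cannot send the loop past the end of the valid data.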

-Richard J.


On 3/17/2011 8:36 AM, Beni Zihlmann wrote:
> Hi Richard,
> I have one question about this issue. If I understand this correctly, it means that there is a
> potential problem every time I do an explicit calculation inside an if statement and
> compare the result with a variable from memory.
>
> cheers,
> Beni
>
> Dear colleagues,
>
> I have reproduced and diagnosed the segfaults that take place in the current GlueX reconstruction code, when compiled for the i686 platform.  Note that they also occur on 64bit hardware when running the 32bit executable, so it is not just a 32bit issue.  The explanation is a bit too long for email, so I have written it up in the form of a wiki page.  Please see it at the following URL.
>
> http://www.jlab.org/Hall-D/software/wiki/index.php/Diagnosing_segmentation_faults_in_reconstruction_software#How_to_diagnose_this_kind_of_error
>
> In that wiki page, I also explain why this should not be considered to be a compiler optimization bug, but rather a bug in our user code, in the context of x87 math.  That, in spite of the fact that recompiling with -O0 seemed to solve it!  In fact, turning off optimization is not a reliable solution, and the current bug probably will break out again in -O0 code in the near future, as g++ continues to evolve.  What is more, in considering the impact of this bug, the segfault is really only the tip of the iceberg.  I would expect this problem to be happening much more often in -m32 builds, but only showing up as segfaults in the (rare?) case that the memory between the valid data and the end of the valid data segment contains all zeros.  In what might be the more normal occurrence of this bug, we could be getting bogus results from the tracking and not know it.  In other words, the segfault is your friend.
>
> Besides this, there is the more serious issue of how robust the rest of the code is against what I might call the "x87 entropy problem" with randomly fluctuating least-significant bits in doubles.  This probably warrants a broader discussion, beyond the resolution of this particular bug.
>
> -Richard J.
>
>
>
>
>
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline
>
>



