[Halld-offline] diagnosis of cause for segfaults in DTrackCandidate_factory_CDC::FindThetaZRegression()

Beni Zihlmann zihlmann at jlab.org
Thu Mar 17 08:36:57 EDT 2011


Hi Richard,
I have one question about this issue. If I understand this correctly, it
means that there is a potential problem every time I do an explicit
calculation inside an if statement and compare the result with a
variable from memory. Is that right?
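A minimal C++ sketch of the pattern in question, and of one possible mitigation, follows. The helper names (nearly_equal, matches_risky, matches_tolerant) and the default tolerance are hypothetical illustrations, not code from DTrackCandidate_factory_CDC. The sketch assumes that the risky case is an exact comparison between an expression evaluated on the spot and a double that was already rounded to 64 bits when it was written to memory.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not from the GlueX code): true when a and b agree
// within a relative tolerance, with an optional absolute floor for values
// near zero. Differences confined to the last few bits, such as those left
// when one operand was rounded to a 64-bit double in memory and the other
// stayed in an 80-bit x87 register, do not change the answer.
inline bool nearly_equal(double a, double b,
                         double rel_tol = 1e-12, double abs_tol = 0.0) {
    const double diff  = std::fabs(a - b);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(abs_tol, rel_tol * scale);
}

// Risky pattern (hypothetical): "stored" was rounded to 64 bits when it was
// written to memory, but with -m32 (x87 math) the product a * b may be
// evaluated and compared at 80-bit precision. The two sides can then differ
// in their least-significant bits even though they come from the same formula.
inline bool matches_risky(double stored, double a, double b) {
    return stored == a * b;
}

// Safer pattern (hypothetical): compare within a tolerance instead of
// relying on bit-for-bit equality.
inline bool matches_tolerant(double stored, double a, double b) {
    return nearly_equal(stored, a * b);
}
```

If exact comparison cannot be avoided, g++ can be told to do double arithmetic in SSE2 registers on i686 with -mfpmath=sse -msse2, which removes the 80-bit intermediates. -ffloat-store helps only partly, since it affects named variables but not temporaries. The right tolerance depends on the quantities being compared; 1e-12 here is only a placeholder.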

cheers,
Beni

> Dear colleagues,
>
> I have reproduced and diagnosed the segfaults that take place in the 
> current GlueX reconstruction code, when compiled for the i686 
> platform.  Note that they also occur on 64bit hardware when running 
> the 32bit executable, so it is not just a 32bit issue.  The 
> explanation is a bit too long for email, so I have written it up in 
> the form of a wiki page.  Please see it at the following URL.
>
> http://www.jlab.org/Hall-D/software/wiki/index.php/Diagnosing_segmentation_faults_in_reconstruction_software#How_to_diagnose_this_kind_of_error 
>
>
> In that wiki page, I also explain why this should not be considered to 
> be a compiler optimization bug, but rather a bug in our user code, in 
> the context of x87 math.  That holds in spite of the fact that recompiling 
> with -O0 seemed to solve it!  In fact, turning off optimization is not 
> a reliable solution, and the current bug probably will break out again 
> in -O0 code in the near future, as g++ continues to evolve.  What is 
> more, in considering the impact of this bug, the segfault is really 
> only the tip of the iceberg.  I would expect this problem to be 
> happening much more often in -m32 builds, but only showing up as 
> segfaults in the (rare?) case that the memory between the valid data 
> and the end of the valid data segment contains all zeros.  In what 
> might be the more normal occurrence of this bug, we could be getting 
> bogus results from the tracking and not know it.  In other words, the 
> segfault is your friend.
>
> Besides this, there is the more serious issue of how robust the rest 
> of the code is against what I might call the "x87 entropy problem" 
> with randomly fluctuating least-significant bits in doubles.  This 
> probably warrants a broader discussion, beyond the resolution of this 
> particular bug.
>
> -Richard J.
>
>
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline
