[Halld-offline] hd_ana hanging after variable number of bggen events

Matthew Shepherd mashephe at indiana.edu
Mon Mar 14 09:28:17 EDT 2011


Hi Richard and Dave,

I dug through this problem when Kei and Jake were having this trouble.  The behavior I saw was that the STL sort algorithm was not respecting the boundaries specified by the begin and end iterators (particularly the end iterator).

The result was one of two things:

sort would continue, presumably sorting memory past the end iterator, ad infinitum

or

sort would eventually dereference some memory past the end iterator that it was not allowed to access, resulting in a segfault

As Dave noted, the "fix" I found was to compile this routine without compiler optimization.  I'd like to think that this is a bug in the compiler's optimization of this sort algorithm, but I'm a bit afraid that it is just a symptom of some other memory-handling error.  I wasn't able to find any other issues using valgrind.
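std::sort running past its end iterator is a commonly reported failure mode. It can happen when the comparison function is not a strict weak ordering, for example when it uses <= or compares floating-point values that can be NaN. In that case the behavior is undefined, and implementations such as libstdc++'s can walk off the end of the range. On 32-bit x86 builds that use x87 floating point, optimization can also change comparison results by keeping intermediate values at extended precision. That is one way a comparator can behave correctly unoptimized and misbehave when optimized.

This thread does not establish whether any of this applies to FindThetaZRegression(). The sketch below is only a hypothetical debugging aid. It brute-force checks a comparator against a small sample of the elements before they are sorted, and the helper name `is_strict_weak_ordering` is an assumption, not existing DANA/JANA code.

```cpp
#include <cmath>       // std::nan, for trying out NaN inputs
#include <cstddef>
#include <functional>  // std::less, std::less_equal
#include <vector>

// Hypothetical debugging helper (not part of DANA/JANA): brute-force check
// that comp behaves as a strict weak ordering on every pair and triple of
// the sample. std::sort requires this; if it does not hold, the behavior is
// undefined and the sort may access memory past the end iterator.
// O(n^3), so only intended for small samples.
template <typename T, typename Compare>
bool is_strict_weak_ordering(const std::vector<T>& v, Compare comp) {
    // a and b are "equivalent" when neither orders before the other.
    auto equiv = [&comp](const T& a, const T& b) {
        return !comp(a, b) && !comp(b, a);
    };
    const std::size_t n = v.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (comp(v[i], v[i])) return false;  // irreflexivity
        for (std::size_t j = 0; j < n; ++j) {
            if (comp(v[i], v[j]) && comp(v[j], v[i])) return false;  // asymmetry
            for (std::size_t k = 0; k < n; ++k) {
                // Transitivity of comp.
                if (comp(v[i], v[j]) && comp(v[j], v[k]) && !comp(v[i], v[k]))
                    return false;
                // Transitivity of equivalence.
                if (equiv(v[i], v[j]) && equiv(v[j], v[k]) && !equiv(v[i], v[k]))
                    return false;
            }
        }
    }
    return true;
}
```

Here are three example results:

- With std::less<double>, the sample {3.0, 1.0, 2.0} passes.
- std::less_equal<double> fails immediately on irreflexivity.
- std::less<double> fails on {1.0, NaN, 2.0}. NaN is "equivalent" to both 1.0 and 2.0, but those two are not equivalent to each other.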

-Matt

On Mar 14, 2011, at 8:45 AM, Richard Jones wrote:

> Dave,
> 
> Ok, I thought it might be related.  I think the following represents progress on this front:
> 	• It is not a "hang" but a segfault in child thread #2
> 	• A segfault in a child thread causes the code to hang.  This seems to be because we are going through the ROOT signal-handling mechanism, which is badly broken in the JANA context.
> I suggest that the top priority is to fix item #2 by writing your own signal recovery and backtrace mechanism for the JANA framework.  Having a signal recovery and backtrace mechanism with appropriate behavior seems like a first-order requirement for our analysis framework.  Once that is done, tracing other problems, such as item #1, will be more feasible.
> 
> -Richard J.
> 
> 
> 
> 
> On 3/14/2011 8:37 AM, David Lawrence wrote:
> 
>> Hi Richard,
>> 
>>     The DTrackCandidate_factory_CDC::FindThetaZRegression() routine has come up before, when Kei and Jake reported problems with DANA programs hanging as far back as December. This led to the "fix" currently in use (though not incorporated into the build system), where optimization is turned off when compiling DTrackCandidate_factory_CDC.cc. As yet, we have not been able to identify the bug exactly, as the behavior is not deterministic.
>> 
>>     I will take another look at this today to see if I can make some more headway on the problem.
>> 
>> Regards,
>> -Dave
>> 
> 
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline
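Richard's item #2 asks for a signal recovery and backtrace mechanism in JANA itself. Below is a minimal sketch of the usual POSIX/glibc approach. It assumes a Linux build with glibc's <execinfo.h>, and the function names are hypothetical; this is not JANA's actual implementation.

A SIGSEGV caused by a bad memory access is delivered to the thread that faulted, so the backtrace printed here would be the crashing child thread's stack. Because sigaction() replaces whatever handler was installed before, this would need to be installed after ROOT sets up its own handlers, or with ROOT's handling turned off.

```cpp
#include <execinfo.h>  // backtrace(), backtrace_symbols_fd() (glibc)
#include <signal.h>    // sigaction(), sigemptyset(), raise()
#include <unistd.h>    // write(), STDERR_FILENO

// Hypothetical names: crash_handler() and install_crash_handler() are
// illustrative only and are not part of JANA.
extern "C" void crash_handler(int sig) {
    void* frames[64];
    int n = backtrace(frames, 64);
    static const char msg[] = "*** fatal signal caught, backtrace follows ***\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)rc;
    // Writes the symbolized frames straight to the fd (no malloc'd strings).
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    // SA_RESETHAND has already restored the default action, so once the
    // re-raised signal is delivered the process terminates (dumping core
    // for SIGSEGV) instead of hanging.
    raise(sig);
}

void install_crash_handler() {
    // The first backtrace() call may load libgcc and allocate memory, so
    // make it once here rather than for the first time inside the handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa{};
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  // one-shot: revert to SIG_DFL on entry
    const int fatal[] = {SIGSEGV, SIGBUS, SIGFPE, SIGABRT};
    for (int s : fatal) sigaction(s, &sa, nullptr);
}
```

POSIX does not guarantee that backtrace() and backtrace_symbols_fd() are async-signal-safe, which is why the sketch warms up backtrace() at startup and avoids backtrace_symbols(). Re-raising with the default action favors a clean crash and a core file over trying to resume, which matches the goal of not leaving the job hung.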
