[Halld-offline] highlevel plugin crash

David Lawrence davidl at jlab.org
Wed Feb 8 16:38:53 EST 2017


Hi All,

  Just to update you: Simon and co. may have figured this problem out. He suggested the
issue may be due to the DFCALCluster factory trying to make clusters out of LED events
where every block had a hit. After reproducing the error on file hd_rawdata_030280_041.evio
using Alexander’s recipe, I put in a config parameter that caps the number of DFCAL hits
at a default of 250; if an event has more hits than that, clustering is not performed.
This seemed to allow the code to get past the crash point. I’ve put in (and already
accepted) a pull request with this change and recompiled sim-recon on the gluons. We’ll
see if this really fixes it, but things do look better so far.
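
Schematically, the guard looks something like the sketch below. This is a simplified
illustration, not the exact code in the pull request: the parameter name FCAL:MAX_HITS
and the surrounding scaffolding are assumed, and only the 250-hit default and the
skip-clustering behavior reflect the actual change.

// Minimal sketch of a hit-count guard in a JANA 0.7-style factory.
// Names other than the 250 default are illustrative assumptions.
#include <vector>
using namespace std;

#include <JANA/JFactory.h>
#include <JANA/JEventLoop.h>
#include <JANA/JApplication.h>   // provides gPARMS
using namespace jana;

#include <FCAL/DFCALHit.h>
#include <FCAL/DFCALCluster.h>

class DFCALCluster_factory : public JFactory<DFCALCluster> {
  public:
    jerror_t init(void) {
        MAX_HITS = 250;  // default cap before clustering is skipped
        gPARMS->SetDefaultParameter("FCAL:MAX_HITS", MAX_HITS,
            "Skip FCAL clustering for events with more than this many hits");
        return NOERROR;
    }

    jerror_t evnt(JEventLoop *loop, uint64_t eventnumber) {
        vector<const DFCALHit*> fcal_hits;
        loop->Get(fcal_hits);

        // LED pulser events can light up every block; rather than try to
        // cluster such pathological events, produce no clusters at all.
        if (fcal_hits.size() > MAX_HITS) return NOERROR;

        // ... normal cluster finding runs here, appending to _data ...
        return NOERROR;
    }

  private:
    uint32_t MAX_HITS;
};

The idea is simply to refuse to cluster events where essentially every block fired,
which is the signature of the LED pulser rather than physics.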

Regards,
-David



> On Feb 8, 2017, at 3:28 PM, David Lawrence <davidl at jlab.org> wrote:
> 
> 
> Hi All,
> 
>  Yes, I have seen this crash quite a bit over the past few days. It usually has DFCALCluster in the stack trace,
> but often has several frames of “???” above it. It also comes and goes mysteriously in time: I have seen periods
> where all jobs crash within one minute of starting, and others where they all survive the entire run. I’m still
> looking into it. Maybe this clue Alex points to will yield something. Stay tuned….
> 
> -David
> 
> 
>> On Feb 8, 2017, at 3:08 PM, Sean Dobbs <s-dobbs at northwestern.edu> wrote:
>> 
>> For what it's worth, looking at the backtrace under gdb shows that the stack appears corrupt. Maybe running it under valgrind would help find the memory error?
>> 
>> Snippet:
>> #38 0x0000000000a2dee9 in jana::JEventLoop::Get<DFCALCluster> (this=0x7fffd0873998, t=<error reading variable: Cannot access memory at address 0x54>, 
>>     tag=0x7fffb5b8fc20 "P\t\032\001", allow_deftag=<optimized out>)
>>     at /u/group/halld/Software/builds/Linux_CentOS7-x86_64-gcc4.8.5/jana/jana_0.7.5p2^root6.06.08/Linux_CentOS7-x86_64-gcc4.8.5/include/JANA/JEventLoop.h:304
>> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>> 
>> ---Sean
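
For reference, the valgrind run Sean suggests above could look like the following,
using Alexander’s reproduction command quoted below; the flags are a reasonable
starting point rather than anything specified in the thread:

valgrind --tool=memcheck --track-origins=yes --num-callers=40 --error-limit=no \
    hd_root -PEVENTS_TO_SKIP=155600 -PPLUGINS=highlevel_online \
    /cache/halld/RunPeriod-2017-01/rawdata/Run030280/hd_rawdata_030280_041.evio

--track-origins=yes reports where any uninitialized value was created, and a larger
--num-callers helps when the stack is deep, at the cost of the job running many
times slower than normal.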
>> 
>> On Wed, Feb 8, 2017 at 1:56 PM Alexander Austregesilo <aaustreg at jlab.org> wrote:
>> Hi David,
>> 
>> I saw that you fixed a bug in the occupancy_online plugin. Unfortunately, there seems to be another one in highlevel_online. In about 10% of all jobs, I see crashes without any reasonable hint as to what is happening. However, I could track it down to this plugin. Here is an example:
>> 
>> hd_root -PEVENTS_TO_SKIP=155600 -PPLUGINS=highlevel_online /cache/halld/RunPeriod-2017-01/rawdata/Run030280/hd_rawdata_030280_041.evio
>> 
>> This should crash after 75 events, and the stack trace looks awful.
>> 
>> Would you mind having a look?
>> 
>> Cheers,
>> 
>> Alex
>> 
>> 
>> On 02/06/2017 03:25 PM, gluex wrote:
>>> Test status for this pull request: SUCCESS
>>> 
>>> Summary: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor/tests/summary.txt
>>> Logs: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor/tests/log
>>> Build log: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor/make_davidl_L1trigger_monitor.log
>>> Build report: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor/report_davidl_L1trigger_monitor.txt
>>> Location of build: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor
>>> 
>> 
> 
