[Halld-offline] highlevel plugin crash

David Lawrence davidl at jlab.org
Wed Feb 8 15:28:56 EST 2017


Hi All,

 Yes, I have seen this crash quite a bit over the past few days. It usually has DFCALCluster in the stack trace,
but there are often several frames with “???” above it. It also seems mysteriously time-dependent: I have seen periods
where all jobs crash within one minute of starting and others where they all survive the entire run. I’m still
looking into it. Maybe the clue Alex points to will yield something. Stay tuned.

-David


> On Feb 8, 2017, at 3:08 PM, Sean Dobbs <s-dobbs at northwestern.edu> wrote:
> 
> For what it's worth, looking at the backtrace under gdb shows that the stack appears to be corrupt. Maybe running it under valgrind would help find the memory error?
> 
> Snippet:
> #38 0x0000000000a2dee9 in jana::JEventLoop::Get<DFCALCluster> (this=0x7fffd0873998, t=<error reading variable: Cannot access memory at address 0x54>, 
>     tag=0x7fffb5b8fc20 "P\t\032\001", allow_deftag=<optimized out>)
>     at /u/group/halld/Software/builds/Linux_CentOS7-x86_64-gcc4.8.5/jana/jana_0.7.5p2^root6.06.08/Linux_CentOS7-x86_64-gcc4.8.5/include/JANA/JEventLoop.h:304
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> 
> ---Sean
> 
> On Wed, Feb 8, 2017 at 1:56 PM Alexander Austregesilo <aaustreg at jlab.org> wrote:
> Hi David,
> 
> I saw that you fixed a bug in the occupancy_online plugin. Unfortunately, there seems to be another one in highlevel_online. In about 10% of all jobs, I see crashes without any reasonable hint as to what is happening. However, I was able to track it down to this plugin. Here is an example:
> 
> hd_root -PEVENTS_TO_SKIP=155600 -PPLUGINS=highlevel_online /cache/halld/RunPeriod-2017-01/rawdata/Run030280/hd_rawdata_030280_041.evio
> 
> This should crash after 75 events, and the stack trace looks awful.
> 
> Would you mind having a look?
> 
> Cheers,
> 
> Alex
> 
> 
> On 02/06/2017 03:25 PM, gluex wrote:
>> Test status for this pull request: SUCCESS
>> 
>> Summary: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor/tests/summary.txt <https://urldefense.proofpoint.com/v2/url?u=https-3A__halldweb.jlab.org_pull-5Frequest-5Ftest_sim-2Drecon-255Edavidl-5FL1trigger-5Fmonitor_tests-2Dsummary.txt&d=CwMGaQ&c=yHlS04HhBraes5BQ9ueu5zKhE7rtNXt_d012z2PA6ws&r=y4ZD58I4nPR6tZqjerSGt-WlWVAhqa3FHDMXqQ_5aUc&m=T8VpbeyfSWpCctf2DXXwHXHUoJu6h76LJ1kyhxsnxVw&s=vvfBME-Ay7Y1_fXZEGyZ9VAZRgvpPpiy3X3yr4E8PwM&e=>
>> Logs: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor/tests/log <https://urldefense.proofpoint.com/v2/url?u=https-3A__halldweb.jlab.org_pull-5Frequest-5Ftest_sim-2Drecon-255Edavidl-5FL1trigger-5Fmonitor_tests-2Dlogs&d=CwMGaQ&c=yHlS04HhBraes5BQ9ueu5zKhE7rtNXt_d012z2PA6ws&r=y4ZD58I4nPR6tZqjerSGt-WlWVAhqa3FHDMXqQ_5aUc&m=T8VpbeyfSWpCctf2DXXwHXHUoJu6h76LJ1kyhxsnxVw&s=ZTnhZcea46_6IiibB8Hb989B9rRfgZxafX0UqxzJBSU&e=>
>> Build log: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor/make_davidl_L1trigger_monitor.log <https://urldefense.proofpoint.com/v2/url?u=https-3A__halldweb.jlab.org_pull-5Frequest-5Ftest_sim-2Drecon-255Edavidl-5FL1trigger-5Fmonitor_make-5Fdavidl-5FL1trigger-5Fmonitor.log&d=CwMGaQ&c=yHlS04HhBraes5BQ9ueu5zKhE7rtNXt_d012z2PA6ws&r=y4ZD58I4nPR6tZqjerSGt-WlWVAhqa3FHDMXqQ_5aUc&m=T8VpbeyfSWpCctf2DXXwHXHUoJu6h76LJ1kyhxsnxVw&s=yv9lEc5U-oeLGXxzsHhzEnKK6_aC9lnAyUuk1ubLyqE&e=>
>> Build report: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor/report_davidl_L1trigger_monitor.txt <https://urldefense.proofpoint.com/v2/url?u=https-3A__halldweb.jlab.org_pull-5Frequest-5Ftest_sim-2Drecon-255Edavidl-5FL1trigger-5Fmonitor_report-5Fdavidl-5FL1trigger-5Fmonitor.txt&d=CwMGaQ&c=yHlS04HhBraes5BQ9ueu5zKhE7rtNXt_d012z2PA6ws&r=y4ZD58I4nPR6tZqjerSGt-WlWVAhqa3FHDMXqQ_5aUc&m=T8VpbeyfSWpCctf2DXXwHXHUoJu6h76LJ1kyhxsnxVw&s=bMmJ-y-gjgxbMWIqtdnTp5sUXu7vWfj9YITnNxtpqe4&e=>
>> Location of build: /work/halld/pull_request_test/sim-recon^davidl_L1trigger_monitor
>> 
>> 
> 

