[Halld-online] [New Logentry] gluon crashes - Sandy Bridge and OpenSuSE both crash

davidl at jlab.org davidl at jlab.org
Thu Aug 7 13:40:04 EDT 2014


Logentry Text:
--
Today we observed crashes on the Sandy Bridge node running CentOS 6.5 (gluon56) and on one
of the original Ivy Bridge nodes (gluon46) running OpenSuSE.

We have tried several configurations to identify the source of the gluon crashes when running CODA. These included:

- Disabling network interfaces and using completely different network interfaces
- Using a node with no add-in NIC installed (onboard only)
- Replacing RAM
- Using a different Java version (1.7_60 instead of 1.7_17)
- Replacing OS with RHEL 6.5
- Running on different hardware
  - CentOS 6.5 Sandy Bridge
  - RHEL 6.4 AMD Opteron
- Replacing OS with OpenSuSE 13.1

Some changes appeared to alter the time constant of the crashes, but none eliminated them.

The only hardware that has apparently never crashed is the AMD machine (gluon53)
and the desktop systems (gluon02, gluon03). It should be noted that the Sandy Bridge nodes
were very similar to the Ivy Bridge nodes. The chassis looked identical, as did the nodes themselves
(one chassis houses and powers 4 nodes). It is likely that the motherboard designs of the Ivy Bridge
and Sandy Bridge computers share a common ancestry. Therefore, if the problem is in the
hardware, the same design defect would likely exist in both types of machine.




---

This is a plain text email for clients that cannot display HTML.  The full logentry can be found online at https://logbooks.jlab.org/entry/3290880