[Halld-online] [New Logentry] Testing new memory in gluon48

davidl at jlab.org
Fri Aug 1 08:00:06 EDT 2014


Logentry Text:
--
New memory was installed on gluon48 to test whether changing the memory brand would prevent the system freezes we've been experiencing while running CODA on the Ivy Bridge machines. We first tested a configuration on gluon46 to confirm that it would crash in a short amount of time (within 1.5M events at a 4kHz event rate). It did crash as expected. The configuration used 8 ROCs (BCAL and TOF), a Primary Event Builder (PEB), and an Event Recorder (ER). The PEB was run on gluon46 while the ER was run on gluon53.
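For a sense of scale, the event count and rate quoted above can be turned into a wall-clock time. This is only a quick arithmetic sketch; the function name is made up for illustration, and it assumes the 4kHz rate was steady for the whole run.

```python
def seconds_to_reach(events: float, rate_hz: float) -> float:
    """Time in seconds to accumulate `events` at a constant rate in Hz."""
    return events / rate_hz


# Numbers from the gluon46 test: crash within 1.5M events at 4 kHz.
crash_window_s = seconds_to_reach(1.5e6, 4.0e3)
crash_window_min = crash_window_s / 60.0

print(f"{crash_window_s:.0f} s = {crash_window_min:.2f} min")
```

So "within 1.5M events" corresponds to a crash within roughly six minutes of running, provided the rate stayed constant.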

We then switched to gluon48, with the new memory, for running the PEB. This also crashed, and fairly quickly. A couple of things to note:
  1.) The BIOS was adjusted to use "AUTO" for the DDR speed (it had been left at "FORCE 1333" from previous efforts to diagnose the problem).
  2.) The BIOS reported the speed as "1600" upon reboot, whereas the original memory reported "1867" under this setting.

We then moved the new memory into gluon111 so that we could try to observe the IPMI report after a crash. The one previous crash on this machine left the memory temperature sensors reading "n/a" afterward. This is what led us to suspect the memory could be the culprit.

Running on gluon111 with the new memory for more than 5M events did not result in a crash. We decided to switch to gluon109 to confirm that we could crash it, since the networking on gluon111 and gluon109 is different from that on gluon46-gluon52. gluon109 did not crash after more than 2M events. We switched back to running on gluon111 so we could let it run overnight.

The next morning, gluon111 was found to have crashed, apparently just after we left the evening before, as it was only a little over 1M events in. The IPMI record was checked and it showed the same condition as with the original memory: the temperature readings of all DIMM slots were "na". All other readings, including DIMM voltages, were good.
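The check described above (DIMM temperatures reading "na" while everything else looks fine) could be automated along these lines. This is a rough sketch that assumes the IPMI record is available as pipe-delimited text of the kind `ipmitool sensor` prints. The sensor names and values in SAMPLE are invented for illustration and are not the actual gluon111 readings.

```python
from typing import List

# Hypothetical sample in "name | value | unit | status" form.
# These names and numbers are made up, not taken from gluon111.
SAMPLE = """\
CPU1 Temp        | 45.000     | degrees C  | ok
P1-DIMMA1 Temp   | na         |            | na
P1-DIMMB1 Temp   | na         |            | na
P1-DIMMA1 Volt   | 1.500      | Volts      | ok
"""


def missing_dimm_temps(report: str) -> List[str]:
    """Return names of DIMM temperature sensors whose reading is na or n/a."""
    missing = []
    for line in report.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, value = fields[0], fields[1]
        is_dimm_temp = "DIMM" in name.upper() and "TEMP" in name.upper()
        if is_dimm_temp and value.lower() in ("na", "n/a"):
            missing.append(name)
    return missing


print(missing_dimm_temps(SAMPLE))
```

Running something like this periodically against the live sensor output might help pin down whether the sensors drop out before the freeze or only after it.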

Note that the memory that originally came out of gluon48 was placed back into gluon48, and the BIOS was left at the original DDR speed setting of "AUTO", as it was when it arrived from the vendor.



---

This is a plain text email for clients that cannot display HTML.  The full logentry can be found online at https://logbooks.jlab.org/entry/3290483