[Halld-online] [New Logentry] gluon48 down, monitoring moved to gluon46
davidl at jlab.org
Sat Feb 27 23:25:02 EST 2016
Logentry Text:
--
For the second time in a week, gluon48 has frozen and become unresponsive to ssh. It had been running the cMsg server used by the monitoring system. It was also running the RSArchiver program, which creates a ROOT file archiving the monitoring histograms generated online. Consequently, the online monitoring system could not be launched. First, the start_monitoring script froze while checking for an existing process on gluon48, because the ssh command stalled and never completed. Beyond that, the monitoring system processes would have been unable to communicate without the cMsg server on gluon48.
At this point it is not clear whether the cMsg server running on gluon48 is related to the system locking up. It may be worth noting that the server was handling heavy traffic from the monitoring system; until recently, this included all histograms. Recent changes to the monitoring system now allow histograms that compress to less than 65 kB (which is most of them) to be transferred directly in a UDP packet, reducing the traffic the cMsg server must handle.
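The size-based routing described above can be illustrated with a short sketch. This is not the monitoring system's actual code: it assumes zlib compression and a hypothetical 65,000-byte cutoff (the entry only says "less than 65 kB"), and the function name `choose_transport` is made up for illustration. For reference, a single IPv4 UDP datagram can carry at most 65,507 bytes of payload, which is why a cutoff in this range makes sense.

```python
import os
import zlib

# Hypothetical cutoff; the exact threshold used online is not stated in the log.
UDP_LIMIT_BYTES = 65_000


def choose_transport(histogram_bytes: bytes, limit: int = UDP_LIMIT_BYTES):
    """Compress a serialized histogram and pick a route for it.

    Returns ("udp", payload) if the compressed data fits in one datagram,
    otherwise ("cmsg", payload) so it falls back to the cMsg server.
    Whether the real cMsg path also compresses is not stated; this sketch
    simply reuses the compressed bytes in both cases.
    """
    compressed = zlib.compress(histogram_bytes)
    if len(compressed) < limit:
        return "udp", compressed
    return "cmsg", compressed


if __name__ == "__main__":
    sparse = b"\x00" * 100_000    # mostly-empty histogram: compresses well
    noisy = os.urandom(100_000)   # random bytes: essentially incompressible
    print(choose_transport(sparse)[0])
    print(choose_transport(noisy)[0])
```

In this sketch only the compressed size matters, so most typical (sparse) histograms take the UDP path and only the rare large ones still load the cMsg server.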
The cMsg server was moved back to gluon46 and the RSArchiver was moved to gluon49. The monitoring system is now able to run again.
Note that, because of recent changes to the monitoring system, the green update button may need to be pressed frequently to refresh the display.
---
This is a plain text email for clients that cannot display HTML. The full logentry can be found online at https://logbooks.jlab.org/entry/3385482