[Halld-online] [New Logentry] DAQ testing
davidl at jlab.org
Tue Jul 15 11:15:02 EDT 2014
Logentry Text:
--
- Initial testing
- Created hd_all_mc3 configuration based on hd_all_mc2 but without the FCAL crates
and 4 BCAL crates, which appear to be in use. This leaves a configuration with 33 crates
(including roctrig1).
- rcm.sh hung initially while trying to connect to gluon50-daq from gluon03.
Edited ~/.rcm.ini to make it use gluon50 instead of gluon50-daq
- RCM GUI started, but refused to launch hd_all_mc3 configuration since the
/home/hdops/CDAQ/daq_dev_vers/daq/config/hd_all_mc3 directory did not exist.
This message only showed up in the original terminal used to launch rcm.sh
and should probably show up somewhere else in the GUI. It also needed a
file “Default” to be present in that directory. I created an empty one and it
then proceeded with launching the components.
- rcgui died quickly (before window came up). This turns out to be because the
platform would not start locally because one was already running for the FCAL.
Made /home/hdops/CDAQ/daq_dev_vers/daq/config/hd_all_mc3/hosts file
and listed all components to run on gluon51.
- Stopped and then started components again via RCM. Components launched
and rcgui came up. Selected “Connect” and then hit “Configure”.
- Most components configured, but SEB and DCBCAL failed. Then all of the
non-ROC component windows disappeared. (ROCs did not have windows
to start with since they were started with “use xterms” unchecked.)
- Tried again with no changes.
- Configure succeeded
- Download succeeded
- Prestart failed: The SEB could not connect to the ER’s ET system. The connection
type was set to “Direct” but the IP address corresponded to gluon52 while
the ER was running on gluon51
- Set correct IP address in jcedit and tried again
- Configure failed:
- “CodaRcPrestart service failed”
- “transition failed. SEB0 failed”
- Hit configure again without restarting anything
- Configure succeeded
- Download succeeded
- Prestart failed: same “CodaRcPrestart service failed” message in dialog.
- Restart all components
- Configure succeeded
- Download succeeded
- Prestart failed: DCs cannot connect to multi-cast port.
- Change all DC links to use “direct” to 127.0.0.1
- Hit Configure without restarting anything.
- Configure failed with “transition failed. DCFDC failed”
- Hit Configure once more without restarting anything.
- Configure succeeded
- Download succeeded
- Prestart failed: “CodaRcPrestart service failed”. Noticed “EtDeadException”s on
all DC windows.
- Restart all components
- Noticed that updating in the RCM ROCs window does nothing. Tried hitting “stop servers”
followed by “start servers”, but still all ROCs are colored grey.
- Tried starting components anyway, but all ROCs fail to connect. Likely they
are not being started because the shm_serv is down.
- Several minutes later, the ROCs seem to be reporting again. (Perhaps Sergey was
restarting something from his office??) Called Sergey. It turns out that he did restart
the master server from a different account.
- Problems getting all components to configure. Always one ROC failing.
- Stopped all components, waited a couple of minutes and restarted them.
- Try again:
- Configure succeeded
- Download succeeded
- Prestart succeeded
- Go succeeded
- Forgot to start softROCcontroller. Hit end.
- End failed, but that’s expected since ROCS are not aligned.
- Reset and start again.
- Configure succeeded
- Download succeeded
- Prestart succeeded
- start softROCcontroller
- Go succeeded
- Update softROCcontroller to use 2000 events
—— gluon51 crashed ——
- Called Paul Letta. Tried flipping the newly installed NMI switch
- Crash state was locked to the point that NMI could not initiate a reboot + kernel dump
- Switched to gluon49 to try to cause a crash there
- Ran for >10 minutes without crash.
- Tried reducing prescale in mcROL from 1000 to 500
- Unable to start because user somov has started using some of the TOF crates
- Remove TOF crates from configuration and start again
- Rate is now doubled to 4 kHz
- Ran for 20 minutes on gluon49 and no crash
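
Note on the rate change above: doubling the rate when the prescale goes from 1000 to 500 is what a simple 1-in-N prescale would give. A minimal sketch of that arithmetic follows. It assumes the accepted rate is the input rate divided by the prescale, which was not checked against the mcROL source. The 2 kHz starting rate is inferred from the rate doubling to 4 kHz, and the function name is hypothetical.

```python
def scaled_rate_hz(old_rate_hz, old_prescale, new_prescale):
    """Accepted rate after changing a simple 1-in-N prescale.

    Assumes accepted rate = input rate / prescale, so the accepted
    rate scales by old_prescale / new_prescale.
    """
    if old_prescale <= 0 or new_prescale <= 0:
        raise ValueError("prescale factors must be positive")
    return old_rate_hz * old_prescale / new_prescale


# 2 kHz at prescale 1000 is inferred from the log, not measured separately.
print(scaled_rate_hz(2000.0, 1000, 500))  # 4000.0
```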
- Multi-cast test
- Switched back to using multi-cast from SEB-ER. Success
- Switched to using gluon49 and multi-cast worked.
- Switched to using gluon46 and multi-cast worked.
- Switched to using gluon50. Components can't connect since platform is already running.
- Found platform still running on gluon49. Killed it.
- Switched to using gluon51. multi-cast failed.
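
Note on the multi-cast results above: one way to narrow down a host-specific failure like the one on gluon51 is a plain UDP multicast probe that does not involve CODA or ET at all. The sketch below is a generic check, not the ET system's own discovery protocol. The group address, port, and function names are hypothetical test values. The iface_ip argument is there because these hosts have more than one network name (e.g. gluon50 vs. gluon50-daq), so the interface used for the join may matter.

```python
import socket
import struct

GROUP = "239.255.0.1"  # hypothetical test group, NOT the ET system's address
PORT = 50000           # hypothetical test port


def membership_request(group, iface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq struct used with IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface_ip)


def listen_once(group=GROUP, port=PORT, timeout_s=10.0, iface_ip="0.0.0.0"):
    """Join the group and wait for one datagram.

    Returns (data, sender_address), or None if nothing arrives in time.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group, iface_ip))
        sock.settimeout(timeout_s)
        try:
            return sock.recvfrom(1024)
        except socket.timeout:
            return None


def send_probe(group=GROUP, port=PORT, ttl=1, payload=b"mcast-probe"):
    """Send one datagram to the group; returns the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP) as sock:
        # TTL of 1 keeps the probe on the local subnet.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("B", ttl))
        return sock.sendto(payload, (group, port))
```

Usage would be to run listen_once() on the host under test and send_probe() from another host on the same subnet. A None result on one host while others receive the probe would point at that host's network setup rather than at the DAQ components.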
- RAID disk mounts switched on all gluons to use fast NIC (InfiniBand)
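
Note on the missing configuration directory near the start of this entry: that failure could be caught before launching rcm.sh. A minimal sketch follows, based only on what was observed here: RCM wants a directory named after the configuration and a file named “Default” inside it. The function name is hypothetical. What RCM expects inside “Default”, if anything, is not known; an empty file was enough in this test.

```python
from pathlib import Path

# Directory taken from this log entry.
CONFIG_ROOT = Path("/home/hdops/CDAQ/daq_dev_vers/daq/config")


def ensure_config_dir(name, root=CONFIG_ROOT):
    """Create <root>/<name> and an empty 'Default' file if either is missing.

    Returns a list of strings describing what was created
    (an empty list if everything was already in place).
    """
    created = []
    cfg_dir = Path(root) / name
    if not cfg_dir.is_dir():
        cfg_dir.mkdir(parents=True)
        created.append(f"created directory {cfg_dir}")
    default_file = cfg_dir / "Default"
    if not default_file.exists():
        default_file.touch()
        created.append(f"created empty file {default_file}")
    return created
```

For example, ensure_config_dir("hd_all_mc3") would be run once after creating a new configuration and before starting the components from RCM.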
---
This is a plain text email for clients that cannot display HTML. The full logentry can be found online at https://logbooks.jlab.org/entry/3289607