[Halld-online] [New Logentry] DAQ testing

Tue Jul 15 11:15:02 EDT 2014

Logentry Text:
--

- Initial testing
  - Created hd_all_mc3 configuration based on hd_all_mc2, but without the FCAL crates
    and 4 BCAL crates, which appear to be in use. This leaves a configuration with 33 crates
    (including roctrig1).
  - rcm.sh hung initially while trying to connect to gluon50-daq from gluon03.
    Edited ~/.rcm.ini to make it use gluon50 instead of gluon50-daq
  - RCM GUI started, but refused to launch hd_all_mc3 configuration since the
    /home/hdops/CDAQ/daq_dev_vers/daq/config/hd_all_mc3 directory did not exist.
    This message only showed up in the original terminal used to launch rcm.sh
    and should probably show up somewhere else in the GUI. It also needed a
    file “Default” to be present in that directory. I created an empty one and it
    then proceeded with launching the components.
  - rcgui died quickly (before window came up). This turns out to be because the
    platform would not start locally because one was already running for the FCAL.
    Created the file /home/hdops/CDAQ/daq_dev_vers/daq/config/hd_all_mc3/hosts
    and listed all components in it to run on gluon51.
  - Stopped and then started components again via RCM. Components launched
    and rcgui came up. Selected “Connect” and then hit “Configure”.
  - Most components configured, but SEB and DCBCAL failed. Then all of the
    non-ROC component windows disappeared. (ROCs did not have windows
    to start with since they were started with “use xterms” unchecked.)
  - Tried again with no changes.
    - Configure succeeded
    - Download succeeded
    - Prestart failed: The SEB could not connect to the ER’s ET system. The connection 
      type was set to “Direct” but the IP address corresponded to gluon52 while
      the ER was running on gluon51
  - Set correct IP address in jcedit and tried again
    - Configure failed:
      - “CodaRcPrestart service failed”
      - “transition failed. SEB0 failed”
  - Hit configure again without restarting anything
    - Configure succeeded
    - Download succeeded
    - Prestart failed: same “CodaRcPrestart service failed” message in dialog.
  - Restart all components
    - Configure succeeded
    - Download succeeded
    - Prestart failed: DCs cannot connect to multi-cast port.
  - Change all DC links to use “direct” to 127.0.0.1
  - Hit Configure without restarting anything.
    - Configure failed with “transition failed. DCFDC failed”
  - Hit Configure once more without restarting anything.
    - Configure succeeded
    - Download succeeded
    - Prestart failed: “CodaRcPrestart service failed”. Noticed “EtDeadException”s on 
      all DC windows.
  - Restart all components
  - Notice that updating on RCM ROCs window does nothing. Tried hitting “stop servers”
    followed by “start servers”, but still all ROCs are colored grey.
    - Tried starting components anyway, but all ROCs fail to connect. Likely they
      are not being started because the shm_serv is down.
    - Several minutes later, the ROCs seem to be reporting again. (Perhaps Sergey was
      restarting something from his office??) Called Sergey. It turns out that he did restart
      the master server from a different account.
  - Problems getting all components to configure. Always one ROC failing.
  - Stopped all components, waited a couple of minutes and restarted them.
  - Try again:
    - Configure succeeded
    - Download succeeded
    - Prestart succeeded
    - Go succeeded
    - Forgot to start softROCcontroller. Hit End.
    - End failed, but that’s expected since ROCs are not aligned.
  - Reset and start again.
    - Configure succeeded
    - Download succeeded
    - Prestart succeeded
    - start softROCcontroller
    - Go succeeded
  - Update softROCcontroller to use 2000 events
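
Several of the early failures above came from files missing under the configuration directory. A minimal pre-launch check could look like the sketch below. The directory path is the one quoted in this entry. The helper name `missing_config_items` is hypothetical and is not part of RCM or CODA. The sketch assumes that "Default" and "hosts" are the only files worth checking; the hosts file was needed here only because a platform was already running locally.

```python
from pathlib import Path

# Path taken from this log entry. "Default" and "hosts" are the two files the
# entry says had to be created by hand; treating them as the full list of
# required files is an assumption.
CONFIG_ROOT = Path("/home/hdops/CDAQ/daq_dev_vers/daq/config")


def missing_config_items(config_name, root=CONFIG_ROOT, required=("Default", "hosts")):
    """Return the paths that are missing for the given configuration.

    If the configuration directory itself is absent, only that directory
    is reported.
    """
    config_dir = Path(root) / config_name
    if not config_dir.is_dir():
        return [config_dir]
    return [config_dir / name for name in required
            if not (config_dir / name).is_file()]


if __name__ == "__main__":
    for path in missing_config_items("hd_all_mc3"):
        print(f"missing: {path}")
```

Running this before rcm.sh would surface the missing directory or files in one place rather than only in the launching terminal.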

  —— gluon51 crashed ——
    - Called Paul Letta. Tried flipping the newly installed NMI switch
    - Crash state was locked to the point that NMI could not initiate a reboot + kernel dump

  - Switched to gluon49 to try to cause a crash there
  - Ran for > 10 minutes without crash.
  - Tried reducing prescale in mcROL from 1000 to 500
  - Unable to start because user somov had started using some of the TOF crates
  - Remove TOF crates from configuration and start again
  - Rate is now doubled to 4kHz
  - Ran for 20 minutes on gluon49 and no crash
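
The doubling is what one would expect if the event rate scales inversely with the prescale factor. The sketch below is a worked check of that arithmetic only. The 2 kHz starting rate is inferred from "doubled to 4kHz" and was not recorded directly, and the inverse-proportional model is an assumption about how mcROL applies the prescale.

```python
def scaled_rate(old_rate_hz, old_prescale, new_prescale):
    """Rate after a prescale change, assuming rate is proportional to 1/prescale."""
    return old_rate_hz * old_prescale / new_prescale


# Assumed starting point: 2 kHz at prescale 1000 (inferred, not measured).
print(scaled_rate(2000.0, 1000, 500))  # 4000.0
```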

  - Multi-cast test
    - Switched back to using multi-cast from SEB-ER. Success
    - Switched to using gluon49 and multi-cast worked.
    - Switched to using gluon46 and multi-cast worked.
    - Switched to using gluon50. Components can't connect since platform is already running.
    - Found platform still running on gluon49. Killed it.
    - Switched to using gluon51. Multi-cast failed.
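
To separate a host-level multicast problem from a CODA configuration problem, a generic UDP multicast loopback check can be run on each gluon. The sketch below uses only the Python standard library. The group address and port are hypothetical values picked for illustration and are not the ones used by the SEB, ER, or ET system. It only tests whether a host can receive its own multicast datagram, so it does not reproduce the actual SEB-ER path.

```python
import socket
import struct

# Hypothetical group and port for illustration only; these are NOT the
# addresses used by the SEB, ER, or ET system.
GROUP = "239.255.0.1"
PORT = 45000


def membership_request(group, interface="0.0.0.0"):
    """Pack the ip_mreq structure passed to IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def multicast_loopback_ok(group=GROUP, port=PORT, timeout=2.0):
    """Send one datagram to the group and report whether this host receives it."""
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        recv.bind(("", port))
        recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        recv.settimeout(timeout)
        # TTL 1 keeps the test datagram on the local network segment.
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        send.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
        send.sendto(b"ping", (group, port))
        data, _ = recv.recvfrom(64)
        return data == b"ping"
    except OSError:
        # Covers timeouts as well as join or send failures.
        return False
    finally:
        recv.close()
        send.close()


if __name__ == "__main__":
    print("multicast loopback ok:", multicast_loopback_ok())
```

If this check passes on gluon49 and gluon46 but fails on gluon51, that would point at the host network setup rather than at the run configuration.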

  - RAID disk mounts switched on all gluons to use fast NIC (InfiniBand)


---

This is a plain text email for clients that cannot display HTML.  The full logentry can be found online at https://logbooks.jlab.org/entry/3289607