[Clas12_software] DB initialization and FTCAL
Vardan Gyurjyan
gurjyan at jlab.org
Sun Mar 4 11:36:12 EST 2018
Hi All,
Here is the behavior observed when running Clara on clonfarm0 (32-core AMD, RH Linux 6).
Clas12 software release: 5a.1.0
Data file: clas_003191.evio.7.hipo
Running the production service chain without the FTCAL and FTHODO services: it runs with no problems or exceptions, yet there is a long (unexplained) period of DB activity at the configure stage of the reconstruction (see the attached docx file). This initialization process can cause DB issues.
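One generic way to keep configure-time DB traffic down is to load calibration constants at most once per run and share them across all engine threads. The sketch below is a hypothetical illustration using only java.util.concurrent; `RunConstantsCache` and the loader lambda are assumptions, not Clara or CLAS12 API, and the lambda only stands in for a real DB query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

/** Hypothetical per-run cache: the loader (a stand-in for a DB query) runs at most once per run. */
public class RunConstantsCache<T> {
    private final Map<Integer, T> byRun = new ConcurrentHashMap<>();
    private final IntFunction<T> loader;

    public RunConstantsCache(IntFunction<T> loader) {
        this.loader = loader;
    }

    public T get(int run) {
        // computeIfAbsent is atomic on ConcurrentHashMap: concurrent callers asking for
        // the same run wait for the first load instead of each querying the DB
        return byRun.computeIfAbsent(run, r -> loader.apply(r));
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger dbQueries = new AtomicInteger();
        RunConstantsCache<String> cache = new RunConstantsCache<>(run -> {
            dbQueries.incrementAndGet(); // stands in for one slow DB round trip
            return "constants for run " + run;
        });
        // 32 threads, one per core on clonfarm0, all configuring for the same run
        Thread[] workers = new Thread[32];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> cache.get(3544));
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("DB queries: " + dbQueries.get()); // prints "DB queries: 1"
    }
}
```

Whether the observed init activity actually comes from repeated per-thread loading is not established in this report; the sketch only shows the caching pattern.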
Attachment: init.docx (109283 bytes) <https://mailman.jlab.org/pipermail/clas12_software/attachments/20180304/233ca7b8/attachment-0001.docx>
Adding FTCAL to the service chain produces an "[IndexedTable] ---> error.. entry does not exist" console printout for every event (slowing down reconstruction) and, after a while, an out-of-memory exception (see below).
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1357)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at org.jlab.coda.xmsg.core.xMsg$1.handle(xMsg.java:568)
at org.jlab.coda.xmsg.core.xMsgSubscription$Handler.run(xMsgSubscription.java:108)
at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1357)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at org.jlab.clara.sys.Service.execute(Service.java:176)
at org.jlab.clara.sys.Service.access$300(Service.java:53)
at org.jlab.clara.sys.Service$ServiceCallBack.callback(Service.java:286)
at org.jlab.coda.xmsg.core.xMsg$1.lambda$handle$0(xMsg.java:568)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
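"unable to create new native thread" means the JVM asked the operating system for a new thread and was refused (thread/process limit or native memory), not that the Java heap was full. Both traces fail inside ThreadPoolExecutor.addWorker, i.e. an executor was still adding worker threads when the OS said no. A generic mitigation, sketched below with plain java.util.concurrent and not taken from Clara or xMsg (whose pool settings are not shown here), is a fixed-size pool with a bounded queue that pushes back on the submitter instead of spawning more threads. `BoundedPool` and the sizes are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPool {
    /** Hypothetical factory: at most maxThreads workers and queueSize pending tasks. */
    public static ThreadPoolExecutor create(int maxThreads, int queueSize) {
        return new ThreadPoolExecutor(
                maxThreads, maxThreads,                      // fixed size: thread count cannot grow
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueSize),         // bounded backlog of pending work
                new ThreadPoolExecutor.CallerRunsPolicy());  // when full, the submitter runs the task (back-pressure)
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create(4, 16);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 1000; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // expected: tasks run: 1000, largest pool size: 4
        System.out.println("tasks run: " + done.get() + ", largest pool size: " + pool.getLargestPoolSize());
    }
}
```

CallerRunsPolicy is chosen here because it slows the producer down rather than dropping events or throwing; the right sizes for a Clara deployment would have to be measured.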
Note that this is not only an FT issue (as previously thought); it may be an overall CLAS12 service-engine memory management issue (garbage creation, etc.). Let us first understand and address the DB communication issues, and get rid of per-event printouts in all service engines. Printouts are useful for debugging engines, but not for production service deployments. Clara provides mechanisms to report errors directly to the …, triggering specific actions. After that, we can return to tackle the out-of-memory exception.
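As a stop-gap while per-event printouts are being removed, an engine can report each distinct problem once instead of on every event. The helper below is hypothetical (`OnceLogger` and the key string are assumptions) and writes to stderr only for illustration; it does not use Clara's error-reporting mechanism, which is where the first occurrence would go in a real engine.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: reports each distinct problem once instead of on every event. */
public class OnceLogger {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time a given key is reported; safe to call from many threads. */
    public boolean report(String key, String message) {
        if (seen.add(key)) {
            System.err.println("[WARN] " + message + " (further occurrences suppressed)");
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        OnceLogger log = new OnceLogger();
        int printed = 0;
        for (int event = 0; event < 3000; event++) {
            // stands in for a per-event table lookup that keeps failing on the same entry
            if (log.report("FTCAL/missing-entry", "[IndexedTable] entry does not exist")) {
                printed++;
            }
        }
        System.out.println("messages printed for 3000 events: " + printed); // prints 1
    }
}
```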
Also, please take a look at the benchmark results below. We spend almost 3 s/event/core reconstructing charged-particle tracks in the DC. This is with 5a.1.0 on online events (Run 3544, currently active). Is this real?
2018-03-04 11:04:17.263: Benchmark results:
2018-03-04 11:04:17.264: READER 3000 events total time = 140.63 s average event time = 46.88 ms
2018-03-04 11:04:17.264: FTCAL 3000 events total time = 2.73 s average event time = 0.91 ms
2018-03-04 11:04:17.265: FTHODO 3000 events total time = 2.61 s average event time = 0.87 ms
2018-03-04 11:04:17.266: FTEB 3000 events total time = 2.78 s average event time = 0.93 ms
2018-03-04 11:04:17.267: DCHB 3000 events total time = 6354.61 s average event time = 2118.20 ms
2018-03-04 11:04:17.267: DCTB 3000 events total time = 2198.50 s average event time = 732.83 ms
2018-03-04 11:04:17.268: FTOF 3000 events total time = 11.69 s average event time = 3.90 ms
2018-03-04 11:04:17.269: LTCC 3000 events total time = 1.04 s average event time = 0.35 ms
2018-03-04 11:04:17.269: EC 3000 events total time = 6.25 s average event time = 2.08 ms
2018-03-04 11:04:17.270: EBHB 3000 events total time = 6.99 s average event time = 2.33 ms
2018-03-04 11:04:17.271: EBTB 3000 events total time = 5.85 s average event time = 1.95 ms
2018-03-04 11:04:17.272: WRITER 3000 events total time = 10.63 s average event time = 3.54 ms
2018-03-04 11:04:17.272: TOTAL 3000 events total time = 8744.31 s average event time = 2914.77 ms
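The TOTAL line is the sum of the per-service times (8744.31 s / 3000 events = 2914.77 ms), and DCHB plus DCTB account for almost all of it. The short check below recomputes this from the printed averages; it assumes, as the printout suggests, that each service's time is measured per event and that the TOTAL line simply adds them. `BenchmarkShare` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BenchmarkShare {
    /** Average event time per service (ms), copied from the benchmark printout above. */
    static Map<String, Double> averagesMs() {
        Map<String, Double> m = new LinkedHashMap<>();
        m.put("READER", 46.88); m.put("FTCAL", 0.91);   m.put("FTHODO", 0.87);
        m.put("FTEB", 0.93);    m.put("DCHB", 2118.20); m.put("DCTB", 732.83);
        m.put("FTOF", 3.90);    m.put("LTCC", 0.35);    m.put("EC", 2.08);
        m.put("EBHB", 2.33);    m.put("EBTB", 1.95);    m.put("WRITER", 3.54);
        return m;
    }

    static double totalMs(Map<String, Double> m) {
        return m.values().stream().mapToDouble(Double::doubleValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Double> m = averagesMs();
        double total = totalMs(m);
        double dc = m.get("DCHB") + m.get("DCTB");
        // prints: sum of service averages: 2914.77 ms
        System.out.printf("sum of service averages: %.2f ms%n", total);
        // prints: DCHB + DCTB: 2851.03 ms, i.e. 97.8% of the event time
        System.out.printf("DCHB + DCTB: %.2f ms, i.e. %.1f%% of the event time%n", dc, 100.0 * dc / total);
    }
}
```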
Best regards,
-vardan