[Clas12_software] DB initialization and FTCAL

Sun Mar 4 11:36:12 EST 2018

Hi All,

Here is an observed behavior running Clara on clonfarm0 (32 core AMD, RH Linux 6). 

Clas12 software release:  5a.1.0  
Data file: clas_003191.evio.7.hipo

Running the production service chain without the FTCAL and FTHODO services: it runs with no problems or exceptions, yet there is long (unexplained) DB activity at the configure stage of the reconstruction (see the attached docx file). This init process can cause DB issues.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: init.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 109283 bytes
Desc: not available
URL: <https://mailman.jlab.org/pipermail/clas12_software/attachments/20180304/233ca7b8/attachment-0001.docx>
-------------- next part --------------

Adding FTCAL to the service chain: an "[IndexedTable] ---> error.. entry does not exist" console printout appears for every event (slowing down reconstruction), and after a while there is an out-of-memory exception (see below).

[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
[IndexedTable] ---> error.. entry does not exist
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:714)
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1357)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at org.jlab.coda.xmsg.core.xMsg$1.handle(xMsg.java:568)
	at org.jlab.coda.xmsg.core.xMsgSubscription$Handler.run(xMsgSubscription.java:108)
	at java.lang.Thread.run(Thread.java:745)
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:714)
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1357)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at org.jlab.clara.sys.Service.execute(Service.java:176)
	at org.jlab.clara.sys.Service.access$300(Service.java:53)
	at org.jlab.clara.sys.Service$ServiceCallBack.callback(Service.java:286)
	at org.jlab.coda.xmsg.core.xMsg$1.lambda$handle$0(xMsg.java:568)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Note that this is not only an FT issue (as previously thought); it can be an overall clas12 service engine memory management issue (garbage creation, etc.). Let us first understand and address the DB communication issues, and get rid of the per-event printouts in every service engine. Printouts are useful for debugging engines, but not for production service deployments. Clara provides mechanisms to report errors directly, which can trigger specific actions. After that we can return to tackle the "out of memory" exception.
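One way to keep a per-event message such as the IndexedTable error out of production logs is to print it only the first few times and count the rest. The sketch below is a minimal illustration under that assumption, not part of Clara or the clas12 software: the class name ThrottledWarning and its methods are hypothetical, and it relies only on java.util.concurrent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: prints each distinct message at most maxPrints times,
// but keeps counting every occurrence so a summary can be reported later.
final class ThrottledWarning {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();
    private final long maxPrints;

    ThrottledWarning(long maxPrints) {
        this.maxPrints = maxPrints;
    }

    // Returns true if the message was actually printed.
    boolean warn(String message) {
        long seen = counts.computeIfAbsent(message, k -> new AtomicLong()).incrementAndGet();
        if (seen <= maxPrints) {
            System.err.println(message);
            return true;
        }
        return false;
    }

    // Total number of times the message was reported, printed or not.
    long count(String message) {
        AtomicLong c = counts.get(message);
        return c == null ? 0 : c.get();
    }

    public static void main(String[] args) {
        ThrottledWarning warnings = new ThrottledWarning(3);
        String msg = "[IndexedTable] ---> error.. entry does not exist";
        for (int event = 0; event < 3000; event++) {
            warnings.warn(msg); // only the first 3 calls reach the console
        }
        System.err.println(msg + " (seen " + warnings.count(msg) + " times)");
    }
}
```

With this pattern the console shows the message a handful of times plus one summary line, instead of one line per event.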

Also, please take a look at the benchmark results below. We spend almost 3 s/event/core to reconstruct charged particle tracks in DC. This is using 5a.1.0 on online events (Run 3544, currently active). Is this realistic?

2018-03-04 11:04:17.263: Benchmark results:
2018-03-04 11:04:17.264:   READER           3000 events    total time =   140.63 s    average event time =   46.88 ms
2018-03-04 11:04:17.264:   FTCAL            3000 events    total time =     2.73 s    average event time =    0.91 ms
2018-03-04 11:04:17.265:   FTHODO           3000 events    total time =     2.61 s    average event time =    0.87 ms
2018-03-04 11:04:17.266:   FTEB             3000 events    total time =     2.78 s    average event time =    0.93 ms
2018-03-04 11:04:17.267:   DCHB             3000 events    total time =  6354.61 s    average event time = 2118.20 ms
2018-03-04 11:04:17.267:   DCTB             3000 events    total time =  2198.50 s    average event time =  732.83 ms
2018-03-04 11:04:17.268:   FTOF             3000 events    total time =    11.69 s    average event time =    3.90 ms
2018-03-04 11:04:17.269:   LTCC             3000 events    total time =     1.04 s    average event time =    0.35 ms
2018-03-04 11:04:17.269:   EC               3000 events    total time =     6.25 s    average event time =    2.08 ms
2018-03-04 11:04:17.270:   EBHB             3000 events    total time =     6.99 s    average event time =    2.33 ms
2018-03-04 11:04:17.271:   EBTB             3000 events    total time =     5.85 s    average event time =    1.95 ms
2018-03-04 11:04:17.272:   WRITER           3000 events    total time =    10.63 s    average event time =    3.54 ms
2018-03-04 11:04:17.272:   TOTAL            3000 events    total time =  8744.31 s    average event time = 2914.77 ms
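From the totals above, DCHB and DCTB together take 8553.11 s of the 8744.31 s total, or about 97.8% of the reconstruction time. A minimal sketch of that arithmetic follows, assuming the benchmark lines keep the format shown here; the class BenchmarkShare and its methods are hypothetical names used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "total time" per service from the benchmark
// printout and computes the fraction of the TOTAL spent in chosen services.
final class BenchmarkShare {
    private static final Pattern LINE =
        Pattern.compile("(\\w+)\\s+\\d+ events\\s+total time =\\s*([\\d.]+) s");

    // Maps service name -> total time in seconds, in order of appearance.
    static Map<String, Double> totals(String log) {
        Map<String, Double> out = new LinkedHashMap<>();
        Matcher m = LINE.matcher(log);
        while (m.find()) {
            out.put(m.group(1), Double.parseDouble(m.group(2)));
        }
        return out;
    }

    // Fraction of the TOTAL line accounted for by the named services.
    static double share(Map<String, Double> totals, String... services) {
        double sum = 0.0;
        for (String s : services) {
            sum += totals.get(s);
        }
        return sum / totals.get("TOTAL");
    }

    public static void main(String[] args) {
        // Three lines copied from the benchmark results above.
        String log =
            "2018-03-04 11:04:17.266:   DCHB   3000 events    total time =  6354.61 s    average event time = 2118.20 ms\n"
          + "2018-03-04 11:04:17.267:   DCTB   3000 events    total time =  2198.50 s    average event time =  732.83 ms\n"
          + "2018-03-04 11:04:17.272:   TOTAL  3000 events    total time =  8744.31 s    average event time = 2914.77 ms\n";
        double dc = share(totals(log), "DCHB", "DCTB");
        System.out.println(String.format(Locale.ROOT, "DC share of total time: %.1f%%", dc * 100.0));
    }
}
```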

Best regards,
-vardan