[Clas12_software] Critical bug in 5a.1.0
Vardan Gyurjyan
gurjyan at jlab.org
Fri Mar 2 20:13:43 EST 2018
Hi Sylvester,
I also noticed that the FTCAL and FTHODO services make many database accesses during initialization. I would expect a single database access followed by a single disconnect.
-Vardan
Sent from my iPhone
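Vardan's expectation (one database access at initialization, then one disconnect) corresponds to a load-once, cache-in-memory pattern. The Java sketch below is hypothetical: `TableSource`, `CountingSource`, `ConstantsCache` and the table names are placeholders, not the CCDB or COATJAVA API. It only illustrates the access pattern: one connection, one read per table, one close, and then per-event lookups served from memory.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a database connection (NOT the CCDB or COATJAVA API).
interface TableSource extends AutoCloseable {
    Map<String, Double> readTable(String tableName, int run);

    @Override
    void close(); // narrowed: no checked exception, so try-with-resources needs no catch
}

// Fake source that counts reads and closes, so the access pattern can be checked.
class CountingSource implements TableSource {
    int reads = 0;
    int closes = 0;

    @Override
    public Map<String, Double> readTable(String tableName, int run) {
        reads++;
        Map<String, Double> rows = new HashMap<>();
        rows.put("1/1/1", 1.0); // dummy row keyed by a placeholder "sector/layer/component" string
        return rows;
    }

    @Override
    public void close() {
        closes++;
    }
}

// Loads every needed table once at initialization, then serves lookups from memory.
class ConstantsCache {
    private final Map<String, Map<String, Double>> tables = new HashMap<>();

    ConstantsCache(List<String> tableNames, int run, TableSource source) {
        // One connection, one read per table, one close; no database access after this point.
        try (TableSource s = source) {
            for (String name : tableNames) {
                tables.put(name, s.readTable(name, run));
            }
        }
    }

    // Per-event lookup: returns null for an unknown table or a missing row.
    Double get(String tableName, String key) {
        Map<String, Double> table = tables.get(tableName);
        return table == null ? null : table.get(key);
    }
}
```

In this sketch the cache would be built once, e.g. during the service's initialization, and `get()` called per event; however many events are processed, the source sees one read per table and a single close.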
> On Mar 2, 2018, at 7:50 PM, Sylvester J. Joosten <tuf42480 at temple.edu> wrote:
>
> Hi Raffaella, hi Vardan,
>
> In case this is useful: I recognize this error from accessing CCDB tables through COATJAVA. It is printed when the lookup
>
> if(this.entries.hasItem(index)==false) (org.jlab.utils.groups.IndexedTable)
>
> fails, i.e. when entries.hasItem() returns false. entries.hasItem() performs the following checks:
>
> —> calls IndexedList.hasItem(int... index) (org.jlab.utils.groups.IndexedList)
> —> has 2 internal checks:
> 1. if(index.length!=this.indexSize)
> 2. IndexGenerator.hashCode(index);
>
> In my experience, this indicates some kind of issue or inconsistency when accessing CCDB, at least for this particular run.
>
> Just my 2 cents.
> Best,
> Sylvester
>
>
>> On Mar 2, 2018, at 7:37 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
>>
>> Hi Raffaella,
>> I do not know exactly what the cause is, but whenever I use one of these services in the data-processing chain I get an out-of-memory exception. This exception is not recoverable. As I mentioned, this happens every single time on the clonfarm0 node. I never saw this error on the farm machines, though. Maybe FX can comment. I get these errors both on data over the ET and on decoded files from FX’s decoded-files directory.
>> Vardan
>> Sent from my iPhone
>>
>>> On Mar 2, 2018, at 4:53 PM, Raffaella De Vita <Raffaella.Devita at ge.infn.it> wrote:
>>>
>>> Hi Vardan,
>>> I'm the author of those services. The only recent change was a modification of a hardcoded constant, and after that change FX cooked several files for FT studies. Nothing else has changed in more than a month. In any case, I will try to reproduce the problem and debug it. What data are you processing?
>>> Regards,
>>> Raffaella
>>>
>>> Vardan Gyurjyan wrote:
>>>> Hi FX,
>>>>
>>>> Since I am not sure who the author of the FTCAL and FTHODO engines is, I am cc-ing this email to clas12.
>>>> A critical bug was introduced in the code of these service engines in the 5a.1.0 release. It is 100% reproducible on the clonfarm0 node (the online reconstruction node), where the JVM crashes with an out-of-memory exception. The reconstruction chain functions properly without these services.
>>>> These service engines also print warning messages such as “[IndexedTable] ---> error.. entry does not exist” for every event (I consider them warnings, since if this were a real error the processing should be stopped). Anyway, this is a serious bug that can result in a large number of job failures on the farm.
>>>>
>>>> -vardan
>>>> --------------------------------------------------
>>>> Vardan H. Gyurjyan, Ph.D.
>>>> Staff Scientist
>>>> Thomas Jefferson Accelerator Facility
>>>> Newport News, VA, 23606
>>>> E-mail: gurjyan at jlab.org
>>>> 757-269-5879 (JLAB)
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Clas12_software mailing list
>>>> Clas12_software at jlab.org
>>>> https://mailman.jlab.org/mailman/listinfo/clas12_software
>>>
>
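Sylvester's description of the lookup (an index-count check followed by a hash-key lookup) can be illustrated with a simplified analogue. This is a hypothetical sketch: `SimpleIndexedList` and its 16-bit key packing are assumptions, not the actual org.jlab.utils.groups.IndexedList or IndexGenerator code. It shows how a call with the wrong number of indices, or a row that is absent from the loaded table, makes hasItem() return false and triggers the “entry does not exist” message quoted in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical analogue of an indexed constants table (NOT the COATJAVA classes).
class SimpleIndexedList<T> {
    private final int indexSize;
    private final Map<Long, T> items = new HashMap<>();

    SimpleIndexedList(int indexSize) {
        if (indexSize < 1 || indexSize > 4) {
            throw new IllegalArgumentException("1 to 4 indices supported");
        }
        this.indexSize = indexSize;
    }

    // Each index must fit in 16 bits for the assumed key packing below.
    private static boolean inRange(int... index) {
        for (int i : index) {
            if (i < 0 || i > 0xFFFF) return false;
        }
        return true;
    }

    // Assumed encoding: pack the indices, 16 bits each, into one long key.
    private static long key(int... index) {
        long k = 0;
        for (int i : index) {
            k = (k << 16) | i;
        }
        return k;
    }

    void add(T item, int... index) {
        if (index.length != indexSize || !inRange(index)) {
            throw new IllegalArgumentException("bad index");
        }
        items.put(key(index), item);
    }

    // Check 1: the number of indices must match; check 2: the packed key must be present.
    boolean hasItem(int... index) {
        if (index.length != indexSize || !inRange(index)) return false;
        return items.containsKey(key(index));
    }

    // Mirrors the caller pattern from the thread: warn and return null when the entry is missing.
    T getItem(int... index) {
        if (!hasItem(index)) {
            System.out.println("[IndexedTable] ---> error.. entry does not exist");
            return null;
        }
        return items.get(key(index));
    }
}
```

With this sketch, a table loaded without a given row (for example, if the constants for a run are incomplete) prints the warning on every lookup of that row but does not stop processing.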