[Clas12_software] Critical bug in 5a.1.0

Gagik Gavalian gavalian at jlab.org
Fri Mar 2 20:40:51 EST 2018


We have safe tools for accessing the database. People are advised to use them to avoid problems, but this is not enforced.
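
As a rough illustration of the single-access behavior Vardan asks for further down the thread, a service could load its constants once per run and reuse them for every event. This is only a sketch under assumptions: the class, its methods, and the "gain" entry are made up for the example and are not the COATJAVA or CCDB API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch: none of these names come from COATJAVA or CCDB.
class RunConstantsCache {
    // Stand-in for a database query that returns one table for a given run.
    private final IntFunction<Map<String, Double>> loader;
    private final Map<Integer, Map<String, Double>> byRun = new HashMap<>();
    private int loads = 0;

    RunConstantsCache(IntFunction<Map<String, Double>> loader) {
        this.loader = loader;
    }

    // Returns the table for a run; the loader is called only on the first request.
    synchronized Map<String, Double> tableFor(int run) {
        Map<String, Double> table = byRun.get(run);
        if (table == null) {
            table = loader.apply(run);
            loads++;
            byRun.put(run, table);
        }
        return table;
    }

    // Number of times the loader was actually called.
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        RunConstantsCache cache = new RunConstantsCache(run -> {
            Map<String, Double> t = new HashMap<>();
            t.put("gain", 1.5); // made-up constant
            return t;
        });
        // Simulate many events from the same (made-up) run number.
        for (int event = 0; event < 1000; event++) {
            cache.tableFor(3000).get("gain");
        }
        System.out.println("loads = " + cache.loadCount());
    }
}
```

With this shape the per-event code never touches the database, so the number of accesses depends on the number of runs seen, not on the number of events.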

Gagik

> On Mar 2, 2018, at 8:31 PM, Francois-Xavier Girod <fxgirod at jlab.org> wrote:
> 
> Dear Vardan
> 
> Can we agree on the same file to test? It may be easier to start from a raw file and decode it, so I might use a recent run from the "online" directory, or even copy fresh data.
> 
> I can run this on one of the clara machines tonight.
> 
> Best regards
> FX
> 
>> On Fri, Mar 2, 2018 at 8:13 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
>> Hi Sylvester,
>> I also notice that the FTCAL and FTHODO services make many database accesses during initialization. I would expect to see a single access to the database followed by a single disconnect.
>> -Vardan
>> 
>> 
>>> On Mar 2, 2018, at 7:50 PM, Sylvester J. Joosten <tuf42480 at temple.edu> wrote:
>>> 
>>> Hi Rafaella, hi Vardan,
>>> 
>>> In case this is useful: I recognize this error from accessing CCDB tables in COATJAVA. It happens when the check
>>> 
>>> if(this.entries.hasItem(index)==false) (org.jlab.utils.groups.IndexedTable)
>>> 
>>> fails. entries.hasItem() does the following:
>>>  
>>> -> calls IndexedList.hasItem(int... index) (org.jlab.utils.groups.IndexedList)
>>> -> which has 2 internal checks:
>>>          1. if(index.length!=this.indexSize)
>>>          2. IndexGenerator.hashCode(index);
>>> 
>>> In my experience, this implies that there is some kind of issue or inconsistency when accessing CCDB, at least for this particular run.
>>> 
>>> Just my 2 cents.
>>> Best,
>>> Sylvester
>>> 
>>> 
>>>> On Mar 2, 2018, at 7:37 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
>>>> 
>>>> Hi Raffaella,
>>>> I do not know exactly what the cause is, but whenever I use one of these services in the data processing chain I get an out-of-memory exception. This exception is not recoverable. As I mentioned, this happens every single time on the clonfarm0 node. I have never seen this error on the farm machines, though. Maybe FX can comment. I am getting these errors on data over the ET as well as on decoded files from FX's decoded files directory.
>>>> Vardan
>>>> 
>>>>> On Mar 2, 2018, at 4:53 PM, Raffaella De Vita <Raffaella.Devita at ge.infn.it> wrote:
>>>>> 
>>>>> Hi Vardan,
>>>>> I'm the author of those services. The only recent change was a modification of a hardcoded constant and, after that was done, FX cooked several files for FT studies. Nothing else has changed in more than a month. Anyway, I will try to reproduce the problem and debug it. What data are you processing?
>>>>> Regards,
>>>>>     Raffaella
>>>>> 
>>>>> Vardan Gyurjyan wrote:
>>>>>> Hi FX,
>>>>>> 
>>>>>> Since I am not sure who the author of the FTCAL and FTHODO engines is, I am cc-ing this email to clas12.
>>>>>> A critical bug was introduced in the code of these service engines for the 5a.1.0 release. It is 100% reproducible on the clonfarm0 node (the online reconstruction node), where the JVM crashes with an out-of-memory exception. The reconstruction chain without these services functions properly.
>>>>>> These service engines also print, for every event, warning messages such as “ [IndexedTable] ---> error.. entry does not exist” (I consider them warnings, since if this were a real error the processing should be stopped). Anyway, this is a serious bug that can result in a large number of job failures on the farm.
>>>>>>  
>>>>>> -vardan
>>>>>> --------------------------------------------------
>>>>>> Vardan H. Gyurjyan, Ph.D.
>>>>>> Staff Scientist
>>>>>> Thomas Jefferson Accelerator Facility
>>>>>> Newport News, VA, 23606
>>>>>> E-mail: gurjyan at jlab.org
>>>>>> 757-269-5879 (JLAB)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Clas12_software mailing list
>>>>>> Clas12_software at jlab.org
>>>>>> https://mailman.jlab.org/mailman/listinfo/clas12_software
>>>>> 
>>> 
>> 
> 