[Clas12_software] Critical bug in 5a.1.0

Nathan Baltzell baltzell at jlab.org
Fri Mar 2 20:51:40 EST 2018


It will be good to give people a reference model. I based EB’s CCDB access on ECAL's.

-Nathan


> On Mar 2, 2018, at 8:40 PM, Gagik Gavalian <gavalian at jlab.org> wrote:
> 
> We have tools for accessing the database, which are safe. People are advised to use those to avoid problems, but it’s not enforced.
> 
> Gagik
> 
> On Mar 2, 2018, at 8:31 PM, Francois-Xavier Girod <fxgirod at jlab.org> wrote:
> 
>> Dear Vardan
>> 
>> Can we agree on the same file to test? It may be easier to start from a raw file and decode it, so I might use a recent run from the "online" directory, or might even copy fresh data.
>> 
>> I can run this on one of the clara machines tonight.
>> 
>> Best regards
>> FX
>> 
>> On Fri, Mar 2, 2018 at 8:13 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
>> Hi Sylvester,
>> I also notice that the FTCAL and FTHODO services are making many database accesses during initialization. I would expect to see a single access to the database followed by a single disconnect.
>> -Vardan
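
The "single access, single disconnect" pattern described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the COATJAVA or CCDB API: the class ConstantsCacheSketch, the Connection interface, and the table paths are all hypothetical names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical names throughout; this is NOT the COATJAVA or CCDB API.
class ConstantsCacheSketch {

    /** Stand-in for a database connection (assumption for this sketch). */
    interface Connection extends AutoCloseable {
        double[] readTable(String path);

        @Override
        void close();
    }

    // Tables are kept in memory so event processing never touches the database.
    private final Map<String, double[]> tables = new HashMap<>();

    /** Opens one connection, reads every requested table, then disconnects once. */
    public void init(Supplier<Connection> connector, String... paths) {
        try (Connection db = connector.get()) {
            for (String path : paths) {
                tables.put(path, db.readTable(path));
            }
        }
    }

    /** Returns the cached table, or null if it was not loaded during init. */
    public double[] get(String path) {
        return tables.get(path);
    }

    public static void main(String[] args) {
        int[] opens = {0};
        int[] closes = {0};
        ConstantsCacheSketch cache = new ConstantsCacheSketch();
        cache.init(() -> {
            opens[0]++;
            return new Connection() {
                public double[] readTable(String path) {
                    return new double[] {path.length()}; // fake table content
                }

                public void close() {
                    closes[0]++;
                }
            };
        }, "/hypothetical/table_a", "/hypothetical/table_b");
        System.out.println(opens[0] + " open, " + closes[0] + " close");
    }
}
```

With this shape, the number of connections does not grow with the number of tables or events, since all reads happen inside one try-with-resources block.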
>> 
>> On Mar 2, 2018, at 7:50 PM, Sylvester J. Joosten <tuf42480 at temple.edu> wrote:
>> 
>>> Hi Raffaella, hi Vardan,
>>> 
>>> In case this is useful: I recognize this error from accessing CCDB tables in COATJAVA. It happens when the check
>>> 
>>> if(this.entries.hasItem(index)==false) (org.jlab.utils.groups.IndexedTable)
>>> 
>>> fails. entries.hasItem() contains the following checks:
>>>  
>>> --> calls IndexedList.hasItem(int... index) (org.jlab.utils.groups.IndexedList)
>>> --> has 2 internal checks:
>>>          1. if(index.length!=this.indexSize)
>>>          2. IndexGenerator.hashCode(index);
>>> 
>>> In my experience, this implies that there is some kind of issue or inconsistency when accessing CCDB, at least for this particular run.
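
To make the two checks above concrete, here is a minimal sketch of how such a lookup can return false. It is an illustrative stand-in, not the org.jlab.utils.groups.IndexedList source: the class name IndexedListSketch and the key-packing scheme in packKey (16 bits per index component) are assumptions made for this example only.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in only; NOT the org.jlab.utils.groups.IndexedList source.
class IndexedListSketch<T> {
    private final int indexSize;
    private final Map<Long, T> entries = new HashMap<>();

    public IndexedListSketch(int indexSize) {
        this.indexSize = indexSize;
    }

    // Hypothetical key packing: 16 bits per index component (an assumption).
    static long packKey(int... index) {
        long key = 0L;
        for (int i = 0; i < index.length; i++) {
            key |= ((long) (index[i] & 0xFFFF)) << (16 * i);
        }
        return key;
    }

    public void add(T item, int... index) {
        if (index.length != indexSize) {
            throw new IllegalArgumentException("wrong number of indices");
        }
        entries.put(packKey(index), item);
    }

    public boolean hasItem(int... index) {
        // Check 1: the number of indices must match the table layout.
        if (index.length != indexSize) {
            return false;
        }
        // Check 2: the key built from the indices must be present.
        return entries.containsKey(packKey(index));
    }

    public static void main(String[] args) {
        IndexedListSketch<Double> table = new IndexedListSketch<>(3);
        table.add(1.5, 1, 2, 3);
        System.out.println(table.hasItem(1, 2, 3)); // true
        System.out.println(table.hasItem(1, 2));    // false: wrong index count
        System.out.println(table.hasItem(4, 5, 6)); // false: entry never loaded
    }
}
```

In this sketch, an empty or partially loaded table and a lookup with the wrong number of indices both produce the same false result, which would be consistent with the message pointing at a problem upstream in the table loading.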
>>> 
>>> Just my 2 cents.
>>> Best,
>>> Sylvester
>>> 
>>> 
>>>> On Mar 2, 2018, at 7:37 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
>>>> 
>>>> Hi Raffaella,
>>>> I do not know exactly what the cause is, but whenever I use one of these services in the data processing chain I get an out-of-memory exception. This exception is not recoverable. As I mentioned, this happens every single time on the clonfarm0 node. I never saw this error on the farm machines, though. Maybe FX can comment. I am getting these errors on data over the ET as well as on files from FX’s decoded files directory.
>>>> Vardan
>>>> 
>>>> On Mar 2, 2018, at 4:53 PM, Raffaella De Vita <Raffaella.Devita at ge.infn.it> wrote:
>>>> 
>>>>> Hi Vardan,
>>>>> I'm the author of those services. The only recent change was a modification of a hardcoded constant and, after that was done, FX cooked several files for FT studies. Nothing else has changed in more than a month. Anyway, I will try to reproduce the problem and debug it. What data are you processing?
>>>>> Regards,
>>>>>     Raffaella
>>>>> 
>>>>> Vardan Gyurjyan wrote:
>>>>>> Hi FX,
>>>>>> 
>>>>>> Since I am not sure who the author of the FTCAL and FTHODO engines is, I am cc-ing this email to clas12.
>>>>>> There is a critical bug in the code of these service engines in the 5a.1.0 release. It is 100% reproducible on the clonfarm0 node (the online reconstruction node), where the JVM crashes with an out-of-memory exception. The reconstruction chain without these services functions properly.
>>>>>> These service engines also print warning messages for every event, such as “ [IndexedTable] ---> error.. entry does not exist” (I consider them warnings, since if this were a real error the processing should be stopped). Anyway, this is a serious bug that can result in a large number of job failures on the farm.
>>>>>>  
>>>>>> -vardan
>>>>>> --------------------------------------------------
>>>>>> Vardan H. Gyurjyan, Ph.D.
>>>>>> Staff Scientist
>>>>>> Thomas Jefferson Accelerator Facility
>>>>>> Newport News, VA, 23606
>>>>>> E-mail: gurjyan at jlab.org
>>>>>> 757-269-5879 (JLAB)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Clas12_software mailing list
>>>>>> Clas12_software at jlab.org
>>>>>> https://mailman.jlab.org/mailman/listinfo/clas12_software
