[Clas12_software] Critical bug in 5a.1.0
Raffaella De Vita
Raffaella.Devita at ge.infn.it
Sat Mar 3 11:07:12 EST 2018
Hi,
The FT services use the ConstantsManager, similarly to the EC, except for accessing the geometry table, because that table was created long ago and doesn’t have the sector, layer, component indexes. The errors Vardan reported seem to be related to CCDB being accessed via the ConstantsManager, since that retrieves IndexedTables, but the fact that they happen only on the online machine is very puzzling.
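
To make the failure mode concrete, here is a minimal sketch of the two checks Sylvester describes further down in this thread (index count, then key lookup). It deliberately does not use the real org.jlab.utils.groups classes: ToyIndexedTable and its methods are hypothetical names made up for illustration, and a List is used as the map key in place of the IndexGenerator hash.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Toy stand-in for an indexed constants table; NOT the coatjava IndexedTable. */
final class ToyIndexedTable {
    // Number of indices per entry, e.g. 3 for (sector, layer, component).
    private final int indexSize;
    private final Map<List<Integer>, Double> entries = new HashMap<>();

    ToyIndexedTable(int indexSize) {
        this.indexSize = indexSize;
    }

    /** Store one constant under a full index tuple. */
    void put(double value, int... index) {
        if (index.length != indexSize) {
            throw new IllegalArgumentException(
                "expected " + indexSize + " indices, got " + index.length);
        }
        entries.put(key(index), value);
    }

    /** Check 1: the number of indices matches. Check 2: the key is present. */
    boolean hasEntry(int... index) {
        if (index.length != indexSize) {
            return false;
        }
        return entries.containsKey(key(index));
    }

    /** Return the constant, or throw instead of printing a message per event. */
    double get(int... index) {
        if (!hasEntry(index)) {
            throw new IllegalStateException(
                "entry does not exist: " + Arrays.toString(index));
        }
        return entries.get(key(index));
    }

    // Assumption: a boxed list serves as the key here; the real code hashes the indices.
    private static List<Integer> key(int... index) {
        return Arrays.stream(index).boxed().collect(Collectors.toList());
    }
}
```

In this sketch a lookup with the wrong number of indices, or for a row that was never loaded, fails in the same way, which is why a table without the sector, layer, component indexes or an incomplete CCDB read could both produce the "entry does not exist" message. Throwing on a missing entry is only one possible design choice, not what the current services do.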
Vardan, can you send the log with the errors you get? Also, can I try myself on clonfarm0 to see what happens?
Thanks,
Raffaella
> On 3 Mar 2018, at 03:38, Francois-Xavier Girod <fxgirod at jlab.org> wrote:
>
> I just installed version 5a.1.1 and tested it on clara1602 with
>
> /u/scratch/fxgirod/test/clas_003512.evio.1400.hipo
>
> The log is attached
>
>
>
>> On Fri, Mar 2, 2018 at 8:51 PM, Nathan Baltzell <baltzell at jlab.org> wrote:
>> Will be good to give people a reference model. I based EB’s CCDB access on ECAL's.
>>
>> -Nathan
>>
>>
>> > On Mar 2, 2018, at 8:40 PM, Gagik Gavalian <gavalian at jlab.org> wrote:
>> >
>> > We have tools for accessing the database, which are safe. It is advised that people use those to avoid problems, but it’s not enforced.
>> >
>> > Gagik
>> >
>> > Sent from my iPhone
>> >
>> > On Mar 2, 2018, at 8:31 PM, Francois-Xavier Girod <fxgirod at jlab.org> wrote:
>> >
>> >> Dear Vardan
>> >>
>> >> Can we agree on the same file to test? It may be easier to start from a raw file and decode it, so I might use a recent run from the "online" directory, or might even copy fresh data.
>> >>
>> >> I can run this on one of the clara machines tonight.
>> >>
>> >> Best regards
>> >> FX
>> >>
>> >> On Fri, Mar 2, 2018 at 8:13 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
>> >> Hi Sylvester,
>> >> I also noticed that the FTCAL and FTHODO services make many DB accesses during initialization. I would expect to see a single access to the database followed by a single DB disconnect.
>> >> -Vardan
>> >>
>> >>
>> >> Sent from my iPhone
>> >>
>> >> On Mar 2, 2018, at 7:50 PM, Sylvester J. Joosten <tuf42480 at temple.edu> wrote:
>> >>
>> >>> Hi Raffaella, hi Vardan,
>> >>>
>> >>> In case this is useful: I recognize this error from accessing CCDB tables in COATJAVA. The message is printed when the check
>> >>>
>> >>> if(this.entries.hasItem(index)==false) (org.jlab.utils.groups.IndexedTable)
>> >>>
>> >>> finds no matching entry. entries.hasItem() works as follows:
>> >>>
>> >>> --> calls IndexedList.hasItem(int... index) (org.jlab.utils.groups.IndexedList)
>> >>> --> which has 2 internal checks:
>> >>> 1. if(index.length!=this.indexSize)
>> >>> 2. IndexGenerator.hashCode(index);
>> >>>
>> >>> In my experience, this implies that there is some kind of issue or inconsistency when accessing CCDB, at least for this particular run.
>> >>>
>> >>> Just my 2 cents.
>> >>> Best,
>> >>> Sylvester
>> >>>
>> >>>
>> >>>> On Mar 2, 2018, at 7:37 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
>> >>>>
>> >>>> Hi Raffaella,
>> >>>> I do not know exactly what the cause is, but whenever I use one of these services in the data processing chain I get an out-of-memory exception. This exception is not recoverable. As I mentioned, this happens every single time on the clonfarm0 node. I have never seen this error on the farm machines, though. Maybe FX can comment. I am getting these errors on data over the ET as well as on decoded files from FX’s decoded files directory.
>> >>>> Vardan
>> >>>> Sent from my iPhone
>> >>>>
>> >>>> On Mar 2, 2018, at 4:53 PM, Raffaella De Vita <Raffaella.Devita at ge.infn.it> wrote:
>> >>>>
>> >>>>> Hi Vardan,
>> >>>>> I'm the author of those services. The only recent change was a modification of a hardcoded constant and, after that was done, FX cooked several files for FT studies. Nothing else has changed in more than one month. Anyway, I will try to reproduce the problem and debug it. What data are you processing?
>> >>>>> Regards,
>> >>>>> Raffaella
>> >>>>>
>> >>>>> Vardan Gyurjyan wrote:
>> >>>>>> Hi FX,
>> >>>>>>
>> >>>>>> Since I am not sure who the author of the FTCAL and FTHODO engines is, I am cc-ing this email to clas12.
>> >>>>>> There is a critical bug introduced in these service engines’ code for the 5a.1.0 release. It is 100% reproducible on the clonfarm0 node (online reconstruction node), where the JVM crashes with an out-of-memory exception. The reconstruction chain without these services functions properly.
>> >>>>>> These are service engines that also print, for every event, warning messages such as “ [IndexedTable] ---> error.. entry does not exist” (I consider them warnings, since if this were a real error the processing should be stopped). Anyway, this is a serious bug that can result in a large number of job failures on the farm.
>> >>>>>>
>> >>>>>> -vardan
>> >>>>>> --------------------------------------------------
>> >>>>>> Vardan H. Gyurjyan, Ph.D.
>> >>>>>> Staff Scientist
>> >>>>>> Thomas Jefferson Accelerator Facility
>> >>>>>> Newport News, VA, 23606
>> >>>>>> E-mail: gurjyan at jlab.org
>> >>>>>> 757-269-5879 (JLAB)
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> Clas12_software mailing list
>> >>>>>>
>> >>>>>> Clas12_software at jlab.org
>> >>>>>> https://mailman.jlab.org/mailman/listinfo/clas12_software
>> >>>>>
>> >>>
>> >>
>>
>
> <clara_log>