[Clas12_software] Critical bug in 5a.1.0
Francois-Xavier Girod
fxgirod at jlab.org
Fri Mar 2 21:38:20 EST 2018
I just installed version 5a.1.1 and tested it on clara1602 with
/u/scratch/fxgirod/test/clas_003512.evio.1400.hipo
The log is attached.
On Fri, Mar 2, 2018 at 8:51 PM, Nathan Baltzell <baltzell at jlab.org> wrote:
> It will be good to give people a reference model. I based EB's CCDB access
> on ECAL's.
>
> -Nathan
>
>
> > On Mar 2, 2018, at 8:40 PM, Gagik Gavalian <gavalian at jlab.org> wrote:
> >
> > We have tools for accessing the database that are safe. It is advised
> > that people use them to avoid problems, but this is not enforced.
> >
> > Gagik
> >
> > Sent from my iPhone
> >
> > On Mar 2, 2018, at 8:31 PM, Francois-Xavier Girod <fxgirod at jlab.org> wrote:
> >
> >> Dear Vardan,
> >>
> >> Can we agree on a common file to test? It may be easier to start from a
> >> raw file and decode it, so I might use a recent run from the "online"
> >> directory, or even copy fresh data.
> >>
> >> I can run this on one of the clara machines tonight.
> >>
> >> Best regards
> >> FX
> >>
> >> On Fri, Mar 2, 2018 at 8:13 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
> >> Hi Sylvester,
> >> I also noticed that the FTCAL and FTHODO services make many DB
> >> accesses during initialization. I would expect a single access to the
> >> database followed by a single DB disconnect.
> >> -Vardan
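A minimal Java sketch of the initialization pattern Vardan describes: fetch all constants in one batch per run, then serve every per-event lookup from memory. `ConstantsCacheSketch`, `TableLoader`, and the table names are hypothetical. This is not the COATJAVA or CCDB API, and the fake loader only counts how often the "database" would be opened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the COATJAVA/CCDB API: load every table a service
// needs in one batch per run, then answer all per-event lookups from memory.
final class ConstantsCacheSketch {

    // Hypothetical stand-in for the real database layer: one call is meant to be
    // one connect, read all requested tables, disconnect.
    interface TableLoader {
        Map<String, double[]> loadAll(int run, List<String> tablePaths);
    }

    private final TableLoader loader;
    private final List<String> tablePaths;
    private final Map<Integer, Map<String, double[]>> tablesByRun = new HashMap<>();

    ConstantsCacheSketch(TableLoader loader, List<String> tablePaths) {
        this.loader = loader;
        this.tablePaths = new ArrayList<>(tablePaths);
    }

    // synchronized so that several event-processing threads cannot
    // trigger duplicate loads for the same run
    synchronized double[] get(int run, String tablePath) {
        Map<String, double[]> tables = tablesByRun.computeIfAbsent(run,
                r -> Collections.unmodifiableMap(loader.loadAll(r, tablePaths)));
        return tables.get(tablePath);
    }

    public static void main(String[] args) {
        int[] loads = {0};
        // Fake loader that only counts how often the "database" is opened.
        TableLoader fake = (run, paths) -> {
            loads[0]++;
            Map<String, double[]> result = new HashMap<>();
            for (String p : paths) {
                result.put(p, new double[] {run});
            }
            return result;
        };
        // Table names are placeholders, not real CCDB paths.
        ConstantsCacheSketch cache = new ConstantsCacheSketch(fake,
                Arrays.asList("ftcal/table_a", "fthodo/table_b"));
        for (int event = 0; event < 1000; event++) {
            cache.get(3512, "ftcal/table_a");
            cache.get(3512, "fthodo/table_b");
        }
        System.out.println("database loads: " + loads[0]); // database loads: 1
    }
}
```

Keying the cache by run number means a new run triggers exactly one more batch load. Whether the real services need per-run reloading is an assumption here.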
> >>
> >>
> >> Sent from my iPhone
> >>
> >> On Mar 2, 2018, at 7:50 PM, Sylvester J. Joosten <tuf42480 at temple.edu> wrote:
> >>
> >>> Hi Raffaella, hi Vardan,
> >>>
> >>> In case this is useful: I recognize this error from accessing CCDB
> >>> tables through COATJAVA. It happens when the check
> >>>
> >>> if(this.entries.hasItem(index)==false)   (org.jlab.utils.groups.IndexedTable)
> >>>
> >>> fails. entries.hasItem() performs the following checks:
> >>>
> >>> -> calls IndexedList.hasItem(int... index)   (org.jlab.utils.groups.IndexedList)
> >>> -> has 2 internal checks:
> >>> 1. if(index.length!=this.indexSize)
> >>> 2. IndexGenerator.hashCode(index);
> >>>
> >>> In my experience, this implies that there is some kind of issue or
> >>> inconsistency when accessing CCDB, at least for this particular run.
> >>>
> >>> Just my 2 cents.
> >>> Best,
> >>> Sylvester
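The two checks Sylvester lists can be illustrated with a small Java sketch. It shows why a lookup fails either when the caller passes the wrong number of indices or when the requested row was never loaded from CCDB for that run. `IndexedListSketch` is a hypothetical class, not COATJAVA code. Its key encoding (16 bits per index packed into a `long`) is an assumption and may differ from the real `IndexGenerator`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two checks described above; it mimics the
// behaviour of org.jlab.utils.groups.IndexedList but is NOT its source code.
class IndexedListSketch<T> {
    private final int indexSize;                  // indices per key, e.g. sector, layer, component
    private final Map<Long, T> items = new HashMap<>();

    IndexedListSketch(int indexSize) {
        if (indexSize < 1 || indexSize > 4) {
            throw new IllegalArgumentException("this sketch supports 1 to 4 indices");
        }
        this.indexSize = indexSize;
    }

    // Assumed key encoding: 16 bits per index packed into a long.
    // The real IndexGenerator.hashCode() may use a different layout.
    static long packKey(int... index) {
        long key = 0L;
        for (int i : index) {
            key = (key << 16) | (i & 0xFFFFL);
        }
        return key;
    }

    void add(T item, int... index) {
        if (index.length != indexSize) {
            throw new IllegalArgumentException(
                    "expected " + indexSize + " indices, got " + index.length);
        }
        items.put(packKey(index), item);
    }

    boolean hasItem(int... index) {
        // Check 1: the caller must pass exactly indexSize indices.
        if (index.length != indexSize) {
            return false;
        }
        // Check 2: the packed key must exist, i.e. the row was loaded.
        return items.containsKey(packKey(index));
    }

    T getItem(int... index) {
        if (!hasItem(index)) {
            // Same text as the warning quoted in this thread.
            System.out.println("[IndexedTable] ---> error.. entry does not exist");
            return null;
        }
        return items.get(packKey(index));
    }

    public static void main(String[] args) {
        IndexedListSketch<Double> gains = new IndexedListSketch<>(3);
        gains.add(1.02, 1, 1, 10);                   // hypothetical sector 1, layer 1, component 10
        System.out.println(gains.hasItem(1, 1, 10)); // true
        System.out.println(gains.hasItem(1, 10));    // false: wrong number of indices
        System.out.println(gains.getItem(1, 1, 11)); // prints the warning, then null
    }
}
```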
> >>>
> >>>
> >>>> On Mar 2, 2018, at 7:37 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
> >>>>
> >>>> Hi Raffaella,
> >>>> I do not know exactly what the cause is, but whenever I use one of
> >>>> these services in the data-processing chain I get an out-of-memory
> >>>> exception. This exception is not recoverable. As I mentioned, this happens
> >>>> every single time on the clonfarm0 node. I never saw this error on the
> >>>> farm machines, though. Maybe FX can comment. I get these errors on data
> >>>> over the ET as well as on files from FX's decoded-files directory.
> >>>> Vardan
> >>>> Sent from my iPhone
> >>>>
> >>>> On Mar 2, 2018, at 4:53 PM, Raffaella De Vita <Raffaella.Devita at ge.infn.it> wrote:
> >>>>
> >>>>> Hi Vardan,
> >>>>> I'm the author of those services. The only recent change was a
> >>>>> modification of a hard-coded constant and, after that was done, FX
> >>>>> cooked several files for FT studies. Nothing else has been changed in
> >>>>> more than a month. Anyway, I will try to reproduce the problem and debug
> >>>>> it. What data are you processing?
> >>>>> Regards,
> >>>>> Raffaella
> >>>>>
> >>>>> Vardan Gyurjyan wrote:
> >>>>>> Hi FX,
> >>>>>>
> >>>>>> Since I am not sure who the author of the FTCAL and FTHODO engines is, I
> >>>>>> am cc-ing this email to clas12.
> >>>>>> A critical bug was introduced in the code of these service engines for
> >>>>>> the 5a.1.0 release. It is 100% reproducible on the clonfarm0 node (the
> >>>>>> online reconstruction node), where the JVM crashes with an out-of-memory
> >>>>>> exception. The reconstruction chain without these services functions properly.
> >>>>>> These service engines also print, for every event, warning messages such
> >>>>>> as "[IndexedTable] ---> error.. entry does not exist" (I consider them
> >>>>>> warnings, since if this were a real error the processing should be
> >>>>>> stopped). Anyway, this is a serious bug that can result in a large number
> >>>>>> of job failures on the farm.
> >>>>>>
> >>>>>> -vardan
> >>>>>> --------------------------------------------------
> >>>>>> Vardan H. Gyurjyan, Ph.D.
> >>>>>> Staff Scientist
> >>>>>> Thomas Jefferson Accelerator Facility
> >>>>>> Newport News, VA, 23606
> >>>>>> E-mail: gurjyan at jlab.org
> >>>>>> 757-269-5879 (JLAB)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Clas12_software mailing list
> >>>>>> Clas12_software at jlab.org
> >>>>>> https://mailman.jlab.org/mailman/listinfo/clas12_software
> >>>>>
> >>>
> >>
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clara_log
Type: application/octet-stream
Size: 162786 bytes
Desc: not available
URL: <https://mailman.jlab.org/pipermail/clas12_software/attachments/20180302/2363dfe5/attachment-0002.obj>