[Clas12_software] Critical bug in 5a.1.0

Francois-Xavier Girod fxgirod at jlab.org
Fri Mar 2 20:25:15 EST 2018


I tagged the current master, since many bug fixes have already gone in since
5a.1.0 was released last week.

Is there one person with the expertise to systematically assess the DB
accesses of all services, or do we have to rely on every system expert to
certify that their service does not access the DB on every event? I agree
with Vardan that this, and the systematic printing of error messages on
every event, need to be fixed as a priority. This seems like it should be
easy to fix and could dramatically improve performance.
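
One way to avoid a DB access on every event is to load the constants once
per run and reuse them for every later event of that run. A minimal sketch,
assuming a hypothetical RunConstantsCache class and a loader function that
stands in for the real CCDB query (neither name comes from COATJAVA):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical helper: caches one set of constants per run number.
final class RunConstantsCache<T> {
    private final Map<Integer, T> cache = new ConcurrentHashMap<>();
    private final IntFunction<T> loader;          // stands in for the DB query
    private final AtomicInteger loads = new AtomicInteger();

    RunConstantsCache(IntFunction<T> loader) {
        this.loader = loader;
    }

    // Returns the constants for this run, querying the loader only the
    // first time the run number is seen.
    T get(int run) {
        return cache.computeIfAbsent(run, r -> {
            loads.incrementAndGet();
            return loader.apply(r);
        });
    }

    // Number of times the loader (i.e. the "DB") was actually called.
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        // Fake loader; a real service would query CCDB here.
        RunConstantsCache<double[]> constants =
                new RunConstantsCache<>(run -> new double[] {run * 0.1});
        int run = 1234;                            // illustrative run number
        for (int event = 0; event < 1000; event++) {
            constants.get(run);                    // called once per event
        }
        System.out.println("loads = " + constants.loadCount());
    }
}
```

With this pattern the per-event call is a map lookup, and the load counter
gives a cheap way to check that a service really accesses the DB only once
per run.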

On Fri, Mar 2, 2018 at 8:13 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:

> Hi Sylvester,
> I also notice that the FTCAL and FTHODO services are making many DB
> accesses during initialization. I would expect to see a single access to
> the database followed by a single DB disconnect.
> -Vardan
>
>
> Sent from my iPhone
>
> On Mar 2, 2018, at 7:50 PM, Sylvester J. Joosten <tuf42480 at temple.edu>
> wrote:
>
> Hi Raffaella, hi Vardan,
>
> In case this is useful: I recognize this error from accessing CCDB tables
> in COATJAVA. It happens when the check
>
> if(this.entries.hasItem(index)==false)
>
> in org.jlab.utils.groups.IndexedTable fails. entries.hasItem() performs
> the following checks:
>
> --> calls IndexedList.hasItem(int... index)
>     (org.jlab.utils.groups.IndexedList)
> --> has 2 internal checks:
>          1. if(index.length!=this.indexSize)
>          2. IndexGenerator.hashCode(index);
>
> In my experience, this implies that there is some kind of issue or
> inconsistency when accessing CCDB, at least for this particular run.
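
The two checks described above can be sketched as follows. This is not the
COATJAVA source: IndexedListSketch and packKey are hypothetical names, and
packing each index into 16 bits of a long key is an assumption made only
for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an indexed list keyed by a tuple of integers.
final class IndexedListSketch<T> {
    private final int indexSize;
    private final Map<Long, T> items = new HashMap<>();

    IndexedListSketch(int indexSize) {
        if (indexSize < 1 || indexSize > 4) {
            throw new IllegalArgumentException("indexSize must be 1..4");
        }
        this.indexSize = indexSize;
    }

    // Assumed key scheme: pack each index into 16 bits of a long.
    static long packKey(int... index) {
        long key = 0L;
        for (int i : index) {
            key = (key << 16) | (i & 0xFFFFL);
        }
        return key;
    }

    void add(T item, int... index) {
        if (index.length != indexSize) {
            throw new IllegalArgumentException("wrong number of indices");
        }
        items.put(packKey(index), item);
    }

    boolean hasItem(int... index) {
        // Check 1: the number of indices must match the table layout.
        if (index.length != indexSize) {
            return false;
        }
        // Check 2: the key built from the indices must be present.
        return items.containsKey(packKey(index));
    }

    public static void main(String[] args) {
        IndexedListSketch<Double> table = new IndexedListSketch<>(3);
        table.add(1.5, 1, 2, 3);                       // e.g. sector, layer, component
        System.out.println(table.hasItem(1, 2, 3));    // true
        System.out.println(table.hasItem(1, 2));       // false: wrong index count
        System.out.println(table.hasItem(1, 2, 4));    // false: no such entry
    }
}
```

Under these assumptions a lookup fails either because the caller passes the
wrong number of indices or because the table loaded for the run has no row
for that index combination.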
>
> Just my 2 cents.
> Best,
> Sylvester
>
>
> On Mar 2, 2018, at 7:37 PM, Vardan Gyurjyan <gurjyan at jlab.org> wrote:
>
> Hi Raffaella,
> I do not know exactly what the cause is, but whenever I use one of these
> services in the data processing chain I get an out-of-memory exception.
> This exception is not recoverable. As I mentioned, this happens every
> single time on the clonfarm0 node. I have never seen this error on the
> farm machines, though. Maybe FX can comment. I get these errors on data
> over the ET as well as on decoded files from FX’s decoded files directory.
> Vardan
> Sent from my iPhone
>
> On Mar 2, 2018, at 4:53 PM, Raffaella De Vita <Raffaella.Devita at ge.infn.it>
> wrote:
>
> Hi Vardan,
> I'm the author of those services. The only recent change was a
> modification of a hardcoded constant and, after that was done, FX
> cooked several files for FT studies. Nothing else has changed in more
> than a month. Anyway, I will try to reproduce the problem and debug it.
> What data are you processing?
> Regards,
>     Raffaella
>
> Vardan Gyurjyan wrote:
>
> Hi FX,
>
> Since I am not sure who the author of the FTCAL and FTHODO engines is, I
> am cc-ing this email to clas12.
> There is a critical bug introduced in the code of these service engines
> for the 5a.1.0 release. It is 100% reproducible on the clonfarm0 node (the
> online reconstruction node), where the JVM crashes with an out-of-memory
> exception. The reconstruction chain without these services functions
> properly.
> These service engines also print warning messages for every event, such
> as “[IndexedTable] ---> error.. entry does not exist” (I consider them
> warnings, since if this were a real error the processing should be
> stopped). Anyway, this is a serious bug that can result in a large number
> of job failures on the farm.
>
> -vardan
> --------------------------------------------------
> Vardan H. Gyurjyan, Ph.D.
> Staff Scientist
> Thomas Jefferson Accelerator Facility
> Newport News, VA, 23606
> E-mail: gurjyan at jlab.org
> 757-269-5879 (JLAB)
>
>
>
> _______________________________________________
> Clas12_software mailing list
> Clas12_software at jlab.org
> https://mailman.jlab.org/mailman/listinfo/clas12_software
>