[G12] Simulation failure
Michael C. Kunkel
mkunkel at jlab.org
Fri Aug 29 17:53:05 EDT 2014
Greetings,
I did not check which farm nodes they were, because I was not the only
one who had jobs crash. Another member had 2 jobs complete while 8 jobs
crashed in the a1c stage. I will verify the nodes though; it might give
insight.
As to Paolone's statement, I was also able to run a simulation a few
weeks ago. I reran that same simulation and it crashed, meaning that
less than a month ago the same simulation jobs passed while now they
crash. All job comparisons were done identically, i.e. same random
seed, ffread, build, environment, etc. The only difference was when
they were submitted. I remember you (Paolone) saying you had issues
with pion simulation. When you fixed it, did you touch anything in the
build? I ask because your email is vague on this. You say
"I haven't touched my build at all since then"
So does that mean the build was touched at some point when you were
fixing it?
BR
MK
On 8/29/14 4:51 PM, Eugene Pasyuk wrote:
> MK, did you check on which farm nodes your jobs crashed? Sometimes one farm node goes bad and all jobs crash on it.
>
> -Eugene
>
> ----- Original Message -----
>> From: "Michael Paolone" <mpaolone at jlab.org>
>> To: "Michael C. Kunkel" <mkunkel at jlab.org>
>> Cc: "g12" <g12 at jlab.org>
>> Sent: Friday, August 29, 2014 4:37:23 PM
>> Subject: Re: [G12] Simulation failure
>>
>> Just FYI, I was able to run a simulation just a week ago with no
>> errors. Also, I haven't touched my build at all since then. Since
>> there is likely more than just a difference in build between Will's
>> and MK's simulation (i.e. database, environment, ffread, gpp params,
>> etc.), the problem could be elsewhere. One easy way to check would
>> be for Will to run everything the same except with my a1c, or MK run
>> everything the same except with the group build.
>>
>> -Michael
>>
>>
>>> Thanks Will,
>>>
>>> So it appears something has changed pertaining to Paolone's build.
>>>
>>> BR
>>> MK
>>>
>>>
>>> On 8/29/14 3:40 PM, Will Phelps wrote:
>>>> Hey MK,
>>>> I submitted ~500 jobs (5 million events) during that meeting using
>>>> the standard clas build and run 56855 and it looks like everything
>>>> passed.
>>>> -Will
>>>>
>>>> On Aug 29, 2014, at 3:31 PM, Michael C. Kunkel <mkunkel at jlab.org>
>>>> wrote:
>>>>
>>>>> Greetings,
>>>>>
>>>>> In the last meeting I spoke of the simulation issue I was seeing
>>>>> with Harsh. I have run 10 jobs that I know ran 1 month ago and
>>>>> all fail with the exact a1c error that was shown during the
>>>>> meeting.
>>>>>
>>>>> What shall we do next? Is it possible for Paolone to rebuild his
>>>>> a1c?
>>>>>
>>>>> BR
>>>>> MK
>>>>> _______________________________________________
>>>>> G12 mailing list
>>>>> G12 at jlab.org
>>>>> https://mailman.jlab.org/mailman/listinfo/g12
>>>
>>