<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
Hi David and all,
<div class=""><br class="">
</div>
<div class="">I've run into some strange errors that seem possibly related to the email chain I dug up and attached below.</div>
<div class=""><br class="">
</div>
<div class="">I'm running 25 analysis jobs over ~100 rest hddm files in a single run (using the karst machine at IU).</div>
<div class=""><br class="">
</div>
<div class="">20 finished fine, but 5 crashed with this error:</div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- ERROR: md5 checksum for the following resource file does not match expected</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- /tmp/remitche/resources/Magnets/Solenoid/solenoid_1200A_poisson_20160222</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- for the resource: </span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- URL_base =
<a href="https://halldweb.jlab.org/resources" class="">https://halldweb.jlab.org/resources</a></span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- path = Magnets/Solenoid/solenoid_1200A_poisson_20160222</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- md5 = 9a70615a1a3e42236cf9c9dabd8cf546</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- The md5sum for the existing file is: 902c0923bd13f626fe8cc4dc2e41f222</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>--</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- This can happen if the resource download was previously interrupted.</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- Try removing the existing file and re-running to trigger a re-download.</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>--</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- This is a fatal error and the program will stop now. To bypass checking</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>-- the md5sum, set the JANA:RESOURCE_CHECK_MD5 config. parameter to 0.</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Menlo;" class="">
<span style="font-variant-ligatures: no-common-ligatures" class="">JANA ERROR>>--</span></div>
</div>
<div class=""><br class="">
</div>
<div class="">The message is the same for all 5 crashed jobs, but the "md5sum for the existing file" (line 7) is different for each.</div>
<div class=""><br class="">
</div>
<div class="">Was maybe one job downloading the file while others were trying to read it? Is there a way to avoid this?</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Ryan</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On Jul 18, 2014, at 11:52 AM, Matthew Shepherd <<a href="mailto:mashephe@indiana.edu" class="">mashephe@indiana.edu</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class=""><br class="">
Great -- thanks.<br class="">
<br class="">
It was going to catch someone at some point. (I'm<br class="">
assuming I'm not the only one that hits ctrl-c when I realize<br class="">
that I've just executed a long program with <br class="">
incorrect arguments.) The field download takes<br class="">
a while and happens right at the time you realize you<br class="">
made an execution mistake.<br class="">
<br class="">
For such a large system with so many users and<br class="">
possible permutations of use it is really important<br class="">
to make every sanity check possible and abort<br class="">
immediately when something doesn't seem right.<br class="">
<br class="">
Matt<br class="">
<br class="">
---------------------------------------------------------------------<br class="">
Matthew Shepherd, Associate Professor<br class="">
Department of Physics, Indiana University, Swain West 265<br class="">
727 East Third Street, Bloomington, IN 47405<br class="">
<br class="">
Office Phone: +1 812 856 5808<br class="">
<br class="">
On Jul 18, 2014, at 10:23 AM, David Lawrence <<a href="mailto:davidl@jlab.org" class="">davidl@jlab.org</a>> wrote:<br class="">
<br class="">
<blockquote type="cite" class=""><br class="">
Hi Matt,<br class="">
<br class="">
Sorry this has cost you so much time. You’re the first person to<br class="">
report this as an issue in the 6 months since it was deployed. I’ll<br class="">
go ahead and put in the code to generate and check the md5<br class="">
checksum whenever a program is started so that an error can<br class="">
be flagged if there is a mismatch.<br class="">
<br class="">
Regards,<br class="">
-David<br class="">
<br class="">
On Jul 18, 2014, at 10:15 AM, Matthew Shepherd <<a href="mailto:mashephe@indiana.edu" class="">mashephe@indiana.edu</a>> wrote:<br class="">
<br class="">
<blockquote type="cite" class=""><br class="">
With bits and pieces from 3 different people, I think <br class="">
I've figured this out... gggrrrrrr!<br class="">
<br class="">
Mike Staib suggested my log seemed to indicate my field doesn't have<br class="">
enough z points.<br class="">
<br class="">
It seems like JANA is using JANA_RESOURCE_DIR to cache fields. (I got<br class="">
this from Paul. I've never set it, but grepping and reading the JANA source<br class="">
tells me it defaults to /tmp/username/resources.)<br class="">
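<div class="">The lookup rule described here amounts to: use <code>JANA_RESOURCE_DIR</code> if it is set, otherwise fall back to a per-user directory under <code>/tmp</code>. The Python sketch below only restates that rule; <code>resource_dir</code> is a hypothetical helper, and the fallback path comes from this reading of the source rather than from a check against a specific JANA release.</div>
<pre class="">
```python
import getpass
import os


def resource_dir(user=None):
    """Hypothetical helper restating the lookup rule: use JANA_RESOURCE_DIR
    if it is set, otherwise /tmp/USERNAME/resources."""
    configured = os.environ.get("JANA_RESOURCE_DIR")
    if configured:
        return configured
    if user is None:
        # Falls back to the login name of the current process.
        user = getpass.getuser()
    return os.path.join("/tmp", user, "resources")
```
</pre>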
<br class="">
If I go to that directory, delete it, and rerun bfield2root, I get<br class="">
a full field.<br class="">
<br class="">
What happened? Here's my theory:<br class="">
<br class="">
I ran a job and that job chose to download the field. This<br class="">
is the first time-consuming thing that happens in a job. If,<br class="">
during the field download, you kill the job with ctrl-c, then<br class="">
you are left with a partial field.<br class="">
(I tested this out and was able to repeat it.)<br class="">
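<div class="">A common way to make an interrupted download harmless is to write into a temporary file in the same directory and rename it into place only once it is complete, so that ctrl-c leaves nothing at the final path and the next job simply downloads again. This is illustrative Python, not JANA's code; <code>write_atomically</code> is a hypothetical helper and <code>chunks</code> stands in for the bytes arriving from the download. A hard kill (SIGKILL) skips the cleanup and can leave a stray <code>.part</code> file, but never a partial file at <code>path</code>.</div>
<pre class="">
```python
import os
import tempfile


def write_atomically(path, chunks):
    """Write an iterable of byte chunks to `path` so that readers see either
    no file at all or the complete file.

    Data goes to a temporary file in the same directory and is renamed into
    place only after the last chunk is written; os.replace is atomic on
    POSIX when source and target are on the same filesystem."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # BaseException also covers KeyboardInterrupt (ctrl-c):
        # discard the partial download instead of leaving it behind.
        os.remove(tmp_path)
        raise
```
</pre>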
<br class="">
I must have been unlucky and hit ctrl-c on the job that<br class="">
tried to download the field. Note that this is normal for most users.<br class="">
It is easy to execute a command accidentally or realize just<br class="">
after you press return that you didn't specify all the arguments<br class="">
that you wanted.<br class="">
<br class="">
This is a really nasty behavior because every subsequent job<br class="">
then just reads this partial field and never prints any message<br class="">
or error.<br class="">
<br class="">
Can we put some sort of check in the field file? Write the number<br class="">
of points in the file first, and then on read-back abort with an<br class="">
error when there aren't that many points.<br class="">
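<div class="">The proposal here, a point count written ahead of the data and verified on read-back, could look like the following Python sketch. The binary layout (an 8-byte count followed by four doubles per point, assumed to be r, z, Br, Bz) is hypothetical and not taken from the actual field-map files; <code>write_field</code> and <code>read_field</code> are illustrative names.</div>
<pre class="">
```python
import struct

# Hypothetical layout: big-endian 8-byte point count, then 4 doubles per point.
HEADER = struct.Struct("!Q")
POINT = struct.Struct("!4d")


def write_field(path, points):
    """Write the number of points first, then the (r, z, Br, Bz) tuples."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(points)))
        for point in points:
            f.write(POINT.pack(*point))


def read_field(path):
    """Read the points back, aborting if the file holds fewer (or more)
    points than its header promises, e.g. after an interrupted download."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        (expected,) = HEADER.unpack_from(data, 0)
    except struct.error:
        raise ValueError("%s: file too short to contain a header" % path)
    found, leftover = divmod(len(data) - HEADER.size, POINT.size)
    if found != expected or leftover:
        raise ValueError(
            "%s: header promises %d points but file holds %d (truncated?)"
            % (path, expected, found))
    return [POINT.unpack_from(data, HEADER.size + i * POINT.size)
            for i in range(found)]
```
</pre>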
<br class="">
nasty nasty nasty... that cost Paul and me a ton of time<br class="">
this week<br class="">
<br class="">
Matt<br class="">
<br class="">
---------------------------------------------------------------------<br class="">
Matthew Shepherd, Associate Professor<br class="">
Department of Physics, Indiana University, Swain West 265<br class="">
727 East Third Street, Bloomington, IN 47405<br class="">
<br class="">
Office Phone: +1 812 856 5808<br class="">
<br class="">
On Jul 18, 2014, at 7:18 AM, David Lawrence <<a href="mailto:davidl@jlab.org" class="">davidl@jlab.org</a>> wrote:<br class="">
<br class="">
<blockquote type="cite" class=""><br class="">
Hi Matt,<br class="">
<br class="">
The output looks right. What happens if you run the bfield2root utility? This should<br class="">
be built as part of the default sim-recon build. (Source is in $HALLD_HOME/src/programs/Utilities/bfield2root)<br class="">
<br class="">
Draw the field map in ROOT using:<br class="">
<br class="">
<blockquote type="cite" class="">bfield2root<br class="">
root bfield.root<br class="">
</blockquote>
root [1] Bz_vs_r_vs_z->Draw("colz")<br class="">
<br class="">
Also, how are you checking that the field map is returning zeros?<br class="">
<br class="">
Regards,<br class="">
-David<br class="">
<br class="">
On Jul 17, 2014, at 9:49 PM, Matthew Shepherd <<a href="mailto:mashephe@indiana.edu" class="">mashephe@indiana.edu</a>> wrote:<br class="">
<br class="">
<blockquote type="cite" class=""><br class="">
Hi all,<br class="">
<br class="">
Paul and I have been trying to understand why I cannot<br class="">
get the example analysis software working and we seem<br class="">
to have traced the problem down to the fact that the <br class="">
magnetic field map is returning a zero field.<br class="">
<br class="">
Does anyone know how to debug this? The startup<br class="">
of the job suggests all is OK (I think):<br class="">
<br class="">
JANA >>URL: <a href="sqlite:////home/s4/mashephe/gluex/ccdb.sqlite" class="">sqlite:////home/s4/mashephe/gluex/ccdb.sqlite</a><br class="">
JANA >>context: default<br class="">
JANA >>Reading Magnetic field map from Magnets/Solenoid/solenoid_1350_poisson_20130925 ...<br class="">
Nx=251 Ny=1 Nz=43 ) at 0x7f46720a0ce0<br class="">
Fine-mesh evio file does not exist.<br class="">
Constructing the fine-mesh B-field map...<br class="">
rmin: 0 rmax: 88.5 dr: 0.1 zmin: 0 zmax: 600 dz: 0.1<br class="">
Number of points in z = 6000<br class="">
Number of points in r = 885<br class="">
JANA >>10599 entries found (Created Magnetic field map of type DMagneticFieldMap<br class="">
<br class="">
I'm using:<br class="">
<br class="">
sim-recon-2014-06-30<br class="">
ccdb_1.02<br class="">
hdds-2.1<br class="">
jana_0.7.1p3<br class="">
<br class="">
and a ccdb.sqlite file from July 14 (copied from Mark's web page link that day). I've also<br class="">
tried the ccdb.sqlite file from dc2 conditions.<br class="">
<br class="">
Matt<br class="">
<br class="">
<br class="">
---------------------------------------------------------------------<br class="">
Matthew Shepherd, Associate Professor<br class="">
Department of Physics, Indiana University, Swain West 265<br class="">
727 East Third Street, Bloomington, IN 47405<br class="">
<br class="">
Office Phone: +1 812 856 5808<br class="">
<br class="">
<br class="">
_______________________________________________<br class="">
Halld-offline mailing list<br class="">
<a href="mailto:Halld-offline@jlab.org" class="">Halld-offline@jlab.org</a><br class="">
https://mailman.jlab.org/mailman/listinfo/halld-offline<br class="">
</blockquote>
<br class="">
</blockquote>
<br class="">
</blockquote>
<br class="">
</blockquote>
<br class="">
<br class="">
_______________________________________________<br class="">
Halld-offline mailing list<br class="">
<a href="mailto:Halld-offline@jlab.org" class="">Halld-offline@jlab.org</a><br class="">
https://mailman.jlab.org/mailman/listinfo/halld-offline<br class="">
<br class="">
<br class="">
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</body>
</html>