<div dir="ltr">Hi Ryan,<div><br></div><div>I think that you've correctly diagnosed the problem. The base directory in which the resources are stored is given by the environment variable JANA_RESOURCE_DIR. A couple of solutions to the problem come to mind:</div><div><br></div><div>1) Create a separate temporary JANA_RESOURCE_DIR for each job in some temporary area. Perhaps your batch system makes this easy, but I'd only suggest this if you have a caching HTTP proxy; otherwise every job will download its own copy of the resources, and you will make halldweb rather sad.</div><div><br></div><div>2) If you have a shared disk that the jobs can access, set up your own JANA_RESOURCE_DIR on that disk, and run a short interactive job to populate it before running your batch jobs.</div><div><br></div><div>I haven't encountered this problem myself in a while, but maybe a longer-term fix would be to add lock files to the download mechanism in JANA.</div><div><br></div><div>Cheers,</div><div>Sean</div><div><br></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Oct 19, 2016 at 4:17 PM Mitchell, Ryan Edward <<a href="mailto:remitche@indiana.edu">remitche@indiana.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word" class="gmail_msg">
Hi David and all,
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg">I've run into some strange errors that seem possibly related to the email chain I dug up and attached below.</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg">I'm running 25 analysis jobs over ~100 REST HDDM files from a single run (using the Karst cluster at IU).</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg">20 finished fine, but 5 crashed with this error:</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg">
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- ERROR: md5 checksum for the following resource file does not match expected</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- /tmp/remitche/resources/Magnets/Solenoid/solenoid_1200A_poisson_20160222</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- for the resource: </span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- URL_base =
<a href="https://halldweb.jlab.org/resources" class="gmail_msg" target="_blank">
https://halldweb.jlab.org/resources</a></span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- path = Magnets/Solenoid/solenoid_1200A_poisson_20160222</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- md5 = 9a70615a1a3e42236cf9c9dabd8cf546</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- The md5sum for the existing file is: 902c0923bd13f626fe8cc4dc2e41f222</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>--</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- This can happen if the resource download was previously interrupted.</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- Try removing the existing file and re-running to trigger a re-download.</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>--</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- This is a fatal error and the program will stop now. To bypass checking</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>-- the md5sum, set the JANA:RESOURCE_CHECK_MD5 config. parameter to 0.</span></div>
<div style="margin:0px;font-size:11px;line-height:normal;font-family:Menlo" class="gmail_msg">
<span style="font-variant-ligatures:no-common-ligatures" class="gmail_msg">JANA ERROR>>--</span></div>
</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg">The message is the same for all 5 crashed jobs, but the "md5sum for the existing file" (line 7) is different for each.</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg">Could one job have been downloading the file while the others were trying to read it? Is there a way to avoid this?</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg">Thanks,</div>
<div class="gmail_msg">Ryan</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg"><br class="gmail_msg">
</div>
<div class="gmail_msg"><br class="gmail_msg">
<div class="gmail_msg">
<blockquote type="cite" class="gmail_msg">
<div class="gmail_msg">On Jul 18, 2014, at 11:52 AM, Matthew Shepherd <<a href="mailto:mashephe@indiana.edu" class="gmail_msg" target="_blank">mashephe@indiana.edu</a>> wrote:</div>
<br class="m_7272553508580952553Apple-interchange-newline gmail_msg">
<div class="gmail_msg">
<div class="gmail_msg"><br class="gmail_msg">
Great -- thanks.<br class="gmail_msg">
<br class="gmail_msg">
It was going to catch someone at some point. (I'm<br class="gmail_msg">
assuming I'm not the only one that hits ctrl-c when I realize<br class="gmail_msg">
that I've just executed a long program with <br class="gmail_msg">
incorrect arguments.) The field download takes<br class="gmail_msg">
a while and happens right at the time you realize you<br class="gmail_msg">
made an execution mistake.<br class="gmail_msg">
<br class="gmail_msg">
For such a large system with so many users and<br class="gmail_msg">
possible permutations of use it is really important<br class="gmail_msg">
to make every sanity check possible and abort<br class="gmail_msg">
immediately when something doesn't seem right.<br class="gmail_msg">
<br class="gmail_msg">
Matt<br class="gmail_msg">
<br class="gmail_msg">
---------------------------------------------------------------------<br class="gmail_msg">
Matthew Shepherd, Associate Professor<br class="gmail_msg">
Department of Physics, Indiana University, Swain West 265<br class="gmail_msg">
727 East Third Street, Bloomington, IN 47405<br class="gmail_msg">
<br class="gmail_msg">
Office Phone: <a href="tel:(812)%20856-5808" value="+18128565808" class="gmail_msg" target="_blank">+1 812 856 5808</a><br class="gmail_msg">
<br class="gmail_msg">
On Jul 18, 2014, at 10:23 AM, David Lawrence <<a href="mailto:davidl@jlab.org" class="gmail_msg" target="_blank">davidl@jlab.org</a>> wrote:<br class="gmail_msg">
<br class="gmail_msg">
<blockquote type="cite" class="gmail_msg"><br class="gmail_msg">
Hi Matt,<br class="gmail_msg">
<br class="gmail_msg">
Sorry this has cost you so much time. You’re the first person to<br class="gmail_msg">
report this as an issue in the 6 months since it was deployed. I’ll<br class="gmail_msg">
go ahead and put in the code to generate and check the md5<br class="gmail_msg">
checksum whenever a program is started so that an error can<br class="gmail_msg">
be flagged if there is a mismatch.<br class="gmail_msg">
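<div class="gmail_msg">A minimal sketch of such a startup check, assuming the expected md5 for each resource is already known. check_resource and file_md5 are hypothetical names; the check_md5 flag plays the role of the JANA:RESOURCE_CHECK_MD5 parameter named in the 2016 error message above.</div>

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Hex md5 digest of a file, read in chunks so the whole map need not sit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_resource(path: str, expected_md5: str, check_md5: bool = True) -> None:
    """Raise if the cached file at `path` does not have the expected md5.

    check_md5=False skips the test, analogous to setting
    JANA:RESOURCE_CHECK_MD5 to 0.
    """
    if not check_md5:
        return
    actual = file_md5(path)
    if actual != expected_md5:
        raise RuntimeError(
            f"md5 checksum for {path} does not match expected "
            f"({expected_md5}); existing file has {actual}. "
            "Remove the file and re-run to trigger a re-download."
        )
```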
<br class="gmail_msg">
Regards,<br class="gmail_msg">
-David<br class="gmail_msg">
<br class="gmail_msg">
On Jul 18, 2014, at 10:15 AM, Matthew Shepherd <<a href="mailto:mashephe@indiana.edu" class="gmail_msg" target="_blank">mashephe@indiana.edu</a>> wrote:<br class="gmail_msg">
<br class="gmail_msg">
<blockquote type="cite" class="gmail_msg"><br class="gmail_msg">
With bits and pieces from 3 different people, I think <br class="gmail_msg">
I've figured this out... gggrrrrrr!<br class="gmail_msg">
<br class="gmail_msg">
Mike Staib suggested my log seemed to indicate my field doesn't have<br class="gmail_msg">
enough z points.<br class="gmail_msg">
<br class="gmail_msg">
It seems that JANA uses JANA_RESOURCE_DIR to cache fields. (I got<br class="gmail_msg">
this from Paul. I've never set it, but grepping and reading the JANA source<br class="gmail_msg">
tells me it defaults to /tmp/username/resources.)<br class="gmail_msg">
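<div class="gmail_msg">A sketch of the lookup described here, assuming the username comes from the USER environment variable (how JANA actually determines it is not stated in this thread). resource_dir is a hypothetical helper name.</div>

```python
import getpass
import os
from typing import Mapping, Optional


def resource_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Resource cache directory: JANA_RESOURCE_DIR if set, else /tmp/USERNAME/resources."""
    env = os.environ if env is None else env
    explicit = env.get("JANA_RESOURCE_DIR")
    if explicit:
        return explicit
    # Assumption: the default path is built from the login name.
    user = env.get("USER") or getpass.getuser()
    return os.path.join("/tmp", user, "resources")
```

<div class="gmail_msg">Setting JANA_RESOURCE_DIR in every batch job's environment to a directory on shared disk is what Sean's option 2 relies on; leaving it unset sends each node to its own /tmp cache.</div>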
<br class="gmail_msg">
If I delete that directory and rerun bfield2root, I get<br class="gmail_msg">
a full field.<br class="gmail_msg">
<br class="gmail_msg">
What happened? Here's my theory:<br class="gmail_msg">
<br class="gmail_msg">
I ran a job and that job chose to download the field. This<br class="gmail_msg">
is the first time-consuming thing that happens in a job. If,<br class="gmail_msg">
during the field download, you kill the job with ctrl-c, then<br class="gmail_msg">
you are left with a partial field.<br class="gmail_msg">
(I tested this out and was able to repeat it.)<br class="gmail_msg">
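<div class="gmail_msg">The partial field comes from writing the download directly to its final path. A common remedy, sketched here as a general technique and not taken from JANA, is to write into a temporary file in the same directory and rename it into place only after the last byte is written; os.replace is atomic on POSIX when source and target share a filesystem. atomic_write is a hypothetical name.</div>

```python
import os
import tempfile
from typing import Iterable


def atomic_write(path: str, chunks: Iterable[bytes]) -> None:
    """Write `chunks` to `path` so readers never see a partial file.

    If the writer is interrupted (ctrl-c raises KeyboardInterrupt), the
    temporary file is removed and `path` is left untouched.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
            out.flush()
            os.fsync(out.fileno())  # make sure the data is on disk before the rename
        os.replace(tmp, path)
    except BaseException:  # includes KeyboardInterrupt
        os.unlink(tmp)
        raise
```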
<br class="gmail_msg">
I must have been unlucky and hit ctrl-c on the job that<br class="gmail_msg">
tried to download the field. Note that this is normal for most users.<br class="gmail_msg">
It is easy to execute a command accidentally or realize just<br class="gmail_msg">
after you press return that you didn't specify all the arguments<br class="gmail_msg">
that you wanted.<br class="gmail_msg">
<br class="gmail_msg">
This is a really nasty behavior because every subsequent job<br class="gmail_msg">
then just reads this partial field and never prints any message<br class="gmail_msg">
or error.<br class="gmail_msg">
<br class="gmail_msg">
Can we put some sort of check in the field file? Write the number<br class="gmail_msg">
of points in the file first, and then on read-back, abort with an<br class="gmail_msg">
error if there aren't that many points.<br class="gmail_msg">
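<div class="gmail_msg">A sketch of the proposed point-count check, assuming a hypothetical plain-text layout (point count on the first line, then one "r z Br Bz" row per point); the real field-map format and columns may differ. write_map and read_map are illustrative names.</div>

```python
import io
from typing import List, TextIO, Tuple

Point = Tuple[float, float, float, float]  # hypothetical columns: r, z, Br, Bz


def write_map(out: TextIO, points: List[Point]) -> None:
    """Write the point count on the first line, then one point per line."""
    out.write(f"{len(points)}\n")
    for p in points:
        out.write(" ".join(repr(v) for v in p) + "\n")


def read_map(inp: TextIO) -> List[Point]:
    """Read a map written by write_map, aborting if any points are missing."""
    expected = int(inp.readline())
    points: List[Point] = []
    for line in inp:
        if line.strip():
            r, z, br, bz = (float(x) for x in line.split())  # a cut-off row raises ValueError
            points.append((r, z, br, bz))
    if len(points) != expected:
        raise ValueError(f"field map truncated: header says {expected} points, found {len(points)}")
    return points
```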
<br class="gmail_msg">
nasty nasty nasty... that cost Paul and me a ton of time<br class="gmail_msg">
this week<br class="gmail_msg">
<br class="gmail_msg">
Matt<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
On Jul 18, 2014, at 7:18 AM, David Lawrence <<a href="mailto:davidl@jlab.org" class="gmail_msg" target="_blank">davidl@jlab.org</a>> wrote:<br class="gmail_msg">
<br class="gmail_msg">
<blockquote type="cite" class="gmail_msg"><br class="gmail_msg">
Hi Matt,<br class="gmail_msg">
<br class="gmail_msg">
The output looks right. What happens if you run the bfield2root utility? This should<br class="gmail_msg">
be built as part of the default sim-recon build. (Source is in $HALLD_HOME/src/programs/Utilities/bfield2root)<br class="gmail_msg">
<br class="gmail_msg">
Draw the field map in ROOT using:<br class="gmail_msg">
<br class="gmail_msg">
bfield2root<br class="gmail_msg">
root bfield.root<br class="gmail_msg">
root [1] Bz_vs_r_vs_z->Draw("colz")<br class="gmail_msg">
<br class="gmail_msg">
Also, how are you checking that the field map is returning zeros?<br class="gmail_msg">
<br class="gmail_msg">
Regards,<br class="gmail_msg">
-David<br class="gmail_msg">
<br class="gmail_msg">
On Jul 17, 2014, at 9:49 PM, Matthew Shepherd <<a href="mailto:mashephe@indiana.edu" class="gmail_msg" target="_blank">mashephe@indiana.edu</a>> wrote:<br class="gmail_msg">
<br class="gmail_msg">
<blockquote type="cite" class="gmail_msg"><br class="gmail_msg">
Hi all,<br class="gmail_msg">
<br class="gmail_msg">
Paul and I have been trying to understand why I cannot<br class="gmail_msg">
get the example analysis software working, and we seem<br class="gmail_msg">
to have traced the problem to the magnetic field map<br class="gmail_msg">
returning a zero field.<br class="gmail_msg">
<br class="gmail_msg">
Does anyone know how to debug this? The startup<br class="gmail_msg">
of the job suggests all is OK (I think):<br class="gmail_msg">
<br class="gmail_msg">
JANA >>URL: <a class="gmail_msg">sqlite:////home/s4/mashephe/gluex/ccdb.sqlite</a><br class="gmail_msg">
JANA >>context: default<br class="gmail_msg">
JANA >>Reading Magnetic field map from Magnets/Solenoid/solenoid_1350_poisson_20130925 ...<br class="gmail_msg">
Nx=251 Ny=1 Nz=43 ) at 0x7f46720a0ce0<br class="gmail_msg">
Fine-mesh evio file does not exist.<br class="gmail_msg">
Constructing the fine-mesh B-field map...<br class="gmail_msg">
rmin: 0 rmax: 88.5 dr: 0.1 zmin: 0 zmax: 600 dz: 0.1<br class="gmail_msg">
Number of points in z = 6000<br class="gmail_msg">
Number of points in r = 885<br class="gmail_msg">
JANA >>10599 entries found (Created Magnetic field map of type DMagneticFieldMap<br class="gmail_msg">
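<div class="gmail_msg">The fine-mesh sizes in this log are consistent with (max - min)/step: 88.5/0.1 gives 885 points in r and 600/0.1 gives 6000 in z. A quick check, assuming that is how the mesh is sized (the log does not say whether an extra endpoint node is added); n_points is a hypothetical helper, and round() guards against a floating-point quotient landing just below an integer.</div>

```python
def n_points(lo: float, hi: float, step: float) -> int:
    """Number of mesh steps of size `step` spanning [lo, hi]."""
    return round((hi - lo) / step)


print(n_points(0.0, 88.5, 0.1))   # r: 885
print(n_points(0.0, 600.0, 0.1))  # z: 6000
```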
<br class="gmail_msg">
I'm using:<br class="gmail_msg">
<br class="gmail_msg">
sim-recon-2014-06-30<br class="gmail_msg">
ccdb_1.02<br class="gmail_msg">
hdds-2.1<br class="gmail_msg">
jana_0.7.1p3<br class="gmail_msg">
<br class="gmail_msg">
and a ccdb.sqlite file from July 14 (copied from Mark's web page link that day). I've also<br class="gmail_msg">
tried the ccdb.sqlite file from dc2 conditions.<br class="gmail_msg">
<br class="gmail_msg">
Matt<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
_______________________________________________<br class="gmail_msg">
Halld-offline mailing list<br class="gmail_msg">
<a href="mailto:Halld-offline@jlab.org" class="gmail_msg" target="_blank">Halld-offline@jlab.org</a><br class="gmail_msg">
<a href="https://mailman.jlab.org/mailman/listinfo/halld-offline" class="gmail_msg" target="_blank">https://mailman.jlab.org/mailman/listinfo/halld-offline</a><br class="gmail_msg">
</blockquote>
<br class="gmail_msg">
</blockquote>
<br class="gmail_msg">
</blockquote>
<br class="gmail_msg">
</blockquote>
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
<br class="gmail_msg">
</div>
</div>
</blockquote>
</div>
<br class="gmail_msg">
</div>
</div>
</blockquote></div></div>