<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi All,<br>
<br>
I just committed a change to fix this. Sorry for the confusion over
"readcessed".<br>
<br>
-Dave<br>
<br>
On 3/3/11 8:47 AM, Beni Zihlmann wrote:
<blockquote cite="mid:4D6F9BE7.70602@jlab.org" type="cite">
Hi All,<br>
concerning the "readcessed" word:<br>
mcsmear prints out how many events have been processed, and the rate, but
with no newline,<br>
and overwrites that line continuously throughout the run. At the end of
the run<br>
it prints out how many events were processed in total, again overwriting
the previous message<br>
of "blablabla .... processed". That string is longer than "nnnnn
events read", so you see<br>
the last letters "cessed" from the word "processed".<br>
<br>
cheers,<br>
Beni<br>
<br>
<br>
<blockquote cite="mid:4D6F99FF.7010607@uconn.edu" type="cite">Matt,
<br>
<br>
I just repeated a fresh checkout and build on stanley. You can
look for it under ~jonesrt (fresh install of jana, calib,
sim-recon, hdds) using as close to your procedure as I know
how. I do not get any crashing when I run mcsmear on the input
file sim_p_pip_pim_0099.hddm. Here are a couple of logs, one
from stanley and the other from c0-0. I am not claiming that
you are not seeing segfaults on stanley, I just don't know how
to reproduce it. <br>
<br>
-Richard Jones <br>
<br>
PS. Can someone explain the meaning of a mysterious message
printed at the end of each mcsmear run, stating "nnnnnn events
readcessed". readcessed??? <br>
<br>
[jonesrt@stanley gluex.d]$ mcsmear sim_p_pip_pim_0099.hddm <br>
Warning in &lt;TUnixSystem::SetDisplay&gt;: DISPLAY not set,
setting it to gryphn.phys.uconn.edu:0.0 <br>
BCAL values will be smeared <br>
BCAL values will be added <br>
Read 26 values from FDC/drift_smear_parms in calibDB <br>
Columns: h0 h1 h2 m0 m1 m2 s0 s1 s2 <br>
get TOF/tof_parms parameters from calibDB <br>
get BCAL/bcal_parms parameters from calibDB <br>
get FCAL/fcal_parms parameters from calibDB <br>
get CDC/cdc_parms parameters from calibDB <br>
get FDC/fdc_parms parameters from calibDB <br>
get START_COUNTER/start_parms parameters from calibDB <br>
input file: sim_p_pip_pim_0099.hddm <br>
output file: sim_p_pip_pim_0099_smeared.hddm <br>
300 events readcessed <br>
[jonesrt@stanley gluex.d]$ <br>
<br>
[ now I ssh to slave node c0-0 ] <br>
<br>
[jonesrt@compute-0-0 gluex.d]$ mcsmear sim_p_pip_pim_0099.hddm <br>
Warning in &lt;TUnixSystem::SetDisplay&gt;: DISPLAY not set,
setting it to stanley.local:0.0 <br>
BCAL values will be smeared <br>
BCAL values will be added <br>
Read 26 values from FDC/drift_smear_parms in calibDB <br>
Columns: h0 h1 h2 m0 m1 m2 s0 s1 s2 <br>
get TOF/tof_parms parameters from calibDB <br>
get BCAL/bcal_parms parameters from calibDB <br>
get FCAL/fcal_parms parameters from calibDB <br>
get CDC/cdc_parms parameters from calibDB <br>
get FDC/fdc_parms parameters from calibDB <br>
get START_COUNTER/start_parms parameters from calibDB <br>
input file: sim_p_pip_pim_0099.hddm <br>
output file: sim_p_pip_pim_0099_smeared.hddm <br>
300 events readcessed <br>
[jonesrt@compute-0-0 gluex.d]$ <br>
<br>
<br>
<br>
<br>
On 3/2/2011 10:14 PM, Matthew Shepherd wrote: <br>
<blockquote type="cite">Hi Richard, <br>
<br>
Are you implying a mismatch between the capabilities of
stanley and the nodes? I see the failure when running on
stanley. <br>
<br>
Matt <br>
<br>
---- <br>
This message was sent from my iPhone. <br>
<br>
On Mar 2, 2011, at 9:30 PM, Richard Jones &lt;<a
moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:richard.t.jones@uconn.edu">richard.t.jones@uconn.edu</a>&gt;
wrote: <br>
<br>
Matt, <br>
<br>
Having seen and worked through a dozen or so SIMD-related
issues in building this software stack on different hardware
for the grid, I have seen no evidence of any "alignment
problem", as suspected by Simon. At any rate, a detailed
diagnosis is better than a suspicion, so here is a detailed
diagnosis of the problem you are seeing, building on
<a moz-do-not-send="true"
class="moz-txt-link-rfc2396E"
href="http://stanley.physics.indiana.edu">stanley.physics.indiana.edu</a>
and running on the K7 nodes of the stan cluster. <br>
<br>
<a moz-do-not-send="true"
class="moz-txt-link-rfc2396E"
href="http://Stanley.physics.indiana.edu">Stanley.physics.indiana.edu</a>
is: <br>
<br>
* dual 4-core Intel(R) Xeon(R) CPU X5482 @ 3.20GHz <br>
* x86_64 instruction set, 64-bit architecture <br>
* running a 32-bit kernel (2.6.18-194.32.1.el5PAE) <br>
* supports SIMD extensions: mmx sse sse2 sse3 ssse3
sse4_1 <br>
<br>
worker nodes on the stan cluster are: <br>
<br>
* dual single-core AMD K7 athlon CPUs @ 1667MHz <br>
* i686 instruction set, 32-bit architecture <br>
* running a 32-bit kernel (2.6.18-194.32.1.el5) <br>
* supports SIMD extensions: mmx sse mmxext <br>
<br>
In case you would like to verify, I have attached a miniature
c++ program that queries the processor for all of the common
SIMD extensions that it supports. You can compile and run
this on any node, and verify the kinds of SIMD extensions that
it can execute. <br>
<br>
This Stanley is a bit of an odd-ball: a 64-bit processor
running a 32-bit OS. What happens during the build is that
the Makefile.SIMD tries to discover what SIMD support is
present in the hardware. You are building on Stanley, so it
queries the processor on Stanley, and finds it supports sse,
sse2, ssse3, and sse4_1. After that, it builds an executable
that exploits all of these features, and that is what you want
-- if you run on Stanley. If you look in the build logs, you
should see the gcc/g++ flags "-mfpmath=sse -msse -DUSE_SSE2
-msse2" which enables both sse and sse2 instructions. That
code runs fine on stanley, but try to run it on c0-0, and
bang, the parts of the code that try to use sse2 extensions
are going to hit a wall. The K7 does not implement the sse2
instructions at all (sse and sse2 share the same 128-bit xmm
registers, but sse2 adds new operations on them), which leads
to the crash you are seeing. <br>
<br>
The immediate solution for you on the stanley cluster is to
redo the build (make clean;make) on one of the worker nodes.
Then it will recognize that sse2 is not supported, and set the
flags to "-mfpmath=sse -msse -mno-sse2", so sse math will be
used (consistent answers) but not the sse2 extensions (no
segfaults). This code will run both on the K7 nodes and on
the head node, and will give answers that are consistent with
running full Simon-supercharged code on a 64-bit node, but
without the super-charged performance. <br>
<br>
For the future, we should have a run-time-startup check in our
applications that verifies that the options used during the
build are supported by the cpu running the code. This is easy
to do, and for code that uses DANA, it is now a part of the
Init() method of the DApplication class -- I added it last
week. It would be trivial to copy that code into the main()
of non-DANA apps like mcsmear and hdgeant. I would support
that, but will hold back on checking in more SIMD-related
changes until some of this dust settles and we know that
things are under control. <br>
<br>
-Richard J. <br>
<br>
<br>
<br>
<br>
<br>
On 3/2/2011 6:08 PM, Simon Taylor wrote: <br>
<br>
Hi. <br>
<br>
Our suspicion is that there is an alignment problem on 32-bit
systems <br>
with regard to the SIMD instructions; Dave looked into this
some time <br>
ago and it is not clear to us how to fix it. <br>
<br>
I've checked in a change to Makefile.SIMD that changes the
default from <br>
"SIMD on" to "SIMD off". To get the SIMD instructions, one
would <br>
now need to do "make ENABLE_SIMD=yes". <br>
<br>
Simon <br>
<br>
Matthew Shepherd wrote: <br>
<br>
<br>
Hi all, <br>
<br>
It seems that the BMS system doesn't properly understand our
SIMD capabilities on the machines here at Indiana. If we do a
default build, then we get a segmentation fault at the first
DVector2 operation. If we build with DISABLE_SIMD=1 then this
segfault is avoided. <br>
<br>
This seems to point to two causes: <br>
<br>
(1) there is a bug in the SIMD implementation of DVector2 <br>
(2) our machines are not capable of handling current SIMD code
<br>
<br>
(1) seems unlikely since other people are using the code.
Assuming it is (2), how do we properly diagnose and fix it? <br>
<br>
-Matt <br>
<br>
<br>
<br>
_______________________________________________ <br>
Halld-offline mailing list <br>
<a moz-do-not-send="true" class="moz-txt-link-abbreviated"
href="mailto:Halld-offline@jlab.org">Halld-offline@jlab.org</a>
<br>
<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="https://mailman.jlab.org/mailman/listinfo/halld-offline">https://mailman.jlab.org/mailman/listinfo/halld-offline</a>
<br>
<br>
<br>
<br>
<br>
<br>
&lt;cpuid.cc&gt; <br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</blockquote>
</body>
</html>