<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Hi All,<br>
Concerning the word "readcessed":<br>
While it runs, mcsmear prints how many events have been processed and the
rate, without a trailing newline, and keeps overwriting that line
throughout the run. At the end of the run it prints the total number of
events read, again overwriting the previous message of
"blablabla .... processed". That earlier string is longer than
"nnnnn events read", so you still see the last letters "cessed" of the
word "processed".<br>
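<br>
The effect can be reproduced with a minimal C++ sketch. This is not the mcsmear code: the function names overwrite_line and pad_to_width are hypothetical, and the two message strings in the comment are assumed for illustration, chosen only so that the result matches the quoted log line.<br>
<pre wrap="">
```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Models what a terminal line shows after new text is written from column 0
// over whatever was already there: new characters replace old ones, and old
// characters beyond the end of the new text stay visible.
// Example (hypothetical strings): overwriting "  300 events processed" with
// " 300 events read" leaves " 300 events readcessed" on the line.
std::string overwrite_line(const std::string& on_screen, const std::string& text) {
    std::string result = on_screen;
    if (result.size() < text.size()) result.resize(text.size(), ' ');
    std::copy(text.begin(), text.end(), result.begin());
    return result;
}

// One possible fix: pad the final message with blanks to at least the width
// of the longest message printed earlier, so no stale characters survive.
std::string pad_to_width(const std::string& text, std::size_t width) {
    std::string padded = text;
    if (padded.size() < width) padded.append(width - padded.size(), ' ');
    return padded;
}
```
</pre>
Passing the final message through pad_to_width before printing it, or ending the run with a plain newline before the summary, would leave no stale letters behind.<br>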
<br>
cheers,<br>
Beni<br>
<br>
<br>
<blockquote cite="mid:4D6F99FF.7010607@uconn.edu" type="cite">Matt,
<br>
<br>
I just repeated a fresh checkout and build on stanley. You can
look for it under ~jonesrt (fresh install of jana, calib,
sim-recon, hdds) using as close to your procedure as I know how.
I do not get any crashing when I run mcsmear on the input file
sim_p_pip_pim_0099.hddm. Here are a couple of logs, one from
stanley and the other from c0-0. I am not claiming that you are
not seeing segfaults on stanley, I just don't know how to
reproduce it.
<br>
<br>
-Richard Jones
<br>
<br>
PS. Can someone explain the meaning of a mysterious message
printed at the end of each mcsmear run, stating "nnnnnn events
readcessed". readcessed???
<br>
<br>
[jonesrt@stanley gluex.d]$ mcsmear sim_p_pip_pim_0099.hddm
<br>
Warning in <TUnixSystem::SetDisplay>: DISPLAY not set,
setting it to gryphn.phys.uconn.edu:0.0
<br>
BCAL values will be smeared
<br>
BCAL values will be added
<br>
Read 26 values from FDC/drift_smear_parms in calibDB
<br>
Columns: h0 h1 h2 m0 m1 m2 s0 s1 s2
<br>
get TOF/tof_parms parameters from calibDB
<br>
get BCAL/bcal_parms parameters from calibDB
<br>
get FCAL/fcal_parms parameters from calibDB
<br>
get CDC/cdc_parms parameters from calibDB
<br>
get FDC/fdc_parms parameters from calibDB
<br>
get START_COUNTER/start_parms parameters from calibDB
<br>
input file: sim_p_pip_pim_0099.hddm
<br>
output file: sim_p_pip_pim_0099_smeared.hddm
<br>
300 events readcessed
<br>
[jonesrt@stanley gluex.d]$
<br>
<br>
[ now I ssh to slave node c0-0 ]
<br>
<br>
[jonesrt@compute-0-0 gluex.d]$ mcsmear sim_p_pip_pim_0099.hddm
<br>
Warning in <TUnixSystem::SetDisplay>: DISPLAY not set,
setting it to stanley.local:0.0
<br>
BCAL values will be smeared
<br>
BCAL values will be added
<br>
Read 26 values from FDC/drift_smear_parms in calibDB
<br>
Columns: h0 h1 h2 m0 m1 m2 s0 s1 s2
<br>
get TOF/tof_parms parameters from calibDB
<br>
get BCAL/bcal_parms parameters from calibDB
<br>
get FCAL/fcal_parms parameters from calibDB
<br>
get CDC/cdc_parms parameters from calibDB
<br>
get FDC/fdc_parms parameters from calibDB
<br>
get START_COUNTER/start_parms parameters from calibDB
<br>
input file: sim_p_pip_pim_0099.hddm
<br>
output file: sim_p_pip_pim_0099_smeared.hddm
<br>
300 events readcessed
<br>
[jonesrt@compute-0-0 gluex.d]$
<br>
<br>
<br>
<br>
<br>
On 3/2/2011 10:14 PM, Matthew Shepherd wrote:
<br>
<blockquote type="cite">Hi Richard,
<br>
<br>
Are you implying a mismatch between the capabilities of stanley
and the nodes? I see the failure when running on stanley.
<br>
<br>
Matt
<br>
<br>
----
<br>
This message was sent from my iPhone.
<br>
<br>
On Mar 2, 2011, at 9:30 PM, Richard
Jones <<a class="moz-txt-link-abbreviated" href="mailto:richard.t.jones@uconn.edu">richard.t.jones@uconn.edu</a>>
wrote:
<br>
<br>
Matt,
<br>
<br>
Having seen and worked through a dozen or so SIMD-related issues
in building this software stack on different hardware for the
grid, I have seen no evidence of any "alignment problem", as
suspected by Simon. At any rate, a detailed diagnosis is better
than a suspicion, so here is a detailed diagnosis of the problem
you are seeing, building on
<a class="moz-txt-link-rfc2396E" href="http://stanley.physics.indiana.edu">stanley.physics.indiana.edu</a>
and running on the K7 nodes of the stan cluster.
<br>
<br>
<a class="moz-txt-link-rfc2396E" href="http://Stanley.physics.indiana.edu">Stanley.physics.indiana.edu</a>
is:
<br>
<br>
* dual 4-core Intel(R) Xeon(R) CPU X5482 @ 3.20GHz
<br>
* x86_64 instruction set, 64-bit architecture
<br>
* running a 32-bit kernel (2.6.18-194.32.1.el5PAE)
<br>
* supports SIMD extensions: mmx sse sse2 sse3 ssse3 sse4_1
<br>
<br>
worker nodes on the stan cluster are:
<br>
<br>
* dual single-core AMD K7 athlon CPUs @ 1667MHz
<br>
* i686 instruction set, 32-bit architecture
<br>
* running a 32-bit kernel (2.6.18-194.32.1.el5)
<br>
* supports SIMD extensions: mmx sse mmxext
<br>
<br>
In case you would like to verify, I have attached a miniature
c++ program that queries the processor for all of the common
SIMD extensions that it supports. You can compile and run this
on any node, and verify the kinds of SIMD extensions that it can
execute.
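<br>
<br>
A minimal sketch of such a query is shown here. It is not the attached cpuid.cc: it assumes GCC or Clang on an x86 processor and relies on the compiler builtins __builtin_cpu_init and __builtin_cpu_supports, and the function name supported_simd_extensions is hypothetical. Extensions those builtins do not know about, such as mmxext, are not reported.
<br>
<pre wrap="">
```cpp
#include <string>
#include <vector>

// Returns the names of the common SIMD extensions that the CPU executing this
// code reports as supported. On compilers or architectures without the
// GCC/Clang x86 builtins, the list is simply empty.
std::vector<std::string> supported_simd_extensions() {
    std::vector<std::string> found;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();  // make sure the CPU feature data is initialised
    // __builtin_cpu_supports only accepts a string literal, hence the macro.
#define CHECK_FEATURE(name) \
    if (__builtin_cpu_supports(name)) found.push_back(name)
    CHECK_FEATURE("mmx");
    CHECK_FEATURE("sse");
    CHECK_FEATURE("sse2");
    CHECK_FEATURE("sse3");
    CHECK_FEATURE("ssse3");
    CHECK_FEATURE("sse4.1");
    CHECK_FEATURE("sse4.2");
    CHECK_FEATURE("avx");
#undef CHECK_FEATURE
#endif
    return found;
}
```
</pre>
Printing the returned names from a small main() on the head node and on a worker node would show directly which extensions differ between the two.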
<br>
<br>
This Stanley is a bit of an odd-ball: a 64-bit processor running
a 32-bit OS. What happens during the build is that the
Makefile.SIMD tries to discover what SIMD support is present in
the hardware. You are building on Stanley, so it queries the
processor on Stanley, and finds it supports sse, sse2, ssse3,
and sse4_1. After that, it builds an executable that exploits
all of these features, and that is what you want -- if you run
on Stanley. If you look in the build logs, you should see the
gcc/g++ flags "-mfpmath=sse -msse -DUSE_SSE2 -msse2" which
enable both sse and sse2 instructions. That code runs fine on
stanley, but try to run it on c0-0, and bang, the parts of the
code that try to use sse2 extensions are going to hit a wall.
The K7 processors on those nodes do not implement the sse2
instructions at all, so the first one they encounter cannot be
executed, which leads to the crash you are seeing.
<br>
<br>
The immediate solution for you on the stanley cluster is to redo
the build (make clean;make) on one of the worker nodes. Then it
will recognize that sse2 is not supported, and set the flags to
"-mfpmath=sse -msse -mno-sse2", so sse math will be used
(consistent answers) but not the sse2 extensions (no
segfaults). This code will run both on the K7 nodes and on the
head node, and will give answers that are consistent with
running full Simon-supercharged code on a 64-bit node, but
without the super-charged performance.
<br>
<br>
For the future, we should have a run-time-startup check in our
applications that verifies that the options used during the
build are supported by the cpu running the code. This is easy
to do, and for code that uses DANA, it is now a part of the
Init() method of the DApplication class -- I added it last
week. It would be trivial to copy that code into the main() of
non-DANA apps like mcsmear and hdgeant. I would support that,
but will hold back on checking in more SIMD-related changes
until some of this dust settles and we know that things are
under control.
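<br>
<br>
A minimal sketch of such a start-up check follows. It is not the code in DApplication::Init(): it assumes GCC or Clang on x86, uses the compiler's predefined macros (__SSE__, __SSE2__, and so on) to learn which extensions the build enabled, and the function name missing_simd_extensions is hypothetical.
<br>
<pre wrap="">
```cpp
#include <string>
#include <vector>

// Lists the SIMD extensions this file was compiled to use but that the CPU
// executing it does not report. An empty result means the build flags and the
// hardware agree, or that the check is unavailable on this platform.
std::vector<std::string> missing_simd_extensions() {
    std::vector<std::string> missing;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();  // make sure the CPU feature data is initialised
#ifdef __SSE__     // predefined when -msse is in effect
    if (!__builtin_cpu_supports("sse")) missing.push_back("sse");
#endif
#ifdef __SSE2__    // predefined when -msse2 is in effect
    if (!__builtin_cpu_supports("sse2")) missing.push_back("sse2");
#endif
#ifdef __SSE3__    // predefined when -msse3 is in effect
    if (!__builtin_cpu_supports("sse3")) missing.push_back("sse3");
#endif
#ifdef __SSSE3__   // predefined when -mssse3 is in effect
    if (!__builtin_cpu_supports("ssse3")) missing.push_back("ssse3");
#endif
#ifdef __SSE4_1__  // predefined when -msse4.1 is in effect
    if (!__builtin_cpu_supports("sse4.1")) missing.push_back("sse4.1");
#endif
#endif
    return missing;
}
```
</pre>
A non-DANA main() could call missing_simd_extensions() before doing any other work and exit with a clear message if the list is not empty, instead of crashing later inside the first vectorized routine.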
<br>
<br>
-Richard J.
<br>
<br>
<br>
<br>
<br>
<br>
On 3/2/2011 6:08 PM, Simon Taylor wrote:
<br>
<br>
Hi.
<br>
<br>
Our suspicion is that there is an alignment problem on 32-bit
systems with regard to the SIMD instructions; Dave looked into this
some time ago and it is not clear to us how to fix it.
<br>
<br>
I've checked in a change to Makefile.SIMD that changes the
default from "SIMD on" to "SIMD off". To get the SIMD instructions, one
would now need to do "make ENABLE_SIMD=yes".
<br>
<br>
Simon
<br>
<br>
Matthew Shepherd wrote:
<br>
<br>
<br>
Hi all,
<br>
<br>
It seems that the BMS system doesn't properly understand our
SIMD capabilities on the machines here at Indiana. If we do a
default build, then we get a segmentation fault at the first
DVector2 operation. If we build with DISABLE_SIMD=1 then this
segfault is avoided.
<br>
<br>
This seems to point to two possible causes:
<br>
<br>
(1) there is a bug in the SIMD implementation of DVector2
<br>
(2) our machines are not capable of handling current SIMD code
<br>
<br>
(1) seems unlikely since other people are using the code.
Assuming it is (2), how do we properly diagnose and fix it?
<br>
<br>
-Matt
<br>
<br>
<br>
<br>
_______________________________________________
<br>
Halld-offline mailing list
<br>
<a class="moz-txt-link-abbreviated" href="mailto:Halld-offline@jlab.org">Halld-offline@jlab.org</a>
<br>
<a class="moz-txt-link-freetext" href="https://mailman.jlab.org/mailman/listinfo/halld-offline">https://mailman.jlab.org/mailman/listinfo/halld-offline</a>
<br>
<br>
<br>
<br>
<br>
<br>
<cpuid.cc>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</body>
</html>