[Clas12_software] Meeting today: Thursday, April 18, 2013
William Phelps
wphelps at jlab.org
Thu Apr 18 00:00:11 EDT 2013
Hello Everyone,
This is a reminder that there is a CLAS Offline Meeting in room F224/225 (subject to change; check the Front Desk Display!) today at 09:00 am EDT.
http://clasweb.jlab.org/wiki/index.php/Clas12_software_meetings_2013.04
-Will
Latest information from the wiki:
Agenda April 11, 2013
-(Johann) BOS and 64bit compatibility
-It took more than a week to get a working version of gsim_bat to Silvia (eg1dvcs experiment)
-Much of the problem came from the 32/64-bit changeover, the volatile state of the software in /site and /apps, and the changing operating systems. In particular, ifarm1101 and ifarm1102 run different versions of CentOS and therefore have different directories mounted in /apps.
-So really, there are two major issues which need to be solved:
-CERN libraries have "problems" when used as shared libraries (.so files). They do seem to work well when used as static libraries (.a files)
-The BOSIO library has many int->pointer->int conversions, which can cause crashes if the pointer is at a memory address larger than an int can hold. The problem is compounded because the ints are passed into and out of Fortran code, where controlling the size of integers is possible but non-trivial.
-Paul Mattione has done some investigation into BOS and the int->pointer->int conversions that take place therein. Here is an edited version of the emails Paul and I have exchanged in the first week of April, 2013:
-From Paul Mattione to Johann Goetz:
It is necessary to do a 32-bit compilation of the CLAS software
in order to analyze CLAS BOS data in a multithreaded program (JANA).
There are a ton of integer-to-pointer conversions integrated into
the heart of the CLAS software (for reading the BOS experimental
data) that would be a nightmare for me to fix. This happens to
not be a problem for most people because objects in single-threaded
code are (almost) always allocated in the address space that 32-bit
pointers can reach. However, since JANA is multithreaded it often
creates objects in memory where full 64-bit pointers are needed,
crashing the program during the ensuing pointer -> int -> pointer
conversions.
Until and unless the CLAS software group is willing to undertake
the task of making the BOS IO code 64-bit compliant, I would
greatly appreciate the Jefferson Lab computer center continuing to
support 32-bit builds of software. JANA is a very powerful physics
analysis framework, and I would hate to be restricted from using it
due to these legacy-software issues.
- Paul
-From Dave Lawrence to Paul:
Hi Paul,
I have a working theory on what is causing the seg faults with reading
BOS IO from JANA. In fpack.c there is a routine FParm which has two
variables:
int descriptor;
BOSIOptr LUN;
The first is a 4-byte number while the second is an 8-byte address.
Down around line 340, the first is set equal to the second:
descriptor = (int) LUN;
In an instance I ran that resulted in a segfault, the values ended up
being:
LUN = 0x7ff3cc000ef0
descriptor = cc000ef0
The address of "descriptor" is then passed into bosOPEN (defined in
bosopen.c). At around line 280, it casts the value of descriptor as
a BOSIOptr:
Blun = (BOSIOptr) *descriptor;
and the value of Blun ends up being: 0xffffffffcc000ef0. It then tries
accessing a member of the BOSIO structure a couple of lines later:
reopen = Blun->reopen;
This is where the seg. fault happens. It is because the high-order
4 bytes of the address have been stripped off making the value of
Blun bogus. The behavior was different in different threads (or other
circumstances) because the original structure was allocated at an address
whose value fit in a 32-bit number.
Unfortunately, I'm not sure if there is an easy way to fix this since I
would assume this might show up in any number of places in the BOS code.
I also thought that this was working on 64bit machines, for CLAS (???) It's
also possible that this is not what is causing *your* crashes, but it seems
like it is consistent with what you described.
Regards,
-Dave
-From Johann to Paul:
I wonder if changing it to "size_t descriptor" would solve this
specific problem. I might be able to write a script to search for
casts from a pointer to an int (or maybe there is a script somewhere
on the internetweb we could use...)
-From Paul to Johann:
size_t looks like it would effectively work (be pretty rare if it
didn't):
http://en.cppreference.com/w/cpp/types/size_t
You may want to define your own type in bos.h or somewhere that's
called "pointer_type," default it to "unsigned long long," and use
that one. That way if we ever need to change it for some god-awful
reason, we just change it in one spot.
Of course, if there is no pointer math, then you should never even
need to convert the pointers to int in the first place. Or just
convert them to void* everywhere.
-From Paul to Johann:
The pointers are converted to int so that they can be passed into
and out-of the fortran routines. Now you're talking about 64-bit
fortran integers all throughout the fortran code (wherever mbank,
etc. is called). Not exactly a fun problem to fix...
However, I just now got my 32-bit code built and running on centos62.
I had to have the computer center do a special build of cernlib with
gcc 4.7.2 for me. They couldn't compile it 32-bits directly on ifarm,
so they had to build it on cni-rhel6, which is a 32 bit Centos 6.4
machine. In case anyone else is crazy enough to need it, they installed
it to:
/site/cernlib/i386_rhel6_4.7.2/2005
So I think getting BOS to work long-term on 64-bit is going to require a
summer student or something. Unless you or someone else can afford to
lose a few weeks to sift through all of the ugly fortran & bosio code...
To begin your audio conference:
1. Dial Toll-Free Number: 866-740-1260 (U.S. & Canada)
2. International participants dial:
Toll Number: 303-248-0285
Or International Toll-Free Number: http://www.readytalk.com/intl
3. Enter the 7-digit access code 7911212, followed by "#"
For more information on ReadyTalk, visit http://www.ecs.es.net, call 800-333-7638