[Clas_offline] gsim_bat does not run in batch farm

maurik maurik at physics.unh.edu
Mon Oct 1 14:00:20 EDT 2018


Hello Jixie,

The information that you provide here isn’t quite sufficient for a reasonable diagnosis of what is happening, and for some reason the https://userweb.jlab.org/~jixie/gsim_bat_problem.txt page doesn’t load for me.

I assume that when you run the code in a directory where it fails, the code is able to create files in that directory (for the login that is running the code)? If not, that would be a reason for it to fail.
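
One quick way to rule this out is a small stand-alone check, run under the same login from each of the directories in question. The following is only a minimal sketch in C, assuming a POSIX system; the function name can_create_file_in and the scratch file name are hypothetical and have nothing to do with gsim_bat itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a scratch file can be created,
 * written to, and removed in `dir`, and 0 otherwise. */
int can_create_file_in(const char *dir)
{
    char path[4096];
    int n, fd, ok;

    assert(dir != NULL);

    /* mkstemp() replaces the trailing XXXXXX with a unique suffix. */
    n = snprintf(path, sizeof path, "%s/write_test_XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return 0;                    /* directory name too long */

    fd = mkstemp(path);              /* create and open the file */
    if (fd < 0) {
        perror(path);                /* report why creation failed */
        return 0;
    }

    ok = (write(fd, "x", 1) == 1);   /* creation alone is not enough */
    close(fd);
    unlink(path);                    /* clean up the scratch file */
    return ok;
}
```

Calling can_create_file_in(".") from a small main() in a failing directory and in a working one would show whether file creation is what differs between the two.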

The CTIMEF function call takes an encoded time and returns it as an ASCII string. In the gsimpar_2_bos_ bit of code, it is used to get the creation date of the executable and store this in the “PARAMS” output of the BOS file, so that there is a record of which version of the executable was used. That is around line 266 in gsimpar_2_bos.F. If it is really just this access that is preventing the code from running properly, you could simply remove those 2 calls and fill in the string by hand in your own copy of the executable and run that way. It is, however, likely that there is some deeper reason this code is crashing. The big problem with CERNLIB has been that it was created as 32-bit code, and did tricks to make FORTRAN work that were not so compatible with 64-bit code. All those issues should have been resolved, but there is always the possibility that some other issue comes up when it is compiled with a newer compiler. I just don’t know in this case.
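
To illustrate the kind of 32-bit/64-bit mismatch I mean, here is a minimal sketch in C of a helper that turns an encoded time into the usual 24-character date string. I have not checked the CERNLIB source, so this is an assumption about the sort of thing a CTIMEF-like routine does, not its actual implementation, and all function names below are hypothetical. On 64-bit Linux, time_t is 8 bytes while a default FORTRAN INTEGER is 4 bytes, so C code that treats the address of a FORTRAN INTEGER as a pointer to time_t would also read 4 bytes of whatever sits next to it in memory. The sketch widens the value explicitly and checks for failure instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: format `t` as a 24-character date string
 * (e.g. "Mon Oct  1 14:00:20 2018"). `out` must hold 25 bytes.
 * Returns 0 on success, -1 if the time cannot be formatted. */
int format_time(time_t t, char out[25])
{
    char buf[26];                    /* ctime_r() needs at least 26 bytes */

    assert(out != NULL);
    if (ctime_r(&t, buf) == NULL)    /* fails for out-of-range times */
        return -1;
    memcpy(out, buf, 24);            /* drop the trailing newline */
    out[24] = '\0';
    return 0;
}

/* A 4-byte integer, as a FORTRAN caller would hold it, is widened to
 * time_t by value here rather than reinterpreted through a pointer. */
int format_time32(int32_t clock32, char out[25])
{
    return format_time((time_t)clock32, out);
}

/* Format the last-modification time of the file at `path`, for
 * example the executable itself. Returns -1 if stat() fails. */
int file_mtime_string(const char *path, char out[25])
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return format_time(st.st_mtime, out);
}
```

If the real routine turned out to do something like the unsafe pointer reinterpretation described above, the result would depend on neighbouring memory, which could in principle vary with the environment the job runs in. That is speculation on my part until someone looks at the actual code.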
 
If you want to/need to dig deeper, perhaps you could create a version of gsim_bat that is compiled with the “-g” compiler switch, but not with any -DDEBUG flags, so that the code retains the names of the functions, but does not add in any additional FORTRAN debug lines. Compilation will be slightly different, but much of the code will be identical. You can go one step further and also compile with -O0, turning off all optimization. Since the crash happens somewhere in CERNLIB, it would be useful to link against a version of CERNLIB that also has some level of debug information turned on. This will then hopefully point to where there may be an issue with the code.
Another thing you could try is to compile the gsimpar_2_bos.F code with -DDEBUG turned on, but not the rest of the code, and see if that works.

Sorry that I cannot be more helpful here. 

Best,
	Maurik




> On Sep 30, 2018, at 5:24 AM, Jixie Zhang <jixie at jlab.org> wrote:
> 
> Hi CLAS members,
>   gsim_bat is giving me some trouble that I cannot understand. I am wondering
> if any of you have had a similar experience.  The clas_package version I am using is
> a pretty old one, release-4-14, but I made the modifications necessary to get it
> to compile and run on RHEL7.
> 
>   gsim_bat has not run on the batch farm since August 2018; before that, the same executable
> ran smoothly.
> gsim_bat hits a "segmentation fault" when run on the batch farm.
> It runs normally on the interactive farm (ifarm1401 and ifarm1402), but only
> on certain work disks. I recompiled the whole CLAS package, but the
> problem remains.
> 
> After careful investigation, I found that gsim_bat cannot be run in the following paths:
> 
> /work/halla/solid/*
> /work/hallc/sane/*
> /scratch/jixie/*
> /cache/halla/solid/*
> /home/jixie/*
> (I only tested the disks above; the list could be even longer.)
> The error message is:
> --------------error message start---------------------
> Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
> 
> Backtrace for this error:
> #0  0x7FB9ADAC3467
> #1  0x7FB9ADAC3AAE
> #2  0x7FB9ACFCA66F
> #3  0x7FB9AD02A13F
> #4  0x625AD5 in ctimef_
> #5  0x41AA65 in gsimpar_2_bos_
> #6  0x4076CF in uginit_
> #7  0x4058D3 in MAIN__ at gsim_bat.F:?
> Segmentation fault
> ifarm1401.jlab.org>
> --------------error message end---------------------
> The error message says that the problem happens in the call to the ctimef() subroutine, which comes from CERNLIB.  I do not trust this error message too much.
> 
> However, the same executable can run in 
> /work/clas/claseg4/jixie/*
> /work/halla/g2p/disk1/jixie/*
> 
> Surprisingly, gsim_bat_debug works well everywhere. gsim_bat_debug was compiled with -DEBUG defined, which means the source code is totally different.
> 
> See more details of my test here: https://userweb.jlab.org/~jixie/gsim_bat_problem.txt
> 
> 
> -- 
> With Best Regards,
> Jixie Zhang
> 757-269-7735 
> _______________________________________________
> Clas_offline mailing list
> Clas_offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/clas_offline
