[Halld-cpp] CPP REST production
Alexander Austregesilo
aaustreg at jlab.org
Fri Jun 28 12:15:44 EDT 2024
Excellent. Something must be wrong on my side with the sizes. When I
list my directory I get this output, which seems consistent with yours:
-rw-r--r-- 1 aaustreg halld 43688763 Jun 25 15:59 converted_random.hddm
-rw-r--r-- 1 aaustreg halld 1244538131 Jun 25 15:59 dana_rest.hddm
-rw-r--r-- 1 aaustreg halld 16751296 Jun 25 15:59 hd_rawdata_101586_000.BCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 322128 Jun 25 15:59 hd_rawdata_101586_000.CCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 237368676 Jun 25 15:59 hd_rawdata_101586_000.cpp_2c.evio
-rw-r--r-- 1 aaustreg halld 814874380 Jun 25 15:59 hd_rawdata_101586_000.ctof.evio
-rw-r--r-- 1 aaustreg halld 322128 Jun 25 15:59 hd_rawdata_101586_000.DIRC-LED.evio
-rw-r--r-- 1 aaustreg halld 19690620 Jun 25 15:59 hd_rawdata_101586_000.FCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 110784828 Jun 25 15:59 hd_rawdata_101586_000.npp_2g.evio
-rw-r--r-- 1 aaustreg halld 132611736 Jun 25 15:59 hd_rawdata_101586_000.npp_2pi0.evio
-rw-r--r-- 1 aaustreg halld 2804807604 Jun 25 15:59 hd_rawdata_101586_000.ps.evio
-rw-r--r-- 1 aaustreg halld 188496932 Jun 25 15:59 hd_rawdata_101586_000.random.evio
-rw-r--r-- 1 aaustreg halld 314564 Jun 25 15:59 hd_rawdata_101586_000.sync.evio
-rw-r--r-- 1 aaustreg halld 44638598 Jun 25 15:59 hd_root.root
-rw-r--r-- 1 aaustreg halld 1329 Jun 27 14:59 jana_recon_2022_05_ver01.config
-rw-r--r-- 1 aaustreg halld 1343 Jun 26 15:38 jana_recon_2022_05_ver01.config~
-rw-r--r-- 1 aaustreg halld 7594 Jun 25 15:59 syncskim.root
-rw-r--r-- 1 aaustreg halld 7370 Jun 25 15:59 tree_bcal_hadronic_eff.root
-rw-r--r-- 1 aaustreg halld 130996 Jun 25 15:59 tree_fcal_hadronic_eff.root
-rw-r--r-- 1 aaustreg halld 96654396 Jun 25 15:59 tree_PSFlux.root
-rw-r--r-- 1 aaustreg halld 1063333 Jun 25 15:59 tree_tof_eff.root
-rw-r--r-- 1 aaustreg halld 50428784 Jun 25 15:59 tree_TPOL.root
-rw-r--r-- 1 aaustreg halld 8617 Jun 25 15:59 tree_TS_scaler.root
But when I ask the shell for the size of individual files, I get a
different result:
ls -s hd_rawdata_101586_000.ps.evio
1743356 hd_rawdata_101586_000.ps.evio
That is super weird, but I don't think any of the lock fixes have
anything to do with it. It is the difference between disk allocation and
byte count: the ls -l listing above reports the byte count, while ls -s
reports the blocks allocated on disk, so the two numbers can differ.
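For reference, here is one way to see both numbers side by side (a
minimal sketch, assuming GNU coreutils on the farm machines; the file
name is just the one from the listing above):

ls -l hd_rawdata_101586_000.ps.evio    # byte count, as in the listing above
ls -s hd_rawdata_101586_000.ps.evio    # allocated size in 1K blocks
du -h --apparent-size hd_rawdata_101586_000.ps.evio   # byte count, human-readable
du -h hd_rawdata_101586_000.ps.evio                   # allocated size, human-readable
stat -c '%s bytes, %b blocks of %B bytes' hd_rawdata_101586_000.ps.evio

If the allocated size is consistently smaller than the byte count, the
file is most likely sparse or not yet fully flushed to disk, so the two
listings can look inconsistent even when nothing is wrong with the data.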
From my side, I would just update the config file from the group disk
to get rid of the ctof and ps skims.
I also noticed that you have tree_sc_eff.root as an output file in the
launch_nersc.py script, even though it is not produced by the job. Maybe
you want to update this list.
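(Just as a pointer, a grep like the following should locate both the
stale entry and the lines steering the large skims; the pattern for the
config file is only meant to find the relevant lines, whatever the
actual option names in the group-disk version are:)

grep -n "tree_sc_eff" launch_nersc.py                # stale output-file entry
grep -nE "ctof|ps" jana_recon_2022_05_ver01.config   # lines steering the large skims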
On 6/28/24 00:09, Igal Jaegle wrote:
> The second attempt worked; the job took slightly more than 3h with a
> 100% success rate, but the skims are almost all larger than the ones
> produced locally. So there is still an issue.
>
> The files are on volatile and will be moved to cache tomorrow.
>
> /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/
>
> tks ig.
>
> ls -lrth /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/FILE000/RUN101586/FILE000/
> total 5.2G
> -rw-r--r-- 1 gxproj4 halld 784 Jun 25 05:42 tree_sc_eff.root
> -rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:13 hd_rawdata_101586_000.CCAL-LED.evio
> -rw-r--r-- 1 gxproj4 halld 16M Jun 27 22:13 hd_rawdata_101586_000.BCAL-LED.evio
> -rw-r--r-- 1 gxproj4 halld 7.5K Jun 27 22:13 syncskim.root
> -rw-r--r-- 1 gxproj4 halld 174K Jun 27 22:13 job_info_101586_000.tgz
> -rw-r--r-- 1 gxproj4 halld 227M Jun 27 22:14 hd_rawdata_101586_000.cpp_2c.evio
> -rw-r--r-- 1 gxproj4 halld 778M Jun 27 22:14 hd_rawdata_101586_000.ctof.evio
> -rw-r--r-- 1 gxproj4 halld 42M Jun 27 22:15 converted_random_101586_000.hddm
> -rw-r--r-- 1 gxproj4 halld 49M Jun 27 22:15 tree_TPOL.root
> -rw-r--r-- 1 gxproj4 halld 8.5K Jun 27 22:15 tree_TS_scaler.root
> -rw-r--r-- 1 gxproj4 halld 1.2G Jun 27 22:15 dana_rest.hddm
> -rw-r--r-- 1 gxproj4 halld 93M Jun 27 22:15 tree_PSFlux.root
> -rw-r--r-- 1 gxproj4 halld 129K Jun 27 22:15 tree_fcal_hadronic_eff.root
> -rw-r--r-- 1 gxproj4 halld 7.2K Jun 27 22:15 tree_bcal_hadronic_eff.root
> -rw-r--r-- 1 gxproj4 halld 37M Jun 27 22:15 hd_root.root
> -rw-r--r-- 1 gxproj4 halld 308K Jun 27 22:16 hd_rawdata_101586_000.sync.evio
> -rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:16 hd_rawdata_101586_000.DIRC-LED.evio
> -rw-r--r-- 1 gxproj4 halld 127M Jun 27 22:18 hd_rawdata_101586_000.npp_2pi0.evio
> -rw-r--r-- 1 gxproj4 halld 106M Jun 27 22:18 hd_rawdata_101586_000.npp_2g.evio
> -rw-r--r-- 1 gxproj4 halld 19M Jun 27 22:18 hd_rawdata_101586_000.FCAL-LED.evio
> -rw-r--r-- 1 gxproj4 halld 180M Jun 27 22:21 hd_rawdata_101586_000.random.evio
> -rw-r--r-- 1 gxproj4 halld 2.7G Jun 27 22:21 hd_rawdata_101586_000.ps.evio
> -rw-r--r-- 1 gxproj4 halld 1.1M Jun 27 22:27 tree_tof_eff.root
> ------------------------------------------------------------------------
> *From:* Igal Jaegle <ijaegle at jlab.org>
> *Sent:* Wednesday, June 26, 2024 10:38 AM
> *To:* Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org
> <halld-cpp at jlab.org>
> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: CPP REST production
> PERLMUTTER is down for a maintenance day, but tomorrow I can grab
> the other logs.
>
> tks ig.
> ------------------------------------------------------------------------
> *From:* Alexander Austregesilo <aaustreg at jlab.org>
> *Sent:* Wednesday, June 26, 2024 10:32 AM
> *To:* Igal Jaegle <ijaegle at jlab.org>; halld-cpp at jlab.org
> <halld-cpp at jlab.org>
> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: CPP REST production
>
> Hi Igal,
>
> This failure rate is pretty bad. Can you point me to the log files of
> the failed jobs? The messages you sent concerning root dictionaries
> and ps_counts_thresholds are not fatal; they will not cause failures.
> You should still confirm with Sasha whether the ps_counts_thresholds
> are important.
>
>
> The file /work/halld/home/gxproj4/public/ForAlexAndSean/std.out seems
> to be cut off. We may want to add this option to the config file to
> suppress the printout of the number of processed events:
>
>
> JANA:BATCH_MODE 1
>
>
> You can find my output files here:
> /work/halld2/home/aaustreg/Analysis/cpp/REST/
>
> I don't have the logs, but all files were closed at exactly the same time.
>
>
> Cheers,
>
> Alex
>
>
>
>
> On 6/26/24 08:59, Igal Jaegle wrote:
>> Alex,
>>
>> Could you provide the path to your output files and most importantly
>> the logs?
>>
>> The results of the test on NERSC for the same run are as follows:
>>
>> 108 of the 126 evio files were cooked properly; the 18 failures
>> correspond to a failure rate of about 14%, which is way too much to
>> proceed with the cooking.
>>
>> tks ig.
>> ------------------------------------------------------------------------
>> *From:* Alexander Austregesilo <aaustreg at jlab.org>
>> *Sent:* Monday, June 24, 2024 6:06 PM
>> *To:* halld-cpp at jlab.org <halld-cpp at jlab.org>
>> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Igal Jaegle <ijaegle at jlab.org>; Sean Dobbs <sdobbs at jlab.org>
>> *Subject:* CPP REST production
>> Dear Colleagues,
>>
>> I processed a single file of a typical CPP production run on the Pb
>> target. Here is a list of all files and their sizes that will be
>> produced during the REST production:
>>
>> 1.7G hd_rawdata_101586_000.ps.evio
>> 1.2G dana_rest.hddm
>> 495M hd_rawdata_101586_000.ctof.evio
>> 160M hd_rawdata_101586_000.cpp_2c.evio
>> 115M hd_rawdata_101586_000.random.evio
>> 93M tree_PSFlux.root
>> 90M hd_rawdata_101586_000.npp_2pi0.evio
>> 72M hd_rawdata_101586_000.npp_2g.evio
>> 49M tree_TPOL.root
>> 42M converted_random.hddm
>> 41M hd_root.root
>> 16M hd_rawdata_101586_000.FCAL-LED.evio
>> 13M hd_rawdata_101586_000.BCAL-LED.evio
>> 1.1M tree_tof_eff.root
>> 164K tree_fcal_hadronic_eff.root
>> 105K hd_rawdata_101586_000.DIRC-LED.evio
>> 105K hd_rawdata_101586_000.CCAL-LED.evio
>> 94K hd_rawdata_101586_000.sync.evio
>> 24K tree_TS_scaler.root
>> 24K tree_bcal_hadronic_eff.root
>> 24K syncskim.root
>>
>> The trigger skims for ps (with the thick converter) and ctof are quite
>> large. Do we actually need them, or were they perhaps already produced
>> during the calibration stages?
>>
>> As far as I understand, Igal has started a test of the REST production
>> at NERSC. We are getting closer to launch!
>>
>> Cheers,
>>
>> Alex
>>
>>
>>
>>
>>
--
Alexander Austregesilo
Staff Scientist - Experimental Nuclear Physics
Thomas Jefferson National Accelerator Facility
Newport News, VA
aaustreg at jlab.org
(757) 269-6982