[Halld-cpp] CPP REST production

Alexander Austregesilo aaustreg at jlab.org
Fri Jun 28 13:10:45 EDT 2024


Maybe you can run another quick test with the updated config file and 
launch_nersc.py script over the weekend?


I may have a new release on Monday.



On 6/28/24 13:04, Igal Jaegle wrote:
> No problem from my side. Just let me know when they are loaded so that 
> I can start the production a day or two later.
>
> tks ig.
>
> ------------------------------------------------------------------------
> *From:* Ilya Larin <ilarin at jlab.org>
> *Sent:* Friday, June 28, 2024 1:01 PM
> *To:* Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org 
> <halld-cpp at jlab.org>; Igal Jaegle <ijaegle at jlab.org>
> *Cc:* Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: CPP REST production
>
> I have a few more tables for dead counters, if you think it would be good 
> to update them.
>
>  Ilya
>
> ------------------------------------------------------------------------
> *From:* Halld-cpp <halld-cpp-bounces at jlab.org> on behalf of Igal 
> Jaegle via Halld-cpp <halld-cpp at jlab.org>
> *Sent:* Friday, June 28, 2024 13:00
> *To:* Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org 
> <halld-cpp at jlab.org>
> *Cc:* Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: [Halld-cpp] CPP REST production
> Ok, so I have the green light to start the production - what about the 
> ctof & ps evio skims?
>
> tks ig.
> ------------------------------------------------------------------------
> *From:* Alexander Austregesilo <aaustreg at jlab.org>
> *Sent:* Friday, June 28, 2024 12:15 PM
> *To:* Igal Jaegle <ijaegle at jlab.org>; halld-cpp at jlab.org 
> <halld-cpp at jlab.org>
> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: CPP REST production
>
> Excellent. Something must be wrong on my side with the sizes. When I 
> list my directory, I get this output, which seems consistent with yours:
>
>
> -rw-r--r-- 1 aaustreg halld   43688763 Jun 25 15:59 converted_random.hddm
> -rw-r--r-- 1 aaustreg halld 1244538131 Jun 25 15:59 dana_rest.hddm
> -rw-r--r-- 1 aaustreg halld   16751296 Jun 25 15:59 hd_rawdata_101586_000.BCAL-LED.evio
> -rw-r--r-- 1 aaustreg halld     322128 Jun 25 15:59 hd_rawdata_101586_000.CCAL-LED.evio
> -rw-r--r-- 1 aaustreg halld  237368676 Jun 25 15:59 hd_rawdata_101586_000.cpp_2c.evio
> -rw-r--r-- 1 aaustreg halld  814874380 Jun 25 15:59 hd_rawdata_101586_000.ctof.evio
> -rw-r--r-- 1 aaustreg halld     322128 Jun 25 15:59 hd_rawdata_101586_000.DIRC-LED.evio
> -rw-r--r-- 1 aaustreg halld   19690620 Jun 25 15:59 hd_rawdata_101586_000.FCAL-LED.evio
> -rw-r--r-- 1 aaustreg halld  110784828 Jun 25 15:59 hd_rawdata_101586_000.npp_2g.evio
> -rw-r--r-- 1 aaustreg halld  132611736 Jun 25 15:59 hd_rawdata_101586_000.npp_2pi0.evio
> -rw-r--r-- 1 aaustreg halld 2804807604 Jun 25 15:59 hd_rawdata_101586_000.ps.evio
> -rw-r--r-- 1 aaustreg halld  188496932 Jun 25 15:59 hd_rawdata_101586_000.random.evio
> -rw-r--r-- 1 aaustreg halld     314564 Jun 25 15:59 hd_rawdata_101586_000.sync.evio
> -rw-r--r-- 1 aaustreg halld   44638598 Jun 25 15:59 hd_root.root
> -rw-r--r-- 1 aaustreg halld       1329 Jun 27 14:59 jana_recon_2022_05_ver01.config
> -rw-r--r-- 1 aaustreg halld       1343 Jun 26 15:38 jana_recon_2022_05_ver01.config~
> -rw-r--r-- 1 aaustreg halld       7594 Jun 25 15:59 syncskim.root
> -rw-r--r-- 1 aaustreg halld       7370 Jun 25 15:59 tree_bcal_hadronic_eff.root
> -rw-r--r-- 1 aaustreg halld     130996 Jun 25 15:59 tree_fcal_hadronic_eff.root
> -rw-r--r-- 1 aaustreg halld   96654396 Jun 25 15:59 tree_PSFlux.root
> -rw-r--r-- 1 aaustreg halld    1063333 Jun 25 15:59 tree_tof_eff.root
> -rw-r--r-- 1 aaustreg halld   50428784 Jun 25 15:59 tree_TPOL.root
> -rw-r--r-- 1 aaustreg halld       8617 Jun 25 15:59 tree_TS_scaler.root
>
>
> But when I ask the shell for the size of individual files, I get a 
> different result:
>
>
> ls -s hd_rawdata_101586_000.ps.evio
> 1743356 hd_rawdata_101586_000.ps.evio
>
>
> That is super weird, but I don't think any of the lock fixes have 
> anything to do with that. It is the difference between disk allocation 
> and byte count: ls -l reports the logical size in bytes, while ls -s 
> reports the blocks actually allocated on disk.
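>
> For reference, a quick way to see both numbers for the same file (assuming 
> GNU coreutils, as on the interactive farm nodes):
>
> stat --format='%s bytes, %b blocks of %B bytes' hd_rawdata_101586_000.ps.evio
> du --apparent-size -h hd_rawdata_101586_000.ps.evio   # byte count, as in ls -l
> du -h hd_rawdata_101586_000.ps.evio                   # allocated on disk, as in ls -s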
>
>
> From my side, I would just update the config file on the group disk 
> to get rid of the ctof and ps skims.
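>
> Once the file is edited, a quick diff against the backup copy that is 
> already sitting in the same directory makes the change easy to verify, e.g.:
>
> diff jana_recon_2022_05_ver01.config~ jana_recon_2022_05_ver01.config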
>
>
> I also noticed that you have tree_sc_eff.root as an output file in the 
> launch_nersc.py script, even though it is not produced by the job. 
> Maybe you want to update this list.
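>
> To locate the stale entry (assuming the output-file list lives directly in 
> the script):
>
> grep -n tree_sc_eff launch_nersc.py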
>
>
>
> On 6/28/24 00:09, Igal Jaegle wrote:
>> The second attempt worked: the job took slightly more than 3 h with a 
>> 100% success rate, but the skims are almost all larger than the ones 
>> produced locally. So there is still an issue.
>>
>> The files are on volatile and will be moved to cache tomorrow.
>>
>> /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/
>>
>> tks ig.
>>
>> ls -lrth /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/FILE000/RUN101586/FILE000/
>> total 5.2G
>> -rw-r--r-- 1 gxproj4 halld  784 Jun 25 05:42 tree_sc_eff.root
>> -rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:13 hd_rawdata_101586_000.CCAL-LED.evio
>> -rw-r--r-- 1 gxproj4 halld  16M Jun 27 22:13 hd_rawdata_101586_000.BCAL-LED.evio
>> -rw-r--r-- 1 gxproj4 halld 7.5K Jun 27 22:13 syncskim.root
>> -rw-r--r-- 1 gxproj4 halld 174K Jun 27 22:13 job_info_101586_000.tgz
>> -rw-r--r-- 1 gxproj4 halld 227M Jun 27 22:14 hd_rawdata_101586_000.cpp_2c.evio
>> -rw-r--r-- 1 gxproj4 halld 778M Jun 27 22:14 hd_rawdata_101586_000.ctof.evio
>> -rw-r--r-- 1 gxproj4 halld  42M Jun 27 22:15 converted_random_101586_000.hddm
>> -rw-r--r-- 1 gxproj4 halld  49M Jun 27 22:15 tree_TPOL.root
>> -rw-r--r-- 1 gxproj4 halld 8.5K Jun 27 22:15 tree_TS_scaler.root
>> -rw-r--r-- 1 gxproj4 halld 1.2G Jun 27 22:15 dana_rest.hddm
>> -rw-r--r-- 1 gxproj4 halld  93M Jun 27 22:15 tree_PSFlux.root
>> -rw-r--r-- 1 gxproj4 halld 129K Jun 27 22:15 tree_fcal_hadronic_eff.root
>> -rw-r--r-- 1 gxproj4 halld 7.2K Jun 27 22:15 tree_bcal_hadronic_eff.root
>> -rw-r--r-- 1 gxproj4 halld  37M Jun 27 22:15 hd_root.root
>> -rw-r--r-- 1 gxproj4 halld 308K Jun 27 22:16 hd_rawdata_101586_000.sync.evio
>> -rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:16 hd_rawdata_101586_000.DIRC-LED.evio
>> -rw-r--r-- 1 gxproj4 halld 127M Jun 27 22:18 hd_rawdata_101586_000.npp_2pi0.evio
>> -rw-r--r-- 1 gxproj4 halld 106M Jun 27 22:18 hd_rawdata_101586_000.npp_2g.evio
>> -rw-r--r-- 1 gxproj4 halld  19M Jun 27 22:18 hd_rawdata_101586_000.FCAL-LED.evio
>> -rw-r--r-- 1 gxproj4 halld 180M Jun 27 22:21 hd_rawdata_101586_000.random.evio
>> -rw-r--r-- 1 gxproj4 halld 2.7G Jun 27 22:21 hd_rawdata_101586_000.ps.evio
>> -rw-r--r-- 1 gxproj4 halld 1.1M Jun 27 22:27 tree_tof_eff.root
>> ------------------------------------------------------------------------
>> *From:* Igal Jaegle <ijaegle at jlab.org>
>> *Sent:* Wednesday, June 26, 2024 10:38 AM
>> *To:* Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org <halld-cpp at jlab.org>
>> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
>> *Subject:* Re: CPP REST production
>> PERLMUTTER is down due to a maintenance day. But tomorrow I can grab 
>> the other logs.
>>
>> tks ig.
>> ------------------------------------------------------------------------
>> *From:* Alexander Austregesilo <aaustreg at jlab.org>
>> *Sent:* Wednesday, June 26, 2024 10:32 AM
>> *To:* Igal Jaegle <ijaegle at jlab.org>; halld-cpp at jlab.org <halld-cpp at jlab.org>
>> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
>> *Subject:* Re: CPP REST production
>>
>> Hi Igal,
>>
>> This failure rate is pretty bad. Can you point me to the log files of 
>> the failed jobs? The messages you sent concerning root dictionaries 
>> and ps_counts_thresholds are not fatal; they will not cause failures. 
>> You should still confirm with Sasha whether the ps_counts_thresholds 
>> are important.
>>
>>
>> The file /work/halld/home/gxproj4/public/ForAlexAndSean/std.out seems 
>> to be cut off. We may want to add this option to the config file to 
>> suppress the printout of the number of processed events:
>>
>>
>> JANA:BATCH_MODE 1
>>
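>>
>> (For a quick one-off test the same parameter can also be passed on the 
>> command line with the usual JANA -P syntax, e.g. 
>> hd_root -PJANA:BATCH_MODE=1 with the rest of the options unchanged, 
>> instead of editing the config file.)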
>>
>> You can find my output files here: 
>> /work/halld2/home/aaustreg/Analysis/cpp/REST/
>>
>> I don't have the logs, but all files were closed at exactly the same 
>> time.
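>>
>> (That is just from the modification times in the listing; if finer 
>> granularity helps, the sub-second timestamps are visible with e.g. 
>> ls -lt --time-style=full-iso in that directory.)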
>>
>>
>> Cheers,
>>
>> Alex
>>
>>
>>
>>
>> On 6/26/24 08:59, Igal Jaegle wrote:
>>> Alex,
>>>
>>> Could you provide the path to your output files and most importantly 
>>> the logs?
>>>
>>> The results of the test on NERSC for the same run are as follows
>>>
>>> 108 evio files out of 126 were cooked properly, i.e. a ~15% failure 
>>> rate, which is way too much to proceed with the cooking.
>>>
>>> tks ig.
>>> ------------------------------------------------------------------------
>>> *From:* Alexander Austregesilo <aaustreg at jlab.org>
>>> *Sent:* Monday, June 24, 2024 6:06 PM
>>> *To:* halld-cpp at jlab.org <halld-cpp at jlab.org>
>>> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Igal Jaegle <ijaegle at jlab.org>; Sean Dobbs <sdobbs at jlab.org>
>>> *Subject:* CPP REST production
>>> Dear Colleagues,
>>>
>>> I processed a single file of a typical CPP production run on the Pb
>>> target. Here is a list of all the files that will be produced during
>>> the REST production, together with their sizes:
>>>
>>> 1.7G hd_rawdata_101586_000.ps.evio
>>> 1.2G dana_rest.hddm
>>> 495M hd_rawdata_101586_000.ctof.evio
>>> 160M hd_rawdata_101586_000.cpp_2c.evio
>>> 115M hd_rawdata_101586_000.random.evio
>>>   93M tree_PSFlux.root
>>>   90M hd_rawdata_101586_000.npp_2pi0.evio
>>>   72M hd_rawdata_101586_000.npp_2g.evio
>>>   49M tree_TPOL.root
>>>   42M converted_random.hddm
>>>   41M hd_root.root
>>>   16M hd_rawdata_101586_000.FCAL-LED.evio
>>>   13M hd_rawdata_101586_000.BCAL-LED.evio
>>> 1.1M tree_tof_eff.root
>>> 164K tree_fcal_hadronic_eff.root
>>> 105K hd_rawdata_101586_000.DIRC-LED.evio
>>> 105K hd_rawdata_101586_000.CCAL-LED.evio
>>>   94K hd_rawdata_101586_000.sync.evio
>>>   24K tree_TS_scaler.root
>>>   24K tree_bcal_hadronic_eff.root
>>>   24K syncskim.root
>>>
>>> The trigger skims for ps (with thick converter) and ctof are quite
>>> large. Do we actually need them or were they maybe already produced
>>> during the calibration stages?
>>>
>>> As far as I understand, Igal has started a test of the REST production
>>> at NERSC. We are getting closer to launch!
>>>
>>> Cheers,
>>>
>>> Alex
>>>
>>>
>>>
>>>
>>>
>>> -- 
>>> Alexander Austregesilo
>>>
>>> Staff Scientist - Experimental Nuclear Physics
>>> Thomas Jefferson National Accelerator Facility
>>> Newport News, VA
>>> aaustreg at jlab.org
>>> (757) 269-6982
>>>
>> -- 
>> Alexander Austregesilo
>>
>> Staff Scientist - Experimental Nuclear Physics
>> Thomas Jefferson National Accelerator Facility
>> Newport News, VA
>> aaustreg at jlab.org
>> (757) 269-6982
> -- 
> Alexander Austregesilo
>
> Staff Scientist - Experimental Nuclear Physics
> Thomas Jefferson National Accelerator Facility
> Newport News, VA
> aaustreg at jlab.org
> (757) 269-6982

-- 
Alexander Austregesilo

Staff Scientist - Experimental Nuclear Physics
Thomas Jefferson National Accelerator Facility
Newport News, VA
aaustreg at jlab.org
(757) 269-6982

