[Halld-cpp] CPP REST production
Alexander Austregesilo
aaustreg at jlab.org
Fri Jun 28 12:15:44 EDT 2024
Excellent. Something must be wrong on my side with the sizes. When I
list my directory I get this output, which seems consistent with yours:
-rw-r--r-- 1 aaustreg halld 43688763 Jun 25 15:59 converted_random.hddm
-rw-r--r-- 1 aaustreg halld 1244538131 Jun 25 15:59 dana_rest.hddm
-rw-r--r-- 1 aaustreg halld 16751296 Jun 25 15:59 hd_rawdata_101586_000.BCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 322128 Jun 25 15:59 hd_rawdata_101586_000.CCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 237368676 Jun 25 15:59 hd_rawdata_101586_000.cpp_2c.evio
-rw-r--r-- 1 aaustreg halld 814874380 Jun 25 15:59 hd_rawdata_101586_000.ctof.evio
-rw-r--r-- 1 aaustreg halld 322128 Jun 25 15:59 hd_rawdata_101586_000.DIRC-LED.evio
-rw-r--r-- 1 aaustreg halld 19690620 Jun 25 15:59 hd_rawdata_101586_000.FCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 110784828 Jun 25 15:59 hd_rawdata_101586_000.npp_2g.evio
-rw-r--r-- 1 aaustreg halld 132611736 Jun 25 15:59 hd_rawdata_101586_000.npp_2pi0.evio
-rw-r--r-- 1 aaustreg halld 2804807604 Jun 25 15:59 hd_rawdata_101586_000.ps.evio
-rw-r--r-- 1 aaustreg halld 188496932 Jun 25 15:59 hd_rawdata_101586_000.random.evio
-rw-r--r-- 1 aaustreg halld 314564 Jun 25 15:59 hd_rawdata_101586_000.sync.evio
-rw-r--r-- 1 aaustreg halld 44638598 Jun 25 15:59 hd_root.root
-rw-r--r-- 1 aaustreg halld 1329 Jun 27 14:59 jana_recon_2022_05_ver01.config
-rw-r--r-- 1 aaustreg halld 1343 Jun 26 15:38 jana_recon_2022_05_ver01.config~
-rw-r--r-- 1 aaustreg halld 7594 Jun 25 15:59 syncskim.root
-rw-r--r-- 1 aaustreg halld 7370 Jun 25 15:59 tree_bcal_hadronic_eff.root
-rw-r--r-- 1 aaustreg halld 130996 Jun 25 15:59 tree_fcal_hadronic_eff.root
-rw-r--r-- 1 aaustreg halld 96654396 Jun 25 15:59 tree_PSFlux.root
-rw-r--r-- 1 aaustreg halld 1063333 Jun 25 15:59 tree_tof_eff.root
-rw-r--r-- 1 aaustreg halld 50428784 Jun 25 15:59 tree_TPOL.root
-rw-r--r-- 1 aaustreg halld 8617 Jun 25 15:59 tree_TS_scaler.root
But when I ask the shell for the size of individual files, I get a
different result:
ls -s hd_rawdata_101586_000.ps.evio
1743356 hd_rawdata_101586_000.ps.evio
That is super weird, but I don't think any of the lock fixes have
anything to do with it. It is the difference between disk allocation and
byte count: the ls -l listing above reports the byte count, while ls -s
reports the blocks allocated on disk, so the two numbers can differ.
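For reference, here is one way to see both numbers side by side (a
minimal sketch, assuming GNU coreutils on the farm machines; the file
name is just the one from the listing above):

ls -l hd_rawdata_101586_000.ps.evio    # byte count, as in the listing above
ls -s hd_rawdata_101586_000.ps.evio    # allocated size in 1K blocks
du -h --apparent-size hd_rawdata_101586_000.ps.evio   # byte count, human-readable
du -h hd_rawdata_101586_000.ps.evio                   # allocated size, human-readable
stat -c '%s bytes, %b blocks of %B bytes' hd_rawdata_101586_000.ps.evio

If the allocated size is consistently smaller than the byte count, the
file is most likely sparse or not yet fully flushed to disk, so the two
listings can look inconsistent even when nothing is wrong with the data.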
From my side, I would just update the config file from the group disk
to get rid of the ctof and ps skims.
I also noticed that you have tree_sc_eff.root as an output file in the
launch_nersc.py script, even though it is not produced by the job. Maybe
you want to update this list.
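(Just as a pointer, a grep like the following should locate both the
stale entry and the lines steering the large skims; the pattern for the
config file is only meant to find the relevant lines, whatever the
actual option names in the group-disk version are:)

grep -n "tree_sc_eff" launch_nersc.py                # stale output-file entry
grep -nE "ctof|ps" jana_recon_2022_05_ver01.config   # lines steering the large skims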
On 6/28/24 00:09, Igal Jaegle wrote:
> The second attempt worked; the job took slightly more than 3h with a
> 100% success rate, but the skims are almost all larger than the ones
> produced locally. So there is still an issue.
>
> The files are on volatile and will be moved to cache tomorrow.
>
> /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/
>
> tks ig.
>
> ls -lrth /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/FILE000/RUN101586/FILE000/
> total 5.2G
> -rw-r--r-- 1 gxproj4 halld 784 Jun 25 05:42 tree_sc_eff.root
> -rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:13 hd_rawdata_101586_000.CCAL-LED.evio
> -rw-r--r-- 1 gxproj4 halld 16M Jun 27 22:13 hd_rawdata_101586_000.BCAL-LED.evio
> -rw-r--r-- 1 gxproj4 halld 7.5K Jun 27 22:13 syncskim.root
> -rw-r--r-- 1 gxproj4 halld 174K Jun 27 22:13 job_info_101586_000.tgz
> -rw-r--r-- 1 gxproj4 halld 227M Jun 27 22:14 hd_rawdata_101586_000.cpp_2c.evio
> -rw-r--r-- 1 gxproj4 halld 778M Jun 27 22:14 hd_rawdata_101586_000.ctof.evio
> -rw-r--r-- 1 gxproj4 halld 42M Jun 27 22:15 converted_random_101586_000.hddm
> -rw-r--r-- 1 gxproj4 halld 49M Jun 27 22:15 tree_TPOL.root
> -rw-r--r-- 1 gxproj4 halld 8.5K Jun 27 22:15 tree_TS_scaler.root
> -rw-r--r-- 1 gxproj4 halld 1.2G Jun 27 22:15 dana_rest.hddm
> -rw-r--r-- 1 gxproj4 halld 93M Jun 27 22:15 tree_PSFlux.root
> -rw-r--r-- 1 gxproj4 halld 129K Jun 27 22:15 tree_fcal_hadronic_eff.root
> -rw-r--r-- 1 gxproj4 halld 7.2K Jun 27 22:15 tree_bcal_hadronic_eff.root
> -rw-r--r-- 1 gxproj4 halld 37M Jun 27 22:15 hd_root.root
> -rw-r--r-- 1 gxproj4 halld 308K Jun 27 22:16 hd_rawdata_101586_000.sync.evio
> -rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:16 hd_rawdata_101586_000.DIRC-LED.evio
> -rw-r--r-- 1 gxproj4 halld 127M Jun 27 22:18 hd_rawdata_101586_000.npp_2pi0.evio
> -rw-r--r-- 1 gxproj4 halld 106M Jun 27 22:18 hd_rawdata_101586_000.npp_2g.evio
> -rw-r--r-- 1 gxproj4 halld 19M Jun 27 22:18 hd_rawdata_101586_000.FCAL-LED.evio
> -rw-r--r-- 1 gxproj4 halld 180M Jun 27 22:21 hd_rawdata_101586_000.random.evio
> -rw-r--r-- 1 gxproj4 halld 2.7G Jun 27 22:21 hd_rawdata_101586_000.ps.evio
> -rw-r--r-- 1 gxproj4 halld 1.1M Jun 27 22:27 tree_tof_eff.root
> ------------------------------------------------------------------------
> *From:* Igal Jaegle <ijaegle at jlab.org>
> *Sent:* Wednesday, June 26, 2024 10:38 AM
> *To:* Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org
> <halld-cpp at jlab.org>
> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: CPP REST production
> PERLMUTTER is down for a maintenance day, but tomorrow I can grab
> the other logs.
>
> tks ig.
> ------------------------------------------------------------------------
> *From:* Alexander Austregesilo <aaustreg at jlab.org>
> *Sent:* Wednesday, June 26, 2024 10:32 AM
> *To:* Igal Jaegle <ijaegle at jlab.org>; halld-cpp at jlab.org
> <halld-cpp at jlab.org>
> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: CPP REST production
>
> Hi Igal,
>
> This failure rate is pretty bad. Can you point me to the log files of
> the failed jobs? The messages you sent concerning root dictionaries
> and ps_counts_thresholds are not fatal; they will not cause failures.
> You should still confirm with Sasha whether the ps_counts_thresholds
> are important.
>
>
> The file /work/halld/home/gxproj4/public/ForAlexAndSean/std.out seems
> to be cut off. We may want to add this option to the config file to
> suppress the printout of the number of processed events:
>
>
> JANA:BATCH_MODE 1
>
>
> You can find my output files here:
> /work/halld2/home/aaustreg/Analysis/cpp/REST/
>
> I don't have the logs, but all files were closed at exactly the same time.
>
>
> Cheers,
>
> Alex
>
>
>
>
> On 6/26/24 08:59, Igal Jaegle wrote:
>> Alex,
>>
>> Could you provide the path to your output files and most importantly
>> the logs?
>>
>> The results of the test on NERSC for the same run are as follows:
>>
>> 108 of the 126 evio files were cooked properly; the 18 failures
>> correspond to a failure rate of about 14%, which is way too much to
>> proceed with the cooking.
>>
>> tks ig.
>> ------------------------------------------------------------------------
>> *From:* Alexander Austregesilo <aaustreg at jlab.org>
>> *Sent:* Monday, June 24, 2024 6:06 PM
>> *To:* halld-cpp at jlab.org <halld-cpp at jlab.org>
>> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Igal Jaegle <ijaegle at jlab.org>; Sean Dobbs <sdobbs at jlab.org>
>> *Subject:* CPP REST production
>> Dear Colleagues,
>>
>> I processed a single file of a typical CPP production run on the Pb
>> target. Here is a list of all files and their sizes that will be
>> produced during the REST production:
>>
>> 1.7G hd_rawdata_101586_000.ps.evio
>> 1.2G dana_rest.hddm
>> 495M hd_rawdata_101586_000.ctof.evio
>> 160M hd_rawdata_101586_000.cpp_2c.evio
>> 115M hd_rawdata_101586_000.random.evio
>> 93M tree_PSFlux.root
>> 90M hd_rawdata_101586_000.npp_2pi0.evio
>> 72M hd_rawdata_101586_000.npp_2g.evio
>> 49M tree_TPOL.root
>> 42M converted_random.hddm
>> 41M hd_root.root
>> 16M hd_rawdata_101586_000.FCAL-LED.evio
>> 13M hd_rawdata_101586_000.BCAL-LED.evio
>> 1.1M tree_tof_eff.root
>> 164K tree_fcal_hadronic_eff.root
>> 105K hd_rawdata_101586_000.DIRC-LED.evio
>> 105K hd_rawdata_101586_000.CCAL-LED.evio
>> 94K hd_rawdata_101586_000.sync.evio
>> 24K tree_TS_scaler.root
>> 24K tree_bcal_hadronic_eff.root
>> 24K syncskim.root
>>
>> The trigger skims for ps (with the thick converter) and ctof are quite
>> large. Do we actually need them, or were they perhaps already produced
>> during the calibration stages?
>>
>> As far as I understand, Igal has started a test of the REST production
>> at NERSC. We are getting closer to launch!
>>
>> Cheers,
>>
>> Alex
>>
>>
>>
>>
>>
--
Alexander Austregesilo
Staff Scientist - Experimental Nuclear Physics
Thomas Jefferson National Accelerator Facility
Newport News, VA
aaustreg at jlab.org
(757) 269-6982