[Halld-cpp] CPP REST production
Alexander Austregesilo
aaustreg at jlab.org
Fri Jun 28 13:10:45 EDT 2024
Maybe you can run another quick test with the updated config file and
launch_nersc.py script over the weekend?
I may have a new release on Monday.
On 6/28/24 13:04, Igal Jaegle wrote:
> No problem from my side. Just let me know when they are loaded so that
> I can start the production a day or two later.
>
> tks ig.
>
> ------------------------------------------------------------------------
> *From:* Ilya Larin <ilarin at jlab.org>
> *Sent:* Friday, June 28, 2024 1:01 PM
> *To:* Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org
> <halld-cpp at jlab.org>; Igal Jaegle <ijaegle at jlab.org>
> *Cc:* Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: CPP REST production
>
> I have a few more tables for dead counters, if you think it would be good
> to update them.
>
> Ilya
>
> ------------------------------------------------------------------------
> *From:* Halld-cpp <halld-cpp-bounces at jlab.org> on behalf of Igal
> Jaegle via Halld-cpp <halld-cpp at jlab.org>
> *Sent:* Friday, June 28, 2024 13:00
> *To:* Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org
> <halld-cpp at jlab.org>
> *Cc:* Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: [Halld-cpp] CPP REST production
> OK, so I have the green light to start the production, minus the ctof & ps
> evio skims?
>
> tks ig.
> ------------------------------------------------------------------------
> *From:* Alexander Austregesilo <aaustreg at jlab.org>
> *Sent:* Friday, June 28, 2024 12:15 PM
> *To:* Igal Jaegle <ijaegle at jlab.org>; halld-cpp at jlab.org
> <halld-cpp at jlab.org>
> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
> *Subject:* Re: CPP REST production
>
> Excellent. Something must be wrong on my side with the sizes. When I
> list my directory I get this output, which seems consistent with yours:
>
>
> -rw-r--r-- 1 aaustreg halld 43688763 Jun 25 15:59 converted_random.hddm
> -rw-r--r-- 1 aaustreg halld 1244538131 Jun 25 15:59 dana_rest.hddm
> -rw-r--r-- 1 aaustreg halld 16751296 Jun 25 15:59
> hd_rawdata_101586_000.BCAL-LED.evio
> -rw-r--r-- 1 aaustreg halld 322128 Jun 25 15:59
> hd_rawdata_101586_000.CCAL-LED.evio
> -rw-r--r-- 1 aaustreg halld 237368676 Jun 25 15:59
> hd_rawdata_101586_000.cpp_2c.evio
> -rw-r--r-- 1 aaustreg halld 814874380 Jun 25 15:59
> hd_rawdata_101586_000.ctof.evio
> -rw-r--r-- 1 aaustreg halld 322128 Jun 25 15:59
> hd_rawdata_101586_000.DIRC-LED.evio
> -rw-r--r-- 1 aaustreg halld 19690620 Jun 25 15:59
> hd_rawdata_101586_000.FCAL-LED.evio
> -rw-r--r-- 1 aaustreg halld 110784828 Jun 25 15:59
> hd_rawdata_101586_000.npp_2g.evio
> -rw-r--r-- 1 aaustreg halld 132611736 Jun 25 15:59
> hd_rawdata_101586_000.npp_2pi0.evio
> -rw-r--r-- 1 aaustreg halld 2804807604 Jun 25 15:59
> hd_rawdata_101586_000.ps.evio
> -rw-r--r-- 1 aaustreg halld 188496932 Jun 25 15:59
> hd_rawdata_101586_000.random.evio
> -rw-r--r-- 1 aaustreg halld 314564 Jun 25 15:59
> hd_rawdata_101586_000.sync.evio
> -rw-r--r-- 1 aaustreg halld 44638598 Jun 25 15:59 hd_root.root
> -rw-r--r-- 1 aaustreg halld 1329 Jun 27 14:59
> jana_recon_2022_05_ver01.config
> -rw-r--r-- 1 aaustreg halld 1343 Jun 26 15:38
> jana_recon_2022_05_ver01.config~
> -rw-r--r-- 1 aaustreg halld 7594 Jun 25 15:59 syncskim.root
> -rw-r--r-- 1 aaustreg halld 7370 Jun 25 15:59
> tree_bcal_hadronic_eff.root
> -rw-r--r-- 1 aaustreg halld 130996 Jun 25 15:59
> tree_fcal_hadronic_eff.root
> -rw-r--r-- 1 aaustreg halld 96654396 Jun 25 15:59 tree_PSFlux.root
> -rw-r--r-- 1 aaustreg halld 1063333 Jun 25 15:59 tree_tof_eff.root
> -rw-r--r-- 1 aaustreg halld 50428784 Jun 25 15:59 tree_TPOL.root
> -rw-r--r-- 1 aaustreg halld 8617 Jun 25 15:59 tree_TS_scaler.root
>
>
> But when I ask the shell for the size of individual files, I get a
> different result:
>
>
> ls -s hd_rawdata_101586_000.ps.evio
> 1743356 hd_rawdata_101586_000.ps.evio
>
>
> That is super weird, but I don't think any of the lock fixes have
> anything to do with that. It is a difference between disk allocation
> and byte count.
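>
> For example, one quick way to see both numbers for the same file (just a
> sketch, assuming GNU coreutils):
>
> # apparent size in bytes vs. blocks actually allocated on disk
> stat -c '%n: %s bytes, %b blocks of %B bytes' hd_rawdata_101586_000.ps.evio
> du -h --apparent-size hd_rawdata_101586_000.ps.evio   # byte count, matches ls -l
> du -h hd_rawdata_101586_000.ps.evio                   # allocated space, matches ls -s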
>
>
> From my side, I would just update the config file on the group disk
> to get rid of the ctof and ps skims.
>
>
> I also noticed that you have tree_sc_eff.root as an output file in the
> launch_nersc.py script, even though it is not produced by the job.
> Maybe you want to update this list.
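>
> To locate the entry quickly, something like this should do (a sketch,
> assuming the output file list is spelled out literally in the script):
>
> grep -n "tree_sc_eff" launch_nersc.py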
>
>
>
> On 6/28/24 00:09, Igal Jaegle wrote:
>> The second attempt worked: the job took slightly more than 3 h with a
>> 100% success rate, but the skims are almost all larger than the ones
>> produced locally. So there is still an issue.
>>
>> The files are on volatile and will be moved to cache tomorrow:
>>
>> /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/
>>
>> tks ig.
>>
>> ls -lrth
>> /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/FILE000/RUN101586/FILE000/
>> total 5.2G
>> -rw-r--r-- 1 gxproj4 halld 784 Jun 25 05:42 tree_sc_eff.root
>> -rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:13
>> hd_rawdata_101586_000.CCAL-LED.evio
>> -rw-r--r-- 1 gxproj4 halld 16M Jun 27 22:13
>> hd_rawdata_101586_000.BCAL-LED.evio
>> -rw-r--r-- 1 gxproj4 halld 7.5K Jun 27 22:13 syncskim.root
>> -rw-r--r-- 1 gxproj4 halld 174K Jun 27 22:13 job_info_101586_000.tgz
>> -rw-r--r-- 1 gxproj4 halld 227M Jun 27 22:14
>> hd_rawdata_101586_000.cpp_2c.evio
>> -rw-r--r-- 1 gxproj4 halld 778M Jun 27 22:14
>> hd_rawdata_101586_000.ctof.evio
>> -rw-r--r-- 1 gxproj4 halld 42M Jun 27 22:15
>> converted_random_101586_000.hddm
>> -rw-r--r-- 1 gxproj4 halld 49M Jun 27 22:15 tree_TPOL.root
>> -rw-r--r-- 1 gxproj4 halld 8.5K Jun 27 22:15 tree_TS_scaler.root
>> -rw-r--r-- 1 gxproj4 halld 1.2G Jun 27 22:15 dana_rest.hddm
>> -rw-r--r-- 1 gxproj4 halld 93M Jun 27 22:15 tree_PSFlux.root
>> -rw-r--r-- 1 gxproj4 halld 129K Jun 27 22:15 tree_fcal_hadronic_eff.root
>> -rw-r--r-- 1 gxproj4 halld 7.2K Jun 27 22:15 tree_bcal_hadronic_eff.root
>> -rw-r--r-- 1 gxproj4 halld 37M Jun 27 22:15 hd_root.root
>> -rw-r--r-- 1 gxproj4 halld 308K Jun 27 22:16
>> hd_rawdata_101586_000.sync.evio
>> -rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:16
>> hd_rawdata_101586_000.DIRC-LED.evio
>> -rw-r--r-- 1 gxproj4 halld 127M Jun 27 22:18
>> hd_rawdata_101586_000.npp_2pi0.evio
>> -rw-r--r-- 1 gxproj4 halld 106M Jun 27 22:18
>> hd_rawdata_101586_000.npp_2g.evio
>> -rw-r--r-- 1 gxproj4 halld 19M Jun 27 22:18
>> hd_rawdata_101586_000.FCAL-LED.evio
>> -rw-r--r-- 1 gxproj4 halld 180M Jun 27 22:21
>> hd_rawdata_101586_000.random.evio
>> -rw-r--r-- 1 gxproj4 halld 2.7G Jun 27 22:21
>> hd_rawdata_101586_000.ps.evio
>> -rw-r--r-- 1 gxproj4 halld 1.1M Jun 27 22:27 tree_tof_eff.root
>> ------------------------------------------------------------------------
>> *From:* Igal Jaegle <ijaegle at jlab.org>
>> *Sent:* Wednesday, June 26, 2024 10:38 AM
>> *To:* Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org
>> <halld-cpp at jlab.org>
>> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
>> *Subject:* Re: CPP REST production
>> Perlmutter is down for a maintenance day, but I can grab the other logs
>> tomorrow.
>>
>> tks ig.
>> ------------------------------------------------------------------------
>> *From:* Alexander Austregesilo <aaustreg at jlab.org>
>> *Sent:* Wednesday, June 26, 2024 10:32 AM
>> *To:* Igal Jaegle <ijaegle at jlab.org>; halld-cpp at jlab.org
>> <halld-cpp at jlab.org>
>> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
>> *Subject:* Re: CPP REST production
>>
>> Hi Igal,
>>
>> This failure rate is pretty bad. Can you point me to the log files of
>> the failed jobs? The messages you sent concerning root dictionaries
>> and ps_counts_thresholds are not fatal; they will not cause any
>> failures. You should still confirm with Sasha whether the
>> ps_counts_thresholds are important.
>>
>>
>> The file /work/halld/home/gxproj4/public/ForAlexAndSean/std.out seems
>> to be cut off. We may want to add this option to the config file to
>> suppress the printout of the number of processed events:
>>
>>
>> JANA:BATCH_MODE 1
>>
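>> For example, one way to add it (just a sketch, assuming the usual
>> one-parameter-per-line JANA config format and that the key is not set yet):
>>
>> echo "JANA:BATCH_MODE 1" >> jana_recon_2022_05_ver01.config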
>>
>> You can find my output files here:
>> /work/halld2/home/aaustreg/Analysis/cpp/REST/
>>
>> I don't have the logs, but all files were closed at exactly the same
>> time.
>>
>>
>> Cheers,
>>
>> Alex
>>
>>
>>
>>
>> On 6/26/24 08:59, Igal Jaegle wrote:
>>> Alex,
>>>
>>> Could you provide the path to your output files and most importantly
>>> the logs?
>>>
>>> The results of the test on NERSC for the same run are as follows:
>>>
>>> 108 of the 126 evio files were cooked properly, i.e. a ~14% failure
>>> rate, which is way too much to proceed with the cooking.
>>>
>>> tks ig.
>>> ------------------------------------------------------------------------
>>> *From:* Alexander Austregesilo <aaustreg at jlab.org>
>>> *Sent:* Monday, June 24, 2024 6:06 PM
>>> *To:* halld-cpp at jlab.org <halld-cpp at jlab.org>
>>> *Cc:* Naomi Jarvis <nsj at cmu.edu>; Igal Jaegle <ijaegle at jlab.org>;
>>> Sean Dobbs <sdobbs at jlab.org>
>>> *Subject:* CPP REST production
>>> Dear Colleagues,
>>>
>>> I processed a single file of a typical CPP production run on the Pb
>>> target. Here is a list of all files, with their sizes, that will be
>>> produced during the REST production:
>>>
>>> 1.7G hd_rawdata_101586_000.ps.evio
>>> 1.2G dana_rest.hddm
>>> 495M hd_rawdata_101586_000.ctof.evio
>>> 160M hd_rawdata_101586_000.cpp_2c.evio
>>> 115M hd_rawdata_101586_000.random.evio
>>> 93M tree_PSFlux.root
>>> 90M hd_rawdata_101586_000.npp_2pi0.evio
>>> 72M hd_rawdata_101586_000.npp_2g.evio
>>> 49M tree_TPOL.root
>>> 42M converted_random.hddm
>>> 41M hd_root.root
>>> 16M hd_rawdata_101586_000.FCAL-LED.evio
>>> 13M hd_rawdata_101586_000.BCAL-LED.evio
>>> 1.1M tree_tof_eff.root
>>> 164K tree_fcal_hadronic_eff.root
>>> 105K hd_rawdata_101586_000.DIRC-LED.evio
>>> 105K hd_rawdata_101586_000.CCAL-LED.evio
>>> 94K hd_rawdata_101586_000.sync.evio
>>> 24K tree_TS_scaler.root
>>> 24K tree_bcal_hadronic_eff.root
>>> 24K syncskim.root
>>>
>>> The trigger skims for ps (with thick converter) and ctof are quite
>>> large. Do we actually need them or were they maybe already produced
>>> during the calibration stages?
>>>
>>> As far as I understand, Igal has started a test of the REST production
>>> at NERSC. We are getting closer to launch!
>>>
>>> Cheers,
>>>
>>> Alex
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Alexander Austregesilo
>>>
>>> Staff Scientist - Experimental Nuclear Physics
>>> Thomas Jefferson National Accelerator Facility
>>> Newport News, VA
>>> aaustreg at jlab.org
>>> (757) 269-6982
>>>
>> --
>> Alexander Austregesilo
>>
>> Staff Scientist - Experimental Nuclear Physics
>> Thomas Jefferson National Accelerator Facility
>> Newport News, VA
>> aaustreg at jlab.org
>> (757) 269-6982
> --
> Alexander Austregesilo
>
> Staff Scientist - Experimental Nuclear Physics
> Thomas Jefferson National Accelerator Facility
> Newport News, VA
> aaustreg at jlab.org
> (757) 269-6982
--
Alexander Austregesilo
Staff Scientist - Experimental Nuclear Physics
Thomas Jefferson National Accelerator Facility
Newport News, VA
aaustreg at jlab.org
(757) 269-6982