[Halld-cpp] CPP REST production
Ilya Larin
ilarin at jlab.org
Fri Jun 28 13:39:19 EDT 2024
I had the impression that blocks with bad quality are switched off at the hit-reconstruction stage.
Maybe I am wrong:
FCAL/DFCALHit_factory.cc, lines 178-181:

    // throw away hits from bad or noisy channels
    fcal_quality_state quality =
        static_cast<fcal_quality_state>(block_qualities[digihit->row][digihit->column]);
    if ( (quality==BAD) || (quality==NOISY) ) continue;
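For readers outside the FCAL code, here is a minimal standalone sketch of that filtering step. The enum values, the DigiHit struct, and the filter_hits function are simplified stand-ins for the halld_recon types (in the real factory the quality table comes from the calibration constants), not the actual factory interface:

    #include <vector>

    // Simplified stand-ins for fcal_quality_state and the per-block
    // quality table used in DFCALHit_factory (values are illustrative).
    enum fcal_quality_state { GOOD = 0, BAD = 1, NOISY = 2 };

    struct DigiHit { int row; int column; };

    // Keep only hits whose channel quality is GOOD; BAD and NOISY
    // channels are dropped, mirroring the check quoted above.
    std::vector<DigiHit> filter_hits(const std::vector<DigiHit>& digihits,
                                     const std::vector<std::vector<int>>& block_qualities)
    {
        std::vector<DigiHit> kept;
        for (const auto& hit : digihits) {
            auto quality = static_cast<fcal_quality_state>(
                block_qualities[hit.row][hit.column]);
            if (quality == BAD || quality == NOISY) continue; // drop bad/noisy channels
            kept.push_back(hit);
        }
        return kept;
    }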
Ilya
________________________________
From: Igal Jaegle <ijaegle at jlab.org>
Sent: Friday, June 28, 2024 13:04
To: Ilya Larin <ilarin at jlab.org>; Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org <halld-cpp at jlab.org>
Cc: Sean Dobbs <sdobbs at jlab.org>
Subject: Re: CPP REST production
No problem from my side. Just let me know when they are loaded so that I can start the production a day or two later.
tks ig.
________________________________
From: Ilya Larin <ilarin at jlab.org>
Sent: Friday, June 28, 2024 1:01 PM
To: Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org <halld-cpp at jlab.org>; Igal Jaegle <ijaegle at jlab.org>
Cc: Sean Dobbs <sdobbs at jlab.org>
Subject: Re: CPP REST production
I have a few more tables for dead counters, if you think it would be good to update them.
Ilya
________________________________
From: Halld-cpp <halld-cpp-bounces at jlab.org> on behalf of Igal Jaegle via Halld-cpp <halld-cpp at jlab.org>
Sent: Friday, June 28, 2024 13:00
To: Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org <halld-cpp at jlab.org>
Cc: Sean Dobbs <sdobbs at jlab.org>
Subject: Re: [Halld-cpp] CPP REST production
ok so I have the green light to start the production - ctof & ps evio skims?
tks ig.
________________________________
From: Alexander Austregesilo <aaustreg at jlab.org>
Sent: Friday, June 28, 2024 12:15 PM
To: Igal Jaegle <ijaegle at jlab.org>; halld-cpp at jlab.org <halld-cpp at jlab.org>
Cc: Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
Subject: Re: CPP REST production
Excellent. Something must be wrong on my side with the sizes. When I list my directory I get this output, which seems consistent with yours:
-rw-r--r-- 1 aaustreg halld 43688763 Jun 25 15:59 converted_random.hddm
-rw-r--r-- 1 aaustreg halld 1244538131 Jun 25 15:59 dana_rest.hddm
-rw-r--r-- 1 aaustreg halld 16751296 Jun 25 15:59 hd_rawdata_101586_000.BCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 322128 Jun 25 15:59 hd_rawdata_101586_000.CCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 237368676 Jun 25 15:59 hd_rawdata_101586_000.cpp_2c.evio
-rw-r--r-- 1 aaustreg halld 814874380 Jun 25 15:59 hd_rawdata_101586_000.ctof.evio
-rw-r--r-- 1 aaustreg halld 322128 Jun 25 15:59 hd_rawdata_101586_000.DIRC-LED.evio
-rw-r--r-- 1 aaustreg halld 19690620 Jun 25 15:59 hd_rawdata_101586_000.FCAL-LED.evio
-rw-r--r-- 1 aaustreg halld 110784828 Jun 25 15:59 hd_rawdata_101586_000.npp_2g.evio
-rw-r--r-- 1 aaustreg halld 132611736 Jun 25 15:59 hd_rawdata_101586_000.npp_2pi0.evio
-rw-r--r-- 1 aaustreg halld 2804807604 Jun 25 15:59 hd_rawdata_101586_000.ps.evio
-rw-r--r-- 1 aaustreg halld 188496932 Jun 25 15:59 hd_rawdata_101586_000.random.evio
-rw-r--r-- 1 aaustreg halld 314564 Jun 25 15:59 hd_rawdata_101586_000.sync.evio
-rw-r--r-- 1 aaustreg halld 44638598 Jun 25 15:59 hd_root.root
-rw-r--r-- 1 aaustreg halld 1329 Jun 27 14:59 jana_recon_2022_05_ver01.config
-rw-r--r-- 1 aaustreg halld 1343 Jun 26 15:38 jana_recon_2022_05_ver01.config~
-rw-r--r-- 1 aaustreg halld 7594 Jun 25 15:59 syncskim.root
-rw-r--r-- 1 aaustreg halld 7370 Jun 25 15:59 tree_bcal_hadronic_eff.root
-rw-r--r-- 1 aaustreg halld 130996 Jun 25 15:59 tree_fcal_hadronic_eff.root
-rw-r--r-- 1 aaustreg halld 96654396 Jun 25 15:59 tree_PSFlux.root
-rw-r--r-- 1 aaustreg halld 1063333 Jun 25 15:59 tree_tof_eff.root
-rw-r--r-- 1 aaustreg halld 50428784 Jun 25 15:59 tree_TPOL.root
-rw-r--r-- 1 aaustreg halld 8617 Jun 25 15:59 tree_TS_scaler.root
But when I ask the shell for the size of individual files, I get a different result:
ls -s hd_rawdata_101586_000.ps.evio
1743356 hd_rawdata_101586_000.ps.evio
That is super weird, but I don't think any of the lock fixes have anything to do with that. It is a difference between disk allocation and byte count.
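For reference, a quick sketch of how to print both numbers for one file with POSIX stat(); the file name here is just one of the files from the listing above:

    #include <cstdio>
    #include <sys/stat.h>

    // Print the byte count (what "ls -l" shows) next to the disk
    // allocation (what "ls -s" reports; st_blocks is in 512-byte units).
    int main()
    {
        const char* path = "hd_rawdata_101586_000.ps.evio"; // example file from the listing
        struct stat st;
        if (stat(path, &st) != 0) {
            perror("stat");
            return 1;
        }
        std::printf("byte count      : %lld\n", (long long)st.st_size);
        std::printf("allocated bytes : %lld\n", (long long)st.st_blocks * 512LL);
        return 0;
    }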
From my side, I would just update the config file on the group disk to get rid of the ctof and ps skims.
I also noticed that you have tree_sc_eff.root as an output file in the launch_nersc.py script, even though it is not produced by the job. Maybe you want to update this list.
On 6/28/24 00:09, Igal Jaegle wrote:
The second attempt worked: the job took slightly more than 3 h with a 100% success rate, but the skims are almost all larger than the ones produced locally. So there is still an issue.
The files are on volatile and will be moved to cache tomorrow:
/volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/
tks ig.
ls -lrth /volatile/halld/offsite_prod/RunPeriod-2022-05/recon/ver99-perl/RUN101586/FILE000/RUN101586/FILE000/
total 5.2G
-rw-r--r-- 1 gxproj4 halld 784 Jun 25 05:42 tree_sc_eff.root
-rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:13 hd_rawdata_101586_000.CCAL-LED.evio
-rw-r--r-- 1 gxproj4 halld 16M Jun 27 22:13 hd_rawdata_101586_000.BCAL-LED.evio
-rw-r--r-- 1 gxproj4 halld 7.5K Jun 27 22:13 syncskim.root
-rw-r--r-- 1 gxproj4 halld 174K Jun 27 22:13 job_info_101586_000.tgz
-rw-r--r-- 1 gxproj4 halld 227M Jun 27 22:14 hd_rawdata_101586_000.cpp_2c.evio
-rw-r--r-- 1 gxproj4 halld 778M Jun 27 22:14 hd_rawdata_101586_000.ctof.evio
-rw-r--r-- 1 gxproj4 halld 42M Jun 27 22:15 converted_random_101586_000.hddm
-rw-r--r-- 1 gxproj4 halld 49M Jun 27 22:15 tree_TPOL.root
-rw-r--r-- 1 gxproj4 halld 8.5K Jun 27 22:15 tree_TS_scaler.root
-rw-r--r-- 1 gxproj4 halld 1.2G Jun 27 22:15 dana_rest.hddm
-rw-r--r-- 1 gxproj4 halld 93M Jun 27 22:15 tree_PSFlux.root
-rw-r--r-- 1 gxproj4 halld 129K Jun 27 22:15 tree_fcal_hadronic_eff.root
-rw-r--r-- 1 gxproj4 halld 7.2K Jun 27 22:15 tree_bcal_hadronic_eff.root
-rw-r--r-- 1 gxproj4 halld 37M Jun 27 22:15 hd_root.root
-rw-r--r-- 1 gxproj4 halld 308K Jun 27 22:16 hd_rawdata_101586_000.sync.evio
-rw-r--r-- 1 gxproj4 halld 315K Jun 27 22:16 hd_rawdata_101586_000.DIRC-LED.evio
-rw-r--r-- 1 gxproj4 halld 127M Jun 27 22:18 hd_rawdata_101586_000.npp_2pi0.evio
-rw-r--r-- 1 gxproj4 halld 106M Jun 27 22:18 hd_rawdata_101586_000.npp_2g.evio
-rw-r--r-- 1 gxproj4 halld 19M Jun 27 22:18 hd_rawdata_101586_000.FCAL-LED.evio
-rw-r--r-- 1 gxproj4 halld 180M Jun 27 22:21 hd_rawdata_101586_000.random.evio
-rw-r--r-- 1 gxproj4 halld 2.7G Jun 27 22:21 hd_rawdata_101586_000.ps.evio
-rw-r--r-- 1 gxproj4 halld 1.1M Jun 27 22:27 tree_tof_eff.root
________________________________
From: Igal Jaegle <ijaegle at jlab.org>
Sent: Wednesday, June 26, 2024 10:38 AM
To: Alexander Austregesilo <aaustreg at jlab.org>; halld-cpp at jlab.org <halld-cpp at jlab.org>
Cc: Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
Subject: Re: CPP REST production
PERLMUTTER is down due to a maintenance day. But tomorrow I can grab the other logs.
tks ig.
________________________________
From: Alexander Austregesilo <aaustreg at jlab.org>
Sent: Wednesday, June 26, 2024 10:32 AM
To: Igal Jaegle <ijaegle at jlab.org>; halld-cpp at jlab.org <halld-cpp at jlab.org>
Cc: Naomi Jarvis <nsj at cmu.edu>; Sean Dobbs <sdobbs at jlab.org>
Subject: Re: CPP REST production
Hi Igal,
This failure rate is pretty bad. Can you point me to the log files of the failed jobs? The messages you sent concerning root dictionaries and ps_counts_thresholds are not fatal; they will not cause failures. You should still confirm with Sasha whether the ps_counts_thresholds are important.
The file /work/halld/home/gxproj4/public/ForAlexAndSean/std.out seems to be cut off. We may want to add this option to the config file to suppress the printout of the number of processed events:
JANA:BATCH_MODE 1
You can find my output files here: /work/halld2/home/aaustreg/Analysis/cpp/REST/
I don't have the logs, but all files were closed at exactly the same time.
Cheers,
Alex
On 6/26/24 08:59, Igal Jaegle wrote:
Alex,
Could you provide the path to your output files and most importantly the logs?
The results of the test on NERSC for the same run are as follows:
108 evio files out of 126 were cooked properly (18 failed), i.e. a ~15% failure rate, which is way too much to proceed with the cooking.
tks ig.
________________________________
From: Alexander Austregesilo <aaustreg at jlab.org>
Sent: Monday, June 24, 2024 6:06 PM
To: halld-cpp at jlab.org <halld-cpp at jlab.org>
Cc: Naomi Jarvis <nsj at cmu.edu>; Igal Jaegle <ijaegle at jlab.org>; Sean Dobbs <sdobbs at jlab.org>
Subject: CPP REST production
Dear Colleagues,
I processed a single file of a typical CPP production run on the Pb target. Here is a list of all files, with their sizes, that will be produced during the REST production:
1.7G hd_rawdata_101586_000.ps.evio
1.2G dana_rest.hddm
495M hd_rawdata_101586_000.ctof.evio
160M hd_rawdata_101586_000.cpp_2c.evio
115M hd_rawdata_101586_000.random.evio
93M tree_PSFlux.root
90M hd_rawdata_101586_000.npp_2pi0.evio
72M hd_rawdata_101586_000.npp_2g.evio
49M tree_TPOL.root
42M converted_random.hddm
41M hd_root.root
16M hd_rawdata_101586_000.FCAL-LED.evio
13M hd_rawdata_101586_000.BCAL-LED.evio
1.1M tree_tof_eff.root
164K tree_fcal_hadronic_eff.root
105K hd_rawdata_101586_000.DIRC-LED.evio
105K hd_rawdata_101586_000.CCAL-LED.evio
94K hd_rawdata_101586_000.sync.evio
24K tree_TS_scaler.root
24K tree_bcal_hadronic_eff.root
24K syncskim.root
The trigger skims for ps (with thick converter) and ctof are quite
large. Do we actually need them or were they maybe already produced
during the calibration stages?
As far as I understand, Igal has started a test of the REST production
at NERSC. We are getting closer to launch!
Cheers,
Alex
--
Alexander Austregesilo
Staff Scientist - Experimental Nuclear Physics
Thomas Jefferson National Accelerator Facility
Newport News, VA
aaustreg at jlab.org
(757) 269-6982