[clas12_rgk] [EXTERNAL] Re: RG-K PASS1 cooking status

Francois-Xavier Girod fxgirod at jlab.org
Fri Jul 31 17:32:13 EDT 2020


Dear Raffaella

To clarify, I do not think it matters much to RGK whether we get our data
cooking complete in 4 days or even a couple of weeks.
This is not what I am mainly concerned about right now.

I am concerned about how we will manage the situation when 5 or 6 run
groups simultaneously want not only to recook, but also to recalibrate.
At that point in time, I think it will be essential that we be able to
better measure and tune the usage of different accounts.

For instance, I would like to know more precisely how many CPU hours were
dedicated last week to RGK vs RGB.
I am not aware of a simple way to get this information right now. If there
is a tool that can already provide this answer, I would appreciate it if
you could point me to it.
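
As a rough illustration of the kind of per-account summary I have in mind
(a sketch only: the account names and numbers below are made up, and I am
assuming that per-job records of account and CPU seconds could be exported
from the farm accounting in some form):

```python
from collections import defaultdict


def cpu_hours_by_account(records):
    """Sum CPU time per account.

    `records` is an iterable of (account, cpu_seconds) pairs. This record
    layout is hypothetical, not the format of any existing farm tool.
    """
    totals = defaultdict(float)
    for account, cpu_seconds in records:
        totals[account] += cpu_seconds / 3600.0  # seconds -> hours
    return dict(totals)


# Made-up job records, for illustration only
sample = [("rgk", 7200), ("rgb", 3600), ("rgk", 1800)]
usage = cpu_hours_by_account(sample)
for account, hours in sorted(usage.items()):
    print(f"{account}: {hours:.1f} CPU hours")
```

Restricting the input records to a given week would give the RGK vs RGB
comparison mentioned above.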

At the moment, when I check this page for instance
https://scicomp.jlab.org/scicomp/#/usageStat
it gives me the overall farm usage for hallb-pro without any further detail.
The same is true for this page
https://scicomp.jlab.org/scicomp/#/jobHistory

Also, being able to create such types of graphs may help us better
understand how the scheduler itself functions, and whether it even makes
sense to adjust priorities if a group requests it.

Best regards
FX

On Fri, Jul 31, 2020 at 10:49 PM Raffaella De Vita <
Raffaella.Devita at ge.infn.it> wrote:

> Dear FX,
> We have the capability of monitoring the resources used by individual
> accounts as part of the SciComp tools.  These are tools accessible to
> everyone from the SciComp page. The monitoring is indicating the resources
> used by RG-K over the last week largely exceed what was used by RG-B. I
> don’t think we are lacking in terms of tools here and I’m not sure I
> understand the motivation of this concern.
> We can discuss this further but I’m not sure this aspect is really central
> to decide how the RG-K and RG-B cooking should continue. The plan defined
> by the CCC for the data processing was 1) to run in parallel for about a
> week, while RGK was still in a testing phase, 2) continue with RG-K alone
> until the 50% was reached, 3) continue in parallel till the end. It seems
> to me that we basically skipped 1 because the fairshare algorithm did what
> it is designed to do, we are in 2, and the natural thing would be to
> continue with 3. This would be compliant with the CCC indications and the
> most efficient solution from the technical point of view.
> Obviously the CCC can reconsider the situation and provide different
> indications: I have cc’ed Kyungseon and Marco if they want to comment.
> Best regards,
> Raffaella
>
>
> On 31 Jul 2020, at 16:12, Francois-Xavier Girod <fxgirod at jlab.org> wrote:
>
> Dear Raffaella
>
> > During last week, the RG-B cooking, even if not fully stopped,
> progressed at a very slow rate because the farm fairshare algorithm favored
> the RG-K account that was not used in a while. Because of this, RG-K has
> been basically running alone for most of the time as for example it’s
> happening right now.
>
> Do we have the tools to monitor how many CPU hours were used by what
> account?
> If we do not have these tools, is the only monitoring available the
> throughput accomplished by one group vs another?
>
> It seems to me that this is not a good situation for the collaboration. We
> should anticipate that there will be requests in the future to cook again,
> possibly even RGA, once improvements to the software are released, such
> as ongoing work on the tracking.
>
> I had understood that the RGK - RGB parallel cooking was a dry run to
> improve our preparedness against these issues. If we are not making
> progress on understanding the details of parallel cooking, it is unclear
> what the point of refusing our request is. Whether RGK gets 100% of
> the resources for 4 days or 50% of the resources for 8 days, or even
> 25% of the resources for 16 days should make no difference whatsoever to
> RGB, unless they will be done in less than 16 days.
>
> What is essential however is that we develop better tools to monitor the
> shared usage of resources, because these issues will only get worse.
>
> Best regards
> FX
>
> On Fri, Jul 31, 2020 at 10:05 PM Raffaella De Vita <
> Raffaella.Devita at ge.infn.it> wrote:
>
>> Dear Annalisa,
>> As you mentioned the data processing has been progressing very well and,
>> in about a week, we are already very close to 50% of the whole RG-K with the
>> 7.5 GeV almost completed and the 6.5 GeV already started. At this pace, it
>> looks like the 50% will be reached and exceeded over the weekend.
>>
>> During last week, the RG-B cooking, even if not fully stopped, progressed
>> at a very slow rate because the farm fairshare algorithm favored the RG-K
>> account that was not used in a while. Because of this, RG-K has been
>> basically running alone for most of the time as for example it’s happening
>> right now.
>>
>> Given this and considering the overall status, I think the 50% goal will
>> be exceeded shortly, entering what was defined as phase 3, where both RG-K
>> and RG-B can continue data processing in parallel.
>> Best regards,
>>         Raffaella
>>
>>
>> > On 31 Jul 2020, at 13:57, Annalisa D'Angelo <
>> annalisa.dangelo at roma2.infn.it> wrote:
>> >
>> > Dear Nathan and Raffaella,
>> >
>> > as you may see from the monitoring information at:
>> >
>> > https://clas12mon.jlab.org/files/?RGK7.5GeV
>> >
>> > the pass1 cooking process of 7.5 GeV data started last Friday night,
>> July 24th, in parallel with RG-B.
>> >
>> > It has been going quite efficiently this week, so that we have already
>> produced 88% of RG-K data at 7.5 GeV, corresponding to 44% of the full RG-K
>> set of collected data.
>> >
>> > The agreement was that RG-K would run in parallel with RG-B for one
>> week, and then alone for about another week to obtain 50% of the processed
>> data.
>> >
>> > How should we interpret the next step?  May we run alone?
>> >
>> > Thank you for your help and support, in particular to Nathan, who
>> kindly agreed to launch and monitor the 6.5 GeV work flow.
>> >
>> > All the best
>> >
>> > Annalisa
>> >
>> > --
>> >
>> > ================================================
>> > Prof. Annalisa D'Angelo
>> > Dip. Fisica, Universita' di Roma "Tor Vergata"
>> > Istituto Nazionale di Fisica Nucleare
>> > Sezione di Roma Tor Vergata, Rome Italy
>> > email:annalisa.dangelo at roma2.infn.it
>> > Jefferson Laboratory, Newport News, VA USA
>> > Email: annalisa at jlab.org
>> > Tel: + 39 06 72594562
>> >
>> >
>> >
>>
>>
>>
>> _______________________________________________
>> clas12_rgk mailing list
>> clas12_rgk at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/clas12_rgk
>>
>
>