[Halld-offline] effects of CCal mutex on scaling in jana

Thu Mar 7 09:21:34 EST 2019

Hi Richard,

  I have not done a scaling test with the mutex-protected Fortran code. You are of course correct that
this could eat into our efficiency when running with many threads. I think we will have a little time
though before we need to worry about it. The point in the code where the mutex is locked comes after
a check that there are a minimum number of DCCALHit objects. Since these only exist in the PrimEx
data at the moment, it will not affect the scaling when reconstructing GlueX data taken up to now.
If the CCAL is read out in the Fall GlueX run, we will need to worry about it then, but probably not too
much before we do a recon launch. This is because all other launches tend to use a smaller number of
threads (<=12).
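
As a rough illustration of why the check ahead of the lock matters, here is a minimal sketch of the pattern. It is not the actual halld_recon code: Hit, legacy_cluster_energy, legacy_mutex, kMinHits and process_event are all hypothetical names, and the threshold value is an assumption.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a CCAL hit; not the real DCCALHit class.
struct Hit { double energy; };

// Hypothetical stand-in for a non-thread-safe legacy routine.
// It writes to static state, so only one thread may be inside it at a time.
double legacy_cluster_energy(const std::vector<Hit>& hits) {
    static double scratch;  // shared state, the reason a lock is needed
    scratch = 0.0;
    for (const Hit& h : hits) scratch += h.energy;
    return scratch;
}

std::mutex legacy_mutex;             // serializes calls into the legacy routine
constexpr std::size_t kMinHits = 1;  // assumed threshold, for illustration only

// Returns true and fills `energy` only if the legacy routine was called.
bool process_event(const std::vector<Hit>& hits, double& energy) {
    // Cheap check first: events with too few hits return before the lock,
    // so they never contend with other threads for the mutex.
    if (hits.size() < kMinHits) return false;

    std::lock_guard<std::mutex> lock(legacy_mutex);  // released on return
    energy = legacy_cluster_energy(hits);
    return true;
}
```

With this layout, data without any CCAL hits never takes the lock, so the thread scaling for that data should be unchanged by the mutex.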

  I’ve added a scaling test to my list so we can see if this is an issue. The best we’ll be able to do, though,
is time the serial code and estimate the additional inefficiency since we don’t have any data with both
the CCAL and tracking chambers on with field. (Though I suppose you could simulate some for me ;) )

Regards,
-David

-------------------------------------------------------------
David Lawrence Ph.D.
Staff Scientist, Thomas Jefferson National Accelerator Facility
Newport News, VA
davidl at jlab.org
(757) 269-5567 W
(757) 746-6697 C

> On Mar 7, 2019, at 9:03 AM, Sean Dobbs <sdobbs at fsu.edu> wrote:
>
> Richard,
>
> Some anecdotal numbers:  Running with a few plugins for calibration &
> Compton analysis gives a reconstruction rate of ~400 Hz with one thread.
> Both 15 and 30 threads give ~2.8 kHz.  However, when we are running a
> monitoring launch, the limiting factor is filling all of the
> histograms, so the scaling is better (but the rate lower).  But
> pressure should still be applied to remove this Fortran code from
> halld_recon.
>
> ---Sean
>
> On Thu, Mar 7, 2019 at 8:38 AM Richard Jones <richard.t.jones at uconn.edu> wrote:
>>
>> Hello David,
>>
>> Do you have any measurements of the multi-thread scaling of our reconstruction code with this CCal mutex in place? Prior to this, you showed this very nice scaling up to >100 cores in this plot that was shown at the November review and elsewhere. We need to keep Amdahl's Law in mind here, which says that at some point it is the single-threaded piece of the code that limits performance, even if it is a relatively minor fraction of the total event processing. If there is a substantial change in the scaling behavior of our reconstruction code coming from this CCal mutex, it will give an idea how urgent the need is to multithread this portion of our codebase, where "multithreading" does not mean merely mutexing it.
>>
>> Knowing how everyone has more than they can handle on their plate, this tedious job of rewriting existing code (I know, I have done it plenty of times) may never get done. It would be helpful to know how much that matters, if at all.
>>
>> -Richard Jones
>> _______________________________________________
>> Halld-offline mailing list
>> Halld-offline at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/halld-offline