<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi All,<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>coincidentally (or not), Chip walked into my office this morning to talk about this topic and the CLAS12 job workflow in general. It may be a small compute load, but it is a significant I/O load because of the small files. I said that we would discuss it internally. I think it would be very useful to come up with a clear plan, since there seem to be several options and it is bothering people. </div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>Regards,</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>Graham<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Jul 3, 2018, at 11:20, Vardan Gyurjyan <<a href="mailto:gurjyan@jlab.org" class="">gurjyan@jlab.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="content-type" content="text/html; charset=utf-8" class=""><div dir="auto" class="">Hi,<div class="">I want to mention that decoding is done outside of the Clara framework. I do not see a reason not to perform decoding within the framework, which would provide multithreading and a lean memory footprint. </div><div class="">Best,</div><div class="">Vardan<br class=""><br class=""><div class="">Sent from my iPhone</div><div class=""><br class="">On Jul 3, 2018, at 5:05 AM, Andrew Puckett <<a href="mailto:puckett@jlab.org" class="">puckett@jlab.org</a>> wrote:<br class=""><br class=""></div><blockquote type="cite" class=""><div class=""><meta http-equiv="content-type" content="text/html; charset=utf-8" class="">Thanks for the clarification. If it is a small fraction of the CLAS12 computing footprint, as you say, then I agree that it probably doesn’t have a large effect on the other halls.<div class=""><br class=""></div><div class="">Cheers,</div><div class="">Andrew<br class=""><br class=""><div class=""><span style="background-color: rgba(255, 255, 255, 0);" class=""><a href="http://puckett.physics.uconn.edu/" class="">puckett.physics.uconn.edu</a></span></div><div class=""><br class="">On Jul 2, 2018, at 8:52 PM, Francois-Xavier Girod <<a href="mailto:fxgirod@jlab.org" class="">fxgirod@jlab.org</a>> wrote:<br class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="ltr" class="">Dear Andrew<br class=""><div class=""><br class=""></div><div class="">As I said, the issue with the decoding jobs’ memory footprint is that the CC monitors virtual memory as well as physical memory, and I do not think this issue applies to Geant (C++) jobs. </div><div class=""><br class=""></div><div class="">We have had numerous discussions with the CC about this, and they insist on monitoring virtual memory. 
This is (a bit) less crucial for CLAS12 recon multithreaded jobs, which take most or all of the individual nodes they run on and comprise the large bulk of CLAS12 resources compared to decoding, so I do not think this is actually a serious issue for the other halls.</div><div class=""><br class=""></div><div class="">Best regards</div><div class="">FX</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Tue, Jul 3, 2018 at 9:41 AM Andrew Puckett <<a href="mailto:puckett@jlab.org" class="">puckett@jlab.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto" class="">Hi FX, <div class=""><br class=""></div><div class="">That would seem to be inconsistent with this complaint I got a few years ago from Sandy Philpott about “excessive” memory usage of my single-threaded GEANT4 simulation jobs: </div><div class=""><br class=""></div><div class="">“<span style="background-color:rgba(255,255,255,0)" class="">Hi Andrew,<br class=""><br class="">We're trying to understand the 8GB memory requirement of your SBS farm jobs... The farm nodes are configured at 32 GB RAM for 24 cores, so 1.5 GB per core. Since your jobs request so much memory, it blocks other jobs' access to the systems. Why are these jobs such an exception to the farm job norm -- why do they need so much memory for just a single core job? Can your code run on multiple cores and use memory more efficiently?<br class=""><br class="">All insight helpful and appreciated, as other users' jobs are backed up in the queue although many cores sit idle.<br class=""><br class="">Regards,<br class="">Sandy”</span></div><div class=""><br class=""></div><div class="">What makes CLAS12 decoding jobs different in this regard? Unless something has changed since then? This question should probably be looked into at a higher level. If in fact these decode jobs are de facto forcing 5/6 cores per job to sit idle, that would be a significant issue for scientific computing for all four halls.</div><div class=""><br class=""></div><div class="">Best regards,</div><div class="">Andrew<br class=""><br class=""><div id="m_-7470791540029305082AppleMailSignature" class=""><span style="background-color:rgba(255,255,255,0)" class=""><a href="http://puckett.physics.uconn.edu/" target="_blank" class="">puckett.physics.uconn.edu</a></span></div><div class=""><br class="">On Jul 2, 2018, at 8:23 PM, Francois-Xavier Girod <<a href="mailto:fxgirod@jlab.org" target="_blank" class="">fxgirod@jlab.org</a>> wrote:<br class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div dir="ltr" class="">The decoding is not multithreaded. The decoding of several evio files merged into one hipo file currently does require this much memory; however, this is largely due to the CC insisting on monitoring virtual memory allocation. None of this is a problem; those are all features.<br class=""><div class=""><br class=""></div><div class="">That being said, I do not think those jobs grab 6 cores. 
They only need, and only use, one.</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Tue, Jul 3, 2018 at 9:18 AM Andrew Puckett <<a href="mailto:puckett@jlab.org" target="_blank" class="">puckett@jlab.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto" class="">The much bigger problem, which I thought the farm admins would’ve noticed first, appears to be that a single-core job is requesting a 9 GB memory allocation, which is wildly inefficient given the batch farm architecture of approximately 1.5 GB per core. Unless I’ve misunderstood something, or unless the job is actually multithreaded even though it appears not to be, each job with those parameters will grab six cores while using only one, causing five cores to sit idle ***per job***. While I am by no means an expert, I thought that the CLAS12 software framework was supposed to be fully multi-thread capable? I only bring it up as someone with a vested interest in the efficient use of the JLab scientific computing facilities...<br class=""><br class=""><div id="m_-7470791540029305082m_-8296534818626272229AppleMailSignature" class=""><span style="background-color:rgba(255,255,255,0)" class=""><a href="http://puckett.physics.uconn.edu/" target="_blank" class="">puckett.physics.uconn.edu</a></span></div><div class=""><br class="">On Jul 2, 2018, at 7:03 PM, Francois-Xavier Girod <<a href="mailto:fxgirod@jlab.org" target="_blank" class="">fxgirod@jlab.org</a>> wrote:<br class=""><br class=""></div><blockquote type="cite" class=""><div class=""><div class=""><div dir="auto" class="">The I/O for those jobs is defined as per CC guidelines; there is no “small” I/O. The hipo files are about 5 GB, merging 10 evio files together. I think the CC needs to have better diagnostics. </div></div><div class=""><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Tue, Jul 3, 2018 at 7:56 AM Harout Avakian <<a href="mailto:avakian@jlab.org" target="_blank" class="">avakian@jlab.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF" class=""><p class="">FYI</p><p class="">I understood that was fixed. FX could you please check what is
the problem.<br class="">
</p>
<div class="m_-7470791540029305082m_-8296534818626272229m_-5789663251165737513moz-forward-container">Harut<br class="">
<br class="">
-------- Forwarded Message --------
<table class="m_-7470791540029305082m_-8296534818626272229m_-5789663251165737513moz-email-headers-table" border="0" cellspacing="0" cellpadding="0">
<tbody class="">
<tr class="">
<th valign="BASELINE" align="RIGHT" nowrap="" class="">Subject:
</th>
<td class="">class12-2 jobs performing lots of small i/o</td>
</tr>
<tr class="">
<th valign="BASELINE" align="RIGHT" nowrap="" class="">Date: </th>
<td class="">Mon, 2 Jul 2018 08:58:04 -0400 (EDT)</td>
</tr>
<tr class="">
<th valign="BASELINE" align="RIGHT" nowrap="" class="">From: </th>
<td class="">Kurt Strosahl <a class="m_-7470791540029305082m_-8296534818626272229m_-5789663251165737513moz-txt-link-rfc2396E" href="mailto:strosahl@jlab.org" target="_blank"><strosahl@jlab.org></a></td>
</tr>
<tr class="">
<th valign="BASELINE" align="RIGHT" nowrap="" class="">To: </th>
<td class="">Harut Avagyan <a class="m_-7470791540029305082m_-8296534818626272229m_-5789663251165737513moz-txt-link-rfc2396E" href="mailto:avakian@jlab.org" target="_blank"><avakian@jlab.org></a></td>
</tr>
<tr class="">
<th valign="BASELINE" align="RIGHT" nowrap="" class="">CC: </th>
<td class="">sciops <a class="m_-7470791540029305082m_-8296534818626272229m_-5789663251165737513moz-txt-link-rfc2396E" href="mailto:sciops@jlab.org" target="_blank"><sciops@jlab.org></a></td>
</tr>
</tbody>
</table>
<br class="">
<br class="">
<pre class="">Harut,
There are a large number of clas12 jobs running through the farm under user clas12-2; these jobs are performing lots of small i/o.
An example of one of these jobs is:
Job Index: 55141495
User Name: clas12-2
Job Name: R4013_13
Project: clas12
Queue: prod64
Hostname: farm12021
CPU Req: 1 centos7 core requested
MemoryReq: 9 GB
Status: ACTIVE
You can see the small i/o by looking at: <a class="m_-7470791540029305082m_-8296534818626272229m_-5789663251165737513moz-txt-link-freetext" href="https://scicomp.jlab.org/scicomp/index.html#/lustre/users" target="_blank">https://scicomp.jlab.org/scicomp/index.html#/lustre/users</a>
w/r,
Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
</pre>
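<p class="">A minimal sketch of the scheduling arithmetic behind the memory concern discussed in this thread, assuming the approximately 1.5 GB of RAM per core (32 GB / 24 cores) quoted for the farm nodes; the figures are illustrative, not an official CC calculation:</p>
<pre class="">
# Sketch only: how a large memory request can tie up core slots on a shared farm node.
# Assumes ~1.5 GB of RAM per core (32 GB / 24 cores), as quoted elsewhere in this thread.
import math

MEM_PER_CORE_GB = 1.5   # assumed node ratio (32 GB / 24 cores, rounded)
mem_req_gb = 9          # MemoryReq of the example job above
cores_used = 1          # single-core decoding job

slots_blocked = math.ceil(mem_req_gb / MEM_PER_CORE_GB)   # ceil(9 / 1.5) = 6
idle_cores = slots_blocked - cores_used                   # 6 - 1 = 5

print(f"A {mem_req_gb} GB request ties up ~{slots_blocked} core slots; "
      f"about {idle_cores} cores sit idle per job.")
</pre>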
</div>
</div>
_______________________________________________<br class="">
Clas12_software mailing list<br class="">
<a href="mailto:Clas12_software@jlab.org" target="_blank" class="">Clas12_software@jlab.org</a><br class="">
<a href="https://mailman.jlab.org/mailman/listinfo/clas12_software" rel="noreferrer" target="_blank" class="">https://mailman.jlab.org/mailman/listinfo/clas12_software</a></blockquote></div></div>
</div></blockquote><blockquote type="cite" class=""><div class=""><span class="">_______________________________________________</span><br class=""><span class="">Clas12_software mailing list</span><br class=""><span class=""><a href="mailto:Clas12_software@jlab.org" target="_blank" class="">Clas12_software@jlab.org</a></span><br class=""><span class=""><a href="https://mailman.jlab.org/mailman/listinfo/clas12_software" target="_blank" class="">https://mailman.jlab.org/mailman/listinfo/clas12_software</a></span></div></blockquote></div></blockquote></div>
</div></blockquote></div></div></blockquote></div>
</div></blockquote></div></div></blockquote><blockquote type="cite" class=""><div class=""><span class="">_______________________________________________</span><br class=""><span class="">Clas12_software mailing list</span><br class=""><span class=""><a href="mailto:Clas12_software@jlab.org" class="">Clas12_software@jlab.org</a></span><br class=""><span class=""><a href="https://mailman.jlab.org/mailman/listinfo/clas12_software" class="">https://mailman.jlab.org/mailman/listinfo/clas12_software</a></span></div></blockquote></div></div>_______________________________________________<br class="">Clas12_software mailing list<br class=""><a href="mailto:Clas12_software@jlab.org" class="">Clas12_software@jlab.org</a><br class="">https://mailman.jlab.org/mailman/listinfo/clas12_software</div></blockquote></div><br class=""></div></body></html>