[Halld-offline] ENP consumption of disk space under /work
Chip Watson
watson at jlab.org
Thu Jun 8 11:42:49 EDT 2017
Mark,
Sorry, you don't get off that easy :P. You have to choose, and not
choosing means you choose what follows.
I'll offer one more option: we can use 10 TB drives instead of 8 TB.
Probably slightly longer rebuild times, but actually higher streaming
bandwidth. Cost will go up, maybe a few $K. I think we can find a way
to squeeze that out of the system.
The default choice, if no one else votes, is now this: the new /work
server is procured with 44 HGST He10 10 TB enterprise drives (top rated,
the new big brother to our He8 drives). If Physics is poor, we can defer
buying them the read-cache SSD to compensate for the higher price of the
10 TB drives (Graham, can you swing an extra $3K?).
  - 21 drives for ENP, 21 for LQCD, and 2 hot spares, in a 44-drive
    enclosure (SSDs in the hosts)
  - ENP and LQCD each get a total quota (enforced) of 108 TB
  - GlueX and CLAS-12 each get 44 TB; A and C each get 10 TB (enforced)
CLAS-12 says they can live within this, and C believes they are already
there. GlueX and A will need to figure out what can be moved into
/volatile.
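For concreteness: if the new server ends up ZFS-based (the RAID-Z2
stripes discussed below suggest it could be), the enforced quotas might
look something like this. A minimal sketch; the pool and dataset names
are invented for illustration:

    # Hypothetical layout; the "work" pool and dataset names are placeholders.
    zfs create -o quota=108T work/enp          # ENP's total (enforced) share
    zfs create -o quota=44T  work/enp/halld    # GlueX
    zfs create -o quota=44T  work/enp/clas12   # CLAS-12
    zfs create -o quota=10T  work/enp/halla    # Hall A
    zfs create -o quota=10T  work/enp/hallc    # Hall C
    zfs get -r quota work/enp                  # verify the enforced limits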
If requested, we can give you a modest-lifetime "pin" for /volatile, so
that some large data files that should not go to tape can have longer
lifetimes in Lustre /volatile than they can today.
Next year, for about $10K, we can add a second cascaded JBOD with one
more RAID stripe for ENP. We'll also need to spend $25K to replace the
Lustre storage that ages out in FY18 (or suffer the reduction).
regards,
Chip
On 6/8/17 9:40 AM, Mark Ito wrote:
>
> That was just an observation. I would not interpret it as a vote for
> anything... :-)
>
>
> On 06/08/2017 09:33 AM, Chip Watson wrote:
>>
>> I'll take this as a vote by GlueX to have more work and reduce cache.
>>
>> Do A,B,C concur?
>>
>>
>> On 6/8/17 9:28 AM, Mark Ito wrote:
>>>
>>> In my previous estimate, only 105 TB of the 278 TB cache portion is
>>> pinned. The unpinned part is presumably old files that should be
>>> gone, but have not been deleted since there happens to be no demand
>>> for the space. If we use 105 TB as our cache usage, then re-doing
>>> your estimate gives 555 TB, which means in 9 months we will have
>>> 270 TB of unused space. That would mean we have room to increase our
>>> usage without buying anything!
>>>
>>>
>>> On 06/07/2017 05:54 PM, Chip Watson wrote:
>>>>
>>>> Mark,
>>>>
>>>> I still need you to answer the question of how to further reduce
>>>> usage and how to configure. Your usage, as you report it, is about
>>>> 370 TB. Assuming that Hall B needs the same within 9 months, and
>>>> that A+C need half as much, that leads to a total of 925 TB, which
>>>> is more than Physics owns by 100 TB (NOT CURRENT USAGE, JUST A
>>>> PROJECTION BASED ON GLUEX USAGE).
>>>>
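A quick back-of-envelope check of that projection (my arithmetic,
restating the assumptions above):

    # Hall B matches GlueX's ~370 TB within 9 months; A+C together need half.
    echo $(( 370 + 370 + 370 / 2 ))   # -> 925 TB, ~100 TB over Physics' share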
>>>> There is also the question of how to split the storage budget.
>>>> Within budget, you can have half of a new JBOD: 21 disks configured
>>>> as 3 RAID-Z2 stripes of 5+2 disks at 8 TB each, thus 120 TB of data
>>>> capacity after parity, 108 TB in a file system, and 86 TB at an 80%
>>>> fill target -- for all of GlueX, CLAS-12, A, and C. If GlueX is 40%
>>>> of the total, that makes 35 TB, and you are still high by 70%.
>>>>
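Unpacking those capacity numbers (a back-of-envelope sketch; the ~10%
file-system overhead is my inference from the 120 -> 108 TB step):

    # 3 RAID-Z2 stripes of 5 data + 2 parity disks, using 8 TB drives:
    echo $(( 3 * 5 * 8 ))        # -> 120 TB data capacity after parity
    echo $(( 120 * 90 / 100 ))   # -> 108 TB in a file system (~10% overhead)
    echo $(( 108 * 80 / 100 ))   # -> 86 TB usable at the 80% fill target
    echo $(( 86 * 40 / 100 ))    # -> ~34 TB if GlueX is 40% of the total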
>>>> The other low-cost option is to re-purpose a 2016 Lustre node so
>>>> that /work is twice this size (one full JBOD), and GlueX can use
>>>> 70 TB as /work. But then you must reduce /cache + /volatile by a
>>>> comparable amount, since we have to pull a node out of production.
>>>> And this still isn't free, since we'll need a total of 4 RAID cards
>>>> instead of 2 to provide adequate performance, and we'll need to add
>>>> SSDs to the mix.
>>>>
>>>> So, in the absence of money (which clearly seems to be the case),
>>>> do you choose (a) to reduce your use of /work by 1.7x, or (b) to
>>>> reduce your use of /cache + /volatile by 25%? There is no middle
>>>> case.
>>>>
>>>> thanks,
>>>>
>>>> Chip
>>>>
>>>>
>>>> On 6/7/17 5:30 PM, Mark Ito wrote:
>>>>>
>>>>> Summarizing Hall D work disk usage (/work/halld only):
>>>>>
>>>>> o using du, today 2017-06-06, 59 TB
>>>>>
>>>>> o from our disk-management database, a couple of days ago,
>>>>> 2017-06-04, 86 TB
>>>>>
>>>>> I also know that one of our students got rid of about 20 TB of
>>>>> unneeded files yesterday. That accounts for part of the drop.
>>>>>
>>>>> We produce a report from that database
>>>>> <https://halldweb.jlab.org/disk_management/work_report.html> that
>>>>> is updated every few days.
>>>>>
>>>>> From the SciComp pages, Hall D is using 287 TB on cache and 21 TB
>>>>> on volatile.
>>>>>
>>>>> My view is that this level of work disk usage is more or less as
>>>>> expected, consistent with our previous estimates, and not
>>>>> particularly abusive. That having been said, I am sure there is a
>>>>> lot that can be cleaned up. But as Ole pointed out, disk usage
>>>>> grows naturally and we were not aware that this was a problem. I
>>>>> seem to recall that we agreed to respond to emails that would be
>>>>> sent when we reached 90% of too much, no? Was the email sent out?
>>>>>
>>>>> One mystery: when I ask Lustre what we are using I get:
>>>>>
>>>>> ifarm1402:marki:marki> lfs quota -gh halld /lustre
>>>>> Disk quotas for group halld (gid 267):
>>>>>   Filesystem   used  quota  limit  grace     files  quota  limit  grace
>>>>>      /lustre   290T   470T   500T      -  15106047      0      0      -
>>>>> which is less than cache + volatile, not to mention work. I
>>>>> thought that to a good approximation this 290 TB should be the sum
>>>>> of all three. What am I missing?
>>>>>
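One way to chase this down (my suggestion; the paths are illustrative
and the real mount points may differ):

    # Compare actual on-disk usage per area against the group-quota figure.
    for d in /work/halld /cache/halld /volatile/halld; do
        du -sh "$d"
    done
    lfs quota -gh halld /lustre   # the group-quota view quoted above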
>>>>> On 05/31/2017 10:35 AM, Chip Watson wrote:
>>>>>> All,
>>>>>>
>>>>>> As I have started on the procurement of the new /work file
>>>>>> server, I have discovered that Physics' use of /work has grown
>>>>>> unrestrained over the last year or two.
>>>>>>
>>>>>> "Unrestrained" because there is no way under Lustre to restrain
>>>>>> it except via a very unfriendly Lustre quota system. As we leave
>>>>>> some quota headroom to accommodate large swings in usage for each
>>>>>> hall for cache and volatile, then /work continues to grow.
>>>>>>
>>>>>> Total /work has now reached 260 TB, several times larger than I
>>>>>> was anticipating. This constitutes more than 25% of Physics'
>>>>>> share of Lustre; by comparison, LQCD uses less than 5% of its
>>>>>> disk space on the un-managed /work.
>>>>>>
>>>>>> It would cost Physics an extra $25K (total $35K - $40K) to treat
>>>>>> the 260 TB as a requirement.
>>>>>>
>>>>>> There are 3 paths forward:
>>>>>>
>>>>>> (1) Physics cuts its use of /work by a factor of 4-5.
>>>>>> (2) Physics increases funding to $40K.
>>>>>> (3) We pull a server out of Lustre, decreasing Physics' share of
>>>>>> the system, and use it as half of the new active-active pair,
>>>>>> beefing it up with SSDs and perhaps additional memory; this would
>>>>>> actually shrink Physics' near-term costs, but puts more pressure
>>>>>> on the file system used by the farm.
>>>>>>
>>>>>> The decision is clearly Physics', but I do need a VERY FAST
>>>>>> response to this question, as I need to move quickly now for
>>>>>> LQCD's needs.
>>>>>>
>>>>>> Hall D + GlueX, 96 TB
>>>>>> CLAS + CLAS12, 98 TB
>>>>>> Hall C, 35 TB
>>>>>> Hall A <unknown, still scanning>
>>>>>>
>>>>>> Email, call (x7101), or drop by today 1:30-3:00 p.m. for discussion.
>>>>>>
>>>>>> thanks,
>>>>>> Chip
>>>>>>
>>>>>
>>>>> --
>>>>> Mark Ito, marki at jlab.org, (757) 269-5295
>>>>
>>>
>>> --
>>> Mark Ito, marki at jlab.org, (757) 269-5295
>>
>
> --
> Mark Ito, marki at jlab.org, (757) 269-5295