[Jlab-scicomp-briefs] Change khugepaged on CentOS7 farm nodes today

Sandy Philpott philpott at jlab.org
Wed Dec 14 10:04:47 EST 2016


Dear JLab Batch Farm Users,

We're planning to make a change on the batch farm CentOS7 nodes today...  

CLAS12 has seen problems with their reconstruction on the CentOS7 farm nodes, where the khugepaged process consumes a large amount of CPU and interferes with their jobs.  khugepaged is the kernel thread that manages transparent huge pages for backing virtual memory, and it has a negative impact on the CLAS12 Java jobs.  On a test node they saw more than a 10x improvement with transparent huge pages set to "madvise" mode rather than "always".  Note that we have already seen this problem over the past year on the HPC clusters, where we changed the setting from "always" to "madvise".  We'll do the same today on the farm, and we believe it won't have a negative impact on other jobs that don't see this problem.

Here's the kernel.org explanation...
 https://www.kernel.org/doc/Documentation/vm/transhuge.txt
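
If you want to check which mode a node is currently in, here is a minimal Python sketch. It assumes the standard sysfs file described in the kernel.org document above, /sys/kernel/mm/transparent_hugepage/enabled, which lists the available modes with the active one in square brackets (for example "always [madvise] never"). The function names are only illustrative and are not part of any farm tooling.

```python
import re
from pathlib import Path

# Standard sysfs location from transhuge.txt; assumed to apply on the farm nodes.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Return the active mode, which the kernel marks with square brackets."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no bracketed mode found in {text!r}")
    return match.group(1)


def current_thp_mode(path=THP_ENABLED):
    """Read the sysfs file and return the active mode, or None if the file is absent."""
    if not path.exists():
        return None
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    # Prints the active mode on this node, or None if the file does not exist.
    print(current_thp_mode())
```

Running this inside a job on a farm node should report "madvise" once the change is in place, assuming the node exposes the file at that path.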

So, today at 1pm we'll roll this change out on all CentOS7 nodes, expecting that it will help CLAS12 reconstruction and not hurt any other jobs.  If that is not the case, or if anyone believes their jobs need this set differently, please let me and/or your Hall Computing Coordinator know, and we'll revisit the impact of this change.  We'll be watching...

Regards,
Sandy


