[LQCD-GPU] New QUDA and paper release

Robert Edwards edwards at jlab.org
Wed Nov 18 22:10:36 EST 2009


Mike,
  Well done! I suspect Chip is singing your praises over there
in Seattle.

  Also, I'm glad a paper is out. It'll get cited...

             Robert


Mike Clark wrote:
> Greetings,
>
> I thought I might as well post this here.  We've finally posted our  
> GPU solver paper. It should be listed on the archive tomorrow.  To  
> coincide with the paper being made public, we've put up a webpage  
> where the official QUDA releases can be obtained: http://lattice.bu.edu/quda
>
> On the software front, we've finally solved the partition camping
> problem on the 280/285/1060/1070 that hampered performance on the
> 24^3x128 lattices.  I have added a padding parameter to the spinor
> fields, which keeps threads from performing reads through the same
> partition.  There's a new parameter sp_pad in the invert param struct
> that must be set: sp_pad=0 means no padding, and a positive value
> means that the 6x float4 sub-arrays of length XYZT are spaced out in
> memory such that the distance between the starts of consecutive
> sub-arrays is (XYZT+sp_pad).  If this makes no sense I can elaborate.
>
> A sensible value for sp_pad is XYZ/2 or XYZ, i.e., set sp_pad =  
> 12*24*24 or 24*24*24 for the 24^3x128 lattice.
>
> In addition, we have fixed the slow performance of clover half
> precision.  This fix should bring down the $ / Mflop nicely.  It isn't
> enabled by default, though: to enable it you have to compile the
> dslash_quda.cu file with an extra flag to the nvcc compiler.  Add
> "-maxrregcount=80" to the NVCCFLAGS.  However, you should not compile
> the other kernels with this flag, as it will reduce the performance of
> the blas kernels.  Probably best to compile the library with the flag
> included, delete blas_quda.o, remove the maxrregcount flag, and then
> rebuild so that blas_quda.o is recompiled without it.
>
> The other change is that I've removed the blockDim parameter from the
> gauge param struct, as it was redundant anyway.
>
> Balint can you test this new release and report some performance  
> numbers please?
>
> I also found a serious bug in the single precision reduction, which
> makes me wonder if this is why Balint couldn't get it to converge
> properly.  You probably don't want to use this anyway, but it's good
> to know that it's fixed.
>
> There may be bugs yet, as this was quite a serious change to some of  
> the blas kernels.  I've tested CG and BiCGstab though, and they seem  
> to give the same answer regardless of the sp_pad value.
>
> I readily admit, the code is beginning to creak.  A complete rewrite
> is coming real soon now :-)
>
> Cheers,
>
> Mike.
>
>
>
> _______________________________________________
> LQCD-GPU mailing list
> LQCD-GPU at jlab.org
> https://mailman.jlab.org/mailman/listinfo/lqcd-gpu
>   
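
For anyone puzzling over the sp_pad description quoted above, here is a
minimal sketch in C of the layout as the message describes it: 6 float4
sub-arrays, each of length XYZT, with consecutive sub-arrays starting
(XYZT+sp_pad) elements apart. The function and macro names below are
hypothetical and chosen only for illustration; they are not taken from
QUDA, and the real indexing inside the library may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Number of float4 sub-arrays per spinor field, as stated in the post. */
#define SPINOR_SUBARRAYS 6

/* Distance, in float4 elements, between the starts of consecutive
 * sub-arrays. `volume` is the sub-array length the post calls XYZT;
 * sp_pad = 0 reproduces the unpadded layout. */
static inline size_t spinor_stride(size_t volume, size_t sp_pad)
{
    return volume + sp_pad;
}

/* Offset, in float4 elements, of lattice site `site` within
 * sub-array `sub` of a padded spinor field (hypothetical helper). */
static inline size_t spinor_index(size_t volume, size_t sp_pad,
                                  size_t sub, size_t site)
{
    assert(sub < SPINOR_SUBARRAYS);
    assert(site < volume);
    return sub * spinor_stride(volume, sp_pad) + site;
}

/* Total number of float4 elements spanned by all sub-arrays,
 * padding included (assumes every sub-array is padded alike). */
static inline size_t spinor_length(size_t volume, size_t sp_pad)
{
    return SPINOR_SUBARRAYS * spinor_stride(volume, sp_pad);
}
```

With the suggested sp_pad = XYZ/2 = 12*24*24 = 6912 and XYZT =
24*24*24*128 = 1769472, spinor_stride gives 1776384, so each sub-array
begins 6912 float4 elements later than it would without padding. Under
these assumptions the padding costs well under one percent of extra
memory.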


-- 
Robert G. Edwards                          
phone: (757) 269 7737                      fax:   (757) 269 7002
edwards at jlab.org                           http://www.jlab.org/~edwards 
Jefferson Lab
Theory Group, Cebaf Center, Suite 1
12000 Jefferson Avenue
Newport News, Virginia  23606, USA


