[LQCD-GPU] New QUDA and paper release
Robert Edwards
edwards at jlab.org
Wed Nov 18 22:10:36 EST 2009
Mike,
Well done! I suspect Chip is singing your praises over there
in Seattle.
Also, I'm glad a paper is out. It'll get cited...
Robert
Mike Clark wrote:
> Greetings,
>
> I thought I might as well post this here. We've finally posted our
> GPU solver paper. It should be listed on the archive tomorrow. To
> coincide with the paper being made public, we've put up a webpage
> where the official QUDA releases can be obtained: http://lattice.bu.edu/quda
>
> On the software front, we've finally solved the partition camping
> problem on the 280/285/1060/1070 that hampered performance on the
> 24^3x128 lattices. I have added a padding parameter to the spinor
> fields that keeps threads from performing reads through the same
> partition. There's a new parameter sp_pad in the invert param struct
> that must be set: sp_pad=0 means no padding, and a positive value
> means that the 6x float4 sub-arrays of length XYZT are spaced out in
> memory such that the distance between the starts of consecutive
> sub-arrays is (XYZT+sp_pad). If this makes no sense I can elaborate.
>
> A sensible value for sp_pad is XYZ/2 or XYZ, i.e., set sp_pad =
> 12*24*24 or 24*24*24 for the 24^3x128 lattice.
>
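The padded layout described above can be sketched in a few lines of C. This is only an illustration of the indexing as stated in the message, not code taken from QUDA: the helper names spinor_stride and spinor_float4_index are hypothetical, and the only facts used are that there are 6 float4 sub-arrays of length XYZT whose starts are (XYZT+sp_pad) apart.

```c
#include <assert.h>
#include <stddef.h>

/* Number of float4 sub-arrays per spinor field, as stated in the message. */
#define SPINOR_SUBARRAYS 6

/* Hypothetical helper: distance, in float4 elements, between the starts of
 * consecutive sub-arrays. volume is XYZT; sp_pad = 0 means no padding. */
static size_t spinor_stride(size_t volume, size_t sp_pad)
{
    return volume + sp_pad;
}

/* Hypothetical helper: float4 index of lattice site `site` within
 * sub-array `sub`, for a field laid out with the padded stride. */
static size_t spinor_float4_index(size_t sub, size_t site,
                                  size_t volume, size_t sp_pad)
{
    assert(sub < SPINOR_SUBARRAYS);
    assert(site < volume);
    return sub * spinor_stride(volume, sp_pad) + site;
}
```

For the 24^3x128 lattice, XYZT = 1769472. With sp_pad = 12*24*24 = 6912 the stride becomes 1776384 float4 elements, so sub-array 1 starts at index 1776384 instead of 1769472. With sp_pad = 0 the index reduces to the unpadded sub*XYZT + site.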
> In addition, we have also fixed the slow performance of clover half
> precision. This fix should bring down the $ / Mflop nicely. It isn't
> enabled by default, though: to enable it you have to compile the
> dslash_quda.cu file with an extra flag to the nvcc compiler. Add
> "-maxrregcount=80" to the NVCCFLAGS. However, you should not compile
> the other kernels with this flag, as it will reduce the performance of
> the blas kernels. It is probably best to build the library with the
> flag included, delete blas_quda.o, remove the maxrregcount flag, and
> then rebuild so that blas_quda.o is recompiled without it.
>
> Other changes are that I've removed the blockDim parameter from the
> gauge param struct, as this was a redundant feature anyway.
>
> Balint, can you test this new release and report some performance
> numbers please?
>
> I also found a serious bug in the single precision reduction, which
> makes me wonder if this is why Balint couldn't get it to converge
> properly. You probably don't want to use this anyway, but it's good to
> know that it's fixed.
>
> There may be bugs yet, as this was quite a serious change to some of
> the blas kernels. I've tested CG and BiCGstab though, and they seem
> to give the same answer regardless of the sp_pad value.
>
> I readily admit, the code is beginning to creak. A complete rewrite
> is coming real soon now :-)
>
> Cheers,
>
> Mike.
--
Robert G. Edwards
phone: (757) 269 7737 fax: (757) 269 7002
edwards at jlab.org http://www.jlab.org/~edwards
Jefferson Lab
Theory Group, Cebaf Center, Suite 1
12000 Jefferson Avenue
Newport News, Virginia 23606, USA