[LQCD-GPU] New QUDA and paper release
Mike Clark
mikec at seas.harvard.edu
Thu Nov 19 13:03:29 EST 2009
> Mike,
> Well done! I suspect Chip is singing your praises over there
> in Seattle.
Cheers. In Portland though, not Seattle!
> Also, I'm glad a paper is out. It'll get cited...
Hope so........
Mike.
> Robert
>
>
> Mike Clark wrote:
>> Greetings,
>>
>> I thought I might as well post this here. We've finally posted our
>> GPU solver paper. It should be listed on the archive tomorrow. To
>> coincide with the paper being made public, we've put up a webpage
>> where the official QUDA releases can be obtained: http://lattice.bu.edu/quda
>>
>> On the software front, we've finally solved the partition camping
>> problem on the 280/285/1060/1070 that hampered performance on the
>> 24^3x128 lattices. I have added a padding parameter to the spinor
>> fields which keeps threads from all performing their reads through
>> the same partition. There's a new parameter sp_pad in the invert
>> param struct that must be set: sp_pad=0 means no padding, and a
>> positive value means that the 6x float4 sub-arrays of length XYZT are
>> spaced out in memory such that the distance between the starts of
>> consecutive sub-arrays is (XYZT+sp_pad). If this makes no sense I
>> can elaborate.
>>
>> A sensible value for sp_pad is XYZ/2 or XYZ, i.e., set sp_pad =
>> 12*24*24 or 24*24*24 for the 24^3x128 lattice.
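
[Illustration of the padded layout described above. This is only a sketch, not code from QUDA: the helper names spinor_float4_index and spinor_float4_count and the constant SPINOR_CHUNKS are hypothetical. It takes the sub-array length to be XYZT exactly as stated in the message, and it counts in units of float4 elements.]

```c
#include <assert.h>
#include <stddef.h>

/* Number of float4 sub-arrays per spinor field, as stated in the message. */
enum { SPINOR_CHUNKS = 6 };

/*
 * Hypothetical helper: position (in float4 elements) of lattice site `site`
 * within sub-array `chunk`, when consecutive sub-arrays start
 * (volume + sp_pad) elements apart. `volume` is XYZT; sp_pad = 0 gives
 * the unpadded layout.
 */
static size_t spinor_float4_index(size_t site, size_t chunk,
                                  size_t volume, size_t sp_pad)
{
    assert(chunk < SPINOR_CHUNKS);
    assert(site < volume);
    return chunk * (volume + sp_pad) + site;
}

/*
 * Hypothetical helper: total float4 elements to allocate if every
 * sub-array, including the last, is followed by its padding.
 */
static size_t spinor_float4_count(size_t volume, size_t sp_pad)
{
    return (size_t)SPINOR_CHUNKS * (volume + sp_pad);
}
```

[For the 24^3x128 lattice, XYZT = 1769472. With sp_pad = 12*24*24 = 6912 the sub-arrays start 1776384 float4 elements apart, so site 0 of sub-array 1 sits at index 1776384 instead of 1769472.]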
>>
>> In addition we have also fixed the slow performance of clover half
>> precision. This fix should bring down the $/Mflop nicely. It isn't
>> enabled by default, though: to enable it you have to compile the
>> dslash_quda.cu file with an extra flag to the nvcc compiler. Add
>> "-maxrregcount=80" to the NVCCFLAGS. However, you should not compile
>> the other kernels with this flag, as this will reduce performance of
>> the blas kernels. Probably best to build the library with the flag
>> included, delete blas_quda.o, remove the maxrregcount flag, and then
>> recompile blas_quda.
>>
>> Other changes are that I've removed the blockDim parameter from the
>> gauge param struct, as this was a redundant feature anyway.
>>
>> Balint, can you test this new release and report some performance
>> numbers please?
>>
>> I also found a serious bug in the single precision reduction, which
>> makes me wonder if this is why Balint couldn't get it to converge
>> properly. Probably don't want to use this anyway, but it's good to
>> know that it's fixed.
>>
>> There may be bugs yet, as this was quite a serious change to some of
>> the blas kernels. I've tested CG and BiCGstab though, and they seem
>> to give the same answer regardless of the sp_pad value.
>>
>> I readily admit, the code is beginning to creak. A complete rewrite
>> is coming real soon now :-)
>>
>> Cheers,
>>
>> Mike.
>>
>>
>>
>> _______________________________________________
>> LQCD-GPU mailing list
>> LQCD-GPU at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/lqcd-gpu
>>
>
>
> --
> Robert G. Edwards
> phone: (757) 269 7737 fax: (757) 269 7002
> edwards at jlab.org http://www.jlab.org/~edwards
> Jefferson Lab
> Theory Group, Cebaf Center, Suite 1
> 12000 Jefferson Avenue
> Newport News, Virginia 23606, USA