[Sbs_daq] Big endian raw data?
Ole Hansen
ole at jlab.org
Sun Oct 3 22:52:25 EDT 2021
I disassembled the optimized x86-64 code that gcc 4.8 emits on Linux. As
a pleasant surprise, the optimizer detects the pattern of shift
operations used in EVIO for byte-swapping and collapses it into a single
bswap opcode (or a rol instruction for 16-bit values). The optimized
code does look efficient. Impressive.
The performance of the optimized EVIO routines is good. I think the days
when byte-swapping was a major bottleneck (it certainly was in 2000!)
are indeed gone. Of course, it would be even better if we vectorized
swapping for large buffers, which typically yields speed gains of
5x – 20x according to benchmarks posted online. But it's already
(almost) good enough. I'll put some numbers together.
So while performance is not a major issue, obviously we don't want to
create unnecessary work, and moreover, I think it is important to have a
definitive and consistent data format. Raw data whose endianness is
variable, possibly requiring external documentation to understand, is
not something to aim for. To underscore: I have not seen any sort of
"endian flag" in any of the EVIO headers. If there are such flags, the
EVIO C-library does not use them at all. Maybe someone can point me to
documentation where these bits are hiding? Or do we need to use the EVIO
C++ library to get support for endianness flags?
Ole
On 3.10.21 at 14:57, Ole Hansen wrote:
> Yes, x86 has byte-swap opcodes. But EVIO isn't using them. Using those
> instructions greatly alleviates the CPU cost. I wrote an
> assembly-optimized version of the byte-swapping code in the early
> 2000s, which I can resurrect, although that was 32-bit assembly, not
> even using MMX, let alone SSE instructions. Somehow I would think the
> EVIO library should include such optimizations out of the box, like
> good video codecs do.
>
> Ole
>
> On 3.10.21 at 13:45, Benjamin Raydo wrote:
>> Hmm, Dave Abbott can comment on this...but there is an endianness flag
>> in the EVIO structure that we should be setting to indicate this - we
>> may need to check that we are consistent about its use. Anyhow, does
>> x86 have a CPU instruction that can do this swap for you - or is it
>> really that much CPU power?
>>
>> Ben
>> ------------------------------------------------------------------------
>> *From:* Sbs_daq <sbs_daq-bounces at jlab.org> on behalf of Alexandre
>> Camsonne <camsonne at jlab.org>
>> *Sent:* Sunday, October 3, 2021 1:19 PM
>> *To:* Ole Hansen <ole at jlab.org>
>> *Cc:* sbs_daq at jlab.org <sbs_daq at jlab.org>
>> *Subject:* [Sbs_daq] [EXTERNAL] Re: Big endian raw data?
>> I think we might be able to choose.
>>
>> Though since we now mostly use Intel CPUs, unless it breaks any
>> software, it sounds like little endian would be more efficient. I'm
>> not sure about the endianness of the VTP; it is an ARM processor, is
>> it big endian?
>>
>> Alexandre
>>
>>
>> On Sun, Oct 3, 2021, 13:06 Ole Hansen <ole at jlab.org
>> <mailto:ole at jlab.org>> wrote:
>>
>> Maybe our various front-ends differ in endianness, so we write
>> mixed-endian data?!? That would be disastrous since it is not
>> supported by EVIO. A file can only be one or the other—a very
>> binary view. (I guess EVIO was written before we became
>> diversity-aware ;) ).
>>
>> Ole
>>
>> On 3.10.21 at 13:03, Andrew Puckett wrote:
>>>
>>> Hi Ole,
>>>
>>> This is interesting. The GRINCH data are being read out by the
>>> new VETROC modules, I don’t know if they differ from the other
>>> modules in terms of “endian-ness”. Maybe a DAQ expert can weigh
>>> in here?
>>>
>>> Andrew
>>>
>>> *From: *Sbs_daq <sbs_daq-bounces at jlab.org>
>>> <mailto:sbs_daq-bounces at jlab.org> on behalf of Ole Hansen
>>> <ole at jlab.org> <mailto:ole at jlab.org>
>>> *Date: *Sunday, October 3, 2021 at 1:00 PM
>>> *To: *sbs_daq at jlab.org <mailto:sbs_daq at jlab.org>
>>> <sbs_daq at jlab.org> <mailto:sbs_daq at jlab.org>
>>> *Subject: *[Sbs_daq] Big endian raw data?
>>>
>>> Hi guys,
>>>
>>> Bradley reported a crash of the replay (actually in EVIO) with
>>> /adaq1/data1/sbs/grinch_72.evio.0 (see
>>> https://logbooks.jlab.org/entry/3916105
>>> <https://logbooks.jlab.org/entry/3916105>).
>>>
>>> When digging into the cause of this crash, I discovered that
>>> these raw data are written in big-endian format. How can this
>>> be? I thought the front-ends are Intel processors. Are we taking
>>> data with ARM chips that are configured for big-endian mode? Is
>>> this a mistake, or is there some plan to it?
>>>
>>> These big-endian data have to be byte-swapped when processing
>>> them on x86, which is what all our compute nodes run. That's a
>>> LOT of work, and it leads to significant, seemingly completely
>>> unnecessary overhead. We're burning CPU cycles for nothing
>>> good, it seems.
>>>
>>> Please explain.
>>>
>>> Ole
>>>
>>
>> _______________________________________________
>> Sbs_daq mailing list
>> Sbs_daq at jlab.org <mailto:Sbs_daq at jlab.org>
>> https://mailman.jlab.org/mailman/listinfo/sbs_daq
>> <https://mailman.jlab.org/mailman/listinfo/sbs_daq>
>>
>