<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
I disassembled the optimized x86-64 code that gcc 4.8 emits on
Linux. As a pleasant surprise, the optimizer detects the pattern of
shift operations used in EVIO for byte-swapping and collapses it
into a single bswap opcode (or a rol operation for 16-bit values). The
optimized code does look efficient. Impressive. <br>
<br>
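For illustration, the kind of shift-and-mask pattern meant here looks
roughly like the sketch below. The function names are hypothetical, not
EVIO's, and whether a given compiler collapses each function into a
single bswap or rol depends on the compiler version and optimization
level.<br>

```c
#include <stdint.h>

/* Hypothetical helper names; EVIO's own routines may look different. */

/* Swap the two bytes of a 16-bit value (a candidate for a rotate by 8). */
static inline uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value (a candidate for bswap). */
static inline uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

<br>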
The performance of the optimized EVIO routines is good. I think the
days when byte-swapping was a major bottleneck (it was in 2000!) are
indeed gone. Of course, it would be even better if we vectorized
swapping for large buffers, which typically yields speed gains of
5x to 20x according to benchmarks posted online. But it's already
(almost) good enough. I'll put some numbers together.<br>
<br>
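As a rough sketch of what swapping a large buffer involves: the loop
below has no dependencies between iterations, which is the form
compilers can often auto-vectorize at higher optimization levels. The
function name is hypothetical, and this is not a hand-vectorized
implementation; whether vector instructions are actually emitted
depends on the compiler and flags.<br>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function, not part of EVIO:
 * byte-swap n 32-bit words in place. */
static void swap32_buffer(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        uint32_t x = buf[i];
        /* Same shift pattern as a scalar 32-bit swap, applied per word. */
        buf[i] = (x >> 24) |
                 ((x >> 8) & 0x0000FF00u) |
                 ((x << 8) & 0x00FF0000u) |
                 (x << 24);
    }
}
```

<br>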
So while performance is not a major issue, obviously we don't want
to create unnecessary work, and moreover, I think it is important to
have a definitive and consistent data format. Raw data whose
endianness is variable, possibly requiring external documentation to
understand, is not something to aim for. To underscore: I have not
seen any sort of "endian flag" in any of the EVIO headers. If there
are such flags, the EVIO C-library does not use them at all. Maybe
someone can point me to documentation where these bits are hiding?
Or do we need to use the EVIO C++ library to get support for
endianness flags?<br>
<br>
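To make concrete what a self-describing byte order could look like: a
format can store a fixed magic word in its header, and the reader
compares what it reads against the expected value and its byte-swapped
form. The sketch below is generic. The constant and all names are
assumptions made up for illustration, not something taken from the
EVIO headers.<br>

```c
#include <stdint.h>

/* Hypothetical magic value, chosen only for illustration. */
#define EXAMPLE_MAGIC 0xCAFE0001u

enum byte_order { ORDER_NATIVE, ORDER_SWAPPED, ORDER_UNKNOWN };

/* Reverse the four bytes of a 32-bit word. */
static uint32_t swap32_word(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) |
           (x << 24);
}

/* Classify a header word as read from the file on this host. */
static enum byte_order detect_order(uint32_t magic_as_read)
{
    if (magic_as_read == EXAMPLE_MAGIC)
        return ORDER_NATIVE;   /* file matches host byte order */
    if (magic_as_read == swap32_word(EXAMPLE_MAGIC))
        return ORDER_SWAPPED;  /* every word needs swapping */
    return ORDER_UNKNOWN;      /* not a recognized header */
}
```

<br>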
Ole<br>
<br>
<div class="moz-cite-prefix">On 3.10.21 at 14:57, Ole Hansen wrote:<br>
</div>
<blockquote type="cite"
cite="mid:8d7feb8a-2df3-db36-f81f-13cc2ba49d8f@jlab.org">
Yes, x86 has byte-swap opcodes. But EVIO isn't using them. Using
those instructions greatly alleviates the CPU cost. I wrote an
assembly-optimized version of the byte-swapping code in the early
2000s, which I can resurrect, although that was 32-bit assembly,
not even using MMX, let alone SSE instructions. Somehow I would
think the EVIO library should include such optimizations out of
the box, like good video codecs do.<br>
<br>
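With GCC or Clang, one way to ask for the byte-swap instruction
without writing assembly is the compiler builtin
<code>__builtin_bswap32</code>. The wrapper below is only a sketch:
its name is hypothetical, and the fallback branch is the plain shift
pattern for other compilers.<br>

```c
#include <stdint.h>

/* Hypothetical wrapper name; not an EVIO function. */
static inline uint32_t bswap32_fast(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang builtin; typically compiles to one bswap on x86. */
    return __builtin_bswap32(x);
#else
    /* Portable fallback using shifts and masks. */
    return (x >> 24) |
           ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) |
           (x << 24);
#endif
}
```

<br>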
Ole<br>
<br>
<div class="moz-cite-prefix">On 3.10.21 at 13:45, Benjamin Raydo
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:MN2PR09MB57561524563EC5EC864D0F43A8AD9@MN2PR09MB5756.namprd09.prod.outlook.com">
<style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);"> Hmm, Dave Abbott can
comment on this...but there is an endianness flag in the EVIO
structure that we should be setting to indicate this - we may
need to check that we are consistent about its use. Anyhow,
doesn't x86 have a CPU instruction that can do this swap for
you - so is it really that much CPU power?<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);"> <br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);"> Ben<br>
</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
face="Calibri, sans-serif" color="#000000"><b>From:</b>
Sbs_daq <a class="moz-txt-link-rfc2396E"
href="mailto:sbs_daq-bounces@jlab.org"
moz-do-not-send="true">&lt;sbs_daq-bounces@jlab.org&gt;</a>
on behalf of Alexandre Camsonne <a
class="moz-txt-link-rfc2396E"
href="mailto:camsonne@jlab.org" moz-do-not-send="true">&lt;camsonne@jlab.org&gt;</a><br>
<b>Sent:</b> Sunday, October 3, 2021 1:19 PM<br>
<b>To:</b> Ole Hansen <a class="moz-txt-link-rfc2396E"
href="mailto:ole@jlab.org" moz-do-not-send="true">&lt;ole@jlab.org&gt;</a><br>
<b>Cc:</b> <a class="moz-txt-link-abbreviated"
href="mailto:sbs_daq@jlab.org" moz-do-not-send="true">sbs_daq@jlab.org</a>
<a class="moz-txt-link-rfc2396E"
href="mailto:sbs_daq@jlab.org" moz-do-not-send="true">&lt;sbs_daq@jlab.org&gt;</a><br>
<b>Subject:</b> [Sbs_daq] [EXTERNAL] Re: Big endian raw
data?</font>
<div> </div>
</div>
<div>
<div dir="auto">
<div>I think we might be able to choose.
<div dir="auto"><br>
</div>
<div dir="auto">Since we now use mostly Intel CPUs, it sounds
like little endian would be more efficient, unless it
breaks any software. I am not sure about the endianness of
the VTP. It has an ARM processor; is it big-endian?</div>
<div dir="auto"><br>
</div>
<div dir="auto">Alexandre</div>
<br>
<br>
<div class="x_gmail_quote">
<div dir="ltr" class="x_gmail_attr">On Sun, Oct 3, 2021,
13:06 Ole Hansen &lt;<a href="mailto:ole@jlab.org"
moz-do-not-send="true">ole@jlab.org</a>&gt; wrote:<br>
</div>
<blockquote class="x_gmail_quote" style="margin:0 0 0
.8ex; border-left:1px #ccc solid; padding-left:1ex">
<div>Maybe our various front-ends differ in
endianness, so we write mixed-endian data?!? That
would be disastrous since it is not supported by
EVIO. A file can only be one or the other—a very
binary view. (I guess EVIO was written before we
became diversity-aware ;) ).<br>
<br>
Ole<br>
<br>
<div>On 3.10.21 at 13:03, Andrew Puckett wrote:<br>
</div>
<blockquote type="cite">
<div>
<p class="x_MsoNormal">Hi Ole, </p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">This is interesting. The
GRINCH data are being read out by the new
VETROC modules, I don’t know if they differ
from the other modules in terms of
“endian-ness”. Maybe a DAQ expert can weigh in
here?</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Andrew </p>
<p class="x_MsoNormal"> </p>
<div style="border:none; border-top:solid
#b5c4df 1.0pt; padding:3.0pt 0in 0in 0in">
<p class="x_MsoNormal"
style="margin-bottom:12.0pt"><b><span
style="font-size:12.0pt; color:black">From:
</span></b><span style="font-size:12.0pt;
color:black">Sbs_daq <a
href="mailto:sbs_daq-bounces@jlab.org"
target="_blank" rel="noreferrer"
moz-do-not-send="true">
&lt;sbs_daq-bounces@jlab.org&gt;</a> on
behalf of Ole Hansen <a
href="mailto:ole@jlab.org"
target="_blank" rel="noreferrer"
moz-do-not-send="true">
&lt;ole@jlab.org&gt;</a><br>
<b>Date: </b>Sunday, October 3, 2021 at
1:00 PM<br>
<b>To: </b><a
href="mailto:sbs_daq@jlab.org"
target="_blank" rel="noreferrer"
moz-do-not-send="true">sbs_daq@jlab.org</a>
<a href="mailto:sbs_daq@jlab.org"
target="_blank" rel="noreferrer"
moz-do-not-send="true">&lt;sbs_daq@jlab.org&gt;</a><br>
<b>Subject: </b>[Sbs_daq] Big endian raw
data?</span></p>
</div>
<p class="x_MsoNormal"
style="margin-bottom:12.0pt">Hi guys,<br>
<br>
Bradley reported a crash of the replay
(actually in EVIO) with
/adaq1/data1/sbs/grinch_72.evio.0 (see <a
href="https://logbooks.jlab.org/entry/3916105"
target="_blank" rel="noreferrer"
moz-do-not-send="true">
https://logbooks.jlab.org/entry/3916105</a>).<br>
<br>
When digging into the cause of this crash, I
discovered that these raw data are written in
big-endian format. How can this be? I thought
the front-ends are Intel processors. Are we
taking data with ARM chips that are configured
for big-endian mode? Is this a mistake, or is
there some plan to it?<br>
<br>
These big-endian data have to be byte-swapped
when processing them on x86, which is what all
our compute nodes run. That's a LOT of work.
It leads to significant and seemingly
completely unnecessary overhead. I.e. we're
burning CPU cycles for nothing good, it seems.<br>
<br>
Please explain.<br>
<br>
Ole</p>
</div>
</blockquote>
<br>
</div>
_______________________________________________<br>
Sbs_daq mailing list<br>
<a href="mailto:Sbs_daq@jlab.org" target="_blank"
rel="noreferrer" moz-do-not-send="true">Sbs_daq@jlab.org</a><br>
<a
href="https://mailman.jlab.org/mailman/listinfo/sbs_daq"
rel="noreferrer" target="_blank"
moz-do-not-send="true">https://mailman.jlab.org/mailman/listinfo/sbs_daq</a><br>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>