<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html;

      charset=windows-1252">

  </head>

  <body>

    I disassembled the optimized x86-64 code that gcc 4.8 emits on
    Linux. As a pleasant surprise, the optimizer recognizes the pattern
    of shift operations that EVIO uses for byte-swapping and collapses
    it into a single bswap instruction (or a rol instruction for 16-bit
    values). The optimized code does look efficient. Impressive. <br>
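    <br>
    For illustration, here is a minimal sketch of the kind of shift-based
    swap an optimizer can recognize. The function names are made up for
    this example and are not taken from the EVIO sources; whether a given
    compiler actually reduces them to bswap/rol depends on its version
    and optimization flags.<br>

```c
#include <stdint.h>

/* Illustrative shift-and-mask byte swaps (hypothetical names, not EVIO's). */

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value. */
static inline uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

    Compiling something like this with optimization enabled and
    inspecting the output of objdump -d is one way to check what the
    compiler actually emits.<br>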

    <br>

    The performance of the optimized EVIO routines is good. I think the
    days when byte-swapping was a major bottleneck (it was in 2000!)
    are indeed gone. Of course, it would be even better if we
    vectorized the swapping for large buffers, which typically yields
    speed gains of 5x to 20x according to benchmarks posted online. But
    it's already (almost) good enough. I'll put some numbers together.<br>
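    <br>
    As a rough sketch of what swapping a large buffer looks like, a
    plain loop over 32-bit words is shown below. The function name and
    signature are assumptions made for this example, not EVIO's API. A
    loop of this shape is a candidate for compiler auto-vectorization or
    for a hand-written SIMD version, but whether either happens, and how
    much it gains, would have to be measured.<br>

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-swap nwords 32-bit words from src into dst.
 * Hypothetical helper for illustration; dst may be the same as src. */
void swap32_buffer(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    for (size_t i = 0; i < nwords; ++i) {
        uint32_t x = src[i];
        dst[i] = (x << 24) |
                 ((x & 0x0000FF00u) << 8) |
                 ((x & 0x00FF0000u) >> 8) |
                 (x >> 24);
    }
}
```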

    <br>

    So while performance is not a major issue, obviously we don't want

    to create unnecessary work, and moreover, I think it is important to

    have a definitive and consistent data format. Raw data whose

    endianness is variable, possibly requiring external documentation to

    understand, is not something to aim for. To underscore: I have not

    seen any sort of "endian flag" in any of the EVIO headers. If there

    are such flags, the EVIO C-library does not use them at all. Maybe

    someone can point me to documentation where these bits are hiding?

    Or do we need to use the EVIO C++ library to get support for

    endianness flags?<br>
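    <br>
    To make the question concrete: a common way for a binary format to
    record its byte order without a dedicated flag is a known "magic"
    word in the header, which the reader compares against both byte
    orders. The sketch below assumes such a word exists; the constant,
    the function names, and the idea that a header provides it are
    assumptions for illustration, not something established in this
    thread.<br>

```c
#include <stdint.h>

/* Assumed magic value for illustration only. */
#define EXAMPLE_MAGIC 0xc0da0100u

/* Reverse the four bytes of a 32-bit value. */
static uint32_t example_bswap32(uint32_t x)
{
    return (x << 24) |
           ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8) |
           (x >> 24);
}

/* Decide from the magic word, as read in host byte order, whether the
 * data need swapping.
 * Returns 0 (no swap), 1 (swap needed), or -1 (magic word not recognized). */
int example_needs_swap(uint32_t magic_as_read)
{
    if (magic_as_read == EXAMPLE_MAGIC)
        return 0;
    if (example_bswap32(magic_as_read) == EXAMPLE_MAGIC)
        return 1;
    return -1;
}
```

    A scheme like this works per header, so it only helps if every
    writer stores the magic word in the same byte order as the payload
    that follows it.<br>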

    <br>

    Ole<br>

    <br>

    <div class="moz-cite-prefix">On 3.10.21 at 14:57, Ole Hansen wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:8d7feb8a-2df3-db36-f81f-13cc2ba49d8f@jlab.org">


      Yes, x86 has byte-swap opcodes. But EVIO isn't using them. Using
      those instructions would greatly reduce the CPU cost. I wrote an

      assembly-optimized version of the byte-swapping code in the early

      2000s, which I can resurrect, although that was 32-bit assembly,

      not even using MMX, let alone SSE instructions. Somehow I would

      think the EVIO library should include such optimizations out of

      the box, like good video codecs do.<br>

      <br>

      Ole<br>

      <br>

      <div class="moz-cite-prefix">On 3.10.21 at 13:45, Benjamin Raydo

        wrote:<br>

      </div>

      <blockquote type="cite"

cite="mid:MN2PR09MB57561524563EC5EC864D0F43A8AD9@MN2PR09MB5756.namprd09.prod.outlook.com">



        <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

          font-size: 12pt; color: rgb(0, 0, 0);"> Hmm, Dave Abbott can

          comment on this... but there is an endianness flag in the EVIO
          structure that we should be setting to indicate this. We may
          need to check that we are using it consistently. Anyhow,
          does x86 have a CPU instruction that can do this swap for
          you? If so, is it really that much CPU power?<br>

        </div>

        <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

          font-size: 12pt; color: rgb(0, 0, 0);"> <br>

        </div>

        <div style="font-family: Calibri, Arial, Helvetica, sans-serif;

          font-size: 12pt; color: rgb(0, 0, 0);"> Ben<br>

        </div>

        <hr style="display:inline-block;width:98%" tabindex="-1">

        <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"

            face="Calibri, sans-serif" color="#000000"><b>From:</b>

            Sbs_daq <a class="moz-txt-link-rfc2396E"

              href="mailto:sbs_daq-bounces@jlab.org"

              moz-do-not-send="true">&lt;sbs_daq-bounces@jlab.org&gt;</a>

            on behalf of Alexandre Camsonne <a

              class="moz-txt-link-rfc2396E"

              href="mailto:camsonne@jlab.org" moz-do-not-send="true">&lt;camsonne@jlab.org&gt;</a><br>

            <b>Sent:</b> Sunday, October 3, 2021 1:19 PM<br>

            <b>To:</b> Ole Hansen <a class="moz-txt-link-rfc2396E"

              href="mailto:ole@jlab.org" moz-do-not-send="true">&lt;ole@jlab.org&gt;</a><br>

            <b>Cc:</b> <a class="moz-txt-link-abbreviated"

              href="mailto:sbs_daq@jlab.org" moz-do-not-send="true">sbs_daq@jlab.org</a>

            <a class="moz-txt-link-rfc2396E"

              href="mailto:sbs_daq@jlab.org" moz-do-not-send="true">&lt;sbs_daq@jlab.org&gt;</a><br>

            <b>Subject:</b> [Sbs_daq] [EXTERNAL] Re: Big endian raw

            data?</font>

          <div> </div>

        </div>

        <div>

          <div dir="auto">

            <div>I think we might be able to choose. 

              <div dir="auto"><br>

              </div>

              <div dir="auto">Since we now mostly use Intel CPUs, it
                sounds like little-endian would be more efficient,
                unless it breaks any software. I'm not sure about the
                endianness of the VTP. It is an ARM processor; is it
                big-endian?</div>

              <div dir="auto"><br>

              </div>

              <div dir="auto">Alexandre</div>

              <br>

              <br>

              <div class="x_gmail_quote">

                <div dir="ltr" class="x_gmail_attr">On Sun, Oct 3, 2021,

                  13:06 Ole Hansen &lt;<a href="mailto:ole@jlab.org"
                    moz-do-not-send="true">ole@jlab.org</a>&gt; wrote:<br>

                </div>

                <blockquote class="x_gmail_quote" style="margin:0 0 0

                  .8ex; border-left:1px #ccc solid; padding-left:1ex">

                  <div>Maybe our various front-ends differ in

                    endianness, so we write mixed-endian data?!? That

                    would be disastrous since it is not supported by

                    EVIO. A file can only be one or the other—a very

                    binary view. (I guess EVIO was written before we

                    became diversity-aware ;) ).<br>

                    <br>

                    Ole<br>

                    <br>

                    <div>On 3.10.21 at 13:03, Andrew Puckett wrote:<br>

                    </div>

                    <blockquote type="cite">

                      <div>

                        <p class="x_MsoNormal">Hi Ole, </p>

                        <p class="x_MsoNormal"> </p>

                        <p class="x_MsoNormal">This is interesting. The

                          GRINCH data are being read out by the new

                          VETROC modules, I don’t know if they differ

                          from the other modules in terms of

                          “endian-ness”. Maybe a DAQ expert can weigh in

                          here?</p>

                        <p class="x_MsoNormal"> </p>

                        <p class="x_MsoNormal">Andrew </p>

                        <p class="x_MsoNormal"> </p>

                        <div style="border:none; border-top:solid

                          #b5c4df 1.0pt; padding:3.0pt 0in 0in 0in">

                          <p class="x_MsoNormal"

                            style="margin-bottom:12.0pt"><b><span

                                style="font-size:12.0pt; color:black">From:

                              </span></b><span style="font-size:12.0pt;

                              color:black">Sbs_daq <a

                                href="mailto:sbs_daq-bounces@jlab.org"

                                target="_blank" rel="noreferrer"

                                moz-do-not-send="true">

                                &lt;sbs_daq-bounces@jlab.org&gt;</a> on

                              behalf of Ole Hansen <a

                                href="mailto:ole@jlab.org"

                                target="_blank" rel="noreferrer"

                                moz-do-not-send="true">

                                &lt;ole@jlab.org&gt;</a><br>

                              <b>Date: </b>Sunday, October 3, 2021 at

                              1:00 PM<br>

                              <b>To: </b><a

                                href="mailto:sbs_daq@jlab.org"

                                target="_blank" rel="noreferrer"

                                moz-do-not-send="true">sbs_daq@jlab.org</a>

                              <a href="mailto:sbs_daq@jlab.org"

                                target="_blank" rel="noreferrer"

                                moz-do-not-send="true">&lt;sbs_daq@jlab.org&gt;</a><br>

                              <b>Subject: </b>[Sbs_daq] Big endian raw

                              data?</span></p>

                        </div>

                        <p class="x_MsoNormal"

                          style="margin-bottom:12.0pt">Hi guys,<br>

                          <br>

                          Bradley reported a crash of the replay

                          (actually in EVIO) with

                          /adaq1/data1/sbs/grinch_72.evio.0 (see <a

                            href="https://logbooks.jlab.org/entry/3916105"

                            target="_blank" rel="noreferrer"

                            moz-do-not-send="true">

                            https://logbooks.jlab.org/entry/3916105</a>).<br>

                          <br>

                          When digging into the cause of this crash, I

                          discovered that these raw data are written in

                          big-endian format. How can this be? I thought

                          the front-ends are Intel processors. Are we

                          taking data with ARM chips that are configured

                          for big-endian mode? Is this a mistake, or is

                          there some plan to it?<br>

                          <br>

                          These big-endian data have to be byte-swapped

                          when processing them on x86, which is what all

                          our compute nodes run. That's a LOT of work.

                          It leads to significant and seemingly

                          completely unnecessary overhead. I.e. we're

                          burning CPU cycles for nothing good, it seems.<br>

                          <br>

                          Please explain.<br>

                          <br>

                          Ole</p>

                      </div>

                    </blockquote>

                    <br>

                  </div>

                  _______________________________________________<br>

                  Sbs_daq mailing list<br>

                  <a href="mailto:Sbs_daq@jlab.org" target="_blank"

                    rel="noreferrer" moz-do-not-send="true">Sbs_daq@jlab.org</a><br>

                  <a

                    href="https://mailman.jlab.org/mailman/listinfo/sbs_daq"

                    rel="noreferrer noreferrer" target="_blank"

                    moz-do-not-send="true">https://mailman.jlab.org/mailman/listinfo/sbs_daq</a><br>

                </blockquote>

              </div>

            </div>

          </div>

        </div>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>