[Halld-offline] [Halld-physics] Next Data Challenge...

Mark M. Ito marki at jlab.org
Mon Jul 2 09:36:20 EDT 2012


David et al.,

I did not mean to imply that there should be two data challenges.

1) one challenge is enough to start with
2) the real-data challenge involves simulation and
reconstruction (since we do not have real data), and the simulated-data
challenge also involves simulation and reconstruction, so I'm not sure
that I see the essential difference

To me the "challenge" part of the data challenge is principally creating 
a _system_ to do large scale things. We know we can do each of the 
individual parts: simulate, smear, reconstruct, make histograms, do 
fits, etc. at some level. And we could even do it on a large-ish scale if
we really had to. To get a day's worth of statistics we just repeat what
Jake did 8 times. And that is the point: we do not want to have Jake do
it 8 times, notwithstanding the fine young man that he is. We want a 
system. And then we want to judge its performance. Therein lies the 
challenge.

My other meta-point is that we should start with something modest and then
add capability and correctness. To me that is a system for simulation and a
system for reconstruction. Other things are relatively simple from a 
system point of view. Those were the two bullets in my "initial scope" 
proposal. Another way of saying this is that I think the main planning 
job now is to eliminate tasks and features that we fully know we will 
need eventually, but whose addition (and discussion!) would slow us down 
initially. Let's not try to keep too many balls in the air.

   -- Mark

On 07/01/2012 11:30 PM, David Lawrence wrote:
>
> Hi All,
>
>
>    I think this is a great conversation and a great start to gathering some specifics regarding the Data Challenge. Here are a couple of comments:
>
> 1.) Mark may have alluded to this in his blog, but I think we should make two distinct Data Challenges. One that will address how we deal with simulated data and the other on how we deal with real data. Our plan up to now, as I understand it, is to produce and analyze simulated data and store only the DST (or REST format as described at the last software meeting). I think that should be done as a separate exercise from the real data Data Challenge exercise.
>
> 2.) For simulated data, there are two mechanisms we expect to use: a.) the JLab Scientific Computing farm and b.) the GRID. There will be some parts that the two systems will share, but some aspects will be different. One decision we need to make now is whether to test both of these in the 2013 Simulated Data Challenge, or focus only on the JLab farm.
>
> 3.) For the real data Data Challenge, I think we should focus on generating the data on the JLab farm. Our baseline plan as presented at the Software Review did not require large amounts of data to be imported to the JLab silo from offsite producers. Adding that mechanism to the 2013 Data Challenge(s) might make things more complicated.
>
> 4.) For the real data Data Challenge, we should double check how much we save in CPU by storing the output of hdgeant rather than mcsmear (section 4.1 of document). Ideally, I'd like to push to get evio data written out that is close to what the raw data format will be. This will force us to check additional database systems that could be potential issues in offline processing.
>
> That's my 2 cents for now.
>
> Regards,
> -David
>
> ----- Original Message -----
> From: "Mark M. Ito" <marki at jlab.org>
> To: cmeyer at ernest.phys.cmu.edu
> Cc: halld-offline at jlab.org
> Sent: Saturday, June 30, 2012 6:00:49 PM
> Subject: Re: [Halld-offline] [Halld-physics] Next Data Challenge...
>
> Curtis,
>
> I have outlined some thoughts at this URL:
>
> http://markito3.wordpress.com/2012/06/30/ideas-for-a-data-challenge/
>
> I think that they are somewhat complementary to the ideas you put into
> the document.
>
>     -- Mark
>
> On 06/28/2012 02:47 PM, Curtis A. Meyer wrote:
>> Dear Colleagues -
>>
>>      as discussed in the offline meeting yesterday, we want to start
>> planning for the next data
>> challenge. People agreed to email me their ideas and I would then
>> summarize them in a
>> document that could then be discussed at the next meeting. In order to
>> get this process going,
>> I started to put this together as GlueX-doc-2031
>>
>> http://argus.phys.uregina.ca/cgi-bin/private/DocDB/ShowDocument?docid=2031
>>
>> Please feel free to use this as a starting point, as it will probably help us converge sooner.
>>
>>       thanks - Curtis
>>
>
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline

-- 
Mark M. Ito
Jefferson Lab (www.jlab.org)
(757)269-5295





