[Halld-offline] Data Management Plans

Curtis A. Meyer cmeyer at cmu.edu
Thu Jul 31 12:34:14 EDT 2014


Hi Matt & Chip 

I concur that making the results useful to other people is important. I think that beyond the
results of a particular fit, we also need to think about extracting and publishing "observables"
that anyone can then compare to their model. The simple ones are cross sections, polarization,
and beam asymmetries, but there are more sophisticated ones that can also be extracted: SDMEs
(spin-density matrix elements), etc.

In principle, people can then perform fits to these observables using their own models, etc.

 curtis




-------
Prof. Curtis A. Meyer			  Department of Physics
Phone: (412) 268-2745		  Carnegie Mellon University
Fax:      (412) 681-0648		  Pittsburgh, PA 15213
curtis.meyer at cmu.edu                http://www.curtismeyer.com/





On Jul 31, 2014, at 12:17 PM, Matthew Shepherd <mashephe at indiana.edu> wrote:

> 
> Chip,
> 
> Making n-tuples available is a natural instinct, but
> I don't think it will be practically useful.
> To make a reanalysis useful you
> need not only the four-vectors but also code to simulate
> the detector and explore the acceptance of the new cuts.
> Then you may need other data to understand the 
> systematic uncertainties in the cuts.  It is hard to
> imagine someone doing a robust re-analysis without
> complete access to the GlueX data set and software.
> (Here I define "robust" as "capable of publishing
> in a peer reviewed journal.")  This is just my opinion --
> I think we'll spend a lot of time doing this and people
> may download and play with the data, but it will
> just be playing.  I don't think useful cross-checks
> or additional publishable results can be obtained from
> an ntuple alone.
> 
> Most key GlueX results won't be 1D histograms but 
> results of amplitude analyses.  Each such analysis
> will contain a lot of data -- likely thousands of fit
> parameters, many of which will have non-trivial correlations.
> These results are useful and can be reanalyzed
> because they will represent "pure quantities" (with
> detector effects removed and systematic errors
> quantified).
> 
> We should focus on a standard way to make the results
> of GlueX amplitude analysis available to the 
> community with all the proper correlation matrices
> etc. so that they can be reanalyzed.  This is an
> absolute necessity in the context of doing 
> combined fits from multiple experiments that people
> talk about as a long term goal.
> 
> Matt
> 
> ---------------------------------------------------------------------
> Matthew Shepherd, Associate Professor
> Department of Physics, Indiana University, Swain West 265
> 727 East Third Street, Bloomington, IN 47405
> 
> Office Phone:  +1 812 856 5808
> 
> On Jul 31, 2014, at 12:02 PM, Chip Watson <watson at jlab.org> wrote:
> 
>> Concur; we don't want to generate unnecessary work for ourselves.  I would suggest that just the data in a plot is insufficient.  For example, if what is shown is a 1D histogram that is the result of a statistical analysis of 20 GB of four-vectors, then provide the 20 GB file so others can try their own cuts on the data.  So probably not the raw data, but probably the n-tuples.  Yes, this needs discussion and then specificity to satisfy the directive.
>> 
>> On 7/31/14 11:58 AM, Matthew Shepherd wrote:
>>> I think we need to define what "data" means in this case.
>>> NSAC was briefed on this last summer and, as far as my
>>> understanding goes, digital data can mean anything from
>>> making the plots digitally available in PDF to providing
>>> raw four-vectors, software, etc. to do a complete analysis.
>>> 
>>> I noticed that the third principle of the policy states:
>>> 
>>> "Not all data need to be shared or preserved. The costs and 
>>> benefits of doing so should be considered in data 
>>> management planning."
>>> 
>>> I think a good strategy to start with is to make available
>>> in tabular form any data points, error bars, etc. that
>>> appear in papers in refereed journals.  
>>> This seems to be the optimal cost/benefit point, as others 
>>> may want to do fits to cross sections,
>>> extracted scattering amplitudes, etc. from GlueX data.
>>> It is more useful than just a plot, but not as cumbersome
>>> as making the raw data available.  I don't think the
>>> raw data would be of much use to most people anyway
>>> without the insider experience to analyze it.
>>> 
>>> In this case the DOI would reference tabular data
>>> that was relevant for plots in a particular journal article
>>> (not terabytes of actual and simulated data).
>>> 
>>> Of course this needs more discussion and we need
>>> to make sure that what we do is compliant with the
>>> policy, but I'd argue for thinking about what is most
>>> logical and useful first and see if it fits the guidelines
>>> rather than trying to glean specific guidance from
>>> what is a very general policy.
>>> 
>>> Matt
>>> 
>> 
>> _______________________________________________
>> Halld-offline mailing list
>> Halld-offline at jlab.org
>> https://mailman.jlab.org/mailman/listinfo/halld-offline
> 
