[Halld-offline] Data Management Plans
Matthew Shepherd
mashephe at indiana.edu
Thu Jul 31 12:17:58 EDT 2014
Chip,
Making n-tuples available is a natural instinct, but
I don't think it will be practically useful.
To make a reanalysis useful you
not only need the four vectors, but also code to simulate
the detector and explore the acceptance of the new cuts.
Then you may need other data to understand the
systematic uncertainties in the cuts. It is hard to
imagine someone doing a robust re-analysis without
complete access to the GlueX data set and software.
(Here I define "robust" as "capable of publishing
in a peer reviewed journal.") This is just my opinion --
I think we'll spend a lot of time doing this and people
may download and play with the data, but it will
just be playing. I don't think useful cross-checks
or additional publishable results can be obtained from
an ntuple alone.
Most key GlueX results won't be 1D histograms but
results of amplitude analyses. Each such analysis
will contain a lot of data -- likely thousands of fit
parameters, many of which will have non-trivial correlations.
These results are useful and able to be reanalyzed
as they will represent "pure quantities" (with
detector effects removed and systematic errors
quantified).
We should focus on a standard way to make the results
of GlueX amplitude analysis available to the
community with all the proper correlation matrices
etc. so that it can be reanalyzed. This is an
absolute necessity in the context of doing
combined fits from multiple experiments that people
talk about as a long term goal.
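
As a rough illustration of why the correlation matrices matter for
anyone reusing published fit results, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the
parameter names (amp_A, amp_B), the numbers, and the JSON layout are
hypothetical and are not a proposed GlueX format. It propagates the
uncertainty on the sum of two fitted quantities with and without the
off-diagonal terms.

```python
import json
import math


def covariance_from_correlation(errors, corr):
    """Build the covariance matrix: cov[i][j] = corr[i][j] * err[i] * err[j]."""
    n = len(errors)
    return [[corr[i][j] * errors[i] * errors[j] for j in range(n)]
            for i in range(n)]


def linear_combination_variance(weights, cov):
    """Variance of sum_i w_i * p_i, i.e. w^T C w."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Hypothetical published fit result (names and numbers are invented).
published = {
    "parameters": ["amp_A", "amp_B"],
    "values": [10.0, 6.0],
    "errors": [1.0, 2.0],
    "correlation": [[1.0, -0.5], [-0.5, 1.0]],
}

# Round trip through JSON to mimic distributing the result as a file.
result = json.loads(json.dumps(published))

cov = covariance_from_correlation(result["errors"], result["correlation"])

# Quantity a reader might want: the sum of the two fitted parameters.
weights = [1.0, 1.0]
total = sum(w * v for w, v in zip(weights, result["values"]))

# Uncertainty using the full covariance matrix.
sigma_with = math.sqrt(linear_combination_variance(weights, cov))

# Uncertainty if only the error bars (diagonal terms) were published.
n = len(weights)
diag_only = [[cov[i][j] if i == j else 0.0 for j in range(n)]
             for i in range(n)]
sigma_without = math.sqrt(linear_combination_variance(weights, diag_only))

print(f"sum = {total}")
print(f"sigma with correlations    = {sigma_with:.3f}")
print(f"sigma without correlations = {sigma_without:.3f}")
```

With these made-up inputs the variance of the sum is 3 when the
correlation is included and 5 when it is ignored, so error bars alone
would misstate the uncertainty on the derived quantity.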
Matt
---------------------------------------------------------------------
Matthew Shepherd, Associate Professor
Department of Physics, Indiana University, Swain West 265
727 East Third Street, Bloomington, IN 47405
Office Phone: +1 812 856 5808
On Jul 31, 2014, at 12:02 PM, Chip Watson <watson at jlab.org> wrote:
> Concur, we don't want to generate unnecessary work for ourselves. I would suggest that just the data in a plot is insufficient. For example, if what is shown is a histogram (1D) which is the result of statistical analysis of 20GB of 4 vectors, then provide the 20GB file so others can try their own cuts on the data. So probably not the raw data, but probably the N-tuples. Yes, needs discussion and then specificity to satisfy the directive.
>
> On 7/31/14 11:58 AM, Matthew Shepherd wrote:
>> I think we need to define what "data" means in this case.
>> NSAC was briefed on this last summer and, as far as my
>> understanding goes, digital data can mean anything from
>> making the plots digitally available in PDF to providing
>> raw four-vectors, software, etc. to do a complete analysis.
>>
>> I noticed that the third principle of the policy states:
>>
>> "Not all data need to be shared or preserved. The costs and
>> benefits of doing so should be considered in data
>> management planning."
>>
>> I think a good strategy to start with is to make available
>> in tabular form any data points, error bars, etc. that
>> appear in papers in refereed journals.
>> This seems to be the optimal cost/benefit point as others
>> may want to do fits to cross sections,
>> extracted scattering amplitudes, etc. from GlueX data.
>> It is more useful than just a plot, but not as cumbersome
>> as making the raw data available. I don't think the
>> raw data would be of much use to most people anyway
>> without the insider experience to analyze it.
>>
>> In this case the DOI would reference tabular data
>> that was relevant for plots in a particular journal article
>> (not terabytes of actual and simulated data).
>>
>> Of course this needs more discussion and we need
>> to make sure that what we do is compliant with the
>> policy, but I'd argue for thinking about what is most
>> logical and useful first and see if it fits the guidelines
>> rather than trying to glean specific guidance from
>> what is a very general policy.
>>
>> Matt
>>
>
> _______________________________________________
> Halld-offline mailing list
> Halld-offline at jlab.org
> https://mailman.jlab.org/mailman/listinfo/halld-offline