[HydraTeam] [EXTERNAL] Plot from the meeting earlier today
David Lawrence
davidl at jlab.org
Thu Feb 8 16:47:42 EST 2024
Hi Manav,
Thanks for the plot and explanation.
I’m trying to understand your description of the error bars. There are configurations where
the error bars stretch from 0% to 100%. Does that mean that, within one trial group,
one trial had 0% accuracy and another trial had 100% accuracy?
Regards,
-David
-------------------------------------------------------------
David Lawrence Ph.D.
Staff Scientist - EPSCI Group Lead
Thomas Jefferson National Accelerator Facility
Newport News, VA
davidl at jlab.org
(757) 269-5567 W
(757) 746-6697 C
> On Feb 8, 2024, at 2:51 PM, Manav Bilakhia <manav.mitesh at gmail.com> wrote:
>
> Hi,
>
> I wanted to share the plot I was trying to show this morning.
>
> Here's the description of the plot that is attached below.
>
> This plot has a lot of information in it. Let's break it down.
>
> Important note: I generated 1866 knockout plots over the summer.
>
> Full dataset means all natural plots + knockout plots.
>
> Natural plots are the plots that I did not generate.
>
> Each trial group on the x-axis had five trials.
>
> The y-axis represents the accuracy when the model was tested on just the knockout plots I created over the summer via a different training script.
> The x-axis:
> Example label 1: train: 60 validation: 40 knockout in train: 0.
>
> This means that 60% of the natural dataset was used for training. 40% was set aside for validation, and there were 0 knockout plots in the training.
>
> Example label 2: train: 95 validation: 5 knockout in train: 20
>
> This means that 95% of the FULL DATASET was used for training, which includes 20% of all knockout plots.
> 5% of the full dataset was set aside for validation.
>
> Example label 3: Hydra original
>
> This means that I did not do anything different. I ran the training script without altering how many knockout or natural plots are used where. Hydra original has knockouts as it was given the full dataset.
>
>
> Interpretation: In this plot, we see no particular trend. This is because some plots in the natural dataset look like the knockout plots I produced. This was also suggested in the meeting this morning.
>
>
> Why the big lower error bars?
> This is basically because of how the models are trained. The training script looks up the location of each plot in the database. The locations are then put in a pandas DataFrame, shuffled, and split into training and validation sets. Sometimes the shuffle is just unlucky for the model: few or no knockout or knockout-like plots end up in the training set. I tested all these models on bad plots only. Almost every time a plot was not predicted as bad, it was predicted as cosmic. There were only a handful of times when a plot was predicted as LED, good, or no data.
> With these unlucky shuffles, the model confuses cosmic and bad.
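
To illustrate the shuffle effect described above, here is a minimal sketch, not the actual Hydra training script. It uses a plain Python list and `random.shuffle` as a stand-in for the shuffled pandas data frame. The function name `knockout_share_in_train` and the dataset sizes are hypothetical and chosen only for illustration. It shows how the share of rare knockout-like plots that land in the training set can vary from one shuffle to the next.

```python
import random


def knockout_share_in_train(n_natural, n_knockout, train_frac, seed):
    """Shuffle labelled plots, split them, and return the fraction of
    knockout plots that ended up in the training portion."""
    # Hypothetical stand-in for the data frame: one label per plot.
    plots = ["natural"] * n_natural + ["knockout"] * n_knockout

    # Seeded generator so that each "trial" is reproducible.
    rng = random.Random(seed)
    rng.shuffle(plots)

    # The first train_frac of the shuffled list is the training set,
    # the remainder is the validation set.
    n_train = int(len(plots) * train_frac)
    train = plots[:n_train]
    return train.count("knockout") / n_knockout


# Illustrative sizes only (not the real dataset): 1000 natural plots,
# 10 knockout-like plots, 60% used for training, five trials.
shares = [knockout_share_in_train(1000, 10, 0.6, seed) for seed in range(5)]
print(shares)
```

When the rare class is this small, the printed shares can differ noticeably between seeds, which is one plausible way for trials in the same trial group to end up with very different accuracies.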
>
> How are the error bars calculated?
>
> The lower error is the mean accuracy of the trials in the trial group minus the minimum accuracy in that group. The upper error is the maximum accuracy in the trial group minus the mean accuracy of that group.
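
As a small sketch of the error-bar definition above: if the bars are computed this way, the bottom of each bar sits at the worst trial and the top at the best trial of the group. A bar spanning 0% to 100% would then mean that at least one trial in that group scored 0% and at least one scored 100%. The accuracies below are made up for illustration, and the helper name `asymmetric_errors` is hypothetical.

```python
from statistics import mean


def asymmetric_errors(accuracies):
    """Return (mean, lower_error, upper_error) for one trial group.

    lower_error = mean - min, upper_error = max - mean,
    following the definition quoted above.
    """
    m = mean(accuracies)
    lower = m - min(accuracies)
    upper = max(accuracies) - m
    return m, lower, upper


# Hypothetical accuracies (in percent) for the five trials of one group.
trials = [0.0, 40.0, 60.0, 80.0, 100.0]
m, lower, upper = asymmetric_errors(trials)

# The ends of the bar are mean - lower and mean + upper,
# which are the minimum and maximum trial accuracies.
print(m, m - lower, m + upper)
```

Assuming the plot was drawn with matplotlib, asymmetric errors like these are normally passed to `errorbar` as `yerr` with two rows: lower errors first, upper errors second.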
>
> Best,
>
> Manav
>
>
>
> <Figure_1.png>