[Solid_software] file loss at /work/halla/solid
Ole Hansen
ole at jlab.org
Wed Mar 15 19:57:13 EDT 2017
Hi Zhiwen,
thanks for the links.
For the Jan 2017 set, there are over 50,000 files lost under
/work/halla/solid. The whole work disk lost over 600,000 files. That's
staggering.
Maybe everyone can give me some feedback on
1. what kind of data were lost (source code, whether self-written or
from elsewhere; applications; production/simulation output; etc.)
2. number of files lost
3. how important these data are (i.e. how much you care about restoring
them), perhaps on a scale from 0 ("junk") to 5 ("mission critical")
4. whether a backup exists, and if so where
5. how much time you expect to spend, or have already spent, on restoring
(assuming you would restore), including assessing the damage, locating
backups, etc.
I only need summaries, perhaps per top-level directory.
This feedback would help me make a case to the Computer Center to
reconsider the choice of the obviously unreliable Lustre system for /work.
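For the file counts in item 2, one way to get a per-directory tally is
sketched below. This is only a rough sketch: it assumes that the lost-file
lists (e.g. /site/scicomp/lostfiles-jan-2017.txt) contain one absolute path
per line, which has not been verified here, and the function names
summarize_lost_files and print_summary are made up for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_lost_files(lines, root="/work/halla/solid", depth=1):
    """Count lost files per directory `depth` levels below `root`.

    ASSUMPTION: each input line holds exactly one absolute file path.
    """
    root_parts = PurePosixPath(root).parts
    counts = Counter()
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        parts = PurePosixPath(path).parts
        if parts[:len(root_parts)] != root_parts:
            continue  # path is outside the area of interest
        rel = parts[len(root_parts):]
        if not rel:
            continue  # the line names the root itself, not a file
        if len(rel) <= depth:
            # File sits above the requested depth: group by its parent,
            # using "." for files directly under root.
            key = "/".join(rel[:-1]) or "."
        else:
            key = "/".join(rel[:depth])
        counts[key] += 1
    return counts


def print_summary(list_file, root="/work/halla/solid", depth=1):
    """Print one line per directory: file count, then directory name."""
    with open(list_file, errors="replace") as f:
        counts = summarize_lost_files(f, root=root, depth=depth)
    for directory, n in sorted(counts.items()):
        print(f"{n:8d}  {directory}")
```

Calling print_summary("/site/scicomp/lostfiles-jan-2017.txt") would then
list the counts per top-level directory under /work/halla/solid; passing
depth=2 breaks them down one level further (e.g. ole/workbook).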
I'll start with my own lost files as an example:
/work/halla/solid/FNAL/products:
1. Fermilab runtime environment for art
2. 19
3. test installation, unimportant, obsolete: 1
4. yes (at fnal.gov and on personal desktop)
5. 10 minutes
/work/halla/solid/ole/workbook:
1. art workbook sources
2. 2
3. test installation, unimportant: 1
4. yes (at fnal.gov and on personal desktop)
5. 5 minutes
/work/halla/solid/ole/git-2.5.5-el7:
1. Updated git binary packages, self-compiled
2. 1
3. scratch directory: 0
4. yes (on personal desktop, or recompile from source RPM)
5. 5 minutes
/work/halla/gmp12/ole/Podd_Tutorial:
1. April 2016 Podd Tutorial
2. 8
3. missing files unimportant, but should restore for completeness: 2
4. yes (hallaweb)
5. 10 minutes
/work/halla/g2p/ole/tutorial:
1. January 2015 Podd Tutorial
2. 1
3. missing file unimportant: 1
4. yes (hallaweb)
5. 5 minutes
Ole
On 15.03.17 at 17:17, Zhiwen Zhao wrote:
> Dear All
>
> Many of you have noticed file loss on the JLab Lustre file system, which
> all work disks use.
>
> The complete lists are here:
> /site/scicomp/lostfiles.txt
> /site/scicomp/lostfiles-jan-2017.txt
>
> /work/halla/solid is affected, and the loss is spread across all
> subdirectories. So you should take a look at your losses and assume your
> code there is not working.
>
> Zhiwen