[Solid_software] file loss at /work/halla/solid

Ole Hansen ole at jlab.org
Wed Mar 15 19:57:13 EDT 2017


Hi Zhiwen,

thanks for the links.

For the Jan 2017 set, there are over 50,000 files lost under
/work/halla/solid. The whole work disk lost over 600,000 files. That's
staggering.

Could everyone please give me some feedback on the following:

1. what kind of data were lost (source code (self-written or from
elsewhere), applications, production/simulation output, etc.)
2. the number of files lost
3. how important these data are (i.e. how much you care about restoring
them), perhaps on a scale from 0 ("junk") to 5 ("mission critical")
4. whether a backup exists, and if so, where
5. how much time you expect to spend or have already spent on restoring
(assuming you would restore), including assessing the damage, locating
backups, etc.

I only need summaries, perhaps per top-level directory.
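
For anyone who wants to get the counts for item 2 quickly, here is a
minimal Python sketch that tallies lost files per top-level directory.
It assumes that the lost-file lists contain one absolute path per line;
the actual format is not stated in this thread, so adjust as needed. The
function names summarize_lost_files and print_summary are made up for
this illustration.

```python
from collections import Counter


def summarize_lost_files(lines, root="/work/halla/solid", depth=1):
    """Count lost files per directory `depth` levels below `root`.

    Assumption: each input line holds exactly one absolute file path.
    """
    root = root.rstrip("/")
    counts = Counter()
    for line in lines:
        path = line.strip()
        # Skip blank lines and anything outside the directory of interest.
        if not path.startswith(root + "/"):
            continue
        parts = path[len(root) + 1:].split("/")
        if len(parts) <= depth:
            # File sits above the requested depth: group it under its
            # own parent directory, or "." if it is directly in root.
            key = "/".join(parts[:-1]) or "."
        else:
            key = "/".join(parts[:depth])
        counts[key] += 1
    return counts


def print_summary(list_file, root="/work/halla/solid", depth=1):
    """Print one line per directory, largest loss first."""
    with open(list_file) as f:
        counts = summarize_lost_files(f, root=root, depth=depth)
    for directory, n in counts.most_common():
        print(f"{n:8d}  {root}/{directory}")
```

Calling print_summary("/site/scicomp/lostfiles-jan-2017.txt") would then
list the per-directory totals for /work/halla/solid; pass depth=2 to
break the counts down one level further.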

This feedback would help me make a case to the Computer Center to
reconsider the choice of the obviously unreliable Lustre system for /work.

I'll start with my own lost files as an example:

/work/halla/solid/FNAL/products:
1. Fermilab runtime environment for art
2. 19
3. test installation, unimportant, obsolete: 1
4. yes (at fnal.gov and on personal desktop)
5. 10 minutes

/work/halla/solid/ole/workbook:
1. art workbook sources
2. 2
3. test installation, unimportant: 1
4. yes (at fnal.gov and on personal desktop)
5. 5 minutes

/work/halla/solid/ole/git-2.5.5-el7:
1. Updated git binary packages, self-compiled
2. 1
3. scratch directory: 0
4. yes (on personal desktop, or recompile from source RPM)
5. 5 minutes

/work/halla/gmp12/ole/Podd_Tutorial:
1. April 2016 Podd Tutorial
2. 8
3. missing files unimportant, but should restore for completeness: 2
4. yes (hallaweb)
5. 10 minutes

/work/halla/g2p/ole/tutorial:
1. January 2015 Podd Tutorial
2. 1
3. missing file unimportant: 1
4. yes (hallaweb)
5. 5 minutes

Ole

On 15.03.17 at 17:17, Zhiwen Zhao wrote:
> Dear All
> 
> Many of you have noticed file loss on the JLab Lustre file system, which
> all work disks use.
> 
> The complete lists are here:
> /site/scicomp/lostfiles.txt
> /site/scicomp/lostfiles-jan-2017.txt
> 
> /work/halla/solid is affected, and the loss spans all subdirectories.
> So you should take a look at your losses and assume your code there no
> longer works.
> 
> Zhiwen

