[Moller_simulation] Moller12gev work disk status update

Paul King pking at jlab.org
Fri Nov 19 11:56:15 EST 2021


Dear all,

I posted the following message on the Slack moller_simulation channel, but am repeating it here to make sure we have a persistent copy of this information.

TL;DR:  We understand what happened with the moller12gev work disks.  It's kind of a mess, but the data should be recoverable.  More details will follow when we have them.

Thanks,
Paul.

----
We have some progress on understanding what has happened with the work disks, and are continuing to follow it up with CC folks.
1.  We had a first migration of the work disk in February 2021, and at that point started using the directory "/w/moller12gev-sciwork18".  Unfortunately, the original disk "/w/halla-scifs17exp/moller12gev" was not marked as read-only, and some people were able to either continue using it, or to start using it.  The most recent migration took the directories from "/w/moller12gev-sciwork18" to populate the new work disk, thus any activity on the "scifs17" disk is missing.
2.  The "scifs17" server had been used for the other /work/halla directories, such as the "parity" work disk.  When the sysadmin migrated those other /work/halla directories, they noticed that "/w/halla-scifs17exp/moller12gev" was not marked as read-only, and then marked it as read-only.
3.  The "moller12gev-sciwork18" and current work directory seem to be a good match for each other.  It appears that where there are differences they are from intentional changes done on the current work disk.
4.  The "halla-scifs17exp" disk usage is small compared to the current work disk.  The sysadmin is going to make a copy of the full disk on the new disk, and then we can sort it out from there.  Once it appears, do not just start working in this copy; for some directories we may just move them into the active work space, but folks who have a current directory on the work disk will have to figure out what to move and what not to move.  I would hope that the copy of the "halla-scifs17exp" disk will only be needed temporarily.  Please wait for the go-ahead before touching the "halla-scifs17exp" copy, so that we can validate it first.
5.  The work disks have some snapshotting being done (which does mean some file recovery is possible).  When we delete files from the disk (either by moving or just deleting), the space is not recovered until the last snapshot referencing those files is rotated out.  So when we did the cleanup earlier in the week, the available space went down, as that space was effectively hidden by the snapshots.  The sysadmin has cleared some of the snapshots, and "df" is now showing we're at 37% in use with an 8.3 TB size.
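
For anyone who wants to check the usage from a script rather than with "df", here is a minimal Python sketch.  The function names are made up for illustration, and the percentage is computed as used/total, which can differ slightly from what "df" prints.  Space held by snapshots may not be visible this way either.

```python
import shutil


def percent_in_use(used_bytes, total_bytes):
    """Return used space as a percentage of the total size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def disk_report(path):
    """Summarize the filesystem holding `path` (sizes in TB, 1 TB = 1e12 bytes)."""
    usage = shutil.disk_usage(path)
    return {
        "total_TB": usage.total / 1e12,
        "used_TB": usage.used / 1e12,
        "free_TB": usage.free / 1e12,
        "percent": percent_in_use(usage.used, usage.total),
    }


# Example call (the path is the work disk link named in this message):
# print(disk_report("/work/halla/moller12gev"))
```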

If you currently have an active directory on /work/halla/moller12gev, you ought to be able to use it normally now.

If you had been using "/w/halla-scifs17exp/moller12gev" for your moller12gev work disk, please give us a little while to get the copy over, and then we can see about moving your directory into the active folders.  The same applies if you have directories on both disks.
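
If you do have directories on both disks, a comparison along these lines may help you decide what to move once the copy has been validated.  This is only a sketch using Python's standard "filecmp" module; the function name and the example paths are hypothetical, and it only reads the two trees without moving or deleting anything.

```python
import filecmp
import os


def compare_trees(old_dir, new_dir, prefix=""):
    """Recursively compare two directory trees.

    Returns relative paths that exist only in old_dir, only in new_dir,
    or in both but with differing contents.  filecmp.dircmp does a shallow
    comparison: files whose type, size and modification time all match are
    treated as equal without reading their contents.
    """
    cmp = filecmp.dircmp(old_dir, new_dir)
    result = {
        "only_old": [os.path.join(prefix, n) for n in cmp.left_only],
        "only_new": [os.path.join(prefix, n) for n in cmp.right_only],
        "differ": [os.path.join(prefix, n) for n in cmp.diff_files],
    }
    for name in cmp.common_dirs:
        sub = compare_trees(
            os.path.join(old_dir, name),
            os.path.join(new_dir, name),
            os.path.join(prefix, name),
        )
        for key in result:
            result[key].extend(sub[key])
    return result


# Example call (both paths are placeholders, not real locations):
# report = compare_trees("/path/to/scifs17-copy/username",
#                        "/work/halla/moller12gev/username")
```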
In the future, please only use the link "/work/halla/moller12gev" to refer to the work disk, to ensure that you are always using the active disk in the (likely) event that we go through another disk/server migration.
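
To check whether a directory you are using actually lives on the active disk, you can resolve the link and compare.  Below is a small Python sketch of that idea; the function name is invented for illustration, and it assumes that "/work/halla/moller12gev" is a symbolic link that resolves to wherever the active disk currently is.

```python
import os

# The link named in this message; everything else here is illustrative.
ACTIVE_LINK = "/work/halla/moller12gev"


def is_on_active_disk(path, active_link=ACTIVE_LINK):
    """Return True if `path` resolves to a location under the active work disk."""
    real_active = os.path.realpath(active_link)
    real_path = os.path.realpath(path)
    # commonpath of two absolute paths equals real_active only when
    # real_path is real_active itself or lies beneath it.
    return os.path.commonpath([real_active, real_path]) == real_active


# Example call (the username directory is a placeholder):
# print(is_on_active_disk("/w/halla-scifs17exp/moller12gev/username"))
```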

We should have a longer discussion at the simulation meetings about how best to manage our work and volatile resources to keep from stepping on each other's toes.  Volatile is larger and allows us to expand beyond our reserve and our quota, but does have the risk of files being deleted if the demand exceeds the supply.  Anything "important" should be on work or be saved to tape.

We should probably also think about how much we're likely to need to write to tape over the next year, to give CC a heads-up about that.  I think we have suggested a token amount for this year and next, but I'm not sure at the moment what we told them.  It might have been a few hundred TB or something like that.

