
when a superblock goes bad..

..this is the 2nd time i've experienced this, and thanks to the paralyzingly helpless sensation of the event, it is in fact the exact moment when i began to think about abandoning my linux endeavors.

now that mac is firmly rooted in its own variant of unix, one that is steadily gaining acceptance amongst the type of "fringe development" that interests me, i realize that managing 2+TB of data, without any form of backup procedure, has gone past the point where "i am prepared to say goodbye to any drive's worth of data" still applies.

it is a place i'm not comfortable being; this place without anyone accepting the responsibility for data integrity.. in both mac and pc worlds, there are numerous companies happy to take your money in exchange for software that can either restore or salvage your corrupt drives/partitions.. since nothing is as prevalent in the linux world, i'm left with a sense of abandonment in which i yearn for the parental safeguards and comfort of ms windows or apple os x.

reflecting from this perspective, i realize my data storage needs have long since reached the corporate/enterprise level of operation, and i'm well past the point of needing a backup procedure; but what can i say, as i've told countless other people, it isn't until you lose your data that you learn the value of backing it up..

this article deals with low level hard drive maintenance; as with all things potentially damaging to your system, try not to do anything u don't understand the consequences of, as it may (and in this case, will likely) result in the permanent loss (or magnificently complicate the recovery) of your data.

ok, so my vmware vista instance isn't accessible via remote desktop.. strange, lemme check in with vmware, and it isn't familiar with the virtual machine anymore.. well let's reopen the virtual machine; hmm, errors. perhaps i should reboot linux, it's been, o let's see, around 76 days since the last reboot; shouldn't be the issue alone, but let's reboot anyway..

here we go, this looks bad; it fails while booting with the following errors (the output is cropped, this is what's viewable using the dsub/vga connection on my tv):


failed to set xfermode (err_mask=0x1)

[...]


v/sdc2:
superblock could not be read or does not describe a correct ext2
esystem. If the device is valid and it really contains an ext2
esystem (and not swap or ufs or something else), then the superblock
corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>


k.ext3: No such file or directory while trying to open /dev/sdc3
v/sdc3:
superblock could not be read or does not describe a correct ext2
esystem. If the device is valid and it really contains an ext2
esystem (and not swap or ufs or something else), then the superblock
corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>


[FAILED]


An error occurred during the file system check.
Dropping you to a shell; the system will reboot
when you leave the shell.
e root password for maintenance
type Control-D to continue):

  

a couple links start to give some insights into a possible solution;

  

and other more direct results, that apply to sun/solaris, which apparently offers a command 'newfs'.. this seems comparable to the fedora/ubuntu 'mke2fs'

right, so the superblocks are bad on a hard drive i'm not comfortable losing just yet.

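fdisk modifies the partition tables; if those had become corrupt u'd be in a much worse state. near as i can tell, within a partition, the superblock is the filesystem's master record: it holds the block size, the total block and inode counts, and how the blocks are divided into groups, and everything that maps files to filesystem blocks hangs off of it.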

this integral ingredient in the normal healthy operation of your hard drive, is no longer valid.
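for what it's worth, "valid" can be checked by hand. on ext2/ext3 the primary superblock starts 1024 bytes into the partition, and 56 bytes into that sits a two-byte magic number, 0xEF53, stored low byte first. below is a minimal read-only sketch, assuming the device (or an image of it) can still be read at all; has_ext_magic is a made-up helper name, not something that ships with e2fsprogs, and a matching magic number only says the superblock is recognizable, not that the rest of it is sane.

```shell
# has_ext_magic FILE [OFFSET]   (hypothetical helper, illustration only)
# succeeds if the superblock starting at byte OFFSET (default 1024, the
# primary one) carries the ext2/ext3 magic number 0xEF53.
# read-only: nothing is ever written to FILE.
has_ext_magic() {
    target=$1
    sb_offset=${2:-1024}
    # s_magic lives 56 bytes into the superblock, low byte first
    magic=$(dd if="$target" bs=1 skip=$(( sb_offset + 56 )) count=2 2>/dev/null \
        | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}
```

something like `has_ext_magic /dev/sdc1 && echo "looks like ext2/ext3"` would test the primary; for a backup copy on a 4k-block filesystem the offset should be the block number times 4096, e.g. `has_ext_magic /dev/sdc1 $(( 32768 * 4096 ))`.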

fortunately backup/alternates are rumored to exist..

the OG superblock is duplicated at several additional locations, and unless you noted them when u created the partition, ur going to hafta find them. this is achieved by pretending to make a new filesystem on the corrupt partition, and taking note of where it would normally put the alternate superblocks.

the "-n" argument here is what makes this a simulation; without the -n parameter you will overwrite and destroy your data.

from the maintenance shell,


# mke2fs -n /dev/sdc1

[...]


erblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000
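those numbers aren't random. assuming the sparse_super feature (the usual default), backups live only in block groups 1 and the powers of 3, 5 and 7, and each group here is 32768 blocks (the default for a 4k block size, which the first backup at 32768 suggests). a rough sketch that rebuilds the list from just the group size and the total block count; backup_superblocks is a hypothetical helper name, and the 122096000 is the block total fsck reports for this partition. handy for double-checking a list copied by hand off a screen.

```shell
# backup_superblocks BLOCKS_PER_GROUP TOTAL_BLOCKS   (hypothetical helper)
# prints, one per line, the block numbers where an ext2/ext3 filesystem
# with sparse_super keeps its backup superblocks.
# assumes a block size above 1k, so group N starts at N * BLOCKS_PER_GROUP.
backup_superblocks() {
    blocks_per_group=$1
    total_blocks=$2
    # number of block groups, rounding up for a partial last group
    groups=$(( (total_blocks + blocks_per_group - 1) / blocks_per_group ))
    {
        if [ "$groups" -gt 1 ]; then echo 1; fi
        for base in 3 5 7; do
            g=$base
            while [ "$g" -lt "$groups" ]; do
                echo "$g"
                g=$(( g * base ))
            done
        done
    } | sort -n | while read -r g; do
        echo $(( g * blocks_per_group ))
    done
}

backup_superblocks 32768 122096000
```

for this partition that comes to 17 locations, from 32768 up to 102400000.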

hawt dawg, that's a lot of backup alternates! now we just run fsck with the alternate superblock specified..


# /sbin/fsck.ext3 -b 32768 /dev/sdc1

..which has this to say;


v/sdc1 was not cleanly unmounted, check forced.
s 1: Checking inodes, blocks, and sizes

yay, this is good, i think.. this makes the hd activity light flicker for a good 10-15mins or so, before the rest of it;


s 2: Checking directory structure
s 3: Checking directory connectivity
s 4: Checking reference counts
s 5: Checking group summary information
e blocks count wrong for group #1 (29650, counted=0).
<y>?

i imagine it's wanting to fix the incorrect block count for group 1.. i'm going to say go ahead with this, and hope i'm not writing out a big chunk of files..
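a more cautious route, in hindsight, would be a look-but-don't-touch pass first: e2fsck's -n flag opens the filesystem read-only and answers "no" to every question. a sketch that walks a list of alternate superblocks with -n and prints the first one e2fsck can actually open, going by its exit status (8 and up means an operational error such as an unreadable superblock). first_usable_superblock and the E2FSCK override are names i made up so the loop can be dry-run against a stub.

```shell
# first_usable_superblock DEVICE SUPERBLOCK...   (hypothetical helper)
# tries each candidate superblock read-only and prints the first one
# e2fsck manages to open; returns 1 if none of them work.
# E2FSCK is an assumed override for testing, defaulting to the real e2fsck.
first_usable_superblock() {
    device=$1
    shift
    for sb in "$@"; do
        status=0
        # -n: open read-only, answer "no" to everything; -b: use this superblock
        "${E2FSCK:-e2fsck}" -n -b "$sb" "$device" >/dev/null 2>&1 || status=$?
        # exit codes below 8 mean the check ran (with or without errors found)
        if [ "$status" -lt 8 ]; then
            echo "$sb"
            return 0
        fi
    done
    return 1
}
```

usage would look something like `first_usable_superblock /dev/sdc1 32768 98304 163840`, and only then the real run with whichever block it prints.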

i hit enter, for the default <y> a few times, feeling a little cautious as i do, and soon am leaning on the 'enter' key up thru group #3500 or so.. it counted=0 for most of the groups, up until the 3000s or so, few exceptions here and there; this is of some concern, since i thought the drive was more full than empty..

then we go thru a similar process for the inodes count being wrong, ya sure fix them; i dunno where else to go about fixing this stuff..


v/sdc1: ***** FILE SYSTEM WAS MODIFIED *****
v/sdc1: 2451/61063168 files (36.9% non-contiguous), 109938214/122096000 block

this is good i think, and cause for a reboot.. it's actually somewhat surprising, since i've never seen the maintenance shell modify the file system before, and i'd already gone ahead and DL'd the recovery cd..
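about the counted=0 worry from earlier: as far as i can tell the cropped "e blocks count" message is e2fsck's "Free blocks count wrong", in which case counted=0 means zero free blocks in that group, i.e. a completely full group, not an empty one. the summary line backs that up, and the arithmetic is quick to sketch (usage_summary is just an illustrative name):

```shell
# usage_summary USED TOTAL   (hypothetical helper)
# prints the percentage of blocks in use (rounded down) and how many are free
usage_summary() {
    used=$1
    total=$2
    echo "$(( used * 100 / total ))% used, $(( total - used )) blocks free"
}

# numbers taken from the fsck summary line for sdc1
usage_summary 109938214 122096000
# prints: 90% used, 12157786 blocks free
```

so the drive really is more full than empty, which is what i expected.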

[...]

this didn't fix the problem, or the problem was more widespread than this one partition. it is of some concern that the partitions at sdc2 and sdc3 weren't identified or mke2fs'able..

the recovery cd cost u ur last blank cd, and while it wasn't what u wanted, at least you have plenty of blank dvd's for the kde-live iso, which at >800mb will require dvd sized media.

the live disc should allow me to boot, clean up fstab so i can boot properly, then further troubleshoot the problematic drives/partitions to get them fixed and mounted, and finally confirm that they still contain the data..

as it turns out, neither the recovery cd nor the live disc is needed; in the maintenance console you can remount the root file system in read-write mode; i'd guess it's read-only by default so you don't accidentally cause any further damage..


mount -w -o remount /
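with root writable, the fstab cleanup amounts to commenting out the entries for the sick drive so the boot-time fsck stops tripping over them. a small sketch, assuming the entries are listed by device name (not by LABEL= or UUID=); comment_out_device is a hypothetical helper, and it only prints the edited file so the result can be reviewed before anything is replaced.

```shell
# comment_out_device FSTAB DEVICE   (hypothetical helper)
# prints FSTAB with every entry for DEVICE or one of its partitions
# (e.g. /dev/sdc, /dev/sdc1, /dev/sdc2) commented out.
# the original file is left untouched.
comment_out_device() {
    fstab=$1
    device=$2
    # '|' as the sed delimiter because device paths contain slashes;
    # '&' in the replacement re-inserts the matched text after the '#'
    sed "s|^${device}[0-9]*[[:space:]]|#&|" "$fstab"
}
```

something like `comment_out_device /etc/fstab /dev/sdc > /etc/fstab.new`, then a look through fstab.new before copying it over the original (and keeping a copy of the old one).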

meanwhile research into raid is underway; initial findings suggest the newertech ministack v3 would allow a raid configuration on the mac mini ( http://www.newertech.com/products/ministackv3.php ). alternatively, the lacie biggest quadra offers an expensive all-in-one solution to raid, but putting an entire raid behind a single point of failure seems like a bad idea to me. the multiple/individualized nature of the ministack has much greater appeal in this sense, and it offers a stackable form factor nearly identical to the mac mini itself..

this all may have been a bit premature, as i suspect it could be the sdc sata controller.. need to try it on sde or sdf, if i can figure out which is which, and whether sdd is moved to sdc when sdc isn't present... if this too fails, then i need to look into SMART and/or WD diagnostic utilities.
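on the "which is which" question: on most udev-based distros /dev/disk/by-id holds symlinks named after each drive's model and serial number, pointing at whatever sdX the kernel handed out this boot. a sketch that prints the mapping, assuming that directory exists on the system at hand; map_disk_links is a made-up name.

```shell
# map_disk_links [DIR]   (hypothetical helper)
# prints "<kernel name> <persistent name>" for every symlink in DIR,
# which defaults to /dev/disk/by-id.
map_disk_links() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # skip anything that isn't a symlink (including an unmatched glob)
        [ -L "$link" ] || continue
        echo "$(basename "$(readlink "$link")") $(basename "$link")"
    done | sort
}
```

once the right device is pinned down, smartctl from smartmontools covers the SMART side: `smartctl -H /dev/sdX` for the overall health verdict, `smartctl -a /dev/sdX` for everything the drive reports.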