Bug-monster dying slowly

Thursday, June 28th, 2007

Finally, an update on what I’ve been doing since last time. This round consists of a lot of small changes:

  • Implement initialization of RAID5-Arrays. This basically writes zeros over everything and makes sure parity is correct.
  • Fix a bug with mirror code. The length of the completed requests got doubled if you have a mirror with two plexes, tripled if you have a mirror with three plexes etc.
  • When mirrored plexes are syncing, all requests after and including the first _write-request_ are delayed until syncing is finished.
  • Allow rebuilding a RAID-5 array while it is in use (e.g. mounted). Delay requests that are in conflict with the rebuild, but allow requests on the already rebuilt part to be run.
  • Allow subdisks to come up automagically after rebuild.
  • Allow stripe sizes not divisible by the subdisk size. A regression in the new gvinum code prevented this.
  • Modify the event system to contain two intmax_t fields, so we won’t have to allocate/deallocate pointers all the time when passing args to gv_post_event.
  • Add support for the rename and move commands to new gvinum. The code has been rewritten for the new gvinum.
  • Fix a bug in the code for degraded writes to a RAID5-array, where only zeros were written.
  • Other minor bug/style fixes.
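
To illustrate the rebuild-while-in-use item above, here is a minimal sketch of how a request could be classified while a rebuild is running. It is not the gvinum code: the function name, the enum, and the `rebuilt_end` parameter are hypothetical, and it assumes the rebuild walks the plex linearly from offset 0, so that everything below `rebuilt_end` is already consistent.

```c
#include <assert.h>
#include <stdint.h>

/* What to do with an incoming request while a rebuild is running. */
enum req_action { REQ_RUN, REQ_DELAY };

/*
 * Hypothetical helper, not taken from gvinum. Assumes the rebuild proceeds
 * linearly from offset 0 and that rebuilt_end is the first byte offset that
 * has not been rebuilt yet.
 */
static enum req_action
classify_request(int64_t offset, int64_t length, int64_t rebuilt_end)
{
	assert(offset >= 0 && length > 0 && rebuilt_end >= 0);

	/* Entirely inside the already rebuilt part: safe to run now. */
	if (offset + length <= rebuilt_end)
		return (REQ_RUN);

	/* Touches data the rebuild has not finished yet: hold it back. */
	return (REQ_DELAY);
}
```

With `rebuilt_end` at 1024, a 512-byte request at offset 0 would run immediately, while a 512-byte request at offset 768 would be delayed because it crosses into the part that is not rebuilt yet.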

Next, I’m going to implement concat/stripe/mirror functionality. I already have some code from previous work I did, so I just need to adapt it to new gvinum, as well as change some ugly parts. There are some small facade-changes left, but I will do these after the last of the original vinum features is completed. Also, I will try to write a nice status report, and get a testable patch out by the time the reports are finished.

Even happier…

Monday, June 18th, 2007

Finally I did the initialization code for raid5 plexes, and this means I’m pretty much done with updating old gvinum to the new event system, but it will probably need some fixes here and there as it gets tested.

What remains in terms of needed functionality is the concat/mirror/stripe commands to easily create a concat/mirror/stripe volume out of three disks. I have also noted some issues that I think could use improvement. More on this next time.

Subdisks now live happy in raid5-town…

Wednesday, June 13th, 2007

So, finally the exams are over, and I’ve been able to work sort of full-time on my project the last few days. What I’ve done is (a bit technical this time perhaps, but this stuff tends to become that):

Implemented attach/detach routines. This makes it possible to attach a subdisk/plex to a plex/volume, or detach a plex/subdisk from a volume/plex. The detach routine makes sure all connections between the objects are broken correctly, and only if it’s possible (unless forced of course). The attach routine makes sure the objects are correctly connected together again, and that a plex that misses subdisks includes them in the previous size when calculating the new size (so we don’t get wrong sizes on the plexes).

Tested rebuild of degraded plexes. The detach/attach routines enabled me to check whether rebuilding a degraded raid5 plex could work. And it did! This means (and this is something that I really missed in old gvinum) that when a drive fails, you can detach the failed subdisk, create a new subdisk on the plex (it will check if the plex “misses” a subdisk), and then use ‘start <plexname>’ to rebuild the plex, and you’re good to go! Note that the state of the plex must be degraded and the subdisk you wish to rebuild must be stale.
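
For readers wondering what a raid5 rebuild does with the data, here is a small sketch of the general technique: the block of the missing subdisk in a stripe is the XOR of the corresponding blocks on all surviving subdisks, data and parity alike. This is a userland illustration only, not the gvinum code; the function name and parameters are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Rebuild the block of one missing subdisk in a stripe by XOR-ing the
 * corresponding blocks of all surviving subdisks (data and parity alike).
 * Hypothetical helper for illustration; the names are not from gvinum.
 */
static void
raid5_reconstruct(uint8_t *out, const uint8_t *const *surviving,
    size_t nsurviving, size_t blocklen)
{
	assert(out != NULL && surviving != NULL && nsurviving > 0);

	/* Start from all zeros, the identity element for XOR. */
	memset(out, 0, blocklen);

	/* Fold every surviving block into the output, byte by byte. */
	for (size_t i = 0; i < nsurviving; i++)
		for (size_t j = 0; j < blocklen; j++)
			out[j] ^= surviving[i][j];
}
```

The same routine computes parity when it is handed only the data blocks, which is also why a freshly zeroed array has correct parity: the XOR of all-zero blocks is zero.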


Implemented syncing of plexes. This means one can now add a mirror to a volume, and have the new plex synced from the original. After a couple of tests, it seems to work, but I did hit a bug I need to reproduce.

I also discovered some bugs regarding mirrored plexes that I will address in the near future. These probably came with the switch to the new gvinum event system.

Next on my schedule is to hunt some weird state bugs where the state is not correctly set, as well as the mirrored plex problems I’ve seen. Also, I need to guarantee that a plex sync is up-to-date (that no data is written to the synced plex in the meantime).