Testing csup

March 6th, 2008 by lulf

The last couple of weeks I’ve been very busy with school (and I expected this to be a quiet semester). However, I’ve found some of the last few bugs lurking around in csup:

  • Deltas that had a ‘hand-hacked’ date would have deltatexts that would be misplaced in the rcsfile.
  • When adding a new diff, a ‘..’ would be converted to ‘.’ twice, meaning it disappeared.
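
The second bug is a classic double-unescape. A minimal sketch, assuming a dot-stuffing convention where a text line that begins with ‘.’ is transmitted with one extra leading ‘.’ (the function names are hypothetical and not taken from csup):

```python
def unescape(line: str) -> str:
    """Strip the one leading '.' that the sender added to a dotted line."""
    return line[1:] if line.startswith(".") else line


def add_line_correct(line: str) -> str:
    # Unescape exactly once on the way in.
    return unescape(line)


def add_line_buggy(line: str) -> str:
    # Unescaping twice also eats the dot that belonged to the text.
    return unescape(unescape(line))
```

With this model, a transmitted ‘..foo’ comes out as ‘.foo’ when unescaped once, and as ‘foo’ when unescaped twice.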

Now, there are only these issues left, but I’m not sure if I really want to fix all of them:

  • Some RCS files have an extra space between desc and the deltas. CVSup fixes this by counting the lines and then writing them out when writing out the RCS file. I think this is silly, since it doesn’t really matter according to the RCS standard.
  • Some files appear to display garbage values, such as src/share/examples/kld/firmware/fwimage/firmware.img,v
    This disappears for some reason in csup, but I’m not sure how to handle this. Comments are welcome.

  • It has a quite high memory usage, and this might be due to some leaks that
    I’ve been unable to find. I’ll do a much better audit of the code and run
    valgrind to investigate this further.
  • Does not support md5 of RCS stream, so it can’t detect errors yet.
  • Statusfile file attributes might not be correct.
  • Some RCS parts such as newphrases (man rcsfile) are not supported yet.
  • Some hardcoded limits that may break it.
  • Things done a silly way such as sorting and comparing, which I have plans to
    improve later.

So, finally, you can try out patches if you’d like:
http://people.freebsd.org/~lulf/patches/csup/cvsmode

Currently, I’m including the tokenizer generated by flex, since the flex file itself can’t be compiled with csup.

Improving csup

January 31st, 2008 by lulf

Hello there!

 It’s been a while. Partially because I’ve become a FreeBSD committer and had more productive stuff to do than writing in my weblog, and partially because my account was disabled after Google Summer of Code (Also, thanks to Google for SoC).

 Since last time, I’ve been working on getting my gvinum work of this summer into the tree, but since this has to be reviewed before going into the tree, and 7.0-RELEASE is much more important right now, it’s sort of on hold. In the meantime I’ve been working on implementing CVSmode for csup. This is something I’ve been meaning to do for a long time, but never found the big motivation until my exam period before last Christmas. So, I’ll tell you a bit of what I’ve done here. For those of you unfamiliar with cvsup, it’s a network CVS file synchronization tool which is heavily used in FreeBSD. However, it is written in Modula-3 and is therefore not very easy to maintain, and it doesn’t integrate into the FreeBSD base system very well. So, Maxime Henrion started a C rewrite of cvsup called csup.

First, a bit on how csup works (or the cvsup protocol). The client runs three threads performing these tasks:

  • The lister, which examines the client’s files, and sends information about them to the server.

  • The detailer, which receives commands from the server on what needs to be done (“this file needs updating, send me the details of its revisions”).

  • The updater, which receives the actual updates from the server (“add this delta to the RCSfile”).

More details on how the protocol works can be found on http://www.cvsup.org/howsofast.html
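
To make the three roles a bit more concrete, here is a toy sketch of the pipeline using threads and queues. Everything in it is an assumption made for illustration: the message shapes, the names sync and stage, and the two server stand-in functions are invented, and the real protocol runs over a network connection rather than in-process queues.

```python
import queue
import threading

DONE = object()  # sentinel: no more messages on this queue


def stage(handle, inbox, outbox):
    """Feed every message from inbox to handle() and forward its results."""
    while (msg := inbox.get()) is not DONE:
        for out in handle(msg):
            outbox.put(out)
    outbox.put(DONE)


def sync(local, repo):
    """Bring local (name -> list of revisions) up to date with repo."""
    listed, wanted, details, deltas, finished = (queue.Queue() for _ in range(5))

    def lister():
        # Tell the server which files the client has.
        for name in sorted(local):
            listed.put(name)
        listed.put(DONE)

    def server_ask(name):
        # Server stand-in: ask for details on files it knows about.
        return [name] if name in repo else []

    def detailer(name):
        # Answer the server with the revisions the client already has.
        return [(name, list(local[name]))]

    def server_send(msg):
        # Server stand-in: send every revision the client is missing.
        name, have = msg
        return [(name, rev) for rev in repo[name] if rev not in have]

    def updater(msg):
        # Apply one update ("add this delta to the RCSfile").
        name, rev = msg
        local[name].append(rev)
        return []

    threads = [
        threading.Thread(target=lister),
        threading.Thread(target=stage, args=(server_ask, listed, wanted)),
        threading.Thread(target=stage, args=(detailer, wanted, details)),
        threading.Thread(target=stage, args=(server_send, details, deltas)),
        threading.Thread(target=stage, args=(updater, deltas, finished)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local
```

The point of the layout is that the three client roles never wait on each other directly: the lister can keep listing while the updater is still applying earlier deltas.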

 So, what is CVSmode anyway? In csup’s normal operation, called checkout mode, csup requests the files from a specific branch. This is the typical way a user would use csup, fetching the src-tree for RELENG_7 for instance. However, a developer would often like to have the FreeBSD CVS repository on his local machine, and this is where CVSmode plays a part. CVSmode means that csup will receive the entire CVS repository, and also fetch updates to the actual RCSfiles. So far, csup only supports the checkout mode.

So, what’s needed for CVSMode to work?

  1. Support for the protocol, so the client is able to not only act correctly on the commands from the server, but also respond correctly. This involves modifying the detailer and the updater part of csup.  This part needs to be a bit cleaned up right now, but is in a working state.

  2. Correctly parse RCSFiles. Firstly, I made a lexer with flex and parser with yacc. Then I found out I needed reentrancy, and started using bison. After realizing using bison for this wasn’t really nice since bison wasn’t in base, I rewrote the parser in C.

  3. The ability to update RCSfiles. This required an RCSfile interface. This interface is used by both the parser and the updater, to import and edit RCSfiles. Writing this interface is probably what has taken most of my time.

  4. Writing the RCSFiles out with the new updates. This is done internally by the RCSfile implementation.
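
For point 2, the core of the lexing problem is small. The sketch below is not the csup parser; it is a minimal tokenizer based on what rcsfile(5) says about the format: strings are enclosed in ‘@’, a literal ‘@’ inside a string is written ‘@@’, and ‘;’ and ‘:’ separate the other tokens. The token kind names (“word”, “punct”, “string”) are made up for this example.

```python
def tokenize(text):
    """Yield (kind, value) tokens from RCS file text."""
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
        elif c == "@":
            # @-delimited string; '@@' inside it is an escaped '@'.
            i += 1
            chunks = []
            while True:
                j = text.find("@", i)
                if j < 0:
                    raise ValueError("unterminated @-string")
                chunks.append(text[i:j])
                if text[j + 1:j + 2] == "@":
                    chunks.append("@")
                    i = j + 2
                else:
                    i = j + 1
                    break
            yield ("string", "".join(chunks))
        elif c in ";:":
            yield ("punct", c)
            i += 1
        else:
            # Keyword, revision number or symbol name.
            j = i
            while j < n and not text[j].isspace() and text[j] not in ";:@":
                j += 1
            yield ("word", text[i:j])
            i = j
```

Because the tokenizer is a generator with no global state, several files can be tokenized at the same time, which is the property the reentrancy requirement is about.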

So, this is what I’ve been working on implementing the last month or two. And I have most parts working. What’s missing is a crucial part of (4). To write out the new RCSFiles to disk, a correct algorithm to apply diffs and reverse diffs is needed. The algorithm for applying diffs was already created by csup’s author, but the reverse diff algorithm is a bit different. The last week or so, I’ve been studying the algorithm used in cvsup, and I’ve started to implement something similar although a bit different in its implementation. So, hopefully I’ll have this working pretty soon, at least before people start switching over to some new version control system :)
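
For readers who have not seen RCS deltatexts: they are edit scripts in the format produced by ‘diff -n’, where ‘aL N’ means “add the next N lines after line L” and ‘dL N’ means “delete N lines starting at line L”, with line numbers referring to the text being edited. The sketch below applies one such script. It is an illustration of the format only, not csup’s or cvsup’s code, and it does not cover the reverse-diff handling described above; the function name is hypothetical.

```python
def apply_rcs_diff(lines, diff):
    """Apply a 'diff -n' style edit script to a list of lines."""
    out = []
    pos = 0  # number of original lines already copied or skipped
    it = iter(diff)
    for cmd in it:
        op = cmd[0]
        start, count = (int(x) for x in cmd[1:].split())
        if op == "d":
            # Keep everything before the deleted range, then skip the range.
            out.extend(lines[pos:start - 1])
            pos = max(pos, start - 1 + count)
        elif op == "a":
            # Keep everything up to and including line 'start', then add.
            out.extend(lines[pos:start])
            pos = max(pos, start)
            for _ in range(count):
                out.append(next(it))
        else:
            raise ValueError("unknown edit command: " + cmd)
    out.extend(lines[pos:])
    return out
```

Note that the commands come in increasing line order and always refer to the original numbering, so a single forward pass over the input is enough.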

Huntin’ them bugs

August 17th, 2007 by lulf

More status updates… I’ve been fixing many small gvinum bugs the last couple of weeks:

  • The state of gvinum objects was changed after reloading. This meant that objects got the wrong state when gvinum was brought up.

  • Made gvinum always use the most recent configuration it finds when setting object states.

  • Make sure the newest drive is always the newest, and not the first in the drivelist, as was previously assumed.

  • Add “growable”-state to be used when a plex is ready to be grown.

  • Allow a plex to be rebuilt even though it’s also growable.

  • Do not change the size of the volume until the plex is completely grown.

  • Add status of growing and rebuild of a plex in the list output.

  • Prevent the rebuild from taking over the I/O system, increasing the access count at the start and end of the rebuild.

Probably a couple of other fixes as well. Also, I’ve updated the vinum-examples page in the handbook to reflect new features and more practical examples. I’ve posted a “call for testers” on current@, arch@ and geom@, and have received some response from people who are willing to help me test. Thanks to them. I’ve uploaded the code sample that I’ll be delivering to Google here: http://folk.ntnu.no/lulf/gvinum_soc2007.tar.gz

Cleaning up

August 6th, 2007 by lulf

The last couple of weeks I’ve tested and done bugfixing and cleanup of gvinum code. I refactored some parts to make the code belong where it seems logical. I also implemented growing for striped plexes, but that was quite easy since I could reuse most of the code for growing RAID-5 plexes. Unfortunately I was sick for a week and unable to work.

What remains now is to do more testing (can’t get enough), and write and update documentation on gvinum. I have updated patches for gvinum at http://folk.ntnu.no/lulf/patches/freebsd/gvinum for both RELENG_6 and CURRENT. I appreciate reports from brave users who try it out, even if it works :)

Also, I created a new perforce-branch called gvinum_cache. I’ve currently implemented a read/write-cache to check if this would give much speed-up for gvinum. It’s not very nice for reliability, but could be an option for those who want better performance. Anyway, I’ll update more on this later.

Growing up

July 17th, 2007 by lulf

Since last post I haven’t really done that much to gvinum, but a few things.

  • I added a few automated test-scripts to check if a volume behaves properly.

  • I went through the test plan and made sure that gvinum passes the tests.

  • I’ve been thinking a lot on how to best implement growing RAID-5 plexes.

  • I’ve implemented growing of RAID-5 plexes.

Now, the first and second points are quite boring to do, but I had to do them. The last points were trickier, since I didn’t really know where I should start. Finally I decided the best way was to let the plex overwrite itself! A more detailed explanation can be found in the TODO of my perforce branch. I need to test the implementation a bit now. Other than that, I’ve been a bit lazy on my own work this week, and tried to help other students with reviews etc.
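
One way to read “let the plex overwrite itself” is an in-place restripe. The sketch below is only that reading, worked out for a plain striped layout: it ignores parity entirely, it is not taken from the gvinum code or its TODO, and the layout rule (stripe unit k lives on disk k mod n at row k div n) is an assumption. The interesting property is that copying the units in increasing order never overwrites a unit that has not been read yet, because with more disks every unit moves to the same row or an earlier one.

```python
def grow_striped(disks, units, new_disk_count):
    """Restripe `units` stripe units onto new_disk_count disks, in place.

    disks[d][row] holds the stripe unit stored on disk d at that row.
    """
    old_n = len(disks)
    depth = len(disks[0])
    # The new disks start out empty; the old ones are reused as they are.
    disks = disks + [[None] * depth for _ in range(new_disk_count - old_n)]
    for k in range(units):
        data = disks[k % old_n][k // old_n]            # old location
        disks[k % new_disk_count][k // new_disk_count] = data  # new location
    return disks
```

This says nothing about what happens if the machine goes down halfway through, which a real implementation has to handle.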

Bugathon-week

July 4th, 2007 by lulf

Since last post, there have been many small bugfixes to gvinum. After some debating with myself on how I should implement concat/stripe/mirror, I think I got it pretty much right. The event system changed gvinum a lot, so I had to rewrite most of the code I already had on this.

I have done a lot of testing this week, and I made a test plan that I’m going to follow. Hopefully, I’ll also be able to create some automatic tests for this.

I’ve even been a good boy and updated the gvinum manpage! I added some examples to the manpage as well, so that it’s easier to get into gvinum for inexperienced users (not sure if we want gvinum to live even longer, but :) ).

A lot of small problems with weird states being set were also fixed, since this can be very confusing if you haven’t used gvinum much.

What I’d like to do next is create a set of testscripts that I can use to test quickly and easily with. I also noticed that it would be nice to have a command similar to ‘mirror’ for RAID-5 volumes. This could be used like this: ‘gvinum raid5 <disk1> <disk2> <disk3>’. Other than that, I’ve started to think on how I’m going to implement raid-5 resizing and other goals in my proposal.

Bug-monster dying slowly

June 28th, 2007 by lulf

Finally, an update on what I’ve been doing since the last time. This time I have a lot of small changes that have been done:

  • Implement initialization of RAID5-Arrays. This basically writes zeros over everything and makes sure parity is correct.

  • Fix a bug with mirror code. The length of the completed requests got doubled if you have a mirror with two plexes, tripled if you have a mirror with three plexes etc.

  • When mirrored plexes are syncing, all requests after and including the first write-request are delayed until syncing is finished.

  • Allow rebuilding a RAID-5 array while it is in use (e.g. mounted). Delay requests that are in conflict with the rebuild, but allow requests on the already rebuilt part to be run.

  • Allow subdisks to come up automagically after rebuild.

  • Allow stripesizes not divisible by the subdisk size. A regression in the new gvinum code prevented this.

  • Modify the event system to contain two intmax_t fields, so we won’t have to allocate/deallocate pointers all the time when passing args to gv_post_event.

  • Add support for the rename and move commands to new gvinum. The code has been rewritten for the new gvinum.

  • Fix a bug in the code for degraded writes to a RAID5-array, where only zeros were written.

  • Other minor bug/style fixes.

Next, I’m going to implement concat/stripe/mirror functionality. I already have some code from previous work I did, so I just need to adapt it to new gvinum, as well as change some ugly parts. There are some small facade-changes left, but I will do this after the last of the original vinum features is completed. Also, I will try to write a nice status report, and get a testable patch out by the time the reports are finished.

Even happier…

June 18th, 2007 by lulf

Finally I did the initialization code for raid5 plexes, and this means I’m pretty much complete with updating old gvinum to the new event system, but it will probably need some fixes here and there as it gets tested.

What remains in terms of needed functionality is the concat/mirror/stripe commands to easily create a concat/mirror/stripe volume out of three disks. I also have noted some issues that I think could need an improvement. More on this next time.

Subdisks now live happy in raid5-town…

June 13th, 2007 by lulf

So, finally the exams are over, and I’ve been able to work sort of full-time on my project the last days. What I’ve done is (a bit technical this time perhaps, but this stuff tends to become that):

Implemented attach/detach routines. This makes it possible to attach a subdisk/plex to a plex/volume, or detach a plex/subdisk from a volume/plex. The detach routine makes sure all connections between the objects are broken correctly, and only if it’s possible (unless forced of course). The attach routine makes sure the objects are correctly connected together again, and that a plex that misses subdisks includes them in the previous size when calculating the new size (so we don’t get wrong sizes on the plexes).

Tested rebuild of degraded plexes. The detach/attach routines enabled me to check if the rebuild of a degraded raid5 plex could work. And it did! This means (and this is something that I really missed in old gvinum), that when a drive fails, you can detach the failed subdisk, create a new subdisk on the plex (and it will check if the plex “misses” a subdisk), and then use ‘start <plexname>’ to rebuild the plex (The state of the plex must be degraded and the subdisk you wish to rebuild must be stale) and you’re good to go!
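
The reason a degraded raid5 plex can be rebuilt at all is the parity property: within one stripe, the XOR of all blocks (data and parity together) is zero, so any single missing block is the XOR of the others. A minimal sketch of rebuilding one replaced subdisk from the rest follows; the data layout (one list of blocks per subdisk) and the function names are assumptions for the example, not gvinum’s structures.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_subdisk(subdisks, stale):
    """Recompute every block of subdisk number `stale` from the others.

    subdisks[i][s] is the block that subdisk i holds for stripe s.
    """
    healthy = [sd for i, sd in enumerate(subdisks) if i != stale]
    return [xor_blocks(stripe) for stripe in zip(*healthy)]
```

It does not matter which subdisk holds the parity for a given stripe; the same XOR recovers a data block or a parity block.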

Bugfixes.

Implemented syncing of plexes. This means one can now add a mirror to a volume, and have the new plex synced from the original. After a couple of tests, it seems to work, but I did get a bug I need to reproduce.

I also discovered some bugs regarding mirrored plexes that I will address in
the near future. This probably came with the change with the new gvinum
event system.

Next on my schedule is to hunt some weird state bugs where the state is not correctly set, as well as the mirrored plex problems I’ve seen. Also, I need to guarantee that a plex sync is up-to-date (that no data is written to the synced plex in the meantime).

Raid5 improvements

May 26th, 2007 by lulf

The last couple of weeks I had to practice for some exams. In other words, a great time for coding :)

This week I’ve been working on making RAID5 parity rebuild work. This includes user initiated rebuild/check, as well as rebuilding a degraded plex during plex initialization. (This is a vital feature, since if a drive fails, one must be able to rebuild the plex with the new drive.) I have not been able to test this enough yet because I need the attach/detach routines to do it. So instead of continuing and getting the initialization/synchronization-routines in, I will implement attach/detach next, which should be quick since I already have some old code for it.
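
As a small illustration of what a parity check and a parity rebuild do for one stripe: the check compares the stored parity block with the XOR of the data blocks, and the rebuild overwrites it with that XOR. This is a sketch with made-up names, and the position of the parity block is passed in as a parameter since the actual gvinum layout is not described here.

```python
def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def parity_ok(stripe, parity_index):
    """Check: does the parity block match the XOR of the data blocks?"""
    data = [b for i, b in enumerate(stripe) if i != parity_index]
    return xor_blocks(data) == stripe[parity_index]


def rebuild_parity(stripe, parity_index):
    """Rebuild: return a copy of the stripe with the parity recomputed."""
    data = [b for i, b in enumerate(stripe) if i != parity_index]
    fixed = list(stripe)
    fixed[parity_index] = xor_blocks(data)
    return fixed
```

A user initiated check would run parity_ok over every stripe and report mismatches; a rebuild would run rebuild_parity instead.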