
SGML to XML migration of the doc tree

I’m the type of guy who doesn’t like talking too much and prefers to just work silently on something instead. I think this is a bad habit: I have some interesting projects in my queue, but for lack of free time I’m progressing slowly. I’ve decided to blog more about the projects that I work on or that interest me. This is the first entry in this series.

It has been quite some time since I started converting the doc tree from SGML to XML, but the work has never been completed because of (1) a lack of free time, (2) the doc repo being a fast-moving target, and (3) the lack of a good VCS that supports moves, renames and good branching. I have always had it in a nearly complete state, trying to keep up with merging upstream changes. Now that (3) is resolved, it will be easier to create a development branch and keep it in sync until the work is finished and can be merged back. More people have also become interested, which motivates me to dedicate more time to finally finishing this, and they may help out as well. Below, I’ll try to summarize what this migration consists of.

First, I would like to emphasize that this change won’t make a big difference to doc committers: XML is actually a subset of SGML, so there will only be minor changes in the markup. The change will mostly affect the toolchain and the generated output. Let’s look at the characteristics of SGML and XML to better understand what this means.


  • SGML stands for Standard Generalized Markup Language and is the father of XML. It is really old, and when it was introduced, (1) document sizes did matter and (2) there was little experience in the field to rely on. As a consequence, SGML is a real beast: it supports many more features than XML, but those features do not matter for today’s uses. For example, SGML permits start and end tags to use a syntax other than the usual <> brackets, sometimes allows end tags to be omitted, and lets us abbreviate <foo>bar</foo> as <foo>bar</> or even <foo/bar/. Nowadays these extra bytes are not expensive, and the doc tree mostly uses the commonly known syntax even though we could still abbreviate.
  • With its great many features, SGML requires complex processing software. Because of this complexity, there are few open source choices out there, and their development has been discontinued.
  • The DocBook schema that we use was originally developed for SGML, but the SGML version is discontinued and newer versions use XML technologies.
  • SGML documents are rendered with the DSSSL standard, which is also very old and complex. The open source choices are limited here too, namely Jade and its fork OpenJade. DocBook provided DSSSL stylesheets but, not surprisingly, they are also discontinued.
  • As a result of using this old software with old stylesheets, we have several unresolved rendering issues in the documentation. Sometimes parts are missing in the RTF version, or lines run off the page in PDF. Such PRs have been sitting in GNATS for a long time.
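
To make the markup difference concrete, here is a purely illustrative sketch that feeds the SGML shorthand forms mentioned above to a stock XML parser (Python's xml.etree.ElementTree). It is not part of the doc toolchain; the helper name is_well_formed_xml and the sample fragments are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment as well-formed."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample fragments: one written out in full, three using
# SGML-only shorthand (empty end tag, null end tag, omitted end tags).
samples = {
    "full tags": "<foo>bar</foo>",
    "empty end tag": "<foo>bar</>",
    "null end tag": "<foo/bar/",
    "omitted end tags": "<list><item>one<item>two</list>",
}

for label, fragment in samples.items():
    verdict = "accepted" if is_well_formed_xml(fragment) else "rejected"
    print(f"{label}: {verdict}")
```

Only the fully written-out form gets through an XML parser. Since the doc tree mostly uses that form already, the markup changes for committers should stay small.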


  • XML stands for eXtensible Markup Language and was created to simplify SGML so that its processing could also be simplified. It places stricter requirements on the document to keep this simplicity. However, it is still perfectly valid SGML.
  • There are various XML parsers out there in several programming languages, which gives us more choices. This software is still being developed, and XML is still widely used.
  • Recent versions of DocBook are based on XML, and there are different schema standards that can be used for validation.
  • Transforming XML documents to plain text, HTML or other XML documents can be done in practice with the XSLT standard. As opposed to DSSSL, it is a widely supported standard with various XSLT processors out there. However, XSLT is more limited than DSSSL because it is actually a transformation language, not a rendering standard.
  • DocBook has excellent XSLT stylesheets that are developed together with the schema and support more output formats than are currently available. For example, they support the popular EPUB format, and there is beta support for HTML5.
  • Because XSLT is a transformation language, it cannot produce e.g. PDF or RTF output directly. That has to be done with another standard: XSL-FO. It is an XML-based typesetting language that can be rendered in various formats, depending on the capabilities of the chosen XSL-FO processor. First, the XSL-FO document is generated from the source document with an XSLT stylesheet, and then it is further processed by an XSL-FO processor. Unfortunately, there are only 2 open source processor choices out there: xmlroff and Apache FOP. Apache FOP generates excellent quality output but is Java-based, while xmlroff is written in C but lacks quite a few features. This problem still needs to be solved; as a last resort, we can keep using the DSSSL stylesheets with (Open)Jade until a better solution emerges. But even if we don’t build Apache FOP PDFs officially, it would be nice to have the option of generating them, since they look much better and are more readable than the current output.
  • XML has many related standards, like XLink, which is already used in DocBook 5.0 to support advanced linking. The rest are not planned to be used for any feature yet, but there are many open opportunities.
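
The two-stage PDF route described above (XSLT first, then an XSL-FO processor) can be sketched roughly as follows. This is only an illustration under assumptions: it takes xsltproc as the XSLT processor and Apache FOP as the XSL-FO processor, the function name fo_pipeline is invented, and the file names and stylesheet path are placeholders, not the actual layout of the doc tree. The sketch only builds the command lines; it does not run anything.

```python
import shlex
from pathlib import Path


def fo_pipeline(source, fo_stylesheet):
    """Build the two command lines for an XML -> XSL-FO -> PDF run.

    Assumes xsltproc and Apache FOP; both paths are placeholders.
    """
    src = Path(source)
    fo_file = src.with_suffix(".fo")    # intermediate XSL-FO document
    pdf_file = src.with_suffix(".pdf")  # final rendered output

    # Step 1: the XSLT stylesheet turns the source document into XSL-FO.
    xslt_step = ["xsltproc", "--output", str(fo_file),
                 str(fo_stylesheet), str(src)]
    # Step 2: the XSL-FO processor renders the typesetting document to PDF.
    fo_step = ["fop", "-fo", str(fo_file), "-pdf", str(pdf_file)]
    return [xslt_step, fo_step]


# Placeholder inputs, for illustration only.
for step in fo_pipeline("book.xml", "fo/docbook.xsl"):
    print(shlex.join(step))
```

Swapping in a different XSL-FO processor would only change the second command, which is why the choice between xmlroff and Apache FOP can be left open for now.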

I hope I haven’t left anything important out. Comments and questions are welcome.

FreeBSD needs fresh Blood!

Oh well, it’s time to write a nice job offer. Of course it’s all
for free, and you can’t earn any money out of it, but you’ll get a
big thanks, hugs and love from the community. Ask yourself how
long you have been using FreeBSD. Months? Years? Decades? Do you
love using it for whatever reason, but at the same time feel a
bit guilty about using it all for free without giving anything
back? Well, now you have the chance to change that. We at FreeBSD
are always in need of new people who are willing to put some of
their time and effort into FreeBSD development.

Let me share a bit of my experience. I have (re)built a lot of
teams in the past, such as gecko@, kde@ and python@, and I was
involved in the creation of the FreeBSD vbox@ team. I have always
managed to get assistance from a lot of people, but recently more
and more people have started to complain about slowness and
broken commits, and have asked for more Calls for Testing. And
that is actually a big problem. I am the kind of person who likes
to call for testing, but I am also the kind of person who easily
gets disappointed when I don’t get much feedback. The best example
here is the ATI, Xorg and Xfce update. I did a call for testing
because Xorg and driver updates are always a big issue: there is
so much different hardware involved, in various configurations.
From the call for testing we got a total of 19 mails of positive
feedback, and after 2 weeks I committed the update. What happened
after that was that I received a lot of complaints about not
conducting much testing, yadda, yadda. Well, I say it isn’t just
my fault for not testing much; it is also your fault for not
helping us. It is always easier to blame than to help. Ask
yourself why you did not help us test properly and give us
feedback. Complaining is fine when it is done in the right way,
with the right tone.

While I’m talking about Xorg: the FreeBSD Xorg Team is currently
a one-man show, supported by kwm@ and fluffy@. Xorg is simply too
big for that. And don’t think that it affects only the ports; it
affects the kernel as well, which is where we are having the most
problems at the moment. Of course I would like to call for help
on that too. After my last call for help, it was funny to see how
many people wanted to offer some help, but once they learned the
amount of work involved, I stopped hearing from them. I understand
that updating Xorg is always a crappy job, but I love doing it,
because it is nice to gain more and more experience in
understanding how things work, and it helps to improve my skills
a lot.

Let’s talk a bit about our FreeBSD KDE Team. KDE is nice, but it
really is a fat project. It needs a lot of love and maintenance
time. Currently it’s a 4-person project, namely makc@, fluffy@ and
avilla@, while Raphael Kubo da Costa is actively handling support.
The thing is, KDE involves more than just the KDE packages. It
includes Qt, PyQt, KOffice and CMake as well. It is a big project
too, and it would be nice to find more people to contribute to
the development.

And now let’s talk about gecko@. gecko@ covers all Mozilla projects,
namely Firefox, Thunderbird and SeaMonkey. It is currently maintained
by beat@ and decke@, and supported by flo@ and andreas. So again,
I’d like to see some fresh faces for this project as well. If you are
willing to help, do ping us via mail :p.

As for the FreeBSD Gnome Team, well, I can’t say much about Gnome,
but whenever I see the cvs commits in the marcuscom tree, it seems
like most of the work for the upcoming gnome3 is done by kwm@,
supported by marcus@, mezz@ and avl@. Gnome includes not only the
Gnome things but also gtk and cairo, the ones that always cause
problems in a major update. I think the team would love to have
some fresh blood.

Okay, all of these need an understanding of programming and
scripting. If you think that you can’t do any of that, testing
would also help a lot. FreeBSD is one of the best documented open
source projects, so that’s another area that could use some help
too. Check if it is available in your language, or start helping to
improve the FreeBSD documents in your language. It would be very
helpful and the community will thank you for that. So if you would
like to offer some help, ping me on irc/jabber/mail :-)

- Martin

FreeBSD FTP docs available again

The prebuilt FreeBSD documentation (the FreeBSD Handbook etc.) is available via FTP again at

The system will resume regular builds with weekly Sunday uploads within a couple of days, but I uploaded a new documentation set today. It probably hasn’t hit the FTP mirrors yet, but it should be available on most mirrors within 24-48 hours.

While updating the build system I also disabled the Palm book versions – I simply don’t believe they are used enough anymore to justify mirroring them every week to all the mirrors. If anybody misses the precompiled Palm versions, let me know and I will reconsider whether they should be built.