Behind INDEX-*.db

Taking advantage of some extra free time from my internship, I’ve been able to unravel a bit more of INDEX-*.db’s format.

Based on the information dumped with db_dump185 (from the MySQL 5.0.x source), I can see that the Ruby scripts generated a database with the following format:

:categories{Categories}
:db_version
[db specific version string]
:origins

{Package name
Package origin}
:pkgnames
{Package name
Package origin}
:virtual categories
{

{?Virtual Category Name
{complete Virtual Category Package Origins}
}

{Package name
Package origin}
{Origin
Package Name|Path|Prefix|Comment|Description|Maintainer|{Categories}|{Build Dependencies}|{Run Dependencies}|Website|{Extract Dependencies}|{Patch Dependencies}|{Fetch Dependencies}
a}
}

Observations:

The reason for:

{Package name

Package origin}

being injected every once in a while is probably the overflow facility of the BDB format, combined with the fact that my output came from a raw database dump.

Also, it does appear as if the:

{Package name

Package origin}

set is sorted by “Package name”, which I find a) interesting and b) inefficient, depending on the algorithm used to extract the fields from the INDEX-* file.
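The sortedness observation is easy to re-check against a dump. Here is a minimal sketch, assuming the (Package name, Package origin) pairs have already been pulled out of the dump into Python tuples; the function name and the sample pairs are made up for illustration, and plain string comparison is only my guess at the ordering used.

```python
from typing import Iterable, Tuple


def is_sorted_by_name(pairs: Iterable[Tuple[str, str]]) -> bool:
    """Return True if (package name, package origin) pairs are in
    non-decreasing order of package name (plain string comparison)."""
    names = [name for name, _origin in pairs]
    return all(a <= b for a, b in zip(names, names[1:]))


# Made-up pairs, for illustration only.
pairs = [
    ("bar-2.1", "devel/bar"),
    ("baz-0.3", "misc/baz"),
    ("foo-1.0", "misc/foo"),
]
print(is_sorted_by_name(pairs))
print(is_sorted_by_name(list(reversed(pairs))))
```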

Summary:

So, the Ruby ports management scripts tack additional metadata onto the existing INDEX-* data, most likely because the author considered it wise for looking up ports / packages. It may decrease the search time, but it also increases the overall raw INDEX-* data by 2.1MB (see Notes), which results in greater pre-/post-processing.
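To make the "additional metadata" concrete, here is a rough sketch of the kind of lookup tables the dump suggests: a list of origins, a package name to origin map, and an origin to raw record map. This is not the Ruby scripts' actual code. `build_lookup_tables` and `PORTS_ROOT` are hypothetical names, and deriving the origin by stripping `/usr/ports/` from the Path field is my assumption.

```python
from typing import Dict, Iterable, List, Tuple

PORTS_ROOT = "/usr/ports/"  # assumption: default ports tree location


def build_lookup_tables(
    lines: Iterable[str],
) -> Tuple[List[str], Dict[str, str], Dict[str, str]]:
    """Build an origin list, a name->origin map and an origin->record map
    from pipe-delimited INDEX-* lines."""
    origins: List[str] = []
    name_to_origin: Dict[str, str] = {}
    origin_to_record: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("|")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, path = fields[0], fields[1]
        # Assumption: the origin is the path relative to the ports root.
        origin = path[len(PORTS_ROOT):] if path.startswith(PORTS_ROOT) else path
        origins.append(origin)
        name_to_origin[name] = origin
        origin_to_record[origin] = line  # record kept verbatim
    return origins, name_to_origin, origin_to_record


# Made-up INDEX lines, for illustration only.
index_lines = [
    "bar-2.1|/usr/ports/devel/bar|/usr/local|Bar library|||devel||||||",
    "foo-1.0|/usr/ports/misc/foo|/usr/local|Foo tool|||misc||||||",
]
origins, name_to_origin, origin_to_record = build_lookup_tables(index_lines)
print(origins)
print(name_to_origin["foo-1.0"])
```

Every extra table repeats strings that already exist in the INDEX-* records, which fits the size growth noted above.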

Notes:

gcooper@optimus ~/gcooper/Desktop
$ ls -lh INDEX-7 INDEX-7.raw
-rwx------+ 1 gcooper None 9.9M Apr 14 15:34 INDEX-7
-rwx------+ 1 gcooper None 12M May 19 00:17 INDEX-7.raw
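For the record, the 2.1MB figure is just the difference between the two sizes in the listing. A quick sketch of the arithmetic, keeping in mind that the sizes are already rounded by ls, so the result is approximate:

```python
# Sizes in MB as shown in the listing (rounded by ls).
index_mb = 9.9
raw_dump_mb = 12.0

overhead_mb = raw_dump_mb - index_mb
overhead_pct = 100.0 * overhead_mb / index_mb

print(f"overhead: {overhead_mb:.1f} MB ({overhead_pct:.0f}% of INDEX-7)")
```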

  • The

{Origin
Package Name|Path|Prefix|Comment|Description|Maintainer|{Categories}|{Build Dependencies}|{Run Dependencies}|Website|{Extract Dependencies}|{Patch Dependencies}|{Fetch Dependencies}
a}

set appears to have been taken verbatim from the INDEX-* file. See http://www.lpthe.jussieu.fr/~talon/freebsdports.html#htoc11 for more details.
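Since the record is plain pipe-delimited text, splitting it into named fields is straightforward. Here is a minimal sketch following the field order listed above; the snake_case keys are my own labels, `parse_index_line` is a hypothetical helper, and the sample record is made up.

```python
from typing import Dict, List, Union

# Field order as listed above; the key names are my own labels.
FIELDS = [
    "package_name", "path", "prefix", "comment", "description", "maintainer",
    "categories", "build_deps", "run_deps", "website",
    "extract_deps", "patch_deps", "fetch_deps",
]

# Fields written as {...} above: repeating, space-delimited lists.
LIST_FIELDS = {
    "categories", "build_deps", "run_deps",
    "extract_deps", "patch_deps", "fetch_deps",
}


def parse_index_line(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one pipe-delimited INDEX-* record into named fields."""
    parts = line.rstrip("\n").split("|")
    # Pad short records so every field name is present.
    parts += [""] * (len(FIELDS) - len(parts))
    record: Dict[str, Union[str, List[str]]] = {}
    for name, value in zip(FIELDS, parts):
        record[name] = value.split() if name in LIST_FIELDS else value
    return record


# Made-up sample record, for illustration only.
sample = (
    "foo-1.0|/usr/ports/misc/foo|/usr/local|A sample port|"
    "/usr/ports/misc/foo/pkg-descr|someone@example.org|misc devel|"
    "bar-2.1|bar-2.1 baz-0.3|http://example.org/|||"
)
rec = parse_index_line(sample)
print(rec["package_name"], rec["categories"], rec["run_deps"])
```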

Notation:

  • “{ }” : denotes a repeating pattern (a list), typically space-delimited.

  • Code blocks denote verbatim string segments or characters.

  • Italicized text denotes package metadata fields.
