The Man Also Known As "I'm Batman!"

Hello Citizen! Would you like to be my sidekick? Would you like to ride with... Batman?

Thursday, June 09, 2005

Explanation of Database File Systems

Category: Technology Explained

One of the hottest topics on the market today is Database File Systems. Between Gnome Storage, WinFS, and now Apple Spotlight, it seems like everyone is making a big deal out of these features. But what is a database file system and why is it so important? I'll attempt to answer those questions and provide a good explanation of how a modern Database File System can be structured.

Defining the Problem

In everyday life we tend to keep track of things by association. For example, I know that the bill I received from the credit card company is both a bill and from my credit card company. So what if I wanted to find that bill again, but didn't remember what I did with it? Well, I'd probably check in my bills first, then perhaps in a folder where I kept old paperwork, then on my desk, then perhaps somewhere that I might have left it by accident. Note the associations here:
  • All Bills
  • Archived Paperwork
  • Recent Desktop Documents
  • Search Common Areas
These associations are a form of "meta-data" or "data about data". Usually we don't consider this information anywhere near as important as the data itself, but without it we couldn't even find the data!

Now let's assume for a moment that our mythical credit card bill was delivered to us electronically. If we further assume that we kept it as a file on disk, then the question that comes to bear is: Where do we put it so that it's easy to find?

Options include:
  • In a "Bills" folder. It's probably a good idea to keep all the bills together. That way we know where to find one that we might need.
  • In an "Archived Documents" folder. The bill is already taken care of and paid. Why not just archive it and keep it out of the way?
  • On the Desktop. After all, it's an important file, right? So this important file should go somewhere noticeable.
As you may have noticed, this is very similar to the issues faced in real life. The difference is that a computer should be able to do much more to help organize the information than just placing it somewhere you hope is obvious.

To a certain degree a computer can help. If I lose track of what I did with the bill, then I can attempt to search for it. The computer will then trace through every file in the system attempting to find what I lost. The problem with such a search, however, is that the search can only work as well as the name I've given the file. If I got lazy and called the file "CC Bill", then it won't be returned in a search for "Credit Card".
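To make the limitation concrete, here is a minimal Python sketch of a name-only search. The file names and the `search_by_name` helper are invented for illustration. A real search tool does more work than this, but it has the same blind spot whenever the name is all it looks at.

```python
# Hypothetical file names; only the names are searched, never the contents.
file_names = ["CC Bill.pdf", "Credit Card Statement.pdf", "Vacation.jpg"]


def search_by_name(names, query):
    """Return the names that contain the query text, ignoring case."""
    needle = query.lower()
    return [name for name in names if needle in name.lower()]


# The lazily named "CC Bill.pdf" is missed by the first search.
print(search_by_name(file_names, "Credit Card"))
print(search_by_name(file_names, "CC"))
```

The bill only turns up if I happen to search for the same abbreviation I used when I saved it.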

Meta-Data

What if we could improve upon the situation presented above? For example, what if we could place the bill in the Bills folder, Archived Documents folder, and the Desktop simultaneously? How about if we could search inside the document for the information it contains? What if the computer could intelligently score files it finds in a search and sort them based on the files it thinks are the best match? With the meta-data from a database file system, all of this becomes possible.

Generally speaking, there are three types of meta-data that a system might support:
  1. Inherent meta-data
  2. User applied meta-data
  3. Organizational meta-data
Inherent meta-data is data that can be derived from the very existence of the file. At the simplest level, this includes things like the date it was created and the size of the data. More complex meta-data schemes may actually dive into the binary stream of the file data and extract additional meta information assigned to the file. This could be as simple as the text of a word processing document or as complex as the name of a movie or the artist of a song. It should go without saying that the better the system is at understanding files, the more useful information it can extract from a file.
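As a rough illustration of inherent meta-data, the Python sketch below gathers what the operating system already knows about a file and then reads its text as the simplest possible form of content extraction. The `inherent_metadata` helper and the sample bill are assumptions made for this example. It records the last-modified time rather than the creation date because creation time is not available in a portable way.

```python
import os
import tempfile
from datetime import datetime


def inherent_metadata(path):
    """Collect meta-data that exists simply because the file exists."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Last-modified time; a portable creation time is not available.
        "modified": datetime.fromtimestamp(info.st_mtime),
        "extension": os.path.splitext(path)[1].lower(),
    }


# Write a small throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    bill_path = os.path.join(folder, "CC Bill.txt")
    with open(bill_path, "w", encoding="utf-8") as handle:
        handle.write("Credit Card Company A: amount due 120.00")

    bill_info = inherent_metadata(bill_path)

    # Simplest content extraction: for a plain text file, the text itself.
    with open(bill_path, encoding="utf-8") as handle:
        bill_info["text"] = handle.read()

print(bill_info["name"], bill_info["size_bytes"], bill_info["extension"])
```

With the text captured, a search for "Credit Card" can match the file even though its name only says "CC Bill". Formats such as movies or songs would need a parser that understands them, which this sketch does not attempt.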

User applied meta-data is data that the computer user explicitly adds to a file. For example, I might type a note in a meta-data field stating that I need to remember that this bill is due by the 28th of February instead of the usual 30th of the month. This information can often be very useful, but it can be difficult to convince users to take the time to apply it.

Organizational meta-data is something of a cross between the two previous types of meta-data. It's the type of data that a user might add to a file for the purposes of better organization, and thus becomes an inherent attribute that can be queried on. For example, let's say that I wanted to categorize bills under a meta-data tag called "Bills". And let's say that I then had a tag for each credit card company so that I could easily find correspondence with them. Now I have added meta-data to the file that tells the system that the document is a Bill from Credit Card Company A!
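One way to picture the three types side by side is a single record per file. The Python sketch below is only an assumption about how such a record might look; the `FileRecord` class, its field names, and the sample values are invented for the example and are not taken from any of the systems mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    name: str
    inherent: dict = field(default_factory=dict)  # derived from the file itself
    notes: list = field(default_factory=list)     # user applied
    tags: set = field(default_factory=set)        # organizational


def find_by_tags(records, *wanted):
    """Return the names of records that carry every requested tag."""
    required = set(wanted)
    return [record.name for record in records if required <= record.tags]


records = [
    FileRecord(
        "CC Bill.pdf",
        inherent={"size_bytes": 48213},
        notes=["Due by the 28th this month"],
        tags={"Bills", "Credit Card Company A"},
    ),
    FileRecord("Power Bill.pdf", tags={"Bills", "Electric Company"}),
    FileRecord("Vacation.jpg", tags={"Photos"}),
]

print(find_by_tags(records, "Bills"))
print(find_by_tags(records, "Bills", "Credit Card Company A"))
```

Asking for both tags at once is the "Bill from Credit Card Company A" question from the paragraph above.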

Out with the Old, In with the New

The key to a database file system is that the meta-data attached to files provides us with a better method for searching for things. The theory is that if searches get good enough, the traditional methods of accessing files can go away altogether. Instead of browsing through folders, the user can just type "Bills" and get a list of all the bills stored on the system. Or the user can narrow it down and ask for "Bills from Credit Card Company A" and receive a more specific list of results. The point is that the search should come back with what the user needs, and much more quickly than if the user had attempted to find the files manually.

So does that mean that folders will go away altogether? The answer is both yes and no. Yes, traditional folders won't be as useful. But at the same time, it is occasionally nice to be able to browse the information contained in your system. The replacement solution is twofold. The first part of the solution is to allow the user to save search queries as a pseudo-folder. This provides a user with easy and automatic organization of his files. For example, he could create a saved query that searches for all movies. Then whenever he wishes to know which movies exist on his system, he can just open the pseudo-folder and see the results of the search!
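A saved query can be thought of as a stored search that is run again every time the pseudo-folder is opened. The Python sketch below shows that idea under simple assumptions: the in-memory `index`, the `SavedQuery` class, and the "kind" field are all hypothetical stand-ins for whatever a real database file system would use.

```python
# Hypothetical in-memory index: file name -> meta-data.
index = {
    "Heist.avi": {"kind": "movie"},
    "CC Bill.pdf": {"kind": "document"},
}


class SavedQuery:
    """A pseudo-folder: a named search that is re-run whenever it is opened."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def open(self, index):
        """Run the stored search against the current state of the index."""
        return sorted(name for name, meta in index.items() if self.predicate(meta))


movies = SavedQuery("Movies", lambda meta: meta.get("kind") == "movie")
print(movies.open(index))

# A file added later shows up without anyone moving it into the folder.
index["Road Trip.mov"] = {"kind": "movie"}
print(movies.open(index))
```

Nothing is ever copied into the "Movies" pseudo-folder. Its contents are simply whatever matches at the moment it is opened.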

The only issue with using saved queries is that without regular folders your files will be lost until you can craft a query to find them. As a result, another solution is necessary.

The second part is a concept known as Labels. Labels are a type of meta-data that falls under the Organizational category. The idea is that you can create as many Labels in your system as you'd like, then apply them to individual files. As files are tagged with Labels, they automatically appear in pseudo-folders that display the name of the Label. Files lacking a Label will appear in an area that displays unlinked files. This list not only ensures that the user never loses his files, but also encourages him to properly Label them. And since a user can apply and remove any number of Labels at will, there's much less of a need to ensure organizational correctness up front.
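Here is a small Python sketch of how Labels and the unlinked-files area could fit together. The `LabelStore` class and its method names are made up for this example; the only point is that a Label's pseudo-folder and the unlinked list are both just views over the same set of Labels per file.

```python
class LabelStore:
    """Tracks which Labels are applied to which files (all names are illustrative)."""

    def __init__(self):
        self._labels = {}  # file name -> set of Labels

    def add_file(self, name):
        """Start tracking a file; it begins with no Labels at all."""
        self._labels.setdefault(name, set())

    def apply(self, name, label):
        self._labels.setdefault(name, set()).add(label)

    def remove(self, name, label):
        self._labels.get(name, set()).discard(label)

    def pseudo_folder(self, label):
        """Every file currently carrying the given Label."""
        return sorted(n for n, labels in self._labels.items() if label in labels)

    def unlabeled(self):
        """The unlinked area: files that carry no Label yet."""
        return sorted(n for n, labels in self._labels.items() if not labels)


store = LabelStore()
store.add_file("CC Bill.pdf")
store.add_file("Notes.txt")
store.apply("CC Bill.pdf", "Bills")
store.apply("CC Bill.pdf", "Credit Card Company A")

print(store.pseudo_folder("Bills"))
print(store.unlabeled())
```

Because a file with no Labels always lands in the unlinked list, it can never fall out of sight the way a file in a forgotten folder can.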

The concept can even be extended into common metaphors in use today. What if the Desktop stopped being a folder for files, and was just a standard system label? Files could easily be moved to the desktop and removed just by applying and removing the label! Files could be trashed with nothing more than a Label called "Trash". Hundreds of uses could spring up to ease the user interface just because a better method of organization exists! That is why database file systems are considered the future of computer file systems.
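To show how little machinery that would take, the Python sketch below treats "Desktop" and "Trash" as ordinary Labels. The file names and helper functions are hypothetical, and a real system would also need to decide when trashed files are actually deleted, which is left out here.

```python
# Hypothetical Label sets per file; "Desktop" and "Trash" are ordinary Labels.
labels = {
    "CC Bill.pdf": {"Bills"},
    "Old Draft.txt": set(),
}


def show_on_desktop(name):
    labels[name].add("Desktop")


def move_to_trash(name):
    # Trashing changes Labels only; the file's data is never touched.
    labels[name].discard("Desktop")
    labels[name].add("Trash")


def restore(name):
    labels[name].discard("Trash")


def visible_files():
    """Everything that is not currently carrying the Trash Label."""
    return sorted(name for name, applied in labels.items() if "Trash" not in applied)


show_on_desktop("CC Bill.pdf")
print(sorted(labels["CC Bill.pdf"]))

move_to_trash("CC Bill.pdf")
print(visible_files())
```

Restoring a file from the Trash is just removing one Label, and the "Bills" Label it had before is still in place.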


Links:

Wikipedia: File Systems
Practical File System Design with the Be File System (1999)
Spotlight Technology Brief
DBFS for KDE
Microsoft WinFS

Questions? Comments? Use the "comments" feature on this blog to leave feedback. The more people who I know are listening, the more of these articles I will do. So sound off and let your voice be heard!

 