The Man Also Known As "I'm Batman!": The Intelligent File Format: Part 2

Category: Conceptual Design

In the second part of this article, I'll attempt to address how the Intelligent File Format might be implemented.

As in most file format documents, we'll begin by describing the physical format of the file, then we'll delve into the contents and how to link them.

The Format Table

The format is fairly simple: just a header, then the JAR file containing the code, then the actual data used by the file (the payload). In theory, the code embedded in the JAR portion of the file would only ever see the Payload portion of the file.

Header

The header would consist of a structure similar to the one below:

Field	Size / Type	Value
File ID	4 bytes	0xCAFEFEED
JAR Length	4 bytes
Flags	1 byte
Loader Class	UTF8 String

File ID

The file ID is nothing more than a standard identifier used to auto-detect the file type and ensure that utilities know that this is a binary file.

JAR Length

Informs the loader exactly how much of the file is taken up by the embedded software. The loader can then use this information to find the beginning of the payload stream.

Flags

Bit 0 - Write
Bit 1 - Random Access

The write flag tells the loader if the file can be opened for writing, or if the data is considered read only. This is important because if no software to write the file is included, the program using the data may labor under the assumption that changed data has been saved when in reality no data has made it to disk.

The random access flag tells the loader software that the file cannot be streamed. If the file is being loaded over the network or other serial location, the program requesting the file will be forced to move the data to random access storage (such as a temp file) before it can be opened.

Loader Class

The loader class is a string that tells the loader software which class to call inside the embedded JAR file.
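To make the header layout concrete, here is a minimal sketch of how it might be read and written. The class name IffHeader and its methods are hypothetical. The sketch also assumes big-endian integers and the length-prefixed string encoding used by java.io.DataInput.readUTF(), since the table above says only "UTF8 String" and does not pin down either detail.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical holder for the header fields described above. */
final class IffHeader {
    static final int FILE_ID = 0xCAFEFEED;         // magic number from the header table
    static final int FLAG_WRITE = 1 << 0;          // bit 0
    static final int FLAG_RANDOM_ACCESS = 1 << 1;  // bit 1

    final int jarLength;
    final int flags;
    final String loaderClass;

    IffHeader(int jarLength, int flags, String loaderClass) {
        this.jarLength = jarLength;
        this.flags = flags;
        this.loaderClass = loaderClass;
    }

    boolean isWritable() {
        return (flags & FLAG_WRITE) != 0;
    }

    boolean requiresRandomAccess() {
        return (flags & FLAG_RANDOM_ACCESS) != 0;
    }

    /** Reads a header. Assumption: big-endian ints and a readUTF()-style string. */
    static IffHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int id = data.readInt();
        if (id != FILE_ID) {
            throw new IOException("Unexpected file ID: 0x" + Integer.toHexString(id));
        }
        int jarLength = data.readInt();
        if (jarLength < 0) {
            throw new IOException("Negative JAR length: " + jarLength);
        }
        int flags = data.readUnsignedByte();
        String loaderClass = data.readUTF();
        return new IffHeader(jarLength, flags, loaderClass);
    }

    /** Writes the header using the same assumed encoding as read(). */
    void write(OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(FILE_ID);
        data.writeInt(jarLength);
        data.writeByte(flags);
        data.writeUTF(loaderClass);
        data.flush();
    }
}
```

After read() returns, the stream is positioned at the first byte of the embedded JAR, so skipping jarLength bytes from there lands on the payload.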

JAR File and Supporting Classes

The JAR file needs to contain sufficient software to load the file data into a code structure and then optionally write that structure back out.

While streaming of input/output can be handled by the standard java.io.InputStream and java.io.OutputStream classes provided by Java, random access will need a new API. The only API that the Java Virtual Machine provides for random access is the java.io.RandomAccessFile API, which is far too file-specific to meet the needs of a modern, portable file format. The data could be randomly accessed on disk, in memory, or over the network. Thus a new interface must be developed to replicate the APIs of java.io.RandomAccessFile, but without tying the implementation to any particular storage system.

The following interface and abstract class accomplish that task:


public interface RandomAccess
{
    public long getLength();
    public void setLength(long length);
    public void seek(long position);
    public long getFilePointer();
}


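import java.io.DataInput;
import java.io.DataOutput;

// InputStream and OutputStream are classes, not interfaces, so they cannot
// appear in an implements clause. DataInput and DataOutput supply the same
// read and write methods that java.io.RandomAccessFile exposes.
public abstract class RandomAccessData
  implements DataInput, DataOutput, RandomAccess
{
    public abstract boolean isWritable();
}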

The "isWritable()" method returns true if the file can be modified. The rest of the methods are equivalent to the methods contained in the java.io.RandomAccessFile class.
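To show that the RandomAccess interface really can be detached from files, here is a rough sketch of a memory-backed implementation. The MemoryRandomAccess class and its single-byte read() and write() helpers are hypothetical and are not part of the format. The interface is repeated so that the sketch compiles on its own, and the seek and truncate behaviour is modelled on java.io.RandomAccessFile.

```java
import java.util.Arrays;

/** The RandomAccess interface from the article, repeated so this sketch stands alone. */
interface RandomAccess {
    long getLength();
    void setLength(long length);
    void seek(long position);
    long getFilePointer();
}

/** Hypothetical implementation that keeps the payload in a byte array. */
final class MemoryRandomAccess implements RandomAccess {
    private byte[] data;
    private int position;

    MemoryRandomAccess(byte[] initial) {
        this.data = initial.clone();
    }

    public long getLength() {
        return data.length;
    }

    public void setLength(long length) {
        if (length < 0 || length >= Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Unsupported length: " + length);
        }
        data = Arrays.copyOf(data, (int) length);  // truncates or zero-extends
        if (position > data.length) {
            position = data.length;  // like RandomAccessFile, pull the pointer back
        }
    }

    public void seek(long newPosition) {
        if (newPosition < 0 || newPosition >= Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Unsupported position: " + newPosition);
        }
        position = (int) newPosition;  // seeking past the end is allowed
    }

    public long getFilePointer() {
        return position;
    }

    /** Returns the next byte as 0..255, or -1 at the end of the data. */
    public int read() {
        return position < data.length ? data[position++] & 0xFF : -1;
    }

    /** Writes one byte at the current position, growing the buffer if needed. */
    public void write(int b) {
        if (position >= data.length) {
            data = Arrays.copyOf(data, position + 1);
        }
        data[position++] = (byte) b;
    }
}
```

A disk-backed or network-backed version would implement the same four methods, which is the whole point of separating the interface from java.io.RandomAccessFile.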

The loader class referenced by the header is expected to conform to an interface such as the following:


import java.io.InputStream;
import java.io.OutputStream;

// FileFormatException is not a standard Java class; it is assumed to be a
// checked exception shipped alongside this interface.
public interface IntelligentFileFormat
{
    public String[] getInputOutputInterfaces();
    public String[] getOutputInterfaces();
    public String[] getFormatterInterfaces();

    public Object getInput(InputStream in, String type) 
        throws UnsupportedOperationException, FileFormatException;

    public void getOutput(OutputStream out, Object output) 
        throws UnsupportedOperationException, FileFormatException;

    public Object getFormatter(RandomAccess io, String type) 
        throws UnsupportedOperationException, FileFormatException;
}

The first three methods inform the loading software what types of objects this code might return. While most implementations would only have a single type of object to return, some implementations may allow for the data to be loaded in a variety of ways. For example, a vector image may be returnable as a DOM of the vector data, or as a java.awt.Image object. Alternatively, a software vendor may choose to fully document a "simple" interface while leaving the more complex, feature-rich interface undocumented. (See part 3 for more information on this.)

The type of the object is expected to be a fully qualified class name. The types of objects that can be written are not required to be symmetrical with the types of objects that can be read. Using the vector image example, the vector DOM might be writable to disk while the java.awt.Image object may not.

The latter three methods are where the reading and writing occur. The getInput() method returns an object that may stream the data in, potentially allowing for partial renderings of the data to be shown while the file is still loading.

The getOutput() method uses the object passed in to rewrite the payload data in the file. As stated above, the type of objects that can be written out will not always match the types that can be read in.

The getFormatter() method returns an object that has random access to the payload data. Depending on the setting of the "write" flag and the mode under which the file is opened, the returned object may also be capable of changing the payload data.

If the payload data is not in the expected format or the loader passes an unknown type, the methods will throw a FileFormatException. If the class does not support a method (e.g. getInput() is called on a random access file, getFormatter() is called on a streaming file, or getOutput() is called with an object that cannot be written), an UnsupportedOperationException will be thrown.
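As an illustration of the contract, here is a sketch of what a very simple loader class might look like for a payload that is nothing but UTF-8 text. Everything in it is an assumption made for the example: the PlainTextFormat class, the choice of java.lang.String as the only supported type, and the shape of FileFormatException, which the article names but never defines. The interfaces are nested inside one outer class only so the sketch compiles by itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class PlainTextFormatSketch {
    /** Hypothetical checked exception; the article does not define it. */
    static class FileFormatException extends Exception {
        FileFormatException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for the article's RandomAccess interface. */
    interface RandomAccess {
        long getLength();
        void setLength(long length);
        void seek(long position);
        long getFilePointer();
    }

    /** The loader interface, with the method names used in the article. */
    interface IntelligentFileFormat {
        String[] getInputOutputInterfaces();
        String[] getOutputInterfaces();
        String[] getFormatterInterfaces();
        Object getInput(InputStream in, String type) throws FileFormatException;
        void getOutput(OutputStream out, Object output) throws FileFormatException;
        Object getFormatter(RandomAccess io, String type) throws FileFormatException;
    }

    /** Hypothetical streaming-only format whose payload is UTF-8 text. */
    static final class PlainTextFormat implements IntelligentFileFormat {
        private static final String STRING = String.class.getName();

        public String[] getInputOutputInterfaces() { return new String[] { STRING }; }
        public String[] getOutputInterfaces() { return new String[] { STRING }; }
        public String[] getFormatterInterfaces() { return new String[0]; }

        public Object getInput(InputStream in, String type) throws FileFormatException {
            if (!STRING.equals(type)) {
                throw new FileFormatException("Unknown type: " + type, null);
            }
            try {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int count;
                while ((count = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, count);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new FileFormatException("Could not read payload", e);
            }
        }

        public void getOutput(OutputStream out, Object output) throws FileFormatException {
            if (!(output instanceof String)) {
                // An object that cannot be written is an unsupported operation.
                throw new UnsupportedOperationException("Cannot write " + output);
            }
            try {
                out.write(((String) output).getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new FileFormatException("Could not write payload", e);
            }
        }

        public Object getFormatter(RandomAccess io, String type) {
            // This format is streaming only, so random access is refused.
            throw new UnsupportedOperationException("Streaming format only");
        }
    }
}
```

A real format would presumably return something richer than a String, but the division of labor is the same: an unknown type produces a FileFormatException, and an unsupported mode or object produces an UnsupportedOperationException.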

Implementation

An implementation of the above software would carry out the following steps:

Open the file for reading and/or writing.

Read in the header.

Use the JAR size in the header to extract the JAR file.

Load the JAR file into a secure ClassLoader.

Ask the secure ClassLoader for an instance of the class listed in the header.

Get a list of supported interfaces from the class.

Open the payload for streaming or random access based on the flags in the header.

Cast the object returned by the implementation to the expected object type -OR- use reflection to investigate the available APIs.
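The first five steps above, plus the streaming half of the seventh, can be sketched roughly as follows. The IffLoaderSketch and Opened names are hypothetical, and the header encoding (big-endian integers, a readUTF()-style string) is an assumption rather than part of the format as described. Note also that java.net.URLClassLoader is a subclass of java.security.SecureClassLoader but does not sandbox the loaded code on its own; a real implementation would need to add its own restrictions.

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;

final class IffLoaderSketch {
    /** Hypothetical result: the loader object, the flags, and the payload stream. */
    static final class Opened {
        final Object loader;
        final int flags;
        final InputStream payload;

        Opened(Object loader, int flags, InputStream payload) {
            this.loader = loader;
            this.flags = flags;
            this.payload = payload;
        }
    }

    static Opened open(InputStream in) throws IOException, ReflectiveOperationException {
        // Step 2: read the header (encoding details are assumptions).
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEFEED) {
            throw new IOException("Bad file ID");
        }
        int jarLength = data.readInt();
        if (jarLength < 0) {
            throw new IOException("Negative JAR length: " + jarLength);
        }
        int flags = data.readUnsignedByte();
        String loaderClass = data.readUTF();

        // Step 3: use the JAR length to extract the JAR into a temp file.
        byte[] jarBytes = new byte[jarLength];
        data.readFully(jarBytes);
        File jar = File.createTempFile("iff", ".jar");
        jar.deleteOnExit();
        try (OutputStream out = new FileOutputStream(jar)) {
            out.write(jarBytes);
        }

        // Step 4: load the JAR into a class loader. The loader is left open
        // because the returned object may load further classes from it later.
        URLClassLoader classes = new URLClassLoader(new URL[] { jar.toURI().toURL() });

        // Step 5: instantiate the class named in the header.
        Object loader = classes.loadClass(loaderClass).getDeclaredConstructor().newInstance();

        // Step 7 (streaming case): the stream now sits at the start of the payload.
        return new Opened(loader, flags, data);
    }
}
```

From here the caller would cast the loader object to IntelligentFileFormat, ask it for its supported types, and hand it the payload stream. If the random access flag is set, the payload would first have to be copied somewhere seekable.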

Go to Part 3 ->


Monday, February 20, 2006
