Over the years, I've accumulated a large collection of e-books and digital music albums, not to mention family pictures. Information overload is not a philosophical point of view; it's a real problem that forces me to devote time, effort and money to maintaining that collection.
That's probably why so many media organizers exist. Because I believe that all applications should be delivered from the Web, and because no ready-made Web media organizer struck me as fulfilling my needs, I started to write my own using Drupal 6, dubbed Mediatheque. Here are the most important design goals I had in mind:
Based on these goals, here are the significant implementation details of the current system:
Starting with a volume root, the folders queue finds and enqueues the files, the files queue creates Drupal documents (nodes) and enqueues them for plugin processing, and the plugins queue applies the registered plugins to extract metadata.
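The three-stage flow described above can be sketched in plain PHP. This is only an illustration under stated assumptions: it uses in-memory SplQueue objects rather than whatever queue backend Mediatheque actually relies on, a document is a simple array standing in for a Drupal node, and every function name prefixed with sketch_ is hypothetical.

```php
<?php
// Hypothetical, framework-free sketch; none of these names come from Mediatheque.

/**
 * Drains the folders queue and returns a queue of the file paths found.
 */
function sketch_process_folders(SplQueue $folders): SplQueue {
  $files = new SplQueue();
  while (!$folders->isEmpty()) {
    $dir = $folders->dequeue();
    $entries = scandir($dir);
    if ($entries === false) {
      continue;  // Unreadable folder: skip it.
    }
    foreach ($entries as $entry) {
      if ($entry === '.' || $entry === '..') {
        continue;
      }
      $path = $dir . DIRECTORY_SEPARATOR . $entry;
      if (is_dir($path)) {
        $folders->enqueue($path);  // Sub-folders go back on the folders queue.
      }
      elseif (is_file($path)) {
        $files->enqueue($path);
      }
    }
  }
  return $files;
}

/**
 * Turns each queued file into a document record (a stand-in for a node).
 */
function sketch_process_files(SplQueue $files): SplQueue {
  $documents = new SplQueue();
  while (!$files->isEmpty()) {
    $path = $files->dequeue();
    $documents->enqueue(array(
      'path' => $path,
      'hash' => sha1_file($path),
      'metadata' => array(),
    ));
  }
  return $documents;
}

/**
 * Applies every registered plugin callback to every queued document.
 */
function sketch_process_plugins(SplQueue $documents, array $plugins): array {
  $processed = array();
  while (!$documents->isEmpty()) {
    $document = $documents->dequeue();
    foreach ($plugins as $name => $callback) {
      $document['metadata'][$name] = call_user_func($callback, $document['path']);
    }
    $processed[] = $document;
  }
  return $processed;
}

/**
 * Runs the three stages in order, starting from a volume root.
 */
function sketch_run_pipeline(string $volume_root, array $plugins): array {
  $folders = new SplQueue();
  $folders->enqueue($volume_root);
  $files = sketch_process_folders($folders);
  $documents = sketch_process_files($files);
  return sketch_process_plugins($documents, $plugins);
}
```

Calling sketch_run_pipeline('/path/to/volume', array('size' => 'filesize')) would return one record per file, each carrying its hash and the value returned by every plugin callback. A real implementation would persist the queues so that a large volume can be processed across several cron runs.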
Each processing step produces a log entry, which makes it possible to track the errors that occur during media processing. The log is a Drupal table that is integrated with Views via hook_views_data. Each log entry records the processed document, the file hash, the plugin name and version, and the status code of the processing, which makes it possible to detect the cases where re-processing should occur.
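To illustrate how such a log entry could drive the re-processing decision, here is a minimal sketch. The field names (status, hash, plugin_version), the SKETCH_STATUS_OK constant and the function itself are assumptions made for the example; the actual log schema may differ.

```php
<?php
// Hypothetical status code; the real log may use different values.
const SKETCH_STATUS_OK = 0;

/**
 * Decides whether a plugin should process a document again, based on the
 * most recent log entry for that document/plugin pair (or null if none).
 */
function sketch_needs_reprocessing(?array $last_entry, string $current_hash, string $plugin_version): bool {
  if ($last_entry === null) {
    return true;  // Never processed by this plugin.
  }
  if ($last_entry['status'] !== SKETCH_STATUS_OK) {
    return true;  // The previous run failed.
  }
  if ($last_entry['hash'] !== $current_hash) {
    return true;  // The file content changed since the last run.
  }
  // Re-process only if the plugin is newer than the one that wrote the entry.
  return version_compare($last_entry['plugin_version'], $plugin_version, '<');
}
```

With this rule, an unchanged file that was processed successfully by the current plugin version is skipped, while a changed file, a failed run or a plugin upgrade each trigger a new pass.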
Plugins are metadata extractors that are associated with MIME type patterns. For example, Mediatheque currently comes with an ID3 extractor that uses the getID3 library and an e-book metadata extractor that uses the Google Books API via Zend Gdata.
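As a rough illustration of matching plugins to MIME type patterns, the sketch below uses shell-style wildcards with PHP's fnmatch(). The pattern syntax, the registry layout and the plugin keys (id3, ebook) are assumptions for the example, not Mediatheque's actual registration format.

```php
<?php
/**
 * Hypothetical plugin registry: plugin name => list of MIME type patterns.
 */
function sketch_plugin_registry(): array {
  return array(
    'id3' => array('audio/*'),
    'ebook' => array('application/pdf', 'application/epub+zip'),
  );
}

/**
 * Returns the names of the plugins whose patterns match a MIME type.
 */
function sketch_plugins_for_mime(string $mime, array $registry): array {
  $matches = array();
  foreach ($registry as $name => $patterns) {
    foreach ($patterns as $pattern) {
      if (fnmatch($pattern, $mime)) {
        $matches[] = $name;
        break;  // One matching pattern is enough for this plugin.
      }
    }
  }
  return $matches;
}
```

Here an audio/mpeg file would be handed to the id3 plugin only, and a file with an unregistered type such as image/jpeg would match no plugin at all.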
Metadata extracted by the plugins is stored in the document node using my CCK Metadata module. This is a simple name/value CCK field and the document node includes this field with unlimited cardinality. The metadata pairs returned by each plugin are prefixed with a unique plugin prefix to be able to handle re-processing. CCK Metadata also allows custom formatting of specific metadata entries via hook_cck_metadata_fields:

    /**
     * Implementation of hook_cck_metadata_fields().
     */
    function mediatheque_cck_metadata_fields() {
      return array(
        'isbn:thumbnail' => array(
          'formatter' => 'mediatheque_formatter_thumbnail',
        ),
      );
    }
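The plugin prefix is what makes re-processing safe: on a new run, everything a plugin wrote earlier can be dropped and replaced without touching other plugins' entries. The following sketch shows one way this could work. The flat list of name/value items, the function name and the prefix:name separator (modelled on the isbn:thumbnail key above) are assumptions, not the module's actual storage code.

```php
<?php
/**
 * Replaces a plugin's previously stored name/value pairs with fresh ones.
 *
 * $stored is a list of array('name' => ..., 'value' => ...) items, which is
 * only an assumption about how a multi-valued name/value field might look.
 */
function sketch_merge_plugin_metadata(array $stored, string $prefix, array $fresh): array {
  $kept = array();
  foreach ($stored as $item) {
    // Keep only the items that were not written by this plugin.
    if (strpos($item['name'], $prefix . ':') !== 0) {
      $kept[] = $item;
    }
  }
  foreach ($fresh as $name => $value) {
    $kept[] = array('name' => $prefix . ':' . $name, 'value' => $value);
  }
  return $kept;
}
```

Re-running a hypothetical id3 plugin would then replace all id3: entries with the new values while leaving the isbn: entries untouched.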
Eventually, this metadata system should reuse RDF instead of using a custom design.
Mediatheque is still very much a work in progress. However, many conceptual challenges have already been solved, and I would love to hear your feedback!
Attachment | Size |
---|---|
log.png | 132.92 KB |
plugins.png | 53.99 KB |
volumes.png | 50.21 KB |
document.png | 87.44 KB |
mediatheque2.png | 163.26 KB |
Comments
Importing iTunes
In my copious free time I've been working on a module that imports the iTunes XML file into first-class Drupal nodes. I don't know if this would be something you'd be interested in using. I haven't published it yet; it's not quite "ready enough" for me, but this might give me the push to get it cleaned up and published on GitHub. If you think it would help, give me a holler.
Thanks for the offer. That would be great, but it would require some changes in the Mediatheque architecture to allow for volumes which are not filesystem folders. If you're interested in helping out with this, please open an issue and we can discuss it more thoroughly there.
How about mediamosa.org?
Thanks! I wasn't aware of that. I'll try it out :-)
Nice concept. I have been pondering this myself, as I want to build a document photographing rig. This will allow me to image old personal documents, incoming invoices and thousands of old physical photographs. My thinking is that it will be much faster photographing them than attempting to scan each one.
I was then going to categorise and tag each document image. The tag info will be stored in a Drupal table and also stored in the jpg EXIF metadata. This way, if I lose the DB table info, I can easily rebuild it by re-reading the EXIF data. The most time-consuming part of the process will be categorising the documents.
I will need to write the image tagging software and, from the looks of it, with your module I will also need to write a MediathequeExifPlugin, as this will handle the data extraction from the jpg. I will have a play with this in a month or two, when the rig is finished.
Thanks! I'd love to see other implementations of Mediatheque, so do let me know when you get started. I'd be interested in using the EXIF plugin as well for all my family photos :-)