Mediatheque, a Drupal media organizer

Over the years, I've accumulated a large collection of e-books and digital music albums, not to mention family pictures. Information overload is not a philosophical point of view; it's a real problem that forces me to devote time, effort and money to maintaining that collection.

That's probably why so many media organizers exist. Because I believe that all applications should be delivered from the Web, and because no ready-made Web media organizer met my needs, I started writing my own, dubbed Mediatheque, using Drupal 6. Here are the most important design goals I had in mind:

Design goals

  • The main job of the organizer is to "ingest" media files by processing them to extract metadata, which is made available for searching and browsing.
  • I should be able to point the system to different "volumes" containing my media. These volumes are essentially folders that are present somewhere on the network.
  • No extra file space (beyond database growth) should be required to ingest media files.
  • The system should be able to store arbitrary metadata about the media.
  • I should be able to add new information handlers for media at any time; the system would then silently re-process the files.
  • New versions of media handlers should also be supported by re-processing the files.
  • The system should be smart about recognizing files across name changes and metadata updates.
  • The system should be robust in the face of plugin errors.
  • The way a media item is displayed should depend on its type.

Implementation details

Based on these goals, here are the significant implementation details of the current system:

  • Drupal Queue is used to process the files in the background. In fact, I am using three queues as a processing pipeline:

Starting with a volume root, the folders queue finds and enqueues the files, the files queue creates Drupal documents (nodes) and enqueues them for plugin processing, and the plugins queue applies the registered plugins to extract metadata.
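The flow between the three stages can be sketched in plain PHP. This is only an illustration, not Mediatheque's code: it assumes in-memory SplQueue objects in place of Drupal Queue, a plain array in place of document nodes, and a hypothetical pipeline_run() function whose plugins are simple callables that take a file path and return metadata pairs.

```php
<?php
// Illustrative sketch only: SplQueue stands in for Drupal Queue, and an
// array entry stands in for a document node. pipeline_run() is a made-up name.

/**
 * Runs the folders -> files -> plugins pipeline for one volume.
 *
 * @param string $root Path of the volume root folder.
 * @param callable[] $plugins Plugin name => callable($path) returning an array.
 * @return array File path => metadata gathered from all plugins.
 */
function pipeline_run(string $root, array $plugins): array {
  $folders_queue = new SplQueue();
  $files_queue = new SplQueue();
  $plugins_queue = new SplQueue();
  $documents = [];

  $folders_queue->enqueue($root);

  // Stage 1: the folders queue finds sub-folders and enqueues the files.
  while (!$folders_queue->isEmpty()) {
    $folder = $folders_queue->dequeue();
    foreach (scandir($folder) ?: [] as $entry) {
      if ($entry === '.' || $entry === '..') {
        continue;
      }
      $path = $folder . '/' . $entry;
      if (is_dir($path)) {
        $folders_queue->enqueue($path);
      }
      elseif (is_file($path)) {
        $files_queue->enqueue($path);
      }
    }
  }

  // Stage 2: the files queue creates a document per file and enqueues it
  // for plugin processing.
  while (!$files_queue->isEmpty()) {
    $path = $files_queue->dequeue();
    $documents[$path] = [];
    $plugins_queue->enqueue($path);
  }

  // Stage 3: the plugins queue applies every registered plugin.
  while (!$plugins_queue->isEmpty()) {
    $path = $plugins_queue->dequeue();
    foreach ($plugins as $plugin) {
      $documents[$path] += $plugin($path);
    }
  }

  return $documents;
}
```

In the real system each stage runs in the background, so a stage can be interrupted and resumed; the sketch drains the queues in one pass only to keep the example short.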

  • Each processing step produces a log entry, which makes it possible to track the errors produced during media processing. The log is a Drupal table that is integrated with Views via hook_views_data. Each log entry contains information about the processed document, the file hash, the plugin name and version, and the status code of the processing, which makes it possible to detect the cases where re-processing should occur.

[Image: Mediatheque log]
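To make the re-processing check concrete, here is a minimal sketch of how such log entries could drive the decision. The array keys, the STATUS_OK and STATUS_ERROR constants and the needs_processing() function are all hypothetical, and the rule that a failed run is always retried is an assumption, not something the module necessarily does.

```php
<?php
// Hypothetical status codes; the actual values used by the module may differ.
const STATUS_OK = 0;
const STATUS_ERROR = 1;

/**
 * Decides whether a plugin should (re-)process a document.
 *
 * Processing is skipped only if the log already holds a successful entry
 * for the same document, file hash, plugin name and plugin version.
 *
 * @param array $log Log entries, each with document, hash, plugin,
 *   version and status keys (assumed layout).
 */
function needs_processing(array $log, int $document, string $hash, string $plugin, string $version): bool {
  foreach ($log as $entry) {
    if ($entry['document'] === $document
      && $entry['hash'] === $hash
      && $entry['plugin'] === $plugin
      && $entry['version'] === $version
      && $entry['status'] === STATUS_OK) {
      return false;
    }
  }
  // No matching successful run: new file content, new plugin,
  // new plugin version, or a previous error.
  return true;
}
```

Under these assumptions, a changed file hash or a bumped plugin version is enough to trigger a new run, which covers both the "new handler" and "new version" design goals.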

  • Plugins are metadata extractors that are associated with MIME type patterns. For example, Mediatheque currently comes with an ID3 extractor that uses the getID3 library and an e-book metadata extractor that uses the Google Books API via Zend Gdata.

[Image: Mediatheque plugins]
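As an illustration of matching plugins to files, the sketch below selects the plugins whose patterns match a given MIME type. It assumes shell-style wildcard patterns (matched with PHP's fnmatch()); the post does not say which pattern syntax Mediatheque uses, and the registry layout, the plugin names and the plugins_for_mime() function are invented for the example.

```php
<?php
/**
 * Returns the names of the plugins registered for a MIME type.
 *
 * @param array $registry Plugin name => list of MIME type patterns
 *   (assumed to be shell-style wildcards such as "audio/*").
 * @param string $mime MIME type of the file being processed.
 * @return string[] Names of the matching plugins, in registry order.
 */
function plugins_for_mime(array $registry, string $mime): array {
  $matches = [];
  foreach ($registry as $name => $patterns) {
    foreach ($patterns as $pattern) {
      if (fnmatch($pattern, $mime)) {
        $matches[] = $name;
        // One matching pattern is enough for this plugin.
        break;
      }
    }
  }
  return $matches;
}

// Hypothetical registry: an ID3 extractor for audio files and an e-book
// extractor for two common e-book formats.
$registry = [
  'id3' => ['audio/*'],
  'isbn' => ['application/pdf', 'application/epub+zip'],
];
```

With this registry, an 'audio/mpeg' file would be handed to the 'id3' plugin only, while an 'image/jpeg' file would match no plugin until an image handler is registered.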

  • Metadata extracted by the plugins is stored in the document node using my CCK Metadata module. This is a simple name/value CCK field, and the document node includes this field with unlimited cardinality. The metadata pairs returned by each plugin are prefixed with a unique plugin prefix so that each plugin's entries can be told apart when the file is re-processed. CCK Metadata also allows custom formatting of specific metadata entries via hook_cck_metadata_fields:

/**
 * Implementation of hook_cck_metadata_fields().
 */
function mediatheque_cck_metadata_fields() {
  return array(
    // Render the 'isbn:thumbnail' metadata entry with a custom formatter.
    'isbn:thumbnail' => array(
      'formatter' => 'mediatheque_formatter_thumbnail',
    ),
  );
}

Eventually, this metadata system should reuse RDF instead of using a custom design.

[Image: Mediatheque plugins]
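The role of the plugin prefix during re-processing can be sketched as follows. This is a guess at the idea rather than the module's code: it assumes that pairs are stored as name/value arrays, that the prefix is joined to the name with a colon (as in 'isbn:thumbnail'), and that re-processing replaces all pairs previously written by the same plugin. The replace_plugin_metadata() function is hypothetical.

```php
<?php
/**
 * Replaces the metadata written by one plugin, leaving other plugins' pairs alone.
 *
 * @param array $pairs Existing name/value pairs on the document.
 * @param string $prefix Unique prefix of the plugin being (re-)run.
 * @param array $fresh Unprefixed name => value pairs returned by the plugin.
 * @return array Updated list of name/value pairs.
 */
function replace_plugin_metadata(array $pairs, string $prefix, array $fresh): array {
  // Keep only the pairs that do not belong to this plugin.
  $kept = array_values(array_filter($pairs, function ($pair) use ($prefix) {
    return strpos($pair['name'], $prefix . ':') !== 0;
  }));

  // Append the plugin's new pairs, prefixed so they can be found next time.
  foreach ($fresh as $name => $value) {
    $kept[] = ['name' => $prefix . ':' . $name, 'value' => $value];
  }
  return $kept;
}
```

Because each plugin only ever touches names carrying its own prefix, a new plugin version can rewrite its output without disturbing the metadata produced by the other plugins.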

  • Finally, the main Mediatheque view is created as a regular view with filterable metadata. The metadata name exposed filter is converted to a drop-down via my Views Hacks' Views Selective Exposed Filters module, whose job is to restrict the available filter values to those found in the view results.

[Image: Mediatheque plugins]
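The idea behind a selective exposed filter can be shown with a few lines of plain PHP. This is not how Views Selective Exposed Filters is implemented; it is a sketch that assumes each result row carries a 'metadata' list of name/value pairs, and selective_filter_options() is a made-up helper.

```php
<?php
/**
 * Builds drop-down options from the metadata names present in the results.
 *
 * @param array $rows Result rows, each with a 'metadata' list of
 *   name/value pairs (assumed layout).
 * @return array Option key => label, sorted by name, without duplicates.
 */
function selective_filter_options(array $rows): array {
  $options = [];
  foreach ($rows as $row) {
    foreach ($row['metadata'] as $pair) {
      // Using the name as the key removes duplicates across rows.
      $options[$pair['name']] = $pair['name'];
    }
  }
  ksort($options, SORT_STRING);
  return $options;
}
```

The drop-down therefore never offers a metadata name that would produce an empty result set, which is what makes the filter usable when plugins can emit arbitrary names.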

Conclusion

Mediatheque is still very much a work in progress. However, many conceptual challenges have already been solved, and I would love to hear your feedback!

Attachment          Size
log.png             132.92 KB
plugins.png         53.99 KB
volumes.png         50.21 KB
document.png        87.44 KB
mediatheque2.png    163.26 KB

Comments

Importing iTunes

In my copious free time I've been working on a module that imports iTunes' XML file into first-class Drupal nodes. I don't know if this would be something you'd be interested in using. I haven't published it yet; it's not quite "ready enough" for me, but this might give me the push to get it cleaned up and published on GitHub. If you think it would help, give me a holler.

Thanks for the offer. That would be great, but it would require some changes in the Mediatheque architecture to allow for volumes that are not filesystem folders. If you're interested in helping out with this, please open an issue and we can discuss it more thoroughly there.

How about mediamosa.org?

Thanks! I wasn't aware of that. I'll try it out :-)

Nice concept. I have been pondering this myself, as I want to build a document photographing rig. This will allow me to image old personal documents, incoming invoices and thousands of old physical photographs. My thinking is that it will be much faster photographing them than attempting to scan each one.

I was then going to categorise and tag each document image. The tag info will be stored in a Drupal table and also in the JPEG EXIF metadata. This way, if I lose the DB table info, I can easily rebuild it by re-reading the EXIF data. The most time-consuming part of the process will be categorising the documents.

I will need to write the image tagging software and, from the looks of it, with your module I will need to write a MediathequeExifPlugin to handle the data extraction from the JPEG files. I will have a play with this in a month or two, when the rig is finished.

Thanks! I'd love to see other implementations of Mediatheque, so do let me know when you get started. I would be interested in using the EXIF plugin as well for all my family photos :-)