Translating Entity-Relationship modeling to Drupal

Originally published at OpenCraft

Using the CCK + Views platform, we can replicate the traditional ER (Entity-Relationship) framework into a rapid application development environment that minimizes procedural and SQL programming by replacing most of it with "Visual ER programming": a way to specify data model and behaviour in terms of SQL-like abstractions directly inside the runtime Drupal environment - and reaping the benefits of incremental, always functional output.

ER Modeling

ER modeling deals with defining entities and relations among them. In Drupal, CCK content types correspond to entities, while relations can be specified through a number of ways, including CCK node references and Relativity module's parent-child relationship system. Neither are very comprehensive compared to the expressive power of ER relationships but it's a good start. Being able to map ER models to Drupal is a useful exercise that allows one to think in abstract ER terms and to implement the model 1-to-1 in software. Here are a few clues to perform that mapping:

  • In most cases, an entity type is mapped to a content type. Each instance of any content type is actually a Node, which can be thought of as the base class for all entities.
  • The entity's attributes are mapped to CCK fields. Most basic SQL types: integer, float, boolean, string, date, are all readily available as CCK fields, and the open CCK architecture allows to add many more besides programmatically. In that sense, the Drupal modeling system extends SQL with arbitrary field types.
  • In ER modeling, some entities need to be classified, such as movie genres or news topics. Typically, these are implemented in traditional database programming by creating lookup tables (for classification enums) and linking their PK to the entity being classified. Drupal provides this ability, in greatly enhanced form, through the Taxonomy module that allows the creation of arbitrary classifications that can be linear, tree-like, or even graph-like. Classification terms can be attached to any node. This is a huge time-saver and nullifies the problem of distinguishing between entity tables and lookup tables.
  • Simple relations can be defined using the CCK nodereference field type. This is a field that can hold a reference to another node in the system, in effect creating a relation between the referer and the referee. The name of the field can specify the relation (i.e., FK) name. The CCK nodereference can hold one or multiple references, enabling the cardinalities 1-to-1 or 1-to-many. Still, the CCK nodereference type lacks the ability to explicitly set and enforce specific cardinalities, and the relations just "feel" one-way, asymmetric, because they are defined as an attribute of one of the types, instead of link between two types. Relativity is another module that enables to explicitly set relations among types, and cardinalities are given more support. Its problem is that those relations are always implied to be of the parent-child type.

In conclusion, it is apparent that we use CCK to achieve both ER modeling and SQL CREATE TABLE statements in one go. That's a clear benefit of the system.

Now that we've defined the static ER model and mapped it to Drupal, we still need to be able to do two things: populating the database and querying on it. Let's take the second task first.

SQL Views

Querying entities in traditional SQL thinking involves constructing SQL views that correspond to desired reports. This is what the Drupal Views module accomplishes: it enables to construct named queries through a Web interface or API that return rows of nodes (or any other Drupal entity, like users, comments, taxonomy terms, etc.) according to specific criteria, just like a normal SQL SELECT statement would. Let's see how the Views mechanism corresponds to a SELECT statement:

  • A Drupal view (henceforth a view) has a unique name and a unique URL to be accessed from the UI.
  • The "SELECT tablename.fieldname AS label" idiom is achieved by specifying the fields of the main entity (or related entities) that should be displayed, and how each should be labelled.
  • The "FROM table1 JOIN table2 ..." idiom is achieved in two ways:
    • Implicitly because the Views system knows the table to which each selected field belongs. It then creates an internal JOIN between the master table and the other entity's table, assuming that the master PK is present as a foreign key in the second entity.
    • In addition, Views supports explicit programmer-supplied relationships that define how additional tables relate to the master table.
  • The "WHERE" clause is controlled through Filters where each filter specifies a condition, acting on a single field, that will filter the result set. Again, those fields used for filters affect the JOIN clause of the SQL statement.
  • Arguments are sometimes passed to SQL views. These are also provided by Drupal Views, through the Arguments section that expects arguments to be passed on the URL of the view. These arguments are added as conditions to the WHERE clause.
  • Finally, the displayed view can be shown on the browser with active filters that allow the user to further filter the query at runtime, through the Exposed Filters section. It is worth noting the the Views architecture is flexible enough to allow overriding most of these features with custom code. However, the default UI is usually enough to define most views without recourse to coding.

Populating and editing the database

Populating a traditional database involves executing SQL INSERT and UPDATE statements. Drupal provides more than one way to capture data:

  • Through the usual "Create content" and node editing mechanisms, whereby CCK displays data entry forms for the content types that are present in the system. In this sense, CRUD operations are automatically functional as soon as the CCK content type is created.
  • Through the Node Import module that allows CSV data to be imported into new instances of specified content types. It is worth noting that Drupal lacks a flexible ETL (extract, transform, load) module that generalizes this idea to arbitrary data sources and intermediate data transformations.
  • Through the Views Bulk Operations module that allows programmable actions to be executed on rows of nodes, as returned by Views queries.

Comments

Thanks for the writeup!

Great thoughts - I agree with previous commenter that if Drupal could get a better grasp of ER Modeling, referential integrity, etc it would really take it to another realm of possibility altogether.

I really appreciated your thoughts on using Taxonomy for classification instead of defining another table of classifications and setting relationships to them.

Thanks!

Object reference model lacking

I think one of the main thing lacking in Drupal is the possibility to mimick a relational database on the level of contenttypes en node relations. The node reference not being reciprocal and having no attributes of its own which can be used to query only certan kinds of relationships, makes the migration of custom build data repositories into drupal problematic.

Thanks for the useful

Thanks for the useful article. I'm looking for a solution which ensures 1:M cardinality with Node relationships module. But I don't think it's exist as a module for now. I tried Relativity but as you mentioned in the article despite its some useful features using node reference provides more functions.