A view of nodes and their translations, part 2: optimizing the Views-generated query

Last week, I described a technique to query and display nodes in all available translations. This worked well enough, but a performance-minded reader pointed out that the query generated by Views (that includes N self-joins for N enabled languages) would not scale to a large number of nodes.

My usual approach when implementing new ideas is to ensure the logic works first, and only handle optimization when needed. It's a strategy that has worked well for me in the past. So I set out to test this hypothesis, and to optimize the query if it was needed. Here's what happened:

Generating content in translation

The first obstacle was to generate a large set of nodes and their translations. Devel Generate, the Devel sub-module that generates Drupal objects for development purposes, does not support content translation at the time of writing. I submitted a D7 patch to the 2 years old feature request to achieve this. I tested it with 10K nodes, and it seems to work well. Your review is appreciated!

Testing the technique on a large set

Having generated 10K nodes and their translations to Arabic and French (30K nodes in total), I cloned the Proverbs view from last time to query and display this content. The result was quite explicit: the view page never finished loading! Clearly, the Views-generated query was not scaling. And for good reason: 3 SQL JOINS of 30,000 records each is a performance black hole. Optimization was needed.

Optimizing the Views-generated query without sacrificing Views functionality

My goal for optimizing the query was to retain all the advantages that Views offers in terms of theming query results, integrating into Drupal pages, etc. - these are indispensable features when creating real-world applications. In short, I wanted to transparently override the Views-generated query. To do so, I needed to:

  • Remove the peformance-killing JOINs from the query
  • Perform an optimized query to find node translations
  • Re-insert the results from the optimized query into the Views results, to allow it to proceed with the display

The code I used follows. I will explain the important parts below.

Remove the peformance-killing JOINs from the query

The function demo_i18n_views_query_alter() removes from the Views query object all references to the SQL JOINs, which are called "relationships" in Views parlance. Views core invokes this hook just before converting the query object into an SQL statement. The resulting query that Views will execute looks like this:

SELECT node.nid AS nid, node.created AS node_created
FROM {node} node
WHERE (( (node.status = '1') AND (node.type IN  ('multilingual_node')) AND (node.tnid = node.nid OR node.tnid = 0) ))

Perform an optimized query to find node translations

The query as modified above will only return nodes that are translation sources. It's now up to me to query the node translations, by waiting for Views to execute the modified query, and then gathering the nids to find their translations (as stored in {node}.tnid). This is a simple query using the SQL IN operator. I call this hand-made query in the demo_i18n_views_post_execute() function, which is invoked by Views after it executes its own query.

Re-insert the results from the optimized query into the Views results

The challenge with the new query is that it returns one node translation per row, as opposed to the original query which returned all translations on the same row. In addition, the results need to be copied into the view::result object, with the right key names that Views expects. In order to find the right key names, I first displayed the results from the unmodified Views query and noted the result keys. With this information, I then proceeded to loop over the optimized query results, and find the corresponding entry in the Views result array that would receive them. This loop is also implemented in the demo_i18n_views_post_execute() function.

Conclusion and future work

The results were impressive! The view page loaded in very acceptable time (ApacheBench reports a mean time of ~1350ms, against ~650ms in the case of a view with just 4 nodes), and Views happily themed the translated nodes as if it had queried them itself. You can see this code in action on my i18n demo site.

The approach of hand-crafting Views queries has been on my mind for a long time, and I'm glad I took the first step. So far, I am not sure that a generic module can be created out of this, mainly due to the necessity to transform the result set after the optimized query is run. In any case, I'll be applying this technique in my projects!

Comments

Its hard for me to picture what this is doing without seeing the before and after SQL query. Funny thing is I just posted a blog post on a tuning a slow Views query and it suffered from a similar problem too many Joins. Solution was to move the intent of what was happening in the JOINS into the WHERE clause.

See http://blogs.mpr.org/developer/2013/06/using-mysqls-slow-query-log-to-pi...

Thanks!

The original query is given in the previous post:

SELECT node_node_1.title AS node_node_1_title, 
            node_node_1.nid AS node_node_1_nid, 
            node_node_1.language AS node_node_1_language, 
            node_node_2.title AS node_node_2_title, 
            node_node_2.nid AS node_node_2_nid, 
            node_node_2.language AS node_node_2_language, 
            node_node.title AS node_node_title, 
            node_node.nid AS node_node_nid, 
            node_node.language AS node_node_language, 
            node.created AS node_created
FROM {node} node
LEFT JOIN {node} node_node 
           ON (node.nid = node_node.tnid OR (node_node.tnid = 0 AND node.nid = node_node.nid)) AND node_node.language = 'ar'
LEFT JOIN {node} node_node_1 
           ON (node.nid = node_node_1.tnid OR (node_node_1.tnid = 0 AND node.nid = node_node_1.nid)) AND node_node_1.language = 'en'
LEFT JOIN {node} node_node_2 
           ON (node.nid = node_node_2.tnid OR (node_node_2.tnid = 0 AND node.nid = node_node_2.nid)) AND node_node_2.language = 'fr'
WHERE (( (node.status = '1') AND (node.type IN  ('proverb')) AND (node.tnid = node.nid OR node.tnid = 0) ))

The modified query looks like this:

SELECT node.nid AS nid, node.created AS node_created
FROM {node} node
WHERE (( (node.status = '1') AND (node.type IN  ('proverb')) AND (node.tnid = node.nid OR node.tnid = 0) ))

In addition, the demo_i18n_views_post_execute() function executes a secondary query to find the node translations:

SELECT node.tnid, node.nid, node.title, node.language
FROM {node}
WHERE node.tnid IN (:nids) OR node.nid IN (:nids)

Thanks for the link to your interesting post!

just a note : the translation of nodes is not related to "i18n" module but the core module "translation" .

and regarding views, i think there ought to be a better solution than using "unset" in query_alter . i think the right way is to remove the relations from the query in views_ui and instead implement a view field handler

one of the reasons i hate views is the lack of documentation, hopefully in drupal 8 they will finally fix that . but here is an example : http://heididev.com/01/11/2011/custom-field-handlers-views-3

maybe run the sql in pre_render : https://api.drupal.org/api/views/handlers!views_handler_field.inc/function/views_handler_field%3A%3Apre_render/7