Regenerating your Drupal search index using Batch API

I love to use Views with the core Search module. The combination allows me to create advanced search interfaces using just exposed filters.

So today I was making changes to my search index structure, and I was wondering how to regenerate the million or so entries in my index. Waiting for cron to activate the indexing would surely not do! A quick Web search yielded the module Reindex and I started chanting "There's a module for that!" to my baffled coworker. But my joy was short-lived as I realized that Reindex only works for nodes - and my data is stored in Data tables...

After reading the module's code, I decided to generalize their approach to interactively reindex any type of content. Here's the result, simplified:

Reindexing in action

  • Set up the batch process:
<?php
// @file mymodule.module

/**
 * Implementation of hook_menu().
 */
function mymodule_menu() {
  $items = array();
  $items['admin/settings/mymodule/reindex'] = array(
    'title' => 'Reindex content',
    'page callback' => 'drupal_get_form',
    'page arguments' => array('mymodule_reindex'),
    'access arguments' => array('administer site configuration'),
  );
  return $items;
}

/**
 * Page callback: starts the reindexing batch as soon as the page is visited.
 */
function mymodule_reindex() {
  $batch = array(
    'title' => t('Regenerating search index'),
    'operations' => array(
      // data_search indexes Data tables.
      array('_mymodule_reindex_batch', array('data_search')),
    ),
    'finished' => '_mymodule_reindex_finished',
  );
  batch_set($batch);
  batch_process('');
}

?>
  • Implement the batch process: This is the core function. It invokes hook_update_index on the desired module (data_search in our case, but it could also be node, user, etc.) to do the indexing, then calls hook_search('status') to get the total and remaining item counts. The Batch API calls the function again until $context['finished'] reaches 1, and that same value drives the progress bar:
<?php
// @file mymodule.module

/**
 * Batch operation: indexes the next set of items and reports progress.
 */
function _mymodule_reindex_batch($module, &$context) {
  // Let the module index its next set of items.
  module_invoke($module, 'update_index');
  // Ask the module how many items it has and how many are still waiting.
  $status = module_invoke($module, 'search', 'status');
  $completed = $status['total'] - $status['remaining'];
  // A value below 1 tells the Batch API to call this operation again.
  $context['finished'] = empty($status['total']) ? 1 : $completed / $status['total'];
  $context['message'] = t('Now reindexing module !module...', array('!module' => $module));
}

/**
 * Batch 'finished' callback: reports the outcome to the user.
 */
function _mymodule_reindex_finished($success, $results, $operations) {
  if ($success) {
    drupal_set_message(t('Reindexed all records.'));
  }
  else {
    drupal_set_message(t('Reindexing did not complete.'), 'error');
  }
}

?>
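
To make the progress arithmetic concrete, here is a minimal standalone sketch of the calculation done in the batch callback above. The helper name mymodule_reindex_progress() and the sample numbers are hypothetical; only the 'total' and 'remaining' keys come from the status array used in the post.

```php
<?php
/**
 * Hypothetical helper: turns a search status array into a Batch API
 * 'finished' value between 0 and 1.
 */
function mymodule_reindex_progress(array $status) {
  // Nothing to index: report the operation as complete right away.
  if (empty($status['total'])) {
    return 1;
  }
  $completed = $status['total'] - $status['remaining'];
  return $completed / $status['total'];
}

// Made-up sample: 1,000,000 items in total, 250,000 still waiting.
$progress = mymodule_reindex_progress(array('total' => 1000000, 'remaining' => 250000));
printf("%.2f\n", $progress); // prints 0.75
```

With these numbers the progress bar would sit at 75%, and the Batch API would keep calling the operation until the value reaches 1.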

That's it! This code can be generalized to show a list of modules that implement hook_update_index and to create one Batch API operation per selected module. Now you too can gaze at the progress bar as it slowly makes its way across the screen.
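
As a rough sketch of that generalization, the operations list could be built with one entry per module. The helper name mymodule_reindex_operations() is hypothetical, and the module list is hard-coded here so the snippet runs outside Drupal; in a real module the list would presumably come from module_implements('update_index'), filtered by whatever the administrator selected.

```php
<?php
/**
 * Hypothetical helper: builds one Batch API operation per module.
 *
 * Each operation points at the _mymodule_reindex_batch callback and
 * passes the module name as its only argument.
 */
function mymodule_reindex_operations(array $modules) {
  $operations = array();
  foreach ($modules as $module) {
    // Each operation is array(callback name, array of arguments).
    $operations[] = array('_mymodule_reindex_batch', array($module));
  }
  return $operations;
}

// Example list, hard-coded (an assumption) so the sketch is standalone.
$operations = mymodule_reindex_operations(array('node', 'data_search'));
print count($operations) . "\n"; // prints 2
```

The returned array would then be assigned to the 'operations' key of the $batch array before calling batch_set().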

Attachment: reindexing.png (35.28 KB)

Comments

Committed a sandbox project

Committed a sandbox project on Drupal.org generalizing the above approach for all modules that expose their own index.

Thank you for this example,

Thank you for this example, it was very helpful to me. I think I found a small typo. Shouldn't this line: $status = module_invoke($module, 'search', 'status'); in fact be: $status = module_invoke($module, 'search_status');

I double-checked, but my code

I double-checked, but my code is correct:

$status = module_invoke($module, 'search', 'status');

It is the same statement used in the search module itself, in function search_admin_settings().

I forgot to say I'm using

I forgot to say I'm using Drupal 7.

That explains it :-)

That explains it :-)

Very cool. Was just what i

Very cool. It was just what I was looking for.

A couple of things, though, for people stumbling upon this:

  • I am not sure what the data_search module is; I guess it is a third-party module the author uses for non-node data. Anyway, if you wanted to use this to reindex your standard node data, I think the line:

array('_mymodule_reindex_batch', array('data_search')), - would need to pass array('node')

Although I am not sure why a better solution wouldn't be to call module_implements('update_index') and then loop through all the modules that feed your search index.

Also, that same line has an error: it shouldn't be _mymodule..., it needs to be array('_recotype_reindex_batch', array('node')),

Thanks for reporting these

Thanks for reporting these issues. I added a comment on the data_search index and fixed the bad function name.