
Resolving Solr Duplicates in Drupal 10

Optimizing Drupal’s Search API Solr Module: Reducing Duplicates with Field Groupings


The transition from Drupal 9 to Drupal 10 brings a number of updates and some challenges, particularly around module compatibility and functionality. Our team at SeCan recently went through this upgrade. For the most part it was straightforward, but we did encounter our fair share of problems, most of them caused by the Search API Solr module, which we use to improve our site's search performance. This article details the main problem we hit and the solution we implemented.

The Challenge: Eliminating Duplicate Search Results

After the Drupal core upgrade, my team found that search on one particular index of our Drupal site was returning duplicate results: each search query was producing roughly 10 nodes per field. This prompted an investigation into our existing Solr search setup. Our site relies on two contributed modules, search_api_solr and search_api_grouping. At the time of writing, the search_api_grouping module, which had been pivotal in reducing duplicates, is deprecated and not compatible with Drupal 10.

Navigating Deprecation with Innovative Solutions

While exploring our options, we discovered that several methods we had used in Drupal 9, such as hook_search_api_solr_query_alter, had been deprecated and removed as of search_api_solr ^4.3 and would no longer work. This finding led us to an alternative approach: EventSubscribers.

Rather than patching the now-deprecated search_api_grouping module, we decided to implement a new EventSubscriber in our custom module. The new file, SolrQueryAlterEventSubscriber.php, lets us alter the Solr query and group the results on the required field, which eliminates the duplicates.

To achieve this, we first created the file SolrQueryAlterEventSubscriber.php and then updated the module's services YAML file accordingly. The module structure looks something like this:

custom_module/
│
├── src/
│   └── EventSubscriber/
│       └── SolrQueryAlterEventSubscriber.php
│
├── custom_module.info.yml
├── custom_module.module
├── custom_module.services.yml
└── README.md

Here’s what the code for the query alter looks like:

<?php

namespace Drupal\custom_module\EventSubscriber;

use Drupal\search_api_solr\Event\PreQueryEvent;
use Drupal\search_api_solr\Event\SearchApiSolrEvents;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

/**
 * Alters the Solr query where necessary to implement business logic.
 *
 * Subscribes to the PRE_QUERY event of search_api_solr, which gives modules
 * a chance to alter the Solarium query before it is executed.
 * Replaces hook_search_api_solr_query_alter().
 *
 * @package Drupal\custom_module\EventSubscriber
 */
class SolrQueryAlterEventSubscriber implements EventSubscriberInterface {

  /**
   * {@inheritdoc}
   */
  public static function getSubscribedEvents(): array {
    return [
      SearchApiSolrEvents::PRE_QUERY => 'alterSolrQuery',
    ];
  }

  /**
   * Adds Solr result grouping parameters to the query.
   *
   * Grouping the results by the field named in the group.field parameter
   * is what reduces the duplicate results.
   *
   * @param \Drupal\search_api_solr\Event\PreQueryEvent $event
   *   The pre-query event.
   */
  public function alterSolrQuery(PreQueryEvent $event): void {
    $solarium_query = $event->getSolariumQuery();
    $index_id = $event->getSearchApiQuery()->getIndex()->id();

    // Only alter queries against the required index.
    if ($index_id === 'custom_index') {
      // Add the group, group.field, and group.main parameters.
      // $solarium_query->addParam('q.op', 'OR');
      $solarium_query->addParam('group', 'true');
      // The field to group by.
      $solarium_query->addParam('group.field', 'its_field_name');
      // Return the grouped documents as the main, flat result list.
      $solarium_query->addParam('group.main', 'true');

      // Uncomment to verify that the new parameters were added.
      // $params = $solarium_query->getParams();
      // \Drupal::logger('Query Params')->notice('Modified Solr query params: ' . print_r($params, TRUE));
    }
  }

}
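
To make the effect of these three parameters concrete, here is a small plain-PHP illustration that does not touch Solr at all. It is only a sketch of the behaviour we rely on: with grouping enabled, group.main set to true, and Solr's default group.limit of 1, the response keeps the top-ranked document of each group. The function name collapse_by_field and the sample documents are hypothetical and are not part of our module.

```php
<?php

declare(strict_types=1);

/**
 * Keeps the first (top-ranked) document for each distinct field value.
 *
 * Roughly mimics group=true, group.field=<field>, group.main=true with
 * the default group.limit of 1. Hypothetical helper, for illustration only.
 *
 * @param array<int, array<string, mixed>> $docs
 *   Documents in ranked order.
 * @param string $field
 *   The field to group on.
 *
 * @return array<int, array<string, mixed>>
 *   One document per distinct value of $field, in the original order.
 */
function collapse_by_field(array $docs, string $field): array {
  $seen = [];
  $collapsed = [];
  foreach ($docs as $doc) {
    // Documents without the field share a single "empty" group.
    $key = (string) ($doc[$field] ?? '');
    if (isset($seen[$key])) {
      // A higher-ranked document already represents this group.
      continue;
    }
    $seen[$key] = TRUE;
    $collapsed[] = $doc;
  }
  return $collapsed;
}

// Made-up ranked results: three of the four share the same field value.
$results = [
  ['id' => 'a', 'its_field_name' => 12],
  ['id' => 'b', 'its_field_name' => 12],
  ['id' => 'c', 'its_field_name' => 7],
  ['id' => 'd', 'its_field_name' => 12],
];

$unique = collapse_by_field($results, 'its_field_name');

// Only documents 'a' and 'c' remain, one per distinct field value.
print_r(array_column($unique, 'id'));
```

In the real setup Solr does this work server-side, so the subscriber only has to add the parameters and the deduplicated list comes back in the normal result set.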

Finally, we registered the subscriber as a service in the custom_module.services.yml file. Don't forget to rebuild the cache afterwards, either with drush cr or through the Devel module.

services:
  custom_module.solr_query_subscriber:
    class: Drupal\custom_module\EventSubscriber\SolrQueryAlterEventSubscriber
    tags:
      - { name: event_subscriber }
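
Once the cache is rebuilt, one way to check that the subscriber is firing is to look for the three grouping parameters in the request Solr receives, for example in the Solr log or by enabling the commented-out logger call in the subscriber. As a rough guide, and assuming nothing else rewrites them, they would serialize like this (the variable names are only for illustration):

```php
<?php

declare(strict_types=1);

// The parameters the subscriber adds, in the order it adds them.
$grouping_params = [
  'group' => 'true',
  'group.field' => 'its_field_name',
  'group.main' => 'true',
];

// Serialize them the way they would appear in a query string.
$query_string = http_build_query($grouping_params);

// Prints: group=true&group.field=its_field_name&group.main=true
echo $query_string, PHP_EOL;
```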

By implementing the EventSubscriber, we successfully adapted our search functionality to the new Drupal environment. In our view, this not only resolved the duplicate results but also set a precedent for future upgrades and modifications in our Drupal system. Hooks are clearly not the only way to alter a module's behaviour. Let me know what you think down below.

Could we have done this differently?

References

https://www.drupal.org/docs/develop/creating-modules/subscribe-to-and-dispatch-events

https://www.drupal.org/docs/8/modules/search-api-solr/search-api-solr-howtos/how-to-replace-the-deprecated-hook-hook_search_api_solr_query_alter-with-prequeryevent

https://cwiki.apache.org/confluence/display/solr/FieldCollapsing

Paul
https://pandco.studio
