Inside the Codebase: A Deep Dive Into Drupal RAG Integration

Welcome back! Thank you for the warm response to our first article, which introduced our new Drupal RAG Integration app. Today, let’s look at how the code works.

Quick Reminder

If you missed our first article or need a quick reminder, you can take a look at it here. It’ll help you understand what we’re talking about today.

How the App is Built

The integration has two parts: a Python backend that retrieves relevant content and uses it to augment LLM prompts, and a Drupal site that serves as both the content source and the user-facing frontend.

RAG Backend:

  • Vector Database: Chroma
  • RAG Backend Framework: FastAPI with Python
  • Local LLM Runtime: Ollama, which runs language models on the local machine.
  • LLM: Mistral
  • Programming Language: Python 3.6

Website:

  • CMS: Drupal 10
  • Database: MySQL
  • Programming Language: PHP 8.1

Figure: Drupal RAG Integration Architecture

Key APIs of the Integration

Four APIs form the core of the integration: the three feed endpoints described here and the Ask API covered later in this article.

Add Feed API (/feed/add):

  • Method: POST
  • Parameters: nid (string, the Drupal node ID), data (string)
  • Returns: doc_ids (list of strings)
  • Description: The content is split into smaller chunks with RecursiveCharacterTextSplitter, and each chunk is assigned a unique ID. The chunks are then stored in the vector database under those IDs, which are returned so that later update and delete calls can target them.
@app.post("/feed/add")
def feed_add(feed_data: FeedData = Body(...)):
    nid = feed_data.nid
    data = feed_data.data

    ids = add_docs(nid=nid, data=data)
    # Return ids
    return {"response": "Document successfully added.", "doc_ids": ids}
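
The chunk-and-ID step described above can be sketched without any dependencies. This is a minimal sketch, not the app's add_docs: it swaps RecursiveCharacterTextSplitter for a plain fixed-size splitter with overlap, and the function names split_text and make_chunk_ids, the chunk sizes, and the ID format are all assumptions made for illustration.

```python
import uuid


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that overlap slightly.

    Stand-in for RecursiveCharacterTextSplitter, which additionally tries to
    break on paragraph and sentence boundaries.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def make_chunk_ids(nid: str, count: int) -> list[str]:
    """Create one unique ID per chunk (hypothetical format: '<nid>-<uuid4 hex>')."""
    return [f"{nid}-{uuid.uuid4().hex}" for _ in range(count)]


chunks = split_text("Drupal is a content management system. " * 20)
ids = make_chunk_ids("42", len(chunks))
```

Prefixing each ID with the node ID is only one possible convention; what matters is that every chunk gets an ID the caller can store and send back later.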

Update Feed API (/feed/update):

  • Method: POST
  • Parameters: nid (string), ids (list of strings), data (string)
  • Returns: doc_ids (list of strings)
  • Description: The documents matching the given IDs are deleted, then the updated content is chunked and stored exactly as in the Add Feed API. The response contains the new document IDs, which replace the old ones.
@app.post("/feed/update")
def feed_update(feed_data: UpdateData = Body(...)):
    nid = feed_data.nid
    ids = feed_data.ids
    data = feed_data.data

    vectordb_manager = VectorDbManager()

    # Delete with ids passed.
    vectordb_manager.delete_ids(ids=ids)

    # Create fresh documents.
    new_ids = add_docs(nid=nid, data=data)

    # return the ids
    return {"response": "Document successfully updated.", "doc_ids": new_ids}
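
To see why the caller has to keep the returned IDs, here is a minimal sketch of the delete-then-re-add flow against an in-memory dictionary. InMemoryStore and its methods are hypothetical stand-ins for VectorDbManager and Chroma, and chunking is reduced to splitting on blank lines.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for the vector database: maps chunk ID -> text."""

    def __init__(self) -> None:
        self.docs: dict[str, str] = {}

    def add(self, nid: str, data: str) -> list[str]:
        # Simplified chunking: one chunk per non-empty paragraph.
        chunks = [p.strip() for p in data.split("\n\n") if p.strip()]
        ids = [f"{nid}-{uuid.uuid4().hex}" for _ in chunks]
        self.docs.update(zip(ids, chunks))
        return ids

    def delete_ids(self, ids: list[str]) -> None:
        for doc_id in ids:
            self.docs.pop(doc_id, None)  # ignore IDs that are already gone

    def update(self, nid: str, ids: list[str], data: str) -> list[str]:
        # Same order as /feed/update: drop the old chunks, then add fresh ones.
        self.delete_ids(ids)
        return self.add(nid, data)


store = InMemoryStore()
old_ids = store.add("7", "First paragraph.\n\nSecond paragraph.")
new_ids = store.update("7", old_ids, "Rewritten text.")
```

Because every update produces new IDs, a caller that keeps sending stale IDs would leave the previous chunks behind in the store.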

Delete Feed API (/feed/delete):

  • Method: POST (the route is registered with @app.post)
  • Parameters: ids (list of strings)
  • Returns: HTTP status 200 with a confirmation message if the deletion succeeds
  • Description: Removes documents by their unique IDs via a Chroma delete query.
@app.post("/feed/delete")
def feed_delete(data: DeleteData = Body(...)):
    ids = data.ids

    # Delete with ids passed.
    vectordb_manager = VectorDbManager()
    vectordb_manager.delete_ids(ids=ids)

    return {"response": "Document successfully deleted."}

Chroma Vector Database

The Chroma configuration lives in vector_manager.py, in the VectorDbManager.store_data() method. By default, data is persisted to the chroma_data directory, a path relative to the working directory; change persist_directory to store it elsewhere.

vectordb = chroma.Chroma.from_documents(
    documents=chunks,
    embedding=fastembed.FastEmbedEmbeddings(),
    persist_directory="chroma_data",
    ids=ids
)
vectordb.persist()
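
If the storage location needs to differ between environments, one option is to read it from an environment variable rather than hard-coding it. The sketch below is an assumption, not part of the app: the variable name CHROMA_PERSIST_DIR and the helper resolve_persist_dir are hypothetical.

```python
import os
from pathlib import Path


def resolve_persist_dir(default: str = "chroma_data") -> Path:
    """Return the absolute directory the vector store should persist to.

    CHROMA_PERSIST_DIR is a hypothetical variable name; the app itself
    hard-codes "chroma_data".
    """
    raw = os.environ.get("CHROMA_PERSIST_DIR", default)
    return Path(raw).expanduser().resolve()
```

The result could then be passed as persist_directory=str(resolve_persist_dir()) in the call shown above.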

Drupal Integration Schema and Hooks

The module defines a schema, drupal_rag_integration_node_doc, that maps each node ID to its document IDs. The module’s implementations of hook_ENTITY_TYPE_insert, hook_ENTITY_TYPE_update, and hook_ENTITY_TYPE_delete call the corresponding feed APIs with JSON payloads, keeping the vector database in sync throughout the node’s lifecycle.

/**
 * Implements hook_ENTITY_TYPE_insert() for node entities.
 *
 * @param \Drupal\Core\Entity\EntityInterface $entity
 *   The node entity that was inserted.
 */
function drupal_rag_integration_node_insert(EntityInterface $entity) {
  /** @var RagEntityOperations $entity_operations */
  $entity_operations = \Drupal::service('drupal_rag_integration.entity_operations');
  $entity_operations->handleInsert($entity);
}

/**
 * Implements hook_ENTITY_TYPE_update() for node entities.
 *
 * @param \Drupal\Core\Entity\EntityInterface $entity
 *   The node entity that was updated.
 */
function drupal_rag_integration_node_update(EntityInterface $entity) {
  /** @var RagEntityOperations $entity_operations */
  $entity_operations = \Drupal::service('drupal_rag_integration.entity_operations');
  $entity_operations->handleUpdate($entity);
}

/**
 * Implements hook_ENTITY_TYPE_delete() for node entities.
 *
 * @param \Drupal\Core\Entity\EntityInterface $entity
 *   The node entity that was deleted.
 */
function drupal_rag_integration_node_delete(EntityInterface $entity) {
  /** @var RagEntityOperations $entity_operations */
  $entity_operations = \Drupal::service('drupal_rag_integration.entity_operations');
  $entity_operations->handleDelete($entity);
}
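
The bookkeeping behind these hooks can be illustrated in Python with SQLite. This is a sketch under assumptions, not the module's PHP: only the table name comes from the article, while the columns nid and doc_id, the one-row-per-chunk layout, and the NodeDocMap class are hypothetical.

```python
import sqlite3

TABLE = "drupal_rag_integration_node_doc"


class NodeDocMap:
    """Tracks which vector-store document IDs belong to which node."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            f"CREATE TABLE {TABLE} (nid TEXT NOT NULL, doc_id TEXT PRIMARY KEY)"
        )

    def get(self, nid: str) -> list[str]:
        rows = self.conn.execute(
            f"SELECT doc_id FROM {TABLE} WHERE nid = ? ORDER BY rowid", (nid,)
        )
        return [row[0] for row in rows]

    def replace(self, nid: str, doc_ids: list[str]) -> None:
        """Store the IDs returned by /feed/add or /feed/update for a node."""
        with self.conn:  # run both statements in one transaction
            self.conn.execute(f"DELETE FROM {TABLE} WHERE nid = ?", (nid,))
            self.conn.executemany(
                f"INSERT INTO {TABLE} (nid, doc_id) VALUES (?, ?)",
                [(nid, doc_id) for doc_id in doc_ids],
            )

    def remove(self, nid: str) -> list[str]:
        """Forget a node and return its IDs so they can be sent to /feed/delete."""
        doc_ids = self.get(nid)
        self.replace(nid, [])
        return doc_ids


mapping = NodeDocMap()
mapping.replace("12", ["a1", "a2"])  # after the insert hook
mapping.replace("12", ["b1"])        # after the update hook
stale = mapping.remove("12")         # in the delete hook
```

In this layout the update handler reads the current IDs with get(), sends them to /feed/update, and writes back whatever IDs the API returns.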

User Interaction Through ASK Form

On the frontend, users submit questions through a form named “ASK”. Questions can be about the site’s content or more general in nature. The form sends each question to the Ask API.

  public function submitForm(array &$form, FormStateInterface $form_state): void {
    $question = $form_state->getValue('question');

    $endpoint = '/ask';
    $payload = json_encode(['question' => $question]);

    $response = $this->apiClient->callApi($endpoint, $payload);

    if (isset($response['response'])) {
      $form_state->set('response', $response['response']);
    } else {
      $form_state->set('response', $this->t('Error occurred: @error', ['@error' => $response['error'] ?? $this->t('Unknown error')]));
    }

    $form_state->setRebuild(TRUE);
  }
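
For readers who want to exercise the backend without Drupal, the same request and response handling can be sketched in Python. The base URL http://localhost:8000 and the helper names are assumptions, as is the idea that failures arrive under an error key, which the form code implies but the article does not confirm.

```python
import json
import urllib.request


def build_ask_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) the POST request the ASK form would make."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_answer(response: dict) -> str:
    """Roughly mirror the form: prefer 'response', else report the error."""
    if response.get("response") is not None:
        return response["response"]
    return f"Error occurred: {response.get('error', 'Unknown error')}"
```

Sending the request is then a matter of passing it to urllib.request.urlopen() and decoding the JSON body before handing it to extract_answer().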

Ask API (/ask)

  • Method: POST
  • Parameters: question (string)
  • Returns: Generated response (string)
  • Description: Retrieves context relevant to the user’s question from the Chroma vector database, augments the prompt with that context, and passes the result to the Mistral LLM, which generates the final response.
@app.post("/ask")
def ask(data: Question = Body(...)):
    question = data.question

    rag_obj = Rag()
    rag_obj.set_retrieve()
    rag_obj.augment()
    response = rag_obj.generate(question)

    return {"response": response}
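
The retrieve, augment, and generate steps can be sketched in plain Python to show how data flows between them. This is not the app's Rag class: retrieval by shared words stands in for Chroma's embedding search, the prompt wording is invented, and generate() takes any callable in place of the Mistral model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by words shared with the question (stand-in for vector search)."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]


def augment(question: str, context: list[str]) -> str:
    """Build the prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def generate(prompt: str, llm=None) -> str:
    """Call the model; `llm` is any callable that maps a prompt to text."""
    if llm is None:
        return prompt  # no model wired in: return the prompt for inspection
    return llm(prompt)


stored = ["The CMS is Drupal 10.", "The vector database is Chroma.", "The LLM is Mistral."]
context = retrieve("Which vector database is used?", stored, k=1)
prompt = augment("Which vector database is used?", context)
answer = generate(prompt)  # without a model this is just the prompt
```

Keeping the three steps as separate functions makes it easy to swap the toy retriever for a real vector search without touching prompt construction.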

Watch the Detailed Video Explanation

We hope this article helps. Stay tuned for more articles like this!
