Inside the Codebase: A Deep Dive Into Drupal Rag Integration
Welcome back! Many of you enjoyed our first article, which introduced our new Drupal Rag Integration app. Thank you! Today, let’s look at how the code works.
Quick Reminder
If you missed our first article or need a quick reminder, you can take a look at it here. It’ll help you understand what we’re talking about today.
How the App is Built
The integration pairs two parts: a Python backend that retrieves relevant content and uses it to augment prompts, and a Drupal site that acts as the Content Management System (CMS) frontend.
RAG Backend:
- Vector Database: Chroma
- RAG Backend Framework: FastAPI with Python
- Local LLM Runtime: Ollama, a layer that runs the language model locally.
- LLM: Mistral
- Programming Language: Python 3.6
Website:
- CMS: Drupal 10
- Database: MySQL
- Programming Language: PHP 8.1
Key APIs of the Integration
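The integration is built on four APIs: three feed APIs that keep the vector database in sync with Drupal content, described next, and the Ask API, described along with the ASK form.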
Add Feed API (/feed/add):
- Method: POST
- Parameters: nid (string, the Drupal node ID), data (string)
- Returns: doc_ids (list of strings, the document_ids of the stored chunks)
- Description: Content is divided into smaller chunks via RecursiveCharacterTextSplitter, and each chunk is assigned a unique ID. These IDs are needed later for update and delete operations. The chunks are then stored in the Vector Database under their document_ids.
@app.post("/feed/add")
def feed_add(feed_data: FeedData = Body(...)):
    nid = feed_data.nid
    data = feed_data.data
    ids = add_docs(nid=nid, data=data)
    # Return ids
    return {"response": "Document successfully added.", "doc_ids": ids}
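To make the chunk-and-ID step concrete, here is a minimal, dependency-free sketch. It does not reproduce the app's add_docs helper: split_text is a simplified stand-in for RecursiveCharacterTextSplitter (fixed character windows rather than separator-aware splitting), and the chunk sizes, the chunk_with_ids name, and the ID format are all assumptions made for illustration.

```python
import uuid
from typing import Dict, List


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into fixed-size character windows that overlap slightly.

    Simplified stand-in for RecursiveCharacterTextSplitter, which also
    prefers to break on separators such as paragraphs and spaces.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


def chunk_with_ids(nid: str, data: str) -> Dict[str, str]:
    """Return a mapping of unique document ID -> chunk text for one node.

    The "<nid>-<uuid>" ID format is an assumption, not the app's scheme.
    """
    return {f"{nid}-{uuid.uuid4().hex}": chunk for chunk in split_text(data)}
```

The keys of the returned mapping play the role of the document_ids that the endpoint hands back to Drupal.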
Update Feed API (/feed/update):
- Method: POST
- Parameters: nid (string, the Drupal node ID), ids (list of strings, the existing document_ids), data (string)
- Returns: doc_ids (list of strings, the new document_ids)
- Description: The existing documents identified by the given document_ids are deleted first. The API then follows the same process as the Add Feed API to store the updated content, and returns the new document_ids.
@app.post("/feed/update")
def feed_update(feed_data: UpdateData = Body(...)):
    nid = feed_data.nid
    ids = feed_data.ids
    data = feed_data.data
    vectordb_manager = VectorDbManager()
    # Delete with ids passed.
    vectordb_manager.delete_ids(ids=ids)
    # Create fresh documents.
    new_ids = add_docs(nid=nid, data=data)
    # return the ids
    return {"response": "Document successfully updated.", "doc_ids": new_ids}
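The delete-then-add behaviour of the update endpoint can be illustrated with a toy in-memory store. This is only a sketch: InMemoryStore and update_node are hypothetical names, and the dictionary stands in for Chroma so the example runs without any dependencies.

```python
import uuid
from typing import Dict, List


class InMemoryStore:
    """Toy stand-in for the vector database: maps document ID -> text."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def add(self, texts: List[str]) -> List[str]:
        """Store each text under a fresh ID and return the IDs in order."""
        ids = [uuid.uuid4().hex for _ in texts]
        self.docs.update(zip(ids, texts))
        return ids

    def delete_ids(self, ids: List[str]) -> None:
        """Remove the given IDs, ignoring any that are already gone."""
        for doc_id in ids:
            self.docs.pop(doc_id, None)


def update_node(store: InMemoryStore, old_ids: List[str], new_chunks: List[str]) -> List[str]:
    """Replace a node's documents: delete the old IDs, then add fresh chunks."""
    store.delete_ids(old_ids)
    return store.add(new_chunks)
```

Because every update produces new IDs, the caller has to save the returned list in place of the old one. Otherwise the next update or delete would target documents that no longer exist.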
Delete Feed API (/feed/delete):
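- Method: POST (the handler shown here is registered with @app.post)
- Parameters: ids (array of strings, the document_ids to remove)
- Returns: HTTP status 200 with a confirmation message if the deletion succeeds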
- Description: Removes documents using their unique identifiers via a Chroma delete query.
@app.post("/feed/delete")
def feed_delete(data: DeleteData = Body(...)):
    ids = data.ids
    # Delete with ids passed.
    vectordb_manager = VectorDbManager()
    vectordb_manager.delete_ids(ids=ids)
    return {"response": "Document successfully deleted."}
Chroma Vector Database
Configuration for the Chroma Vector Database lives in vector_manager.py, in the VectorDbManager.store_data() method. By default the data is persisted to the chroma_data directory, a path relative to the backend’s working directory. Change persist_directory to store it elsewhere.
vectordb = chroma.Chroma.from_documents(
    documents=chunks,
    embedding=fastembed.FastEmbedEmbeddings(),
    persist_directory="chroma_data",
    ids=ids
)
vectordb.persist()
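One way to make the storage location adjustable without editing the code is to read it from the environment. The sketch below is an assumption rather than part of the app: the CHROMA_PERSIST_DIR variable name and the resolve_persist_dir helper are hypothetical, and only the chroma_data default comes from the snippet above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default taken from the persist_directory value in store_data().
DEFAULT_PERSIST_DIR = "chroma_data"


def resolve_persist_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return an absolute storage path, preferring an environment override.

    CHROMA_PERSIST_DIR is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get("CHROMA_PERSIST_DIR") or DEFAULT_PERSIST_DIR
    return str(Path(raw).expanduser().resolve())
```

The result could then be passed as persist_directory. Resolving to an absolute path avoids surprises when the backend is started from a different working directory.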
Drupal Integration Schema and Hooks
We introduce a schema, drupal_rag_integration_node_doc, that maps each node_id to its document_ids. Drupal’s hooks hook_ENTITY_TYPE_insert, hook_ENTITY_TYPE_update, and hook_ENTITY_TYPE_delete work together with these APIs, sending JSON payloads so that the vector database follows the node’s lifecycle.
/**
 * Implements hook_ENTITY_TYPE_insert() for node entities.
 *
 * @param \Drupal\Core\Entity\EntityInterface $entity
 *   The node entity that was inserted.
 */
function drupal_rag_integration_node_insert(EntityInterface $entity) {
  /** @var RagEntityOperations $entity_operations */
  $entity_operations = \Drupal::service('drupal_rag_integration.entity_operations');
  $entity_operations->handleInsert($entity);
}

/**
 * Implements hook_ENTITY_TYPE_update() for node entities.
 *
 * @param \Drupal\Core\Entity\EntityInterface $entity
 *   The node entity that was updated.
 */
function drupal_rag_integration_node_update(EntityInterface $entity) {
  /** @var RagEntityOperations $entity_operations */
  $entity_operations = \Drupal::service('drupal_rag_integration.entity_operations');
  $entity_operations->handleUpdate($entity);
}

/**
 * Implements hook_ENTITY_TYPE_delete() for node entities.
 *
 * @param \Drupal\Core\Entity\EntityInterface $entity
 *   The node entity that was deleted.
 */
function drupal_rag_integration_node_delete(EntityInterface $entity) {
  /** @var RagEntityOperations $entity_operations */
  $entity_operations = \Drupal::service('drupal_rag_integration.entity_operations');
  $entity_operations->handleDelete($entity);
}
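The shape of the JSON exchanged during a node's lifecycle can be sketched as follows. The field names nid, data, and ids are the ones the FastAPI handlers read. Everything else is assumed for illustration: the node_docs dictionary stands in for the drupal_rag_integration_node_doc mapping, and the helper names are hypothetical. The real Drupal side is written in PHP; Python is used here only to keep the example short.

```python
import json
from typing import Dict, List

# In-memory stand-in for the drupal_rag_integration_node_doc mapping.
node_docs: Dict[str, List[str]] = {}


def insert_request(nid: str, data: str) -> str:
    """Body for POST /feed/add."""
    return json.dumps({"nid": nid, "data": data})


def update_request(nid: str, data: str) -> str:
    """Body for POST /feed/update, reusing the IDs stored for this node."""
    return json.dumps({"nid": nid, "ids": node_docs.get(nid, []), "data": data})


def delete_request(nid: str) -> str:
    """Body for /feed/delete; also drops the node from the local mapping."""
    return json.dumps({"ids": node_docs.pop(nid, [])})


def remember_ids(nid: str, doc_ids: List[str]) -> None:
    """Record the doc_ids returned by the add or update endpoint."""
    node_docs[nid] = list(doc_ids)
```

A typical sequence is insert_request, then remember_ids with the returned doc_ids, then update_request or delete_request whenever the node changes.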
User Interaction Through ASK Form
On the frontend, users submit questions through a form named “ASK”. They can ask about content stored in the Drupal site or make general inquiries. The form sends each question to the Ask API.
public function submitForm(array &$form, FormStateInterface $form_state): void {
  $question = $form_state->getValue('question');
  $endpoint = '/ask';
  $payload = json_encode(['question' => $question]);
  $response = $this->apiClient->callApi($endpoint, $payload);
  if (isset($response['response'])) {
    $form_state->set('response', $response['response']);
  }
  else {
    $form_state->set('response', $this->t('Error occurred: @error', ['@error' => $response['error'] ?? $this->t('Unknown error')]));
  }
  $form_state->setRebuild(TRUE);
}
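For testing the backend without Drupal, the same request can be made from a small script. This is a sketch under stated assumptions: the base URL is a guess at a local development setup, and call_ask and extract_answer are hypothetical helpers that mirror the form's handling of the response and error keys.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed base URL of the FastAPI backend; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def call_ask(question: str, base_url: str = BASE_URL, timeout: float = 30.0) -> Dict[str, Any]:
    """POST the question to /ask and return the decoded JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


def extract_answer(response: Dict[str, Any]) -> str:
    """Mirror the form logic: prefer 'response', else report the error."""
    if response.get("response") is not None:
        return str(response["response"])
    return f"Error occurred: {response.get('error', 'Unknown error')}"
```

With the backend running, extract_answer(call_ask("What is Drupal?")) would return either the generated text or an error message.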
Ask API (/ask)
- Method: POST
- Parameters: question (string)
- Returns: Generated response (string)
- Description: Retrieves context relevant to the user’s question from the Chroma Vector Database, adds it to the prompt, and passes the augmented prompt to the Mistral LLM, which generates the final response.
@app.post("/ask")
def ask(data: Question = Body(...)):
    question = data.question
    rag_obj = Rag()
    rag_obj.set_retrieve()
    rag_obj.augment()
    response = rag_obj.generate(question)
    return {"response": response}
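The retrieve, augment, generate flow can be illustrated with a toy pipeline. This is not the app's Rag class: retrieval here ranks chunks by plain word overlap instead of vector similarity in Chroma, the prompt wording is invented, and the llm argument is any callable standing in for the Mistral model.

```python
import re
from typing import Callable, Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k chunks ranked by word overlap with the question.

    Toy scoring only; a real setup would use embedding similarity.
    """
    query = tokenize(question)
    ranked = sorted(docs.values(), key=lambda t: len(query & tokenize(t)), reverse=True)
    return [text for text in ranked[:k] if query & tokenize(text)]


def augment(question: str, context: List[str]) -> str:
    """Build a prompt that places the retrieved chunks before the question."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


def generate(prompt: str, llm: Callable[[str], str]) -> str:
    """Hand the augmented prompt to the language model callable."""
    return llm(prompt)
```

Swapping the lambda or stub passed as llm for a real model client is the only change needed to turn this into a working, if naive, question-answering loop.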
We hope this walkthrough of the code was useful. Stay tuned for more articles like this!
Posts in this series
- Inside the Codebase: A Deep Dive Into Drupal Rag Integration
- Build Smart Drupal Chatbots With RAG Integration and Ollama
- DocuMentor: Build a RAG Chatbot With Ollama, Chroma & Streamlit
- Song Recommender: Building a RAG Application for Beginners From Scratch
- Retrieval Augmented Generation (RAG): A Beginner’s Guide to This Complex Architecture.