Day 28 - Building a Simple Chatbot Using Traditional NLP Techniques
On Day 28 of the challenge, I tackled the task of building a simple chatbot using traditional NLP techniques. The goal was to implement the chatbot in two ways:
- Without Vectorization – Using a simple rule-based approach with pattern matching.
- With Vectorization – Using Bag of Words (BoW) and Cosine Similarity to improve the chatbot’s flexibility.
If you want to see the code, you can find it here: GIT REPO.
Approach 1: Without Vectorization
In this approach, I implemented the chatbot using basic pattern matching without vectorization. The chatbot attempts to match user inputs with predefined patterns and responds based on those matches. Each pattern is treated as a bag of words (BoW): an intent matches when every word of one of its patterns appears in the user’s input, regardless of word order.
Steps Taken:
- Define Intents:
- Created a dictionary of intents where each intent had patterns (common ways the user might phrase a query) and responses (how the chatbot should reply).
- Preprocess Input:
- The user input is converted to lowercase, punctuation is removed, and it’s tokenized into words to prepare it for pattern matching.
- Pattern Matching:
- For each user input, the chatbot checks the input words against the predefined patterns of each intent and returns the first intent for which all the words of a pattern appear in the input.
- Generate Responses:
- Once an intent is matched, the chatbot picks a random response from the predefined responses for that intent.
- Create Chat loop:
- Now, we need to create a while loop where the chatbot keeps asking for input and responds until the user says something like “bye” or “exit”.
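The exit check from the last step can be sketched as a small helper. `EXIT_WORDS` and `should_exit` are hypothetical names introduced only for this illustration; the full listing below checks for `bye` alone.

```python
# Hypothetical helper: decide whether the chat loop should stop.
# EXIT_WORDS is an assumed set of exit commands, not part of the original code.
EXIT_WORDS = {"bye", "exit"}

def should_exit(user_input):
    """Return True when the whole input (ignoring case and spaces) is an exit word."""
    return user_input.strip().lower() in EXIT_WORDS

print(should_exit("Bye"))      # True
print(should_exit("  EXIT "))  # True
print(should_exit("goodbye"))  # False: only exact exit words stop the loop
```

Comparing the whole input rather than searching for a substring keeps a sentence such as "goodbye for now" available to the farewell intent instead of ending the session.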
Code:
# Problem: Build a simple chatbot using traditional NLP techniques
# Without vectorization
import re
import random
# Step 1: Create a dataset of intents and responses
intents = {
'greeting': {
'patterns': ['hello', 'hi', 'hey', 'good morning', 'good evening', 'good afternoon'],
'responses': ['Hello!', 'Hi there!', 'Greetings!', 'Good day!', 'Hey! How can I help you today?']
},
'farewell': {
'patterns': ['bye', 'goodbye', 'see you later', 'farewell', 'take care'],
'responses': ['Goodbye!', 'Take care!', 'See you later!', 'Farewell!', 'Have a great day!']
},
'thanks': {
'patterns': ['thank you', 'thanks', 'thank you so much', 'much appreciated', 'thanks a lot'],
'responses': ['You’re welcome!', 'Glad I could help!', 'Anytime!', 'My pleasure!', 'No problem!']
},
'bot_name': {
'patterns': ['what is your name', 'who are you', 'tell me your name'],
'responses': ['I’m your friendly chatbot!', 'You can call me Chatbot!', 'I am a chatbot created to help you.']
},
'bot_purpose': {
'patterns': ['what can you do', 'how can you help me', 'what do you do'],
'responses': ['I can assist you with basic queries, answer questions, and chat with you!', 'I’m here to help you with anything you need.', 'I can chat with you and answer simple questions!']
},
'feeling': {
'patterns': ['how are you', 'how are you doing', 'are you okay'],
'responses': ['I’m just a bot, but I’m doing great!', 'I’m feeling helpful today!', 'I’m here to help you, so I’m doing well!']
},
'age': {
'patterns': ['how old are you', 'what is your age', 'when were you created'],
'responses': ['I don’t have an age like humans, but I’m always learning!', 'Age is just a number, and I don’t have one!', 'I was created recently to help you out!']
},
'weather': {
'patterns': ['what is the weather', 'how is the weather today', 'tell me the weather'],
'responses': ['I can’t check the weather right now, but you can check your weather app!', 'I don’t have access to weather data, but I hope it’s nice outside!', 'Check your weather app for accurate information!']
},
'joke': {
'patterns': ['tell me a joke', 'make me laugh', 'tell a joke'],
'responses': ['Why did the computer go to the doctor? Because it had a virus!', 'Why don’t robots have brothers? Because they all have trans-sisters!', 'I’d tell you a joke about UDP, but you might not get it!']
},
'help': {
'patterns': ['help me', 'i need help', 'can you help me'],
'responses': ['Sure, I’m here to help! What do you need?', 'Of course! Let me know how I can assist you.', 'I’m happy to help. Please tell me what you need assistance with!']
},
'unknown': {
'patterns': ['who is the president', 'where is the moon', 'how to cook pasta', 'tell me a story'],
'responses': ['Sorry, I’m not sure how to answer that.', 'I don’t have the answer to that right now.', 'Hmm, I don’t know, but I can find out!']
}
}
# Step 2: Preprocess user input
def preprocess(text):
    text = text.lower()  # Convert to lowercase
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    tokens = text.split()  # Tokenize by splitting on whitespace
    return tokens
# Step 3: Implement pattern matching with bag of words (BoW)
def match_intent(processed_input, intents):
    # Return the first intent that has a pattern whose words all appear in the input
    for intent, intent_data in intents.items():
        patterns = intent_data['patterns']
        for pattern in patterns:
            processed_pattern = preprocess(pattern)
            if all(word in processed_input for word in processed_pattern):
                return intent
    return None  # No pattern matched
# Step 4: Generate the response
def get_response(intent, intents):
    return random.choice(intents[intent]['responses'])
# Step 5: Chat loop that runs until the user types "bye"
def chatbot():
    print("Hey! I am a 28_chatbot, How can i help you? Type bye to exit")
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'bye':
            print("Chatbot: Good Bye!")
            break
        processed_input = preprocess(user_input)
        matched_intent = match_intent(processed_input=processed_input, intents=intents)
        if matched_intent:
            response = get_response(intent=matched_intent, intents=intents)
            print(f"Chatbot: {response}")
        else:
            print("Chatbot: Sorry i could not understand that.")
# Run the chatbot
chatbot()
Example Interaction:
You: Hey
Chatbot: Good day!
You: How can you help me
Chatbot: I can chat with you and answer simple questions!
You: Who is the president
Chatbot: Sorry, I’m not sure how to answer that.
You: Who will win
Chatbot: Sorry i could not understand that.
You: bye
Chatbot: Good Bye!
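To see what the word-subset check accepts and rejects, here is a minimal sketch on a two-intent toy dictionary. `toy_intents` is made up for this illustration; `preprocess` and the matching rule mirror the listing above.

```python
import re

def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace."""
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    return text.split()

# Toy data for illustration only (assumed, much smaller than the real intents)
toy_intents = {
    'thanks': {'patterns': ['thank you', 'thanks']},
    'farewell': {'patterns': ['bye', 'see you later']},
}

def match_intent(tokens, intents):
    """Return the first intent with a pattern whose words all occur in tokens."""
    for intent, data in intents.items():
        for pattern in data['patterns']:
            if all(word in tokens for word in preprocess(pattern)):
                return intent
    return None

# Word order and punctuation do not matter
print(match_intent(preprocess("Later, see you!"), toy_intents))  # farewell
# Two intents could match; the first one in the dictionary wins
print(match_intent(preprocess("Thanks, bye!"), toy_intents))     # thanks
# A pattern with a missing word does not match at all
print(match_intent(preprocess("see you"), toy_intents))          # None
```

The last case is the main limitation of this approach: a pattern either matches completely or not at all, so there is no notion of a partial match.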
Approach 2: With Vectorization
In this second approach, I enhanced the chatbot using CountVectorizer from scikit-learn to vectorize the input and the patterns. This allowed for more flexible matching using cosine similarity between user input and predefined patterns.
Steps Taken:
- Vectorization of Patterns:
- I used CountVectorizer to transform the predefined patterns into vectors (Bag of Words).
- Vectorize User Input:
- When the user inputs a sentence, it is vectorized using the same CountVectorizer that was fitted on the patterns, so words that never appear in any pattern are ignored.
- Cosine Similarity:
- I used Cosine Similarity to compare the user input vector with all pattern vectors, identifying the most similar pattern and its corresponding intent.
- Generate Responses:
- After matching the most similar intent, the chatbot responds with a random response from that intent’s list of responses.
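To make the cosine similarity step concrete, the sketch below computes it by hand with plain Python instead of scikit-learn. The `bow` and `cosine` helpers and the three sample patterns are assumptions for illustration; `bow` is a simplified stand-in for CountVectorizer (for example, it keeps one-letter words, which CountVectorizer drops by default).

```python
import math
from collections import Counter

patterns = ['who are you', 'who is the president', 'hello']
# Vocabulary built from the patterns only, as when fitting a vectorizer on them
vocabulary = {word for p in patterns for word in p.lower().split()}

def bow(text, vocabulary):
    """Bag of words: count the words of text that are in the vocabulary."""
    return Counter(w for w in text.lower().split() if w in vocabulary)

def cosine(a, b):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a)  # a Counter returns 0 for missing words
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

user_vec = bow("who will win", vocabulary)  # only "who" is in the vocabulary
for pattern in patterns:
    score = cosine(user_vec, bow(pattern, vocabulary))
    print(f"{pattern!r}: {score:.3f}")
# 'who are you': 0.577
# 'who is the president': 0.500
# 'hello': 0.000
```

A single shared word is enough to produce a non-zero score, and shorter patterns score higher because their vectors have a smaller norm.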
Code:
# Problem: Build a simple chatbot using traditional NLP techniques
# With Vectorization
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
# Step 1: Create a dataset of intents and responses
intents = {
'greeting': {
'patterns': ['hello', 'hi', 'hey', 'good morning', 'good evening', 'good afternoon'],
'responses': ['Hello!', 'Hi there!', 'Greetings!', 'Good day!', 'Hey! How can I help you today?']
},
'farewell': {
'patterns': ['bye', 'goodbye', 'see you later', 'farewell', 'take care'],
'responses': ['Goodbye!', 'Take care!', 'See you later!', 'Farewell!', 'Have a great day!']
},
'thanks': {
'patterns': ['thank you', 'thanks', 'thank you so much', 'much appreciated', 'thanks a lot'],
'responses': ['You’re welcome!', 'Glad I could help!', 'Anytime!', 'My pleasure!', 'No problem!']
},
'bot_name': {
'patterns': ['what is your name', 'who are you', 'tell me your name'],
'responses': ['I’m your friendly chatbot!', 'You can call me Chatbot!', 'I am a chatbot created to help you.']
},
'bot_purpose': {
'patterns': ['what can you do', 'how can you help me', 'what do you do'],
'responses': ['I can assist you with basic queries, answer questions, and chat with you!', 'I’m here to help you with anything you need.', 'I can chat with you and answer simple questions!']
},
'feeling': {
'patterns': ['how are you', 'how are you doing', 'are you okay'],
'responses': ['I’m just a bot, but I’m doing great!', 'I’m feeling helpful today!', 'I’m here to help you, so I’m doing well!']
},
'age': {
'patterns': ['how old are you', 'what is your age', 'when were you created'],
'responses': ['I don’t have an age like humans, but I’m always learning!', 'Age is just a number, and I don’t have one!', 'I was created recently to help you out!']
},
'weather': {
'patterns': ['what is the weather', 'how is the weather today', 'tell me the weather'],
'responses': ['I can’t check the weather right now, but you can check your weather app!', 'I don’t have access to weather data, but I hope it’s nice outside!', 'Check your weather app for accurate information!']
},
'joke': {
'patterns': ['tell me a joke', 'make me laugh', 'tell a joke'],
'responses': ['Why did the computer go to the doctor? Because it had a virus!', 'Why don’t robots have brothers? Because they all have trans-sisters!', 'I’d tell you a joke about UDP, but you might not get it!']
},
'help': {
'patterns': ['help me', 'i need help', 'can you help me'],
'responses': ['Sure, I’m here to help! What do you need?', 'Of course! Let me know how I can assist you.', 'I’m happy to help. Please tell me what you need assistance with!']
},
'unknown': {
'patterns': ['who is the president', 'where is the moon', 'how to cook pasta', 'tell me a story'],
'responses': ['Sorry, I’m not sure how to answer that.', 'I don’t have the answer to that right now.', 'Hmm, I don’t know, but I can find out!']
}
}
vectorizer = CountVectorizer()
# Collect all patterns and the intent each one belongs to
all_patterns = []
intent_labels = []
for intent, intent_data in intents.items():
    patterns = intent_data['patterns']
    all_patterns.extend(patterns)  # Combine all patterns
    intent_labels.extend([intent] * len(patterns))  # Keep track of which pattern belongs to which intent
# Fit the vectorizer
X = vectorizer.fit_transform(all_patterns)  # Build the vocabulary from all the patterns
# Function to find the best matching intent
def match_intent(user_input):
    user_vec = vectorizer.transform([user_input])
    similarity_scores = cosine_similarity(user_vec, X)  # Compare user input to all patterns
    best_matching_intent_idx = similarity_scores.argmax()  # Index of the pattern with the highest similarity
    return intent_labels[best_matching_intent_idx]
def chatbot():
    print("Hey, How can i help you? Type bye to exit")
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'bye':
            print("Good bye!")
            break
        matched_intent = match_intent(user_input)
        response = random.choice(intents[matched_intent]['responses'])
        print(f"Chatbot: {response}")
# Run the chatbot
chatbot()
Example Interaction:
Hey, How can i help you? Type bye to exit
You: Hey
Chatbot: Hi there!
You: How can you help me
Chatbot: I’m here to help you with anything you need.
You: Who is the president
Chatbot: Sorry, I’m not sure how to answer that.
You: Who will win
Chatbot: I am a chatbot created to help you.
You: bye
Good bye!
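In the interaction above, "Who will win" shares only the word "who" with the patterns, yet it still gets a confident-sounding reply, because taking the argmax always returns some intent. One possible guard is a minimum similarity below which the chatbot falls back to the unknown intent. This is a sketch, not part of the original code: `pick_intent`, the threshold of 0.6, and the sample scores are assumptions chosen for illustration.

```python
def pick_intent(scores, labels, threshold=0.6, fallback='unknown'):
    """Return the label with the highest score, or fallback if that score is too low.

    scores: flat sequence of similarity values, one per pattern
            (for example similarity_scores[0] in the listing above)
    labels: intent label of each pattern, in the same order
    threshold and fallback are assumed values; tune them for real data.
    """
    best_idx = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best_idx] < threshold:
        return fallback
    return labels[best_idx]

labels = ['greeting', 'bot_name', 'unknown']
# Weak overlap: the best score is below the threshold, so fall back
print(pick_intent([0.0, 0.577, 0.5], labels))  # unknown
# Strong match: the best score passes the threshold
print(pick_intent([0.0, 1.0, 0.3], labels))    # bot_name
# No overlap at all: argmax alone would pick 'greeting' here
print(pick_intent([0.0, 0.0, 0.0], labels))    # unknown
```

The right threshold depends on how long the patterns are, so it is worth trying a few values against real inputs rather than trusting a fixed number.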
Improvements with Vectorization:
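- Flexible Matching: Cosine similarity scores partial word overlap, so the chatbot can match an input even when it doesn’t contain every word of a predefined pattern. Words outside the patterns’ vocabulary are ignored, so synonyms are still not recognized.
- Scalability: Adding an intent only means adding patterns and refitting the vectorizer, and each input is compared against all patterns in one similarity computation, so this approach copes better with larger pattern sets and varied phrasing.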
Gratitude
On Day 28, I implemented two versions of a chatbot using traditional NLP techniques:
- Without Vectorization – A basic rule-based approach with exact word matching.
- With Vectorization – A more flexible approach using CountVectorizer and cosine similarity for better input matching.
Learning the basics is always fun, especially for understanding where conversational technology started, long before systems like ChatGPT.
Stay Tuned!