Song Recommender: Building a RAG Application for Beginners From Scratch

Have you been trying to understand RAG and read a bunch of articles, yet feel overwhelmed by how complicated it seems to implement? You’ve just hit the jackpot. In this article, I aim to demystify the RAG concept and demonstrate its utility by building a hands-on song recommender application – all without unnecessary complexity. There’s no need for a deep understanding of AI and machine learning; basic knowledge of Python and a willingness to learn something new are all that’s required. So, without further ado, let’s get started.

To know more about RAG, read my previous article here.

Imagine this: you’re humming a tune or a string of lyrics is stuck in your head. You loved the song and want to stay in the groove by listening to similar songs. That’s exactly where our RAG application comes into play. Just type in the lyrics or mention the genre, and voila! Our application recommends a song that aligns with your input.

Let’s build a small database of songs with their titles and details, where ‘detail’ contains a line or two of the song’s lyrics and its genre.

songs_corpus = [
  {"title": "Bohemian Rhapsody", "detail": "lyrics: Is this the real life? Is this just fantasy?; genre: Rock"},
  {"title": "Shake It Off", "detail": "lyrics: Cause the players gonna play and the haters gonna hate; genre: Pop"},
  {"title": "Thriller", "detail": "lyrics: Cause this is thriller, thriller night; genre: Pop"},
  {"title": "Rolling in the Deep", "detail": "lyrics: There's a fire starting in my heart; genre: Pop"},
  {"title": "Smells Like Teen Spirit", "detail": "lyrics: With the lights out, it's less dangerous; genre: Grunge"},
  {"title": "Hotel California", "detail": "lyrics: On a dark desert highway, cool wind in my hair; genre: Rock"},
  {"title": "Sweet Child o' Mine", "detail": "lyrics: She's got eyes of the bluest skies; genre: Rock"},
  {"title": "Wonderwall", "detail": "lyrics: Because maybe, you're gonna be the one that saves me; genre: Alternative"},
  {"title": "Billie Jean", "detail": "lyrics: But the kid is not my son; genre: Pop"},
  {"title": "Firework", "detail": "lyrics: Do you ever feel like a plastic bag; genre: Pop"}
]

To find the similarity between the user input and the songs in the corpus, we are going to use Jaccard similarity: the size of the intersection of two sets divided by the size of their union.

Let’s understand it with an example:

Set A: {“I”, “love”, “cats”, “and”, “dogs”}

Set B: {“We”, “love”, “cats”, “not”, “dogs”}

Jaccard Similarity = Intersection(A, B) / Union(A, B)

Intersection(A, B) = common words in A and B = love, cats, dogs = 3 words

Union(A, B) = All unique words in A and B = I, love, cats, and, dogs, we, not = 7 words

Jaccard Similarity = 3 / 7 ≈ 0.43

Let’s code it up.

def tokenize(text):
  # Lowercase the text and split it into a set of unique words.
  return set(text.lower().split())

# Measures the similarity between two sets of words.
# Jaccard Index = Intersection(A, B) / Union(A, B)
def jaccard_similarity(query, document):
  tokenize_query = tokenize(query)
  tokenize_document = tokenize(document)

  intersection = tokenize_query.intersection(tokenize_document)
  union = tokenize_query.union(tokenize_document)

  similarity = len(intersection) / len(union)
  return similarity
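
A quick sanity check: the function should reproduce the hand calculation from the example above.

print(jaccard_similarity("I love cats and dogs", "We love cats not dogs"))  # 3 shared words / 7 unique words ≈ 0.43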

Now, let’s use the jaccard_similarity function to find the song in our corpus whose detail is most similar to the user’s input.

# Returns the title of the song whose detail is most similar to the query.
def get_relevant_document(query):
  relevant_song_title = ''
  max_similarity = 0
  for song in songs_corpus:
    detail = song['detail']

    similarity = jaccard_similarity(query, detail)
    if similarity > max_similarity:
      max_similarity = similarity
      relevant_song_title = song['title']

  return relevant_song_title
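
Before wiring in the interactive prompt, here is a quick sanity check. With the corpus above, a query that echoes a Bohemian Rhapsody lyric should overlap most with that song’s detail (the exact score depends on your corpus and wording):

print(get_relevant_document("Is this the real life?"))  # Bohemian Rhapsody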

Let’s test what we have built so far.

user_input = input("Tell me what you are thinking, and I will recommend a song.\n")
relevant_document = get_relevant_document(user_input)

print("I recommend you listen to: " + relevant_document)

Output:

Relevant songs from the corpus

Congratulations, it works! This process of retrieving the relevant data from your own content is known as Retrieval. Now, let’s test one more example.

Output:

Relevant songs from the corpus example 2

It did not work for the negative case. That’s where we need a large language model to fall back on.
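
To see why, try an input that has nothing to do with wanting a song. This is just an illustrative sketch; the exact title returned depends on your corpus and wording:

# The retriever has no notion of intent: an unrelated query still returns
# whichever title happens to share a stray word (or an empty string if
# nothing overlaps at all).
print(get_relevant_document("I don't like any music right now"))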

Integrate the LLM:

Augmentation: We will augment this relevant song with the original user input and prepare a prompt to pass to the LLM for generating a response.

# Augment the retrieved song into the original user query.
prompt = f'''
You are a bot that makes recommendations for songs. You answer in very short sentences and do not include extra information.
This is the recommended song: {relevant_document}
The user input is: {user_input}
Compose a recommendation for the user based on the recommended song and the user input. If the user shows no interest, simply decline.
'''
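
If you are curious about what the model will actually receive, print the assembled prompt; the filled-in values depend on what the user typed:

print(prompt)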

Generation

This step is pretty simple: pass the prompt we have generated to the LLM. We are using Ollama to run the LLM (llama3) locally, and LangChain to invoke it.

More about Ollama and how to run it locally: LINK

LangChain: LangChain is a framework designed to simplify the creation of applications that use large language models. Learn more about LangChain.

The code is straightforward: import the Ollama class from langchain_community.llms and specify which LLM model you want to use for your application. I am using llama3:latest. Once you have created the llm object, use its invoke method to pass the prompt and generate the response. We wrap all of this in the get_response() function.

from langchain_community.llms import Ollama

def get_response(prompt):
  # Create the local llama3 model via Ollama and ask it to respond to the prompt.
  llm = Ollama(model="llama3:latest")
  response = llm.invoke(prompt)

  return response

Final touch: call get_response() and print the response for the user.

response = get_response(prompt)

print("I recommend you listen to: " + response)

Test it on the same inputs.

Attempt 1: RAG recommender app, example 1


Attempt 2: RAG recommender app, example 2

Congratulations! You have built a RAG application from scratch.

In the next article, we will delve a little deeper. Stay tuned!

Full code

from langchain_community.llms import Ollama

songs_corpus = [
  {"title": "Bohemian Rhapsody", "detail": "lyrics: Is this the real life? Is this just fantasy?; genre: Rock"},
  {"title": "Shake It Off", "detail": "lyrics: Cause the players gonna play and the haters gonna hate; genre: Pop"},
  {"title": "Thriller", "detail": "lyrics: Cause this is thriller, thriller night; genre: Pop"},
  {"title": "Rolling in the Deep", "detail": "lyrics: There's a fire starting in my heart; genre: Pop"},
  {"title": "Smells Like Teen Spirit", "detail": "lyrics: With the lights out, it's less dangerous; genre: Grunge"},
  {"title": "Hotel California", "detail": "lyrics: On a dark desert highway, cool wind in my hair; genre: Rock"},
  {"title": "Sweet Child o' Mine", "detail": "lyrics: She's got eyes of the bluest skies; genre: Rock"},
  {"title": "Wonderwall", "detail": "lyrics: Because maybe, you're gonna be the one that saves me; genre: Alternative"},
  {"title": "Billie Jean", "detail": "lyrics: But the kid is not my son; genre: Pop"},
  {"title": "Firework", "detail": "lyrics: Do you ever feel like a plastic bag; genre: Pop"}
]

def tokenize(text):
  # Lowercase the text and split it into a set of unique words.
  return set(text.lower().split())

# Measures the similarity between two sets of words.
# Jaccard Index = Intersection(A, B) / Union(A, B)
def jaccard_similarity(query, document):
  tokenize_query = tokenize(query)
  tokenize_document = tokenize(document)

  intersection = tokenize_query.intersection(tokenize_document)
  union = tokenize_query.union(tokenize_document)

  similarity = len(intersection) / len(union)
  return similarity

# Returns the title of the song whose detail is most similar to the query.
def get_relevant_document(query):
  relevant_song_title = ''
  max_similarity = 0
  for song in songs_corpus:
    detail = song['detail']

    similarity = jaccard_similarity(query, detail)
    if similarity > max_similarity:
      max_similarity = similarity
      relevant_song_title = song['title']

  return relevant_song_title

user_input = input("Tell me what you are thinking, and I will recommend a song.\n")
relevant_document = get_relevant_document(user_input)

# Augment the retrieved song into the original user query.
prompt = f'''
You are a bot that makes recommendations for songs. You answer in very short sentences and do not include extra information.
This is the recommended song: {relevant_document}
The user input is: {user_input}
Compose a recommendation for the user based on the recommended song and the user input. If the user shows no interest, simply decline.
'''

def get_response(prompt):
  # Create the local llama3 model via Ollama and ask it to respond to the prompt.
  llm = Ollama(model="llama3:latest")
  response = llm.invoke(prompt)

  return response

response = get_response(prompt)

print("I reccomed you to listen: " + response)

GitHub Repository

You can find the GitHub repository here.

Video Explanation
