Retrieval-Augmented Generation (RAG) is a technique that combines retrieval-based and generative AI models to produce highly contextual, domain-specific responses. RAG has numerous applications: question answering, summarization, report generation, and more. In a recent blog post, Yujian Tang (Zilliz) walks through the tech stack needed to implement a simple conversational RAG application using Symbl.ai's domain-specific foundation model, Nebula. Below is a summary of the implementation:
Technology Stack
- Nebula LLM:
Symbl.ai's Nebula is a domain-specific foundation model pre-trained on interaction data. It serves as the generative model, replacing OpenAI's GPT-3.5 in this project by Zilliz.
- Milvus Vector DB:
Milvus serves as the vector database storing the vectors derived from conversation data. Similarity matching is performed against these vectors, and the relevant information is extracted and fed to the Nebula LLM.
- MPNet V2 Embedding Model from Hugging Face:
MPNet V2, an embedding model from Hugging Face, is used in place of OpenAI embeddings to embed both the conversation data and user search queries.
- LangChain:
LangChain serves as the orchestration framework. The key components used to create conversational memory for the application are the vector store retriever memory, the conversation chain, and the prompt template object. (A sketch wiring the vector store and embedding model together follows this list.)
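As a rough illustration of how these pieces connect, the sketch below starts Milvus Lite and builds a LangChain vector store backed by MPNet V2 embeddings. It is a minimal sketch, not the blog's exact code: the import paths assume the pre-0.1 `langchain` package layout from the time of the post (newer releases move these classes into `langchain_community`), and the host, port, and model identifier are assumptions.

```python
# Sketch: embedded Milvus Lite server plus a LangChain vector store
# backed by MPNet V2 embeddings (import paths assume pre-0.1 langchain).
from milvus import default_server
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Milvus

# Spin up the embedded Milvus Lite server locally.
default_server.start()

# MPNet V2 sentence-transformer model stands in for OpenAI embeddings.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"  # assumed model id
)

# Vector store that will hold embeddings of the conversation data.
vectordb = Milvus(
    embedding_function=embeddings,
    connection_args={"host": "127.0.0.1", "port": default_server.listen_port},
)
```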
High-Level Implementation
- Install the required libraries: LangChain, Milvus (Lite), PyMilvus, python-dotenv, and Sentence Transformers
- Load the API key for Nebula (register with Symbl.ai to request a key)
- Initialize the vector database and the embedding model
- Create a sample conversation and save it to memory as context using VectorStoreRetrieverMemory
- Instantiate the Nebula LLM using the API key
- Create a prompt template that incorporates the user query using PromptTemplate
- Chain the Nebula LLM, memory, and prompt together with ConversationChain and ask a question (a condensed sketch of these steps follows this list)
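The fragment below sketches these remaining steps, continuing from the `vectordb` object created earlier. It is a hedged reconstruction rather than the blog's exact code: the `Nebula` class and its `nebula_api_key` parameter come from LangChain's Symbl.ai integration (the import path varies across LangChain versions), while the sample conversation, the `NEBULA_API_KEY` environment variable name, and the prompt wording are illustrative assumptions.

```python
import os
from dotenv import load_dotenv
from langchain.llms import Nebula          # LangChain's Symbl.ai Nebula integration
from langchain.memory import VectorStoreRetrieverMemory
from langchain.prompts import PromptTemplate
from langchain.chains import ConversationChain

load_dotenv()  # assumes NEBULA_API_KEY is defined in a local .env file

# Conversational memory backed by the Milvus retriever built earlier.
retriever = vectordb.as_retriever(search_kwargs={"k": 1})
memory = VectorStoreRetrieverMemory(retriever=retriever)

# Save a sample exchange to memory as retrievable context (illustrative).
memory.save_context(
    {"input": "What did we decide about the product launch?"},
    {"output": "We agreed to target the first week of June."},
)

# Nebula serves as the generative model, authenticated via the API key.
llm = Nebula(nebula_api_key=os.environ["NEBULA_API_KEY"])

# Prompt template that injects the retrieved history and the user query.
template = """Relevant pieces of previous conversation:
{history}

Current conversation:
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)

# Chain the LLM, memory, and prompt, then ask a question.
conversation = ConversationChain(llm=llm, prompt=prompt, memory=memory)
print(conversation.predict(input="When is the launch planned?"))
```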
An output image showing the application's response is available in the original post: https://zilliz.com/blog/building-rag-apps-without-openai-part-I
Conclusion
As businesses and users focus increasingly on the specificity of LLM outputs and outcomes, domain-specific models like Nebula are gaining prominence. The Nebula Embedding Model is likewise trained on interaction data and can produce high-quality vector embeddings for conversational data and knowledge.
You can read the full implementation of this RAG-based conversational system, along with code samples, in the blog post published by Zilliz.