Retrieval-Augmented Generation (RAG) is a technique that combines retrieval-based and generative AI models to produce highly contextual domain-specific responses. There are numerous applications for RAG: Question & Answering, Summarization, Report Generation, and more. In his recent blog, Yujian Tang (Zilliz) elaborates on the tech stack that is needed to implement a simple conversational RAG application using’s domain specific foundation model, Nebula. Below is a summary of the implementation:

Technology stack

  • Nebula LLM:
    Symbl’s Nebula LLM is a domain specific pre-trained foundation model for interaction data. It serves as the generative model replacing OpenAI’s GPT-3.5 in this project by Zilliz
  • Milvus Vector DB:
    Milvus serves as the vector database for storing vectors derived from conversation data on which similar matching is performed and relevant information is extracted for feeding the Nebula LLM
  • MPNet V2 Embedding Model from Hugging Face:
    MPNet V2, an embedding model from Hugging Face, is used in place of OpenAI embeddings to derive embeddings from conversation data and user search queries
  • LangChain:
    LangChain serves as an orchestration framework. Key components used for creating conversational memory for the application are: the vector store retriever memory, conversation chain, and prompt template object

High Level Implementation 

Output Image (Reference:


As businesses and users start focusing on the specificity of output and outcome from LLMs, the use of domain specific models like Nebula is gaining prominence. The Nebula Embedding Model is also trained on interaction data, and can produce high quality vector embeddings on conversational data and knowledge.

You can read about the full implementation of this RAG-based conversational system, along with code samples,  at the blog published by Zilliz


Avatar photo
Kartik Talamadupula
Director of AI Research

Kartik Talamadupula is a research scientist who has spent over a decade applying AI techniques to business problems in automation, human-AI collaboration, and NLP.