As the benefits and potential of large language models (LLMs) become more apparent, a growing number of organizations are looking for ways to incorporate them into their business processes. LLM chaining frameworks provide an efficient and cost-effective way for companies to test the capabilities of language models and discover for themselves how best to leverage generative AI to achieve their long-term objectives.

With that in mind, this guide explores the concept of chaining in LLMs, the different LLM chaining frameworks on offer, and how to determine which framework is best for your company’s particular needs.

What is LLM chaining?

LLM chaining is the process of connecting large language models to other applications, tools, and services to produce the best possible output from a given input. Connecting an LLM to additional components enables you to create feature-rich generative AI applications, called LLM chains, that are better suited to your particular use case. 

The most common example of LLM chaining is connecting a pre-trained language model to an external data source to improve its responses, a technique known as retrieval-augmented generation (RAG). In such an LLM chain, the user’s prompt is used to query a data source, such as a vector database, for information relevant to the user’s input. The retrieved information is then added to the prompt so that the model’s response is more accurate and relevant.
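The retrieve, augment, and generate steps described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions rather than any framework's API: the bag-of-words `embed` function stands in for a real embedding model, `DOCUMENTS` for a vector database, and `fake_llm` for an actual model call. All names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory "data source"; a real chain would query a vector database.
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available by email from 9am to 5pm on weekdays.",
    "Premium subscribers receive priority shipping on all orders.",
]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user's prompt with the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


def fake_llm(prompt: str) -> str:
    """Placeholder for a real model call; it only reports what it received."""
    return f"[model response based on {len(prompt)} prompt characters]"


def rag_chain(query: str) -> str:
    """Chain the three steps: retrieve, augment, generate."""
    return fake_llm(build_prompt(query, retrieve(query)))
```

Calling `rag_chain("What is the refund policy?")` retrieves the refund document, places it in the prompt, and passes the result to the placeholder model. In a real application, each of the three functions would be swapped for a framework component.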

An LLM chaining framework, in turn, is a collection of components, libraries, and tools that allow developers to create LLM chains. By providing all the elements necessary to create highly functional LLM applications, these frameworks can save development teams considerable time and resources when building AI-powered solutions.

Examples of generative applications you can create with LLM chaining frameworks include: 

  • Chatbot Assistants: chaining frameworks contain the components needed to build performant chatbots. This includes connecting the chatbot to a data store so it can access domain-specific or proprietary information, and adding short- or long-term memory for recalling prior conversations.
  • Document Analysis: an LLM chain enables you to significantly increase the maximum size of documents that a model can analyse. Connecting a model to tools such as text splitters and chunkers allows LLM applications to process larger documents than a foundation model can by default.    
  • Text Summarisers: an LLM chain that connects to external information sources allows an application to accurately summarise texts from specialized knowledge domains.   
  • Semantic Search: LLM chaining frameworks provide the data retrieval, indexing, and storage mechanisms required to build performant search applications. 
  • Question Answering (QA) Applications: chaining frameworks provide integrations with a wide variety of data types and sources, making them ideal for developing LLM-powered QA applications.   
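To make the document analysis case above more concrete, here is a minimal sketch of how a text splitter lets an application handle a document longer than a model's input limit. The function names, the character-based chunking, and the `summarise` callable (a stand-in for an LLM call) are all assumptions for illustration; real frameworks typically split on tokens or sentence boundaries.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that content cut at a
    boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def summarise_long_document(text, summarise, chunk_size=200, overlap=20):
    """Summarise each chunk, then summarise the combined partial summaries.

    `summarise` is any callable mapping a string to a shorter string,
    for example a wrapper around a model call.
    """
    partials = [summarise(chunk) for chunk in split_text(text, chunk_size, overlap)]
    return summarise(" ".join(partials))
```

The overlap is a design choice: it costs a little extra processing but reduces the chance that a sentence split across two chunks is lost to both.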

Different Chaining Frameworks 

Let’s turn our attention to some of the leading LLM chaining frameworks, including their key components and features, their pros and cons, and when it’s best to use them.  


LangChain

LangChain is an open-source framework available in Python and JavaScript that enables developers to combine LLMs with other tools and systems to create an array of end-to-end AI applications. In addition to choosing from a vast selection of off-the-shelf modules, developers can create LLM chains with LangChain Expression Language (LCEL), a declarative syntax for composing chains that can further streamline the lifecycle of developing LLM-based applications.

The LangChain ecosystem comprises the following:

  • LangChain Libraries: the core of the framework that contains modules, known as primitives, and a runtime for creating chains and agents. Library modules include:
    • Model I/O: functionality related to interacting with a large selection of language and chat models, which includes managing input prompts and parsing output.
    • Chains: wrappers that link modules together, including individual modules and pre-composed chains.  LangChain features two categories of ready-made chains: 
      • Generic Chains: general-purpose chains used to link other application components together.
      • Utility Chains: chains used to achieve specific tasks, e.g., APIChain, which uses an LLM to convert a query into an API request, executes said request and receives a response, and then passes that response back to the LLM.
    • Retrieval: components that facilitate the storage and access of external information used by LLM-powered applications, including embedding models, document retrievers, text splitters, and vector databases. 
    • Agents: tools that receive instructions from user input or LLM output and use them to carry out pre-defined actions automatically. 
    • Memory: modules that give LLM applications short- and long-term memory, enabling them to retain a persistent state while in operation, e.g., a conversation history for a chatbot application.
  • LangChain Templates: a collection of reference architectures that cover a variety of use cases that developers can use as a starting point for bespoke AI applications. 
  • LangServe: tools that allow developers to deploy an LLM chain as a REST API.
  • LangSmith: a development platform that integrates with LangChain for the streamlined debugging, testing, and monitoring of production-grade LLM applications. 
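The idea of linking Model I/O modules into a chain can be illustrated with a small, dependency-free sketch. LangChain's expression language composes components with the `|` operator; the `Step` class below is a hypothetical stand-in that mimics that style of composition and is not LangChain's own implementation. The prompt, model, and parser steps are likewise placeholders.

```python
from typing import Any, Callable


class Step:
    """Toy composable chain component (not LangChain's own class)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value: Any) -> Any:
        """Run this step on a single input value."""
        return self.func(value)


# Model I/O reduced to three hypothetical steps.
prompt = Step(lambda inputs: f"Translate to French: {inputs['text']}")
model = Step(lambda prompt_text: f"  MODEL OUTPUT for <{prompt_text}>  ")  # fake model call
parser = Step(lambda raw: raw.strip())  # output parsing: trim whitespace

# Declarative composition: the chain reads left to right.
chain = prompt | model | parser
result = chain.invoke({"text": "good morning"})
```

Because each step only needs an `invoke` method, any component can be swapped out without touching the rest of the chain, which is the main appeal of this declarative style.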

Pros and Cons of LangChain

  • Comprehensive Library: LangChain provides a broad library of ready-made LLM chains and modules that give developers the means to craft solutions suited to a comprehensive range of use cases.
  • Active Development Community: LangChain has a sizable, active user base, which makes it easier to find help when troubleshooting or to get advice on more difficult problems. A larger community also means more contributions of custom LLM chains, tools, and functionality; the C# implementation of LangChain is an excellent example of this.
  • Extensive Collection of Integrations: LangChain offers hundreds of integrations, including a wide assortment of LLMs from leading vendors such as Hugging Face and Cohere; vector databases like ChromaDB and Pinecone; and cloud service providers such as GCP and AWS.
  • Powerful Prompt Engineering Capabilities: LangChain’s prompt templates simplify the process of precisely defining how input and output are formatted, allowing for more efficient and robust LLM applications. 
  • Lacks LlamaIndex’s Search and Retrieval Capabilities: despite having the larger and more comprehensive library of the two frameworks, LangChain’s data retrieval, indexing, and querying functionality isn’t as powerful as LlamaIndex’s plug-and-play search and storage capabilities.  

When Should You Use LangChain? 

LangChain is a good choice of framework if you’re just getting started with LLM chains, or LLM application development in general. Its selection of out-of-the-box chains and relative simplicity make it well-suited for experimentation and prototyping custom LLM chains. 


LlamaIndex

LlamaIndex is a flexible Python and TypeScript-based framework that specializes in chaining LLMs to a variety of external data sources. Its powerful data connection and indexing capabilities make LlamaIndex particularly well-suited for creating data-centric LLM applications, such as those that utilize RAG to improve or optimize language model output.

Key components of the LlamaIndex framework include:

  • LlamaHub: a large library of data connectors that enable the automatic ingestion of unstructured, structured, and semi-structured data from over 100 different sources, such as databases and APIs.
  • Indexing: components that allow you to create data indices that LLM-based applications can traverse to speed up data retrieval. LlamaIndex also facilitates easy index updating – as opposed to recreating the index from scratch whenever you insert new data. 
  • Engines: core components that connect LLMs to data sources so you can access the documents therein. LlamaIndex features two types of engines: 
    • Query Engines: retrieval interfaces that allow you to augment model output with documents accessed by your data connectors. This includes the ability to query multiple data sources, as well as selecting where to retrieve data from depending on the nature of the query, e.g., from a SQL or vector database.
    • Chat Engines: by tracking conversation history and retaining input for subsequent queries, chat engines enable multi-turn conversations over your data.
  • Agents: automatic reasoning engines that take user input and can internally determine the best course of action to return the optimal response.  
  • LlamaPacks: templates of real-world RAG applications created with LlamaIndex. 
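Two of the ideas in the list above, updating an index without rebuilding it and routing a query to the most suitable data source, can be sketched as follows. This is a simplified illustration under stated assumptions, not LlamaIndex's API: `KeywordIndex` and `RouterQueryEngine` are hypothetical names, the index is a plain inverted index rather than a vector index, and the routing rule is a crude keyword check.

```python
import re
from collections import defaultdict


class KeywordIndex:
    """Toy inverted index: maps each word to the ids of documents containing it."""

    def __init__(self):
        self.docs: dict[int, str] = {}
        self.postings: dict[str, set[int]] = defaultdict(set)

    def insert(self, text: str) -> int:
        """Add one document without rebuilding the existing index."""
        doc_id = len(self.docs)
        self.docs[doc_id] = text
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[word].add(doc_id)
        return doc_id

    def query(self, question: str, k: int = 1) -> list[str]:
        """Return the k documents sharing the most words with the question."""
        scores: dict[int, int] = defaultdict(int)
        for word in set(re.findall(r"[a-z0-9]+", question.lower())):
            for doc_id in self.postings.get(word, ()):
                scores[doc_id] += 1
        best = sorted(scores, key=lambda d: (-scores[d], d))[:k]
        return [self.docs[d] for d in best]


class RouterQueryEngine:
    """Pick a data source based on the nature of the query (hypothetical rule)."""

    def __init__(self, text_index: KeywordIndex, table_rows: list):
        self.text_index = text_index
        self.table_rows = table_rows

    def query(self, question: str):
        # Crude routing: counting questions go to the structured source,
        # everything else goes to the text index.
        if re.search(r"\b(how many|total|count)\b", question.lower()):
            return ("structured", len(self.table_rows))
        return ("text", self.text_index.query(question))
```

In a production system the router would usually ask an LLM which source fits the query, and the structured branch would run an actual SQL statement rather than counting rows in a list.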

Pros and Cons of LlamaIndex

  • Data Processing and Retrieval: LlamaIndex’s main strength is its ready-made data connectors, indices, and query engines, which provide a simple yet powerful way to give an LLM access to knowledge it wasn’t exposed to during training, as well as a form of memory.
  • Diverse Integrations: LlamaIndex can be integrated with an array of third-party tools and services. This, notably, includes LangChain, giving developers access to additional tools and functionality not offered by LlamaIndex’s libraries. 
  • Multi-Modal Support: LlamaIndex provides data connectors for images in addition to text documents. 
  • Library Isn’t as Large or Diverse as LangChain’s: while LlamaIndex performs data processing and retrieval tasks better than LangChain, it lacks LangChain’s overall flexibility and versatility.
  • Not as Intuitive: while getting to grips with LlamaIndex isn’t especially challenging, it’s not as intuitive to use as other LLM chaining frameworks.  

When Should You Use LlamaIndex?

Compared to LangChain, which offers a broader, general-purpose component library, LlamaIndex excels at data collection, indexing, and querying. This makes LlamaIndex the better choice for semantic search and retrieval applications, and for use cases involving large or complex datasets.


Haystack

Haystack is an open-source Python framework for developing custom LLM applications. In particular, much like LlamaIndex, Haystack provides a library of components that can be used to create applications with semantic search and retrieval capabilities.

Key features of the Haystack framework include:

  • Nodes: the building blocks of applications in Haystack, nodes are individual components designed to carry out different tasks or assume various roles within a system. Haystack features a comprehensive collection of out-of-the-box nodes while also enabling you to create custom components. Examples of nodes include:
    • Prompts: pass input to an LLM or specify a prompt template.
    • Reader: scans texts within stored documents to find answers to queries.
    • Retriever: obtains documents relevant to the input query from a database.
    • Crawler: scrapes text from web pages and creates a document.
    • Decision: classifies queries and directs them to different nodes depending on their content.
  • Pipelines: this is the Haystack implementation of a chain, in which nodes are combined to form an end-to-end application. You can compose custom pipelines from nodes and other pipelines, or start with a ready-made pipeline from the library. Examples of Haystack’s ready-made pipelines include:
    • DocumentSearchPipeline: returns documents that best match a given query.
    • ExtractiveQAPipeline: answers a given question from information extracted from stored documents.
    • SearchSummarizationPipeline: creates a summary of the retrieved documents used to produce output for a query.
  • Agents: components that use existing nodes or pipelines to iteratively generate the best possible output for a given query. Agents are useful for scenarios where arriving at a correct answer requires several iterations and gathering more knowledge along the way.
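The relationship between nodes and pipelines in the list above can be shown with a small sketch in which a retriever node and a reader node are combined into a question-answering pipeline. This is a toy model of the concept, not Haystack's API: `ToyPipeline`, `add_step`, the word-overlap scoring, and the two sample documents are all assumptions made for illustration.

```python
import re

# Hypothetical document store; a real pipeline would read from a database.
DOCS = [
    "Haystack is a Python framework. Pipelines combine nodes into applications.",
    "A retriever fetches relevant documents. A reader scans them for answers.",
]


def words(text):
    """Lower-cased set of word tokens, used for crude overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retriever(state):
    """Node: select the stored document sharing the most words with the query."""
    query_words = words(state["query"])
    state["document"] = max(DOCS, key=lambda d: len(words(d) & query_words))
    return state


def reader(state):
    """Node: select the sentence of the retrieved document that best matches the query."""
    query_words = words(state["query"])
    sentences = re.split(r"(?<=\.)\s+", state["document"])
    state["answer"] = max(sentences, key=lambda s: len(words(s) & query_words))
    return state


class ToyPipeline:
    """Runs nodes in order, passing a shared state dictionary from one to the next."""

    def __init__(self):
        self.steps = []

    def add_step(self, name, node):
        self.steps.append((name, node))
        return self  # returning self allows add_step calls to be chained

    def run(self, query):
        state = {"query": query, "trace": []}
        for name, node in self.steps:
            state = node(state)
            state["trace"].append(name)  # record which nodes ran, in order
        return state


qa_pipeline = ToyPipeline().add_step("retriever", retriever).add_step("reader", reader)
result = qa_pipeline.run("What does a reader do?")
```

Here `result["answer"]` holds the sentence chosen by the reader and `result["trace"]` records the order in which the nodes ran. Each node only reads and writes the shared state, so nodes can be reordered or replaced independently.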

When it receives input, an agent first generates a “thought”: the initial part of the prompt, which is then divided into smaller steps. From this, the agent selects a component and feeds it text input. Based on the quality of the component’s output, the agent either displays it to the user or repeats the process until it produces a more accurate response.
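The iterative loop described above, in which the agent selects a component, inspects its output, and either returns it or tries again, might look like the following in outline. This is a hedged sketch rather than Haystack's agent implementation: `run_agent`, the two tools, and the `choose_tool` and `is_good_enough` policies are hypothetical, and in a real agent an LLM would make the decisions that the simple rules make here.

```python
import re


def run_agent(query, tools, choose_tool, is_good_enough, max_steps=3):
    """Repeatedly pick a tool, run it, and stop once the output is acceptable.

    Returns the final output and the history of (tool name, output) pairs.
    """
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    text, history = query, []
    for _ in range(max_steps):
        name = choose_tool(text, history)   # stand-in for the agent's "thought"
        output = tools[name](text)          # feed text input to the chosen component
        history.append((name, output))
        if is_good_enough(output):
            break                           # good enough to show the user
        text = output                       # otherwise refine using the new output
    return history[-1][1], history


# Hypothetical tools and policies, for illustration only.
TOOLS = {
    "lookup": lambda text: "The warehouse currently holds 120 pallets.",
    "extract_number": lambda text: "".join(re.findall(r"\d+", text)),
}


def choose_tool(text, history):
    """Look something up first, then extract the figure from what was found."""
    return "lookup" if not history else "extract_number"


def is_good_enough(output):
    """Accept only a bare number as the final answer."""
    return output.isdigit()


answer, steps = run_agent(
    "How many pallets are in the warehouse?", TOOLS, choose_tool, is_good_enough
)
```

The `max_steps` limit matters in practice: without it, an agent that never judges its output acceptable would loop indefinitely.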

Pros and Cons of Haystack

  • Easy To Read Documentation: Haystack’s documentation is concise and well laid out, making it easier for developers to get to grips with its components and features. This is further assisted by the inclusion of tutorials and cookbooks that further flatten the learning curve. 
  • Simple Framework: Haystack essentially consists of nodes, pipelines, and agents, each of which has a relatively small number of classes, making it easier for developers to find the appropriate component for their use case.
  • Smaller User Base: Haystack doesn’t have as active a developer community as LangChain or even LlamaIndex, so there are fewer contributions of community-made components and it may be harder to find answers if you get stuck.

When Should You Use Haystack?

Although Haystack doesn’t offer LangChain’s breadth of functionality, like LlamaIndex, it offers powerful semantic search and data retrieval capabilities. In light of this, and the fact it’s more compact and intuitive than LlamaIndex, Haystack is a great chaining framework for developing simpler search and indexing-based LLM applications – as well as quick proof of concept prototyping.


AutoGen

AutoGen is a Python-based LLM chaining framework that resulted from collaborative research by Microsoft, the University of Washington, Penn State University, and Xidian University. It enables developers to create LLM applications by configuring multiple AI agents that communicate with each other to undertake tasks such as retrieving data and executing code. Consequently, agents and applications are created through conversational programming, while LLM chaining in AutoGen is represented by agent interactions.

Key features of the AutoGen framework include:

  • Multi-Agent Conversations: pre-built or custom agents pass data to one another, with optional human intervention. AutoGen agents fall into three categories:
    • UserProxyAgent: collects input from the user and passes it to other agents for processing and execution.
    • AssistantAgent: receives data from UserProxyAgents and other AssistantAgents and carries out a pre-configured task.
    • GroupChatManager: manages the communication between agents in an application.
  • Diverse Conversational Patterns: AutoGen supports a range of conversation patterns for mapping complex workflows. You can configure conversations according to the number of agents, their conversational autonomy (i.e., whether they receive human input), and the conversational topology. Additionally, dynamic conversational capabilities change the agent topology according to the flow of conversation and the agents’ ability to complete their given tasks. This is particularly useful for more intricate use cases for which the patterns of interaction between agents can’t be anticipated in advance.
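To give a feel for how a chain can be expressed as a conversation between agents, here is a minimal, dependency-free sketch. It does not use AutoGen's classes: `ToyAgent` and `run_group_chat` are hypothetical, the agents reply with fixed rules instead of calling an LLM, and the manager uses a fixed round-robin order, which is only one of many possible conversation topologies.

```python
class ToyAgent:
    """Hypothetical agent: a name plus a function that turns a message into a reply."""

    def __init__(self, name, reply_fn):
        self.name = name
        self.reply_fn = reply_fn

    def reply(self, message):
        return self.reply_fn(message)


def run_group_chat(agents, first_message, max_rounds=6, stop_word="DONE"):
    """Toy manager: pass the latest message to each agent in round-robin order.

    The conversation ends when a reply contains stop_word or after max_rounds turns.
    """
    transcript = [("user", first_message)]
    message = first_message
    for turn in range(max_rounds):
        agent = agents[turn % len(agents)]
        message = agent.reply(message)
        transcript.append((agent.name, message))
        if stop_word in message:
            break
    return transcript


# Two rule-based agents standing in for LLM-backed ones.
writer = ToyAgent("writer", lambda m: f"draft: {m.upper()}")
reviewer = ToyAgent(
    "reviewer",
    lambda m: "DONE approved" if m.startswith("draft:") else "please revise",
)

transcript = run_group_chat([writer, reviewer], "summarise q3 sales")
```

The transcript doubles as a log of the chain's execution, which is useful given that multi-agent applications can be harder to debug than linear chains.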

Pros and Cons of AutoGen 

  • Simplicity: the abstraction of components as agents can make it easier to understand how the AutoGen framework functions. This is especially true for non-technical personnel – including management and key stakeholders in digital transformation projects.
  • Intuitive Agent Customization: agents are configured through intuitive user interfaces and conversational commands, enabling agent customization with minimal coding, if any at all.
  • Harder to Debug: locating bugs in multi-agent applications could prove complex compared to more conventional LLM chaining frameworks because of the various interdependencies between agents.

When Should You Use AutoGen?

AutoGen is an appropriate LLM chaining framework if you’re looking to create an LLM application that revolves around multi-agent interactions, automation, and conversational prompting. It also allows development teams to experiment with the potential utility of autonomous agents and how they could best be integrated into their organization’s existing workflows.


Conclusion

The emergence of LLM chaining frameworks marked a crucial moment in the current generative AI revolution, as they helped to significantly level the playing field for AI adoption. Now, instead of generative AI application development being restricted to companies with vast resources and highly skilled personnel, it’s feasible for companies of all sizes to harness the power of LLM-based systems and tools.

With frameworks like LangChain, LlamaIndex, Haystack, and AutoGen, organizations can access a wide variety of LLMs and combine them with other powerful components to create end-to-end applications at a lower cost, in less time, and without needing to hire developers who are highly proficient in machine learning.


Kartik Talamadupula
Director of AI Research

Kartik Talamadupula is a research scientist who has spent over a decade applying AI techniques to business problems in automation, human-AI collaboration, and NLP.