Build Local RAG with Your Data

Developing RAG Systems with DeepSeek R1 & Ollama

Build robust RAG systems using DeepSeek R1 and Ollama. Discover setup procedures, best practices, and tips for developing intelligent AI solutions.

Bhavik Jikadara
Published in AI Agent Insider
4 min read · Jan 29, 2025

DeepSeek R1 and Ollama provide powerful tools for building Retrieval-Augmented Generation (RAG) systems. This guide covers the setup, implementation, and best practices for developing RAG applications using these technologies.

Why RAG Systems Are Game-Changing

Retrieval-augmented generation (RAG) systems combine the best of search and generative AI, enabling context-aware responses that are precise and accurate. With tools like DeepSeek R1 and Ollama, creating a RAG system is no longer daunting. Whether you’re building a chatbot, knowledge assistant, or an AI-powered search engine, this guide equips you with everything you need to know.

Prerequisites
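To follow along you will need (assumptions based on the tools used below): Python 3.9 or newer, a few GB of free RAM, and the Ollama runtime installed and running locally with the DeepSeek R1 model already pulled:

ollama pull deepseek-r1:8b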

What You’ll Learn

  • Setting up DeepSeek R1 and Ollama for RAG.
  • Implementing document processing, vector storage, and query pipelines.
  • Optimizing for performance, relevance, and user experience.

Steps to Build the RAG Pipeline

1. Setting Up the Environment and Importing Libraries

Ensure you have installed the required Python packages. You can install them using:

pip install langchain-core langchain-community langchain-text-splitters langchain-ollama langchain-huggingface faiss-cpu psutil

Then import the required libraries:
from typing import List
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_ollama.llms import OllamaLLM
from langchain_community.vectorstores import FAISS
import logging
import psutil
import os

2. Initializing the RAGPipeline Class

The RAGPipeline class manages the entire process: memory monitoring, document loading, embedding generation, and querying the model.

class RAGPipeline:
    def __init__(self, model_name: str = "deepseek-r1:8b", max_memory_gb: float = 3.0):
        self.setup_logging()
        self.check_system_memory(max_memory_gb)

        # Load the language model (LLM) served by Ollama
        self.llm = OllamaLLM(model=model_name)

        # Initialize embeddings using a lightweight model
        self.embeddings = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-mpnet-base-v2",
            model_kwargs={'device': 'cpu'}  # Use CPU for efficiency
        )

        # Define the prompt template
        self.prompt = ChatPromptTemplate.from_template("""
            Answer the question based only on the following context. Be concise.
            If you cannot find the answer in the context, say "I cannot answer this based on the provided context."

            Context: {context}
            Question: {question}
            Answer: """
        )
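If you want to see exactly what the LLM receives, you can render a template directly. This is just an illustrative check; the context and question strings here are made up:

# Illustrative: format a prompt template with sample values to inspect the final prompt text
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Context: {context}\nQuestion: {question}\nAnswer: ")
print(prompt.invoke({"context": "Ollama runs LLMs locally.", "question": "What does Ollama do?"}).to_string())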

3. Memory Management and Logging

To prevent crashes in low-memory environments, we log and check available memory before execution.

    def setup_logging(self):
        logging.basicConfig(level=logging.INFO)
        self.logger = logging.getLogger(__name__)

    def check_system_memory(self, max_memory_gb: float):
        available_memory = psutil.virtual_memory().available / (1024 ** 3)
        self.logger.info(f"Available system memory: {available_memory:.1f} GB")
        if available_memory < max_memory_gb:
            self.logger.warning("Memory is below recommended threshold.")

4. Loading and Splitting Documents

We use TextLoader and RecursiveCharacterTextSplitter to process documents efficiently.

    def load_and_split_documents(self, file_path: str) -> List[Document]:
        loader = TextLoader(file_path)
        documents = loader.load()

        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=500,
            chunk_overlap=50,
            length_function=len,
            add_start_index=True,
        )
        splits = text_splitter.split_documents(documents)
        self.logger.info(f"Created {len(splits)} document chunks")
        return splits
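To verify the chunking behaviour, you can inspect the first chunk. This is an illustrative check that assumes a pipeline instance named rag and the data/knowledge.txt file used in main() below:

# Illustrative: inspect a chunk and the start_index metadata added by add_start_index=True
splits = rag.load_and_split_documents("data/knowledge.txt")
print(splits[0].page_content[:80])
print(splits[0].metadata)  # e.g. {'source': 'data/knowledge.txt', 'start_index': 0}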

5. Creating a Vector Store with FAISS

We use FAISS for efficient document retrieval and process documents in smaller batches to keep memory usage low.

    def create_vectorstore(self, documents: List[Document]) -> FAISS:
        batch_size = 32
        vectorstore = FAISS.from_documents(documents[:batch_size], self.embeddings)

        for i in range(batch_size, len(documents), batch_size):
            batch = documents[i:i + batch_size]
            vectorstore.add_documents(batch)
            self.logger.info(f"Processed batch {i//batch_size + 1}")
        return vectorstore
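Re-embedding every document on each run can be slow. If you want to reuse the index between runs, the FAISS vector store can be persisted to disk. A minimal sketch, assuming the rag and vectorstore objects from the steps above (the "faiss_index" folder name is arbitrary, and recent langchain-community versions require the allow_dangerous_deserialization flag when reloading an index you created yourself):

# Illustrative: persist the FAISS index and reload it later instead of re-embedding
vectorstore.save_local("faiss_index")
restored = FAISS.load_local(
    "faiss_index",
    rag.embeddings,
    allow_dangerous_deserialization=True,  # safe here because we built the index ourselves
)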

6. Setting Up the RAG Chain

We define the retrieval mechanism to fetch relevant documents efficiently.

    def setup_rag_chain(self, vectorstore: FAISS):
        retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 2, "fetch_k": 3})

        def format_docs(docs):
            return "\n\n".join(doc.page_content for doc in docs)

        rag_chain = (
            {"context": retriever | format_docs, "question": RunnablePassthrough()}
            | self.prompt
            | self.llm
            | StrOutputParser()
        )
        return rag_chain
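Before wiring up the full chain, it can help to check what the retriever actually returns for a question. An illustrative snippet, assuming the vectorstore built in step 5:

# Illustrative: inspect the raw documents the retriever returns for a query
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 2})
for doc in retriever.invoke("What is AI?"):
    print(doc.page_content[:100])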

7. Querying the Model with Memory Monitoring

We log memory usage before executing the query.

    def query(self, chain, question: str) -> str:
        memory_usage = psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024
        self.logger.info(f"Memory usage: {memory_usage:.1f} MB")
        return chain.invoke(question)
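One DeepSeek-specific detail: R1-style models typically emit their chain-of-thought wrapped in <think>…</think> tags, and Ollama passes that text straight through. If you only want the final answer, you can strip it with a small post-processing helper (illustrative, not part of the original pipeline):

import re

def strip_reasoning(text: str) -> str:
    # Remove the <think>...</think> block that DeepSeek R1 prepends to its answers
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Example: answer = strip_reasoning(rag.query(chain, question))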

8. Putting Everything Together in main()

We initialize the RAG pipeline, process documents, and run a sample query.

def main():
    rag = RAGPipeline(model_name="deepseek-r1:8b", max_memory_gb=3.0)

    documents = rag.load_and_split_documents("data/knowledge.txt")
    vectorstore = rag.create_vectorstore(documents)
    chain = rag.setup_rag_chain(vectorstore)

    question = "What is AI?"
    response = rag.query(chain, question)
    print(f"Question: {question}\nAnswer: {response}")

if __name__ == "__main__":
    main()
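To try it out, put some text at data/knowledge.txt (the path main() expects) and run the script. For example, a placeholder file can be created like this (the content is just an illustration):

# Illustrative: create a tiny knowledge file for the pipeline to index
import os

os.makedirs("data", exist_ok=True)
with open("data/knowledge.txt", "w") as f:
    f.write("Artificial intelligence (AI) is the simulation of human intelligence by machines.")

With that file in place, running the script prints the question and an answer grounded in the file's content.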

Conclusion

This blog detailed how to build a memory-efficient RAG pipeline using LangChain, Ollama, FAISS, and Hugging Face embeddings. By optimizing document chunking, vector storage, and memory monitoring, this approach ensures efficient AI-driven document retrieval even in low-resource environments. Try implementing this pipeline with your dataset and let us know your thoughts!
