Codincity Engineering

Towards Evaluation-Driven Enterprise Assistance with MongoDB and LeanAssist

April 18, 2025

As enterprise teams increasingly adopt Retrieval-Augmented Generation (RAG) systems, the challenge isn’t just building one—it’s making it production-ready, observable, and aligned with real user needs. At Codincity, we’ve built LeanAssist, a generative AI assistant designed for technical and operations teams.

In this post, we walk through how LeanAssist uses MongoDB Atlas Vector Search and Azure OpenAI, and how evaluation-driven development (EDD) is embedded throughout the system. The result is a scalable and explainable RAG platform that integrates directly with enterprise knowledge stores like SharePoint, SQL, and document repositories.

Why We Built LeanAssist

Enterprise support and engineering teams often struggle with:

  • Tribal knowledge stored in SharePoint, wikis, and ticketing systems.
  • Manual correlation of past incidents, SOPs, and RCA documentation.
  • Long onboarding cycles for new engineers due to scattered information.
  • Hallucinations from LLMs when answers aren’t grounded in real context.

We needed a way to connect structured and unstructured data, enable semantic retrieval, and ensure that every AI-generated answer could be traced back to source material. That led to LeanAssist.

Architecture Overview

LeanAssist is built as a modular Retrieval-Augmented Generation (RAG) stack. At a high level:

  1. Data Ingestion: Content from SharePoint, internal drives, and MS SQL (ticket DB) is ingested and chunked semantically.
  2. Embedding & Indexing: Chunks are embedded and stored in MongoDB Atlas with metadata. Atlas Vector Search indexes these chunks for cosine similarity queries.
  3. Retrieval Pipeline: When a user issues a query, LeanAssist embeds it and uses MongoDB’s vector search to retrieve top-k relevant chunks.
  4. LLM Prompt Construction: Retrieved chunks are formatted into a prompt with role-based context and fed to Azure OpenAI GPT (private instance).
  5. Generation & Traceability: The LLM produces a response. Alongside it, we return the retrieved chunks, source links, and metadata for traceability.
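To make steps 1 and 2 concrete, here is a minimal ingestion sketch using pymongo and the Azure OpenAI Python SDK. The fixed-size chunking, deployment name, and collection layout are simplifying assumptions for illustration, not LeanAssist's actual implementation:

```python
# Sketch of steps 1-2: chunk a document, embed each chunk, store it in MongoDB Atlas.
from datetime import datetime, timezone

from openai import AzureOpenAI
from pymongo import MongoClient

aoai_client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-02-01",
)
chunks_collection = MongoClient("<atlas-connection-string>")["leanassist"]["chunks"]

def ingest_document(text: str, source_type: str, team: str, access_level: str) -> None:
    # Naive fixed-size chunking stands in for the semantic chunking described above.
    chunks = [text[i:i + 1500] for i in range(0, len(text), 1500)]
    for chunk in chunks:
        embedding = aoai_client.embeddings.create(
            model="text-embedding-ada-002",  # Azure embedding deployment name (assumed)
            input=chunk,
        ).data[0].embedding
        chunks_collection.insert_one({
            "text": chunk,                 # the raw chunk
            "embedding": embedding,        # vector embedding
            "source_type": source_type,    # e.g. "SOP", "RCA", "ticket"
            "team": team,
            "access_level": access_level,
            "timestamp": datetime.now(timezone.utc),
        })
```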

MongoDB Integration in Detail

We use MongoDB Atlas as the core vector database. Here's how it's structured:

  • Each chunk is stored in a MongoDB document with:
    • text: the raw chunk
    • embedding: a vector embedding
    • source_type: e.g., SOP, RCA, ticket
    • timestamp, team, access_level fields
  • Atlas Vector Search is configured on the embedding field with cosine similarity.
  • At query time, we pass the user query’s embedding and a set of metadata filters (e.g., team-specific, recent-only).
  • MongoDB returns the top-k semantically similar chunks in <100ms.

This lets us handle multi-source retrieval with access control using a single database.
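Continuing the ingestion sketch above, the underlying index can be expressed as an Atlas Vector Search index definition. The index name and the 1536-dimension assumption (matching text-embedding-ada-002) are ours; the filter fields mirror the metadata described above:

```python
# Sketch: create the Atlas Vector Search index on the chunks collection
# (requires a recent pymongo with search-index support).
from pymongo.operations import SearchIndexModel

chunks_collection.create_search_index(
    SearchIndexModel(
        name="vector_index",          # referenced by the $vectorSearch queries below
        type="vectorSearch",
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding",
                    "numDimensions": 1536,   # text-embedding-ada-002 output size (assumed)
                    "similarity": "cosine",
                },
                {"type": "filter", "path": "team"},
                {"type": "filter", "path": "source_type"},
                {"type": "filter", "path": "timestamp"},
            ]
        },
    )
)
```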

Example Query Flow

Let’s say a user asks:

“How was a similar outage handled last quarter for Team Delta’s L2 pipeline?”

Here’s what happens:

  1. LeanAssist embeds the question using OpenAI's embedding model.
  2. Vector search runs against MongoDB with filters: team = Delta, source_type in [RCA, ticket], and timestamp within the last 90 days.
  3. Top-k chunks are retrieved, each with trace metadata.
  4. A structured prompt is built, and Azure OpenAI generates a summary referencing the prior fix and SOP.
  5. The response includes source trace and an option to view the original documents.
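Continuing the sketches above (same Azure OpenAI client and collection), this flow might look roughly like the following. The chat deployment name, prompt template, and the numCandidates/limit values are illustrative assumptions:

```python
# Sketch of the query flow: embed the question, run a filtered $vectorSearch,
# build a grounded prompt, and return the answer together with its source chunks.
from datetime import datetime, timedelta, timezone

def answer_query(question: str, team: str) -> dict:
    query_vector = aoai_client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding

    ninety_days_ago = datetime.now(timezone.utc) - timedelta(days=90)
    sources = list(chunks_collection.aggregate([
        {"$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 200,
            "limit": 5,
            "filter": {
                "team": team,
                "source_type": {"$in": ["RCA", "ticket"]},
                "timestamp": {"$gte": ninety_days_ago},
            },
        }},
        {"$project": {"text": 1, "source_type": 1, "team": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
    ]))

    context = "\n\n".join(chunk["text"] for chunk in sources)
    completion = aoai_client.chat.completions.create(
        model="gpt-4o",  # Azure chat deployment name (assumed)
        messages=[
            {"role": "system",
             "content": "Answer only from the provided context and cite the sources you used."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    # Return the retrieved chunks alongside the answer for traceability.
    return {"answer": completion.choices[0].message.content, "sources": sources}
```

Calling answer_query with the question above and team="Delta" would return both the generated summary and the RCA/ticket chunks it was grounded in, which is what powers the source trace in the response.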

Evaluation-Driven Development (EDD)

Just as test-driven development (TDD) builds verification into the development loop, we believe AI systems must be continuously evaluated against observable metrics. LeanAssist uses the following EDD practices:

Metrics Logged Per Interaction:

  • Document relevance score (cosine similarity)
  • Faithfulness check (does the answer cite retrieved text accurately?)
  • Response completeness (does it address the query fully?)
  • Latency (retrieval + generation)
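A minimal sketch of logging these metrics per interaction, assuming the faithfulness and completeness scores are computed upstream (for example by an LLM-as-judge step) and a separate metrics collection exists; all field names here are illustrative:

```python
# Sketch: persist one metrics document per user interaction for later analysis.
import time

metrics_collection = MongoClient("<atlas-connection-string>")["leanassist"]["interaction_metrics"]

def log_interaction(question: str, result: dict, faithfulness: float,
                    completeness: float, started_at: float, prompt_version: str) -> None:
    metrics_collection.insert_one({
        "question": question,
        "answer": result["answer"],
        "chunk_ids": [chunk["_id"] for chunk in result["sources"]],
        "relevance_scores": [chunk["score"] for chunk in result["sources"]],  # cosine similarity
        "faithfulness": faithfulness,      # does the answer cite retrieved text accurately?
        "completeness": completeness,      # does it address the query fully?
        "latency_ms": (time.time() - started_at) * 1000,
        "prompt_version": prompt_version,  # links feedback and metrics to prompt templates
        "timestamp": datetime.now(timezone.utc),
    })
```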

Evaluation Pipeline:

  • A sample set of representative queries is replayed weekly.
  • Multiple prompt templates are A/B tested.
  • Retrieval hit rate and chunk overlap are monitored for drift.
  • User feedback (thumbs up/down + comments) is linked to prompt versions and chunk sets.

This ensures we can trace regressions, improve prompts based on usage, and test new retrieval strategies safely.
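A rough sketch of the weekly replay, assuming a small golden set of queries with known-relevant chunk IDs (the entries and the baseline threshold below are placeholders) and reusing answer_query from the earlier sketch:

```python
# Sketch: replay a fixed query set and flag retrieval drift against a baseline hit rate.
GOLDEN_QUERIES = [
    {
        "question": "How do we roll back a failed deployment of the L2 pipeline?",
        "team": "Delta",
        "expected_chunk_ids": {"chunk-id-123"},  # known-relevant chunks (placeholder IDs)
    },
    # ... more representative queries curated with the operations team
]

def weekly_replay(baseline_hit_rate: float = 0.85) -> float:
    hits = 0
    for item in GOLDEN_QUERIES:
        result = answer_query(item["question"], team=item["team"])
        retrieved_ids = {str(chunk["_id"]) for chunk in result["sources"]}
        if retrieved_ids & item["expected_chunk_ids"]:
            hits += 1
    hit_rate = hits / len(GOLDEN_QUERIES)
    if hit_rate < baseline_hit_rate:
        print(f"Retrieval drift: hit rate {hit_rate:.2f} fell below baseline {baseline_hit_rate:.2f}")
    return hit_rate
```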

Deployment Model

LeanAssist is deployed as a private instance within the client’s Azure environment. Data never leaves the tenant boundary.

Results

  • ↓ 45% average search-to-resolution time across L1/L2 users
  • ↑ 60% increase in retrieval precision (based on internal relevance scoring)
  • ↑ Engineer trust thanks to source-backed, explainable responses
  • No need to duplicate KBs — works off existing enterprise storage

Final Thoughts

RAG systems only work at scale if they’re observable, grounded in enterprise data, and built with real user feedback. MongoDB Atlas gave us an efficient, metadata-aware vector layer. Azure OpenAI handled generation securely. The rest was engineering discipline around prompts, chunking, and evaluation.

If you're building for IT, support, or operations—take your time with retrieval logic, set up your evaluation loop early, and don’t skip traceability. That's what makes the difference in production.
