
How I Automated My Codebase Management with n8n and AI

A complete automation that transforms your GitHub repository into an intelligent assistant for creating issues and answering questions about your code.

Tags: n8n · MongoDB Atlas · Vertex AI · Telegram · RAG

1. System Overview

This n8n automation consists of three main workflows:

  • Initial indexing: Downloads the entire codebase and loads it into a vector store
  • Automatic sync: Every push to GitHub automatically updates modified files
  • AI Assistant: A Telegram bot that uses the codebase to create GitHub issues and answer questions
[Diagram: system architecture]

Tech Stack:

  • n8n for orchestration
  • MongoDB Atlas as vector store
  • Google Vertex AI (Gemini 2.5 Pro) as LLM
  • Telegram as user interface

2. Codebase Indexing

The Basic Flow

The first step is to download all .dart files from the GitHub repository and prepare them for embedding.

Main nodes:

  1. HTTP Request → Downloads the file tree from GitHub API
  2. get file list → Transforms the tree into a file list
  3. Filter .dart Files → Filters only Dart files
  4. Download Raw Content → Downloads the raw content of each file

You can get the complete file tree from the GitHub API tree endpoint:

https://api.github.com/repos/OWNER/YOUR-REPO/git/trees/COMMIT_SHA?recursive=1

Replace OWNER/YOUR-REPO with your repository and COMMIT_SHA with the commit you want to explore; the response lists every file present in that commit.
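
Outside n8n, the first four nodes boil down to something like this sketch (plain fetch; OWNER/YOUR-REPO and COMMIT_SHA are placeholders, and unauthenticated requests are rate-limited):

// Sketch: download the tree, keep only .dart blobs, build raw-content URLs
const treeUrl =
  "https://api.github.com/repos/OWNER/YOUR-REPO/git/trees/COMMIT_SHA?recursive=1";
const { tree } = await (await fetch(treeUrl)).json();

// "get file list" + "Filter .dart Files"
const dartFiles = tree
  .filter((entry) => entry.type === "blob" && entry.path.endsWith(".dart"))
  .map((entry) => entry.path);

// "Download Raw Content": raw.githubusercontent.com serves file contents
const sources = await Promise.all(
  dartFiles.map((path) =>
    fetch(
      `https://raw.githubusercontent.com/OWNER/YOUR-REPO/COMMIT_SHA/${path}`
    ).then((res) => res.text())
  )
);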

[Diagram: indexing workflow]

Intelligent Chunking

Here's where it gets interesting. We can't just put entire files in the vector store - they're too large. We need to do "chunking", but intelligently.

// The chunker looks for semantic separators in Dart code
const separators = [
  /\nclass /,
  /\nimport /,
  /\nvoid /,
  /\nFuture/,
];

Instead of randomly cutting the code every X characters, the chunker node tries to split on semantic boundaries - when a new class starts, an import, a function, etc. This keeps the context intact.

Maximum chunk length: 1200 characters
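
As a rough sketch of that logic (the separator list and the 1200-character cap come from the workflow; the marker trick and the hard-split fallback are my assumptions):

// Split Dart source on semantic boundaries, hard-cutting only as a fallback
const MAX_CHUNK = 1200;
const separators = [/\nclass /, /\nimport /, /\nvoid /, /\nFuture/];

function chunkDartFile(source) {
  // Mark every semantic boundary, then split on the marker
  let marked = source;
  for (const sep of separators) {
    marked = marked.replace(new RegExp(sep.source, "g"), "\x00$&");
  }

  const chunks = [];
  for (const piece of marked.split("\x00")) {
    if (!piece.trim()) continue;
    if (piece.length <= MAX_CHUNK) {
      chunks.push(piece);
    } else {
      // Fallback: hard split pieces that are still too large
      for (let i = 0; i < piece.length; i += MAX_CHUNK) {
        chunks.push(piece.slice(i, i + MAX_CHUNK));
      }
    }
  }
  return chunks;
}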

Data Format

Each chunk is structured like this:

{
  file: "path/to/file.dart",
  chunkIndex: 0,
  content: "class MyWidget extends StatelessWidget..."
}

It is then transformed into the format the vector store expects:

{
  pageContent: "class MyWidget...",
  metadata: {
    filename: "path/to/file.dart",
    chunkIndex: 0
  }
}
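
In an n8n Code node, that mapping could look something like this (assuming one chunk per item):

// Map raw chunks into the pageContent/metadata shape used by the vector store
return $input.all().map((item) => ({
  json: {
    pageContent: item.json.content,
    metadata: {
      filename: item.json.file,
      chunkIndex: item.json.chunkIndex,
    },
  },
}));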

Embeddings and Storage

  • Embeddings are generated and saved in MongoDB Atlas with vector index
  • Each chunk is semantically searchable
[Image: MongoDB vector index]
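
For reference, an Atlas Vector Search index definition for such a collection could look roughly like this (the embedding field name and the 768 dimensions are assumptions; the dimension count must match your embedding model's output size):

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 768,
      "similarity": "cosine"
    }
  ]
}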

3. Automatic Updates on Push

The Problem

Indexing the entire codebase is fine the first time, but what happens when you commit? I don't want to re-index everything every time.

The Solution: Incremental Updates

The Github Trigger node listens to push events and the get new files node processes commits to understand what changed:

// "commits" comes from the GitHub push webhook payload
const filesToIngest = new Set();  // New or modified files
const filesToDelete = new Set();  // Removed or modified files

for (const commit of commits) {
  commit.added.forEach(path => filesToIngest.add(path));
  commit.modified.forEach(path => {
    filesToIngest.add(path);   // Re-index new version
    filesToDelete.add(path);   // Remove old version
  });
  commit.removed.forEach(path => filesToDelete.add(path));
}
[Diagram: sync workflow]

Two Separate Branches

The Switch node splits the flow:

  • Branch "delete": Removes old documents from MongoDB
  • Branch "ingest": Downloads and indexes new files

Example delete operation:

{
  "operation": "delete",
  "collection": "codebase",
  "query": "{ \"filename\": \"lib/screens/home.dart\" }"
}

Logging to Google Sheets

Every change is logged to a Google Sheet with:

  • File path
  • Operation (delete/ingest)
  • Timestamp
  • Commit ID
  • Author

This allows for a complete history of changes to the vector store.
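
A Code node feeding the Google Sheets node could assemble those rows from the same push payload; a sketch (field names follow GitHub's webhook schema):

// One Google Sheets row per changed file, from the push webhook payload
const { commits } = $input.first().json.body;
const rows = [];

for (const commit of commits) {
  const changes = [
    ...commit.added.map((file) => [file, "ingest"]),
    ...commit.modified.map((file) => [file, "ingest"]),
    ...commit.modified.map((file) => [file, "delete"]),
    ...commit.removed.map((file) => [file, "delete"]),
  ];
  for (const [file, operation] of changes) {
    rows.push({
      json: {
        file,
        operation,
        timestamp: commit.timestamp,
        commitId: commit.id,
        author: commit.author.name,
      },
    });
  }
}
return rows;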

4. AI Agent with Telegram

The User Interface

I chose Telegram as the interface because:

  • It's fast
  • Supports voice messages
  • Always available on my phone
  • Has a good API

The Telegram Flow

  1. Telegram Trigger → Receives messages
  2. is admin → Verifies it's me or an authorized user
  3. Switch1 → Distinguishes between text, photo, and audio
  4. Code in JavaScript1 → Extracts text from message
  5. AI Agent → Processes the request
[Diagram: Telegram interaction workflow]
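
Conceptually, the "is admin" step just compares the sender's chat ID against an allowlist; a Code-node sketch (the ID is a placeholder):

// "is admin": keep only messages from whitelisted Telegram chat IDs
const ALLOWED_CHAT_IDS = [123456789]; // placeholder

return $input.all().filter((item) =>
  ALLOWED_CHAT_IDS.includes(item.json.message.chat.id)
);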

Multimodal Support

Voice Messages:

  • Downloaded via Telegram API
  • Transcribed with Gemini using Vertex AI APIs
  • Processed as normal text
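
The transcription call is essentially a generateContent request to the model with the audio inlined; a sketch of the request body (the prompt wording is an assumption; Telegram voice notes are OGG/Opus, hence the MIME type):

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "inlineData": {
            "mimeType": "audio/ogg",
            "data": "<base64-encoded voice note>"
          }
        },
        { "text": "Transcribe this voice message verbatim." }
      ]
    }
  ]
}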

Images:

  • Analyzed with Gemini
  • The intended use is analyzing UI and bug screenshots

The AI Agent

The heart of the system is the AI Agent node configured with:

LLM: Google Vertex AI - Gemini 2.5 Pro

I use Google because I have free Vertex AI credits.

Available tools:

  1. MongoDB Atlas Vector Store → Searches the codebase
  2. Create an issue in GitHub → Creates issues automatically

Memory: Buffer window memory

  • Maintains conversational context
  • One session per Telegram chat

The System Prompt

The prompt is long and detailed, but the key points are:

Core Capabilities:

  1. GitHub Issue Creation - Convert user messages into structured issues
  2. Coding Assistant - Provide development guidance using codebase knowledge
  3. File Recommendations - Suggest specific files to edit

Reasoning Budget

An interesting thing in the prompt is the "reasoning budget":

Maximum 2-3 reasoning steps per response
NO loops or iterations - Decide → Draft → Done

This forces the agent to be efficient and not waste tokens on useless reasoning.

5. How RAG Works

Retrieval-Augmented Generation

RAG = Retrieve relevant information before generating a response.

The process:

  1. User query → "the button to select the exam in the timer isn't working"
  2. Query embedding → Converts the question into a numeric vector
  3. Similarity search → Retrieves the most similar chunks from the vector store:

Query vector: [0.23, -0.45, 0.12, ...]

Top results:
- pages/timer/button.dart (similarity: 0.89)
- pages/timer/exam_selector.dart (similarity: 0.82)
- pages/timer/exam_state.dart (similarity: 0.78)

  4. Context injection → Relevant chunks are passed to the LLM
  5. Response generation → The LLM generates the response using the context
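
Under the hood, the similarity search in step 3 maps to an Atlas $vectorSearch aggregation stage; a mongosh sketch (index and field names are assumptions):

// Retrieve the 5 chunks most similar to the query embedding
db.codebase.aggregate([
  {
    $vectorSearch: {
      index: "vector_index",        // assumed index name
      path: "embedding",            // assumed vector field
      queryVector: queryEmbedding,  // the embedded user question
      numCandidates: 100,
      limit: 5,
    },
  },
  {
    $project: {
      "metadata.filename": 1,
      score: { $meta: "vectorSearchScore" },
    },
  },
]);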

Why It Works

  • Semantic search: Doesn't search for keywords, but meaning
  • Precise context: Only the relevant parts of the code
  • Always up to date: The vector store syncs with every push

6. Practical Results

What I Can Do Now

1. Create issues quickly

Telegram: "I want to add dark mode to the settings page, assign it to copilot"
Bot: [Complete issue draft with files to modify]
Me: "yes"
Bot: ✅ Issue #123 created

2. Ask questions about the code

Telegram: "where is the user state managed?"
Bot: "User state is managed in src/state/userState.dart using Bloc.
     It's initialized in src/app.dart at app launch..."

3. Assisted debugging

Telegram: "I have an error in API parsing, where should I look?"
Bot: "Check src/services/apiService.dart, lines 45-60.
     JSON parsing might fail if a field is missing..."

Metrics

  • Time to create an issue: From 5-10 minutes → 30 seconds
  • Chunks in vector store: ~3788
  • Cost per query: effectively zero (covered by free Vertex AI credits)

7. Next Steps

Missed commits:

If the automation is down when a push arrives, those changes are silently lost. A more robust approach would not rely only on the files changed in the last push, but would diff the last successfully processed commit against the latest one.
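
GitHub's compare endpoint is one way to do this, since it returns every file changed between two commits:

https://api.github.com/repos/OWNER/YOUR-REPO/compare/LAST_PROCESSED_SHA...LATEST_SHA

Persisting the last processed SHA (even in the existing Google Sheet) would make the sync self-healing after downtime.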

Open source:

Use open source models when possible.