ST10119175/Simple-Vertex-agent

Vertex AI RAG Server

A production-ready Node.js Express server that connects to a Google Cloud Vertex AI Data Store and answers questions using Retrieval-Augmented Generation (RAG).


How It Works

Data source (website / PDFs)
        ↓  scrape
   Data Store (Google Cloud)
        ↓  embed
   Vector Index (Vertex AI)
        ↑  retrieval
   Express Server  ←──  POST /api/search
        ↓
   LLM generates answer + citations
        ↓
   JSON response to client

  1. Content is ingested into a Vertex AI Data Store (done once, in the console)
  2. A client sends a question via POST /api/search
  3. Vertex AI embeds the question and retrieves the top matching chunks
  4. The LLM generates a grounded answer with citations
  5. The server returns a structured JSON response
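
Steps 3 and 4 can be sketched as a single request object for Vertex AI Search. This is a minimal sketch, not the repository's actual searchService.js: the helper name buildSearchRequest and the values chosen for pageSize and summaryResultCount are assumptions. An object of this shape is what would be passed to SearchServiceClient.search() from @google-cloud/discoveryengine.

```javascript
// Sketch only: builds a Vertex AI Search request that asks for a
// summarized answer with citations. Helper name and numeric values
// are assumptions, not taken from the repository.
function buildSearchRequest({ projectId, location, dataStoreId, query }) {
  // Fully qualified path to the data store's default serving config.
  const servingConfig =
    `projects/${projectId}/locations/${location}` +
    `/collections/default_collection/dataStores/${dataStoreId}` +
    `/servingConfigs/default_config`;

  return {
    servingConfig,
    query,
    pageSize: 5, // assumed: number of results to retrieve
    contentSearchSpec: {
      summarySpec: {
        summaryResultCount: 5, // assumed: results used to build the summary
        includeCitations: true, // ask for citations alongside the answer
      },
    },
  };
}

const request = buildSearchRequest({
  projectId: "your-gcp-project-id",
  location: "global",
  dataStoreId: "your-data-store-id",
  query: "what services do you offer",
});
console.log(request.servingConfig);
```

Keeping request construction in a pure function like this makes it possible to unit test the request shape without calling Google Cloud.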

Prerequisites


Project Structure

vertex-ai-rag-server/
├── .env                              ← your secret config (never commit)
├── .env.example                      ← template for other devs
├── .gitignore
├── package.json
├── server.js                         ← starts the server
├── app.js                            ← Express app factory (used by server + tests)
├── src/
│   ├── config/
│   │   └── env.js                    ← centralized env validation
│   ├── services/
│   │   └── searchService.js          ← Vertex AI search logic + agent preamble
│   ├── controllers/
│   │   └── searchController.js       ← request/response handling
│   ├── routes/
│   │   └── searchRoutes.js           ← route definitions
│   └── middleware/
│       └── errorHandler.js           ← global error handler
├── tests/
│   ├── routes/
│   │   ├── health.test.js            ← health + 404 tests
│   │   └── search.test.js            ← search endpoint tests (mocked API)
│   └── middleware/
│       └── errorHandler.test.js      ← error handler tests
└── readme.md

Installation

# Clone the repo
git clone <your-repo-url> && cd vertex-ai-rag-server

# Install dependencies
npm install

Configuration

Copy the environment template and fill in your values:

cp .env.example .env

# Google Cloud
PROJECT_ID="your-gcp-project-id"
LOCATION="global"
DATA_STORE_ID="your-data-store-id"

# Server
PORT=3000
NODE_ENV=development

# Agent persona (optional) — personalizes the AI assistant for your business
# Leave empty for a generic assistant
AGENT_PERSONA="Acme Corp, a cloud computing company based in San Francisco"

LOCATION is typically global, but may be us or eu depending on where you created your data store.
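
The centralized validation in src/config/env.js could look roughly like the following. This is a hypothetical sketch: the function name loadConfig, the returned field names, and the error messages are assumptions. The regional endpoint pattern follows the one used in Google's Discovery Engine samples.

```javascript
// Hypothetical validator; the real src/config/env.js may differ.
const REQUIRED_VARS = ["PROJECT_ID", "LOCATION", "DATA_STORE_ID"];

function loadConfig(env = process.env) {
  // Collect every missing variable so one error reports them all.
  const missing = REQUIRED_VARS.filter(
    (name) => typeof env[name] !== "string" || env[name].trim() === ""
  );
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  const port = Number.parseInt(env.PORT || "3000", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  const location = env.LOCATION.trim();
  return {
    projectId: env.PROJECT_ID.trim(),
    location,
    dataStoreId: env.DATA_STORE_ID.trim(),
    port,
    nodeEnv: env.NODE_ENV || "development",
    agentPersona: env.AGENT_PERSONA || "", // optional, empty means generic assistant
    // Non-global data stores are served from a location-prefixed endpoint.
    apiEndpoint:
      location === "global"
        ? "discoveryengine.googleapis.com"
        : `${location}-discoveryengine.googleapis.com`,
  };
}
```

Failing at startup with the full list of missing variables is usually easier to debug than a credentials or 404 error on the first search request.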


Usage

Authenticate with Google Cloud

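gcloud auth application-default login

or, in Windows PowerShell when gcloud is not on your PATH (adjust the path to match your Cloud SDK install):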

& "C:\Users\user\AppData\Local\Google\Cloud SDK\google-cloud-sdk\bin\gcloud.cmd" auth application-default login

Start the server

# Production
npm start

# Development (auto-restart on file changes)
npm run dev

You should see:

🚀 Vertex AI RAG Server running on http://localhost:3000
   Environment: development
   Health:      http://localhost:3000/api/health
   Search:      POST http://localhost:3000/api/search

API Reference

Health Check

GET /api/health

Response:

{
  "success": true,
  "data": {
    "status": "healthy",
    "uptime": 42,
    "environment": "development"
  }
}
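
A payload of this shape could be produced by a small helper like the one below. This is an illustrative sketch: the name buildHealthPayload is hypothetical, and it assumes uptime is reported in whole seconds from process.uptime(), which the source does not state.

```javascript
// Hypothetical helper; assumes uptime is whole seconds since process start.
function buildHealthPayload(
  uptimeSeconds = process.uptime(),
  environment = process.env.NODE_ENV || "development"
) {
  return {
    success: true,
    data: {
      status: "healthy",
      uptime: Math.floor(uptimeSeconds),
      environment,
    },
  };
}

console.log(JSON.stringify(buildHealthPayload(), null, 2));
```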

Search

POST /api/search
Content-Type: application/json

Request body:

{
  "query": "what services do you offer"
}

Success response (200):

{
  "success": true,
  "data": {
    "answer": "The company provides comprehensive facilities management...",
    "citations": [
      { "title": "Services Page", "uri": "https://example.com/services" }
    ],
    "resultCount": 5,
    "confidence": {
      "skippedReasons": [],
      "safetyScores": {
        "Toxicity": 0.05
      }
    }
  }
}

Error response (400):

{
  "success": false,
  "error": "A non-empty \"query\" string is required in the request body."
}
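
The validation behind this 400 response can be sketched as a pure function. The name validateQuery and the return shape are assumptions. The error message is the one shown above, and the trimming matches the "query trimming" behavior listed under Test Coverage.

```javascript
// Hypothetical validator for the POST /api/search request body.
const QUERY_ERROR = 'A non-empty "query" string is required in the request body.';

function validateQuery(body) {
  const query = body ? body.query : undefined;
  // Reject missing bodies, non-string values, and whitespace-only strings.
  if (typeof query !== "string" || query.trim() === "") {
    return { ok: false, status: 400, error: QUERY_ERROR };
  }
  return { ok: true, query: query.trim() };
}

console.log(validateQuery({ query: "  what services do you offer  " }));
console.log(validateQuery({ query: 42 }));
```

A controller could call this first and respond with { success: false, error } whenever ok is false, before any request reaches Vertex AI.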

No answer found (404):

{
  "success": false,
  "error": "No answer found for the given query.",
  "data": {
    "resultCount": 0,
    "confidence": {
      "skippedReasons": ["OUT_OF_DOMAIN_QUERY_IGNORED"],
      "safetyScores": {}
    }
  }
}

Quick Test

curl -X POST http://localhost:3000/api/search \
  -H "Content-Type: application/json" \
  -d "{\"query\": \"what services do you offer\"}"
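
The same request can be made from JavaScript. The following is a minimal client sketch, assuming Node.js 18 or later for the global fetch. The function name askQuestion and its options are hypothetical and not part of this repository.

```javascript
// Hypothetical client helper for POST /api/search.
// fetchImpl is injectable so the function can be tested without a server.
async function askQuestion(
  query,
  { baseUrl = "http://localhost:3000", fetchImpl = fetch } = {}
) {
  const response = await fetchImpl(`${baseUrl}/api/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  const payload = await response.json();

  // 400, 404 and 500 responses all carry success: false and an error string.
  if (!payload.success) {
    const error = new Error(payload.error || `Request failed with status ${response.status}`);
    error.status = response.status;
    error.data = payload.data; // present on 404: resultCount and confidence
    throw error;
  }
  return payload.data; // { answer, citations, resultCount, confidence }
}

// Usage, with the server running locally:
//   askQuestion("what services do you offer")
//     .then((data) => console.log(data.answer, data.citations))
//     .catch((err) => console.error(err.status, err.message));
```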

Testing

Tests use Jest and Supertest. The Vertex AI API is fully mocked — no Google Cloud credentials are needed to run tests.

# Run all tests
npm test

# Run tests in watch mode (re-runs on file changes)
npm run test:watch

# Run with coverage report
npm run test:coverage

Test Coverage

| Suite | Tests | What it covers |
| --- | --- | --- |
| health.test.js | 3 | Health endpoint returns 200, unknown routes return 404 |
| search.test.js | 8 | Input validation (empty body, missing query, blank string, wrong type), successful search with answer/citations/confidence, query trimming, no answer → 404, API failure → 500 |
| errorHandler.test.js | 3 | Default 500 status, custom statusCode, hides internal errors in production |
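
The three error handler behaviors listed above could be implemented along these lines. This is a sketch under stated assumptions, not the repository's errorHandler.js: the generic "Internal server error" message and the rule of hiding only 5xx messages in production are guesses.

```javascript
// Hypothetical Express error-handling middleware.
// Express recognizes error handlers by their four-argument signature,
// so `next` must stay in the parameter list even though it is unused.
function errorHandler(err, req, res, next) {
  const statusCode = err.statusCode || 500; // default to 500
  const isProduction = process.env.NODE_ENV === "production";

  // Assumed policy: hide internal (5xx) details from clients in production.
  const message =
    isProduction && statusCode >= 500 ? "Internal server error" : err.message;

  res.status(statusCode).json({ success: false, error: message });
}
```

In an Express app this would be registered last, after all routes, with app.use(errorHandler).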

Confidence & Safety

Search responses that reach Vertex AI (200 and 404) include a confidence object with two signals:

| Field | Meaning |
| --- | --- |
| skippedReasons | Empty array = answer generated successfully. If populated, explains why the summary was skipped (e.g. OUT_OF_DOMAIN_QUERY_IGNORED, NO_RELEVANT_CONTENT, POTENTIAL_POLICY_VIOLATION) |
| safetyScores | Safety category → score map. Lower is better: a high score means the response was flagged for that category |
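
A client might interpret these two signals as follows. This is a minimal sketch: the function name assessConfidence and the 0.5 threshold are assumptions, since the server does not define what counts as a "high" score.

```javascript
// Hypothetical helper for reading the confidence object on the client.
// The default threshold of 0.5 is an arbitrary assumption.
function assessConfidence(confidence, threshold = 0.5) {
  const skippedReasons = confidence.skippedReasons ?? [];
  // Collect every safety category whose score meets or exceeds the threshold.
  const flaggedCategories = Object.entries(confidence.safetyScores ?? {})
    .filter(([, score]) => score >= threshold)
    .map(([category]) => category);

  return {
    answered: skippedReasons.length === 0,
    skippedReasons,
    flaggedCategories,
  };
}

console.log(
  assessConfidence({ skippedReasons: [], safetyScores: { Toxicity: 0.05 } })
);
```

A caller could show the answer only when answered is true and flaggedCategories is empty, and fall back to a generic message otherwise.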

Troubleshooting

Error: Could not load the default credentials

Run gcloud auth application-default login and try again.

Error: 404 Data store not found

Double-check DATA_STORE_ID and PROJECT_ID in .env. Make sure the data store has finished indexing.

Answer is empty or unhelpful

The data store may not have indexed enough content, or the question doesn't match the source material. Check the confidence.skippedReasons field in the response for details.


Dependencies

| Package | Purpose |
| --- | --- |
| express | HTTP server framework |
| @google-cloud/discoveryengine | Vertex AI Search SDK |
| cors | Cross-origin request support |
| helmet | Security headers |
| morgan | Request logging |
| dotenv | Loads .env config into process.env |

Dev Dependencies

| Package | Purpose |
| --- | --- |
| jest | Test runner and assertion library |
| supertest | HTTP assertion library for Express apps |

License

MIT
