Vertex AI RAG Server

A production-ready Node.js Express server that connects to a Google Cloud Vertex AI Data Store and answers questions using Retrieval-Augmented Generation (RAG).

How It Works

Data source (website / PDFs)
        ↓  scrape
   Data Store (Google Cloud)
        ↓  embed
   Vector Index (Vertex AI)
        ↑  retrieval
   Express Server  ←──  POST /api/search
        ↓
   LLM generates answer + citations
        ↓
   JSON response to client

1. Content is ingested into a Vertex AI Data Store (done once, in the console)
2. A client sends a question via POST /api/search
3. Vertex AI embeds the question and retrieves the top matching chunks
4. The LLM generates a grounded answer with citations
5. The server returns a structured JSON response

Prerequisites

Node.js v18+
A Google Cloud account with billing enabled
The Google Cloud CLI installed locally
A Vertex AI Data Store already created

Project Structure

vertex-ai-rag-server/
├── .env                              ← your secret config (never commit)
├── .env.example                      ← template for other devs
├── .gitignore
├── package.json
├── server.js                         ← starts the server
├── app.js                            ← Express app factory (used by server + tests)
├── src/
│   ├── config/
│   │   └── env.js                    ← centralized env validation
│   ├── services/
│   │   └── searchService.js          ← Vertex AI search logic + agent preamble
│   ├── controllers/
│   │   └── searchController.js       ← request/response handling
│   ├── routes/
│   │   └── searchRoutes.js           ← route definitions
│   └── middleware/
│       └── errorHandler.js           ← global error handler
├── tests/
│   ├── routes/
│   │   ├── health.test.js            ← health + 404 tests
│   │   └── search.test.js            ← search endpoint tests (mocked API)
│   └── middleware/
│       └── errorHandler.test.js      ← error handler tests
└── readme.md

Installation

# Clone the repo
git clone <your-repo-url> && cd vertex-ai-rag-server

# Install dependencies
npm install

Configuration

Copy the environment template and fill in your values:

cp .env.example .env

# Google Cloud
PROJECT_ID="your-gcp-project-id"
LOCATION="global"
DATA_STORE_ID="your-data-store-id"

# Server
PORT=3000
NODE_ENV=development

# Agent persona (optional) — personalizes the AI assistant for your business
# Leave empty for a generic assistant
AGENT_PERSONA="Acme Corp, a cloud computing company based in San Francisco"

LOCATION is typically global, but may be us or eu depending on where you created your data store.
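The project structure lists src/config/env.js as the place for centralized env validation, but its contents are not shown here. The following is a minimal sketch of what such validation could look like, assuming only the variable names from .env.example above. The function name loadConfig, the returned field names, and the port fallback are hypothetical. The location-prefixed endpoint reflects how Discovery Engine serves non-global data stores; verify it against your own data store's location.

```javascript
// Hypothetical sketch of env validation; not the repository's actual env.js.
const REQUIRED = ['PROJECT_ID', 'LOCATION', 'DATA_STORE_ID'];

function loadConfig(env = process.env) {
  // Collect every missing variable so the error names them all at once.
  const missing = REQUIRED.filter((name) => !env[name] || !String(env[name]).trim());
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  // PORT is optional; fall back to 3000 as in the template above.
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  const location = env.LOCATION.trim();
  return {
    projectId: env.PROJECT_ID.trim(),
    location,
    dataStoreId: env.DATA_STORE_ID.trim(),
    port,
    nodeEnv: env.NODE_ENV || 'development',
    agentPersona: (env.AGENT_PERSONA || '').trim(),
    // Non-global data stores (us, eu) use a location-prefixed API endpoint.
    apiEndpoint: location === 'global'
      ? 'discoveryengine.googleapis.com'
      : `${location}-discoveryengine.googleapis.com`,
  };
}
```

Failing at startup with a message that names every missing variable tends to be easier to debug than a credentials or 404 error surfacing on the first search request.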

Usage

Authenticate with Google Cloud

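gcloud auth application-default login

or, in Windows PowerShell when gcloud is not on your PATH (adjust the path to match your install location):

& "C:\Users\user\AppData\Local\Google\Cloud SDK\google-cloud-sdk\bin\gcloud.cmd" auth application-default login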

Start the server

# Production
npm start

# Development (auto-restart on file changes)
npm run dev

You should see:

🚀 Vertex AI RAG Server running on http://localhost:3000
   Environment: development
   Health:      http://localhost:3000/api/health
   Search:      POST http://localhost:3000/api/search

API Reference

Health Check

GET /api/health

Response:

{
  "success": true,
  "data": {
    "status": "healthy",
    "uptime": 42,
    "environment": "development"
  }
}

Search

POST /api/search
Content-Type: application/json

Request body:

{
  "query": "what services do you offer"
}

Success response (200):

{
  "success": true,
  "data": {
    "answer": "The company provides comprehensive facilities management...",
    "citations": [
      { "title": "Services Page", "uri": "https://example.com/services" }
    ],
    "resultCount": 5,
    "confidence": {
      "skippedReasons": [],
      "safetyScores": {
        "Toxicity": 0.05
      }
    }
  }
}

Error response (400):

{
  "success": false,
  "error": "A non-empty \"query\" string is required in the request body."
}
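The 400 response above implies that the server validates the request body before calling Vertex AI. A minimal sketch of that check follows, assuming the behavior listed under Test Coverage (empty body, missing query, blank string, wrong type, and query trimming). The helper name parseQuery and its return shape are hypothetical and may differ from the repository's controller.

```javascript
// Hypothetical helper; the repository's controller may validate differently.
const QUERY_ERROR = 'A non-empty "query" string is required in the request body.';

function parseQuery(body) {
  // Tolerate a missing or non-object body instead of throwing.
  const query = body && typeof body === 'object' ? body.query : undefined;

  if (typeof query !== 'string' || query.trim() === '') {
    return { ok: false, status: 400, body: { success: false, error: QUERY_ERROR } };
  }

  // Trim surrounding whitespace before the query is sent upstream.
  return { ok: true, query: query.trim() };
}
```

A controller could call parseQuery(req.body) first and return early with the 400 payload when ok is false.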

No answer found (404):

{
  "success": false,
  "error": "No answer found for the given query.",
  "data": {
    "resultCount": 0,
    "confidence": {
      "skippedReasons": ["OUT_OF_DOMAIN_QUERY_IGNORED"],
      "safetyScores": {}
    }
  }
}
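The success and no-answer payloads above share the same confidence object and differ mainly in status code and whether an answer is present. One way to produce both from a single function is sketched below. The function name buildSearchResponse and its input shape are hypothetical; how the repository actually extracts these fields from the Vertex AI response is not shown in this README.

```javascript
// Hypothetical response builder that mirrors the JSON shapes documented above.
function buildSearchResponse({
  answer,
  citations = [],
  resultCount = 0,
  skippedReasons = [],
  safetyScores = {},
}) {
  const confidence = { skippedReasons, safetyScores };
  const hasAnswer = typeof answer === 'string' && answer.trim() !== '';

  if (!hasAnswer) {
    // No generated answer: report 404 but keep the confidence signals,
    // so the client can see why the summary was skipped.
    return {
      status: 404,
      body: {
        success: false,
        error: 'No answer found for the given query.',
        data: { resultCount, confidence },
      },
    };
  }

  return {
    status: 200,
    body: {
      success: true,
      data: { answer: answer.trim(), citations, resultCount, confidence },
    },
  };
}
```

In an Express controller the result could be sent with res.status(result.status).json(result.body).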

Quick Test

curl -X POST http://localhost:3000/api/search \
  -H "Content-Type: application/json" \
  -d "{\"query\": \"what services do you offer\"}"
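The same request can be made from JavaScript. The sketch below assumes the server is running locally on port 3000 and relies on the global fetch available in Node.js 18+. The helper name askQuestion and its options are hypothetical; fetchImpl exists only so the function can be exercised without a running server.

```javascript
// Hypothetical client helper for POST /api/search.
async function askQuestion(query, { baseUrl = 'http://localhost:3000', fetchImpl = fetch } = {}) {
  const response = await fetchImpl(`${baseUrl}/api/search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });

  const payload = await response.json();

  if (!payload.success) {
    // Surface the server's error message along with the HTTP status and
    // any data (such as confidence) attached to the failure.
    const error = new Error(payload.error || `Request failed with status ${response.status}`);
    error.status = response.status;
    error.data = payload.data;
    throw error;
  }

  return payload.data;
}
```

For example, askQuestion('what services do you offer') resolves to the data object of a successful response, and rejects with an error carrying status 400 or 404 otherwise.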

Testing

Tests use Jest and Supertest. The Vertex AI API is fully mocked — no Google Cloud credentials are needed to run tests.

# Run all tests
npm test

# Run tests in watch mode (re-runs on file changes)
npm run test:watch

# Run with coverage report
npm run test:coverage

Test Coverage

Suite	Tests	What it covers
`health.test.js`	3	Health endpoint returns 200, unknown routes return 404
`search.test.js`	8	Input validation (empty body, missing query, blank string, wrong type), successful search with answer/citations/confidence, query trimming, no answer → 404, API failure → 500
`errorHandler.test.js`	3	Default 500 status, custom statusCode, hides internal errors in production
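The three behaviors listed for errorHandler.test.js (default 500 status, custom statusCode, hidden internal errors in production) can be illustrated with a small Express-style error middleware. This is a sketch under those assumptions, not the repository's errorHandler.js; the generic message text and the rule of hiding only 5xx messages are hypothetical choices.

```javascript
// Hypothetical Express-style error middleware. Express recognizes an error
// handler by its four-argument signature (err, req, res, next).
function errorHandler(err, req, res, next) {
  // If the response has already started, defer to Express's default handler.
  if (res.headersSent) {
    return next(err);
  }

  const status = Number.isInteger(err.statusCode) ? err.statusCode : 500;
  const isProduction = process.env.NODE_ENV === 'production';

  // In production, do not leak details of unexpected server errors.
  const message = isProduction && status >= 500
    ? 'Internal server error'
    : err.message || 'Internal server error';

  return res.status(status).json({ success: false, error: message });
}
```

Such a handler would be registered after all routes, for example with app.use(errorHandler).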

Confidence & Safety

Every search response includes a confidence object with two signals:

Field	Meaning
`skippedReasons`	Empty array = answer generated successfully. If populated, explains why the summary was skipped (e.g. `OUT_OF_DOMAIN_QUERY_IGNORED`, `NO_RELEVANT_CONTENT`, `POTENTIAL_POLICY_VIOLATION`)
`safetyScores`	Safety category → score map. Lower is better — a high score means the response was flagged for that category
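A client can combine the two signals into a single decision about whether to display an answer. The sketch below shows one possible approach. The helper name assessConfidence and the 0.5 threshold are illustrative assumptions; this README does not document a cutoff, so pick one that suits your own risk tolerance.

```javascript
// Hypothetical helper; the default threshold is an illustrative assumption.
function assessConfidence(confidence, threshold = 0.5) {
  const skippedReasons = confidence?.skippedReasons ?? [];
  const safetyScores = confidence?.safetyScores ?? {};

  // Lower scores are better, so flag any category at or above the threshold.
  const flaggedCategories = Object.entries(safetyScores)
    .filter(([, score]) => score >= threshold)
    .map(([category]) => category);

  const answered = skippedReasons.length === 0;

  return {
    answered,
    skippedReasons,
    flaggedCategories,
    // Display the answer only if it was generated and nothing was flagged.
    displayable: answered && flaggedCategories.length === 0,
  };
}
```

With the sample success response above (no skipped reasons, Toxicity at 0.05), this returns displayable as true; a response with a populated skippedReasons array returns false.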

Troubleshooting

Error: Could not load the default credentials
Run gcloud auth application-default login and try again.

Error: 404 Data store not found
Double-check DATA_STORE_ID and PROJECT_ID in .env. Make sure the data store finished indexing.

Answer is empty or unhelpful
The data store may not have indexed enough content, or the question doesn't match the source material. Check the confidence.skippedReasons field in the response for details.

Dependencies

Package	Purpose
`express`	HTTP server framework
`@google-cloud/discoveryengine`	Vertex AI Search SDK
`cors`	Cross-origin request support
`helmet`	Security headers
`morgan`	Request logging
`dotenv`	Loads `.env` config into `process.env`

Dev Dependencies

Package	Purpose
`jest`	Test runner and assertion library
`supertest`	HTTP assertion library for Express apps

License

MIT
