A production-ready Node.js Express server that connects to a Google Cloud Vertex AI Data Store and answers questions using Retrieval-Augmented Generation (RAG).
Data source (website / PDFs)
↓ scrape
Data Store (Google Cloud)
↓ embed
Vector Index (Vertex AI)
↑ retrieval
Express Server ←── POST /api/search
↓
LLM generates answer + citations
↓
JSON response to client
- Content is ingested into a Vertex AI Data Store (done once, in the console)
- A client sends a question via POST /api/search
- Vertex AI embeds the question and retrieves the top matching chunks
- The LLM generates a grounded answer with citations
- The server returns a structured JSON response
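
From the client's side, the flow above comes down to one POST request. The following is a minimal sketch, assuming the server runs on the default local address; the helper names `buildSearchRequest` and `askQuestion` are hypothetical and not part of this repo.

```javascript
// Hypothetical client helper; the base URL and function names are assumptions.
const DEFAULT_BASE_URL = 'http://localhost:3000';

// Build the URL and fetch options for a search request.
function buildSearchRequest(query, baseUrl = DEFAULT_BASE_URL) {
  return {
    url: `${baseUrl}/api/search`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query }),
    },
  };
}

// Send the question and return the HTTP status plus the parsed JSON body.
// Relies on the global fetch that ships with Node.js 18+.
async function askQuestion(query, baseUrl = DEFAULT_BASE_URL) {
  const { url, options } = buildSearchRequest(query, baseUrl);
  const response = await fetch(url, options);
  const payload = await response.json();
  return { status: response.status, payload };
}

// Example usage (requires a running server):
// askQuestion('what services do you offer').then(({ status, payload }) => {
//   console.log(status, payload.success ? payload.data.answer : payload.error);
// });
```

Keeping the request construction separate from the network call makes it easy to check the payload shape without a running server.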
- Node.js v18+
- A Google Cloud account with billing enabled
- The Google Cloud CLI installed locally
- A Vertex AI Data Store already created
vertex-ai-rag-server/
├── .env ← your secret config (never commit)
├── .env.example ← template for other devs
├── .gitignore
├── package.json
├── server.js ← starts the server
├── app.js ← Express app factory (used by server + tests)
├── src/
│ ├── config/
│ │ └── env.js ← centralized env validation
│ ├── services/
│ │ └── searchService.js ← Vertex AI search logic + agent preamble
│ ├── controllers/
│ │ └── searchController.js ← request/response handling
│ ├── routes/
│ │ └── searchRoutes.js ← route definitions
│ └── middleware/
│ └── errorHandler.js ← global error handler
├── tests/
│ ├── routes/
│ │ ├── health.test.js ← health + 404 tests
│ │ └── search.test.js ← search endpoint tests (mocked API)
│ └── middleware/
│ └── errorHandler.test.js ← error handler tests
└── readme.md
# Clone the repo
git clone <your-repo-url> && cd vertex-ai-rag-server
# Install dependencies
npm install
Copy the environment template and fill in your values:
cp .env.example .env
# Google Cloud
PROJECT_ID="your-gcp-project-id"
LOCATION="global"
DATA_STORE_ID="your-data-store-id"
# Server
PORT=3000
NODE_ENV=development
# Agent persona (optional) — personalizes the AI assistant for your business
# Leave empty for a generic assistant
AGENT_PERSONA="Acme Corp, a cloud computing company based in San Francisco"
LOCATION is typically global, but may be us or eu depending on where you created your data store.
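
Centralized validation of these variables could look roughly like the sketch below. It illustrates the idea behind src/config/env.js and is not that file's actual contents; the function name `loadConfig`, the returned property names, and the error messages are assumptions.

```javascript
// Hypothetical sketch of centralized env validation; the real env.js may differ.
const REQUIRED_KEYS = ['PROJECT_ID', 'LOCATION', 'DATA_STORE_ID'];

function loadConfig(env = process.env) {
  // Collect every missing key so the error reports them all at once.
  const missing = REQUIRED_KEYS.filter(
    (key) => typeof env[key] !== 'string' || env[key].trim() === ''
  );
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }

  // PORT is optional and falls back to 3000.
  const port = Number(env.PORT || '3000');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${env.PORT}"`);
  }

  return {
    projectId: env.PROJECT_ID.trim(),
    location: env.LOCATION.trim(),
    dataStoreId: env.DATA_STORE_ID.trim(),
    port,
    nodeEnv: env.NODE_ENV || 'development',
    // An empty persona means a generic assistant.
    agentPersona: (env.AGENT_PERSONA || '').trim(),
  };
}
```

Failing at startup with the full list of missing variables tends to be easier to debug than a credentials or 404 error on the first request.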
gcloud auth application-default login
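or, in Windows PowerShell, using the full path to gcloud.cmd (adjust the path to match your install):
& "C:\Users\user\AppData\Local\Google\Cloud SDK\google-cloud-sdk\bin\gcloud.cmd" auth application-default login
# Production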
npm start
# Development (auto-restart on file changes)
npm run dev
You should see:
🚀 Vertex AI RAG Server running on http://localhost:3000
Environment: development
Health: http://localhost:3000/api/health
Search: POST http://localhost:3000/api/search
GET /api/health
Response:
{
"success": true,
"data": {
"status": "healthy",
"uptime": 42,
"environment": "development"
}
}
POST /api/search
Content-Type: application/json
Request body:
{
"query": "what services do you offer"
}
Success response (200):
{
"success": true,
"data": {
"answer": "The company provides comprehensive facilities management...",
"citations": [
{ "title": "Services Page", "uri": "https://example.com/services" }
],
"resultCount": 5,
"confidence": {
"skippedReasons": [],
"safetyScores": {
"Toxicity": 0.05
}
}
}
}
Error response (400):
{
"success": false,
"error": "A non-empty \"query\" string is required in the request body."
}
No answer found (404):
{
"success": false,
"error": "No answer found for the given query.",
"data": {
"resultCount": 0,
"confidence": {
"skippedReasons": ["OUT_OF_DOMAIN_QUERY_IGNORED"],
"safetyScores": {}
}
}
}
curl -X POST http://localhost:3000/api/search \
-H "Content-Type: application/json" \
-d "{\"query\": \"what services do you offer\"}"
Tests use Jest and Supertest. The Vertex AI API is fully mocked, so no Google Cloud credentials are needed to run tests.
# Run all tests
npm test
# Run tests in watch mode (re-runs on file changes)
npm run test:watch
# Run with coverage report
npm run test:coverage

| Suite | Tests | What it covers |
|---|---|---|
| health.test.js | 3 | Health endpoint returns 200, unknown routes return 404 |
| search.test.js | 8 | Input validation (empty body, missing query, blank string, wrong type), successful search with answer/citations/confidence, query trimming, no answer → 404, API failure → 500 |
| errorHandler.test.js | 3 | Default 500 status, custom statusCode, hides internal errors in production |
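
The validation and error-handling behaviour these suites exercise could be sketched as follows. This is an illustration under assumptions, not the code in searchController.js or errorHandler.js: the function name `validateSearchBody` and the generic production message are hypothetical.

```javascript
// Hypothetical sketches; the real controller and middleware may differ.

// Returns { ok: true, query } with the trimmed query, or { ok: false, error }.
// Rejects an empty body, a missing query, a blank string, and non-string types.
function validateSearchBody(body) {
  const query = body && body.query;
  if (typeof query !== 'string' || query.trim() === '') {
    return {
      ok: false,
      error: 'A non-empty "query" string is required in the request body.',
    };
  }
  return { ok: true, query: query.trim() };
}

// Express-style error middleware (the four-argument signature marks it as such).
function errorHandler(err, req, res, next) {
  // Default to 500 unless the error carries its own statusCode.
  const statusCode = err.statusCode || 500;
  const isProduction = process.env.NODE_ENV === 'production';
  // In production, replace 5xx messages with a generic one (assumed wording).
  const message =
    isProduction && statusCode >= 500 ? 'Internal server error' : err.message;
  res.status(statusCode).json({ success: false, error: message });
}
```

Because `errorHandler` only touches `res.status` and `res.json`, it can be tested with a small stub object instead of a live Express app.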
Every search response includes a confidence object with two signals:
| Field | Meaning |
|---|---|
| skippedReasons | Empty array = answer generated successfully. If populated, explains why the summary was skipped (e.g. OUT_OF_DOMAIN_QUERY_IGNORED, NO_RELEVANT_CONTENT, POTENTIAL_POLICY_VIOLATION) |
| safetyScores | Safety category → score map. Lower is better: a high score means the response was flagged for that category |
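
A client might combine the two signals before showing an answer to a user. The sketch below is one possible approach; the helper name `assessConfidence` and the 0.5 threshold are arbitrary assumptions, not values defined by this server or by Vertex AI.

```javascript
// Hypothetical client-side helper; the 0.5 default threshold is an assumption.
function assessConfidence(confidence, maxSafetyScore = 0.5) {
  const skippedReasons = confidence?.skippedReasons ?? [];
  const safetyScores = confidence?.safetyScores ?? {};

  // Categories whose score exceeds the threshold count as flagged.
  const flaggedCategories = Object.entries(safetyScores)
    .filter(([, score]) => score > maxSafetyScore)
    .map(([category]) => category);

  const answered = skippedReasons.length === 0;
  return {
    answered,
    skippedReasons,
    flaggedCategories,
    // Usable only when a summary was generated and nothing was flagged.
    usable: answered && flaggedCategories.length === 0,
  };
}
```

Pick the threshold based on your own tolerance for flagged content, and tune it against real responses.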
Error: Could not load the default credentials
Run gcloud auth application-default login and try again.
Error: 404 Data store not found
Double-check DATA_STORE_ID and PROJECT_ID in .env. Make sure the data store finished indexing.
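
When debugging this error, it can help to print the full resource path the server would request. The sketch below assumes the data store lives in `default_collection` and is queried through a serving config named `default_config`; both IDs, and the helper names, are assumptions that may not match how searchService.js builds its request.

```javascript
// Assumed IDs: 'default_collection' and 'default_config' may differ in your setup.
function buildServingConfigPath({ projectId, location, dataStoreId }) {
  return [
    'projects', projectId,
    'locations', location,
    'collections', 'default_collection',
    'dataStores', dataStoreId,
    'servingConfigs', 'default_config',
  ].join('/');
}

// Multi-region data stores (us, eu) use a location-prefixed API endpoint,
// while global data stores use the unprefixed one.
function buildApiEndpoint(location) {
  return location === 'global'
    ? 'discoveryengine.googleapis.com'
    : `${location}-discoveryengine.googleapis.com`;
}
```

If the printed path does not match the data store shown in the Google Cloud console, the mismatch usually points to the wrong PROJECT_ID, LOCATION, or DATA_STORE_ID.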
Answer is empty or unhelpful
The data store may not have indexed enough content, or the question doesn't match the source material. Check the confidence.skippedReasons field in the response for details.
| Package | Purpose |
|---|---|
| express | HTTP server framework |
| @google-cloud/discoveryengine | Vertex AI Search SDK |
| cors | Cross-origin request support |
| helmet | Security headers |
| morgan | Request logging |
| dotenv | Loads .env config into process.env |

| Package | Purpose |
|---|---|
| jest | Test runner and assertion library |
| supertest | HTTP assertion library for Express apps |

MIT