Skip to content

feat: add new scorers and tools for enhanced evaluation#132

Merged
ssdeanx merged 1 commit intomainfrom
develop
Apr 15, 2026
Merged

feat: add new scorers and tools for enhanced evaluation#132
ssdeanx merged 1 commit intomainfrom
develop

Conversation

@ssdeanx
Copy link
Copy Markdown
Owner

@ssdeanx ssdeanx commented Apr 15, 2026

  • Introduced factualityScorer to evaluate claims for factual accuracy and evidence support.
  • Added financialDataScorer to assess the integrity of financial analysis outputs, checking for required fields and data sanity.
  • Implemented keywordCoverageScorer to measure the coverage of required keywords in outputs.
  • Created taskCompleteScorer to verify if research and writing tasks are fully completed based on content structure and context.
  • Developed discordWebhookTool for posting messages to a Discord webhook, including input validation and error handling.
  • Added comprehensive tests for the new supervisor scorers to ensure they evaluate responses effectively based on various criteria such as routing discipline, evidence grounding, request coverage, actionability, uncertainty handling, and conciseness.

Summary by Sourcery

Enhance market data tools, research agents, and evaluation scorers while adding browser automation, Discord notifications, and Google OAuth/One Tap auth flows.

New Features:

  • Support batch symbol requests and improved fallbacks in Binance, Yahoo Finance, and Stooq market data tools.
  • Introduce a Discord webhook tool and integrate it into the research agent alongside a new browserAgent using a shared Chrome/CDP runtime.
  • Add supervisor-focused evaluation scorers and custom research scorers to better assess routing, evidence grounding, coverage, actionability, and uncertainty handling.
  • Enable Google OAuth and One Tap authentication in the auth client and UI, including a unified callback route and updated Better Auth configuration.
  • Expose external tool providers (Composio, Arcade) in the Mastra editor and wire agents, workflows, and networks to Libsql-based memory instead of Postgres.

Enhancements:

  • Refine research agent experiments, prompts, and workspace configuration for richer evaluation scenarios and safer filesystem access.
  • Standardize browser configuration with shared viewport/screencast settings and improve structured logging via a file-backed Pino logger.
  • Update README documentation, diagrams, and examples to describe live browser automation, sandboxes, and LibSQL integration.

Build:

  • Bump package version and several AI-related dependencies, add Chrome debug script, and expand overrides for security and tooling packages.

Tests:

  • Extend market data helper tests for new snapshot builders and Yahoo chart fallbacks, and relocate/augment eval tests for keyword coverage, context relevance, and noise sensitivity.
  • Add placeholder or updated tests for new supervisor scorers and agent evaluation behaviors.

- Introduced `factualityScorer` to evaluate claims for factual accuracy and evidence support.
- Added `financialDataScorer` to assess the integrity of financial analysis outputs, checking for required fields and data sanity.
- Implemented `keywordCoverageScorer` to measure the coverage of required keywords in outputs.
- Created `taskCompleteScorer` to verify if research and writing tasks are fully completed based on content structure and context.
- Developed `discordWebhookTool` for posting messages to a Discord webhook, including input validation and error handling.
- Added comprehensive tests for the new supervisor scorers to ensure they evaluate responses effectively based on various criteria such as routing discipline, evidence grounding, request coverage, actionability, uncertainty handling, and conciseness.
Copilot AI review requested due to automatic review settings April 15, 2026 02:07
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented Apr 15, 2026

Reviewer's Guide

Adds new supervisor scoring utilities and Discord/web browser tooling, improves financial and market data tools (Binance, Yahoo, Stooq) with batching and fallbacks, switches persistence and logging to LibSQL and file-based logs, wires new tools into research agents and auth (Google OAuth/OneTap), and updates docs, dependencies, and tests to cover the new behavior.

Sequence diagram for browser agent using shared Chrome CDP runtime

sequenceDiagram
    actor Developer
    participant BrowserAgent as browserAgent
    participant AgentBrowser as agentBrowser
    participant Chrome as LocalChrome
    participant CDP as ChromeCDPServer
    participant Page as TargetWebApp

    Developer->>BrowserAgent: Send browser task request
    BrowserAgent->>AgentBrowser: initializeSession()
    AgentBrowser->>CDP: connect(cdpUrl)
    CDP-->>AgentBrowser: connectionEstablished
    AgentBrowser-->>Chrome: attachToExistingSession

    BrowserAgent->>AgentBrowser: navigate(url)
    AgentBrowser->>CDP: Page.navigate(url)
    CDP-->>AgentBrowser: navigationComplete

    BrowserAgent->>AgentBrowser: snapshotPage()
    AgentBrowser->>CDP: DOM.getDocument
    CDP-->>AgentBrowser: domTree
    AgentBrowser-->>BrowserAgent: structuredSnapshot

    BrowserAgent->>AgentBrowser: performAction(ref,action)
    AgentBrowser->>CDP: Runtime.evaluate(click or type)
    CDP-->>AgentBrowser: actionResult

    AgentBrowser-->>BrowserAgent: finalPageState
    BrowserAgent-->>Developer: structuredResult (dom,console,screenshots)
Loading

Sequence diagram for Discord webhook tool execution

sequenceDiagram
    participant Agent as ResearchAgent
    participant Tool as discordWebhookTool
    participant Env as EnvVariables
    participant Discord as DiscordWebhookAPI

    Agent->>Tool: execute(input)
    Tool->>Env: read(DISCORD_WEBHOOK_URL)
    Env-->>Tool: webhookUrl
    Tool->>Tool: validate input.content
    Tool->>Discord: POST /webhook?wait=true
    Note right of Tool: Body includes content, username, avatarUrl, tts
    Discord-->>Tool: Response(status,json)
    alt response.ok
        Tool->>Tool: build DiscordWebhookOutput(ok,status,content,id,channel_id,timestamp)
        Tool-->>Agent: output
    else error
        Tool->>Tool: read error text
        Tool-->>Agent: throw Error("Discord webhook request failed")
    end
Loading

Sequence diagram for enhanced Yahoo and Stooq quote fallbacks

sequenceDiagram
    participant Client as AgentOrToolUser
    participant YahooTool as yahooFinanceStockQuotesTool
    participant StooqTool as stooqStockQuotesTool
    participant YahooAPI as YahooFinanceAPI
    participant StooqAPI as StooqAPI

    Client->>YahooTool: execute({symbol,function})
    YahooTool->>YahooTool: normalizedSymbol = trim().toUpperCase()
    YahooTool->>YahooAPI: GET /v7/finance/quote?symbols=normalizedSymbol
    YahooAPI-->>YahooTool: quoteResponse
    alt quote row exists
        YahooTool->>YahooTool: map row to primary quote snapshot
        YahooTool-->>Client: output with quote
    else no quote row
        YahooTool->>YahooAPI: GET /v8/finance/chart/normalizedSymbol
        YahooAPI-->>YahooTool: chartPayload
        YahooTool->>YahooTool: fallbackQuote = buildYahooQuoteSnapshotFromChart(chartPayload,normalizedSymbol)
        alt fallbackQuote exists
            YahooTool-->>Client: output with fallbackQuote
        else no candles
            YahooTool-->>Client: throw Error("No Yahoo Finance quote returned")
        end
    end

    Client->>StooqTool: execute({symbol,function})
    StooqTool->>StooqTool: requestedSymbol = trim()
    StooqTool->>StooqAPI: GET /q/l/?s=stooqSymbol
    StooqAPI-->>StooqTool: csvRowOrEmpty
    alt csv row exists
        StooqTool->>StooqTool: data = buildStooqQuoteSnapshot(row,stooqSymbol)
        StooqTool-->>Client: output with quote
    else no row
        StooqTool->>StooqAPI: GET /q/d/l/?s=stooqSymbol&i=d&e=csv
        StooqAPI-->>StooqTool: historyRows
        StooqTool->>StooqTool: historyRow = lastRow
        alt historyRow exists
            StooqTool->>StooqTool: data = buildStooqQuoteSnapshot(historyRow,stooqSymbol)
            StooqTool-->>Client: output with fallback quote
        else no history
            StooqTool-->>Client: throw Error("No Stooq quote returned")
        end
    end
Loading

Class diagram for new supervisor scorers and custom research scorers

classDiagram
    direction LR

    class SupervisorSnapshot {
      +string userMessage
      +number inputMessageCount
      +number systemMessageCount
      +string systemPrompt
      +string responseText
      +number responseMessageCount
      +string reasoningText
      +number toolCount
      +string[] toolNames
      +number paragraphCount
      +number wordCount
      +string[] keyTerms
    }

    class SupervisorSignals {
      +boolean hasResponse
      +boolean hasStructure
      +boolean hasDirectSummary
      +boolean hasNextSteps
      +boolean hasCaveat
      +boolean hasEvidence
      +boolean hasRoutingChatter
      +boolean hasPriorityLanguage
      +boolean hasActionLanguage
      +boolean hasUncertaintyLanguage
      +number keyTermCoverage
    }

    SupervisorSignals --|> SupervisorSnapshot

    class SupervisorScorerConfig {
      <<interface>>
      +string id
      +string name
      +string description
      +string type
    }

    class SupervisorRoutingDisciplineScorer {
      +string id
      +string name
      +string description
      +string type
      +preprocess(run)
      +analyze(results)
      +generateScore(results)
      +generateReason(results,score)
    }

    class SupervisorEvidenceGroundingScorer {
      +preprocess(run)
      +analyze(results)
      +generateScore(results)
      +generateReason(results,score)
    }

    class SupervisorCoverageScorer {
      +preprocess(run)
      +analyze(results)
      +generateScore(results)
      +generateReason(results,score)
    }

    class SupervisorActionabilityScorer {
      +preprocess(run)
      +analyze(results)
      +generateScore(results)
      +generateReason(results,score)
    }

    class SupervisorUncertaintyHandlingScorer {
      +preprocess(run)
      +analyze(results)
      +generateScore(results)
      +generateReason(results,score)
    }

    class SupervisorConcisenessScorer {
      +preprocess(run)
      +analyze(results)
      +generateScore(results)
      +generateReason(results,score)
    }

    class SupervisorScorersRegistry {
      +routingDiscipline
      +evidenceGrounding
      +requestCoverage
      +actionability
      +uncertaintyHandling
      +conciseness
    }

    SupervisorRoutingDisciplineScorer ..> SupervisorSnapshot : uses
    SupervisorRoutingDisciplineScorer ..> SupervisorSignals : uses

    SupervisorEvidenceGroundingScorer ..> SupervisorSnapshot
    SupervisorEvidenceGroundingScorer ..> SupervisorSignals

    SupervisorCoverageScorer ..> SupervisorSnapshot
    SupervisorCoverageScorer ..> SupervisorSignals

    SupervisorActionabilityScorer ..> SupervisorSnapshot
    SupervisorActionabilityScorer ..> SupervisorSignals

    SupervisorUncertaintyHandlingScorer ..> SupervisorSnapshot
    SupervisorUncertaintyHandlingScorer ..> SupervisorSignals

    SupervisorConcisenessScorer ..> SupervisorSnapshot
    SupervisorConcisenessScorer ..> SupervisorSignals

    SupervisorScorersRegistry o-- SupervisorRoutingDisciplineScorer
    SupervisorScorersRegistry o-- SupervisorEvidenceGroundingScorer
    SupervisorScorersRegistry o-- SupervisorCoverageScorer
    SupervisorScorersRegistry o-- SupervisorActionabilityScorer
    SupervisorScorersRegistry o-- SupervisorUncertaintyHandlingScorer
    SupervisorScorersRegistry o-- SupervisorConcisenessScorer

    class SourceDiversityScorer {
      +string id
      +string name
      +string description
      +string type
      +preprocess(run)
      +analyze(results)
      +generateScore(results)
    }

    class ResearchCompletenessScorer {
      +preprocess(run)
      +analyze(results)
      +generateScore(results)
    }

    class ParsedResearchOutput {
      +Source[] sources
      +Learning[] learnings
      +string summary
      +string data
    }

    class Source {
      +string url
      +string title
    }

    class Learning {
      +string insight
      +string followUp
    }

    SourceDiversityScorer ..> ParsedResearchOutput : parses
    SourceDiversityScorer ..> Source : evaluates
    ResearchCompletenessScorer ..> ParsedResearchOutput : parses
    ResearchCompletenessScorer ..> Learning : evaluates

    class ScorerUtils {
      +clamp(value,min,max)
      +safeJsonParse(text)
      +extractUrls(text)
      +normalizeDomain(url)
      +getTopDomains(domains)
    }

    SourceDiversityScorer ..> ScorerUtils : uses
    ResearchCompletenessScorer ..> ScorerUtils : uses
Loading

Class diagram for new tools, browser agent, and logging/memory infrastructure

classDiagram
    direction LR

    class BinanceCryptoInput {
      +string function
      +string symbol
      +string[] symbols
      +string quoteAsset
      +number limit
      +string interval
      +number fromId
      +number startTime
      +number endTime
      +string timeZone
    }

    class BinanceSymbolSelection {
      +string symbol
      +string[] symbols
      +string quoteAsset
    }

    class BinanceMarketRequestInput {
      +number limit
      +string interval
      +number fromId
      +number startTime
      +number endTime
      +string timeZone
    }

    class BinanceSpotMarketDataItem {
      +string symbol
      +BinanceSpotMarketDataData data
    }

    class BinanceSpotMarketDataData {
      <<union>>
    }

    class BinanceSpotMarketDataOutput {
      +BinanceSpotMarketDataData data
      +metadata
    }

    class BinanceSpotMetadata {
      +string source
      +string function
      +string symbol
      +string[] symbols
      +string market
    }

    BinanceSpotMarketDataOutput o-- BinanceSpotMarketDataData
    BinanceSpotMarketDataOutput o-- BinanceSpotMetadata
    BinanceSpotMarketDataData o-- BinanceSpotMarketDataItem

    class BinanceSpotMarketDataTool {
      +string id
      +string description
      +onInputStart(context)
      +execute(input,context)
    }

    BinanceSpotMarketDataTool ..> BinanceCryptoInput : inputSchema
    BinanceSpotMarketDataTool ..> BinanceSpotMarketDataOutput : outputSchema
    BinanceSpotMarketDataTool ..> BinanceSymbolSelection : uses
    BinanceSpotMarketDataTool ..> BinanceMarketRequestInput : uses
    BinanceSpotMarketDataTool ..> BinanceSpotMarketDataItem : returns

    class DiscordWebhookTool {
      +string id
      +string description
      +execute(input) DiscordWebhookOutput
    }

    class DiscordWebhookInput {
      +string content
      +string username
      +string avatarUrl
      +boolean tts
    }

    class DiscordWebhookMessage {
      +string id
      +string channel_id
      +string timestamp
    }

    class DiscordWebhookOutput {
      +boolean ok
      +number status
      +string content
      +string id
      +string channel_id
      +string timestamp
    }

    DiscordWebhookTool ..> DiscordWebhookInput : inputSchema
    DiscordWebhookTool ..> DiscordWebhookOutput : outputSchema
    DiscordWebhookOutput --|> DiscordWebhookMessage

    class AgentBrowser {
      +boolean headless
      +viewport
      +number timeout
      +string scope
      +screencast
      +onLaunch(callback)
      +onClose(callback)
    }

    class StagehandBrowser {
      +string env
      +string scope
      +number verbose
      +viewport
      +screencast
    }

    class BrowsersConfig {
      +agentBrowser
      +stagehand
      +resolveChromeCdpUrl()
    }

    BrowsersConfig o-- AgentBrowser
    BrowsersConfig o-- StagehandBrowser

    class BrowserAgent {
      +string id
      +string name
      +string description
      +string model
    }

    BrowserAgent ..> AgentBrowser : browser

    class LibsqlStore {
      +string url
      +string apiKey
      +string tableName
      +string vectorTableName
      +string namespace
    }

    class LibSQLVectorStore {
      +string url
      +string apiKey
      +string tableName
      +string namespace
    }

    class LibsqlMemoryConfig {
      +LibsqlStore store
      +LibSQLVectorStore vectorStore
      +template
    }

    class LibsqlMemory {
      +initialize()
      +load()
      +save()
    }

    LibsqlMemory o-- LibsqlMemoryConfig
    LibsqlMemoryConfig o-- LibsqlStore
    LibsqlMemoryConfig o-- LibSQLVectorStore

    class PinoLogger {
      +string name
      +string level
      +boolean prettyPrint
      +LoggerTransport[] transports
      +info(message,data)
      +error(message,data)
    }

    class LoggerTransport {
      <<interface>>
      +log(level,message,data)
    }

    class FileTransport {
      +string path
      +log(level,message,data)
    }

    FileTransport ..|> LoggerTransport

    class LoggerHelpers {
      +logWorkflowStart(workflowId,input)
      +logWorkflowEnd(workflowId,output)
      +logStepStart(workflowId,stepId,input)
      +logStepEnd(workflowId,stepId,output)
      +logToolExecution(toolId,input,output)
      +logAgentActivity(agentId,input,output)
      +logError(message,error)
      +logProgress(message,data)
    }

    LoggerHelpers ..> PinoLogger : uses
    PinoLogger o-- FileTransport

    class ResearchAgent {
      +string id
      +string name
      +string description
      +browser
      +memory
      +tools
      +channels
    }

    ResearchAgent ..> BinanceSpotMarketDataTool
    ResearchAgent ..> DiscordWebhookTool
    ResearchAgent ..> BrowserAgent
    ResearchAgent ..> LibsqlMemory

    class MastraEditorConfig {
      +log
      +toolProviders
      +sandboxes
    }

    class MastraEditorInstance {
      +MastraEditorConfig config
    }

    class ArcadeToolProvider {
      +string apiKey
    }

    class ComposioToolProvider {
      +string apiKey
    }

    MastraEditorConfig o-- ArcadeToolProvider
    MastraEditorConfig o-- ComposioToolProvider
Loading

File-Level Changes

Change Details Files
Extend Binance spot market data tool to support batch symbol requests and shared response handling.
  • Make symbol optional and introduce a symbols array in the Binance tool input schema, with validation that at least one is provided.
  • Add helper types and a resolveBinanceSymbols function to normalize and deduplicate requested symbols.
  • Refactor request execution into fetchBinanceMarketDataForSymbol and iterate over all symbols to build either a single payload or an array of per-symbol results.
  • Update metadata, progress messages, and countBinanceMarketDataItems to handle batched responses and symbol lists.
src/mastra/tools/binance-crypto-market.tool.ts
Improve Yahoo Finance and Stooq stock tools and helpers with normalized quote snapshots and better fallbacks.
  • Add StooqQuoteSnapshot and YahooQuoteSnapshot shapes plus helper builders in market-data.helpers, including chart-based snapshot construction.
  • Normalize Yahoo Finance stock tool to trim/uppercase symbols and fall back to chart history when quote endpoint returns no row, using the new snapshot helper.
  • Refactor Stooq stock tool quote path to build snapshots from either the primary CSV endpoint or daily history as a fallback, and to return trimmed symbol metadata.
  • Expand tests in market-data.helpers.test to cover new helpers and Binance symbol resolution shape.
src/mastra/tools/market-data.helpers.ts
src/mastra/tools/yahoo-finance-stock.tool.ts
src/mastra/tools/stooq-stock-market-data.tool.ts
src/mastra/tools/tests/market-data.helpers.test.ts
Introduce supervisor-focused scorers and custom research scorers, and rewire eval experiments/tests to use them from their new locations.
  • Add supervisor-scorers module implementing routing discipline, evidence grounding, request coverage, actionability, uncertainty handling, and conciseness scorers using lightweight heuristics over agent runs.
  • Add custom-scorers with source diversity and research completeness logic that parse JSON-ish research outputs and URLs.
  • Move eval experiment imports/tests from src/mastra/evals to src/mastra/agents/evals, adjust prompts to be more realistic, and point keyword coverage/textual difference/source diversity experiments at the new scorer modules.
src/mastra/scorers/supervisor-scorers.ts
src/mastra/scorers/custom-scorers.ts
src/mastra/agents/evals/agent-experiments.ts
src/mastra/agents/evals/context-relevance.test.ts
src/mastra/agents/evals/keyword-coverage.test.ts
Add Discord webhook tool and integrate Discord/browser capabilities into the research and browser agents.
  • Create discordWebhookTool that posts validated messages to a configured DISCORD_WEBHOOK_URL and surfaces status/IDs in the response.
  • Export the new tool and wire it into researchAgent tools so research flows can push summaries to Discord.
  • Define a dedicated browserAgent using the shared AgentBrowser CDP runtime and LibsqlMemory, and update researchAgent to use agentBrowser and AgentChannels with a Discord adapter instead of Stagehand.
src/mastra/tools/discord-webhook.tool.ts
src/mastra/tools/index.ts
src/mastra/agents/browserAgent.ts
src/mastra/agents/researchAgent.ts
Refine shared browser runtime configuration for Chrome CDP-based automation.
  • Introduce resolveChromeCdpUrl with environment-variable resolution and sensible default, plus shared viewport/screencast configs.
  • Instantiate AgentBrowser with shared scope, timeout, screencast, and launch/close logging hooks using the central logger.
  • Reuse the shared viewport/screencast in StagehandBrowser to keep behaviors aligned.
src/mastra/browsers.ts
Switch logging to a file-based Pino transport while keeping workflow logging helpers.
  • Create a logs directory and app.log file at process root if missing, and configure a FileTransport for PinoLogger.
  • Enable prettyPrint on the Pino logger configuration and drop custom log-to-file wrapper and associated functions.
  • Keep existing workflow logging helper functions but have them only emit through PinoLogger.
src/mastra/config/logger.ts
Adopt LibSQL-backed memory and tools across agents, networks, workflows, and tests.
  • Replace pgMemory/pgQueryTool with LibsqlMemory/libsqlQueryTool in agent examples, the telephone-game workflow, and agent network.
  • Adjust README snippets, test mocks, and workspace config references from pg-storage to libsql and add extra writable skill path.
  • Tighten LibsqlMemory template to a more structured context format and log the LibSQL URL at initialization.
src/mastra/config/libsql.ts
src/mastra/workflows/telephone-game.ts
src/mastra/networks/index.ts
src/mastra/agents/my-agent.ts (README examples)
src/mastra/agents/index.test.ts
src/mastra/agents/reportAgent.ts
src/mastra/agents/scriptWriterAgent.ts
src/mastra/workspaces.ts
src/mastra/evals/tests/* (mocks)
Enhance Better Auth integration with Google OAuth/OneTap and front-end auth flows.
  • Extend authClient with oneTapClient plugin, configured via GOOGLE_CLIENT_ID, and wire up prompt backoff options.
  • Add Google OAuth provider configuration in mastra/auth, including redirect URI and support for localhost/127.0.0.1 trusted origins.
  • Update login and signup pages to trigger Better Auth One Tap when unauthenticated and add explicit "Continue with Google" buttons using new signIn.social flow; minor layout tweak to card radius.
  • Expose auth callback route by re-exporting auth handler from a dedicated /api/callback endpoint.
lib/auth-client.ts
src/mastra/auth.ts
app/login/page.tsx
app/login/signup/page.tsx
app/api/callback/route.ts
Improve docs, versioning, and dependency set to reflect new browser/workspace capabilities and security patches.
  • Update README feature matrices and sections to describe live browser automation, workspaces/sandboxes, browser runtime setup (chrome:debug), and LibSQL-backed shared utilities; update version strings and last updated dates.
  • Replace a performance xychart with a browser automation sequence diagram and add a detailed browser-runtime section including packages and architecture.
  • Add npm chrome:debug script, bump project version to 1.0.43, and refresh several dependency versions plus override set (ai, OpenRouter provider, voice-google, prettier, lodash/tar/hono/etc.) for features and security.
README.md
package.json
package-lock.json
next.config.ts
Miscellaneous fixes and test/data updates for new behavior.
  • Relax MDX remark plugin list by removing remark-highlight, likely fixing dev bundler issues.
  • Add Yahoo/Stooq helper tests and noise/completeness/tone/tool-call eval tests in the new agents/evals path (files partially shown).
  • Update test-results.json and some public workspace data to reflect the new tools and behaviors.
next.config.ts
src/mastra/tools/tests/market-data.helpers.test.ts
src/mastra/agents/evals/*.test.ts
tests/test-results/test-results.json
src/mastra/public/workspace/workspace/Agent tools

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agent-stack Error Error Apr 15, 2026 2:11am

@github-actions
Copy link
Copy Markdown

🤖 Hi @ssdeanx, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

Caution

Review failed

Pull request was closed or merged during review

Summary by CodeRabbit

Release Notes

  • New Features

    • Google OAuth and One-Tap Sign-In for simplified authentication
    • Discord webhook integration for agent messaging
    • Live browser automation via Chrome DevTools Protocol
  • Improvements

    • Enhanced stock and crypto market data tools with batch symbol support
    • Updated research agent with Discord messaging capabilities
    • Improved fallback mechanisms for financial data retrieval
  • Configuration

    • Added new environment variables for Google OAuth, Discord, and browser automation
    • Updated authentication and workspace settings

Walkthrough

This pull request introduces Google OAuth and one-tap sign-in authentication, Chrome DevTools Protocol browser automation support, Discord webhook integration, supervisor-focused evaluation scorers, and reorganized eval infrastructure. It adds environment variables for Chrome CDP, Discord, and Google configuration, updates multiple agents to use LibSQL-backed memory, enhances market data tools with batch symbol support, and includes comprehensive test coverage for new scoring capabilities.

Changes

Cohort / File(s) Summary
Authentication & OAuth Integration
lib/auth-client.ts, src/mastra/auth.ts, app/login/page.tsx, app/login/signup/page.tsx, app/api/callback/route.ts
Added Google OAuth provider configuration and one-tap sign-in support. Integrated Google one-tap client with auto-select enabled and redirect UX mode. Added callback route re-export and updated login/signup UI with "Continue with Google" buttons, including loading states and error handling.
Browser Automation & Chrome CDP
src/mastra/browsers.ts, src/mastra/agents/browserAgent.ts, .env.example, package.json
Introduced Chrome DevTools Protocol (CDP) support with agentBrowser exported as configurable browser instance. Added browserAgent with Gemini 3.1 Flash Lite model and LibSQL memory. Added CHROME_CDP_URL and CHROME_REMOTE_DEBUGGING_URL environment variables. Added chrome:debug npm script for launching Chrome with remote debugging.
Discord Integration
src/mastra/tools/discord-webhook.tool.ts, src/mastra/tools/index.ts, src/mastra/agents/researchAgent.ts, .env.example
Added discordWebhookTool for sending messages via Discord webhooks. Integrated Discord webhook capability into researchAgent with channel-based messaging via createDiscordAdapter(). Added Discord bot and webhook environment variables (DISCORD_BOT_TOKEN, DISCORD_PUBLIC_KEY, DISCORD_CLIENT_ID, DISCORD_SECRET_KEY, DISCORD_WEBHOOK_URL).
Evaluation & Scoring Infrastructure
src/mastra/scorers/supervisor-scorers.ts, src/mastra/scorers/supervisor-scorers.test.ts, src/mastra/scorers/custom-scorers.ts, src/mastra/agents/evals/completeness.test.ts, src/mastra/agents/evals/context-precision.test.ts, src/mastra/agents/evals/context-relevance.test.ts, src/mastra/agents/evals/keyword-coverage.test.ts, src/mastra/agents/evals/noise-sensitivity.test.ts, src/mastra/agents/evals/tone-consistency.test.ts, src/mastra/agents/evals/tool-call-accuracy.test.ts
Reorganized eval modules from evals/scorers/ to scorers/ directory structure. Introduced six supervisor-focused scorers (routing discipline, evidence grounding, request coverage, actionability, uncertainty handling, conciseness) with comprehensive test coverage. Migrated custom scorers (source diversity, research completeness) to new location with updated imports throughout.
Market Data & Stock Tools Enhancement
src/mastra/tools/binance-crypto-market.tool.ts, src/mastra/tools/market-data.helpers.ts, src/mastra/tools/stooq-stock-market-data.tool.ts, src/mastra/tools/yahoo-finance-stock.tool.ts, src/mastra/tools/tests/market-data.helpers.test.ts
Extended Binance tool to support batch symbol queries (1–10 symbols) with normalized symbol handling and fallback logic. Added StooqQuoteSnapshot and YahooQuoteSnapshot helpers for quote normalization. Updated Stooq/Yahoo tools with symbol trimming, fallback chart-based quote resolution, and consistent metadata handling.
Memory Migration to LibSQL
src/mastra/agents/index.test.ts, src/mastra/config/libsql.ts, src/mastra/networks/index.ts, src/mastra/workflows/telephone-game.ts, src/mastra/agents/reportAgent.ts, src/mastra/agents/scriptWriterAgent.ts
Migrated all agents from pgMemory to LibsqlMemory backend. Updated libsql configuration with conditional user context fields and improved logging. Removed unused pgMemory imports.
Logging & Configuration Updates
src/mastra/config/logger.ts, src/mastra/config/libsql.ts, next.config.ts, package.json
Refactored logger to use Pino FileTransport instead of synchronous file operations. Updated MDX configuration to remove remark-highlight plugin. Bumped version to 1.0.43 and updated AI SDK dependencies. Added @better-auth/agent-auth and various override packages.
Agent Integrations & Tool Updates
src/mastra/agents/evals/agent-experiments.ts, src/mastra/agents/researchAgent.ts, src/mastra/index.ts, src/mastra/tools/document-chunking.tool.ts
Updated researchAgent to use agentBrowser, integrated Discord webhooks, and added channel messaging. Updated import paths for scorers and evals across modules. Refreshed experiment prompts for richer deliverables (KPIs, risk mitigation, multi-variant outputs). Migrated document-chunking tool to LibSQL vector storage.
Documentation & Configuration
README.md, .env.example, src/mastra/public/workspace/..., src/mastra/evals/AGENTS.md, src/mastra/workspaces.ts
Updated README with version 1.0.43, added Live Browser Automation section with CDP flow diagram, replaced MDX throughput chart with sequence diagram, and switched example code snippets from pgQueryTool/pgMemory to libsqlQueryTool/LibsqlMemory. Added Google OAuth callback URLs documentation. Removed AGENTS.md eval documentation. Added .agents/skills to workspace read-write paths.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant LoginPage as Login Page
    participant AuthClient as Auth Client
    participant GoogleAPI as Google OAuth
    participant CallbackAPI as Callback Route
    participant Session as Session Store

    User->>LoginPage: Visit login page
    LoginPage->>AuthClient: authClient.oneTap()
    AuthClient->>GoogleAPI: Load one-tap UI
    GoogleAPI-->>LoginPage: Display one-tap widget
    
    User->>GoogleAPI: Click "Continue with Google"
    GoogleAPI->>AuthClient: Credential response
    AuthClient->>CallbackAPI: Redirect with auth token
    CallbackAPI->>Session: Validate & create session
    Session-->>LoginPage: Session established
    LoginPage->>User: Redirect to dashboard
Loading
sequenceDiagram
    participant Agent as Browser Agent
    participant BrowserClient as AgentBrowser
    participant CDPServer as Chrome CDP Server
    participant ChromeInstance as Chrome Instance

    Agent->>BrowserClient: initialize()
    BrowserClient->>CDPServer: Connect (cdpUrl)
    CDPServer-->>BrowserClient: Connection established
    BrowserClient->>ChromeInstance: Launch session
    
    Agent->>BrowserClient: navigate(url)
    BrowserClient->>ChromeInstance: Navigate to URL
    ChromeInstance-->>BrowserClient: Page loaded
    
    Agent->>BrowserClient: screenshot()
    BrowserClient->>ChromeInstance: Capture viewport
    ChromeInstance-->>BrowserClient: PNG image
    
    Agent->>Agent: Analyze snapshot
    BrowserClient->>CDPServer: Close session
Loading
sequenceDiagram
    participant ResearchAgent as Research Agent
    participant DiscordTool as Discord Webhook Tool
    participant DiscordAPI as Discord API
    participant Channel as Discord Channel

    ResearchAgent->>ResearchAgent: Complete research
    ResearchAgent->>DiscordTool: Send summary message
    DiscordTool->>DiscordTool: Read DISCORD_WEBHOOK_URL
    DiscordTool->>DiscordAPI: POST to webhook (wait=true)
    DiscordAPI->>Channel: Post message
    Channel-->>DiscordAPI: Message created
    DiscordAPI-->>DiscordTool: JSON response (id, timestamp)
    DiscordTool-->>ResearchAgent: { ok, status, content, id }
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • sourcery-ai

Poem

🐰 Whiskers twitch with auth delight,
Google signs in, one-tap so bright,
Discord speaks through webhook song,
Browser hops where Chrome belongs,
Scorers measure truth with care—
What a bounty, rich and rare! 🌟

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description relates to the changeset by highlighting new scorers, tools, and features; however, it mentions specific scorers (factualityScorer, financialDataScorer, taskCompleteScorer, keywordCoverageScorer) that do not appear in the actual codebase changes. Verify that the mentioned scorers (factualityScorer, financialDataScorer, taskCompleteScorer, keywordCoverageScorer) are actually implemented in the PR; the summaries show supervisorScorers and custom-scorers but not these exact names.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main additions: new scorers (supervisor-focused and custom research scorers) and new tools (Discord webhook, market data enhancements), plus browser automation and authentication features.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch develop

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented Apr 15, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Files Reviewed (9 files)
  • .env.example - Added environment variables for Chrome automation and OAuth
  • README.md - Updated version and added feature documentation
  • app/api/callback/route.ts - New OAuth callback route
  • app/login/page.tsx - Added Google OAuth sign-in functionality
  • app/login/signup/page.tsx - Added Google OAuth sign-in functionality
  • lib/auth-client.ts - Added Better Auth plugins for OAuth
  • next.config.ts - Removed unused remark plugin
  • package-lock.json - Dependency updates
  • package.json - Version bump

Reviewed by grok-code-fast-1:optimized:free · 219,300 tokens

@github-actions
Copy link
Copy Markdown

🤖 I'm sorry @ssdeanx, but I was unable to process your request. Please see the logs for more details.

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In lib/auth-client.ts, GOOGLE_CLIENT_ID is read from process.env in client-side code; consider switching to a NEXT_PUBLIC_GOOGLE_CLIENT_ID (or similar) so the value is actually available in the browser bundle and to avoid relying on server-only env semantics in React code.
  • The chrome:debug script in package.json is hardcoded to a Windows-style cmd /c start path to Chrome; if this project is expected to run on macOS/Linux as well, consider adding cross-platform variants or a small Node helper to resolve the Chrome executable per OS.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `lib/auth-client.ts`, `GOOGLE_CLIENT_ID` is read from `process.env` in client-side code; consider switching to a `NEXT_PUBLIC_GOOGLE_CLIENT_ID` (or similar) so the value is actually available in the browser bundle and to avoid relying on server-only env semantics in React code.
- The `chrome:debug` script in `package.json` is hardcoded to a Windows-style `cmd /c start` path to Chrome; if this project is expected to run on macOS/Linux as well, consider adding cross-platform variants or a small Node helper to resolve the Chrome executable per OS.

## Individual Comments

### Comment 1
<location path="src/mastra/config/logger.ts" line_range="14-15" />
<code_context>
-const logsDir: string = path.join(process.cwd(), 'data', 'logs')
-if (!fs.existsSync(logsDir)) {
-    fs.mkdirSync(logsDir, { recursive: true })
+if (!fs.existsSync(agentFsLogFile)) {
+    fs.writeFileSync(agentFsLogFile, '')
 }
-// Enhanced PinoLogger
</code_context>
<issue_to_address>
**issue (bug_risk):** Initializing the log file with `writeFileSync` can truncate existing logs

`fs.writeFileSync(agentFsLogFile, '')` will truncate any existing file at that path, including in cases like a broken symlink or a race between `existsSync` and this call. If this code ever re-runs (e.g., due to module reload), it could wipe existing logs. Prefer letting `FileTransport` create the file, or use `fs.openSync(agentFsLogFile, 'a')` to create/ensure it without truncation.
</issue_to_address>

### Comment 2
<location path="src/mastra/config/logger.ts" line_range="23-27" />
<code_context>
+})
+
+// Logger intentionally contains no tracing logic. Observability exporters/bridges handle traces separately.
 export const log = new PinoLogger({
     name: 'MastraLogger',
     level: 'info',
-    // Enable pretty printing in development
-    ...(process.env.NODE_ENV === 'development' && {
-        transport: {
-            target: 'pino-pretty',
-            options: {
-                colorize: true,
-                translateTime: 'SYS:standard',
-                ignore: 'pid,hostname',
-            },
-        },
-    }),
+    prettyPrint: true, // Set to false in production to disable pretty printing and output raw JSON
+    transports: {
+        file: fileTransport,
+    },
</code_context>
<issue_to_address>
**suggestion (performance):** Consider making pretty-print conditional to avoid verbose logs in production

Hardcoding `prettyPrint: true` is useful locally but will increase log volume and make logs harder to consume programmatically in production. Consider toggling this based on `NODE_ENV` (or a dedicated env flag) so production keeps compact JSON while dev uses pretty output.
</issue_to_address>

### Comment 3
<location path="src/mastra/tools/tests/market-data.helpers.test.ts" line_range="15" />
<code_context>
     parseStooqCsv,
 } from '../market-data.helpers'

+function resolveBinanceSymbolsForTest(symbol: string, quoteAsset = 'USDT'): string[] {
+    return [buildBinanceSymbol(symbol, quoteAsset)]
+}
</code_context>
<issue_to_address>
**issue (testing):** Test helper re-implements logic instead of exercising the new `resolveBinanceSymbols` function

This helper re-creates `resolveBinanceSymbols` behavior instead of using the real function, so changes to the implementation won’t be caught by these tests.

Prefer importing and exercising `resolveBinanceSymbols` directly (or moving it to a shared helper) and add cases for:
- `symbols` only
- both `symbol` and `symbols` (precedence)
- cross-symbol deduplication
- truncation to 10 symbols

That way the batching and routing behavior is validated end-to-end.
</issue_to_address>

### Comment 4
<location path="src/mastra/tools/tests/market-data.helpers.test.ts" line_range="49-58" />
<code_context>
         expect(rows[0].Date).toBe('2026-04-14')
     })

+    it('should build a Stooq quote snapshot from a row', () => {
+        const snapshot = buildStooqQuoteSnapshot(
+            {
</code_context>
<issue_to_address>
**suggestion (testing):** Add edge-case tests for Stooq/Yahoo snapshot helpers (no data, partial data, invalid timestamps)

The current tests exercise only the happy path for these helpers. Please also add parameterized cases for:

- `buildStooqQuoteSnapshot`: rows with missing numeric fields (e.g. empty `Volume`, missing `Close`) returning `null` (not `NaN`), and rows missing `Symbol`/`Name` falling back to the `symbol` argument and `null`.
- `buildYahooQuoteSnapshotFromChart`: `normalizeYahooChartHistory` returning `[]` (function returns `null`), last candle with an invalid/unparsable `timestamp` (sets `regularMarketTime` to `null`), and a single-candle chart (no previous close, so `marketChange`/`marketChangePercent` are `null`).

Suggested implementation:

```typescript
    it('should build a Stooq quote snapshot from a row', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
                Name: 'Apple Inc',
            },
            'AAPL',
        )

        expect(snapshot).toEqual(
            expect.objectContaining({
                symbol: 'AAPL',
                shortName: 'Apple Inc',
                regularMarketPrice: 213.75,
                regularMarketOpen: 210.12,
                regularMarketDayHigh: 214.88,
                regularMarketDayLow: 209.44,
                regularMarketVolume: 12345678,
            }),
        )
    })

    it.each([
        [
            'missing close price',
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                // Close is intentionally missing
                Volume: '12345678',
                Name: 'Apple Inc',
            },
        ],
        [
            'empty volume',
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '',
                Name: 'Apple Inc',
            },
        ],
    ])(
        'should return null for Stooq snapshot when %s',
        (_description, row) => {
            const snapshot = buildStooqQuoteSnapshot(row as any, 'AAPL')
            expect(snapshot).toBeNull()
        },
    )

    it('should fall back to symbol argument and null name when Stooq row is missing Symbol/Name', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
            } as any,
            'MSFT',
        )

        expect(snapshot).not.toBeNull()
        expect(snapshot!.symbol).toBe('MSFT')
        // Some Stooq helpers may expose `shortName` or `longName`; we accept either as long as it falls back to null
        expect(snapshot!.shortName ?? (snapshot as any).longName ?? null).toBeNull()
    })

```

`) in the section of the file where the Yahoo helpers are currently tested.

Here are the concrete changes for the Stooq tests:

```xml
<file_operations>
  <file_operation operation="edit" file_path="src/mastra/tools/tests/market-data.helpers.test.ts">
<<<<<<< SEARCH
    it('should build a Stooq quote snapshot from a row', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
                Name: 'Apple Inc',
            },
=======
    it('should build a Stooq quote snapshot from a row', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
                Name: 'Apple Inc',
            },
            'AAPL',
        )

        expect(snapshot).toEqual(
            expect.objectContaining({
                symbol: 'AAPL',
                shortName: 'Apple Inc',
                regularMarketPrice: 213.75,
                regularMarketOpen: 210.12,
                regularMarketDayHigh: 214.88,
                regularMarketDayLow: 209.44,
                regularMarketVolume: 12345678,
            }),
        )
    })

    it.each([
        [
            'missing close price',
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                // Close is intentionally missing
                Volume: '12345678',
                Name: 'Apple Inc',
            },
        ],
        [
            'empty volume',
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '',
                Name: 'Apple Inc',
            },
        ],
    ])(
        'should return null for Stooq snapshot when %s',
        (_description, row) => {
            const snapshot = buildStooqQuoteSnapshot(row as any, 'AAPL')
            expect(snapshot).toBeNull()
        },
    )

    it('should fall back to symbol argument and null name when Stooq row is missing Symbol/Name', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
            } as any,
            'MSFT',
        )

        expect(snapshot).not.toBeNull()
        expect(snapshot!.symbol).toBe('MSFT')
        // Some Stooq helpers may expose `shortName` or `longName`; we accept either as long as it falls back to null
        expect(snapshot!.shortName ?? (snapshot as any).longName ?? null).toBeNull()
    })
>>>>>>> REPLACE
  </file_operation>
</file_operations>
```

<additional_changes>
To fully implement your comment for Yahoo helpers in `src/mastra/tools/tests/market-data.helpers.test.ts`, you should also:

1. Locate the existing tests around `buildYahooQuoteSnapshotFromChart` / `normalizeYahooChartHistory` and add:

   - A test where `normalizeYahooChartHistory` returns `[]`:
     - Use `jest.spyOn` on the module that exports `normalizeYahooChartHistory` and `mockReturnValue([])`.
     - Call `buildYahooQuoteSnapshotFromChart(...)` and expect the result to be `null`.

   - A test where the last candle has an invalid/unparsable `timestamp`:
     - Mock `normalizeYahooChartHistory` to return at least one candle where the last element has `timestamp: NaN` or an invalid value.
     - Assert that `snapshot.regularMarketTime` is `null` while other fields are still populated if possible.

   - A test for a single-candle chart:
     - Mock `normalizeYahooChartHistory` to return an array with exactly one candle (with valid OHLC/volume).
     - Assert that `snapshot.previousClose` is `null` (or absent, depending on the current implementation), and both `snapshot.marketChange` and `snapshot.marketChangePercent` are `null`.

2. Ensure your `jest.spyOn` targets the correct module object, e.g.:

   ```ts
   import * as marketDataHelpers from '../market-data.helpers'

   jest.spyOn(marketDataHelpers, 'normalizeYahooChartHistory').mockReturnValue([...])
   ```

   If the file already imports helpers differently (e.g. named imports), adapt the spy usage to match the existing import style.

3. If your snapshot type uses different property names than `regularMarketTime`, `marketChange`, or `marketChangePercent`, adjust the expectations in the new Yahoo tests to match the actual shape of the snapshot object returned by `buildYahooQuoteSnapshotFromChart`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +14 to +15
if (!fs.existsSync(agentFsLogFile)) {
fs.writeFileSync(agentFsLogFile, '')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Initializing the log file with writeFileSync can truncate existing logs

fs.writeFileSync(agentFsLogFile, '') will truncate any existing file at that path, including in cases like a broken symlink or a race between existsSync and this call. If this code ever re-runs (e.g., due to module reload), it could wipe existing logs. Prefer letting FileTransport create the file, or use fs.openSync(agentFsLogFile, 'a') to create/ensure it without truncation.

Comment on lines 23 to +27
export const log = new PinoLogger({
name: 'MastraLogger',
level: 'info',
// Enable pretty printing in development
...(process.env.NODE_ENV === 'development' && {
transport: {
target: 'pino-pretty',
options: {
colorize: true,
translateTime: 'SYS:standard',
ignore: 'pid,hostname',
},
},
}),
prettyPrint: true, // Set to false in production to disable pretty printing and output raw JSON
transports: {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): Consider making pretty-print conditional to avoid verbose logs in production

Hardcoding prettyPrint: true is useful locally but will increase log volume and make logs harder to consume programmatically in production. Consider toggling this based on NODE_ENV (or a dedicated env flag) so production keeps compact JSON while dev uses pretty output.

parseStooqCsv,
} from '../market-data.helpers'

function resolveBinanceSymbolsForTest(symbol: string, quoteAsset = 'USDT'): string[] {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): Test helper re-implements logic instead of exercising the new resolveBinanceSymbols function

This helper re-creates resolveBinanceSymbols behavior instead of using the real function, so changes to the implementation won’t be caught by these tests.

Prefer importing and exercising resolveBinanceSymbols directly (or moving it to a shared helper) and add cases for:

  • symbols only
  • both symbol and symbols (precedence)
  • cross-symbol deduplication
  • truncation to 10 symbols

That way the batching and routing behavior is validated end-to-end.

Comment on lines +49 to +58
it('should build a Stooq quote snapshot from a row', () => {
const snapshot = buildStooqQuoteSnapshot(
{
Symbol: 'AAPL.US',
Date: '2026-04-14',
Time: '16:00:00',
Open: '210.12',
High: '214.88',
Low: '209.44',
Close: '213.75',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Add edge-case tests for Stooq/Yahoo snapshot helpers (no data, partial data, invalid timestamps)

The current tests exercise only the happy path for these helpers. Please also add parameterized cases for:

  • buildStooqQuoteSnapshot: rows with missing numeric fields (e.g. empty Volume, missing Close) returning null (not NaN), and rows missing Symbol/Name falling back to the symbol argument and null.
  • buildYahooQuoteSnapshotFromChart: normalizeYahooChartHistory returning [] (function returns null), last candle with an invalid/unparsable timestamp (sets regularMarketTime to null), and a single-candle chart (no previous close, so marketChange/marketChangePercent are null).

Suggested implementation:

    it('should build a Stooq quote snapshot from a row', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
                Name: 'Apple Inc',
            },
            'AAPL',
        )

        expect(snapshot).toEqual(
            expect.objectContaining({
                symbol: 'AAPL',
                shortName: 'Apple Inc',
                regularMarketPrice: 213.75,
                regularMarketOpen: 210.12,
                regularMarketDayHigh: 214.88,
                regularMarketDayLow: 209.44,
                regularMarketVolume: 12345678,
            }),
        )
    })

    it.each([
        [
            'missing close price',
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                // Close is intentionally missing
                Volume: '12345678',
                Name: 'Apple Inc',
            },
        ],
        [
            'empty volume',
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '',
                Name: 'Apple Inc',
            },
        ],
    ])(
        'should return null for Stooq snapshot when %s',
        (_description, row) => {
            const snapshot = buildStooqQuoteSnapshot(row as any, 'AAPL')
            expect(snapshot).toBeNull()
        },
    )

    it('should fall back to symbol argument and null name when Stooq row is missing Symbol/Name', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
            } as any,
            'MSFT',
        )

        expect(snapshot).not.toBeNull()
        expect(snapshot!.symbol).toBe('MSFT')
        // Some Stooq helpers may expose `shortName` or `longName`; we accept either as long as it falls back to null
        expect(snapshot!.shortName ?? (snapshot as any).longName ?? null).toBeNull()
    })

`) in the section of the file where the Yahoo helpers are currently tested.

Here are the concrete changes for the Stooq tests:

<file_operations>
  <file_operation operation="edit" file_path="src/mastra/tools/tests/market-data.helpers.test.ts">
<<<<<<< SEARCH
    it('should build a Stooq quote snapshot from a row', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
                Name: 'Apple Inc',
            },
=======
    it('should build a Stooq quote snapshot from a row', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
                Name: 'Apple Inc',
            },
            'AAPL',
        )

        expect(snapshot).toEqual(
            expect.objectContaining({
                symbol: 'AAPL',
                shortName: 'Apple Inc',
                regularMarketPrice: 213.75,
                regularMarketOpen: 210.12,
                regularMarketDayHigh: 214.88,
                regularMarketDayLow: 209.44,
                regularMarketVolume: 12345678,
            }),
        )
    })

    it.each([
        [
            'missing close price',
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                // Close is intentionally missing
                Volume: '12345678',
                Name: 'Apple Inc',
            },
        ],
        [
            'empty volume',
            {
                Symbol: 'AAPL.US',
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '',
                Name: 'Apple Inc',
            },
        ],
    ])(
        'should return null for Stooq snapshot when %s',
        (_description, row) => {
            const snapshot = buildStooqQuoteSnapshot(row as any, 'AAPL')
            expect(snapshot).toBeNull()
        },
    )

    it('should fall back to symbol argument and null name when Stooq row is missing Symbol/Name', () => {
        const snapshot = buildStooqQuoteSnapshot(
            {
                Date: '2026-04-14',
                Time: '16:00:00',
                Open: '210.12',
                High: '214.88',
                Low: '209.44',
                Close: '213.75',
                Volume: '12345678',
            } as any,
            'MSFT',
        )

        expect(snapshot).not.toBeNull()
        expect(snapshot!.symbol).toBe('MSFT')
        // Some Stooq helpers may expose `shortName` or `longName`; we accept either as long as it falls back to null
        expect(snapshot!.shortName ?? (snapshot as any).longName ?? null).toBeNull()
    })
>>>>>>> REPLACE
  </file_operation>
</file_operations>

<additional_changes>
To fully implement your comment for Yahoo helpers in src/mastra/tools/tests/market-data.helpers.test.ts, you should also:

  1. Locate the existing tests around buildYahooQuoteSnapshotFromChart / normalizeYahooChartHistory and add:

    • A test where normalizeYahooChartHistory returns []:

      • Use jest.spyOn on the module that exports normalizeYahooChartHistory and mockReturnValue([]).
      • Call buildYahooQuoteSnapshotFromChart(...) and expect the result to be null.
    • A test where the last candle has an invalid/unparsable timestamp:

      • Mock normalizeYahooChartHistory to return at least one candle where the last element has timestamp: NaN or an invalid value.
      • Assert that snapshot.regularMarketTime is null while other fields are still populated if possible.
    • A test for a single-candle chart:

      • Mock normalizeYahooChartHistory to return an array with exactly one candle (with valid OHLC/volume).
      • Assert that snapshot.previousClose is null (or absent, depending on the current implementation), and both snapshot.marketChange and snapshot.marketChangePercent are null.
  2. Ensure your jest.spyOn targets the correct module object, e.g.:

    import * as marketDataHelpers from '../market-data.helpers'
    
    jest.spyOn(marketDataHelpers, 'normalizeYahooChartHistory').mockReturnValue([...])

    If the file already imports helpers differently (e.g. named imports), adapt the spy usage to match the existing import style.

  3. If your snapshot type uses different property names than regularMarketTime, marketChange, or marketChangePercent, adjust the expectations in the new Yahoo tests to match the actual shape of the snapshot object returned by buildYahooQuoteSnapshotFromChart.

@ssdeanx ssdeanx merged commit e545992 into main Apr 15, 2026
55 of 61 checks passed
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands AgentStack’s evaluation and tooling surface by adding new scorers (including supervisor-focused scoring) and a Discord webhook tool, while also introducing broader platform changes (auth, browser automation, storage/memory, and workspace/runtime configuration).

Changes:

  • Added new scorers (factuality, financial integrity, keyword coverage, task completeness, and supervisor scorers) plus new/relocated eval utilities and tests.
  • Added a discordWebhookTool and enhanced market-data tools with additional fallbacks, normalization, and batch symbol support.
  • Introduced broader infrastructure updates (Google One Tap auth wiring, shared Chrome/CDP browser runtime, LibSQL memory usage, logger transport changes, sandbox workspace write paths).

Reviewed changes

Copilot reviewed 48 out of 55 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tests/test-results/test-results.json Updates stored test run summary output.
src/mastra/workspaces.ts Adjusts AgentFS workspace sandbox writable paths.
src/mastra/workflows/telephone-game.ts Switches workflow agent memory backend to LibSQL.
src/mastra/tools/yahoo-finance-stock.tool.ts Normalizes symbols and adds chart-based quote fallback when quote endpoint is empty.
src/mastra/tools/tests/market-data.helpers.test.ts Adds tests for new snapshot builders and a binance symbol shape helper.
src/mastra/tools/stooq-stock-market-data.tool.ts Adds quote fallback to historical endpoint and uses shared snapshot builder.
src/mastra/tools/market-data.helpers.ts Adds Stooq/Yahoo quote snapshot helper types and builders.
src/mastra/tools/index.ts Exports the new Discord webhook tool.
src/mastra/tools/document-chunking.tool.ts Repoints pgVector import to LibSQL vector storage alias.
src/mastra/tools/discord-webhook.tool.ts Adds a tool to post messages to a configured Discord webhook.
src/mastra/tools/binance-crypto-market.tool.ts Adds batch symbol support and refactors fetching into helpers.
src/mastra/scorers/task-complete-scorer.ts Adds a heuristic scorer for task completeness.
src/mastra/scorers/supervisor-scorers.ts Adds multiple supervisor-focused scorers (routing discipline, evidence, coverage, actionability, uncertainty, conciseness).
src/mastra/scorers/supervisor-scorers.test.ts Adds tests for several supervisor scorers.
src/mastra/scorers/keyword-coverage.ts Adds keyword coverage scorer driven by requestContext.requiredKeywords.
src/mastra/scorers/financial-scorers.ts Adds a financial output integrity scorer with JSON parsing + sanity checks.
src/mastra/scorers/factuality.scorer.ts Adds an LLM-judged factuality scorer with a structured output schema.
src/mastra/scorers/custom-scorers.ts Adds richer source diversity + research completeness scorer logic and parsing helpers.
src/mastra/public/workspace/workspace/Agent tools Updates internal tool usage notes and summary.
src/mastra/networks/index.ts Switches network memory backend to LibSQL.
src/mastra/index.ts Rewires scorer imports and adds Mastra Editor tool providers (Arcade/Composio).
src/mastra/evals/tests/tool-call-accuracy.test.ts Removes old eval test location (moved under agents/evals).
src/mastra/evals/tests/tone-consistency.test.ts Removes old eval test location (moved under agents/evals).
src/mastra/evals/tests/context-precision.test.ts Removes old eval test location (moved under agents/evals).
src/mastra/evals/tests/completeness.test.ts Removes old eval test location (moved under agents/evals).
src/mastra/evals/scorers/custom-scorers.ts Removes old scorer implementation location (replaced by src/mastra/scorers/custom-scorers.ts).
src/mastra/evals/AGENTS.md Removes old eval folder documentation file.
src/mastra/config/logger.ts Switches to file transport-based logging and always-on pretty printing.
src/mastra/config/libsql.ts Updates LibSQL memory template content and logs initialization metadata.
src/mastra/browsers.ts Adds shared viewport/screencast config and CDP URL resolution + logging hooks.
src/mastra/auth.ts Adds Google provider + One Tap plugin and updates trusted origins + proxy URL config.
src/mastra/agents/scriptWriterAgent.ts Removes pgMemory import (memory migration cleanup).
src/mastra/agents/researchAgent.ts Adds Discord webhook tool, switches browser runtime, and configures Discord channel adapter.
src/mastra/agents/reportAgent.ts Removes pgMemory import (memory migration cleanup).
src/mastra/agents/index.test.ts Updates scorer import path after scorer relocation.
src/mastra/agents/evals/utils.ts Adds local re-export shim for eval utils/types.
src/mastra/agents/evals/tool-call-accuracy.test.ts Adds relocated tool-call-accuracy tests using new utils.
src/mastra/agents/evals/tone-consistency.test.ts Adds relocated tone-consistency tests using new utils.
src/mastra/agents/evals/prebuilt.ts Adds/relocates prebuilt eval scorer implementations and helpers.
src/mastra/agents/evals/noise-sensitivity.test.ts Updates imports to new agents/evals prebuilt + utils locations.
src/mastra/agents/evals/keyword-coverage.test.ts Updates keyword coverage test to new utils + scorer import path.
src/mastra/agents/evals/context-relevance.test.ts Updates imports and expands test message setup.
src/mastra/agents/evals/context-precision.test.ts Adds relocated context precision tests.
src/mastra/agents/evals/completeness.test.ts Adds relocated completeness tests.
src/mastra/agents/evals/agent-experiments.ts Updates imports and expands prompt inputs for experiments.
src/mastra/agents/browserAgent.ts Adds a dedicated browser agent wired to the shared CDP browser runtime.
package.json Bumps version, adds chrome:debug script, and updates dependencies/overrides.
next.config.ts Adjusts MDX remark plugin list.
lib/auth-client.ts Adds Better Auth One Tap client plugin configuration in the browser client.
app/login/signup/page.tsx Adds One Tap trigger + a “Continue with Google” button and handler.
app/login/page.tsx Adds One Tap trigger + a “Continue with Google” button and handler.
app/api/callback/route.ts Adds callback route re-export for auth handler.
README.md Updates README metadata/version and adds browser automation documentation + other updates.
.env.example Adds Chrome/CDP, Discord, and Google auth env placeholders and notes.

Comment thread src/mastra/index.ts
Comment on lines +198 to +205
toolProviders: {
composio: new ComposioToolProvider({
apiKey: process.env.COMPOSIO_API_KEY!,
}),
// Add other tool providers here
arcade: new ArcadeToolProvider({
apiKey: process.env.ARCADE_API_KEY!,
}),
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process.env.COMPOSIO_API_KEY! and process.env.ARCADE_API_KEY! use non-null assertions. In environments where these keys are intentionally unset, this will still pass type-checking but can cause runtime failures during Mastra initialization.

Suggested fix: validate these env vars before constructing the providers (and either omit the provider when missing, or throw a clear configuration error message that names the missing variable).

Copilot uses AI. Check for mistakes.
Comment thread src/mastra/index.ts
Comment on lines 173 to +205
import { weatherWorkflow } from './workflows/weather-workflow'
//import { OtelBridge } from '@mastra/otel-bridge'
import { DefaultExporter } from '@mastra/observability'
import { keywordCoverageScorer } from './evals/scorers/keyword-coverage'
import { createCompletenessScorer } from './evals/scorers/prebuilt'
import { keywordCoverageScorer } from './scorers/keyword-coverage'
import { createCompletenessScorer } from './agents/evals/prebuilt'
//import { createToolCallAccuracyScorerCode } from '@mastra/evals/scorers/prebuilt'
import {
researchCompletenessScorer,
sourceDiversityScorer,
} from './evals/scorers/custom-scorers'
} from './scorers/custom-scorers'
// Harness
import { mainHarness } from './harness'
import { supervisorAgent } from './agents/supervisor-agent'
import { mastraAuth } from './auth'
import { agentFsWorkspace } from './workspaces'
import { MastraEditor } from '@mastra/editor'
import { MastraCompositeStore } from '@mastra/core/storage'
import { ArcadeToolProvider } from '@mastra/editor/arcade'
import { ComposioToolProvider } from '@mastra/editor/composio'

export const mastra = new Mastra({
workspace: agentFsWorkspace,
editor: new MastraEditor(
{
logger: log,
toolProviders: {
composio: new ComposioToolProvider({
apiKey: process.env.COMPOSIO_API_KEY!,
}),
// Add other tool providers here
arcade: new ArcadeToolProvider({
apiKey: process.env.ARCADE_API_KEY!,
}),
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description focuses on adding new scorers and a Discord webhook tool, but this diff also includes several unrelated, high-impact changes (e.g. Better Auth Google One Tap wiring, browser/CDP runtime setup, logger transport changes, workspace sandbox path changes, and memory backend swaps). This makes review and risk assessment harder.

Suggested fix: either update the PR description to explicitly cover these additional changes and their rollout implications, or split the non-eval/tool changes into separate PRs.

Copilot uses AI. Check for mistakes.
Comment thread src/mastra/workspaces.ts
Comment on lines 447 to +451
readWritePaths: ['/tmp/extra',
'../.agents/**/skills',
'/skills',
'/'
'/',
'.agents/skills'
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readWritePaths includes '/', which effectively grants the sandbox write access to the entire filesystem. This is a major sandbox-escape / data-loss risk and undermines the purpose of a restricted workspace.

Suggested fix: remove '/' and scope readWritePaths to the minimal required directories (e.g., the AgentFS working directory and specific skills folders). If broad access is required for a specific workflow, gate it behind an explicit config flag or approval-only workspace.

Copilot uses AI. Check for mistakes.
Comment thread app/login/signup/page.tsx
Comment on lines +71 to +73
void authClient.oneTap({
callbackURL: nextPath,
})
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The authClient.oneTap(...) call is not awaited and has no rejection handler. One Tap failures (blocked scripts, missing client id, user dismissal) can cause unhandled promise rejections.

Suggested fix: add .catch(...) inside the effect to swallow/handle expected failures (and optionally show a non-blocking message only when helpful).

Suggested change
void authClient.oneTap({
callbackURL: nextPath,
})
void authClient
.oneTap({
callbackURL: nextPath,
})
.catch((error) => {
console.debug('Google One Tap was unavailable or dismissed.', error)
})

Copilot uses AI. Check for mistakes.
Comment on lines +483 to +532
export const supervisorConcisenessScorer = createScorer({
id: 'supervisor-conciseness',
name: 'Supervisor Conciseness',
description:
'Checks whether the supervisor response is substantial enough to be useful without becoming excessively long.',
type: 'agent',
}).preprocess(({ run }) => buildSupervisorSnapshot(run))
.analyze(({ results }) => {
const signals = buildSignals(results.preprocessStepResult)

return {
hasResponse: signals.hasResponse,
wordCount: signals.wordCount,
paragraphCount: signals.paragraphCount,
hasStructure: signals.hasStructure,
}
})
.generateScore(({ results }) => {
const signals = buildSignals(results.preprocessStepResult)

if (!signals.hasResponse) {
return 0
}

const wordScore =
signals.wordCount < 90
? signals.wordCount / 90
: signals.wordCount <= 900
? 1
: clamp(1 - (signals.wordCount - 900) / 1500)

const paragraphScore =
signals.paragraphCount === 0
? 0
: signals.paragraphCount <= 5
? 1
: clamp(1 - (signals.paragraphCount - 5) / 10)

return clamp(wordScore * 0.7 + paragraphScore * 0.2 + (signals.hasStructure ? 0.1 : 0))
})
.generateReason(({ results, score }) => {
const signals = buildSignals(results.preprocessStepResult)
const details: string[] = []

details.push(`it is ${signals.wordCount} word(s) long`)
if (signals.paragraphCount > 0) details.push(`it uses ${signals.paragraphCount} paragraph(s)`)
if (signals.hasStructure) details.push('it has clear structure')

return generateSupervisorReason('Conciseness', score, details)
})
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supervisorConcisenessScorer is newly introduced but isn’t covered by src/mastra/scorers/supervisor-scorers.test.ts (tests currently cover routing, evidence, coverage, actionability, uncertainty only). This leaves the word/paragraph threshold behavior unverified.

Suggested fix: add at least two tests asserting high score for a well-sized response (e.g. 150–300 words) and lower score for extremely short and extremely long outputs, plus one test for excessive paragraph splitting.

Copilot uses AI. Check for mistakes.
Comment thread package.json
"build:mastra": "mastra build",
"start:mastra": "mastra start",
"start:next": "next start",
"chrome:debug": "cmd /c start \"\" \"C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe\" --remote-debugging-port=9222 --user-data-dir=%TEMP%\\chrome-debug",
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chrome:debug uses a Windows-specific cmd /c start invocation and a hard-coded Chrome path. This will fail on macOS/Linux and on Windows machines where Chrome is installed elsewhere.

Suggested fix: either (a) document it explicitly as Windows-only and add platform-specific alternatives (e.g. chrome:debug:mac, chrome:debug:linux), or (b) switch to a cross-platform launcher (e.g. start via open/xdg-open, or playwright’s bundled Chromium) with an overridable CHROME_PATH env var.

Suggested change
"chrome:debug": "cmd /c start \"\" \"C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe\" --remote-debugging-port=9222 --user-data-dir=%TEMP%\\chrome-debug",
"chrome:debug": "node -e \"const { spawn } = require('child_process'); const os = require('os'); const path = require('path'); const platform = process.platform; const chromePath = process.env.CHROME_PATH || (platform === 'win32' ? 'C:\\\\Program Files\\\\Google\\\\Chrome\\\\Application\\\\chrome.exe' : platform === 'darwin' ? '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome' : 'google-chrome'); const userDataDir = path.join(os.tmpdir(), 'chrome-debug'); const args = ['--remote-debugging-port=9222', '--user-data-dir=' + userDataDir]; const child = spawn(chromePath, args, { detached: true, stdio: 'ignore', shell: platform === 'win32' }); child.on('error', (error) => { console.error('Failed to launch Chrome. Set CHROME_PATH to your Chrome/Chromium executable. ' + error.message); process.exit(1); }); child.unref();\"",

Copilot uses AI. Check for mistakes.
Comment on lines +321 to +331
return {
symbol: row.Symbol ?? symbol,
date: row.Date ?? null,
time: row.Time ?? null,
open: row.Open ? Number(row.Open) : null,
high: row.High ? Number(row.High) : null,
low: row.Low ? Number(row.Low) : null,
close: row.Close ? Number(row.Close) : null,
volume: row.Volume ? Number(row.Volume) : null,
name: row.Name ?? null,
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildStooqQuoteSnapshot converts string fields with Number(...), which will produce NaN for non-numeric values (e.g. empty string, "N/A"). Returning NaN breaks the advertised number | null contract and can poison downstream calculations.

Suggested fix: parse with Number(...) (or parseFloat) and convert non-finite results to null (e.g. const open = row.Open ? Number(row.Open) : null; return Number.isFinite(open) ? open : null). Apply to open/high/low/close/volume.

Copilot uses AI. Check for mistakes.
Comment on lines +5 to +10
const discordWebhookInputSchema = z.object({
content: z.string().min(1).describe('Message content to post to Discord.'),
username: z.string().min(1).optional().describe('Optional webhook display name override.'),
avatarUrl: z.url().optional().describe('Optional avatar image URL override.'),
tts: z.boolean().optional().describe('Whether Discord should read the message aloud.'),
})
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discord webhooks have strict payload limits (notably content max 2000 chars, and username/avatar constraints). Without enforcing these in the Zod schema, the tool will fail at runtime with a 400 and the error will be less actionable.

Suggested fix: add explicit .max(...) validations (e.g. content.max(2000)) and, if needed, cap/validate username and avatarUrl according to Discord webhook limits.

Copilot uses AI. Check for mistakes.
Comment thread lib/auth-client.ts
Comment on lines +16 to +28
oneTapClient({
clientId: process.env.GOOGLE_CLIENT_ID ?? 'your-google-client-id',
autoSelect: true,
context: 'signin',
uxMode: 'redirect',
additionalOptions: {
// Any extra options for the Google initialize method
},
// Configure prompt behavior and exponential backoff:
promptOptions: {
baseDelay: 1000, // Base delay in ms (default: 1000)
maxAttempts: 5 // Maximum number of attempts before triggering onPromptNotification (default: 5)
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oneTapClient is configured with process.env.GOOGLE_CLIENT_ID, but this module is imported by client components (e.g. app/login/page.tsx). In Next.js, non-NEXT_PUBLIC_ env vars are not available in the browser bundle, so this will typically become undefined at runtime and break One Tap.

Suggested fix: use a NEXT_PUBLIC_* env var for the client-side Google client id (or pass it in from the server), and remove the 'your-google-client-id' fallback so misconfiguration fails fast in development.

Copilot uses AI. Check for mistakes.
Comment thread app/login/page.tsx
Comment on lines +62 to +64
void authClient.oneTap({
callbackURL: nextPath,
})
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The authClient.oneTap(...) call is void'd inside useEffect without handling rejections. If One Tap fails (blocked third-party cookies, missing client id, user dismissed), this can surface as an unhandled promise rejection in the browser.

Suggested fix: attach a .catch(...) handler (and optionally log at debug level) so failures are handled gracefully without noisy console errors.

Suggested change
void authClient.oneTap({
callbackURL: nextPath,
})
void authClient
.oneTap({
callbackURL: nextPath,
})
.catch(() => {
return
})

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces live browser automation, Google OAuth integration (including One-Tap support), and a new Discord webhook tool. It also migrates agent memory and storage to LibSQL, adds a suite of supervisor agent scorers, and enhances market data tools with batch processing and fallback mechanisms. Feedback focuses on correcting a Zod schema error in the Discord tool, ensuring the CDP URL is properly resolved in the browser configuration, and improving performance in the Binance tool by parallelizing batch requests.

const discordWebhookInputSchema = z.object({
content: z.string().min(1).describe('Message content to post to Discord.'),
username: z.string().min(1).optional().describe('Optional webhook display name override.'),
avatarUrl: z.url().optional().describe('Optional avatar image URL override.'),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The Zod schema uses z.url(), which is not a valid method in the standard Zod library. This will cause a runtime error when the schema is initialized. It should be replaced with z.string().url() to correctly validate the URL format.

Suggested change
avatarUrl: z.url().optional().describe('Optional avatar image URL override.'),
avatarUrl: z.string().url().optional().describe('Optional avatar image URL override.'),

Comment thread src/mastra/browsers.ts
height: 720,
viewport: sharedViewport,
timeout: 30000,
cdpUrl: resolveChromeCdpUrl,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The cdpUrl property is being assigned the function reference resolveChromeCdpUrl instead of its return value. This will likely cause connection issues as the AgentBrowser constructor typically expects a string URL. Please invoke the function to pass the resolved URL string.

Suggested change
cdpUrl: resolveChromeCdpUrl,
cdpUrl: resolveChromeCdpUrl(),

Comment on lines +416 to 431
for (const currentSymbol of symbols) {
if (abortSignal?.aborted) {
throw new Error('Binance crypto request cancelled')
}
default:
throw new Error(`Unsupported Binance function: ${requestFunction}`)

const data = await fetchBinanceMarketDataForSymbol(
currentSymbol,
requestFunction,
input
)

results.push({
symbol: currentSymbol,
data,
})
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Batch requests for multiple symbols are currently executed sequentially in a for...of loop, which can lead to significant performance bottlenecks as the number of symbols increases (up to 10). Consider using Promise.all to parallelize these requests and improve the tool's efficiency.

Suggested change
for (const currentSymbol of symbols) {
if (abortSignal?.aborted) {
throw new Error('Binance crypto request cancelled')
}
default:
throw new Error(`Unsupported Binance function: ${requestFunction}`)
const data = await fetchBinanceMarketDataForSymbol(
currentSymbol,
requestFunction,
input
)
results.push({
symbol: currentSymbol,
data,
})
}
const results = await Promise.all(
symbols.map(async (currentSymbol) => {
if (abortSignal?.aborted) {
throw new Error('Binance crypto request cancelled');
}
const data = await fetchBinanceMarketDataForSymbol(
currentSymbol,
requestFunction,
input
);
return {
symbol: currentSymbol,
data,
};
})
);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants