Feat/#38 mock jobposting with data#96

Open
shinae1023 wants to merge 7 commits into main from feat/#38-mock-jobposting-with-data

Conversation

@shinae1023 shinae1023 commented Jun 15, 2026

Member

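✨ Why are you opening this PR?

  • Feature merge
  • Bug fix (please leave the issue # below)
  • Code improvement
  • Code change
  • Deployment
  • Other (please describe in detail below)

📋 Details - Please explain in detail why this PR is needed and what was done

  • Extracted the pgvector-based JD/question search logic into a shared CorpusRetrievalService.
  • The search target is now always the corpus embedding tables; the user's jobPosting and selected job role are used only as the basis for building the query.
  • Applied a fallback strategy for similarity search.
    • 1st priority: company name + job role (detailClassification)
    • 2nd priority: job role (detailClassification)
    • 3rd priority: major category + middle category (job_group_l1 + job_family_l2)
  • Refactored the cover letter analysis flow to use the shared retrieval service.
  • Changed the analysis prompt to inject both the similar-JD and the similar cover letter question search results.
  • Changed the mock job posting generation flow to reference pgvector corpus retrieval results instead of the existing job_postings.
  • Extended the mock job posting generation prompt to include similar cover letter questions, in addition to similar JDs, as reference material.
  • Changed the recommended question generation flow to reuse the shared retrieval service as well.
  • Fixed recommended question generation so that it references similar questions/JDs based on the company and job role actually selected.
  • Cleaned up the admin/debug API flow for inspecting retrieval results so that it uses the shared retrieval service.
  • Kept the existing mock posting, recommended question, and analysis flows while consolidating only the retrieval responsibility into a common service, removing duplicated logic.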
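
The three-tier fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `CorpusRetrievalService`: the class name, the `firstNonEmpty` helper, and the in-memory suppliers standing in for the pgvector queries are all assumptions made for the example.

```java
import java.util.List;
import java.util.function.Supplier;

class TieredFallbackSketch {

    /** Runs each tier in priority order and returns the first non-empty result. */
    static <T> List<T> firstNonEmpty(List<Supplier<List<T>>> tiers) {
        for (Supplier<List<T>> tier : tiers) {
            List<T> found = tier.get();
            if (found != null && !found.isEmpty()) {
                return found;
            }
        }
        return List.of(); // every tier came back empty
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the three similarity queries.
        Supplier<List<String>> companyAndDetail = () -> List.of();   // tier 1: no match
        Supplier<List<String>> detailOnly = () -> List.of("JD-42");  // tier 2: match
        Supplier<List<String>> hierarchy = () -> List.of("JD-7");    // tier 3: not reached

        List<String> result = firstNonEmpty(List.of(companyAndDetail, detailOnly, hierarchy));
        System.out.println(result); // prints [JD-42]
    }
}
```

Because the tiers are suppliers, a lower-priority query only runs when every higher-priority tier returned nothing.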

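📸 Screenshots of the work

⚠️ Please check before opening the PR

  • Did you run local tests?
  • Did you check the branch you are merging into?
  • Did you select the relevant labels?

🚨 Related issue number [#38]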

Summary by CodeRabbit

  • New Features

    • Added admin endpoint for previewing analysis retrieval data with semantically similar job postings and questions
    • Integrated corpus retrieval into analysis to enrich AI responses with contextually relevant reference content
  • Bug Fixes

    • Added validation for corpus import file paths to prevent unauthorized access
    • Improved error handling for missing entities during bootstrap
  • Refactor

    • Optimized vector search with proper dimension constraints and HNSW indexes for efficient similarity queries
    • Enhanced embedding client with configurable request timeouts

@shinae1023 shinae1023 requested a review from whc9999 June 15, 2026 05:34
@shinae1023 shinae1023 self-assigned this Jun 15, 2026
@shinae1023 shinae1023 added the ✨ feat New feature or request label Jun 15, 2026
coderabbitai Bot commented Jun 15, 2026



ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 03e7b674-ceef-4032-892a-c68461783c89

📥 Commits

Reviewing files that changed from the base of the PR and between 31762f7 and bd98089.

📒 Files selected for processing (13)
  • ops/db/migrations/20260615_mock_embedding_vector_1024.sql
  • src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusRetrievalService.java
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/entity/MockQuestionCache.java
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/repository/MockQuestionCacheRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.java
  • src/test/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheServiceTest.java
Walkthrough

Introduces CorpusRetrievalService for pgvector-based semantic similarity search with tiered fallback, and integrates retrieval context into both AnalysisAiClient and JobPostingAiService prompts. Adds an admin debug endpoint for retrieval preview. Hardens corpus import, embedding sync, path validation, and classification resolution; updates schema with fixed vector dimensions and HNSW indexes.

Changes

RAG Corpus Retrieval Integration

| Layer | File(s) | Summary |
|---|---|---|
| Schema, embedding client interface, and configuration | src/main/resources/schema.sql, src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingClient.java, src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java, src/main/resources/application*.yaml | mock_*_embeddings columns narrowed to vector(1024) with HNSW cosine indexes added. CorpusEmbeddingClient gains InputType enum and embedDocuments/embedQuery defaults. CohereCorpusEmbeddingClient.embed updated to accept InputType with HTTP timeouts. YAML profiles add allowed-root, jd-limit, and question-limit properties; defer-datasource-initialization set to true. |
| CorpusRetrievalService: pgvector similarity search with tiered fallback | src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusRetrievalService.java | New @Service with retrieveForAnalysis and retrieveForMockGeneration entry points. Embeds query text via CorpusEmbeddingClient, then runs three-tier fallback pgvector SQL (company+detail → detail-only → hierarchy) for both job posting and question corpora. Results mapped to nested RetrievedJobPostingReference/RetrievedQuestionReference records wrapped in RetrievalContext. A StatementBinder functional interface binds PreparedStatement parameters. |
| Corpus import, sync, classification, and admin hardening | src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java, src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java, src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java, src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java, src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/Mock*CorpusRepository.java, src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java, src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java, src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java | CorpusImportService adds per-import company cache, required-header validation, and robust integer parsing. CorpusEmbeddingSyncService switches to paginated repository queries, Spring DataSourceUtils connection management, and embedDocuments. CorpusClassificationResolver uses case-insensitive detail name lookup and validates inputs in registerMapping. CorpusAdminController adds allowed-root path traversal guard. Repositories add paginated/unpaginated validForEmbedding finders replacing embedding-text-filtered variants. CorpusAdminRunner wraps startup operations in try/catch. BootstrapAdminService logs missing users. |
| Analysis AI prompt enrichment and admin debug endpoint | src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java, src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAdminDebugService.java, src/main/java/com/jobdri/jobdri_api/domain/analysis/controller/AnalysisAdminController.java, src/main/java/com/jobdri/jobdri_api/domain/analysis/dto/request/AnalysisRetrievalPreviewRequest.java, src/main/java/com/jobdri/jobdri_api/domain/analysis/dto/response/AnalysisRetrievalPreviewResponse.java | AnalysisAiClient injects CorpusRetrievalService, retrieves RetrievalContext on each analyze call, and adds formatted similar-JD and similar-question sections to the OpenAI prompt. New AnalysisAdminDebugService exposes previewRetrieval building a full AnalysisRetrievalPreviewResponse from retrieval results. AnalysisAdminController adds POST /api/admin/analysis/retrieval-preview with validated request and OpenAPI docs. |
| Mock generation prompt enrichment, company context, and test updates | src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java, src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.java, src/test/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiServiceTest.java, src/test/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheServiceTest.java | JobPostingAiService replaces JobPostingRepository with CorpusRetrievalService, injects RetrievalContext into mock job posting and recommended question prompt builders, and updates fallback response construction to use RetrievedJobPostingReference fields. MockQuestionCacheService resolves Company by id and passes it to generateMockRecommendedQuestions. Tests updated to mock CorpusRetrievalService and stub retrieveForMockGeneration with RetrievalContext. |
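
As a rough illustration of the "InputType enum and embedDocuments/embedQuery defaults" item in the first row, the client could be shaped something like the sketch below. The nested `Client` interface, the enum constant names, and the `List<float[]>` return type are assumptions for the example and may not match the real `CorpusEmbeddingClient`. The strings `search_document` and `search_query` are the input type values Cohere's embed API uses.

```java
import java.util.ArrayList;
import java.util.List;

class EmbeddingClientSketch {

    /** Distinguishes corpus documents from search queries when embedding. */
    enum InputType {
        SEARCH_DOCUMENT("search_document"),
        SEARCH_QUERY("search_query");

        private final String apiValue;

        InputType(String apiValue) { this.apiValue = apiValue; }

        String apiValue() { return apiValue; }
    }

    /** Hypothetical client: one abstract method plus two convenience defaults. */
    interface Client {
        List<float[]> embed(List<String> texts, InputType inputType);

        default List<float[]> embedDocuments(List<String> texts) {
            return embed(texts, InputType.SEARCH_DOCUMENT);
        }

        default float[] embedQuery(String text) {
            return embed(List.of(text), InputType.SEARCH_QUERY).get(0);
        }
    }

    public static void main(String[] args) {
        // Fake client: stores the input type ordinal and text length so routing is visible.
        Client fake = (texts, type) -> {
            List<float[]> out = new ArrayList<>();
            for (String t : texts) {
                out.add(new float[] {type.ordinal(), t.length()});
            }
            return out;
        };
        float[] query = fake.embedQuery("backend engineer");
        System.out.println(query[0] + " " + query[1]); // prints 1.0 16.0
    }
}
```

Keeping a single abstract method means sync code and retrieval code cannot accidentally embed with the wrong input type as long as they call the matching default.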

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant AnalysisAiClient
  participant JobPostingAiService
  participant CorpusRetrievalService
  participant CorpusEmbeddingClient
  participant pgvector as pgvector DB

  rect rgba(100, 149, 237, 0.5)
    note over Client, AnalysisAiClient: Analysis flow
    Client->>AnalysisAiClient: analyze(jobPosting, questions)
    AnalysisAiClient->>CorpusRetrievalService: retrieveForAnalysis(jobPosting, questions)
  end

  rect rgba(144, 238, 144, 0.5)
    note over Client, JobPostingAiService: Mock generation flow
    Client->>JobPostingAiService: generateMockJobPosting(request, company)
    JobPostingAiService->>CorpusRetrievalService: retrieveForMockGeneration(company, detailClassification)
  end

  CorpusRetrievalService->>CorpusEmbeddingClient: embedQuery(queryText)
  CorpusEmbeddingClient-->>CorpusRetrievalService: float[] embedding
  CorpusRetrievalService->>pgvector: tier-1 query (company + detail)
  pgvector-->>CorpusRetrievalService: results or empty
  CorpusRetrievalService->>pgvector: tier-2 query (detail-only, if empty)
  pgvector-->>CorpusRetrievalService: results or empty
  CorpusRetrievalService->>pgvector: tier-3 query (hierarchy, if empty)
  pgvector-->>CorpusRetrievalService: RetrievalContext

  CorpusRetrievalService-->>AnalysisAiClient: RetrievalContext
  AnalysisAiClient->>AnalysisAiClient: buildPrompt with similar JD + question sections
  AnalysisAiClient-->>Client: AnalysisLlmResponse

  CorpusRetrievalService-->>JobPostingAiService: RetrievalContext
  JobPostingAiService->>JobPostingAiService: buildMockGenerationPrompt with reference texts
  JobPostingAiService-->>Client: MockJobPostingResponse
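
For the pgvector step in the diagram, a query embedding has to be passed to PostgreSQL in a form the `vector` type accepts. The sketch below shows one common approach: formatting the `float[]` as a `[v1,v2,...]` text literal and casting it with `?::vector`. The class, the `corpus_id` column, and the SQL shape are assumptions for illustration; the tier filters (company, detail classification, hierarchy) are omitted, and the PR's actual SQL is not shown in this conversation. `<=>` is pgvector's cosine distance operator.

```java
import java.util.StringJoiner;

class PgvectorQuerySketch {

    // Hypothetical query; only the table and embedding column names come from the review.
    static final String SIMILAR_JD_SQL =
            "SELECT corpus_id, 1 - (embedding <=> ?::vector) AS similarity "
            + "FROM mock_job_posting_embeddings "
            + "ORDER BY embedding <=> ?::vector "
            + "LIMIT ?";

    /** Formats a float[] as the text form pgvector accepts, for example [0.5,-1.0]. */
    static String toVectorLiteral(float[] embedding) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toVectorLiteral(new float[] {0.5f, -1.0f, 2.0f})); // prints [0.5,-1.0,2.0]
        System.out.println(SIMILAR_JD_SQL);
    }
}
```

The literal would be bound with `PreparedStatement.setString` for both `?::vector` placeholders, followed by the limit.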

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~100 minutes

Possibly related PRs

  • JobDri-Developer/BackEnd#94: Directly related — introduced the mock job posting/question embedding tables and Cohere embedding provider that this PR extends with InputType parameterization and HNSW indexes.
  • JobDri-Developer/BackEnd#61: Directly related — modified AnalysisAiClient prompt construction and AnalysisLlmResponse handling that this PR further extends with retrieval context injection.

Suggested labels

♻️ refactor

🐇 A bunny hops through the vector space,
finding similar JDs at lightning pace.
Three tiers of search, cosine and HNSW,
the prompt grows richer with each retrieval queue.
RAG is here — the rabbit cheers! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title references feature #38 about mock job posting with data, which aligns with the main objective of refactoring mock job posting generation using pgvector corpus retrieval. |
| Description check | ✅ Passed | The description fully addresses the template with feature selection checked, detailed technical implementation items listed with checkmarks, issue number referenced, but some template checkboxes (local testing, branch verification, labels) remain unchecked. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/#38-mock-jobposting-with-data


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/main/resources/schema.sql (1)

8-17: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Add an explicit migration path for existing deployments.

Line 8 and Line 17 only affect fresh table creation. Existing environments with pre-existing embedding columns will not be converted to vector(1024), so this change may not actually take effect where it matters.

🛠️ Suggested migration (apply with data-shape validation/backfill plan)
+-- one-time migration for existing environments (after handling non-1024 rows)
+ALTER TABLE mock_job_posting_embeddings
+    ALTER COLUMN embedding TYPE vector(1024) USING embedding::vector(1024);
+
+ALTER TABLE mock_question_embeddings
+    ALTER COLUMN embedding TYPE vector(1024) USING embedding::vector(1024);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/main/resources/schema.sql` around lines 8 - 17, The schema.sql changes on
lines 8 and 17 only affect new table creation and will not migrate existing
embedding columns in deployed environments to the vector(1024) type. Create an
explicit database migration script that converts the existing embedding columns
in the mock_job_posting_embeddings and mock_question_embeddings tables from
their current type to vector(1024), including proper data validation and backup
considerations for existing deployments. This migration should be separate from
schema.sql and should include appropriate checks to handle both fresh
installations and upgrades from previous versions.
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.java (1)

33-35: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Cache key is now incorrect for company-aware generation.

Lines [48]-[56] make generation company-dependent, but lines [33]-[35] and [40]-[42] still cache only by detailClassificationId + promptVersion. This can serve another company’s generated questions.

To fix, include companyId in the cache identity and repository lookup (entity + unique constraint/index + repository method + service call sites), and backfill/migrate existing cache rows accordingly.

Also applies to: 40-42, 48-56
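
A minimal sketch of the idea, assuming a plain in-memory map in place of the JPA entity and repository: once generation depends on the company, the company id has to be part of the cache identity. The `CacheKey` record and its field names are hypothetical and only illustrate the shape of the key.

```java
import java.util.HashMap;
import java.util.Map;

class CompanyAwareCacheKeySketch {

    /** Hypothetical cache identity that includes the company. */
    record CacheKey(long companyId, long detailClassificationId, String promptVersion) {}

    public static void main(String[] args) {
        Map<CacheKey, String> cache = new HashMap<>();
        cache.put(new CacheKey(1L, 10L, "v1"), "questions generated for company 1");

        // Same classification and prompt version, different company: must be a miss.
        System.out.println(cache.containsKey(new CacheKey(2L, 10L, "v1"))); // prints false
        // Same company, classification, and prompt version: hit.
        System.out.println(cache.containsKey(new CacheKey(1L, 10L, "v1"))); // prints true
    }
}
```

In the real service the equivalent change would be a unique constraint and repository lookup over the same three fields.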

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.java`
around lines 33 - 35, The cache lookup in the
findByDetailClassification_IdAndPromptVersion repository method uses only
detailClassificationId and promptVersion as the cache key, but the question
generation logic is now company-dependent (lines 48-56), which means different
companies could receive cached questions generated for another company. To fix
this, add companyId as a parameter to the cache entity's unique constraint and
index, update the repository method signature to include companyId in the query
(changing findByDetailClassification_IdAndPromptVersion to accept and filter by
companyId as well), update all call sites (lines 33-35 and 40-42) to pass
request.companyId() to the repository method, and handle backfill or migration
of existing cache rows that lack company association to ensure data consistency.
🧹 Nitpick comments (4)
src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java (1)

301-315: 💤 Low value

Locale-aware fallback parsing may behave unexpectedly.

DecimalFormat.getInstance() uses the default locale, which could interpret numbers differently based on server locale (e.g., German locale uses comma as decimal separator). Since commas are stripped earlier, this is less likely to cause issues, but if the server runs with an unexpected locale, parsing might fail silently or produce wrong values.

Consider using DecimalFormat.getInstance(Locale.US) for consistent behavior, or document the expected number format in the import specification.
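
A small sketch of the suggested fix, assuming the parsing flow described above (strip commas, try `Integer.parseInt`, then fall back to a number format). The method name and the null-on-failure behavior are assumptions. `NumberFormat.getInstance(Locale.US)` is the same static factory that `DecimalFormat.getInstance(Locale.US)` resolves to.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class LocaleSafeIntParsingSketch {

    /** Parses an integer cell, falling back to a fixed Locale.US number format. */
    static Integer parseInteger(String raw) {
        if (raw == null || raw.isBlank()) {
            return null;
        }
        String normalized = raw.replace(",", "").trim();
        try {
            return Integer.parseInt(normalized);
        } catch (NumberFormatException e) {
            try {
                // Fixed locale: "." is the decimal separator on every server.
                return NumberFormat.getInstance(Locale.US).parse(normalized).intValue();
            } catch (ParseException pe) {
                return null; // not a number at all
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parseInteger("1,234")); // prints 1234
        System.out.println(parseInteger("12.0"));  // prints 12
        System.out.println(parseInteger("abc"));   // prints null
    }
}
```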

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`
around lines 301 - 315, The DecimalFormat.getInstance() call uses the default
system locale, which can cause inconsistent number parsing across different
server environments. In the NumberFormatException catch block where
DecimalFormat.getInstance().parse(normalized) is called, replace
DecimalFormat.getInstance() with DecimalFormat.getInstance(Locale.US) to ensure
consistent parsing behavior regardless of the server's locale settings. You will
need to import java.util.Locale if it is not already imported.
src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java (1)

32-50: 💤 Low value

Consider removing throws Exception declaration.

Since both import and sync operations are now wrapped in try-catch blocks that log and continue, the throws Exception on run(ApplicationArguments args) is no longer necessary. It could be removed for clarity.

The decision to log and continue rather than fail startup is reasonable for optional bootstrap operations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java`
around lines 32 - 50, The run method in the CorpusAdminRunner class declares
throws Exception but this is no longer necessary since all exceptions are caught
and handled within try-catch blocks that log errors and allow execution to
continue. Remove the throws Exception declaration from the
run(ApplicationArguments args) method signature to accurately reflect that the
method does not propagate exceptions to its caller.
src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java (1)

60-65: 💤 Low value

Individual method @Transactional annotations are redundant.

syncJobPostingEmbeddings and syncQuestionEmbeddings are called from within syncAll, which is already @Transactional. The nested annotations don't create new transactions (default propagation is REQUIRED), so they're redundant when called internally. If these methods can be called independently, the annotations are appropriate; otherwise, they add noise.

Also applies to: 67-73, 75-81

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`
around lines 60 - 65, The `@Transactional` annotations on syncJobPostingEmbeddings
(lines 67-73) and syncQuestionEmbeddings (lines 75-81) are redundant when these
methods are called from the already-transactional syncAll method (line 60),
since the default REQUIRED propagation reuses the parent transaction rather than
creating a new one. Evaluate whether these two methods need to be callable
independently outside of syncAll; if they are only called internally by syncAll
and nowhere else, remove their `@Transactional` annotations to reduce noise; if
they are called independently elsewhere in the codebase, keep the annotations to
maintain transactionality in those contexts.
src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java (1)

61-85: 💤 Low value

Symlink resolution is not performed, which may be acceptable for this use case.

The path validation uses normalize() which handles .. segments but does not resolve symbolic links. If the file or any parent directory is a symlink, a malicious path like /allowed-root/symlink-to-outside/../secret could potentially escape. If high-security guarantees are required, consider using toRealPath() after verifying the file exists, or ensure the allowed-root itself cannot contain symlinks.

For an admin-only endpoint, the current implementation is likely sufficient.
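
If stricter guarantees are wanted, the `toRealPath()` variant could look roughly like this. It is a sketch under assumptions: the method signature is hypothetical, and `SecurityException` stands in for whatever exception type the controller actually throws.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class RealPathGuardSketch {

    /** Resolves symlinks on both the root and the target before the containment check. */
    static Path validateImportPath(Path allowedRoot, String requested) throws IOException {
        Path realRoot = allowedRoot.toRealPath();
        // toRealPath() throws NoSuchFileException when the target does not exist.
        Path realTarget = realRoot.resolve(requested).toRealPath();
        if (!realTarget.startsWith(realRoot)) {
            throw new SecurityException("Path escapes allowed root: " + requested);
        }
        return realTarget;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("allowed-root");
        Path file = Files.createFile(root.resolve("corpus.csv"));
        System.out.println(validateImportPath(root, "corpus.csv").equals(file.toRealPath())); // prints true
    }
}
```

Resolving the root as well matters on systems where the temp or data directory is itself reached through a symlink; otherwise a legitimate path could fail the `startsWith` check.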

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java`
around lines 61 - 85, The validateImportPath method uses normalize() which does
not resolve symbolic links, potentially allowing path traversal through symlink
exploitation. Enhance the path validation by using toRealPath() instead of or in
addition to normalize() to resolve symbolic links to their actual targets, and
verify that the resolved path starts with the allowed root path. Ensure the file
exists before calling toRealPath() to handle cases where the path does not yet
exist, or alternatively, add validation to ensure the allowed-root itself does
not contain symbolic links. This prevents malicious paths like symlinks to
outside directories from escaping the allowed import root in the
validateImportPath method.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java`:
- Around line 74-75: The formatJobPostingReferences and formatQuestionReferences
methods (and similar formatting methods used elsewhere in the prompt
construction) return full corpus text without size limits, which can cause
excessive token usage and OpenAI failures. Add truncation logic to cap the
maximum size of the text returned by these formatting methods before the
formatted strings are injected into prompts. Apply the same truncation approach
consistently across all reference formatting calls throughout the
AnalysisAiClient class to ensure no large corpus fields are passed untruncated
into any prompt templates.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java`:
- Around line 17-19: The two methods findAllByValidForEmbeddingTrueOrderByIdAsc
are missing the EmbeddingTextIsNotNull filtering condition, which allows records
with null embeddingText values to be returned and subsequently break embedding
batch requests. Reintroduce the null check for embeddingText by adding the
EmbeddingTextIsNotNull condition to both method signatures using Spring Data's
query method naming convention, so that only records with both validForEmbedding
set to true AND non-null embeddingText are returned. Apply this change to both
the paginated version (with Pageable parameter) and the non-paginated version.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`:
- Around line 268-274: The validateRequiredHeaders method throws
IllegalArgumentException for missing required headers, but this runtime
exception is not caught at the CorpusAdminController level and may propagate
unexpectedly to the global exception handler. Replace the
IllegalArgumentException with GeneralException using INVALID_PARAMETER status
code to ensure consistent error handling across the service, matching the
pattern used in other validation methods like
CorpusClassificationResolver.registerMapping.

In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusRetrievalService.java`:
- Around line 46-51: The method retrieveForMockGeneration builds a single
baseQuery string but then passes it to both findSimilarJobPostings and
findSimilarQuestions methods separately, causing each method to independently
embed the same query text and duplicate API calls. Refactor by computing the
embedding vector once from baseQuery before making the two method calls, then
overload findSimilarJobPostings and findSimilarQuestions to accept the
pre-computed float[] vector parameter in addition to the existing string-based
parameters. Update the calls on lines 49-50 to pass both the baseQuery string
(for backward compatibility or logging if needed) and the pre-computed embedding
vector to avoid duplicate embedding API calls.

In
`@src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java`:
- Line 84: Corpus retrieval calls are executing before protected AI/fallback
flow, causing transient failures to abort requests instead of degrading
gracefully. At
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L84,
wrap the corpusRetrievalService.retrieveForMockGeneration call in try-catch
logic that defaults to an empty RetrievalContext on failure. At
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L113,
apply the same guard/default pattern for the recommended-question generation
retrieval call. At
src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java#L35,
handle the retrieval failure locally within a try-catch block and continue with
empty references so analysis remains available even when retrieval fails.
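
The guard/default pattern from the last inline comment can be sketched as below. The `RetrievalContext` record here is a simplified stand-in for the PR's type, and `retrieveOrEmpty` is a hypothetical helper; the real services would wrap their own `corpusRetrievalService` calls and log the failure.

```java
import java.util.List;
import java.util.function.Supplier;

class RetrievalFallbackSketch {

    /** Simplified stand-in for the PR's RetrievalContext. */
    record RetrievalContext(List<String> jobPostings, List<String> questions) {
        static RetrievalContext empty() {
            return new RetrievalContext(List.of(), List.of());
        }
    }

    /** Runs the retrieval call and degrades to an empty context on failure. */
    static RetrievalContext retrieveOrEmpty(Supplier<RetrievalContext> retrieval) {
        try {
            RetrievalContext context = retrieval.get();
            return context != null ? context : RetrievalContext.empty();
        } catch (RuntimeException e) {
            // A real service would log the failure here before continuing.
            return RetrievalContext.empty();
        }
    }

    public static void main(String[] args) {
        RetrievalContext degraded = retrieveOrEmpty(() -> {
            throw new IllegalStateException("embedding API unavailable");
        });
        System.out.println(degraded.jobPostings().isEmpty()); // prints true
    }
}
```

With this shape, prompt builders receive empty reference lists when retrieval fails, so generation and analysis can still run without the similar-JD and similar-question sections.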


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 55a0ceac-1ad7-468d-b61d-036af4d37af2

📥 Commits

Reviewing files that changed from the base of the PR and between fe5cd60 and 31762f7.

📒 Files selected for processing (25)
  • src/main/java/com/jobdri/jobdri_api/domain/analysis/controller/AnalysisAdminController.java
  • src/main/java/com/jobdri/jobdri_api/domain/analysis/dto/request/AnalysisRetrievalPreviewRequest.java
  • src/main/java/com/jobdri/jobdri_api/domain/analysis/dto/response/AnalysisRetrievalPreviewResponse.java
  • src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAdminDebugService.java
  • src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java
  • src/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingClient.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java
  • src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusRetrievalService.java
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.java
  • src/main/resources/application-dev.yaml
  • src/main/resources/application-prod.yaml
  • src/main/resources/application.yaml
  • src/main/resources/schema.sql
  • src/test/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiServiceTest.java
  • src/test/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheServiceTest.java

validateMiddleClassification(request, detailClassification);

List<JobPosting> referencePostings = findMockReferencePostings(request, company);
RetrievalContext retrievalContext = corpusRetrievalService.retrieveForMockGeneration(company, detailClassification);


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Shared resilience gap: retrieval calls run outside guarded execution paths. The same root cause appears in both services: corpus retrieval executes before the protected AI/fallback flow, so transient retrieval failures abort requests instead of degrading gracefully.

  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L84-L84: wrap retrieval in guarded logic and default to empty RetrievalContext on failure.
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L113-L113: apply the same guard/default pattern for recommended-question generation.
  • src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java#L35-L35: handle retrieval failures locally and continue with empty references so analysis remains available.
📍 Affects 2 files
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L84-L84 (this comment)
  • src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L113-L113
  • src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java#L35-L35
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java`
at line 84, Corpus retrieval calls are executing before protected AI/fallback
flow, causing transient failures to abort requests instead of degrading
gracefully. At
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L84,
wrap the corpusRetrievalService.retrieveForMockGeneration call in try-catch
logic that defaults to an empty RetrievalContext on failure. At
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L113,
apply the same guard/default pattern for the recommended-question generation
retrieval call. At
src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java#L35,
handle the retrieval failure locally within a try-catch block and continue with
empty references so analysis remains available even when retrieval fails.
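
The guard/default pattern described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the `RetrievalContext` record here is a hypothetical stand-in (the real class's fields are not shown in this review), and `retrieveOrEmpty` is an invented helper name.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; the names below are assumptions for illustration only.
class GuardedRetrieval {

    // Stand-in for the PR's RetrievalContext, reduced to two reference lists.
    record RetrievalContext(List<String> similarJobPostings, List<String> similarQuestions) {
        static RetrievalContext empty() {
            return new RetrievalContext(List.of(), List.of());
        }
    }

    // Runs the retrieval call and falls back to an empty context on failure,
    // so the caller can continue into its AI/fallback flow without references.
    static RetrievalContext retrieveOrEmpty(Supplier<RetrievalContext> retrieval) {
        try {
            RetrievalContext context = retrieval.get();
            return context != null ? context : RetrievalContext.empty();
        } catch (RuntimeException e) {
            // A real service would use its logger here instead of stderr.
            System.err.println("Corpus retrieval failed, continuing without references: " + e.getMessage());
            return RetrievalContext.empty();
        }
    }
}
```

A call site might then read `retrieveOrEmpty(() -> corpusRetrievalService.retrieveForMockGeneration(company, detailClassification))`, which keeps the three affected locations on the same fallback behavior. Catching `RuntimeException` broadly is a design choice that favors availability; a narrower exception type may be preferable if retrieval errors should be distinguished from programming errors.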


Labels

✨ feat New feature or request
