Feat/#38 mock jobposting with data#96
Conversation
|
Warning Review limit reached
More reviews will be available in 30 minutes and 40 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (13)
📝 WalkthroughWalkthroughIntroduces ChangesRAG Corpus Retrieval Integration
Sequence Diagram(s)sequenceDiagram
participant Client
participant AnalysisAiClient
participant JobPostingAiService
participant CorpusRetrievalService
participant CorpusEmbeddingClient
participant pgvector as pgvector DB
rect rgba(100, 149, 237, 0.5)
note over Client, AnalysisAiClient: Analysis flow
Client->>AnalysisAiClient: analyze(jobPosting, questions)
AnalysisAiClient->>CorpusRetrievalService: retrieveForAnalysis(jobPosting, questions)
end
rect rgba(144, 238, 144, 0.5)
note over Client, JobPostingAiService: Mock generation flow
Client->>JobPostingAiService: generateMockJobPosting(request, company)
JobPostingAiService->>CorpusRetrievalService: retrieveForMockGeneration(company, detailClassification)
end
CorpusRetrievalService->>CorpusEmbeddingClient: embedQuery(queryText)
CorpusEmbeddingClient-->>CorpusRetrievalService: float[] embedding
CorpusRetrievalService->>pgvector: tier-1 query (company + detail)
pgvector-->>CorpusRetrievalService: results or empty
CorpusRetrievalService->>pgvector: tier-2 query (detail-only, if empty)
pgvector-->>CorpusRetrievalService: results or empty
CorpusRetrievalService->>pgvector: tier-3 query (hierarchy, if empty)
pgvector-->>CorpusRetrievalService: RetrievalContext
CorpusRetrievalService-->>AnalysisAiClient: RetrievalContext
AnalysisAiClient->>AnalysisAiClient: buildPrompt with similar JD + question sections
AnalysisAiClient-->>Client: AnalysisLlmResponse
CorpusRetrievalService-->>JobPostingAiService: RetrievalContext
JobPostingAiService->>JobPostingAiService: buildMockGenerationPrompt with reference texts
JobPostingAiService-->>Client: MockJobPostingResponse
Estimated code review effort🎯 5 (Critical) | ⏱️ ~100 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/main/resources/schema.sql (1)
8-17:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftAdd an explicit migration path for existing deployments.
Line 8 and Line 17 only affect fresh table creation. Existing environments with pre-existing
embeddingcolumns will not be converted tovector(1024), so this change may not actually take effect where it matters.🛠️ Suggested migration (apply with data-shape validation/backfill plan)
+-- one-time migration for existing environments (after handling non-1024 rows) +ALTER TABLE mock_job_posting_embeddings + ALTER COLUMN embedding TYPE vector(1024) USING embedding::vector(1024); + +ALTER TABLE mock_question_embeddings + ALTER COLUMN embedding TYPE vector(1024) USING embedding::vector(1024);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/resources/schema.sql` around lines 8 - 17, The schema.sql changes on lines 8 and 17 only affect new table creation and will not migrate existing embedding columns in deployed environments to the vector(1024) type. Create an explicit database migration script that converts the existing embedding columns in the mock_question_corpus and mock_question_embeddings tables from their current type to vector(1024), including proper data validation and backup considerations for existing deployments. This migration should be separate from schema.sql and should include appropriate checks to handle both fresh installations and upgrades from previous versions.src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.java (1)
33-35:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftCache key is now incorrect for company-aware generation.
Lines [48]-[56] make generation company-dependent, but lines [33]-[35] and [40]-[42] still cache only by
detailClassificationId + promptVersion. This can serve another company’s generated questions.To fix, include
companyIdin the cache identity and repository lookup (entity + unique constraint/index + repository method + service call sites), and backfill/migrate existing cache rows accordingly.Also applies to: 40-42, 48-56
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.java` around lines 33 - 35, The cache lookup in the findByDetailClassification_IdAndPromptVersion repository method uses only detailClassificationId and promptVersion as the cache key, but the question generation logic is now company-dependent (lines 48-56), which means different companies could receive cached questions generated for another company. To fix this, add companyId as a parameter to the cache entity's unique constraint and index, update the repository method signature to include companyId in the query (changing findByDetailClassification_IdAndPromptVersion to accept and filter by companyId as well), update all call sites (lines 33-35 and 40-42) to pass request.companyId() to the repository method, and handle backfill or migration of existing cache rows that lack company association to ensure data consistency.
🧹 Nitpick comments (4)
src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java (1)
301-315: 💤 Low valueLocale-aware fallback parsing may behave unexpectedly.
DecimalFormat.getInstance()uses the default locale, which could interpret numbers differently based on server locale (e.g., German locale uses comma as decimal separator). Since commas are stripped earlier, this is less likely to cause issues, but if the server runs with an unexpected locale, parsing might fail silently or produce wrong values.Consider using
DecimalFormat.getInstance(Locale.US)for consistent behavior, or document the expected number format in the import specification.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java` around lines 301 - 315, The DecimalFormat.getInstance() call uses the default system locale, which can cause inconsistent number parsing across different server environments. In the NumberFormatException catch block where DecimalFormat.getInstance().parse(normalized) is called, replace DecimalFormat.getInstance() with DecimalFormat.getInstance(Locale.US) to ensure consistent parsing behavior regardless of the server's locale settings. You will need to import java.util.Locale if it is not already imported.src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java (1)
32-50: 💤 Low valueConsider removing
throws Exceptiondeclaration.Since both import and sync operations are now wrapped in try-catch blocks that log and continue, the
throws Exceptiononrun(ApplicationArguments args)is no longer necessary. It could be removed for clarity.The decision to log and continue rather than fail startup is reasonable for optional bootstrap operations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java` around lines 32 - 50, The run method in the CorpusAdminRunner class declares throws Exception but this is no longer necessary since all exceptions are caught and handled within try-catch blocks that log errors and allow execution to continue. Remove the throws Exception declaration from the run(ApplicationArguments args) method signature to accurately reflect that the method does not propagate exceptions to its caller.src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java (1)
60-65: 💤 Low valueIndividual method
@Transactionalannotations are redundant.
syncJobPostingEmbeddingsandsyncQuestionEmbeddingsare called from withinsyncAll, which is already@Transactional. The nested annotations don't create new transactions (default propagation is REQUIRED), so they're redundant when called internally. If these methods can be called independently, the annotations are appropriate; otherwise, they add noise.Also applies to: 67-73, 75-81
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java` around lines 60 - 65, The `@Transactional` annotations on syncJobPostingEmbeddings (lines 67-73) and syncQuestionEmbeddings (lines 75-81) are redundant when these methods are called from the already-transactional syncAll method (line 60), since the default REQUIRED propagation reuses the parent transaction rather than creating a new one. Evaluate whether these two methods need to be callable independently outside of syncAll; if they are only called internally by syncAll and nowhere else, remove their `@Transactional` annotations to reduce noise; if they are called independently elsewhere in the codebase, keep the annotations to maintain transactionality in those contexts.src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java (1)
61-85: 💤 Low valueSymlink resolution is not performed, which may be acceptable for this use case.
The path validation uses
normalize()which handles..segments but does not resolve symbolic links. If the file or any parent directory is a symlink, a malicious path like/allowed-root/symlink-to-outside/../secretcould potentially escape. If high-security guarantees are required, consider usingtoRealPath()after verifying the file exists, or ensure the allowed-root itself cannot contain symlinks.For an admin-only endpoint, the current implementation is likely sufficient.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java` around lines 61 - 85, The validateImportPath method uses normalize() which does not resolve symbolic links, potentially allowing path traversal through symlink exploitation. Enhance the path validation by using toRealPath() instead of or in addition to normalize() to resolve symbolic links to their actual targets, and verify that the resolved path starts with the allowed root path. Ensure the file exists before calling toRealPath() to handle cases where the path does not yet exist, or alternatively, add validation to ensure the allowed-root itself does not contain symbolic links. This prevents malicious paths like symlinks to outside directories from escaping the allowed import root in the validateImportPath method.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java`:
- Around line 74-75: The formatJobPostingReferences and formatQuestionReferences
methods (and similar formatting methods used elsewhere in the prompt
construction) return full corpus text without size limits, which can cause
excessive token usage and OpenAI failures. Add truncation logic to cap the
maximum size of the text returned by these formatting methods before the
formatted strings are injected into prompts. Apply the same truncation approach
consistently across all reference formatting calls throughout the
AnalysisAiClient class to ensure no large corpus fields are passed untruncated
into any prompt templates.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.java`:
- Around line 17-19: The two methods findAllByValidForEmbeddingTrueOrderByIdAsc
are missing the EmbeddingTextIsNotNull filtering condition, which allows records
with null embeddingText values to be returned and subsequently break embedding
batch requests. Reintroduce the null check for embeddingText by adding the
EmbeddingTextIsNotNull condition to both method signatures using Spring Data's
query method naming convention, so that only records with both validForEmbedding
set to true AND non-null embeddingText are returned. Apply this change to both
the paginated version (with Pageable parameter) and the non-paginated version.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`:
- Around line 268-274: The validateRequiredHeaders method throws
IllegalArgumentException for missing required headers, but this runtime
exception is not caught at the CorpusAdminController level and may propagate
unexpectedly to the global exception handler. Replace the
IllegalArgumentException with GeneralException using INVALID_PARAMETER status
code to ensure consistent error handling across the service, matching the
pattern used in other validation methods like
CorpusClassificationResolver.registerMapping.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusRetrievalService.java`:
- Around line 46-51: The method retrieveForMockGeneration builds a single
baseQuery string but then passes it to both findSimilarJobPostings and
findSimilarQuestions methods separately, causing each method to independently
embed the same query text and duplicate API calls. Refactor by computing the
embedding vector once from baseQuery before making the two method calls, then
overload findSimilarJobPostings and findSimilarQuestions to accept the
pre-computed float[] vector parameter in addition to the existing string-based
parameters. Update the calls on lines 49-50 to pass both the baseQuery string
(for backward compatibility or logging if needed) and the pre-computed embedding
vector to avoid duplicate embedding API calls.
In
`@src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java`:
- Line 84: Corpus retrieval calls are executing before protected AI/fallback
flow, causing transient failures to abort requests instead of degrading
gracefully. At
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L84,
wrap the corpusRetrievalService.retrieveForMockGeneration call in try-catch
logic that defaults to an empty RetrievalContext on failure. At
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L113,
apply the same guard/default pattern for the recommended-question generation
retrieval call. At
src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java#L35,
handle the retrieval failure locally within a try-catch block and continue with
empty references so analysis remains available even when retrieval fails.
---
Outside diff comments:
In
`@src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.java`:
- Around line 33-35: The cache lookup in the
findByDetailClassification_IdAndPromptVersion repository method uses only
detailClassificationId and promptVersion as the cache key, but the question
generation logic is now company-dependent (lines 48-56), which means different
companies could receive cached questions generated for another company. To fix
this, add companyId as a parameter to the cache entity's unique constraint and
index, update the repository method signature to include companyId in the query
(changing findByDetailClassification_IdAndPromptVersion to accept and filter by
companyId as well), update all call sites (lines 33-35 and 40-42) to pass
request.companyId() to the repository method, and handle backfill or migration
of existing cache rows that lack company association to ensure data consistency.
In `@src/main/resources/schema.sql`:
- Around line 8-17: The schema.sql changes on lines 8 and 17 only affect new
table creation and will not migrate existing embedding columns in deployed
environments to the vector(1024) type. Create an explicit database migration
script that converts the existing embedding columns in the mock_question_corpus
and mock_question_embeddings tables from their current type to vector(1024),
including proper data validation and backup considerations for existing
deployments. This migration should be separate from schema.sql and should
include appropriate checks to handle both fresh installations and upgrades from
previous versions.
---
Nitpick comments:
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.java`:
- Around line 61-85: The validateImportPath method uses normalize() which does
not resolve symbolic links, potentially allowing path traversal through symlink
exploitation. Enhance the path validation by using toRealPath() instead of or in
addition to normalize() to resolve symbolic links to their actual targets, and
verify that the resolved path starts with the allowed root path. Ensure the file
exists before calling toRealPath() to handle cases where the path does not yet
exist, or alternatively, add validation to ensure the allowed-root itself does
not contain symbolic links. This prevents malicious paths like symlinks to
outside directories from escaping the allowed import root in the
validateImportPath method.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.java`:
- Around line 32-50: The run method in the CorpusAdminRunner class declares
throws Exception but this is no longer necessary since all exceptions are caught
and handled within try-catch blocks that log errors and allow execution to
continue. Remove the throws Exception declaration from the
run(ApplicationArguments args) method signature to accurately reflect that the
method does not propagate exceptions to its caller.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.java`:
- Around line 60-65: The `@Transactional` annotations on syncJobPostingEmbeddings
(lines 67-73) and syncQuestionEmbeddings (lines 75-81) are redundant when these
methods are called from the already-transactional syncAll method (line 60),
since the default REQUIRED propagation reuses the parent transaction rather than
creating a new one. Evaluate whether these two methods need to be callable
independently outside of syncAll; if they are only called internally by syncAll
and nowhere else, remove their `@Transactional` annotations to reduce noise; if
they are called independently elsewhere in the codebase, keep the annotations to
maintain transactionality in those contexts.
In
`@src/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.java`:
- Around line 301-315: The DecimalFormat.getInstance() call uses the default
system locale, which can cause inconsistent number parsing across different
server environments. In the NumberFormatException catch block where
DecimalFormat.getInstance().parse(normalized) is called, replace
DecimalFormat.getInstance() with DecimalFormat.getInstance(Locale.US) to ensure
consistent parsing behavior regardless of the server's locale settings. You will
need to import java.util.Locale if it is not already imported.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 55a0ceac-1ad7-468d-b61d-036af4d37af2
📒 Files selected for processing (25)
src/main/java/com/jobdri/jobdri_api/domain/analysis/controller/AnalysisAdminController.javasrc/main/java/com/jobdri/jobdri_api/domain/analysis/dto/request/AnalysisRetrievalPreviewRequest.javasrc/main/java/com/jobdri/jobdri_api/domain/analysis/dto/response/AnalysisRetrievalPreviewResponse.javasrc/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAdminDebugService.javasrc/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.javasrc/main/java/com/jobdri/jobdri_api/domain/classification/repository/DetailClassificationRepository.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/controller/CorpusAdminController.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockJobPostingCorpusRepository.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/repository/MockQuestionCorpusRepository.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/BootstrapAdminService.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CohereCorpusEmbeddingClient.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusAdminRunner.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusClassificationResolver.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingClient.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusEmbeddingSyncService.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusImportService.javasrc/main/java/com/jobdri/jobdri_api/domain/corpus/service/CorpusRetrievalService.javasrc/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.javasrc/main/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheService.javasrc/main/resources/application-dev.yamlsrc/main/resources/application-prod.yamlsrc/main/resources/application.yamlsrc/main/resources/schema.sqlsrc/test/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiServiceTest.javasrc/test/java/com/jobdri/jobdri_api/domain/jobposting/service/MockQuestionCacheServiceTest.java
| validateMiddleClassification(request, detailClassification); | ||
|
|
||
| List<JobPosting> referencePostings = findMockReferencePostings(request, company); | ||
| RetrievalContext retrievalContext = corpusRetrievalService.retrieveForMockGeneration(company, detailClassification); |
There was a problem hiding this comment.
Shared resilience gap: retrieval calls run outside guarded execution paths. The same root cause appears in both services: corpus retrieval executes before the protected AI/fallback flow, so transient retrieval failures abort requests instead of degrading gracefully.
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L84-L84: wrap retrieval in guarded logic and default to emptyRetrievalContexton failure.src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L113-L113: apply the same guard/default pattern for recommended-question generation.src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java#L35-L35: handle retrieval failures locally and continue with empty references so analysis remains available.
📍 Affects 2 files
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L84-L84(this comment)src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L113-L113src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java#L35-L35
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java`
at line 84, Corpus retrieval calls are executing before protected AI/fallback
flow, causing transient failures to abort requests instead of degrading
gracefully. At
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L84,
wrap the corpusRetrievalService.retrieveForMockGeneration call in try-catch
logic that defaults to an empty RetrievalContext on failure. At
src/main/java/com/jobdri/jobdri_api/domain/jobposting/service/JobPostingAiService.java#L113,
apply the same guard/default pattern for the recommended-question generation
retrieval call. At
src/main/java/com/jobdri/jobdri_api/domain/analysis/service/AnalysisAiClient.java#L35,
handle the retrieval failure locally within a try-catch block and continue with
empty references so analysis remains available even when retrieval fails.
✨ 어떤 이유로 PR를 하셨나요?
📋 세부 내용 - 왜 해당 PR이 필요한지 작업 내용을 자세하게 설명해주세요
CorpusRetrievalService로 분리했습니다.jobPosting/선택 직무는 query 생성 기준으로만 사용하도록 정리했습니다.회사명 + 직무(detailClassification)직무(detailClassification)대분류 + 중분류(job_group_l1 + job_family_l2)job_postings참조 대신 pgvector corpus retrieval 결과를 참고하도록 변경했습니다.📸 작업 화면 스크린샷
🚨 관련 이슈 번호 [#38]
Summary by CodeRabbit
New Features
Bug Fixes
Refactor