Refactor/infra separation #58
Conversation
📝 Walkthrough
The PR refactors core and VM state management by introducing a …

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
service/vm.go (2)
38-183: ⚠️ Potential issue | 🟠 Major: TOCTOU on core selection can oversubscribe a core and underflow `FreeMemory`/`FreeCPU`/`FreeDisk` (`uint32`).

The suitability check at lines 40-77 runs under `RLock()`. The lock is released at line 79, then CMS and Guacamole network calls happen (lines 140-160), and only at line 174 is `Lock()` re-acquired to decrement `selectedCore.Free*`. Nothing revalidates that `selectedCore` still has capacity at that point.

If two `CreateVM` goroutines both pick the same core under `RLock` (each seeing enough headroom for its own request, but not for both combined), both will reach lines 179-181 and subtract, pushing `Free*` below zero. Because `FreeMemory`, `FreeCPU`, `FreeDisk` are `uint32`, the subtraction wraps around to ~4 G, after which the core effectively "has infinite free resources" and will be chosen by every subsequent request.

Recommended fix: do select-and-reserve atomically under a single write lock, validate capacity again under that lock, and only then release to perform network calls. On any later failure, `cleanup()` already restores the counters — that part is fine.

🔒 Sketch: reserve under a single write lock with re-validation
```diff
-	// RLock because we only read shared state such as Cores/FreeMemory (concurrent reads allowed, writes blocked)
-	contextStruct.Resources.RLock()
-	for i := range contextStruct.Resources.Cores {
-		core := &contextStruct.Resources.Cores[i]
+	// Reserve-on-select: take the write lock, pick a core, and deduct Free* atomically
+	// so two concurrent CreateVM calls can't double-book the same capacity.
+	contextStruct.Resources.Lock()
+	for i := range contextStruct.Resources.Cores {
+		core := &contextStruct.Resources.Cores[i]
 		...
 		if memoryOk && cpuOk && diskOk {
 			selectedCore = core
 			selectedCoreIndex = i
-			log.DebugInfo("core found: %s", selectedCore.IP)
 			break
 		}
 	}
-	contextStruct.Resources.RUnlock()
+	if selectedCore != nil {
+		selectedCore.FreeMemory -= req.HardwareInfo.Memory
+		selectedCore.FreeCPU -= req.HardwareInfo.CPU
+		selectedCore.FreeDisk -= req.HardwareInfo.Disk
+		coreResourcesAllocated = true
+	}
+	contextStruct.Resources.Unlock()
```
Then lines 174-183 only need to set `selectedCore.VMInfoIdx[uuid] = newVM` (still under `Lock()`), since the counters were already reserved up front. `cleanup()` already refunds them on failure.

Separately, the "no suitable core" logging path at lines 82-96 reads `Resources.Cores` after `RUnlock()` with no lock held; under the reserve-on-select approach above, that diagnostic can stay inside the held write lock (or snapshot what it needs before releasing).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@service/vm.go` around lines 38 - 183, The current selection under contextStruct.Resources.RLock then releasing before network calls causes a TOCTOU: move the selection-and-reservation into a single write-critical section by using contextStruct.Resources.Lock() when choosing selectedCore so you revalidate memory/cpu/disk (FreeMemory/FreeCPU/FreeDisk) and immediately decrement/reserve those counters and set selectedCore.VMInfoIdx[uuid] before releasing the lock; keep the long CMS/Guacamole calls (cmsClient.AddCmsSubnet/NewCmsSubnet, guacamole.Configure) outside the lock, and ensure cleanup() still refunds counters on failure; also capture any diagnostic info about alive cores while the lock is held (instead of reading Resources.Cores after RUnlock) so logs remain consistent.
219-230: ⚠️ Potential issue | 🟠 Major: `VMRepo.AddInstance` failure path logs but does not return.

On DB insert failure, `cleanup()` runs but execution continues to the `newSubnetAllocated` branch, the Redis/HTTP side effects stay in place, and we then reach the `Lock()`-guarded block at 234-237 that adds the VM to `VMLocation` and `AliveVM` — producing an in-memory entry for a VM that is not persisted in the DB. A subsequent `FindCoreByVmUUID` (which now hits the DB via `VMRepo.GetInstanceLocation`) will not find it, so `DeleteVM`/`StartVM` will appear broken for this UUID.

Propagate the error and return after cleanup:
```diff
 	err = contextStruct.VMRepo.AddInstance(newVM, selectedCoreIndex)
 	if err != nil {
 		log.Error("Error database instance insertion failed: %v", err, true)
 		cleanup() // don't clean pieces up by hand; this one function handles it
+		return fmt.Errorf("CreateVM: failed to persist instance: %w", err)
 	}
```

The same pattern applies to the `UPDATE subnet ...` failure at line 228 — currently only logged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@service/vm.go` around lines 219 - 230, On failure of VMRepo.AddInstance(newVM, selectedCoreIndex) call, call cleanup() as already done but then propagate the error back immediately (return the error) so execution does not continue into the newSubnetAllocated branch or the Lock()-guarded block that inserts into VMLocation and AliveVM; similarly, if contextStruct.DB.Exec("UPDATE subnet ...") fails when newSubnetAllocated is true, perform cleanup (if not already), log the error, and return the error instead of proceeding. Locate the VMRepo.AddInstance call, the cleanup() invocation, the newSubnetAllocated conditional that runs DB Exec, and the Lock()-guarded code that updates VMLocation and AliveVM and ensure both failure paths return after cleanup so in-memory state is not added for a non-persistent VM.

startup/init.go (1)
189-241: ⚠️ Potential issue | 🟠 Major: Goroutines capture a pointer into `Resources.Cores` that a subsequent `append` can invalidate, and they mutate core state without a lock.

Two related problems in this loop:
- Line 196 `append(infra.Resources.Cores, newCore)` may reallocate the backing array. Line 197 then takes `&infra.Resources.Cores[len-1]` and line 203 captures it as `currentCore` inside the goroutine closure. The next iteration's `append` can re-allocate again, at which point `currentCore` in already-started goroutines points at the old, detached backing array — all writes on lines 209, 214, 231-237 are lost, and `findCore` on subsequent iterations may also return a stale pointer.
- Even without reallocation, concurrent goroutines write `currentCore.IsAlive`, `CoreInfoIdx.*`, `FreeMemory`, `FreeCPU`, `FreeDisk` without holding `Resources.Lock()`, while `service/vm.go` reads these fields under `Resources.RLock()` — technically a data race, though it only races with anything that touches `Resources` during startup.
appends first, then spawn goroutines that reference cores by index (&infra.Resources.Cores[idx]), and/or have each goroutine return its collected info and apply it afterg.Wait()underResources.Lock().🔒 Sketch of a fix (index-based + post-wait apply)
```go
type coreUpdate struct {
	idx       int
	mem, disk client.MemoryOrDiskResp // whatever your types are
	alive     bool
	err       error
}

// Phase 1: build the slice synchronously.
type pending struct{ idx int }
var pendings []pending
for _, coreAddress := range coreAddresses {
	// ... parse ip/port ...
	if c := findCore(infra.Resources.Cores, ip, uint16(port)); c == nil {
		infra.Resources.Cores = append(infra.Resources.Cores,
			structure.Core{IP: ip, Port: uint16(port), IsAlive: true})
		pendings = append(pendings, pending{idx: len(infra.Resources.Cores) - 1})
	} else {
		c.IsAlive = true
		// compute idx for pendings
	}
}

// Phase 2: fetch concurrently, apply afterwards.
updates := make([]coreUpdate, len(pendings))
g, ctx := errgroup.WithContext(context.Background())
for i, p := range pendings {
	i, p := i, p
	g.Go(func() error {
		// call core APIs, fill updates[i]
		return nil
	})
}
if err := g.Wait(); err != nil { /* ... */ }

infra.Resources.Lock()
for i, u := range updates {
	c := &infra.Resources.Cores[pendings[i].idx]
	// apply fields
	_ = u
	_ = c
}
infra.Resources.Unlock()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@startup/init.go` around lines 189 - 241, The goroutines capture pointers into infra.Resources.Cores that can be invalidated by append and also mutate core fields without holding Resources.Lock(), causing lost updates and data races; fix by separating the phases: first ensure all cores are appended/located synchronously (use findCore and append to infra.Resources.Cores), record each core's index (not its address) into a pendings slice, then launch g.Go workers that fetch data (via coreClient.GetCoreMachineMemoryInfo / GetCoreMachineDiskInfo) but return results (or populate an updates slice keyed by index) instead of mutating the core directly; after g.Wait(), acquire infra.Resources.Lock() and apply updates to &infra.Resources.Cores[idx] (set IsAlive, CoreInfoIdx, FreeMemory/FreeDisk/FreeCPU) to avoid pointer invalidation and races.
🧹 Nitpick comments (3)
tests/docker-compose.test.yml (1)
43-43: Avoid masking startup crashes in the integration target.
`restart: on-failure:3` can let `control-test` pass readiness after earlier crash-on-start failures. For a blackbox regression test, prefer failing fast, or assert the container restart count is zero at the end of the test run.

♻️ Proposed fail-fast option
```diff
-    restart: on-failure:3
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/docker-compose.test.yml` at line 43, The restart policy "restart: on-failure:3" masks startup crashes for the integration target (control-test); change the service definition to fail fast by removing the restart line or setting it to "restart: \"no\"" for the control-test service, or alternatively add a post-run assertion that the container restart count is zero; locate the entry containing the literal "restart: on-failure:3" and replace it with the fail-fast option or add the restart-count assertion in the integration test teardown.

structure/repository.go (1)
5-12: Optional: consider returning a struct from `GetAllInstanceInfo` instead of parallel slices.
`([]VMInfo, []int, error)` requires the caller to keep the two slices in lockstep by index (see `startup/init.go` lines 252-253). A small value type, e.g. `type InstanceLocation struct { Info VMInfo; CoreIdx int }`, returning `([]InstanceLocation, error)`, removes the implicit length-matching invariant and is slightly less error-prone.

Also worth documenting the expected error semantics for `GetInstance`/`GetInstanceLocation` when the UUID is not found (sentinel error like `ErrNotFound`?), since callers such as `FindCoreByVmUUID` currently just treat any error as "not found" and log it as an error — which can be noisy for legitimate misses.
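A minimal sketch of the suggested shape (`InstanceLocation` and `ErrNotFound` are illustrative names, not part of the current code):

```go
package structure

import "errors"

// ErrNotFound is a hypothetical sentinel error so callers can tell
// "UUID not in the DB" apart from real query failures.
var ErrNotFound = errors.New("vm repository: instance not found")

// InstanceLocation pairs a VM's info with the index of the core it runs on,
// replacing the parallel []VMInfo / []int return values.
type InstanceLocation struct {
	Info    VMInfo
	CoreIdx int
}

// Only the signatures discussed above are shown; the real interface has more methods.
type VMRepository interface {
	GetInstance(uuid UUID) (*VMInfo, error)          // would return ErrNotFound on a miss
	GetInstanceLocation(uuid UUID) (int, error)      // would return ErrNotFound on a miss
	GetAllInstanceInfo() ([]InstanceLocation, error) // one slice instead of two
}
```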
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@structure/repository.go` around lines 5 - 12, GetAllInstanceInfo currently returns parallel slices ([]VMInfo, []int, error) which forces callers to keep them in lockstep; change the signature of VMRepository.GetAllInstanceInfo to return a single slice of a small struct (e.g., type InstanceLocation struct { Info VMInfo; CoreIdx int }) so callers consume []InstanceLocation, update all implementations and call sites (e.g., FindCoreByVmUUID / startup init usage) accordingly, and document/update the behaviour of GetInstance and GetInstanceLocation to return a clear sentinel error (e.g., ErrNotFound) for missing UUIDs so callers can distinguish "not found" from real errors and avoid noisy logging.
structure/resource_manager.go (1)

7-23: Consider encapsulating access instead of leaking `Lock`/`Unlock` and raw fields.

Exposing both the mutex (via `Lock`/`Unlock`/`RLock`/`RUnlock`) and the underlying fields (`Cores`, `AliveVM`, `VMLocation`) means every call site must remember to acquire the correct lock before reading/writing. This is already visible in `main.go` (unlocked read of `Resources.Cores`) and in the `service/vm.go` "no suitable core" logging path, where `len(contextStruct.Resources.Cores)` / iteration happens after `RUnlock()`.

A safer shape long-term: keep `mu` private, keep fields unexported, and provide small helpers like `WithRead(func(...))`, `AddCore`, `RegisterVM(uuid, core, vm)`, `UnregisterVM(uuid)`, `AllocateOnCore(coreIdx, mem, cpu, disk) error`, etc., so that locking is guaranteed at the boundary and underflow/oversubscription on `Free*` can be enforced in one place.

Not a blocker for this PR given the stated "follow-up refactoring", but worth tracking.
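A minimal sketch of the encapsulated shape suggested here (method names such as `WithRead` and `AllocateOnCore` are illustrative, not existing API):

```go
package structure

import (
	"fmt"
	"sync"
)

// ResourceManager with private state: all access goes through methods that
// take the lock internally, so callers can no longer forget it.
type ResourceManager struct {
	mu         sync.RWMutex
	cores      []Core
	aliveVM    map[UUID]*VMInfo
	vmLocation map[UUID]*Core
}

// WithRead runs fn under the read lock.
func (r *ResourceManager) WithRead(fn func(cores []Core)) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	fn(r.cores)
}

// AllocateOnCore reserves capacity on one core, failing instead of underflowing.
func (r *ResourceManager) AllocateOnCore(coreIdx int, mem, cpu, disk uint32) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if coreIdx < 0 || coreIdx >= len(r.cores) {
		return fmt.Errorf("core index %d out of range", coreIdx)
	}
	c := &r.cores[coreIdx]
	if c.FreeMemory < mem || c.FreeCPU < cpu || c.FreeDisk < disk {
		return fmt.Errorf("core %d: insufficient free resources", coreIdx)
	}
	c.FreeMemory -= mem
	c.FreeCPU -= cpu
	c.FreeDisk -= disk
	return nil
}
```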
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@structure/resource_manager.go` around lines 7 - 23, The ResourceManager currently exposes Lock/Unlock/RLock/RUnlock and exported fields Cores, AliveVM, VMLocation which leaks synchronization responsibility to callers; change the API to encapsulate state: make the fields unexported (cores, aliveVM, vmLocation), remove the Lock/Unlock/RLock/RUnlock helper methods from the public API, and add small, locked helper methods on ResourceManager such as WithRead(fn func(r *ResourceManager)), WithWrite(fn func(r *ResourceManager)), AddCore(core Core), RegisterVM(id UUID, core *Core, vm *VMInfo), UnregisterVM(id UUID), AllocateOnCore(coreIdx int, mem, cpu, disk int) error and corresponding deallocate/unregister helpers so all access to Cores/AliveVM/VMLocation goes through these methods and locking is done internally within NewResourceManager/these methods.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@startup/init.go`:
- Around line 159-164: The loop over infra.Resources.Cores is a no-op because
structure.NewResourceManager() initializes Cores as nil; remove this loop and
instead ensure defaults are set when cores are actually created: set IsAlive =
false on the Core object at construction time (where new cores are appended to
infra.Resources.Cores) and populate infra.Resources.VMLocation from the Core's
VMInfoIdx at that same core-registration path so VMLocation seeding and IsAlive
initialization happen when new cores are added (referencing
infra.Resources.Cores, infra.Resources.VMLocation, VMInfoIdx, IsAlive, and
structure.NewResourceManager).
In `@structure/control_infra.go`:
- Around line 18-31: FindCoreByVmUUID currently does a DB lookup on every call;
change it to first consult the in-memory VMLocation map on c.Resources
(ResourceManager.VMLocation) and only call c.VMRepo.GetInstanceLocation(uuid) on
a miss, then update VMLocation with the returned core index so future calls are
O(1). Keep the bounds checks for coreIdx and handle the repo error path as
before; when returning &c.Resources.Cores[coreIdx] take Resources.RLock() (and
unlock) around reading the slice/header to avoid a race and document the
startup-only-growth invariant on ResourceManager.Cores.
In `@structure/mysql_vm_repository.go`:
- Around line 161-183: After calling tx.QueryContext(...) you need to
immediately defer rows.Close() and after the rows.Next() loop check if
rows.Err() is non-nil; if it is, return the error instead of committing. Update
the function that builds VMInfoList/coreIdxList to add defer rows.Close() right
after the QueryContext call, check err := rows.Err() after the loop and return
nil, nil, err when non-nil, and only call tx.Commit() when iteration succeeded.
- Around line 20-30: Create the context with its timeout before starting a
transaction and use DB.BeginTx(ctx, nil) instead of DB.Begin() in
MySQLVMRepository.AddInstance (and likewise in UpdateInstance, DeleteInstance,
GetInstance, GetInstanceLocation, GetAllInstanceInfo), and replace the bare
defer tx.Rollback() with an explicit deferred rollback that checks and logs
errors (ignoring sql.ErrTxDone) so rollback failures aren't silently dropped;
ensure each method uses its existing timeout duration when creating the context.
- Around line 65-70: UpdateInstance currently executes tx.ExecContext without
including guac_pass, so VMInfo.GuacPassword updates are ignored; either persist
guac_pass or explicitly document immutability. Fix by adding guac_pass to the
UPDATE SQL and including instanceInfo.GuacPassword in the tx.ExecContext
parameter list within UpdateInstance (the same pattern used in
AddInstance/GetInstance/GetAllInstanceInfo), or alternatively update the
function/method comment and signature to indicate GuacPassword is immutable so
callers know changes are ignored.
In `@tests/docker-compose.test.yml`:
- Around line 13-14: The healthcheck window (start_period, retries, interval)
currently allows up to ~140s for MySQL to pass while tests/blackbox_io_test.sh
uses MAX_WAIT=90s; either shorten the docker-compose healthcheck values (e.g.,
reduce start_period and/or retries on the service healthcheck keys:
start_period, retries, interval) so total wait <= MAX_WAIT, or increase the
MAX_WAIT constant used by tests/blackbox_io_test.sh to match the healthcheck
window; update the healthcheck fields or the MAX_WAIT variable accordingly to
keep both timeouts aligned.
- Line 10: Update the MySQL healthcheck command that currently uses the test
array invoking mysql (the line starting with test: ["CMD", "mysql", "-uroot",
"-ptestpass", "-h", "127.0.0.1", "-P", "3306"]) to include an explicit query
probe by adding the -e "SELECT 1" argument so the healthcheck both authenticates
and executes a trivial SQL statement to confirm readiness.
---
Outside diff comments:
In `@service/vm.go`:
- Around line 38-183: The current selection under contextStruct.Resources.RLock
then releasing before network calls causes a TOCTOU: move the
selection-and-reservation into a single write-critical section by using
contextStruct.Resources.Lock() when choosing selectedCore so you revalidate
memory/cpu/disk (FreeMemory/FreeCPU/FreeDisk) and immediately decrement/reserve
those counters and set selectedCore.VMInfoIdx[uuid] before releasing the lock;
keep the long CMS/Guacamole calls (cmsClient.AddCmsSubnet/NewCmsSubnet,
guacamole.Configure) outside the lock, and ensure cleanup() still refunds
counters on failure; also capture any diagnostic info about alive cores while
the lock is held (instead of reading Resources.Cores after RUnlock) so logs
remain consistent.
- Around line 219-230: On failure of VMRepo.AddInstance(newVM,
selectedCoreIndex) call, call cleanup() as already done but then propagate the
error back immediately (return the error) so execution does not continue into
the newSubnetAllocated branch or the Lock()-guarded block that inserts into
VMLocation and AliveVM; similarly, if contextStruct.DB.Exec("UPDATE subnet ...")
fails when newSubnetAllocated is true, perform cleanup (if not already), log the
error, and return the error instead of proceeding. Locate the VMRepo.AddInstance
call, the cleanup() invocation, the newSubnetAllocated conditional that runs DB
Exec, and the Lock()-guarded code that updates VMLocation and AliveVM and ensure
both failure paths return after cleanup so in-memory state is not added for a
non-persistent VM.
In `@startup/init.go`:
- Around line 189-241: The goroutines capture pointers into
infra.Resources.Cores that can be invalidated by append and also mutate core
fields without holding Resources.Lock(), causing lost updates and data races;
fix by separating the phases: first ensure all cores are appended/located
synchronously (use findCore and append to infra.Resources.Cores), record each
core's index (not its address) into a pendings slice, then launch g.Go workers
that fetch data (via coreClient.GetCoreMachineMemoryInfo /
GetCoreMachineDiskInfo) but return results (or populate an updates slice keyed
by index) instead of mutating the core directly; after g.Wait(), acquire
infra.Resources.Lock() and apply updates to &infra.Resources.Cores[idx] (set
IsAlive, CoreInfoIdx, FreeMemory/FreeDisk/FreeCPU) to avoid pointer invalidation
and races.
---
Nitpick comments:
In `@structure/repository.go`:
- Around line 5-12: GetAllInstanceInfo currently returns parallel slices
([]VMInfo, []int, error) which forces callers to keep them in lockstep; change
the signature of VMRepository.GetAllInstanceInfo to return a single slice of a
small struct (e.g., type InstanceLocation struct { Info VMInfo; CoreIdx int })
so callers consume []InstanceLocation, update all implementations and call sites
(e.g., FindCoreByVmUUID / startup init usage) accordingly, and document/update
the behaviour of GetInstance and GetInstanceLocation to return a clear sentinel
error (e.g., ErrNotFound) for missing UUIDs so callers can distinguish “not
found” from real errors and avoid noisy logging.
In `@structure/resource_manager.go`:
- Around line 7-23: The ResourceManager currently exposes
Lock/Unlock/RLock/RUnlock and exported fields Cores, AliveVM, VMLocation which
leaks synchronization responsibility to callers; change the API to encapsulate
state: make the fields unexported (cores, aliveVM, vmLocation), remove the
Lock/Unlock/RLock/RUnlock helper methods from the public API, and add small,
locked helper methods on ResourceManager such as WithRead(fn func(r
*ResourceManager)), WithWrite(fn func(r *ResourceManager)), AddCore(core Core),
RegisterVM(id UUID, core *Core, vm *VMInfo), UnregisterVM(id UUID),
AllocateOnCore(coreIdx int, mem, cpu, disk int) error and corresponding
deallocate/unregister helpers so all access to Cores/AliveVM/VMLocation goes
through these methods and locking is done internally within
NewResourceManager/these methods.
In `@tests/docker-compose.test.yml`:
- Line 43: The restart policy "restart: on-failure:3" masks startup crashes for
the integration target (control-test); change the service definition to fail
fast by removing the restart line or setting it to "restart: \"no\"" for the
control-test service, or alternatively add a post-run assertion that the
container restart count is zero; locate the entry containing the literal
"restart: on-failure:3" and replace it with the fail-fast option or add the
restart-count assertion in the integration test teardown.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 12f79c2f-f147-410b-be74-116449c11eaa
📒 Files selected for processing (10)
- main.go
- service/guacamole.go
- service/network.go
- service/vm.go
- startup/init.go
- structure/control_infra.go
- structure/mysql_vm_repository.go
- structure/repository.go
- structure/resource_manager.go
- tests/docker-compose.test.yml
```diff
 for i := range infra.Resources.Cores {
 	for vmUUID := range infra.Resources.Cores[i].VMInfoIdx {
 		infra.Resources.VMLocation[vmUUID] = &infra.Resources.Cores[i]
 	}
-	infra.Cores[i].IsAlive = false
+	infra.Resources.Cores[i].IsAlive = false
 }
```
Dead loop — infra.Resources.Cores is empty at this point.
infra.Resources = structure.NewResourceManager() (line 150) constructs a ResourceManager with a nil Cores slice. The for i := range infra.Resources.Cores loop therefore never executes, so neither the VMLocation seeding nor the IsAlive = false reset does anything. The actual core registration happens below at lines 189‑201.
Either delete this block or move the IsAlive = false default into the new‑core construction path if it was intended as a guard.
🧹 Proposed cleanup
```diff
-	// define all Cores
-	for i := range infra.Resources.Cores {
-		for vmUUID := range infra.Resources.Cores[i].VMInfoIdx {
-			infra.Resources.VMLocation[vmUUID] = &infra.Resources.Cores[i]
-		}
-		infra.Resources.Cores[i].IsAlive = false
-	}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 for i := range infra.Resources.Cores {
 	for vmUUID := range infra.Resources.Cores[i].VMInfoIdx {
 		infra.Resources.VMLocation[vmUUID] = &infra.Resources.Cores[i]
 	}
-	infra.Cores[i].IsAlive = false
+	infra.Resources.Cores[i].IsAlive = false
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@startup/init.go` around lines 159 - 164, The loop over infra.Resources.Cores
is a no-op because structure.NewResourceManager() initializes Cores as nil;
remove this loop and instead ensure defaults are set when cores are actually
created: set IsAlive = false on the Core object at construction time (where new
cores are appended to infra.Resources.Cores) and populate
infra.Resources.VMLocation from the Core's VMInfoIdx at that same
core-registration path so VMLocation seeding and IsAlive initialization happen
when new cores are added (referencing infra.Resources.Cores,
infra.Resources.VMLocation, VMInfoIdx, IsAlive, and
structure.NewResourceManager).
```go
func (c *ControlContext) FindCoreByVmUUID(uuid UUID) *Core {
	log := util.GetLogger()

	coreIdx, err := c.GetInstanceLocation(uuid)
	coreIdx, err := c.VMRepo.GetInstanceLocation(uuid)
	if err != nil {
		log.Error("Core not found for VM UUID %s", uuid, true)
		return nil
	}
	return &c.Cores[coreIdx]
}

func (contextStructure *ControlContext) AddInstance(instanceInfo *VMInfo, coreIdx int) error {
	log := util.GetLogger()
	tx, err := contextStructure.DB.Begin()
	if err != nil {
		log.Error("Failed to start transaction %v", err)
		return err
	}
	defer tx.Rollback()

	// Set timeout for the transaction
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	_, err = tx.ExecContext(ctx, "INSERT INTO inst_info (uuid, inst_ip, guac_pass, inst_mem, inst_vcpu, inst_disk) VALUES (?, ?, ?, ?, ?, ?)",
		string(instanceInfo.UUID),
		instanceInfo.IP_VM,
		instanceInfo.GuacPassword,
		instanceInfo.Memory,
		instanceInfo.Cpu,
		instanceInfo.Disk)
	if err != nil {
		log.Error("Failed to insert instance info: %v", err)
		return err
	}
	_, err = tx.ExecContext(ctx, "INSERT INTO inst_loc (uuid, core) VALUES (?, ?)",
		string(instanceInfo.UUID),
		coreIdx)
	if err != nil {
		log.Error("Failed to insert instance core relation: %v", err)
		return err
	}
	return tx.Commit()
}

// currently unused
// shouldn't this be in use, though? it looks like it updates instance info in the DB
func (contextStructure *ControlContext) UpdateInstance(instanceInfo *VMInfo) error {
	log := util.GetLogger()
	tx, err := contextStructure.DB.Begin()
	if err != nil {
		log.Error("Failed to start transaction %v", err)
		return err
	}
	defer tx.Rollback()

	//Set timeout for the transaction
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	_, err = tx.ExecContext(ctx, "UPDATE inst_info SET inst_ip = ?, inst_mem = ?, inst_vcpu = ?, inst_disk = ? WHERE uuid = ?",
		instanceInfo.IP_VM,
		instanceInfo.Memory,
		instanceInfo.Cpu,
		instanceInfo.Disk,
		string(instanceInfo.UUID))
	if err != nil {
		log.Error("Failed to update instance info: %v", err)
		return err
	}
	return tx.Commit()
}

func (contextStructure *ControlContext) DeleteInstance(uuid UUID) error {
	log := util.GetLogger()
	tx, err := contextStructure.DB.Begin()
	if err != nil {
		log.Error("Failed to start transaction: %v", err)
		return err
	}
	defer tx.Rollback()
	// Set timeout for the transaction
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	_, err = tx.ExecContext(ctx, "DELETE FROM inst_info WHERE uuid = ?", uuid)
	if err != nil {
		return err
	}
	_, err = tx.ExecContext(ctx, "DELETE FROM inst_loc WHERE uuid = ?", uuid)
	if err != nil {
		return err
	}
	return tx.Commit()
}

func (contextStructure *ControlContext) GetInstance(uuid UUID) (*VMInfo, error) {
	log := util.GetLogger()
	tx, err := contextStructure.DB.Begin()
	if err != nil {
		log.Error("Failed to start transaction: %v", err)
		return nil, err
	}
	defer tx.Rollback()
	//set timeout for the transaction
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	var instance VMInfo
	err = tx.QueryRowContext(ctx, "SELECT uuid, inst_ip, guac_pass, inst_mem, inst_vcpu, inst_disk FROM inst_info WHERE uuid = ?", uuid).Scan(
		&instance.UUID,
		&instance.IP_VM,
		&instance.GuacPassword,
		&instance.Memory,
		&instance.Cpu,
		&instance.Disk)
	if err != nil {
		log.Error("Failed to get instance info: %v", err)
		return nil, err
	}
	return &instance, tx.Commit()
}

func (contextStructure *ControlContext) GetInstanceLocation(uuid UUID) (int, error) {
	log := util.GetLogger()
	tx, err := contextStructure.DB.Begin()
	if err != nil {
		log.Error("Failed to start transaction: %v", err)
		return 0, err
	}
	defer tx.Rollback()
	//set timeout for the transaction
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	var coreIdx int
	err = tx.QueryRowContext(ctx, "SELECT core FROM inst_loc WHERE uuid = ?", uuid).Scan(&coreIdx)
	if err != nil {
		log.Error("Failed to get instance location: %v", err)
		return 0, err
	}
	return coreIdx, tx.Commit()
}

func (contextStructure *ControlContext) GetAllInstanceInfo() ([]VMInfo, []int, error) {
	log := util.GetLogger()
	tx, err := contextStructure.DB.Begin()
	if err != nil {
		log.Error("Failed to start transaction: %v", err)
		return nil, nil, err
	}
	defer tx.Rollback()
	//set timeout for the transaction
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()

	var rows *sql.Rows
	rows, err = tx.QueryContext(ctx, "SELECT info.uuid, loc.core, info.inst_ip, info.guac_pass, info.inst_vcpu, info.inst_mem, info.inst_disk FROM inst_loc loc JOIN inst_info info ON loc.uuid = info.uuid")
	if err != nil {
		log.Error("Failed to get joined instance info: %v", err)
		return nil, nil, err
	}

	var coreIdxList []int
	var VMInfoList []VMInfo

	for rows.Next() {
		var coreIdx int
		var info VMInfo

		if err := rows.Scan(&info.UUID, &coreIdx, &info.IP_VM, &info.GuacPassword, &info.Cpu, &info.Memory, &info.Disk); err != nil {
			log.Error("Failed to scan instance location: %v", err)
			return nil, nil, err
		}
		log.DebugInfo("Found instance: %s on core %d", info.UUID, coreIdx)
		VMInfoList = append(VMInfoList, info)
		coreIdxList = append(coreIdxList, coreIdx)
	if coreIdx < 0 || coreIdx >= len(c.Resources.Cores) {
		log.Error("Core index %d out of range for VM UUID %s", coreIdx, uuid, true)
		return nil
	}
	return VMInfoList, coreIdxList, tx.Commit()
	return &c.Resources.Cores[coreIdx]
}
```
Performance regression: this used to be an in‑memory map lookup, now every call is a DB query.
FindCoreByVmUUID is on the hot path for GetGuacamoleToken, GetVMIPByUUID indirectly, DeleteVM, StartVM, ShutdownVM, GetVMCpuInfo, GetVMMemoryInfo, GetVMDiskInfo. With this change, each of those incurs a MySQL SELECT core FROM inst_loc WHERE uuid = ? plus a transaction begin/commit (see MySQLVMRepository.GetInstanceLocation) — adding latency and load to every VM request, and making read paths fail when the DB is unavailable even though the answer is sitting in memory.
The ResourceManager.VMLocation map was populated in startup/init.go precisely for this O(1) lookup. Prefer using it, and fall back to the repository only on miss (e.g., for warm‑start / eventual‑consistency cases).
♻️ Proposed fix

```diff
func (c *ControlContext) FindCoreByVmUUID(uuid UUID) *Core {
log := util.GetLogger()
- coreIdx, err := c.VMRepo.GetInstanceLocation(uuid)
- if err != nil {
- log.Error("Core not found for VM UUID %s", uuid, true)
- return nil
- }
- if coreIdx < 0 || coreIdx >= len(c.Resources.Cores) {
- log.Error("Core index %d out of range for VM UUID %s", coreIdx, uuid, true)
- return nil
- }
- return &c.Resources.Cores[coreIdx]
+ c.Resources.RLock()
+ core, ok := c.Resources.VMLocation[uuid]
+ c.Resources.RUnlock()
+ if ok {
+ return core
+ }
+
+ // Fallback: VM not in in-memory index (e.g., created on another replica).
+ coreIdx, err := c.VMRepo.GetInstanceLocation(uuid)
+ if err != nil {
+ log.Error("Core not found for VM UUID %s", uuid, true)
+ return nil
+ }
+ c.Resources.RLock()
+ defer c.Resources.RUnlock()
+ if coreIdx < 0 || coreIdx >= len(c.Resources.Cores) {
+ log.Error("Core index %d out of range for VM UUID %s", coreIdx, uuid, true)
+ return nil
+ }
+ return &c.Resources.Cores[coreIdx]
}
```

Also note that the bare `return &c.Resources.Cores[coreIdx]` currently reads the slice header without holding `Resources.RLock()`, which is fine as long as `Cores` is only ever grown during startup — worth documenting that invariant on `ResourceManager` to prevent future regressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@structure/control_infra.go` around lines 18 - 31, FindCoreByVmUUID currently
does a DB lookup on every call; change it to first consult the in-memory
VMLocation map on c.Resources (ResourceManager.VMLocation) and only call
c.VMRepo.GetInstanceLocation(uuid) on a miss, then update VMLocation with the
returned core index so future calls are O(1). Keep the bounds checks for coreIdx
and handle the repo error path as before; when returning
&c.Resources.Cores[coreIdx] take Resources.RLock() (and unlock) around reading
the slice/header to avoid a race and document the startup-only-growth invariant
on ResourceManager.Cores.
```go
func (r *MySQLVMRepository) AddInstance(instanceInfo *VMInfo, coreIdx int) error {
	log := util.GetLogger()
	tx, err := r.DB.Begin()
	if err != nil {
		log.Error("Failed to start transaction %v", err)
		return err
	}
	defer tx.Rollback()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify no context-less transaction starts or bare rollback defers remain.
# Expected: no output after the fix.
rg -nP '\.Begin\(\)|defer\s+\w+\.Rollback\(\)' --type go || trueRepository: easy-cloud-Knet/KWS_Control
Length of output: 885
🏁 Script executed:
cat -n structure/mysql_vm_repository.go | head -160 | tail -140Repository: easy-cloud-Knet/KWS_Control
Length of output: 4747
🏁 Script executed:
wc -l structure/mysql_vm_repository.goRepository: easy-cloud-Knet/KWS_Control
Length of output: 107
Use BeginTx(ctx, nil) with timeout context and handle rollback errors explicitly.
Transaction startup is not bounded by the intended timeout deadline. Line 22 calls DB.Begin() before the timeout context is created (line 29), so transaction acquisition has no deadline. Additionally, the bare defer tx.Rollback() at line 27 fails errcheck and silently ignores rollback failures; explicitly handle or log sql.ErrTxDone to prevent hidden errors.
🔧 Proposed transaction pattern

```diff
func (r *MySQLVMRepository) AddInstance(instanceInfo *VMInfo, coreIdx int) error {
log := util.GetLogger()
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+
- tx, err := r.DB.Begin()
+ tx, err := r.DB.BeginTx(ctx, nil)
if err != nil {
log.Error("Failed to start transaction %v", err)
return err
}
- defer tx.Rollback()
-
- ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
- defer cancel()
+ defer func() {
+ if rbErr := tx.Rollback(); rbErr != nil && rbErr != sql.ErrTxDone {
+ log.Error("Failed to rollback transaction: %v", rbErr)
+ }
+	}()
```

Apply the same pattern to the remaining repository methods: UpdateInstance (line 55), DeleteInstance (line 80), GetInstance (line 103), GetInstanceLocation (line 130), and GetAllInstanceInfo (line 151), using each method's existing timeout duration.
🧰 Tools
🪛 golangci-lint (2.11.4)
[error] 27-27: Error return value of tx.Rollback is not checked
(errcheck)
[error] 22-22: (*database/sql.DB).Begin must not be called. use (*database/sql.DB).BeginTx
(noctx)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@structure/mysql_vm_repository.go` around lines 20 - 30, Create the context
with its timeout before starting a transaction and use DB.BeginTx(ctx, nil)
instead of DB.Begin() in MySQLVMRepository.AddInstance (and likewise in
UpdateInstance, DeleteInstance, GetInstance, GetInstanceLocation,
GetAllInstanceInfo), and replace the bare defer tx.Rollback() with an explicit
deferred rollback that checks and logs errors (ignoring sql.ErrTxDone) so
rollback failures aren't silently dropped; ensure each method uses its existing
timeout duration when creating the context.
```go
	_, err = tx.ExecContext(ctx, "UPDATE inst_info SET inst_ip = ?, inst_mem = ?, inst_vcpu = ?, inst_disk = ? WHERE uuid = ?",
		instanceInfo.IP_VM,
		instanceInfo.Memory,
		instanceInfo.Cpu,
		instanceInfo.Disk,
		string(instanceInfo.UUID))
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Inspect UpdateInstance call sites to confirm whether GuacPassword can change.
# Expected: If callers pass updated GuacPassword values, include guac_pass in the UPDATE.
rg -nP '\bUpdateInstance\s*\(' -C4 --type goRepository: easy-cloud-Knet/KWS_Control
Length of output: 1206
🏁 Script executed:
fd -t go -x grep -l "type VMInfo struct" {} \;Repository: easy-cloud-Knet/KWS_Control
Length of output: 240
🏁 Script executed:
rg -n "type VMInfo struct" -A 20 --type goRepository: easy-cloud-Knet/KWS_Control
Length of output: 458
🏁 Script executed:
rg -n "func.*AddInstance" -A 15 --type go | head -100Repository: easy-cloud-Knet/KWS_Control
Length of output: 1195
🏁 Script executed:
rg -n "guac_pass" --type go -C 2Repository: easy-cloud-Knet/KWS_Control
Length of output: 1333
🏁 Script executed:
rg -n "func.*UpdateInstance" -A 25 structure/mysql_vm_repository.goRepository: easy-cloud-Knet/KWS_Control
Length of output: 880
🏁 Script executed:
rg -n "\.UpdateInstance\(" --max-count=20Repository: easy-cloud-Knet/KWS_Control
Length of output: 53
🏁 Script executed:
rg -n "UpdateInstance" -C 3Repository: easy-cloud-Knet/KWS_Control
Length of output: 951
🏁 Script executed:
rg -n "GuacPassword" -C 2Repository: easy-cloud-Knet/KWS_Control
Length of output: 1894
🏁 Script executed:
rg -n "password|Password" -i --max-count=30Repository: easy-cloud-Knet/KWS_Control
Length of output: 3764
🏁 Script executed:
rg -n "UpdateInstance.*VMInfo\|GetInstance.*UpdateInstance" -B 5 -A 5Repository: easy-cloud-Knet/KWS_Control
Length of output: 53
🏁 Script executed:
fd . -name "*.go" -type f | head -20Repository: easy-cloud-Knet/KWS_Control
Length of output: 301
Add guac_pass to the UPDATE statement, or document that GuacPassword is immutable.
UpdateInstance accepts a full VMInfo struct but omits guac_pass from the UPDATE query (line 65). All other repository methods—AddInstance, GetInstance, and GetAllInstanceInfo—explicitly persist or retrieve GuacPassword. This inconsistency means if password rotation is ever needed, callers passing updated GuacPassword values will have those changes silently ignored.
🔧 Proposed fix

```diff
- _, err = tx.ExecContext(ctx, "UPDATE inst_info SET inst_ip = ?, inst_mem = ?, inst_vcpu = ?, inst_disk = ? WHERE uuid = ?",
+ _, err = tx.ExecContext(ctx, "UPDATE inst_info SET inst_ip = ?, guac_pass = ?, inst_mem = ?, inst_vcpu = ?, inst_disk = ? WHERE uuid = ?",
instanceInfo.IP_VM,
+ instanceInfo.GuacPassword,
instanceInfo.Memory,
instanceInfo.Cpu,
instanceInfo.Disk,
 	string(instanceInfo.UUID))
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-	_, err = tx.ExecContext(ctx, "UPDATE inst_info SET inst_ip = ?, inst_mem = ?, inst_vcpu = ?, inst_disk = ? WHERE uuid = ?",
-		instanceInfo.IP_VM,
-		instanceInfo.Memory,
-		instanceInfo.Cpu,
-		instanceInfo.Disk,
-		string(instanceInfo.UUID))
+	_, err = tx.ExecContext(ctx, "UPDATE inst_info SET inst_ip = ?, guac_pass = ?, inst_mem = ?, inst_vcpu = ?, inst_disk = ? WHERE uuid = ?",
+		instanceInfo.IP_VM,
+		instanceInfo.GuacPassword,
+		instanceInfo.Memory,
+		instanceInfo.Cpu,
+		instanceInfo.Disk,
+		string(instanceInfo.UUID))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@structure/mysql_vm_repository.go` around lines 65 - 70, UpdateInstance
currently executes tx.ExecContext without including guac_pass, so
VMInfo.GuacPassword updates are ignored; either persist guac_pass or explicitly
document immutability. Fix by adding guac_pass to the UPDATE SQL and including
instanceInfo.GuacPassword in the tx.ExecContext parameter list within
UpdateInstance (the same pattern used in
AddInstance/GetInstance/GetAllInstanceInfo), or alternatively update the
function/method comment and signature to indicate GuacPassword is immutable so
callers know changes are ignored.
```go
	var rows *sql.Rows
	rows, err = tx.QueryContext(ctx, "SELECT info.uuid, loc.core, info.inst_ip, info.guac_pass, info.inst_vcpu, info.inst_mem, info.inst_disk FROM inst_loc loc JOIN inst_info info ON loc.uuid = info.uuid")
	if err != nil {
		log.Error("Failed to get joined instance info: %v", err)
		return nil, nil, err
	}

	var coreIdxList []int
	var VMInfoList []VMInfo

	for rows.Next() {
		var coreIdx int
		var info VMInfo

		if err := rows.Scan(&info.UUID, &coreIdx, &info.IP_VM, &info.GuacPassword, &info.Cpu, &info.Memory, &info.Disk); err != nil {
			log.Error("Failed to scan instance location: %v", err)
			return nil, nil, err
		}
		log.DebugInfo("Found instance: %s on core %d", info.UUID, coreIdx)
		VMInfoList = append(VMInfoList, info)
		coreIdxList = append(coreIdxList, coreIdx)
	}
	return VMInfoList, coreIdxList, tx.Commit()
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Verify row iteration includes Close and Err handling.
# Expected: `rows.Close()` and `rows.Err()` appear near the `rows.Next()` loop.
rg -nP 'rows\.Next\(\)|rows\.Err\(\)|rows\.Close\(\)' -C3 --type goRepository: easy-cloud-Knet/KWS_Control
Length of output: 434
🏁 Script executed:
cat -n structure/mysql_vm_repository.go | sed -n '155,190p'Repository: easy-cloud-Knet/KWS_Control
Length of output: 1244
Close rows and check iteration errors before committing.
The query result rows at line 162 is never explicitly closed, and rows.Err() is not checked after the loop. This can leak database connections and silently return partial results if an error occurs mid-iteration (e.g., context timeout, network issue). Add a defer rows.Close() immediately after the QueryContext call and check rows.Err() before the final return.
🔧 Proposed fix for row cleanup and error propagation

```diff
rows, err = tx.QueryContext(ctx, "SELECT info.uuid, loc.core, info.inst_ip, info.guac_pass, info.inst_vcpu, info.inst_mem, info.inst_disk FROM inst_loc loc JOIN inst_info info ON loc.uuid = info.uuid")
if err != nil {
log.Error("Failed to get joined instance info: %v", err)
return nil, nil, err
}
+ defer func() {
+ if closeErr := rows.Close(); closeErr != nil {
+ log.Error("Failed to close instance info rows: %v", closeErr)
+ }
+ }()
var coreIdxList []int
var VMInfoList []VMInfo
for rows.Next() {
@@
VMInfoList = append(VMInfoList, info)
coreIdxList = append(coreIdxList, coreIdx)
}
+ if err := rows.Err(); err != nil {
+ log.Error("Failed while iterating instance info rows: %v", err)
+ return nil, nil, err
+ }
return VMInfoList, coreIdxList, tx.Commit()
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
	var rows *sql.Rows
	rows, err = tx.QueryContext(ctx, "SELECT info.uuid, loc.core, info.inst_ip, info.guac_pass, info.inst_vcpu, info.inst_mem, info.inst_disk FROM inst_loc loc JOIN inst_info info ON loc.uuid = info.uuid")
	if err != nil {
		log.Error("Failed to get joined instance info: %v", err)
		return nil, nil, err
	}
	defer func() {
		if closeErr := rows.Close(); closeErr != nil {
			log.Error("Failed to close instance info rows: %v", closeErr)
		}
	}()
	var coreIdxList []int
	var VMInfoList []VMInfo
	for rows.Next() {
		var coreIdx int
		var info VMInfo
		if err := rows.Scan(&info.UUID, &coreIdx, &info.IP_VM, &info.GuacPassword, &info.Cpu, &info.Memory, &info.Disk); err != nil {
			log.Error("Failed to scan instance location: %v", err)
			return nil, nil, err
		}
		log.DebugInfo("Found instance: %s on core %d", info.UUID, coreIdx)
		VMInfoList = append(VMInfoList, info)
		coreIdxList = append(coreIdxList, coreIdx)
	}
	if err := rows.Err(); err != nil {
		log.Error("Failed while iterating instance info rows: %v", err)
		return nil, nil, err
	}
	return VMInfoList, coreIdxList, tx.Commit()
```
🧰 Tools
🪛 golangci-lint (2.11.4)
[error] 162-162: rows.Err must be checked
(rowserrcheck)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@structure/mysql_vm_repository.go` around lines 161 - 183, After calling
tx.QueryContext(...) you need to immediately defer rows.Close() and after the
rows.Next() loop check if rows.Err() is non-nil; if it is, return the error
instead of committing. Update the function that builds VMInfoList/coreIdxList to
add defer rows.Close() right after the QueryContext call, check err :=
rows.Err() after the loop and return nil, nil, err when non-nil, and only call
tx.Commit() when iteration succeeded.
```diff
       - ./init-test-db.sql:/docker-entrypoint-initdb.d/init.sql
     healthcheck:
-      test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-ptestpass"]
+      test: ["CMD", "mysql", "-uroot", "-ptestpass", "-h", "127.0.0.1", "-P", "3306"]
```
🧩 Analysis chain
🏁 Script executed:
cat -n tests/docker-compose.test.yml | head -30Repository: easy-cloud-Knet/KWS_Control
Length of output: 1020
🏁 Script executed:
git log --oneline -20 tests/docker-compose.test.ymlRepository: easy-cloud-Knet/KWS_Control
Length of output: 105
🏁 Script executed:
git show 756c338 --statRepository: easy-cloud-Knet/KWS_Control
Length of output: 3563
🏁 Script executed:
git show 756c338:tests/docker-compose.test.yml | head -50Repository: easy-cloud-Knet/KWS_Control
Length of output: 1316
🏁 Script executed:
sed -n '100,150p' tests/blackbox_io_test.shRepository: easy-cloud-Knet/KWS_Control
Length of output: 1934
🏁 Script executed:
grep -n "MAX_WAIT" tests/blackbox_io_test.sh | head -5Repository: easy-cloud-Knet/KWS_Control
Length of output: 163
🏁 Script executed:
cat -n tests/docker-compose.test.ymlRepository: easy-cloud-Knet/KWS_Control
Length of output: 2284
🏁 Script executed:
# Check if there are any other references to mysql healthchecks in the repo
rg "mysql.*healthcheck" -A 5 -B 5Repository: easy-cloud-Knet/KWS_Control
Length of output: 53
🏁 Script executed:
# Check MySQL 8.0 CLI behavior - look at initialization script to understand setup
cat tests/init-test-db.sql | head -20Repository: easy-cloud-Knet/KWS_Control
Length of output: 620
🏁 Script executed:
# Search for any comments in README or documentation about the test setup
cat README.md | grep -A 10 -B 10 -i "test\|health" || echo "No direct matches"Repository: easy-cloud-Knet/KWS_Control
Length of output: 88
🏁 Script executed:
# Check if there are any GitHub issues or comments referencing the healthcheck
find . -type f \( -name "*.md" -o -name "*.txt" -o -name ".github" \) -exec grep -l "health" {} \; 2>/dev/null | head -5Repository: easy-cloud-Knet/KWS_Control
Length of output: 53
🏁 Script executed:
# Check git log for any mentions of healthcheck or select in commit messages
git log --all --oneline | grep -i "health\|select\|mysql" | head -10Repository: easy-cloud-Knet/KWS_Control
Length of output: 186
Add -e "SELECT 1" to strengthen the MySQL healthcheck probe.
Line 10 invokes mysql without -e, which only validates connection and authentication but does not execute an actual query. To match the pattern of other healthchecks in the file (redis-cli ping, Python http request) and ensure the database is truly ready to serve queries, add the SQL probe:
- test: ["CMD", "mysql", "-uroot", "-ptestpass", "-h", "127.0.0.1", "-P", "3306"]
+ test: ["CMD", "mysql", "-uroot", "-ptestpass", "-h", "127.0.0.1", "-P", "3306", "-e", "SELECT 1"]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| test: ["CMD", "mysql", "-uroot", "-ptestpass", "-h", "127.0.0.1", "-P", "3306"] | |
| test: ["CMD", "mysql", "-uroot", "-ptestpass", "-h", "127.0.0.1", "-P", "3306", "-e", "SELECT 1"] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/docker-compose.test.yml` at line 10, Update the MySQL healthcheck
command that currently uses the test array invoking mysql (the line starting
with test: ["CMD", "mysql", "-uroot", "-ptestpass", "-h", "127.0.0.1", "-P",
"3306"]) to include an explicit query probe by adding the -e "SELECT 1" argument
so the healthcheck both authenticates and executes a trivial SQL statement to
confirm readiness.
```yaml
      retries: 30
      start_period: 50s
```
🧩 Analysis chain
🏁 Script executed:
# Find and read the Docker Compose file
fd docker-compose.test.ymlRepository: easy-cloud-Knet/KWS_Control
Length of output: 100
🏁 Script executed:
# Find the blackbox test script
fd blackbox_io_test.shRepository: easy-cloud-Knet/KWS_Control
Length of output: 96
🏁 Script executed:
cat -n tests/docker-compose.test.yml | head -30Repository: easy-cloud-Knet/KWS_Control
Length of output: 1020
🏁 Script executed:
sed -n '100,140p' tests/blackbox_io_test.shRepository: easy-cloud-Knet/KWS_Control
Length of output: 1631
🏁 Script executed:
grep -n "MAX_WAIT" tests/blackbox_io_test.shRepository: easy-cloud-Knet/KWS_Control
Length of output: 163
Align the healthcheck window with the blackbox readiness timeout.
With start_period: 50s, retries: 30, and interval: 3s, MySQL can take up to 140 seconds to become healthy, while tests/blackbox_io_test.sh waits only 90 seconds for the service to be ready. Shorten the healthcheck parameters or increase MAX_WAIT to prevent CI timeouts.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/docker-compose.test.yml` around lines 13 - 14, The healthcheck window
(start_period, retries, interval) currently allows up to ~140s for MySQL to pass
while tests/blackbox_io_test.sh uses MAX_WAIT=90s; either shorten the
docker-compose healthcheck values (e.g., reduce start_period and/or retries on
the service healthcheck keys: start_period, retries, interval) so total wait <=
MAX_WAIT, or increase the MAX_WAIT constant used by tests/blackbox_io_test.sh to
match the healthcheck window; update the healthcheck fields or the MAX_WAIT
variable accordingly to keep both timeouts aligned.
I plan to apply Step 2 and Step 3 incrementally as well, building only on this PR.

This looks interesting, but I probably won't be able to review it before my exam ends on Thursday.
```go
// Separates the DB-related responsibility out of ControlContext
type VMRepository interface {
	AddInstance(instanceInfo *VMInfo, coreIdx int) error
	UpdateInstance(instanceInfo *VMInfo) error
```
If each service gets packaged separately later, defining the interface at the call site also looks like a good option!

This seems fine for testing the DB layer.
Apparently the github.com/DATA-DOG/go-sqlmock package can be used to mock the DB; let's add it together later.
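For reference, a minimal sketch of what such a go-sqlmock based test for `AddInstance` could look like (the struct literal and expectations are assumptions based on the diff above, not existing test code):

```go
package structure

import (
	"testing"

	"github.com/DATA-DOG/go-sqlmock"
)

// Sketch: the expectations mirror the SQL used by MySQLVMRepository.AddInstance.
func TestAddInstance_InsertsInfoAndLocation(t *testing.T) {
	db, mock, err := sqlmock.New()
	if err != nil {
		t.Fatalf("sqlmock.New: %v", err)
	}
	defer db.Close()

	mock.ExpectBegin()
	mock.ExpectExec("INSERT INTO inst_info").WillReturnResult(sqlmock.NewResult(1, 1))
	mock.ExpectExec("INSERT INTO inst_loc").WillReturnResult(sqlmock.NewResult(1, 1))
	mock.ExpectCommit()

	repo := &MySQLVMRepository{DB: db}
	if err := repo.AddInstance(&VMInfo{UUID: "test-uuid"}, 0); err != nil {
		t.Fatalf("AddInstance: %v", err)
	}
	if err := mock.ExpectationsWereMet(); err != nil {
		t.Errorf("unmet expectations: %v", err)
	}
}
```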
```diff
 	Resources *ResourceManager
 	Config    Config
-	DB        *sql.DB
+	DB        *sql.DB // for direct queries such as subnet (to be split into a separate Repository later)
```
When we later split this out into repositories, it would be good to also think about new folders or files for them.
For now there aren't many kinds of repositories, so it probably wouldn't be worth the time yet.
```diff
 	Config Config
-	DB     *sql.DB
+	DB     *sql.DB // for direct queries such as subnet (to be split into a separate Repository later)
 	GuacDB *sql.DB
```
Reducing the god object with interfaces seems like a very good direction.
If the structs start to get bloated per feature later on, we could split them further.
(e.g., the mysqlRepository interface currently owns all SQL; with decorators, DB transactions and DB access logic could be packaged separately)
At this stage that feels like over-engineering, though.
Please share your thoughts!
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Hello! Please re-request a review whenever you need one.
Co-authored-by: Copilot <copilot@github.com>
Summary
Motivation
Approach

1.
|
| Method | Responsibility |
|---|---|
| `SelectCore(req)` | Finds a suitable core under `RLock`, returning diagnostic info (`CoreSelectionResult`) alongside |
| `AllocateResources(core, uuid, vm, req)` | Atomically updates `VMInfoIdx` + `Free*` under `Lock` |
| `DeallocateResources(core, uuid, req)` | Inverse of Allocate (used for rollback) |
| `RegisterVM(uuid, core, vm)` | Registration that keeps `VMLocation` + `AliveVM` consistent |
| `UnregisterAlive(uuid)` | Safe lookup + removal from `AliveVM` |
New types `HardwareRequirement` and `CoreSelectionResult` are introduced.
3. Introduce a cleanupChain helper (service/cleanup.go)
- An explicit chain that runs the rollback functions accumulated at each step in LIFO order
- Replaces three bool flags with `cleanup.push(fn)` calls
- After `run()`, `steps = nil` guarantees idempotency (prevents double execution)
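A minimal sketch of what such a LIFO cleanup chain can look like (the actual service/cleanup.go may differ in detail):

```go
package service

// cleanupChain collects rollback steps as a VM is being created and, on
// failure, runs them in reverse (LIFO) order.
type cleanupChain struct {
	steps []func()
}

// push registers one rollback step for a resource that was just acquired.
func (c *cleanupChain) push(fn func()) {
	c.steps = append(c.steps, fn)
}

// run executes the registered steps newest-first, then forgets them.
func (c *cleanupChain) run() {
	for i := len(c.steps) - 1; i >= 0; i-- {
		c.steps[i]()
	}
	c.steps = nil // guarantees idempotency: a second run() does nothing
}
```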
4. Convert CreateVM to an orchestrator pattern (service/vm.go)
The single 224-line function is restructured into a thin orchestrator with 10 commented steps. Each step is split into a helper:
1) selectCoreOrFail() — core selection + diagnostic logging on failure
2) GenerateSSHKey()
3) allocateCmsSubnet() → cleanup.push (when a new subnet is allocated)
4) guacamole.Configure() → cleanup.push
5) Resources.AllocateResources() → cleanup.push
6) StoreVMInfoToRedis() (best-effort)
7) buildCoreCreateRequest() + coreClient.CreateVM() — cleanup.run() on failure
8) VMRepo.AddInstance() — cleanup.run() on failure
8-1) If a new subnet was allocated, update the last_subnet IP
9) Resources.RegisterVM() — atomic registration of VMLocation + AliveVM
5. Split DeleteVM into stages
Explicitly separated into four stages: Core deletion → DB deletion → Guacamole cleanup (best-effort) → Redis cleanup (best-effort).
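A rough sketch of this staged flow (the helper names `deleteOnCore`, `cleanupGuacamole`, and `deleteVMInfoFromRedis` are placeholders, not the actual function names):

```go
// DeleteVM sketch: the first two stages must succeed, the last two are
// best-effort and only log on failure.
func DeleteVM(contextStruct *structure.ControlContext, uuid structure.UUID) error {
	log := util.GetLogger()

	// 1) Delete the instance on the Core node.
	if err := deleteOnCore(contextStruct, uuid); err != nil {
		return fmt.Errorf("DeleteVM: core deletion failed: %w", err)
	}
	// 2) Remove the persisted rows.
	if err := contextStruct.VMRepo.DeleteInstance(uuid); err != nil {
		return fmt.Errorf("DeleteVM: DB deletion failed: %w", err)
	}
	// 3) Guacamole cleanup (best-effort).
	if err := cleanupGuacamole(contextStruct, uuid); err != nil {
		log.Error("DeleteVM: guacamole cleanup failed: %v", err)
	}
	// 4) Redis cleanup (best-effort).
	if err := deleteVMInfoFromRedis(contextStruct, uuid); err != nil {
		log.Error("DeleteVM: redis cleanup failed: %v", err)
	}
	return nil
}
```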
6. Introduce DTOs between layers
DTO mapping at each boundary: API → Service → Client/Persistence.
| Layer | Location | Role |
|---|---|---|
| API | `ApiXxxRequest/Response` in `api/` | HTTP wire contract |
| Service Input | `CreateVMInput`, `HardwareSpec`, `NetworkSpec`, `UserSpec` in `service/dto.go` | Business input, independent of external systems |
| Service Output | `VMInfo`, `VMCpuStatus`, `VMMemoryStatus`, `VMDiskStatus` in `service/dto.go` | Business output |
| Client | `client/model/` | External Core API wire contract (unchanged) |
| Persistence | `redisVMInfo` (private) in `service/redis.go` | Serialization format stored in Redis |
매핑 헬퍼:
ApiCreateVmRequest.ToServiceInput()— API → ServicebuildCoreCreateRequest()(service 내부) — Service → ClientredisVMInfo.toServiceDTO()— Persistence → Service
결과: api/ 패키지가 client/model을 더 이상 import하지 않음 → 외부 Core API 변경이 API contract에 직접 전파되지 않음.
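A rough sketch of the API → Service mapping direction. The field names on both sides are illustrative; the real DTOs live in api/ and service/dto.go:

```go
// ToServiceInput converts the HTTP wire contract into the service-layer input,
// so the api package never needs to import client/model.
func (r ApiCreateVmRequest) ToServiceInput() service.CreateVMInput {
	users := make([]service.UserSpec, 0, len(r.Users))
	for _, u := range r.Users {
		users = append(users, service.UserSpec{Name: u.Name, Password: u.Password})
	}
	return service.CreateVMInput{
		Name:       r.Name,
		SubnetType: r.SubnetType,
		Hardware: service.HardwareSpec{
			CPU:    r.CPU,
			Memory: r.Memory,
			Disk:   r.Disk,
		},
		Users: users,
	}
}
```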
Type of Change
- Bug fix
- New feature
- Refactoring
- Docs / Config
- CI/CD
Related Issue
Testing
- Tested locally
- `go build ./...`, `go vet ./...` pass with no errors
- Integration tests (`bash tests/blackbox_io_test.sh`): 66/66 PASS overall
  - Section 1: HTTP Method Routing (8)
  - Section 2: Invalid JSON Body (7)
  - Section 3: Field Validation (4)
  - Section 4: Non-existent UUID (5)
  - Section 5: Happy Path — VM Lifecycle (Create → Info → Start → Status → Shutdown → Delete) (13)
  - Section 6: Redis Endpoints (29)
- No regression in existing functionality
Files Changed (main)

| Type | File | Change |
|---|---|---|
| New | client/cms.go | CMS HTTP client split out |
| New | service/cleanup.go | cleanupChain LIFO rollback helper |
| New | service/dto.go | Service-layer input/output DTOs |
| Changed | structure/resource_manager.go | SelectCore, AllocateResources, DeallocateResources, RegisterVM, UnregisterAlive domain methods added |
| Changed | service/vm.go | CreateVM/DeleteVM converted to orchestrators, DTO input/output |
| Changed | service/network.go | CmsClient definition removed, uses client.CmsClient (163 lines → 81 lines) |
| Changed | service/redis.go | Internal redisVMInfo split from external VMInfo, loadRedisVMInfo private helper |
| Changed | api/create_vm.go | ApiCreateVmRequest + ToServiceInput(), client/model import removed |
| Changed | api/get_vm_status.go | ApiVmCpuStatusResponse/MemoryStatusResponse/DiskStatusResponse added |
Follow-ups (outside the scope of this PR)
- Step 3
  - Split the initialization logic in `startup/init.go`: DB initialization / core health checks / VM state restoration
  - Split the 10-step monolithic function in `pkg/guacamole/config.go` by module
  - Unify VM/Vm/vm naming as Instance (aligning with the DB schema's `inst_*`)
  - Clean up deprecated items such as `client/model.VMRedisInfo` and the `VMStatus*` constants
Checklist
- Reviewers assigned
- Related issue linked
log.Error("CMS : failed to marshal JSON: %v", err)
return nil, fmt.Errorf("CmsClient.RequestNewInstance: failed to marshal JSON: %w", err)
The log message starts with "CMS" while the error uses "CmsClient.RequestNewInstance"; they've drifted apart, so how about unifying the prefix?
|
|
type CmsResponse struct {
	IP string `json:"ip"`
	MacAddr string `json:"macAddr"`
Since we're adding RequestNewInstance, it also seems worth renaming CmsResponse to something consistent like NewInstanceResponse or ResponseNewInstance!
// 3) CMS subnet allocation (Add: existing subnet / New: new subnet)
cmsResp, isNewSubnet, err := allocateCmsSubnet(contextStruct, input.SubnetType, uuid, cleanup)
if err != nil {
	log.Error("CreateVM: failed to configure cms: %v", err, true)
	return fmt.Errorf("CreateVM: failed to configure cms: %w", err)
	return err
}

fmt.Printf("%s\n", cmsResp.IP)
fmt.Printf("%s\n", cmsResp.MacAddr)
fmt.Printf("%s\n", cmsResp.SdnUUID)

userPass := guacamole.Configure(req.Users[0].Name, string(req.UUID), cmsResp.IP, privateKeyPEM, contextStruct.GuacDB)
log.DebugInfo("CMS allocated: ip=%s, mac=%s, sdn=%s", cmsResp.IP, cmsResp.MacAddr, cmsResp.SdnUUID)

// 4) Guacamole user/connection setup
if len(input.Users) == 0 {
	cleanup.run()
	return fmt.Errorf("CreateVM: at least one user is required")
}
userPass := guacamole.Configure(input.Users[0].Name, string(uuid), cmsResp.IP, privateKeyPEM, contextStruct.GuacDB)
Rather than checking the user after allocation, how about checking the user first (after validating all of the input values) and only then allocating?
I was focused on moving the logic over, so I hadn't considered that.
For the places where validation already happens in the current architecture, I'll apply that flow.
I think user-input validation will need strengthening later anyway; I'm inclined to move both the new validation logic and the current validation logic up to the API layer.
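In the spirit of that suggestion, a tiny sketch of validating input before anything is allocated; the placement, the helper name, and the `CreateVMInput` field names are hypothetical:

```go
// validateCreateVMInput runs all cheap input checks up front, before SSH keys,
// subnets, or core capacity are touched, so nothing has to be rolled back.
func validateCreateVMInput(input CreateVMInput) error {
	if len(input.Users) == 0 {
		return fmt.Errorf("CreateVM: at least one user is required")
	}
	if input.Hardware.CPU == 0 || input.Hardware.Memory == 0 || input.Hardware.Disk == 0 {
		return fmt.Errorf("CreateVM: cpu, memory and disk must all be non-zero")
	}
	return nil
}
```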
kwonkwonn
left a comment
Separating out the logic that used to be implemented inline, plus adding the rollback functions, looks great.
However,
did you intend service/cleanup to be called only from the service layer? If not, it might also be worth defining it in a shared layer.
- Right now each operation does cleanup.push() and calls cleanup.run() when an error occurs, but since run has to be called on every error path, I'm a bit worried about small mistakes slipping in.
- If human errors become frequent or the number of operations per service keeps growing, it might be worth enforcing a pattern like the one below.
var err error
var chain cleanupChain
defer func() {
	if err != nil {
		chain.run()
	}
}()

if err = operation1(); err != nil {
	return fmt.Errorf("operation1: %w", err)
}
if err = operation2(); err != nil {
	return fmt.Errorf("operation2: %w", err)
}
Great work.
if err != nil {
	log.Error("CreateVM: failed to configure cms: %v", err, true)
	return fmt.Errorf("CreateVM: failed to configure cms: %w", err)
	return err
Keeping the error in its previous format would make it a bit easier to track down the cause!
	return nil, -1, fmt.Errorf("CreateVM: no suitable core found")
}

func allocateCmsSubnet(contextStruct *vms.ControlContext, subnetType string, uuid vms.UUID, cleanup *cleanupChain) (*client.CmsResponse, bool, error) {
Passing the cleanup chain into inner functions as an argument blurs the ownership boundary for cleanup, and I worry it will invite human error and raise debugging difficulty as things expand.
Safer options might be:
- the function creates its own cleanupChain and returns it, and the caller joins it, or
- the function calls cleanup.run() itself and then returns the error, or
- treating the function as one operation and invoking the allocateCmsSubnet cleanup from the outside.
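A short sketch of the first alternative, where the callee builds and returns its own chain and the caller joins it; the `join` method and the `requestSubnet`/`releaseSubnet` helpers are hypothetical:

```go
// allocateCmsSubnet builds its own rollback chain and hands it back, so the
// caller never has to know which internal steps need undoing.
func allocateCmsSubnet(ctx *vms.ControlContext, subnetType string, uuid vms.UUID) (*client.CmsResponse, *cleanupChain, error) {
	local := &cleanupChain{}
	resp, isNew, err := requestSubnet(ctx, subnetType, uuid) // hypothetical helper
	if err != nil {
		return nil, nil, err
	}
	if isNew {
		local.push(func() { releaseSubnet(ctx, resp) }) // hypothetical helper
	}
	return resp, local, nil
}

// join appends another chain's steps so a single run() unwinds everything.
func (c *cleanupChain) join(other *cleanupChain) {
	if other != nil {
		c.steps = append(c.steps, other.steps...)
	}
}
```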
I tried this on the allocate-CMS-subnet path, which previously had no cleanup logic at all, with the idea that every allocation/conversion function could own its own cleanup.
This probably needs more discussion, though, and it could lead to quite a few changes,
so for now I'll switch this part back to the earlier approach instead of passing the chain as an argument, and we can settle the direction after discussing it later.
cleanup.push(func() {
	log.Info("clean up: guacamole")
	if err := guacamole.Cleanup(string(uuid), contextStruct.GuacDB); err != nil {
		log.Error("Failed to cleanup Guacamole config during rollback: %v", err)
Summary
The god object ControlContext has been split into a VMRepository interface and a ResourceManager struct, clearly separating DB persistence responsibilities from runtime in-memory state management.
Motivation
- ControlContext in structure/control_infra.go was responsible for the entire service
- The existing healthcheck (mysqladmin ping) passed as soon as the container came up, so the control-test container could panic when its DB connection failed
Approach
1. Introduce the VMRepository interface (structure/repository.go)
- Split out the MySQLVMRepository implementation (structure/mysql_vm_repository.go)
2. Introduce the ResourceManager struct (structure/resource_manager.go)
- Bundle sync.RWMutex + Cores + AliveVM + VMLocation into one cohesive struct
3. Split ControlContext (structure/control_infra.go)
- Changed to a composition of VMRepo VMRepository + Resources *ResourceManager (see the sketch after this list)
- FindCoreByVmUUID() remains a thin orchestration layer that coordinates the repository and the resource manager
4. Update all call sites
- service/vm.go: contextStruct.AddInstance() → contextStruct.VMRepo.AddInstance(); Lock/Cores/VMLocation/AliveVM access moved to contextStruct.Resources.*
- service/guacamole.go, service/network.go: RLock/VMLocation access paths updated
- startup/init.go: NewMySQLVMRepository(), NewResourceManager() initialization flow applied
- main.go: contextStruct.Cores → contextStruct.Resources.Cores
5. Harden the MySQL healthcheck (tests/docker-compose.test.yml)
- Changed mysqladmin ping → mysql -e "SELECT 1" so the check verifies actual TCP connection readiness
- Applied retries: 30, start_period: 10s
Related Issue
#37
Testing
go build ./...,go vet ./...에러 없이 통과bash tests/blackbox_io_test.sh) 전체 66/66 PASSFollow-ups (이번 PR 범위 외)
CreateVM()의 8가지 이상의 책임을Orchestrator패턴으로 분리CmsClient를service/→client/패키지로 이동 (다른 PR이 담당 중)startup/init.go초기화 로직 분할: DB 초기화 / 코어 헬스체크 / VM 상태 복원pkg/guacamole/config.go의 10단계 단일 함수를 모듈별로 분할Checklist
Summary by CodeRabbit
Refactor
Tests