
Mimesys: Generating Realistic Executable Testing Environments from Resource Usage Traces

Diffusion-based system emulation framework. Trains a conditional diffusion model to generate executable workloads that reproduce hardware performance traces of target applications.


Contents

  • Requirements
  • Installation
  • Target Machine Setup
  • Data Collection
  • Training
  • RL Fine-tuning
  • Inference Server & Client

Requirements

  • Python 3.10
  • CUDA 12.x
  • uv (curl -LsSf https://astral.sh/uv/install.sh | sh)

Installation

uv sync

This creates a .venv in the repo root, installs all pinned dependencies from uv.lock, and installs the mimesys package in editable mode. To run any command inside the environment, prefix it with uv run.

To activate the venv directly (optional):

source .venv/bin/activate

1. Target Machine Setup

Data collection requires a cluster of machines accessible over SSH. Each node runs the benchmark and ships results back to the controller. The scripts in worker_scripts/ automate this setup.

SSH configuration

Fill in worker_scripts/config.py with your credentials and the list of worker hostnames:

USERNAME         = "your_username"
PRIVATE_KEY_PATH = "~/.ssh/id_rsa"
HOSTNAMES        = ["worker-01.example.com", "worker-02.example.com", ...]
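
The worker scripts read these values when opening SSH sessions to each node. As a point of reference, connection setup from this config can look roughly like the sketch below, assuming a paramiko-style client (a hypothetical helper, not the actual worker_scripts code):

# sketch only: connect to each worker with the settings from worker_scripts/config.py
# (assumes paramiko; the real scripts may use a different SSH layer)
import os
import paramiko

from config import USERNAME, PRIVATE_KEY_PATH, HOSTNAMES

def connect(hostname: str) -> paramiko.SSHClient:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(
        hostname,
        username=USERNAME,
        key_filename=os.path.expanduser(PRIVATE_KEY_PATH),
    )
    return client

clients = {host: connect(host) for host in HOSTNAMES}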

Installing dependencies on each worker

Run the following command to install all required dependencies for executing synthesized workloads and collecting profiling metrics.

Command

cd worker_scripts
uv run python install_remote_dependencies.py

How collection works on each machine

During each active-learning round, the controller:

  1. Writes execution plans (HDF5 files) to per-node directories and transfers them over SSH.
  2. Each node runs worker_scripts/collect_mimesys_metrics.sh, which invokes the benchmark with the assigned plans and collects hardware performance counters.
  3. Results are zipped, copied back to the controller via scp, then parsed and filtered before being added to the training dataset.
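
For orientation, one round of this dispatch loop can be pictured roughly as follows. This is a simplified sketch with hypothetical paths, dataset names, and result naming, not the controller's actual code:

# simplified sketch of dispatching one round to a single node
import os
import subprocess
import h5py
import numpy as np

def dispatch_round(host: str, plan: np.ndarray, round_idx: int) -> None:
    plan_path = f"plans/{host}/round_{round_idx}.h5"
    os.makedirs(os.path.dirname(plan_path), exist_ok=True)

    # 1. write the execution plan for this node (dataset name is illustrative)
    with h5py.File(plan_path, "w") as f:
        f.create_dataset("execution_plan", data=plan)

    # 2. ship the plan and run the collection script on the worker
    subprocess.run(["scp", plan_path, f"{host}:~/plans/"], check=True)
    subprocess.run(["ssh", host, "bash worker_scripts/collect_mimesys_metrics.sh"],
                   check=True)

    # 3. pull the zipped counters back for parsing and filtering
    os.makedirs(f"results/{host}", exist_ok=True)
    subprocess.run(["scp", f"{host}:~/results_{round_idx}.zip", f"results/{host}/"],
                   check=True)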

2. Data Collection

Training data is collected via novelty-guided active learning (collection/collect_training_data.py). The loop iteratively profiles stressor compositions on remote machines and prioritizes those likely to produce underexplored resource-usage patterns.

Command

cd mimesys/collection
uv run python collect_training_data.py --rounds 20

--rounds sets the number of active-learning rounds after the initial sweep (round 0). Output is written to the path configured in OUTPUT_PATH at the top of the script.

Overview

Round 0 — Initialization

  • A one-hot sweep (initial_candidates) covers each of the stressors in isolation across varying thread counts and weight scales. This anchors the dataset with ground-truth single-stressor behavior at the extremes of the metric space.
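
The sweep itself is a cross product of single stressors with a grid of thread counts and weight scales. A minimal illustration (stressor names and grid values here are made up, not the script's actual ones):

# illustrative one-hot sweep: every stressor in isolation across a small grid
from itertools import product

STRESSORS = ["cpu", "memory", "io", "network"]   # hypothetical names
THREAD_COUNTS = [1, 2, 4, 8]
WEIGHT_SCALES = [0.25, 0.5, 1.0]

initial_candidates = [
    {"stressor": s, "threads": t, "weight": w}
    for s, t, w in product(STRESSORS, THREAD_COUNTS, WEIGHT_SCALES)
]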

Rounds 1+ — Active Learning

  • Each round proposes a batch of stressor compositions using a hybrid of two complementary strategies (novelty vs. convex-hull interpolation, controlled by HULL_FPS_RATIO):

  1. RF novelty + FPS selection — trains a Random Forest surrogate on the current dataset. A large pool of candidates is generated by mixing mutations of existing actions with fresh random compositions. Each candidate is scored by:
     • Rarity: negative log-density under a KDE fit to the observed metric distribution (low density = underrepresented region)
     • Uncertainty: log-determinant of the RF per-tree prediction covariance (high variance = low-confidence region)

  The combined novelty score modulates a greedy selection in predicted metric space, ensuring the final batch is both novel and mutually diverse (a minimal sketch of this scoring follows this overview).

  2. Convex-hull interpolation — builds a convex hull of the observed metric space across selected metric subspaces. Grid cells inside the hull are sorted by occupancy (empty first), and new compositions are synthesized by interpolating the actions of their nearest neighbors, targeting gaps in the covered metric space.

After profiling, each round filters out high-variance measurements (per-metric variance exceeding 10% of the current observed range) before adding them to the dataset, keeping only stable, reproducible traces.
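
The sketch below illustrates the rarity/uncertainty scoring and the stability filter described above. It is a conceptual outline with an illustrative score combination, not the code in collection/collect_training_data.py:

# conceptual sketch of the novelty score and stability filter
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KernelDensity

def novelty_scores(rf: RandomForestRegressor, kde: KernelDensity,
                   candidate_actions: np.ndarray) -> np.ndarray:
    # per-tree metric predictions: (n_trees, n_candidates, n_metrics)
    per_tree = np.stack([t.predict(candidate_actions) for t in rf.estimators_])
    mean_pred = per_tree.mean(axis=0)

    # Rarity: negative log-density of the predicted metrics under a KDE
    # fit to the observed metric distribution
    rarity = -kde.score_samples(mean_pred)

    # Uncertainty: log-determinant of the per-tree prediction covariance
    uncertainty = np.empty(len(candidate_actions))
    for i in range(len(candidate_actions)):
        cov = np.cov(per_tree[:, i, :], rowvar=False)
        _, logdet = np.linalg.slogdet(cov + 1e-6 * np.eye(cov.shape[0]))
        uncertainty[i] = logdet

    # illustrative combination; the actual weighting may differ
    return rarity + uncertainty

def stable_mask(metric_variance: np.ndarray, metric_range: np.ndarray) -> np.ndarray:
    # keep measurements whose per-metric variance stays within 10% of the
    # currently observed range for that metric
    return np.all(metric_variance <= 0.10 * metric_range, axis=1)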


3. Training

All training commands are run from mimesys/training/.

Supervised pretraining

cd mimesys/training
CUDA_VISIBLE_DEVICES=0 uv run python trainer.py +exps=pretrain

Multi-GPU

CUDA_VISIBLE_DEVICES=0,1 uv run python trainer.py +exps=pretrain \
    train.trainer.devices=2 \
    train.trainer.strategy=ddp

Resume from checkpoint

uv run python trainer.py +exps=pretrain \
    train.trainer.ckpt_path=/path/to/diffusion-epoch=999.ckpt

Training config example

# mimesys/conf/exps/concat_aug10.yaml  (train + model sections)
train:
  trainer:
    trainer_model_name: MimesysTrainer
    max_epochs: 2000
    devices: 1
    precision: 16-mixed           # inherited from base
    check_val_every_n_epoch: 500
    ckpt_path: ""
  optim:
    lr: 1e-4
  callbacks:
    checkpoint:
      dirpath: diffusion/concat_aug10
      every_n_epochs: 500
      monitor: epoch
      save_top_k: -1
      save_last: true
  run_train: true
  run_test: false
  use_rl: false
  prev_state_lambda: 0.0

model:
  unet:
    input_dim: 20           # num_stressor_types × num_parameters
  context:
    input_dim: 25           # trace feature dimension
    action_dim: 260
    num_heads: 4
    num_layers: 6
    hidden_dim: 256
    dropout: 0.1
    encoder_type: concat
  diffusion:
    n_timesteps: 25
    cfg_args:
      cfg_drop_prob: 0.1
      cfg_guide_w: 3

log:
  project_name: mimesys
  run_name: diffusion/concat_aug10

Training metrics are logged to Weights & Biases under log.project_name / log.run_name.
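
The cfg_args fields configure classifier-free guidance: conditioning on the trace is dropped with probability cfg_drop_prob during training, and at sampling time the conditional and unconditional predictions are blended with weight cfg_guide_w. A minimal sketch of one common formulation of that blend (the model call signature and null-context handling are hypothetical, not the Mimesys UNet interface):

# sketch of classifier-free guidance as parameterized by cfg_guide_w
import torch

def guided_eps(model, x_t, t, context, guide_w: float = 3.0) -> torch.Tensor:
    eps_cond = model(x_t, t, context)                       # conditioned on trace features
    eps_uncond = model(x_t, t, torch.zeros_like(context))   # context dropped (illustrative null)
    # push the prediction away from the unconditional branch
    return (1.0 + guide_w) * eps_cond - guide_w * eps_uncond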


4. RL Fine-tuning

RL fine-tuning uses DDPO with a profiling-based reward. Start from a pretrained supervised checkpoint.

Command

cd mimesys/training
CUDA_VISIBLE_DEVICES=0 uv run python trainer.py +exps=rl_finetuning

RL config example

# mimesys/conf/exps/rl_finetuning.yaml  (train section)
train:
  trainer:
    trainer_model_name: MimesysTrainer
    max_epochs: 10000
    devices: 1
    check_val_every_n_epoch: 10
    ckpt_path: /path/to/pretrained/diffusion-epoch=999.ckpt   # start from supervised ckpt
  optim:
    lr: 3e-7                        # keep low to avoid catastrophic forgetting
  use_rl: true
  prev_state_lambda: 0.0
  ddpo:
    num_inner_epochs: 1
    num_batches_per_episode: 14
    reward_type: profiling          # live benchmark reward via SSH
    io_reward_weight: 1.0
    kl_coef: 0.05                   # KL penalty to pretrained distribution
  callbacks:
    checkpoint:
      dirpath: diffusion/mimesys_pretrain
      every_n_epochs: 10
      monitor: epoch
      save_last: true
  async_validation: false

log:
  project_name: mimesys
  run_name: diffusion/mimesys_pretrain

The profiler section of the config must be filled in with the same SSH credentials used for data collection. Remote machines run the benchmark and return the reward signal each episode.
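
For intuition, the core of a DDPO-style update with a KL penalty toward the pretrained model looks roughly like the sketch below. It is conceptual only; names, shapes, and the clipping constant are illustrative rather than the trainer's actual implementation:

# conceptual sketch of a DDPO-style policy-gradient step with a KL penalty
import torch

def ddpo_loss(log_probs, old_log_probs, pretrained_log_probs, rewards,
              kl_coef: float = 0.05, clip_eps: float = 1e-4) -> torch.Tensor:
    # log_probs: per-sample, per-denoising-step log-probabilities, shape (batch, steps)
    # rewards: profiling-based reward per generated workload, shape (batch,)
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # importance ratio over the denoising steps that produced each sample
    ratio = torch.exp(log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    pg = -torch.min(ratio * advantages[:, None], clipped * advantages[:, None]).mean()
    # penalize drift away from the supervised pretrained distribution
    kl = (log_probs - pretrained_log_probs).mean()
    return pg + kl_coef * kl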


5. Inference Server & Client

Server

uv run python -m mimesys.inference \
    --ckpt /path/to/diffusion-epoch=999.ckpt

# Custom port and experiment config
uv run python -m mimesys.inference \
    --ckpt /path/to/last.ckpt \
    --exp pretrain \
    --port 8000

# With remote profiling endpoint enabled
uv run python -m mimesys.inference \
    --ckpt /path/to/last.ckpt \
    --enable_profiling

# Choose devices
uv run python -m mimesys.inference \
    --ckpt /path/to/last.ckpt \
    --device cpu # or cuda

Flag                 Default    Description
--ckpt               required   Path to .ckpt checkpoint
--exp                pretrain   Hydra experiment config name
--port               8000       HTTP port
--host               0.0.0.0    Bind address
--enable_profiling   off        Enable POST /profile (requires CloudLab SSH)
--device             auto       Force cuda or cpu

Client

Generate an execution plan (HDF5) from a time-series resource usage trace file. The trace file uses the HPCPerfStats format.

uv run python -m mimesys.inference.client generate-from-file \
    --file /path/to/stats-workload.txt \
    --method diffusion \
    --output execution_plan_series.h5
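
Before shipping the plan to a target machine, it can help to sanity-check the HDF5 contents. Dataset names depend on the plan format, so the snippet below simply lists whatever is inside:

# quick look at the generated execution plan
import h5py

with h5py.File("execution_plan_series.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))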

Using the generated h5 file, you can run a synthetic workload on a target machine:

# From the `fleetbench` directory on the target machine:

# Copy the h5 file to the machine first, then place it in the execution plans directory
cp execution_plan_series.h5 fleetbench/mimesys/execution_plans/

# Run the synthetic workload
MIMESYS_ITERS=1 MIMESYS_SLEEP=0 \
ACTION_PROFILING_CACHE_DIR=${HOME_PATH}/fleetbench \
ACTION_LIST_PATH=${HOME_PATH}/fleetbench/fleetbench/mimesys/mimesys_actions.txt \
TACC_STATS_DIR=${HOME_PATH}/HPCPerfStats/monitor/src \
sudo bazel run --config=clang --config=opt fleetbench/mimesys:mimesys_benchmark -- \
    --benchmark_filter="BM_Mimesys"

To generate a workload and profile it in a single command:

uv run python -m mimesys.inference.client profile-from-file \
    --file /path/to/stats-workload.txt \
    --method diffusion \
    --output metrics.png

About

Mimesys: Generating Realistic Executable Testing Environments from Resource Usage Traces (OSDI 26)
