cdn_bench: multi-instance parser and scientific notation fix #577

Open
SamirFarhat17 wants to merge 3 commits into facebookresearch:v2-beta from SamirFarhat17:export-D100658456-to-v2-beta

Conversation

@SamirFarhat17

Summary:
Two fixes found while running multi-proxy CDN benchmarks on Gen10+/Gen11 edge hosts:

cdn_bench.py parser — multi-instance support:

  • Section headers now match with startswith() instead of exact ==, so 'Client Results (instance 0, target ...)' is correctly detected
  • Per-instance metrics are stored as client_1_requests_sent, proxy_2_actual_rps, etc.
  • Aggregate totals accumulated across instances (client_requests_sent = sum of all instances)
  • Computed aggregate success/error rates and instance counts
  "metrics": {
    "client_instances": 0,
    "exit_code": 0,
    "protocol": "h1",
    "proxy_1_actual_rps": 4000.0,
    "proxy_1_avg_backend_latency_ms": 176.534,
    "proxy_1_avg_latency_ms": 177.0,
    "proxy_1_requests_failed": 0,
    "proxy_1_requests_received": 280356,
    "proxy_1_requests_succeeded": 280356,
    "proxy_1_retries_attempted": 0,
    "proxy_1_retries_succeeded": 0,
    "proxy_1_success_rate_pct": 100.0,
    "proxy_2_actual_rps": 4000.0,
    "proxy_2_avg_backend_latency_ms": 175.723,
    "proxy_2_avg_latency_ms": 176.0,
    "proxy_2_requests_failed": 0,
    "proxy_2_requests_received": 285273,
    "proxy_2_requests_succeeded": 285273,
    "proxy_2_retries_attempted": 0,
    "proxy_2_retries_succeeded": 0,
    "proxy_2_success_rate_pct": 100.0,
    "proxy_actual_rps": 8000.0,
    "proxy_instances": 2,
    "proxy_requests_failed": 0,
    "proxy_requests_received": 565629,
    "proxy_requests_succeeded": 565629,
    "proxy_success_rate_pct": 100.0
  },
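
The parser changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code in cdn_bench.py: the function name `parse_sections`, the `LABELS` and `ROLES` tables, and the choice to number instances from 1 in order of appearance are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical label -> metric-name table; the real parser's table may differ.
LABELS = {
    "Requests Received": "requests_received",
    "Requests Succeeded": "requests_succeeded",
    "Requests Failed": "requests_failed",
}
# Hypothetical section-header prefixes mapped to role names.
ROLES = {"Client Results": "client", "Proxy Results": "proxy"}


def parse_sections(log_text):
    """Collect per-instance metrics plus aggregate totals from benchmark output."""
    metrics = {}
    totals = defaultdict(int)
    counts = defaultdict(int)
    role = None
    for raw in log_text.splitlines():
        line = raw.strip()
        # startswith() tolerates the "(instance N, ...)" suffix that an exact
        # == comparison against the bare header would miss.
        header = next((r for h, r in ROLES.items() if line.startswith(h)), None)
        if header:
            role = header
            counts[role] += 1
            continue
        if role is None or ":" not in line:
            continue
        label, _, value = line.partition(":")
        name = LABELS.get(label.strip())
        if name is None:
            continue
        number = int(float(value.strip()))
        # Per-instance key, e.g. proxy_2_requests_received.
        metrics[f"{role}_{counts[role]}_{name}"] = number
        # Aggregate key, e.g. proxy_requests_received (sum over instances).
        totals[f"{role}_{name}"] += number
    metrics.update(totals)
    for role_name in ROLES.values():
        metrics[f"{role_name}_instances"] = counts[role_name]
        received = metrics.get(f"{role_name}_requests_received", 0)
        if received:
            succeeded = metrics.get(f"{role_name}_requests_succeeded", 0)
            metrics[f"{role_name}_success_rate_pct"] = 100.0 * succeeded / received
    return metrics
```

Computing the aggregate rate from summed counts, rather than averaging the per-instance percentages, would explain why a run with 276719 of 276720 requests succeeded reports an aggregate of about 99.9996% while the per-instance figure is rounded to 100.0.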

run.sh — scientific notation handling:

  • Proxy metrics regex updated to match scientific notation (previously 5e+03 was parsed as just 5)
  • printf formatting added for Success Rate (%.2f) and Actual RPS (%.1f), so 1e+02 displays as 100.00%
        Proxy Results (instance 0, port 8081)
          Requests Received: 276720
          Requests Succeeded: 276719
          Requests Failed: 1
          Success Rate: 100.00%
          Actual RPS: 4000.0
          Avg Total Latency ms: 180
          Avg Backend Latency ms: 180.095
          Retries Attempted: 0
          Retries Succeeded: 0

        Proxy Role Complete

        Cleaning up processes...
stderr:

Results Report:
{
  "benchmark_args": [
    "-m proxy",
    "-B 2401:db00:f01b:301d:face:0:18f:0",
    "-b 8080",
    "-P 8081",
    "-p h1"
  ],
  "benchmark_desc": "Distributed CDN benchmark. Run server, proxy, and client roles on separate hosts. Supports multiple instances of each role for scaling.\n",
  "benchmark_hooks": [
    "perf: {'perfstat': {'interval': 1}, 'mpstat': {'interval': 1}, 'netstat': {'interval': 1}}",
    "copymove: {'is_move': True, 'after': ['packages/cdn_bench/cdn_bench_run.log']}"
  ],
  "benchmark_name": "cdn_bench",
  "machines": [
    {
      "cpu_architecture": "x86_64",
      "cpu_model": "INTEL(R) XEON(R) PLATINUM 8558P",
      "hostname": "fnedge932.01.ams2.facebook.com",
      "kernel_version": "6.4.3-0_fbk15_hardened_2630_gf27365f948db",
      "mem_total_kib": "525701232 KiB",
      "num_cpus_usable": 96,
      "num_logical_cpus": "96",
      "os_distro": "centos",
      "os_release_name": "CentOS Stream 9",
      "threads_per_core": "2"
    }
  ],
  "metadata": {
    "L1d cache": "2.3 MiB (48 instances)",
    "L1i cache": "1.5 MiB (48 instances)",
    "L2 cache": "96 MiB (48 instances)",
    "L3 cache": "260 MiB (1 instance)"
  },
  "metrics": {
    "client_instances": 0,
    "exit_code": 0,
    "protocol": "h1",
    "proxy_1_actual_rps": 4000.0,
    "proxy_1_avg_backend_latency_ms": 180.095,
    "proxy_1_avg_latency_ms": 180.0,
    "proxy_1_requests_failed": 1,
    "proxy_1_requests_received": 276720,
    "proxy_1_requests_succeeded": 276719,
    "proxy_1_retries_attempted": 0,
    "proxy_1_retries_succeeded": 0,
    "proxy_1_success_rate_pct": 100.0,
    "proxy_actual_rps": 4000.0,
    "proxy_instances": 1,
    "proxy_requests_failed": 1,
    "proxy_requests_received": 276720,
    "proxy_requests_succeeded": 276719,
    "proxy_success_rate_pct": 99.99963862387973
  },
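
For readers who want to see the scientific notation issue in isolation, here is a Python analogue of the run.sh fix. The actual shell regex is not shown in this PR, so the `NUMBER` pattern and the `parse_metric` helper below are assumptions that only illustrate the idea: match an optional exponent, convert to a float, then format with a fixed number of decimals.

```python
import re

# Matches plain integers, decimals, and scientific notation such as 5e+03.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

# Stand-in for a digits-only pattern, which stops at the "e" and yields "5".
DIGITS_ONLY = re.compile(r"\d+")


def parse_metric(line):
    """Return the numeric value after the first colon, or None if absent."""
    _, _, value = line.partition(":")
    match = NUMBER.search(value)
    return float(match.group()) if match else None


def format_success_rate(value):
    """Mirror the %.2f formatting described for Success Rate."""
    return f"{value:.2f}%"


def format_rps(value):
    """Mirror the %.1f formatting described for Actual RPS."""
    return f"{value:.1f}"
```

With this sketch, `parse_metric("Actual RPS: 5e+03")` yields 5000.0, whereas `DIGITS_ONLY.search("5e+03").group()` returns only `"5"`, which is the truncation described above.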

Reviewed By: YifanYuan3

Differential Revision: D100658456

…#575)

Summary:

Added health checks to cdn_bench/run.sh so it fails fast with a clear message instead of silently misbehaving:

- verify_content_server() probes each content_server with curl after startup to confirm it's actually serving, not just listening
- Backend Reachability Check runs before starting proxies — if any backend is unreachable it prints the exact curl command to debug and aborts
- Fixed IPv6 host:port parsing (was using ${entry%:*} which strips everything after the first colon, now extracts port first then strips it)
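
The "extract the port first, then strip it" approach for IPv6 entries can be illustrated with a short Python analogue. The exact entry format accepted by run.sh is not shown here, so the helper name `split_host_port` and the handling of optional square brackets are assumptions.

```python
def split_host_port(entry):
    """Split 'host:port' on the LAST colon so IPv6 hosts keep their colons."""
    host, sep, port = entry.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port, got {entry!r}")
    # Accept both bracketed ([::1]:8080) and bare IPv6 forms.
    return host.strip("[]"), int(port)
```

Note that an unbracketed IPv6 address with no port cannot be told apart from one with a port by this rule alone, so callers would need to require the port or the brackets.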

Reviewed By: YifanYuan3

Differential Revision: D100220256
…facebookresearch#576)

Summary:

Follow-up improvements to cdn_bench run.sh for operational ergonomics.

- Graceful proxy shutdown — sends SIGINT instead of SIGTERM so proxygen flushes its metrics summary before exit, with a 2s grace period before SIGKILL
- Auto-terminate for server and proxy roles when -d (duration) is passed, so long-running roles exit automatically after the client finishes (server: +20s grace, proxy: +10s grace)
- Client-side proxy reachability check — verifies all proxy targets respond (HTTP/1.1 and h2) before sending traffic, aborts with diagnostic info if unreachable
- Proxy stderr tee — proxy_server stderr is now tee'd to both terminal and file so metrics are visible during the run
- Changed metrics_interval from 0 to 5 for periodic metrics output during the run
- Minor quoting fix for IPv6 host:port variable expansion
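
The graceful shutdown sequence (SIGINT, a short grace period, then SIGKILL) looks roughly like this when written in Python instead of shell. The function name `stop_gracefully` is hypothetical; only the 2s grace period and the signal order come from the description above.

```python
import signal
import subprocess


def stop_gracefully(proc, grace_seconds=2.0):
    """Send SIGINT, wait up to grace_seconds, then fall back to SIGKILL."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    # SIGINT gives the process a chance to flush its final metrics summary.
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process ignored SIGINT or is too slow to exit; force it.
        proc.kill()
        return proc.wait()
```

The return value is the child's exit status, which callers could log to tell a clean SIGINT exit from a forced kill.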

Differential Revision: D100630922
@meta-cla meta-cla bot added the CLA Signed label Apr 13, 2026
@meta-codesync

meta-codesync bot commented Apr 13, 2026

@SamirFarhat17 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D100658456.

meta-codesync bot pushed a commit that referenced this pull request Apr 13, 2026
Summary:
Pull Request resolved: #577

Reviewed By: YifanYuan3

Differential Revision: D100658456

fbshipit-source-id: 466c3b3392f7f8e03dab8095ea03739db04a6828
Labels

CLA Signed, fb-exported, meta-exported
