Skip to content

Statistics reporting tool: ais-stats#189

Open
zbyrne wants to merge 6 commits intodevelopfrom
zbyrne/ais-stats
Open

Statistics reporting tool: ais-stats#189
zbyrne wants to merge 6 commits intodevelopfrom
zbyrne/ais-stats

Conversation

@zbyrne
Copy link
Collaborator

@zbyrne zbyrne commented Feb 18, 2026

Motivation

AIHIPFILE-53 Infinity Storage as a storage solution must have sufficient performance counters & statistics for proper debugging, system component workflow understanding, and performance troubleshooting.

Technical Details

Counters are stored in a shared-memory-backed mmapped buffer as an array of atomic_uint64_ts. The library uses unix sockets and control messages to communicate the fd for the buffer with the reporting app.

Test Plan

Unit tests were added for some simple behaviors. Additionally, the example program aiscp was launched by ais-stats and IO stats were reported.

Test Result

$ ctest . -R 'HipFileStats*'
Test project /home/AMD/zbyrne/git/hipFile/build
Start 412: HipFileStats.statsAddFastPathRead
1/6 Test #412: HipFileStats.statsAddFastPathRead ........ Passed 0.02 sec
Start 413: HipFileStats.statsAddFastPathWrite
2/6 Test #413: HipFileStats.statsAddFastPathWrite ....... Passed 0.02 sec
Start 414: HipFileStats.statsAddFallbackPathRead
3/6 Test #414: HipFileStats.statsAddFallbackPathRead .... Passed 0.02 sec
Start 415: HipFileStats.statsAddFallbackPathWrite
4/6 Test #415: HipFileStats.statsAddFallbackPathWrite ... Passed 0.02 sec
Start 416: HipFileStats.StatsServerLifetime
5/6 Test #416: HipFileStats.StatsServerLifetime ......... Passed 0.02 sec
Start 417: HipFileStats.GenerateReportV1
6/6 Test #417: HipFileStats.GenerateReportV1 ............ Passed 0.02 sec

100% tests passed, 0 tests failed out of 6

$ build/tools/ais-stats/ais-stats build/examples/aiscp/aiscp README.md /home/AMD/zbyrne/README.md
Total fast path reads (B): 0
Total fast path writes (B):0
Total fallback path reads (B): 1729
Total fallback path writes (B): 524288

Submission Checklist

@zbyrne zbyrne force-pushed the zbyrne/ais-stats branch 3 times, most recently from d84ac81 to ba02048 Compare February 19, 2026 16:15
@gaoikawa gaoikawa requested a review from Copilot February 19, 2026 16:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds a statistics reporting tool (ais-stats) to collect and display I/O performance metrics for the hipFile library. The implementation uses shared memory and Unix domain sockets to communicate statistics between the monitored process and the reporting tool.

Changes:

  • Added ais-stats command-line tool to launch programs and collect their I/O statistics
  • Implemented StatsServer class that maintains statistics in shared memory and serves them via Unix sockets
  • Integrated statistics tracking into fastpath and fallback I/O backends
  • Added system call wrappers for memfd_create and ftruncate to support shared memory functionality

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 19 comments.

Show a summary per file
File Description
tools/ais-stats/ais-stats.cpp Command-line tool that launches or attaches to processes to collect statistics
tools/ais-stats/CMakeLists.txt Build configuration for the ais-stats tool
src/amd_detail/stats.h Statistics data structures and API definitions
src/amd_detail/stats.cpp Core statistics implementation with shared memory and socket communication
src/amd_detail/sys.h Added system call wrappers for memfd_create and ftruncate
src/amd_detail/sys.cpp Implementation of new system call wrappers
src/amd_detail/backend/fastpath.cpp Integrated statistics tracking for fast path I/O operations
src/amd_detail/backend/fallback.cpp Integrated statistics tracking for fallback path I/O operations
src/amd_detail/CMakeLists.txt Added stats.cpp to build
test/amd_detail/stats.cpp Unit tests for statistics functionality
test/amd_detail/msys.h Added mock methods for new system calls
test/amd_detail/CMakeLists.txt Added stats test file to build
util/format-source.sh Added tools directory to code formatting
CMakeLists.txt Added option and build support for tools

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

using Array = std::array<std::atomic_uint64_t, static_cast<std::size_t>(StatsCounters::Capacity)>;
const uint64_t version{1};
Array counters;
};
Copy link
Collaborator

@kurtmcmillan kurtmcmillan Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have the counters be fields of the stats struct rather than being elements in an array?

eg.

Version 1:

struct stats {
  const uint64_t version{1};
  // Version 1 Fields
  uint64_t fastpath_read_bytes;
  uint64_t fastpath_write_bytes;
}

Version 2:

struct stats {
  const uint64_t version{2};
  // Version 1 Fields
  uint64_t fastpath_read_bytes;
  uint64_t fastpath_write_bytes;
  // Version 2 Fields
  uint64_t fastpath_read_err_bytes;
  uint64_t fastpath_read_count;
  uint64_t fastpath_read_err_count;
  ...
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had them like that originally, I changed it to this, thinking it would make more sense to version the index enum

Copy link
Collaborator

@sbates130272 sbates130272 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zbyrne some minor comments. Looks good to me!

throw std::runtime_error("Invalid IoType");
}

switch (type) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zbyrne do we understand any performance implications of adding this call on the fastpath? Are there any changes for performance regression?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sbates130272 I haven't looked too deeply into the perf cost. Once the pr is closer to done I'll do an A/B with fio. If the call overhead is noticeable I can inline and streamline a bit, if I start seeing contention on the atomic adds with lots of jobs... I dunno. I'm thinking we should add a config value to enable or disable collection anyway, IIRC gds-stats has 3 config levels with increasing levels of detail collected.

The Stats struct will eventually be stored in a shared memory backed,
mmapped file that can accessed from another process (ais-stats) to
generate a report.
The cmsg code is based on examples from the man page.
@zbyrne zbyrne force-pushed the zbyrne/ais-stats branch 4 times, most recently from 11f8d91 to 820a521 Compare February 26, 2026 23:52
More example cmsg code and a very simple report forthe tracked
counters.
ais-stats uses StatsClient to connect to a process running a
StatsServer either by launching the process or taking the PID of a
running process from the command line.
@zbyrne zbyrne marked this pull request as ready for review February 27, 2026 17:42
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to move MStateServer into a header file so that other units can mock out the stats functions in unit tests?

Comment on lines +45 to +48
void statsAddFastPathRead(uint64_t bytes);
void statsAddFastPathWrite(uint64_t bytes);
void statsAddFallbackPathRead(uint64_t bytes);
void statsAddFallbackPathWrite(uint64_t bytes);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GTest doesn't work on free functions for mocking/patching. Are we able to move them into the Stats struct?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further consideration, this might require a bit of a redesign. Fallback & Fastpath would both need to have a StatsServer instance assuming these functions get placed into it.

Having getStats() patched to return null might be good enough to at least limit how much stats code gets executed in unit tests (at least for how the functions are written today).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently using our Context<> pattern as a lazy singleton. A single StatsServer could probably be part of the DriverState

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I might have had a wrong understanding with how the IO Backends were needing to access the StatsServer.

I think that might work, something like Context<DriverState>::get()->statsAddFastPath(IOType)? That would allow the function to keep the check to see if stat collection was enabled without needing to do that check in io(). Based on the current stats implementation, we wouldn't need to obtain the DriverState lock?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants