This utility traces UCX communication, which is useful for analyzing communication in multi-node MPI applications.
ucTrace has been accepted to appear in the 40th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2026).
If you use ucTrace in your research, please cite our paper:
@misc{gencer2026uctrace,
title={ucTrace: A Multi-Layer Profiling Tool for UCX-driven Communication},
author={Emir Gencer and Mohammad Kefah Taha Issa and Ilyas Turimbetov and James D. Trotter and Didem Unat},
year={2026},
eprint={2602.19084},
archivePrefix={arXiv},
primaryClass={cs.DC},
doi={10.48550/arXiv.2602.19084}
}

Preprint: arXiv:2602.19084
After parsing the profile logs, a sample output looks like this:
DisplayComm(direction=('ai02.kuacc.ku.edu.tr:95185', '->', 'ai02.kuacc.ku.edu.tr:95187'), iface=Iface(iface_ptr='0xe00050', addr='2:02 00 ce 03 00 00 00 00:54 8c 7b 55 6d 8a 97 01', md_name='sysv', tl_name='sysv', tl_res_count='1', resources='( 1 , memory , UCT_DEVICE_TYPE_SHM , 255),'), comm=Comm(time=Time(sec=1727867369, nsec=395628316), success=1, size=163, func='uct_ep_am_bcopy', rkey='0', extra='SNOOP_UCT_COMM_EXTRA_AMINFO:11', ep_ptr='0xed1860', name='sysv', if_ptr='0xe00050', addr='2:02 00 ce 03 00 00 00 00:54 8c 7b 55 6d 8a 97 01'), am_handler='0x7f5edd3290c0:ucp_rndv_rtr_handler:/kuacc/users/egencer20/repos/ucx_build/lib/libucp')
DisplayComm(direction=('ai02.kuacc.ku.edu.tr:95187', '->', 'ai02.kuacc.ku.edu.tr:95186'), iface=Iface(iface_ptr='0x1683970', addr='2:d3 73 01 00:54 8c 7b 55 6d 8a 97 81', md_name='cuda_ipc', tl_name='cuda_ipc', tl_res_count='1', resources='( 1 , cuda , UCT_DEVICE_TYPE_SHM , 255),'), comm=Comm(time=Time(sec=1727867369, nsec=488106788), success=1, size=40960, func='uct_ep_put_zcopy', rkey='80 8f df 00 00 00 00 00 d2 73 01 00 00 00 00 00 00 a0 00 00 00 00 00 00 00 02 00 00 00 80 02 00 00 1e 00 00 00 00 00 00 32 00 00 00 00 00 00 00 66 b3 d5 c1 ae 00 00 5c 00 00 00 00 00 00 00 00 d2 73 01 00 00 00 00 00 00 80 22 d1 7b 7f 00 00 00 a0 00 00 00 00 00 00 00 00 00 00 92 19 c7 4a 92 fb c0 82 89 03 2f ec 92 aa 4c cc 00 00 00 00', extra='SNOOP_UCT_COMM_EXTRA_ADDR:0x7f7bd1228000', ep_ptr='0x17d8c60', name='cuda_ipc', if_ptr='0x1683970', addr='2:d3 73 01 00:54 8c 7b 55 6d 8a 97 81'), am_handler='')
DisplayComm(direction=('ai02.kuacc.ku.edu.tr:95185', '->', 'ai03.kuacc.ku.edu.tr:110585'), iface=Iface(iface_ptr='0xe15380', addr='0:nil:', md_name='ib', tl_name='rc_verbs', tl_res_count='1', resources='( 1 , mlx5_0:1 , UCT_DEVICE_TYPE_NET , 5),'), comm=Comm(time=Time(sec=1727867369, nsec=511347785), success=1, size=40960, func='uct_ep_put_zcopy', rkey='6f a5 03 00 6f a5 03 00', extra='SNOOP_UCT_COMM_EXTRA_ADDR:0x7ffa6fc00000', ep_ptr='0x17fbe30', name='rc_verbs', if_ptr='0xe15380', addr='1:00 86 a5 00:00 73 00'), am_handler='')
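As a rough illustration of how records like the ones above could be post-processed, the sketch below totals the bytes sent per transport and UCT function. It works on the printed text only, using regular expressions for the `tl_name`, `size`, and `func` fields as they appear in the sample. The helper name `summarize` and the abbreviated sample lines are assumptions made for this example and are not part of ucTrace.

```python
import re
from collections import defaultdict

# Field patterns modeled on the printed sample output; they match the
# text form of a record, not ucTrace's in-memory objects.
TL_NAME = re.compile(r"tl_name='([^']*)'")
SIZE = re.compile(r"\bsize=(\d+)")
FUNC = re.compile(r"func='([^']*)'")


def summarize(lines):
    """Return {(tl_name, func): total_bytes} for printed DisplayComm lines."""
    totals = defaultdict(int)
    for line in lines:
        tl = TL_NAME.search(line)
        size = SIZE.search(line)
        func = FUNC.search(line)
        if not (tl and size and func):
            continue  # skip lines that do not look like DisplayComm records
        totals[(tl.group(1), func.group(1))] += int(size.group(1))
    return dict(totals)


# Hypothetical, heavily abbreviated versions of the three sample records.
sample = [
    "DisplayComm(iface=Iface(tl_name='sysv'), comm=Comm(size=163, func='uct_ep_am_bcopy'))",
    "DisplayComm(iface=Iface(tl_name='cuda_ipc'), comm=Comm(size=40960, func='uct_ep_put_zcopy'))",
    "DisplayComm(iface=Iface(tl_name='rc_verbs'), comm=Comm(size=40960, func='uct_ep_put_zcopy'))",
]

for key, total in summarize(sample).items():
    print(key, total)
```

The same function can be fed the lines of a file that holds the parser's saved output, as long as each record is printed on a single line.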
This tracking utility has four components:
- A modified UCX library which can be found here.
- The preload library loaded with LD_PRELOAD, which can be found in preload/
- The main ucTrace library which is in cpp-rewrite/
- The parser that parses the output of logfiles produced by the preload library.
First, build our modified UCX library by following the instructions found here.
Please use the provided Makefile to build ucTrace. The makefile accepts the following options:
- SNOOP_VERBOSE: Enable verbose logging, default is 0.
- SNOOP_COLLECT_BT: Collect backtrace information, default is 0.
- SNOOP_NO_LOGS: Disable dumping out logs at the end, default is 0.
- DEBUG_SYMBOLS: Compiles with debug flags, default is 0.
Required environment variables:
- SNOOP_UCX_UCX: Path to the ucTrace modified UCX source directory.
- UCX_INSTALL: Path to the ucTrace modified UCX install path.
- MPI_HOME: Path to the MPI installation for ucTrace MPI module.
- CUDA_PATH: Path to CUDA installation for device attribution.
See make help for more information.
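Before invoking make, it can help to confirm that the required environment variables are actually set. The following is a minimal sketch of such a check. The variable names are copied from the list of required variables above, while the helper `missing_vars` is a hypothetical name introduced for this example and is not part of the ucTrace build.

```python
import os

# Names copied from the list of required environment variables in this README.
REQUIRED_VARS = ("SNOOP_UCX_UCX", "UCX_INSTALL", "MPI_HOME", "CUDA_PATH")


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the required variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_vars()` with no arguments inspects the current process environment; an empty list means all four variables are set.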
Make sure your MPI installation is built with UCX support.
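One way to check this, assuming Open MPI (the mpirun invocation in this README uses Open MPI's `--mca` syntax), is to look for a `ucx` entry among the pml components reported by `ompi_info`. The sketch below assumes that `ompi_info` is on the PATH and that it prints components in the form `MCA pml: ucx (...)`. Both function names are hypothetical, and other MPI implementations need a different check.

```python
import re
import subprocess


def has_ucx_pml(ompi_info_text):
    """Return True if ompi_info output lists a 'ucx' pml component."""
    return re.search(r"MCA\s+pml:\s+ucx\b", ompi_info_text) is not None


def open_mpi_supports_ucx():
    """Run ompi_info (assumed to be on PATH) and look for the ucx pml."""
    result = subprocess.run(
        ["ompi_info"], capture_output=True, text=True, check=True
    )
    return has_ucx_pml(result.stdout)
```

Keeping the text check separate from the subprocess call makes it possible to test the parsing on saved `ompi_info` output from another machine.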
Set up your environment variables:
# add your libsnoop.so to LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SNOOP_UCX_HOME"
# add the custom UCX to LD_LIBRARY_PATH such that it has priority
export LD_LIBRARY_PATH="$UCX_BUILD/lib/:$UCX_BUILD/lib/ucx:$LD_LIBRARY_PATH"
# run your mpi application
mpirun --mca pml ucx -x UCX_TLS=all -x LD_PRELOAD="$SNOOP_UCX_HOME/preload.so" -np 4 ./myapp

When your processes exit, they will create log files named hostname:pid.log.asd.
Put all log files created in a single run into one folder, then run parser/parse.py on that folder. The parser prints its output to standard output, so you may want to redirect it to a file.
python3 parse.py my-outputs/ -p my_output.pkl

parse.py requires the pandas library to run.
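Because the parser writes its report to standard output, a small wrapper can save that text to a file when the parser is launched from a script. The sketch below builds the command line shown above and redirects its standard output. The function names `parse_command` and `run_to_file` are hypothetical, and the wrapper assumes it is run from the directory that contains parse.py.

```python
import subprocess


def parse_command(log_dir, pickle_path, python="python3"):
    """Build the parse.py command line used in this README."""
    return [python, "parse.py", log_dir, "-p", pickle_path]


def run_to_file(cmd, out_path):
    """Run cmd, write its standard output to out_path, return the exit code."""
    with open(out_path, "w") as out:
        return subprocess.run(cmd, stdout=out).returncode
```

For example, `run_to_file(parse_command("my-outputs/", "my_output.pkl"), "parsed.txt")` would run the parser and store its printed records in `parsed.txt`, where `parsed.txt` is just an illustrative file name. The `python` argument selects which interpreter runs the parser.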
To make parsing faster, we recommend running the parser with PyPy (pypy3) instead of CPython.
To visualize a generated pkl file you need to modify parser/visualizer.py, adding the path of your pickle files to the relevant list.
Run the visualizer using streamlit run visualizer.py.