# Heterogeneous Task Scheduler (HTS)

High-performance C++ framework for scheduling and executing task DAGs across CPU and GPU devices.
- Features
- Quick Start
- Installation
- Documentation
- Architecture
- Examples
- Performance
- Contributing
- License
## Features

| Feature | Benefit |
|---|---|
| 🚀 Blazing Fast | Zero-overhead abstractions, lock-free data structures, 50-100x faster GPU memory allocation |
| 🔄 DAG Execution | Automatic cycle detection, topological sorting, dependency tracking |
| 🎯 Smart Scheduling | Pluggable policies: GPU-first, CPU-first, round-robin, load-based |
| 💾 Memory Pool | Buddy system allocator eliminates cudaMalloc/cudaFree overhead |
| 📊 Performance Insights | Built-in profiler with Chrome tracing export and parallelism metrics |
| 🛡️ Production Ready | Retry policies, failure propagation, graceful degradation |
## Quick Start

```cpp
#include <hts/heterogeneous_task_scheduler.hpp>
#include <iostream>

using namespace hts;

int main() {
    TaskGraph graph;
    TaskBuilder builder(graph);

    // Create CPU task
    auto cpu_task = builder
        .create_task("Preprocess")
        .device(DeviceType::CPU)
        .cpu_func([](TaskContext& ctx) {
            std::cout << "Preprocessing on CPU..." << std::endl;
        })
        .build();

    // Create GPU task (my_kernel and data are application-defined)
    auto gpu_task = builder
        .create_task("Compute")
        .device(DeviceType::GPU)
        .gpu_func([](TaskContext& ctx, cudaStream_t stream) {
            my_kernel<<<256, 128, 0, stream>>>(data);
            cudaStreamSynchronize(stream);
        })
        .build();

    // Set dependency: Preprocess must finish before Compute starts
    graph.add_dependency(cpu_task->id(), gpu_task->id());

    // Execute
    Scheduler scheduler;
    // Note: Tasks are accessed via scheduler.graph() or use the graph directly
    scheduler.execute();
    return 0;
}
```

## Installation

### From Source

```bash
# Clone repository
git clone https://github.com/LessUp/heterogeneous-task-scheduler.git
cd heterogeneous-task-scheduler

# Build (using scripts)
scripts/build.sh --cpu-only   # or scripts/build.sh for CUDA support

# Run tests
scripts/test.sh
```

With CMake FetchContent:
```cmake
include(FetchContent)
FetchContent_Declare(
    hts
    GIT_REPOSITORY https://github.com/LessUp/heterogeneous-task-scheduler.git
    GIT_TAG v1.2.0
)
FetchContent_MakeAvailable(hts)

target_link_libraries(your_target PRIVATE hts_lib)
```

### Requirements

| Requirement | Version | Notes |
|---|---|---|
| CMake | >= 3.18 | Build system |
| CUDA Toolkit | >= 11.0 | GPU support (optional) |
| C++ Compiler | C++17 | GCC 8+, Clang 7+, MSVC 2019+ |
| GPU | Compute Capability 5.0+ | For GPU tasks |
### Platform Setup

**Ubuntu/Debian:**

```bash
sudo apt-get install build-essential cmake git
# Install CUDA from https://developer.nvidia.com/cuda-downloads
```

**macOS:**

```bash
brew install cmake git
# Note: GPU features not supported on macOS
```

**Windows:**

- Install Visual Studio 2019+ with C++ support
- Install CUDA Toolkit
- Install CMake
See Installation Guide for detailed instructions.
## Documentation

📚 Full documentation is available at GitHub Pages.
The website includes:
- 📖 Getting Started Guides - Installation, quickstart, architecture
- 📘 API Reference - Complete Scheduler, TaskGraph, TaskBuilder documentation
- 💡 Examples - Working code examples from simple to complex
- 📊 Performance Guides - Profiling and optimization tips
- 🛡️ Error Handling - Retry policies, fallbacks, best practices
| Topic | Link |
|---|---|
| Installation Guide | Website → |
| Quick Start Tutorial | Website → |
| Architecture Overview | Website → |
| Scheduler API | Website → |
| TaskGraph API | Website → |
| Examples | Website → |
| Changelog | Website → |
| Contributing Guide | Website → |
Technical design documents and product requirements are in the /specs directory:
| Resource | Link |
|---|---|
| Product Requirements | specs/product/ |
| Architecture RFC | specs/rfc/ |
| Test Specifications | specs/testing/ |
## Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│                        User Application                         │
├─────────────────────────────────────────────────────────────────┤
│                      TaskGraph Builder API                      │
│       TaskBuilder │ TaskGroup │ TaskBarrier │ TaskFuture        │
├─────────────────────────────────────────────────────────────────┤
│                            Scheduler                            │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐           │
│   │ Dependency  │   │ Scheduling  │   │ Profiler    │           │
│   │ Manager     │   │ Policy      │   │ & Logger    │           │
│   └─────────────┘   └─────────────┘   └─────────────┘           │
├─────────────────────────────────────────────────────────────────┤
│                        Execution Engine                         │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐           │
│   │ CPU Thread  │   │ GPU Stream  │   │ Resource    │           │
│   │ Pool        │   │ Manager     │   │ Limiter     │           │
│   └─────────────┘   └─────────────┘   └─────────────┘           │
├─────────────────────────────────────────────────────────────────┤
│                           Memory Pool                           │
│               Buddy System Allocator (GPU Memory)               │
└─────────────────────────────────────────────────────────────────┘
```
Core components:

- TaskGraph: DAG representation with cycle detection
- Scheduler: Central coordinator with pluggable policies
- Execution Engine: Thread pool + CUDA streams
- Memory Pool: Efficient GPU memory management
- Profiler: Performance monitoring and reporting
## Examples

### Three-Stage Pipeline

```cpp
#include <hts/heterogeneous_task_scheduler.hpp>
#include <iostream>

using namespace hts;

int main() {
    TaskGraph graph;
    TaskBuilder builder(graph);

    // CPU preprocessing
    auto preprocess = builder
        .create_task("Preprocess")
        .device(DeviceType::CPU)
        .cpu_func([](TaskContext& ctx) {
            std::cout << "Preprocessing data..." << std::endl;
        })
        .build();

    // GPU computation (my_kernel and data are application-defined)
    auto compute = builder
        .create_task("GPUCompute")
        .device(DeviceType::GPU)
        .gpu_func([](TaskContext& ctx, cudaStream_t stream) {
            my_kernel<<<256, 128, 0, stream>>>(data);
            cudaStreamSynchronize(stream);
        })
        .priority(10)
        .build();

    // CPU postprocessing
    auto postprocess = builder
        .create_task("Postprocess")
        .device(DeviceType::CPU)
        .cpu_func([](TaskContext& ctx) {
            std::cout << "Postprocessing results..." << std::endl;
        })
        .build();

    // Set dependencies: preprocess → compute → postprocess
    graph.add_dependency(preprocess->id(), compute->id());
    graph.add_dependency(compute->id(), postprocess->id());

    // Execute with a GPU-first scheduling policy
    Scheduler scheduler;
    scheduler.set_policy(std::make_unique<GPUPriorityPolicy>());
    scheduler.execute();
    return 0;
}
```

### Error Handling

```cpp
// GPU task with automatic retry on failure
auto unreliable_task = builder
    .create_task("RiskyGPUTask")
    .device(DeviceType::GPU)
    .gpu_func(risky_kernel)
    .retry_policy(RetryPolicy{
        .max_retries = 3,
        .backoff_ms = 100,
        .backoff_multiplier = 2.0f
    })
    .fallback([](TaskContext& ctx) {
        std::cout << "GPU failed, using CPU fallback" << std::endl;
        cpu_fallback(ctx);
    })
    .build();
```

More examples:
- Simple DAG - Basic pipeline
- Pipeline - Complex ML pipeline with error handling
- examples/ directory for complete examples
## Performance

### Memory Pool vs. cudaMalloc

| Operation | cudaMalloc | HTS Memory Pool | Speedup |
|---|---|---|---|
| Allocate 1 MB | ~50 μs | ~1 μs | 50x |
| Free 1 MB | ~25 μs | ~1 μs | 25x |
### Scheduling Overhead

| Operation | Latency |
|---|---|
| Add task | ~50 ns |
| Add dependency | ~30 ns |
| Schedule task | ~100 ns |
### End-to-End Speedup

| Workload | CPU-only | HTS (CPU+GPU) | Speedup |
|---|---|---|---|
| Image Processing | 1.0x | 3.5x | 3.5x |
| ML Inference | 1.0x | 8.2x | 8.2x |
| Data Pipeline | 1.0x | 2.1x | 2.1x |
See docs/en/profiling.md for the profiling guide.
## Roadmap

**Completed:**

- ✅ Bilingual documentation (English/Chinese)
- ✅ Comprehensive API documentation
- ✅ Professional changelog structure

**In Progress:**

- 🔄 Multi-GPU support
- 🔄 Distributed execution
- 🔄 Python bindings

**Planned:**

- 📋 WebAssembly support
- 📋 Cloud-native scheduling
- 📋 Auto-tuning policies
## Contributing

We welcome contributions! See our Contributing Guide for details.
```bash
# 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/heterogeneous-task-scheduler.git
cd heterogeneous-task-scheduler

# 2. Create feature branch
git checkout -b feature/amazing-feature

# 3. Build and test
scripts/build.sh --cpu-only
scripts/test.sh

# 4. Format code
scripts/format.sh

# 5. Commit and push
git commit -m "feat: add amazing feature"
git push origin feature/amazing-feature

# 6. Open Pull Request
```

Ways to contribute:

- 🐛 Bug Fixes - Always welcome!
- 📝 Documentation - Guides, examples, API docs
- ✨ New Features - Please discuss in Issues first
- 🎨 Code Quality - Refactoring, style improvements
- 🧪 Tests - Increase coverage, add edge cases
- 💡 Examples - Real-world use cases
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Support

- 📚 Documentation: GitHub Pages
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
HTS — High-performance heterogeneous computing made simple.