Heterogeneous Task Scheduler (HTS)

High-performance C++ framework for scheduling and executing task DAGs across CPU and GPU devices.

📋 Table of Contents

Features
Quick Start
Installation
Documentation
Architecture
Examples
Performance
Roadmap
Contributing
License
Resources

✨ Features

Why Choose HTS?

Feature	Benefit
🚀 Blazing Fast	Zero-overhead abstractions, lock-free data structures, roughly 25-50x faster GPU memory allocation and free than cudaMalloc/cudaFree (see Performance)
🔄 DAG Execution	Automatic cycle detection, topological sorting, dependency tracking
🎯 Smart Scheduling	Pluggable policies: GPU-first, CPU-first, round-robin, load-based
💾 Memory Pool	Buddy system allocator eliminates cudaMalloc/cudaFree overhead
📊 Performance Insights	Built-in profiler with Chrome tracing export and parallelism metrics
🛡️ Production Ready	Retry policies, failure propagation, graceful degradation
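
To make the "DAG Execution" row concrete, here is a minimal, self-contained sketch of how cycle detection and topological sorting can be combined using Kahn's algorithm. The function name `topological_order` and the edge representation are assumptions made for illustration. They are not part of the HTS API, and HTS may implement this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the HTS API: returns a topological order
// of nodes 0..n-1, or std::nullopt if the edges contain a cycle.
// Each edge (from, to) means "to depends on from".
std::optional<std::vector<std::size_t>> topological_order(
    std::size_t n,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    std::vector<std::vector<std::size_t>> successors(n);
    std::vector<std::size_t> pending(n, 0);  // unfinished predecessors per node
    for (const auto& [from, to] : edges) {
        assert(from < n && to < n);
        successors[from].push_back(to);
        ++pending[to];
    }

    std::queue<std::size_t> ready;  // nodes whose dependencies are all satisfied
    for (std::size_t v = 0; v < n; ++v) {
        if (pending[v] == 0) ready.push(v);
    }

    std::vector<std::size_t> order;
    order.reserve(n);
    while (!ready.empty()) {
        const std::size_t v = ready.front();
        ready.pop();
        order.push_back(v);
        for (std::size_t next : successors[v]) {
            if (--pending[next] == 0) ready.push(next);
        }
    }

    // Nodes on a cycle never reach zero pending predecessors.
    if (order.size() != n) return std::nullopt;
    return order;
}
```

In this sketch a chain such as `{{0, 1}, {1, 2}}` yields an order, while `{{0, 1}, {1, 0}}` yields `std::nullopt`, so one pass covers both sorting and cycle detection.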

Quick Example

#include <hts/heterogeneous_task_scheduler.hpp>
#include <iostream>

using namespace hts;

int main() {
    TaskGraph graph;
    TaskBuilder builder(graph);
    
    // Create CPU task
    auto cpu_task = builder
        .create_task("Preprocess")
        .device(DeviceType::CPU)
        .cpu_func([](TaskContext& ctx) {
            std::cout << "Preprocessing on CPU..." << std::endl;
        })
        .build();
    
    // Create GPU task
    auto gpu_task = builder
        .create_task("Compute")
        .device(DeviceType::GPU)
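        .gpu_func([](TaskContext& ctx, cudaStream_t stream) {
            // my_kernel and data are placeholders for your own CUDA kernel and
            // device buffer; data must be visible here (global or captured)
            my_kernel<<<256, 128, 0, stream>>>(data);
            cudaStreamSynchronize(stream);
        })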
        .build();
    
    // Set dependency
    graph.add_dependency(cpu_task->id(), gpu_task->id());
    
    // Execute
    Scheduler scheduler;
    // Note: Tasks are accessed via scheduler.graph() or use the graph directly
    scheduler.execute();
    
    return 0;
}

🚀 Quick Start

Build from Source

# Clone repository
git clone https://github.com/LessUp/heterogeneous-task-scheduler.git
cd heterogeneous-task-scheduler

# Build (using scripts)
scripts/build.sh --cpu-only  # or scripts/build.sh for CUDA support

# Run tests
scripts/test.sh

Use in Your Project

With CMake FetchContent:

include(FetchContent)
FetchContent_Declare(
    hts
    GIT_REPOSITORY https://github.com/LessUp/heterogeneous-task-scheduler.git
    GIT_TAG        v1.2.0
)
FetchContent_MakeAvailable(hts)

target_link_libraries(your_target PRIVATE hts_lib)

📥 Installation

Requirements

Requirement	Version	Notes
CMake	>= 3.18	Build system
CUDA Toolkit	>= 11.0	GPU support (optional)
C++ Compiler	C++17	GCC 8+, Clang 7+, MSVC 2019+
GPU	Compute Capability 5.0+	For GPU tasks

Platform-Specific Instructions

Ubuntu/Debian:

sudo apt-get install build-essential cmake git
# Install CUDA from https://developer.nvidia.com/cuda-downloads

macOS:

brew install cmake git
# Note: GPU features are not supported on macOS

Windows:

Install Visual Studio 2019+ with C++ support
Install CUDA Toolkit
Install CMake

See Installation Guide for detailed instructions.

📖 Documentation

🌐 Complete Website

📚 Full documentation is available at GitHub Pages

The website includes:

📖 Getting Started Guides - Installation, quickstart, architecture
📘 API Reference - Complete Scheduler, TaskGraph, TaskBuilder documentation
💡 Examples - Working code examples from simple to complex
📊 Performance Guides - Profiling and optimization tips
🛡️ Error Handling - Retry policies, fallbacks, best practices

Key Pages

Topic	Link
Installation Guide	Website →
Quick Start Tutorial	Website →
Architecture Overview	Website →
Scheduler API	Website →
TaskGraph API	Website →
Examples	Website →
Changelog	Website →
Contributing Guide	Website →

Specifications

Technical design documents and product requirements are in the /specs directory:

Resource	Link
Product Requirements	specs/product/
Architecture RFC	specs/rfc/
Test Specifications	specs/testing/

🎯 Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        User Application                          │
├─────────────────────────────────────────────────────────────────┤
│                      TaskGraph Builder API                       │
│    TaskBuilder │ TaskGroup │ TaskBarrier │ TaskFuture            │
├─────────────────────────────────────────────────────────────────┤
│                          Scheduler                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │ Dependency  │  │  Scheduling │  │  Profiler   │              │
│  │  Manager    │  │   Policy    │  │  & Logger   │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
├─────────────────────────────────────────────────────────────────┤
│                      Execution Engine                            │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │ CPU Thread  │  │ GPU Stream  │  │  Resource   │              │
│  │   Pool      │  │  Manager    │  │   Limiter   │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
├─────────────────────────────────────────────────────────────────┤
│                       Memory Pool                                │
│            Buddy System Allocator (GPU Memory)                   │
└─────────────────────────────────────────────────────────────────┘

Key Components

TaskGraph: DAG representation with cycle detection
Scheduler: Central coordinator with pluggable policies
Execution Engine: Thread pool + CUDA streams
Memory Pool: Efficient GPU memory management
Profiler: Performance monitoring and reporting
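
The "pluggable policies" idea can be sketched as a small virtual interface with one class per strategy. Everything below (`Device`, `TaskInfo`, `LoadSnapshot`, `SchedulingPolicy`, and the three policy classes) is hypothetical and exists only to illustrate the pattern. The real HTS policy interface may look different.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration; HTS's real policy interface may differ.
enum class Device { CPU, GPU };

struct TaskInfo {
    bool can_run_on_gpu;  // task supplies a GPU implementation
    bool can_run_on_cpu;  // task supplies a CPU implementation
};

struct LoadSnapshot {
    std::size_t cpu_queued;  // tasks waiting for a CPU worker
    std::size_t gpu_queued;  // tasks waiting for a GPU stream
};

class SchedulingPolicy {
public:
    virtual ~SchedulingPolicy() = default;
    virtual Device select(const TaskInfo& task, const LoadSnapshot& load) = 0;
};

// GPU-first: use the GPU whenever the task can run there.
class GpuFirst final : public SchedulingPolicy {
public:
    Device select(const TaskInfo& task, const LoadSnapshot&) override {
        assert(task.can_run_on_cpu || task.can_run_on_gpu);
        return task.can_run_on_gpu ? Device::GPU : Device::CPU;
    }
};

// Load-based: among eligible devices, pick the shorter queue (ties go to GPU).
class LoadBased final : public SchedulingPolicy {
public:
    Device select(const TaskInfo& task, const LoadSnapshot& load) override {
        assert(task.can_run_on_cpu || task.can_run_on_gpu);
        if (!task.can_run_on_gpu) return Device::CPU;
        if (!task.can_run_on_cpu) return Device::GPU;
        return load.gpu_queued <= load.cpu_queued ? Device::GPU : Device::CPU;
    }
};

// Round-robin: alternate between devices for tasks that can run on both.
class RoundRobin final : public SchedulingPolicy {
public:
    Device select(const TaskInfo& task, const LoadSnapshot&) override {
        assert(task.can_run_on_cpu || task.can_run_on_gpu);
        if (!task.can_run_on_gpu) return Device::CPU;
        if (!task.can_run_on_cpu) return Device::GPU;
        next_is_gpu_ = !next_is_gpu_;
        return next_is_gpu_ ? Device::GPU : Device::CPU;
    }

private:
    bool next_is_gpu_ = false;
};
```

Keeping the decision behind one virtual call lets a scheduler swap strategies at runtime without touching the execution engine.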

💡 Examples

CPU + GPU Pipeline

#include <hts/heterogeneous_task_scheduler.hpp>
#include <iostream>
#include <memory>

using namespace hts;

int main() {
    TaskGraph graph;
    TaskBuilder builder(graph);
    
    // CPU preprocessing
    auto preprocess = builder
        .create_task("Preprocess")
        .device(DeviceType::CPU)
        .cpu_func([](TaskContext& ctx) {
            std::cout << "Preprocessing data..." << std::endl;
        })
        .build();
    
    // GPU computation
    auto compute = builder
        .create_task("GPUCompute")
        .device(DeviceType::GPU)
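        .gpu_func([](TaskContext& ctx, cudaStream_t stream) {
            // my_kernel and data are placeholders for your own CUDA kernel and
            // device buffer; data must be visible here (global or captured)
            my_kernel<<<256, 128, 0, stream>>>(data);
            cudaStreamSynchronize(stream);
        })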
        .priority(10)
        .build();
    
    // CPU postprocessing
    auto postprocess = builder
        .create_task("Postprocess")
        .device(DeviceType::CPU)
        .cpu_func([](TaskContext& ctx) {
            std::cout << "Postprocessing results..." << std::endl;
        })
        .build();
    
    // Set dependencies: preprocess → compute → postprocess
    graph.add_dependency(preprocess->id(), compute->id());
    graph.add_dependency(compute->id(), postprocess->id());
    
    // Execute
    Scheduler scheduler;
    scheduler.set_policy(std::make_unique<GPUPriorityPolicy>());
    scheduler.execute();
    
    return 0;
}

With Retry Policy

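// GPU task with automatic retry on failure.
// risky_kernel and cpu_fallback are placeholders for your own functions.
// Fields are assigned one by one because designated initializers
// (RetryPolicy{.max_retries = 3, ...}) require C++20, while HTS targets C++17.
RetryPolicy retry;
retry.max_retries = 3;
retry.backoff_ms = 100;
retry.backoff_multiplier = 2.0f;

auto unreliable_task = builder
    .create_task("RiskyGPUTask")
    .device(DeviceType::GPU)
    .gpu_func(risky_kernel)
    .retry_policy(retry)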
    .fallback([](TaskContext& ctx) {
        std::cout << "GPU failed, using CPU fallback" << std::endl;
        cpu_fallback(ctx);
    })
    .build();
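
To show what a retry policy with these three fields could mean in practice, here is a standalone sketch of exponential backoff with a fallback. `RetryPolicySketch`, `backoff_schedule`, and `run_with_retry` are hypothetical names. The assumed semantics (the delay before retry k is `backoff_ms * backoff_multiplier^(k-1)`, and the fallback runs only after every attempt fails) are not confirmed by the HTS documentation.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Mirrors the field names used in the README snippet; the semantics are an
// assumption made for this sketch.
struct RetryPolicySketch {
    int max_retries = 3;
    int backoff_ms = 100;
    float backoff_multiplier = 2.0f;
};

// Delays, in milliseconds, that precede retry 1, 2, ..., max_retries.
std::vector<int> backoff_schedule(const RetryPolicySketch& p) {
    assert(p.max_retries >= 0 && p.backoff_ms >= 0);
    std::vector<int> delays;
    double delay = p.backoff_ms;
    for (int i = 0; i < p.max_retries; ++i) {
        delays.push_back(static_cast<int>(delay));
        delay *= p.backoff_multiplier;
    }
    return delays;
}

// Runs `attempt` once plus up to max_retries more times. Returns true on the
// first success; calls `fallback` (if set) when every attempt has failed.
bool run_with_retry(const RetryPolicySketch& p,
                    const std::function<bool()>& attempt,
                    const std::function<void()>& fallback) {
    if (attempt()) return true;
    for (int delay_ms : backoff_schedule(p)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));
        if (attempt()) return true;
    }
    if (fallback) fallback();
    return false;
}
```

Under these assumptions the values from the snippet above (3 retries, 100 ms, multiplier 2.0) give waits of 100 ms, 200 ms, and 400 ms before the fallback is invoked.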

More examples:

Simple DAG - Basic pipeline
Pipeline - Complex ML pipeline with error handling
examples/ directory for complete examples

📊 Performance

Memory Allocation

Operation	cudaMalloc	HTS Memory Pool	Speedup
Allocate 1 MB	~50 μs	~1 μs	50x
Free 1 MB	~25 μs	~1 μs	25x
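
The gap in the table is plausible because a buddy allocator turns each allocate or free into a few host-side bookkeeping steps instead of a driver call. The sketch below is a host-only model that hands out offsets into a single pre-allocated arena. `BuddyArena` is a hypothetical class, not the HTS implementation; a real pool would obtain the arena once (for example with a single cudaMalloc) and add the returned offset to its base pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <vector>

// Host-side sketch of a buddy allocator over offsets. Hypothetical class.
class BuddyArena {
public:
    // arena_size and min_block must be powers of two, min_block <= arena_size.
    BuddyArena(std::size_t arena_size, std::size_t min_block)
        : arena_size_(arena_size), min_block_(min_block) {
        assert(is_pow2(arena_size) && is_pow2(min_block) && min_block <= arena_size);
        std::size_t levels = 1;
        for (std::size_t s = min_block; s < arena_size; s <<= 1) ++levels;
        free_.resize(levels);
        free_.back().insert(0);  // one free block covering the whole arena
    }

    // Returns the offset of a block of at least `bytes`, or nullopt if none fits.
    std::optional<std::size_t> allocate(std::size_t bytes) {
        if (bytes == 0 || bytes > arena_size_) return std::nullopt;
        const std::size_t want = level_for(bytes);
        std::size_t level = want;
        while (level < free_.size() && free_[level].empty()) ++level;
        if (level == free_.size()) return std::nullopt;
        const std::size_t offset = *free_[level].begin();
        free_[level].erase(free_[level].begin());
        // Split down to the requested level, freeing the upper half each time.
        while (level > want) {
            --level;
            free_[level].insert(offset + block_size(level));
        }
        return offset;
    }

    // `bytes` must match the size passed to allocate() for this offset.
    void deallocate(std::size_t offset, std::size_t bytes) {
        std::size_t level = level_for(bytes);
        // Merge with the buddy while it is free; XOR flips the buddy bit.
        while (level + 1 < free_.size()) {
            const std::size_t buddy = offset ^ block_size(level);
            auto it = free_[level].find(buddy);
            if (it == free_[level].end()) break;
            free_[level].erase(it);
            if (buddy < offset) offset = buddy;
            ++level;
        }
        free_[level].insert(offset);
    }

    std::size_t block_size(std::size_t level) const { return min_block_ << level; }

private:
    static bool is_pow2(std::size_t x) { return x != 0 && (x & (x - 1)) == 0; }

    // Smallest level whose block size holds `bytes` (bytes <= arena_size_).
    std::size_t level_for(std::size_t bytes) const {
        std::size_t level = 0;
        while (block_size(level) < bytes) ++level;
        return level;
    }

    std::size_t arena_size_;
    std::size_t min_block_;
    std::vector<std::set<std::size_t>> free_;  // free offsets per block size
};
```

The trade-off in this design is internal fragmentation: every request is rounded up to a power-of-two block, so a 100-byte request in an arena with 64-byte minimum blocks occupies 128 bytes.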

Scheduling Overhead

Operation	Latency
Add task	~50 ns
Add dependency	~30 ns
Schedule task	~100 ns

Typical Workloads

Workload	CPU-only (baseline)	HTS (CPU+GPU, relative to baseline)	Speedup
Image Processing	1.0x	3.5x	3.5x
ML Inference	1.0x	8.2x	8.2x
Data Pipeline	1.0x	2.1x	2.1x

See docs/en/profiling.md for the profiling guide.
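
For readers who want to see what a Chrome tracing export and a parallelism metric might involve, here is a small standalone sketch. `TaskSpan`, `to_chrome_trace`, and `average_parallelism` are hypothetical names, and the definition of average parallelism used here (total busy time divided by wall-clock span) is an assumption rather than the metric HTS reports. The JSON follows the Trace Event Format, using "complete" events (`"ph":"X"`) with microsecond timestamps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of one finished task; not an HTS type.
struct TaskSpan {
    std::string name;       // assumed to need no JSON escaping
    int lane;               // worker thread or stream index, shown as a row
    std::int64_t start_us;  // start time in microseconds
    std::int64_t dur_us;    // duration in microseconds
};

// Serializes spans as Trace Event Format "complete" events (ph = "X").
std::string to_chrome_trace(const std::vector<TaskSpan>& spans) {
    std::ostringstream out;
    out << "{\"traceEvents\":[";
    for (std::size_t i = 0; i < spans.size(); ++i) {
        const TaskSpan& s = spans[i];
        assert(s.dur_us >= 0);
        if (i != 0) out << ',';
        out << "{\"name\":\"" << s.name << "\",\"ph\":\"X\",\"pid\":0,\"tid\":"
            << s.lane << ",\"ts\":" << s.start_us << ",\"dur\":" << s.dur_us << '}';
    }
    out << "]}";
    return out.str();
}

// Average parallelism: total busy time divided by the wall-clock span.
double average_parallelism(const std::vector<TaskSpan>& spans) {
    if (spans.empty()) return 0.0;
    std::int64_t busy = 0;
    std::int64_t first = spans.front().start_us;
    std::int64_t last = first;
    for (const TaskSpan& s : spans) {
        busy += s.dur_us;
        first = std::min(first, s.start_us);
        last = std::max(last, s.start_us + s.dur_us);
    }
    if (last == first) return 0.0;
    return static_cast<double>(busy) / static_cast<double>(last - first);
}
```

Writing the returned string to a `.json` file produces something that can be loaded in a Chrome-style trace viewer, with one row per `lane`.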

🛣️ Roadmap

Current (v1.2.0)

✅ Bilingual documentation (English/Chinese)
✅ Comprehensive API documentation
✅ Professional changelog structure

Planned (v1.3.0)

🔄 Multi-GPU support
🔄 Distributed execution
🔄 Python bindings

Future (v2.0.0)

📋 WebAssembly support
📋 Cloud-native scheduling
📋 Auto-tuning policies

🤝 Contributing

We welcome contributions! See our Contributing Guide for details.

Quick Start for Contributors

# 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/heterogeneous-task-scheduler.git
cd heterogeneous-task-scheduler

# 2. Create feature branch
git checkout -b feature/amazing-feature

# 3. Build and test
scripts/build.sh --cpu-only
scripts/test.sh

# 4. Format code
scripts/format.sh

# 5. Commit and push
git commit -m "feat: add amazing feature"
git push origin feature/amazing-feature

# 6. Open Pull Request

Contribution Types

🐛 Bug Fixes - Always welcome!
📝 Documentation - Guides, examples, API docs
✨ New Features - Please discuss in Issues first
🎨 Code Quality - Refactoring, style improvements
🧪 Tests - Increase coverage, add edge cases
💡 Examples - Real-world use cases

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Resources

📚 Documentation: GitHub Pages
🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions

HTS — High-performance heterogeneous computing made simple.
