- Properly determine `TILE_SIZE` (see paper)
- Can we avoid "partial" kernel operations?