Hello,
Thank you for providing this code repository. I noticed a problem when running your DASP program (in double precision). It performs a warmup of 100 iterations, followed by 1000 iterations that are used for timing and reporting performance.
When I run it for only 1 iteration (0 warmup, 1 iteration for timing), it produces incorrect results.
However, when I run more than 1 iteration, the results are correct. I noticed that you call cudaMemset on the y vector only once, before the first iteration, and not between iterations.
Do you have any idea why this might happen? It seems that a single iteration does not write all of the results to y.
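For reference, a minimal sketch of the kind of check described above. This is not the DASP code: `spmv_csr` is a hypothetical plain CSR kernel standing in for the DASP kernel, and the 3x3 matrix is made up for illustration. The harness zeroes y with cudaMemset before every iteration and compares the final y against a CPU reference, so the verdict should not depend on whether `iterations` is 1 or 1000.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the DASP kernel: plain CSR, one thread per row.
__global__ void spmv_csr(int rows, const int *rowptr, const int *colidx,
                         const double *val, const double *x, double *y)
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < rows) {
        double sum = 0.0;
        for (int j = rowptr[r]; j < rowptr[r + 1]; ++j)
            sum += val[j] * x[colidx[j]];
        y[r] = sum;  // overwrites y[r], so the result does not depend on old contents
    }
}

int main()
{
    // Made-up 3x3 matrix in CSR form, for illustration only.
    const int rows = 3, nnz = 5;
    std::vector<int> rowptr = {0, 2, 3, 5};
    std::vector<int> colidx = {0, 2, 1, 0, 2};
    std::vector<double> val = {1.0, 2.0, 3.0, 4.0, 5.0};
    std::vector<double> x = {1.0, 2.0, 3.0};

    // CPU reference result.
    std::vector<double> ref(rows, 0.0);
    for (int r = 0; r < rows; ++r)
        for (int j = rowptr[r]; j < rowptr[r + 1]; ++j)
            ref[r] += val[j] * x[colidx[j]];

    int *d_rowptr, *d_colidx;
    double *d_val, *d_x, *d_y;
    cudaMalloc(&d_rowptr, (rows + 1) * sizeof(int));
    cudaMalloc(&d_colidx, nnz * sizeof(int));
    cudaMalloc(&d_val, nnz * sizeof(double));
    cudaMalloc(&d_x, rows * sizeof(double));
    cudaMalloc(&d_y, rows * sizeof(double));
    cudaMemcpy(d_rowptr, rowptr.data(), (rows + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_colidx, colidx.data(), nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, val.data(), nnz * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), rows * sizeof(double), cudaMemcpyHostToDevice);

    const int iterations = 1;  // try 1 and a larger value; the verdict should match
    for (int it = 0; it < iterations; ++it) {
        // Reset y before every iteration, not only before the first one.
        cudaMemset(d_y, 0, rows * sizeof(double));
        spmv_csr<<<(rows + 127) / 128, 128>>>(rows, d_rowptr, d_colidx, d_val, d_x, d_y);
    }
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::vector<double> y(rows);
    cudaMemcpy(y.data(), d_y, rows * sizeof(double), cudaMemcpyDeviceToHost);

    // Compare the GPU result against the CPU reference.
    int mismatches = 0;
    for (int r = 0; r < rows; ++r)
        if (std::fabs(y[r] - ref[r]) > 1e-12) ++mismatches;
    std::printf("iterations=%d mismatches=%d\n", iterations, mismatches);

    cudaFree(d_rowptr); cudaFree(d_colidx); cudaFree(d_val);
    cudaFree(d_x); cudaFree(d_y);
    return mismatches == 0 ? 0 : 1;
}
```

If a kernel only gives correct results after several iterations without the reset, running it through a harness like this with `iterations = 1` should make the mismatch visible on the very first call.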
Thank you for your attention.
Best regards,
pmpakos