Multi-GPU programming strategies using CUDA
I need advice on a project I am about to undertake. I am planning to run simple kernels (yet to be decided, but I am leaning toward embarrassingly parallel ones) on a multi-GPU node using CUDA 4.0, following the strategies listed below. The intention is to profile the node by launching the kernels under the different strategies that CUDA provides in a multi-GPU environment (a minimal sketch of one of them follows the list).
- single host thread - multiple devices (shared context)
- single host thread - concurrent execution of kernels on single device (shared context)
- multiple host threads - an equal number of devices, one per thread (independent contexts)
- single host thread - sequential kernel execution on 1 device
- multiple host threads - concurrent execution of kernels on 1 device (independent contexts)
- multiple host threads - sequential execution of kernels on 1 device (independent contexts)
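For concreteness, here is a minimal sketch of the second strategy: one host thread launching kernels concurrently on a single device via CUDA streams. The kernel busyKernel, the stream count, and the problem size are placeholders of my own, not part of the question.

```cuda
#include <cuda_runtime.h>

__global__ void busyKernel(float *data, int n)
{
    // Trivial, embarrassingly parallel work.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}

int main(void)
{
    const int nStreams = 4;
    const int n = 1 << 20;
    cudaStream_t streams[nStreams];
    float *d_data[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&d_data[s], n * sizeof(float));
    }

    // Kernels launched into different non-default streams may overlap
    // on one device, subject to compute capability and free resources.
    for (int s = 0; s < nStreams; ++s)
        busyKernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(d_data[s], n);

    cudaDeviceSynchronize();

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(d_data[s]);
    }
    return 0;
}
```

Whether the kernels actually overlap depends on the device's compute capability and available resources, which is exactly the kind of thing profiling the node should reveal.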
Am I missing any categories? Opinions on the test categories I have chosen, and any general advice with respect to multi-GPU programming, are welcome.
Thanks,
Sayan
Edit:
I thought the previous categorization involved some redundancy, so I modified it.
Most workloads are light enough on CPU work that you can juggle multiple GPUs from a single thread, which became possible starting with CUDA 4.0. Before CUDA 4.0, you had to call cuCtxPopCurrent()/cuCtxPushCurrent() to change which context was current to a given thread. Starting with CUDA 4.0, you can call cudaSetDevice() to set the current context to correspond to a given device.
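As a minimal sketch of that post-4.0 pattern (the kernel scale and the buffer size are illustrative placeholders): one host thread drives every visible device by making each current in turn with cudaSetDevice(). Kernel launches are asynchronous, so the single thread keeps all GPUs busy.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;
}

int main(void)
{
    const int n = 1 << 20;
    int nDevices = 0;
    cudaGetDeviceCount(&nDevices);
    if (nDevices > 8)
        nDevices = 8;          // cap for this sketch's fixed-size array

    float *d_data[8];

    // Launch on each device in turn; launches do not block the host,
    // so all GPUs end up working concurrently.
    for (int dev = 0; dev < nDevices; ++dev) {
        cudaSetDevice(dev);    // make this device's context current
        cudaMalloc(&d_data[dev], n * sizeof(float));
        scale<<<(n + 255) / 256, 256>>>(d_data[dev], n);
    }

    // Synchronize each device before using its results.
    for (int dev = 0; dev < nDevices; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
        cudaFree(d_data[dev]);
    }
    return 0;
}
```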
Your option 1) is a misnomer, though, because there is no "shared context": GPU contexts are still separate, and device memory and objects such as CUDA streams and CUDA events are affiliated with the GPU context in which they were created.
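A small sketch of that affiliation, assuming a node with at least two GPUs (the no-op kernel is a placeholder): a stream created while device 0 is current belongs to device 0's context, so device 0 must be made current again before launching into that stream.

```cuda
#include <cuda_runtime.h>

__global__ void noop(void) {}

int main(void)
{
    cudaStream_t s0, s1;

    cudaSetDevice(0);
    cudaStreamCreate(&s0);      // s0 belongs to device 0's context
    cudaSetDevice(1);
    cudaStreamCreate(&s1);      // s1 belongs to device 1's context

    // Correct: device 1 is current, launch into its own stream.
    noop<<<1, 1, 0, s1>>>();

    // Launching into s0 here would pair a device-0 stream with device 1
    // and fail, so switch back to device 0 first.
    cudaSetDevice(0);
    noop<<<1, 1, 0, s0>>>();

    cudaStreamSynchronize(s0);
    cudaStreamDestroy(s0);
    cudaSetDevice(1);
    cudaStreamSynchronize(s1);
    cudaStreamDestroy(s1);
    return 0;
}
```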