Multi-GPU programming strategies using CUDA


I need advice on a project I am going to undertake. I am planning to run simple kernels (yet to be decided, but hinging on embarrassingly parallel ones) on a multi-GPU node using CUDA 4.0, following the strategies listed below. The intention is to profile the node by launching the kernels under the different strategies CUDA provides in a multi-GPU environment.

  1. Single host thread - multiple devices (shared context)
  2. Single host thread - concurrent execution of kernels on a single device (shared context; see the sketch after this list)
  3. Multiple host threads - (equal) multiple devices (independent contexts)
  4. Single host thread - sequential kernel execution on 1 device
  5. Multiple host threads - concurrent execution of kernels on 1 device (independent contexts)
  6. Multiple host threads - sequential execution of kernels on 1 device (independent contexts)
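
For concreteness, here is a minimal sketch of what I mean by strategy 2: a single host thread launching kernels into several streams on one device, so that the kernels are eligible to run concurrently (this needs a device of compute capability 2.0 or higher). The kernel and the sizes are just placeholders.

    #include <cuda_runtime.h>

    __global__ void busyKernel(float *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] = data[i] * 2.0f + 1.0f;   // arbitrary placeholder work
    }

    int main(void)
    {
        const int kStreams = 4;
        const int n = 1 << 20;
        cudaStream_t streams[kStreams];
        float *buf[kStreams];

        for (int s = 0; s < kStreams; ++s) {
            cudaStreamCreate(&streams[s]);
            cudaMalloc(&buf[s], n * sizeof(float));
        }

        // One kernel per non-default stream; on compute capability 2.0+
        // hardware these launches are eligible to execute concurrently.
        for (int s = 0; s < kStreams; ++s)
            busyKernel<<<(n + 255) / 256, 256, 0, streams[s]>>>(buf[s], n);

        cudaDeviceSynchronize();

        for (int s = 0; s < kStreams; ++s) {
            cudaStreamDestroy(streams[s]);
            cudaFree(buf[s]);
        }
        return 0;
    }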

Am I missing out any categories? Opinions on the test categories I have chosen, as well as general advice w.r.t. multi-GPU programming, are welcome.

Thanks,
Sayan

Edit:

I thought the previous categorization involved some redundancy, so I modified it.

Most workloads are light enough on CPU work that you can juggle multiple GPUs from a single host thread; this became possible starting with CUDA 4.0. Before CUDA 4.0, you would call cuCtxPopCurrent()/cuCtxPushCurrent() to change which context is current to a given thread. Starting with CUDA 4.0, you can call cudaSetDevice() to set the current context to correspond to a given device.
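
For example, here is a minimal sketch of the CUDA 4.0 pattern using the runtime API: one host thread makes each device current in turn and issues work to it, and because kernel launches are asynchronous, the devices execute concurrently. The kernel, the sizes, and the 16-device cap are placeholders.

    #include <cuda_runtime.h>

    __global__ void initKernel(float *p, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            p[i] = 1.0f;               // arbitrary placeholder work
    }

    int main(void)
    {
        const int n = 1 << 20;
        int deviceCount = 0;
        cudaGetDeviceCount(&deviceCount);

        float *d_data[16];             // assume at most 16 devices for the sketch

        // A single host thread issues work to every device in turn; the
        // launches return immediately, so the devices run concurrently.
        for (int dev = 0; dev < deviceCount; ++dev) {
            cudaSetDevice(dev);                           // make dev's context current
            cudaMalloc(&d_data[dev], n * sizeof(float));  // allocated in dev's context
            initKernel<<<(n + 255) / 256, 256>>>(d_data[dev], n);
        }

        // Wait for all devices, then release the per-device allocations.
        for (int dev = 0; dev < deviceCount; ++dev) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();
            cudaFree(d_data[dev]);
        }
        return 0;
    }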

Your option 1) is a misnomer, though, because there is no "shared context" - the GPU contexts are still separate, and device memory and objects such as CUDA streams and CUDA events are affiliated with the GPU context in which they were created.
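
Here is a minimal sketch of that affiliation, assuming a node with at least two devices (names and sizes are placeholders): each stream and allocation is created while a particular device is current, and each stream may only be launched into while its own device is current.

    #include <cuda_runtime.h>

    __global__ void scaleKernel(float *p, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            p[i] *= 2.0f;              // arbitrary placeholder work
    }

    int main(void)
    {
        const int n = 1 << 20;
        cudaStream_t s0, s1;
        float *buf0, *buf1;

        cudaSetDevice(0);
        cudaStreamCreate(&s0);                  // s0 belongs to device 0's context
        cudaMalloc(&buf0, n * sizeof(float));   // buf0 lives in device 0's context

        cudaSetDevice(1);
        cudaStreamCreate(&s1);                  // s1 belongs to device 1's context
        cudaMalloc(&buf1, n * sizeof(float));

        // A stream may only be used while its own device is current;
        // launching into s0 while device 1 is current would fail.
        cudaSetDevice(0);
        scaleKernel<<<(n + 255) / 256, 256, 0, s0>>>(buf0, n);

        cudaSetDevice(1);
        scaleKernel<<<(n + 255) / 256, 256, 0, s1>>>(buf1, n);

        cudaSetDevice(0);
        cudaDeviceSynchronize();
        cudaStreamDestroy(s0);
        cudaFree(buf0);

        cudaSetDevice(1);
        cudaDeviceSynchronize();
        cudaStreamDestroy(s1);
        cudaFree(buf1);
        return 0;
    }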

