c++ - How to perform deep copying of struct with CUDA?


I am programming with CUDA and am facing a problem trying to copy some data from the host to the GPU.

I have 3 nested structs like these:

typedef struct {
    char data[128];
    short length;
} cell;

typedef struct {
    cell* elements;
    int height;
    int width;
} matrix;

typedef struct {
    matrix* tables;
    int count;
} container;

So a container "includes" some matrix elements, which in turn include some cell elements.

Let's suppose I dynamically allocate the host memory in this way:

container c;
c.tables = (matrix*)malloc(20 * sizeof(matrix));  // cast is required when compiled as C++

for (int i = 0; i < 20; i++) {
    matrix m;
    m.elements = (cell*)malloc(100 * sizeof(cell));
    c.tables[i] = m;
}

That is, a container of 20 matrices of 100 cells each.

  • How can I copy this data to device memory using cudaMemcpy()?
  • Is there a way to perform a deep copy of a "struct of structs" from host to device?

Thanks for your time.

Andrea

The short answer is "just don't". There are four reasons why I say that:

  1. There is no deep copy functionality in the API.
  2. The code you would have to write to set up and copy the structure you have described to the GPU would be ridiculously complex (about 4000 API calls at a minimum, and probably an intermediate kernel, for your example of 20 matrices of 100 cells).
  3. GPU code using three levels of pointer indirection will have massively increased memory access latency and will break what little cache coherency is available on the GPU.
  4. If you want to copy the data back to the host afterwards, you have the same problem in reverse.

Consider using linear memory and indexing instead. It is portable between host and GPU, and the allocation and copy overhead is about 1% of that of the pointer-based alternative.
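As a rough illustration of the linear memory idea, here is a minimal host-side sketch. The names flat_container, cell_index, make_container, cell_at and payload_bytes are hypothetical and not from the question, and it assumes every matrix has the same height and width (10 x 10 is used below to get the 100 cells per matrix). It is plain C++ so that it compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same cell layout as in the question.
struct cell {
    char  data[128];
    short length;
};

// Hypothetical flat replacement for container/matrix: a single contiguous
// array of cells plus the dimensions needed to index into it.
// Assumes all matrices share the same height and width.
struct flat_container {
    std::vector<cell> cells;  // count * height * width cells, back to back
    int count;
    int height;
    int width;
};

// Map (table, row, col) to an offset into the linear array.
// It only takes plain integers, so the same arithmetic could be reused
// in device code.
inline std::size_t cell_index(int height, int width,
                              int table, int row, int col) {
    return (static_cast<std::size_t>(table) * height + row) * width + col;
}

// One allocation for the whole structure; cells start out zeroed.
inline flat_container make_container(int count, int height, int width) {
    flat_container c;
    c.cells.resize(static_cast<std::size_t>(count) * height * width);
    c.count  = count;
    c.height = height;
    c.width  = width;
    return c;
}

// Bounds-checked access to a single cell.
inline cell& cell_at(flat_container& c, int table, int row, int col) {
    assert(table >= 0 && table < c.count);
    assert(row   >= 0 && row   < c.height);
    assert(col   >= 0 && col   < c.width);
    return c.cells[cell_index(c.height, c.width, table, row, col)];
}

// Size in bytes of the single buffer that would have to be transferred.
inline std::size_t payload_bytes(const flat_container& c) {
    return c.cells.size() * sizeof(cell);
}
```

With this layout, make_container(20, 10, 10) gives the 20 matrices of 100 cells from the question as one buffer of 2000 cells. Since cell holds no pointers, that buffer could be moved to the device with one cudaMalloc() and one cudaMemcpy() of payload_bytes(c) bytes from c.cells.data(), with count, height and width passed to the kernel as ordinary arguments.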

If you really want to do this, leave a comment and I will try to dig up some old code examples which show what a complete folly nested pointers are on the GPU.

