C++ - How to perform a deep copy of a struct with CUDA?
I'm programming with CUDA and I'm facing a problem trying to copy some data from the host to the GPU.
I have three nested structs like these:

    typedef struct {
        char data[128];
        short length;
    } cell;

    typedef struct {
        cell* elements;
        int height;
        int width;
    } matrix;

    typedef struct {
        matrix* tables;
        int count;
    } container;

So a container "includes" matrix elements, which in turn include cell elements.
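Let's suppose I dynamically allocate the host memory in this way:

    container c;
    c.count = 20;
    c.tables = (matrix*) malloc(20 * sizeof(matrix));
    for (int i = 0; i < 20; i++) {
        matrix m;
        // m.height and m.width should also be set so that height * width == 100
        m.elements = (cell*) malloc(100 * sizeof(cell));
        c.tables[i] = m;
    }

That is, a container of 20 matrices of 100 cells each.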
- How can I copy this data to device memory using cudaMemcpy()?
- Is there a way to perform a deep copy of a "struct of structs" from the host to the device?
Thanks for your time.

Andrea
The short answer is "just don't". There are four reasons why I say that:
- There is no deep copy functionality in the API.
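- The code you would have to write to set up and copy the structure you have described to the GPU would be ridiculously complex: at least one allocation and one copy per matrix plus the patched array of matrix headers, which is roughly 40 API calls for your 20-matrix example, together with either a host-side staging copy or an intermediate kernel to fix up the device pointers. The count grows with every matrix you add.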
- GPU code that goes through three levels of pointer indirection will have massively increased memory access latency and will undermine what little cache locality is available on the GPU.
- If you want to copy the data back to the host afterwards, you have the same problem in reverse.
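To make the second point concrete, here is a sketch of what the manual deep copy looks like for the structs in the question. It is a C++ sketch under stated assumptions, not a drop-in CUDA implementation: `CountingBackend`, `alloc`, `upload` and `deep_copy` are hypothetical names, and the backend uses plain host memory so the pointer-patching logic can be run and counted on a CPU. In real code `alloc` would wrap `cudaMalloc` and `upload` would wrap `cudaMemcpy` with `cudaMemcpyHostToDevice`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

typedef struct { char data[128]; short length; } cell;
typedef struct { cell* elements; int height; int width; } matrix;
typedef struct { matrix* tables; int count; } container;

// Hypothetical backend so the sketch runs on a CPU. With CUDA, alloc()
// would wrap cudaMalloc and upload() would wrap
// cudaMemcpy(dst, src, n, cudaMemcpyHostToDevice). Here both use host
// memory and count how many "API calls" the deep copy makes.
struct CountingBackend {
    int allocs = 0;
    int copies = 0;
    void* alloc(std::size_t n) { ++allocs; return std::malloc(n); }
    void upload(void* dst, const void* src, std::size_t n) {
        ++copies;
        std::memcpy(dst, src, n);
    }
};

// Deep-copies a container. The returned struct itself stays on the host
// (it could be passed to a kernel by value), but every pointer inside it
// refers to memory obtained from the backend.
template <class Backend>
container deep_copy(const container& h, Backend& dev) {
    assert(h.count >= 0);
    // Host-side staging array: same matrix headers, but holding device pointers.
    matrix* staging = static_cast<matrix*>(std::malloc(h.count * sizeof(matrix)));
    for (int i = 0; i < h.count; ++i) {
        const matrix& src = h.tables[i];
        std::size_t bytes = sizeof(cell) * src.height * src.width;
        staging[i] = src;  // copies height and width
        staging[i].elements = static_cast<cell*>(dev.alloc(bytes));
        // cell contains no pointers, so each matrix's cells move in one copy.
        dev.upload(staging[i].elements, src.elements, bytes);
    }
    // Finally upload the patched matrix headers in a single copy.
    container d;
    d.count = h.count;
    d.tables = static_cast<matrix*>(dev.alloc(h.count * sizeof(matrix)));
    dev.upload(d.tables, staging, h.count * sizeof(matrix));
    std::free(staging);
    return d;
}
```

For the 20-matrix example this comes to 21 allocations and 21 copies, and the bookkeeping lives in the staging array whose pointers must be patched before upload. Copying results back means downloading the device header array into a staging copy first and then issuing one `cudaMemcpy` per matrix using the device pointers it contains.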
Consider using linear memory and indexing instead. It is portable between host and GPU, and the allocation and copy overhead is a small fraction of the pointer-based alternative.
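A minimal sketch of the linear-memory alternative, assuming every matrix is stored row-major in one shared cell array. `FlatContainer`, `add_matrix` and `cell_index` are illustrative names, not part of any API. On the GPU side, `cells.data()` would be uploaded with a single `cudaMalloc`/`cudaMemcpy` pair and the three small metadata arrays with one pair each; a kernel would receive the raw pointers and use the same `cell_index` arithmetic (marked `__host__ __device__` in a `.cu` file).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef struct { char data[128]; short length; } cell;

// Flattened replacement for container/matrix/cell (hypothetical layout):
// all cells of all matrices live in one contiguous array, and each matrix
// is described by an offset into that array plus its dimensions.
struct FlatContainer {
    std::vector<cell> cells;  // every cell, matrix after matrix, row-major
    std::vector<int> offset;  // index of matrix t's first cell in `cells`
    std::vector<int> height;
    std::vector<int> width;

    void add_matrix(int h, int w) {
        offset.push_back(static_cast<int>(cells.size()));
        height.push_back(h);
        width.push_back(w);
        cells.resize(cells.size() + static_cast<std::size_t>(h) * w);
    }
    int count() const { return static_cast<int>(offset.size()); }
};

// Plain index arithmetic on raw pointers, so the same function works on
// the host and, once marked __host__ __device__, inside a kernel.
inline int cell_index(const int* offset, const int* width, int t, int row, int col) {
    return offset[t] + row * width[t] + col;
}
```

Copying results back is the same handful of `cudaMemcpy` calls with `cudaMemcpyDeviceToHost`, with no pointers to patch in either direction.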
If you really want to do this, leave a comment and I will try to dig up some old code examples that show the complete folly of nested pointers on the GPU.