cuda - Correct effective bandwidth calculation for y = Ax + b?


I want to calculate the effective bandwidth of a matrix-vector multiplication with addition, y = A*x + b (assume A is m times n).

But I am a bit confused about how to count the number of bytes read from and written to global memory.

Is the effective bandwidth based on:

bytesreadwrite = m*n (for reading A) + n (for reading x) + m (for reading b) + m (for writing y)

or is it

bytesreadwrite = m*n (for reading A) + m*n (for reading x) + m (for reading b) + m (for writing y)

with m*n for x because I read the whole of x once for each row (even if I work with shared memory, I still have to read the whole x vector once per row)?

Does anyone have advice on which is the right choice? I don't really...

I tend to use the first calculation, but why? Does that make sense?

Thanks a lot!!!

It's none of the above. In terms of memory bandwidth, modern processors load all of the items operated on into the level 2 cache once, operate on them there, and afterwards write the results out to memory for the items that changed. So, effectively, the bandwidth is based on the sum total of the sizes of the elements involved.

Note: this is an oversimplification, because it doesn't take into account the effects of streaming, not to mention memory pagination. With streaming, it's not uncommon to have a single matrix operate on a large set of data (3D graphics calculations, for example); in that case, the matrix gets loaded into the L2 cache (and, presumably, in reasonably optimized code, into registers from there) once, and the vectors are loaded through.

Once again, the model isn't complete without an understanding of modern memory paging techniques; there's a gigantic difference in the above if the matrix and vectors are stored in different memory pages, for example, not to mention the serious optimizations possible in packing vectors for "streaming" through the L2 cache. And then, that's assuming a CPU model of performing matrix math; bringing a GPU into the picture changes things once again, dramatically.
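To make the arithmetic concrete, here is a minimal host-side sketch of the two counts from the question, turned into an effective bandwidth figure. It is plain C++ (it would also compile inside a `.cu` file) and rests on assumptions not stated in the original post: the function names are hypothetical, the element size is passed in explicitly because the formulas in the question count elements rather than bytes, and the elapsed time is assumed to have been measured separately (for example with CUDA events). It only models traffic; it says nothing about what the caches actually did.

```cpp
#include <cassert>
#include <cstddef>

// Traffic if every operand crosses the memory bus exactly once:
// m*n elements of A, n of x, m of b (reads) and m of y (writes).
// elem_size is the size of one element in bytes (e.g. sizeof(float)).
std::size_t bytes_x_read_once(std::size_t m, std::size_t n, std::size_t elem_size) {
    return (m * n + n + m + m) * elem_size;
}

// Traffic if x is re-read from global memory for each of the m rows,
// i.e. no reuse of x through cache or shared memory is assumed.
std::size_t bytes_x_read_per_row(std::size_t m, std::size_t n, std::size_t elem_size) {
    return (m * n + m * n + m + m) * elem_size;
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes): bytes moved / elapsed time.
double effective_bandwidth_gbs(std::size_t bytes, double seconds) {
    assert(seconds > 0.0);  // a non-positive time would make the result meaningless
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

For example, `effective_bandwidth_gbs(bytes_x_read_once(m, n, sizeof(float)), t)` gives the figure for the first formula. For large n the second count approaches twice the first, so whichever model is used should be stated alongside the reported number.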

