16 Comments
carnachion - Tuesday, March 22, 2016 - link
Where are the new Teslas!!
ImSpartacus - Tuesday, March 22, 2016 - link
Apparently not arriving for a while (in meaningful quantities), else this update wouldn't be necessary.
damianrobertjones - Tuesday, March 22, 2016 - link
Does Tesla work for Anandtech?
nathanddrews - Tuesday, March 22, 2016 - link
Love seeing GPUs with tons of RAM, even if I likely won't be using it anytime soon. The Sony quote... 10x performance boost... compared to what?
zoxo - Tuesday, March 22, 2016 - link
If you are memory size limited, it's reasonable to assume a very large performance gain.
yannigr2 - Tuesday, March 22, 2016 - link
Looking at the Angry Birds movie trailer, I would say they are more "good scenario" limited.
Ian Cutress - Tuesday, March 22, 2016 - link
If it's the difference between finding a memory element in VRAM versus spilling out to disk/DRAM and doing a PCIe transfer while that warp sits idle, then 10x is conservative. Ideally warps with their data at hand would jump ahead, but you still end up with some async kernel waiting on data at some point, depending on how the algorithm is run.

I think the standard methodology taught with CUDA is that you need 24-30 FLOPs per DRAM read/write access to maintain peak performance. If you have to go outside VRAM for that cache line, then the algorithm had better iterate over its own data to keep the cores busy and maintain high perf.
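[Ed: to put a rough number on that FLOPs-per-access rule of thumb, here's a back-of-the-envelope roofline-style calculation. The peak-throughput and bandwidth figures below are illustrative assumptions for a Maxwell-class card, not official specs.]

```python
# Back-of-the-envelope "machine balance" estimate for a Maxwell-class GPU.
# Both hardware figures are assumptions for illustration only.
peak_flops = 6.8e12   # ~6.8 TFLOPS single-precision (assumed)
mem_bw = 317e9        # ~317 GB/s DRAM bandwidth (assumed)
bytes_per_access = 4  # one FP32 element per read/write

# FLOPs the card can issue for every byte moved from DRAM:
flops_per_byte = peak_flops / mem_bw

# FLOPs needed per 4-byte element access to stay compute-bound
# rather than bandwidth-bound:
flops_per_access = flops_per_byte * bytes_per_access

print(f"{flops_per_byte:.0f} FLOPs/byte, "
      f"{flops_per_access:.0f} FLOPs per FP32 access")
```

Kernels below that arithmetic intensity are bandwidth-bound even when data is resident in VRAM; having to fetch over PCIe (roughly an order of magnitude less bandwidth than GDDR5) raises the bar much further, which is why the out-of-VRAM case is so punishing.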
nathanddrews - Tuesday, March 22, 2016 - link
That makes sense. Assuming a 4K render path for movies (Sony has been pushing 4K for a while now), I can see them benefiting from the increase. Even in my own amateur experience doing 4K effects in AE, it consumes all 32GB of my system memory in a heartbeat.
kefkiroth - Tuesday, March 22, 2016 - link
This graphics card has more memory than some phones have storage.
Eden-K121D - Tuesday, March 22, 2016 - link
I saw what you did there *cough* Apple *cough*
Le Geek - Tuesday, March 22, 2016 - link
This GPU has more memory than all of my computing devices combined.
nickolas - Wednesday, March 23, 2016 - link
Still a 28nm process? Lame!
BrokenCrayons - Wednesday, March 23, 2016 - link
Agreed. Thanks for the giant, primitive GPU, NVIDIA, but call us back when you can cut power consumption in half across your entire product stack. No one wants to waste an expansion slot, let alone two, on a BlowDryer(TM) M6000.
Pissedoffyouth - Wednesday, March 30, 2016 - link
You are clearly the wrong market for this.
t0mmyr - Tuesday, May 3, 2016 - link
But how many FPS can this card alone get in q3arena?
ninjaburger - Friday, May 6, 2016 - link
Over 9000