C++ AMP

Errata for C++ AMP




The errata list records errors found after the product was released, together with their corrections. If an error was corrected in a later version or reprint, the date of the correction appears in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.




Version | Location | Description | Submitted By | Date Submitted | Date Corrected
Printed, PDF, ePub, Mobi, Safari Books Online
Page 8
1st code paragraph

This line contains two mistakes: "bool bSSEInstructions = (CpuInfo[3] >> 24 && 0x1)". First, the integer array is named CPUInfo, not CpuInfo. Second, the logical AND operator "&&" with the constant 0x1 (true) has no effect on the result of the expression. I guess it should be a bitwise AND operation instead.

Note from the Author or Editor:
Should read: bool bSSEInstructions = (CPUInfo[3] >> 24 & 0x1);

Matias Dons Dollerup  Oct 09, 2012 
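
The difference between the two operators in the Page 8 erratum can be checked in isolation. The following is a minimal sketch, not code from the book: the helper names, the unsigned register type, and the shift and mask parameters are assumptions made for illustration, and nothing here calls CPUID.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the expression as printed in the book.
// `&&` is a logical AND: the result is true whenever the shifted value is
// non-zero (and mask is non-zero), no matter which individual bit is set.
bool testBitLogical(std::uint32_t reg, int shift, std::uint32_t mask)
{
    return (reg >> shift) && mask;
}

// Hypothetical helper mirroring the corrected expression.
// `>>` binds tighter than `&`, so `reg >> shift & mask` groups the same way
// as the explicit parentheses below: shift first, then isolate the bit.
bool testBitBitwise(std::uint32_t reg, int shift, std::uint32_t mask)
{
    return ((reg >> shift) & mask) != 0;
}
```

With a sample value of 0x02000000 (bit 25 set, bit 24 clear), a shift of 24, and a mask of 0x1, the logical form reports true while the bitwise form reports false. The two forms agree for 0x01000000.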
Printed, PDF, ePub, Mobi, Safari Books Online
Page 10
Last paragraph on page

Originally submitted on: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/10/11/c-amp-book-now-available.aspx#10361456 by Jo Blow, 20 Oct 2012 8:34 PM:

Just started reading the book. I'd like to point out that parallelization of delayed recurrence relationships is actually possible, contrary to what it says on p. 10. One can partition the array into pieces. Even if some pieces depend on other pieces, it is still possible, in some cases, to parallelize them. The example gives a[k] = a[k-1] + b[k]. We can assume that when k-1 is outside the partition, a[k-1] has a value of zero. This will throw off each value in that partition by some constant, which we can add back in afterwards quite easily. Essentially it is a boundary value problem. The problem is that we potentially have to loop back over the entire array (possibly multiple times), and this may defeat the speedup in the first place. It will depend on the specific case. (There are other "tricks" that could potentially be used too; the point here is only that it is possible.)

Ade: This should read: For example, this loop is not parallelizable in its current form:

Ade Miller
Oct 22, 2012 
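
The partitioning idea described in the Page 10 submission can be sketched in standard C++. This is an illustration under stated assumptions, not the book's code: the function names, the integer element type, the starting condition a[0] = b[0], and the block size parameter are all hypothetical. The loops over blocks are written sequentially here; the first and third passes touch disjoint ranges, so they are the ones that could be distributed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sequential reference: a[0] = b[0], a[k] = a[k-1] + b[k].
std::vector<long long> recurrenceSequential(const std::vector<long long>& b)
{
    std::vector<long long> a(b.size());
    long long running = 0;
    for (std::size_t k = 0; k < b.size(); ++k)
    {
        running += b[k];
        a[k] = running;
    }
    return a;
}

// Partitioned version (hypothetical name). A blockSize of 0 is treated as 1.
std::vector<long long> recurrencePartitioned(const std::vector<long long>& b,
                                             std::size_t blockSize)
{
    const std::size_t n = b.size();
    std::vector<long long> a(n);
    if (n == 0) return a;
    blockSize = std::max<std::size_t>(blockSize, 1);
    const std::size_t numBlocks = (n + blockSize - 1) / blockSize;

    // Pass 1: each block computes its own running sum, assuming the value
    // flowing in from the previous block is zero. Blocks are independent.
    for (std::size_t blk = 0; blk < numBlocks; ++blk)
    {
        const std::size_t begin = blk * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        long long running = 0;
        for (std::size_t k = begin; k < end; ++k)
        {
            running += b[k];
            a[k] = running;
        }
    }

    // Pass 2: a short sequential scan over the block totals gives the
    // constant that each block is missing.
    std::vector<long long> offset(numBlocks, 0);
    for (std::size_t blk = 1; blk < numBlocks; ++blk)
    {
        const std::size_t prevEnd = blk * blockSize; // one past previous block
        offset[blk] = offset[blk - 1] + a[prevEnd - 1];
    }

    // Pass 3: add the missing constant back to every element of each block.
    for (std::size_t blk = 1; blk < numBlocks; ++blk)
    {
        const std::size_t begin = blk * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        for (std::size_t k = begin; k < end; ++k)
            a[k] += offset[blk];
    }
    return a;
}
```

The extra pass over the array is the cost the submitter mentions: whether the partitioned form is faster depends on the array size, the block size, and the hardware.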
PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 12
End of second paragraph, which is below the code snippet.

The author refers readers to Chapter 2 for a description of lambdas in C++, while the description is actually on page 53, in Chapter 3.

Note from the Author or Editor:
This section should read: If you are not familiar with lambdas, see the “Lambdas in C++11” section in Chapter 3, “C++ AMP Fundamentals,” for an overview.

Fernando Montenegro  Dec 05, 2012 
PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 36
Last line of for_each: acc = r * s

I was puzzled by the "acc = r * s;" line in the single-CPU code on page 36, in this function:

    void NBodySimpleInteractionEngine::BodyBodyInteraction(const ParticleCpu* const pParticlesIn,
        ParticleCpu& particleOut, int numParticles) const
    {
        float_3 pos(particleOut.pos);
        float_3 vel(particleOut.vel);
        float_3 acc(0.0f);
        std::for_each(pParticlesIn, pParticlesIn + numParticles, [=, &acc](const ParticleCpu& p)
        {
            const float_3 r = p.pos - pos;
            float distSqr = SqrLength(r) + m_softeningSquared;
            float invDist = 1.0f / sqrt(distSqr);
            float invDistCube = invDist * invDist * invDist;
            float s = m_particleMass * invDistCube;
            acc = r * s;
        });
        vel += acc * m_deltaTime;
        vel *= m_dampingFactor;
        pos += vel * m_deltaTime;
        particleOut.pos = pos;
        particleOut.vel = vel;
    }

The final value of acc depends ONLY on the last call of the lambda, but it should be the sum of the accelerations caused by every point. In fact, the AMP version contains the code I expected (acc += r * s;):

    //--------------------------------------------------------------------------------------
    // Calculate the acceleration (force * mass) change for a pair of particles.
    //--------------------------------------------------------------------------------------
    void BodyBodyInteraction(float_3& acc, const float_3 particlePosition,
        const float_3 otherParticlePosition, float softeningSquared, float particleMass) restrict(amp)
    {
        float_3 r = otherParticlePosition - particlePosition;
        float distSqr = SqrLength(r) + softeningSquared;
        float invDist = concurrency::fast_math::rsqrt(distSqr);
        float invDistCube = invDist * invDist * invDist;
        float s = particleMass * invDistCube;
        acc += r * s;
    }

So it looks like a minor bug to be fixed, and Amit agrees. http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/657296d8-0322-4a7e-b453-c6c12f4a5553

Note from the Author or Editor:
The line: acc = r * s; Should read: acc += r * s;

Andrew Webb  Nov 27, 2012 
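
The effect of the Page 36 correction can be reproduced without the n-body sample. The sketch below is illustrative only: Vec3 is a hypothetical stand-in for float_3, and the two function names are assumptions. Both functions use the same std::for_each pattern with acc captured by reference, and differ only in "=" versus "+=".

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the float_3 type used in the book's sample.
struct Vec3
{
    float x, y, z;
};

// Mirrors the printed code: plain assignment overwrites acc on every call,
// so only the last contribution survives.
Vec3 lastContribution(const std::vector<Vec3>& contributions)
{
    Vec3 acc{0.0f, 0.0f, 0.0f};
    std::for_each(contributions.begin(), contributions.end(),
        [&acc](const Vec3& c)
        {
            acc = c;
        });
    return acc;
}

// Mirrors the corrected code: compound assignment accumulates every
// contribution into acc.
Vec3 summedContributions(const std::vector<Vec3>& contributions)
{
    Vec3 acc{0.0f, 0.0f, 0.0f};
    std::for_each(contributions.begin(), contributions.end(),
        [&acc](const Vec3& c)
        {
            acc.x += c.x;
            acc.y += c.y;
            acc.z += c.z;
        });
    return acc;
}
```

For the contributions (1, 0, 0), (2, 1, 0), and (4, 0, -1), the first function returns (4, 0, -1) and the second returns (7, 1, -1).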
PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 51
code example

array_view doesn't have a member called "grid". This should be "extent". Thus, the fourth line should be: parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)

Note from the Author or Editor:
In the PDF this is on page 52. The line: parallel_for_each(av.grid, [=](index<1> idx) restrict(amp) should read: parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)

Edd Porter  Dec 19, 2012 
Printed, PDF, ePub, Mobi, Safari Books Online
Page 74
source code, loop on i and loop on k, bottom quarter of page

"i += TS" and "k < TS" should almost certainly be "i += TileSize" and "k < TileSize"

Note from the Author or Editor:
Should read:

    for (int i = 0; i < W; i += TileSize)
    {
        tile_static float sA[TileSize][TileSize];
        tile_static float sB[TileSize][TileSize];
        sA[row][col] = a(tidx.global[0], col + i);
        sB[row][col] = b(row + i, tidx.global[1]);
        for (int k = 0; k < TileSize; k++)
            sum += sA[row][k] * sB[k][col];
    }

Anonymous  Oct 17, 2012 
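
The role of TileSize in the two loops of the Page 74 correction can be seen in a CPU-only analogue. This is a sketch under stated assumptions, not the book's kernel: it uses no tile_static memory or tiled index, the function names and the row-major std::vector layout are hypothetical, and TileSize is fixed at 2 purely for illustration. The outer loop steps along the shared dimension by TileSize, and the inner loop covers one tile's worth of products, as in the corrected listing.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t TileSize = 2; // illustrative value, not from the book

// Straightforward n x n row-major multiply, used as a reference.
std::vector<float> multiplyNaive(const std::vector<float>& a,
                                 const std::vector<float>& b, std::size_t n)
{
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t row = 0; row < n; ++row)
        for (std::size_t col = 0; col < n; ++col)
            for (std::size_t k = 0; k < n; ++k)
                c[row * n + col] += a[row * n + k] * b[k * n + col];
    return c;
}

// Tiled multiply: the output is processed in TileSize x TileSize blocks, and
// each block accumulates partial sums one tile of the shared dimension at a time.
std::vector<float> multiplyTiled(const std::vector<float>& a,
                                 const std::vector<float>& b, std::size_t n)
{
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t rowTile = 0; rowTile < n; rowTile += TileSize)
    for (std::size_t colTile = 0; colTile < n; colTile += TileSize)
    for (std::size_t i = 0; i < n; i += TileSize) // step by TileSize
    {
        const std::size_t rowEnd = std::min(rowTile + TileSize, n);
        const std::size_t colEnd = std::min(colTile + TileSize, n);
        for (std::size_t row = rowTile; row < rowEnd; ++row)
        {
            for (std::size_t col = colTile; col < colEnd; ++col)
            {
                float sum = 0.0f;
                // One tile's worth of products; the extra bound handles
                // sizes that are not a multiple of TileSize.
                for (std::size_t k = 0; k < TileSize && i + k < n; ++k)
                    sum += a[row * n + i + k] * b[(i + k) * n + col];
                c[row * n + col] += sum;
            }
        }
    }
    return c;
}
```

Both functions should produce the same matrix. Any identifier other than TileSize in those two loop headers would be a different name with no connection to the tile dimensions, which is why the printed "TS" does not fit.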
Printed, PDF, ePub, Mobi, Safari Books Online
Page 149
1st para and diagram

Reader feedback (Mark Delaney): I have one additional confusion. Not sure if it is my confusion or a mistake in the book. I am replying by email to include graphic content. On page 149, printed book, I am very confused by the figures. Ade: For updated content see Errata at http://ampbook.codeplex.com/

Ade Miller
Nov 04, 2012 
Printed, PDF, ePub, Mobi, Safari Books Online, Other Digital Version
Page 198
1st paragraph

At the end of the paragraph, the ante-penultimate phrase states "The emulated accelerators, WARP and REF, have warp sizes of 1 and 4, respectively". In the current C++ AMP implementation both devices use a warp size of 4. Thank you.

Note from the Author or Editor:
This text is not actually incorrect; however, I would reword it as follows: The emulated accelerators, WARP and REF, have warp sizes of 1 and 4, respectively. These numbers may change in the future, so you should not rely on them when implementing applications that will run on a wide range of hardware platforms.

Alex Voicu  Feb 18, 2013 
Printed, PDF, ePub, Mobi, Safari Books Online
Page 296
Time-Out Detection and Recovery

Currently the TDR feature is not supported correctly in the NVIDIA and AMD drivers. This is tracked in an issue on CodePlex: http://ampbook.codeplex.com/workitem/33361 While the code and text in the book are correct, the code will not work correctly with the current drivers: no accelerator_view_removed exception is thrown.

Ade Miller
Nov 14, 2012