Sponsored By

2D Surface Deformation

Increases in the performance of graphics accelerators free additional CPU cycles that can be used for real-time physical world modeling. Modifying the geometry of an object is more effective than mapping a new or animated texture, because correctly deformed objects will look right from any angle and in any lighting conditions. Max I. Fomitchev discusses the implementation of deformable surfaces for real-time 3D games that simulate realistic environments.

February 16, 2000

22 Min Read

Author: by Max Fomitchev

The rapid growth of CPU power opens possibilities for development of 3D games that feature realistic environment. The increase in the performance of graphics accelerators frees additional CPU cycles that can be used for real-time physical world modeling.

Game realism can be improved by simulating deformable objects or deformable surfaces. The list of common deformable objects and surfaces include:

  • water (waves, streams, waterfalls), wind and splashes simulation;

  • cloth (capes, flags, curtains), wind and wrinkle simulation;

  • soil (mud, clay), imprint and crater simulation;

  • vegetation (grass, trees), wind simulation.

Simulation of deformable objects results in unique experience of realistic never-the-same environment. Wind that causes waves and makes grass bend; objects that produce ripples when dropped in water; soil covered with footprints and explosion craters.

The user receives much richer sense of reality when the geometry of an object is modified rather than when anew or animated texture is mapped on to it because correctly deformed objects will look right from any angle and in any lighting conditions. Also deformed objects can block or reveal other objects behind them.

The implementation of deformable surfaces discussed in this article is intended for real-time 3D games that simulate realistic environment. The algorithm is optimized for AMD 3DNow! technology. Different implementations and their performance are discussed.

Physics of 2D surface deformation

Consider a grid that represents a simple 2D surface as shown on figure 1. Each vertex on the surface is connected with 6 neighbors to the north, east, southeast, south, west and northwest. This interconnection defines local topology. The neighboring vertices interact with each other by means of elastic forces (i.e. interconnections between vertices, depicted as solid lines on the figures, act as coil springs). In the initial (relaxed) state vertices are evenly spaced, and vectors SRelaxE, SRelaxSE, SRelaxS specify distances between neighbors. In the relaxed state elastic forces between vertices are equal to zero.

When an external force (Fext) is applied to a vertex on the surface at time t0=Dt, the vertex starts moving and displaces to a new location time t1=2Dt. This displacement produces elastic forces between local topological neighbors. The resulted elastic forces counter the displaced vertex motion and try to return the vertex to its original location. However according to the 3rd Newton’s law the same forces but with opposite direction act upon neighboring vertices. So at time t2=3Dt the neighbors get displaced, and entire cluster of vertices comes in motion. With time more and more vertices get involved in motion, representing wave propagation across the surface.

The 2D surface is characterized by the following parameters:

  • local topology;

  • global topology (i.e. plane, cylinder, sphere, etc.)

  • vertex mass m;

  • relaxation distances (SRelaxE, SRelaxSE, SRelaxS);

  • elasticity model (linear, exponential, etc.);

  • elasticity constant (elasticity tensor E for anisotropic surfaces) E, E >= 0;

  • damping constant (damping tensor D for anisotropic surfaces) d, 0 <= d <= 1.

Greater elasticity corresponds to faster deformation and stiffer body, while low elasticity corresponds to softer bodies.

Smaller damping represents faster relaxation or faster solidification for ductile surfaces.

This paper deals with plane isotropic surfaces combined from identical particles with mass m with six-neighbor local topology and linear elasticity model.

State of a vertex on the surface is characterized by coordinate vector S, velocity V, total internal (elastic) force FTotal and external force FExt.

The surface deformation is calculated for each vertex in the following way. First vertex displacement relative to the east, south and southeast neighbors is calculated:

DSE = SE - S - SRelaxE

DSS = SS - S - SRelaxS

DSSE = SSE - S - SRelaxSE

Then elastic forces between local neighbors are evaluated:

FE = Elasticity · DSE

FS = Elasticity · DSS

FSE = Elasticity · DSSE

Total force acting on the vertex under consideration is:

FTotal = FE + FS + FSE

According to the 3rd Newton’s Law the vertex under consideration contributes to the total force acting on its neighbors:

FTotalE = FTotalE - FE

FTotalS = FTotalS - FS

FTotalSE = FTotalSE - FSE

Notice that north, west and northwest elastic forces are not calculated directly. The total force for the vertex FTotal is updated automatically when north, west and northwest vertices are processed (3rd Newton’s law).

Finally, when all vertices on the surface are processed (i.e. all internal forces evaluated) new coordinates are calculated for each vertex.

Acceleration is calculated from the 2nd Newton’s law:

a = (FTotal + Fext)/m

The rest is discrete numerical integration:

DV = a · Dt

V = V + DV

DS = V · Dt

S = S + DS

Lastly vertice velocity is adjusted in order to account for damping:

V = V · d

Effectors

Normally surface does not deform by itself, but rather deforms due to the influence of an external object – effector. Any rigid object that collides with a deformable surface is an effector. Wind can be modeled as a stream of infinitely small rigid particles with Brownian velocities. Spatial (3D) effectors can be modeled as a set of infinitely small particles corresponding to the object’s vertices.

Particle effector is show on figure 3. It is characterized by influence radius REffector and strength.

Surface vertices located further than REffector from the center of effector remain unaffected. The effector influence results in external forces acting on in-range vertices. Typically the effector-induced force is inversely proportional to the distance between vertex and the center of effector:

DS = SEffector – S

Fnew = EffectorStrength · DS/|DS|, |DS| < REffector

Fext = Fext + Fnew

When simulating more or less complete physical system one can account for surface resistance by updating total force acting on effector:

FTotalEffector = FTotalEffector – Fnew

Models of common surfaces

Based on the described 2D surface model different common surfaces can be represented by varying surface parameters:

  • Water (elastic): elasticity = 1, damping ~ 0.995.

  • Cloth (elastic): elasticity ~ 0.9, damping ~ 0.9.

  • Rubber (elastic): elasticity > 1, damping < 1.

Setting damping constant to 1 will produce infinite oscillations generated by a single effector. Deformation pattern will never be the same. This might be a good idea for infinite water waves modeling for seascapes.

Typically a deformable surface has some fixed vertices, i.e. vertices that can not be displaced from their original location. Such fixed vertices correspond to surface edges or to attachment edges / points. Seashore and curtains are examples of fixed edges and fixed attachment points respectively.

Trees and vegetation can be modeled as a hierarchy of surfaces with cylinder / cone global topology and stiff rubber (elasticity > 1, damping < 1) properties, where individual cylinders correspond to branches, stems or trunks. Root and attachment edges must be marked as fixed points. Particle effectors that simulate wind can be applied to such ‘rubber’ trees to produce realistic grass fields or woods experience.

Algorithm implementation

Surface deformation algorithm can be split on two parts:

  1. Initial setup - to collision with effector (external force calculation);

  2. Iterative surface deformation calculation (new vertex position and normal calculation).

The surface deformation algorithm is inherently suitable for SIMD processing and can be implemented in several ways. But before the discussion of the implementation consider the following numerical optimizations or "cheating opportunities" that remove redundant division and multiplication operations:

  • 1/m can be stored to avoid division in a = F/m;

  • m = 1 can be used to avoid multiplication by 1/m (a · F);

  • Elasticity = 1 can be used to avoid multiplication in F = Elasticity · DS (F º DS);

  • dt = 1 can be used to avoid multiplication in DV = a · Dt and DS = V · Dt (DV º a and DS º V);

  • Uniform relaxation length SRelaxEX = SRelaxSY = SRelaxSEX = SRelaxSEY can be used to maximize register utilization

C code

When targeting to Direct3D the easiest way is to implement surface deformation using D3DVECTOR structure and AoS paradigm. All vectors (forces, displacements, velocities etc.) can be defined as arrays of D3DVECTOR. However keeping in mind SIMD-optimization and porting to 3DNow! it is necessary to choose an alternative path and SoA paradigm.

C-implementation is straightforward. The resulting objective code produced by Visual C (optimized for performance on ‘blended’ CPU) yields truly amazing performance on Athlon CPU. The performance comes from the unique FPU design that features fully pipelined out-of-order execution of floating-point operations. Most common FP-instructions, such as fadd, fsub, fmul take only one cycle to execute. Some instructions such as fxch are virtually free and typically consume zero CPU cycles. Moreover, fadd / fsub, fmul and fst instructions can execute in parallel. The C implementation in Listing1. takes about 2M cycles to process 10,000 vertices on Athlon-600 MHz.

Listing 1. C Implementation of Surface Deformation

void SURFACE::Update(float dt)
{
// Reset Total Force
memset(TotalForceX, 0, NumVertex*sizeof(float));
memset(TotalForceY, 0, NumVertex*sizeof(float));
memset(TotalForceZ, 0, NumVertex*sizeof(float));

// For each vertex
for ( int i = 0, k = 0; i < NumRows - 1; i++ )
{
// Clear first vertex in the next row
TotalForceX[(i + 1)*NumCols] = 0.0f;
TotalForceY[(i + 1)*NumCols] = 0.0f;
TotalForceZ[(i + 1)*NumCols] = 0.0f;

for ( int j = 0; j < NumCols - 1; j++, k++ )
{
// Distance between C-E vertices
float
dx = SX[k + 1] - SX[k],
dy = SY[k + 1] - SY[k],
dz = SZ[k + 1] - SZ[k];
// Less relaxation length
dx -= SRelaxE.x;
dy -= SRelaxE.y;
dz -= SRelaxE.z;
// C-E elastic force
dx *= Elasticity,
dy *= Elasticity,}
dz *= Elasticity;
// Total force for C-vertex
TotalForceX[k] += dx;
TotalForceY[k] += dy;
TotalForceZ[k] += dz;
// Total force for E-vertex (3rd Newton's Law)
TotalForceX[k + 1] -= dx;
TotalForceY[k + 1] -= dy;
TotalForceZ[k + 1] -= dz;

// Distance between C-S vertices
dx = SX[k + NumCols] - SX[k];
dy = SY[k + NumCols] - SY[k];
dz = SZ[k + NumCols] - SZ[k];
// Less relaxation length
dx -= SRelaxS.x;
dy -= SRelaxS.y;
dz -= SRelaxS.z;
// C-S elastic force
dx *= Elasticity;
dy *= Elasticity;
dz *= Elasticity;
// Total force for C-vertex
TotalForceX[k] += dx;
TotalForceY[k] += dy;
TotalForceZ[k] += dz;
// Total force for S-vertex (3rd Newton's Law)
TotalForceX[k + NumCols] -= dx;
TotalForceY[k + NumCols] -= dy;
TotalForceZ[k + NumCols] -= dz;

// Distance between C-SE vertices
dx = SX[k + NumCols + 1] - SX[k];
dy = SY[k + NumCols + 1] - SY[k];
dz = SZ[k + NumCols + 1] - SZ[k];
// Less relaxation length
dx -= SRelaxSE.x;
dy -= SRelaxSE.y;
dz -= SRelaxSE.z;
// C-SE elastic force
dx *= Elasticity;
dy *= Elasticity;
dz *= Elasticity;
// Total force for C-vertex
TotalForceX[k] += dx;
TotalForceY[k] += dy;
TotalForceZ[k] += dz;
// Total force for SE-vertex (3rd Newton's Law)
TotalForceX[k + NumCols + 1] -= dx;
TotalForceY[k + NumCols + 1] -= dy;
TotalForceZ[k + NumCols + 1] -= dz;
}
k++;
}

float Massdt = Mass*dt;
// For each vertex add external force
for ( k = 0; k < NumVertex; k++ )
{
TotalForceX[k] += ExternalForceX[k];
ExternalForceX[k] = 0.0f;
TotalForceY[k] += ExternalForceY[k];
ExternalForceY[k] = 0.0f;
TotalForceZ[k] += ExternalForceZ[k];
ExternalForceZ[k] = 0.0f;

// For each vertex that is not fixed in space
if ( Fixed[k] )
{
// Calculate velocity: V = V + A*dt, A = F/m
VX[k] += TotalForceX[k]*Massdt;
VY[k] += TotalForceY[k]*Massdt;
VZ[k] += TotalForceZ[k]*Massdt;

// Calculate new vertex position: S = S + V*dt
SX[k] += VX[k]*dt;
SY[k] += VY[k]*dt;
SZ[k] += VZ[k]*dt;

// Account for damping factor: V = V*damp
VX[k] *= Damping;
VY[k] *= Damping;
VZ[k] *= Damping;
// Set normals to macth velocity
Normal[k].x = VX[k];
Normal[k].y = VY[k];
Normal[k].z = VZ[k];
}

}

}

3DNow! code

The hand-optimized assembler code that uses 3DNow! instruction set (see AMD SDK amd3dx.h header file) can be fairly easy produced from SoA C-implementation. Vertices are processed in pairs, and the code for x, y and z-components is the same except for displacement on memory operations.

Athlon CPU is less sensitive for instruction scheduling mostly due to the large (72 mops) reorder buffer. However it still a good idea to schedule instructions to avoid possible pipeline stalls due to latencies of both 3DNow! pipelines. In the code fragment below 3DNow! instructions are grouped in packs of 3. Each instruction processes the same vector component (x, y or z) for all three topological neighbors: east, south and southeast. This grouping of instructions hides most of the latency of 3DNow! pipeline since the result of the first instruction is not referenced until 3 cycles later.

The code suffers a little from misaligned data access. Misaligned access comes from the necessity to process east and southeast vertices. However the penalty for misaligned access is only one cycle.

The code heavily uses load-execute operations to minimize register pressure and promote code density. Also branch for fixed vertices was eliminated and replaced by MMX pand instruction that resets total force to zero for fixed vertices.

For storing normal data both halves of MMX registers must be written to memory but to different locations. pswapd instruction that is an Athlon extension 3DNow! instruction set was used to swap upper and lower 32-bit halves of a MMX register. The pswapd instruction offers better performance than punpckhdq MMX instruction since it doesn’t use MMX shifter unit.

PREFETCH instructions were used to optimize memory throughput.

The 3DNow! assembler code takes about 1.5M cycles to process 10,000 vertices on Athlon-600 MHz. The 3DNow! implementation in Listing 2 is ~40% faster than x87 code.

Max I. Fomitchev is a full-time consultant at Systems&Programming Resources Inc., Tulsa, OK. He has a MCS and is getting his Ph.D. in March. He has recently released Gems 3D shareware 3D puzzle game (www.Gems3D.com), and he also does contract work for AMD among other things. Specializing in code optimization, his interests include code optimization, 3D graphics, physics simulation, compiler design and ultrasound imaging. Check out his website at www.webzone.net/maxf.

Listing 2. 3DNow! Implementation of Surface Deformation

void SURFACE::Update3DNow(float dt)
{
float* ptrS, Massdt[2];
int i, j, k;
float
SREX[2] = {SRelaxE.x, SRelaxE.x},
SREY[2] = {SRelaxE.y, SRelaxE.y},
SREZ[2] = {SRelaxE.z, SRelaxE.z},
SRSX[2] = {SRelaxS.x, SRelaxS.x},
SRSY[2] = {SRelaxS.y, SRelaxS.y},
SRSZ[2] = {SRelaxS.z, SRelaxS.z},
SRSEX[2] = {SRelaxSE.x, SRelaxSE.x},
SRSEY[2] = {SRelaxSE.y, SRelaxSE.y},
SRSEZ[2] = {SRelaxSE.z, SRelaxSE.z};
// Reset Total Force
memset(TotalForceX, 0, NumVertex*sizeof(float));
memset(TotalForceY, 0, NumVertex*sizeof(float));
memset(TotalForceZ, 0, NumVertex*sizeof(float));

// Initial setup
__asm {
femms
mov ebx,this // ebx -> this
movd mm7,[ebx]this.Elasticity
punpckldq mm7,mm7 // mm7 = Elasticity
mov eax,[ebx]this.NumVertex
shl eax,2 // eax = NumVertex*sizeof(float)
mov ecx,[ebx]this.NumCols
shl ecx,2 // ecx = NumCols*sizeof(float)
mov edx,eax
shl edx,1 // edx = NumVertex*sizeof(float)*2
lea edi,[ebx]this.SX // [edi] -> SX/SY/SZ
lea esi,[ebx]this.TotalForceX // [esi] ->
TotalForceX/TotalForceY/TotalForceZ
mov i,eax
sub i,ecx
M:
mov j,ecx
sub j,8
M1:
prefetchm(edi,64)
prefetchm(esi,64)
// [edi] -> S
// [esi] -> TotalForce
// mm7 = Elasticity
// mm0 = dx/dy/dz
// Process all vertices X-coordinate
movq mm3,[edi] // mm3 = SX[k]
movq mm0,[edi + 4] // SX[k + 1]
movq mm1,[edi + ecx] // SX[k + NumCols]
movq mm2,[edi + ecx+4] // SX[k + NumCols + 1]
pfsub (mm0,mm3) // dx = SX[k + 1] - SX[k]
pfsub (mm1,mm3) // dx = SX[k + NumCols] - SX[k]
pfsub (mm2,mm3) // dx = SX[k + NumCols + 1] - SX[k]
movq mm4,SREX
movq mm5,SRSX
movq mm6,SRSEX
pfsub (mm0,mm4) // dxE -= SRelaxE.x
pfsub (mm1,mm5) // dxS -= SRelaxS.x
pfsub (mm2,mm6) // dxSE -= SRelaxSE.x
movq mm3,[esi + 4] // TotalForceX[k + 1]
pfmul (mm0,mm7) // dxE *= Elasticity
pfmul (mm1,mm7) // dxS *= Elasticity
pfmul (mm2,mm7) // dxSE *= Elasticity
pfsub (mm3,mm0) // TotalForceX[k + 1] - dxE
movq mm4,[esi + ecx] // TotalForceX[k + NumCols]
movq [esi + 4],mm3 // TotalForceX[k + 1] -= dxE
pfsub (mm4,mm1) // TotalForceX[k + NumCols] - dxS
movq [esi + ecx],mm4 // TotalForceX[k + NumCols] -= dxS
movq mm5,[esi + ecx+4] // TotalForceX[k + NumCols + 1]
pfsub (mm5,mm2) // TotalForceX[k + NumCols + 1] - dxSE
movq [esi + ecx+4],mm5 // TotalForceX[k + NumCols + 1] = dxSE
pfadd (mm0,esi) // TotalForceX[k] += dxE
pfadd (mm0,mm1) // TotalForceX[k] += dxS
pfadd (mm0,mm2) // TotalForceX[k] += dxSE
movq [esi],mm0 // TotalForceX[k] += dx

// Y-coordinate
add edi,eax
add esi,eax
movq mm3,[edi] // mm3 = SY[k]
movq mm0,[edi + 4] // SY[k + 1]
movq mm1,[edi + ecx] // SY[k + NumCols]
movq mm2,[edi + ecx+4] // SY[k + NumCols + 1]
pfsub (mm0,mm3) // dy = SY[k + 1] - SY[k]
pfsub (mm1,mm3) // dy = SY[k + NumCols] - SY[k]
pfsub (mm2,mm3) // dy = SY[k + NumCols + 1] - SY[k]
movq mm4,SREY
movq mm5,SRSY
movq mm6,SRSEY
pfsub (mm0,mm4) // dyE -= SRelaxE.y
pfsub (mm1,mm5) // dyS -= SRelaxS.y
pfsub (mm2,mm6) // dySE -= SRelaxSE.y
movq mm3,[esi + 4] // TotalForceY[k + 1]
pfmul (mm0,mm7) // dyE *= Elasticity
pfmul (mm1,mm7) // dyS *= Elasticity
pfmul (mm2,mm7) // dySE *= Elasticity
pfsub (mm3,mm0) // TotalForceY[k + 1] - dyE
movq mm4,[esi + ecx] // TotalForceY[k + NumCols]
movq [esi + 4],mm3 // TotalForceY[k + 1] -= dyE
pfsub (mm4,mm1) // TotalForceY[k + NumCols] - dyS
movq [esi + ecx],mm4 // TotalForceY[k + NumCols] -= dyS
movq mm5,[esi + ecx+4] // TotalForceY[k + NumCols + 1]
pfsub (mm5,mm2) // TotalForceY[k + NumCols + 1] - dySE
movq [esi + ecx+4],mm5 // TotalForceY[k + NumCols + 1] -= dySE
pfadd (mm0,esi) // TotalForceY[k] += dyE
pfadd (mm0,mm1) // TotalForceY[k] += dyS
pfadd (mm0,mm2) // TotalForceY[k] += dySE
movq [esi],mm0 // TotalForceY[k] += dy

// Z-coordinate
add edi,eax
add esi,eax
movq mm3,[edi] // mm3 = SZ[k]
movq mm0,[edi + 4] // SZ[k + 1]
movq mm1,[edi + ecx] // SZ[k + NumCols]
movq mm2,[edi + ecx+4] // SZ[k + NumCols + 1]
pfsub (mm0,mm3) // dz = SZ[k + 1] - SZ[k]
pfsub (mm1,mm3) // dz = SZ[k + NumCols] - SZ[k]
pfsub (mm2,mm3) // dz = SZ[k + NumCols + 1] - SZ[k]
movq mm4,SREZ
movq mm5,SRSZ
movq mm6,SRSEZ
pfsub (mm0,mm4) // dzE -= SRelaxE.z
pfsub (mm1,mm5) // dzS -= SRelaxS.z
pfsub (mm2,mm6) // dzSE -= SRelaxSE.z
movq mm3,[esi + 4] // TotalForceZ[k + 1]
pfmul (mm0,mm7) // dzE *= Elasticity
pfmul (mm1,mm7) // dzS *= Elasticity
pfmul (mm2,mm7) // dzSE *= Elasticity
pfsub (mm3,mm0) // TotalForceZ[k + 1] - dzE
movq mm4,[esi + ecx] // TotalForceZ[k + NumCols]
movq [esi + 4],mm3 // TotalForceZ[k + 1] -= dzE
pfsub (mm4,mm1) // TotalForceZ[k + NumCols] - dzS
movq [esi + ecx],mm4 // TotalForceZ[k + NumCols] -= dzS
movq mm5,[esi + ecx+4] // TotalForceZ[k + NumCols + 1]
pfsub (mm5,mm2) // TotalForceZ[k + NumCols + 1] - dzSE
movq [esi + ecx+4],mm5 // TotalForceZ[k + NumCols + 1] -= dzSE
pfadd (mm0,esi) // TotalForceZ[k] += dzE
pfadd (mm0,mm1) // TotalForceZ[k] += dzS
pfadd (mm0,mm2) // TotalForceZ[k] += dzSE
movq [esi],mm0 // TotalForceZ[k] += dz

// Update indices
sub edi,edx
sub esi,edx
add edi,8
add esi,8
sub j,8
jnz M1
add edi,8
add esi,8
sub i,ecx
jnz M

lea ecx,[ebx]this.SX
mov ptrS,ecx // ptrS = SX
movd mm5,[ebx]this.Mass
punpckldq mm5,mm5
movd mm6,dt // mm6 = dt
punpckldq mm6,mm6
pfmul (mm5,mm6) // mm5 = Massdt
movq Massdt,mm5
movd mm7,[ebx]this.Damping
punpckldq mm7,mm7 // mm7 = Damping
lea ecx,[ebx]this.Normal // [ecx] -> Normal
lea edx,[ebx]this.Fixed // [edx] -> Fixed
lea edi,[ebx]this.TotalForceX // [edi] -> TotalForce
lea esi,[ebx]this.ExternalForceX// [esi] -> ExternalForce
lea ebx,[ebx]this.VX // [ebx] -> V
mov k,eax

// For each vertex add external force
M2:
prefetchm(edi,64)
prefetchm(esi,64)
prefetchm(ebx,64)
prefetchm(ecx,64)
movq mm6,[edx] // Fixed[k]
movq mm0,[edi] // TotalForceX[k]
movq mm1,[edi + eax] // TotalForceY[k]
movq mm2,[edi + eax*2] // TotalForceZ[k]
movq mm3,[esi] // ExternalForceX[k]
movq mm4,[esi + eax] // ExternalForceY[k]
movq mm5,[esi + eax*2] // ExternalForceZ[k]
pfadd (mm0,mm3) // mm0 = TotalForceX[k] + ExternalForceX[k]
pfadd (mm1,mm4) // mm1 = TotalForceY[k] + ExternalForceY[k]
pfadd (mm2,mm5) // mm2 = TotalForceZ[k] + ExternalForceZ[k]
pxor mm3,mm3
pand mm0,mm6 // TotalForceX[k] & Fixed[k]
pand mm1,mm6 // TotalForceY[k] & Fixed[k]
pand mm2,mm6 // TotalForceZ[k] & Fixed[k]
movq mm6,Massdt
movq [esi],mm3 // ExternalForceX[k] = 0
movq [esi + eax],mm3 // ExternalForceY[k] = 0
movq [esi + eax*2],mm3 // ExternalForceZ[k] = 0
//maskmovq mm0,mm4
movq [edi],mm0 // TotalForceX[k] += ExternalForceX[k] & Fixed[k]
movq [edi + eax],mm1 // TotalForceY[k] += ExternalForceY[k] & Fixed[k]
movq [edi + eax*2],mm2 // TotalForceZ[k] += ExternalForceZ[k] & Fixed[k]
pfmul (mm0,mm6) // mm0 = TotalForceX[k]*Massdt
pfmul (mm1,mm6) // mm1 = TotalForceY[k]*Massdt
pfmul (mm2,mm6) // mm2 = TotalForceZ[k]*Massdt
movq mm6,dt
punpckldq mm6,mm6

// Calculate velocity: V = V + TotalForce*Massdt
pfadd (mm0,ebx) // VX[k] + TotalForceX[k]*Massdt
movq mm4,[ebx + eax] // VY[k]
movq mm5,[ebx + eax*2] // VZ[k]
pfadd (mm1,mm4) // VY[k] + TotalForceY[k]*Massdt
pfadd (mm2,mm5) // VZ[k] + TotalForceZ[k]*Massdt
movq mm3,mm0 // mm3 = VX[k] + TotalForceX[k]*Massdt
movq mm4,mm1 // mm3 = VY[k] + TotalForceY[k]*Massdt
movq mm5,mm2 // mm3 = VY[k] + TotalForceZ[k]*Massdt
pfmul (mm3,mm7) // VX[k]*Damping
pfmul (mm4,mm7) // VY[k]*Damping
pfmul (mm5,mm7) // VZ[k]*Damping
pfmul (mm0,mm6) // mm0 = (VX[k] + TotalForceX[k]*Massdt)*dt
pfmul (mm1,mm6) // mm1 = (VY[k] + TotalForceY[k]*Massdt)*dt
pfmul (mm2,mm6) // mm2 = (VY[k] + TotalForceZ[k]*Massdt)*dt
movq [ebx],mm3 // VX[k] += TotalForceX[k]*Massdt
movq [ebx + eax],mm4 // VY[k] += TotalForceY[k]*Massdt
movq [ebx + eax*2],mm5 // VZ[k] += TotalForceZ[k]*Massdt
movd [ecx],mm3 // Normal[k].x = VX[k]
movd [ecx + 4],mm4 // Normal[k].y = VY[k]
movd [ecx + 8],mm5 // Normal[k].z = VZ[k]
pswapd (mm3,mm3)
pswapd (mm4,mm4)
pswapd (mm5,mm5)
//punpckhdq mm3,mm3
movd [ecx + TYPE D3DVECTOR],mm3 // Normal[k].x = VX[k]
//punpckhdq mm4,mm4
movd [ecx + TYPE D3DVECTOR + 4],mm4 // Normal[k].y = VY[k]
//punpckhdq mm5,mm5
movd [ecx + TYPE D3DVECTOR + 8],mm5 // Normal[k].z = VZ[k]

// Calculate new vertex position: S = S + V*dt
// Save edi
mov i,edi
// [edi] -> S
mov edi,ptrS
movq mm3,[edi] // SX[k]
movq mm4,[edi + eax] // SY[k]
movq mm5,[edi + eax*2] // SZ[k]
pfadd (mm0,mm3) // SX[k] + VX[k]*dt
pfadd (mm1,mm4) // SY[k] + VY[k]*dt
pfadd (mm2,mm5) // SZ[k] + VZ[k]*dt
movq [edi],mm0 // SX[k] += VX[k]*dt
movq [edi + eax],mm1 // SY[k] += VY[k]*dt
movq [edi + eax*2],mm2 // SZ[k] += VZ[k]*dt
mov edi,i

// Update pointers
add ptrS,8
add edi,8
add esi,8
add ebx,8
add ecx,(TYPE D3DVECTOR)*2
add edx,8
sub k,8
jnz M2
femms
}
}

Read more about:

Features
Daily news, dev blogs, and stories from Game Developer straight to your inbox

You May Also Like