In-Depth: Light pre-pass renderer on iPhone
 

March 22, 2012   |   By Simon Yeung


[In this reprinted #altdevblogaday in-depth piece, game programmer Simon Yeung shares his experiments getting a light pre-pass renderer to run on an iPhone 4S at 30 frames per second with 30 dynamic lights.]

Introduction

About a month ago, I bought an iPhone 4S, so I wrote some code on my new toy. Although this device does not support multiple render targets (MRT), it does support rendering to a floating-point render target (only available on the iPhone 4S and iPad 2).

So, I tested it with a light pre-pass renderer.



In the test, HDR lighting is done (gamma = 2.0 instead of 2.2, without adaptation) with three post-processing filters (filmic tone mapping, bloom, and a photo filter). The test scene uses three directional lights (one of them casting shadows with four cascades) and 30 point lights, with two skinned models and Bullet physics running at the same time. It runs at around 28 to 32 fps.
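
One likely appeal of a gamma of 2.0 (an assumption here; the reason is not stated above) is that the conversion reduces to a multiply in one direction and a square root in the other, instead of a general power function. A minimal Python sketch of that arithmetic, with illustrative function names that do not come from the renderer:

```python
import math

def decode_gamma2(c):
    """Display-to-linear conversion with gamma 2.0: a single multiply."""
    return c * c

def encode_gamma2(c):
    """Linear-to-display conversion with gamma 2.0: a single square root."""
    return math.sqrt(max(c, 0.0))

def encode_gamma22(c):
    """Reference encode with the more common gamma 2.2 (needs a pow)."""
    return max(c, 0.0) ** (1.0 / 2.2)

if __name__ == "__main__":
    # Compare the two encodes over a few linear intensities.
    for linear in (0.0, 0.05, 0.18, 0.5, 1.0):
        print(linear, round(encode_gamma2(linear), 4), round(encode_gamma22(linear), 4))
```

The two curves stay close over the displayable range, which is why 2.0 is a common low-cost stand-in for 2.2.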

G-buffer layout

I have tried two different layouts for the G-buffer. My first attempt used one 16-bit render target, with the R channel storing the depth value, the G and B channels storing the view-space normal using the encoding method from "A bit more deferred - CryEngine 3", and the A channel storing the glossiness for the specular lighting calculation.
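
The two-channel normal encoding can be sketched on the CPU as follows. This follows the formula from the cited CryEngine 3 presentation as best it can be reconstructed here (G = normalize(N.xy) * sqrt(N.z * 0.5 + 0.5)); the shader actually used in the test is not shown, so the function names and the handling of degenerate normals are assumptions.

```python
import math

def encode_normal(n):
    """Pack a unit normal (x, y, z) into two values (the G and B channels)."""
    x, y, z = n
    xy_len = math.hypot(x, y)
    scale = math.sqrt(max(z * 0.5 + 0.5, 0.0))
    if xy_len < 1e-8:
        # N.xy has no direction; any unit direction decodes to the same normal.
        return (scale, 0.0)
    return (x / xy_len * scale, y / xy_len * scale)

def decode_normal(g):
    """Recover the unit normal from the two stored values."""
    gx, gy = g
    len2 = gx * gx + gy * gy
    z = len2 * 2.0 - 1.0
    xy_len = math.sqrt(max(1.0 - z * z, 0.0))
    g_len = math.sqrt(len2)
    if g_len < 1e-8:
        # Only N = (0, 0, -1) encodes to a zero-length vector.
        return (0.0, 0.0, z)
    return (gx / g_len * xy_len, gy / g_len * xy_len, z)

if __name__ == "__main__":
    print(decode_normal(encode_normal((0.6, 0.0, 0.8))))
```

The decode needs a normalize and a square root per pixel, which is the cost that disappears once the normal is stored unencoded.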

But later I discovered that this device supports the OpenGL ES extension GL_OES_depth_texture, which allows the depth buffer to be rendered into a texture. So my second attempt switched the G-buffer layout to use the RGB channels to store the view-space normal without encoding and the A channel to store the glossiness, while the depth is sampled directly from the depth texture.
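
A value sampled from the depth texture is non-linear, so the lighting pass has to convert it back to a view-space distance before reconstructing a position. The sketch below shows that conversion, assuming a standard OpenGL perspective projection and the default [0, 1] depth range; the near and far values are placeholders, not those of the test scene.

```python
def window_depth_from_view_distance(dist, near, far):
    """Depth-buffer value in [0, 1] for a point `dist` units in front of the camera."""
    ndc_z = (far + near) / (far - near) - (2.0 * far * near) / ((far - near) * dist)
    return ndc_z * 0.5 + 0.5

def view_distance_from_window_depth(depth, near, far):
    """Invert the projection: depth-buffer value back to distance along the view axis."""
    ndc_z = depth * 2.0 - 1.0
    return (2.0 * near * far) / (far + near - ndc_z * (far - near))

if __name__ == "__main__":
    near, far = 0.1, 100.0  # placeholder clip planes
    d = window_depth_from_view_distance(10.0, near, far)
    print(d, view_distance_from_window_depth(d, near, far))
```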

[Images: G-buffer storing view-space normal and glossiness; depth buffer]

Switching to this layout gives a boost in frame rate, as the normal no longer needs to be encoded to and decoded from the texture. However, reducing the render target from 16-bit to 8-bit to store the normal and glossiness does not give any performance improvement, probably because the test scene is not bandwidth-bound.

Stencil optimization

The second optimization targets the deferred lights: the stencil trick draws a convex bounding volume for each light to cull the pixels that do not need lighting.

[Image: drawing the bounding volume of the point lights]

However, after I finished implementing the stencil trick, the frame rate dropped. This is because, when filling the stencil buffer, I used the same shader as the one used for lighting. Even with color writes disabled while filling the stencil buffer, the GPU was still doing redundant work. So a simple shader is used in the stencil pass instead, which improves performance.

Also, drawing out the shape of the point lights made me discover that with the attenuation factor I used (i.e. 1/(1 + k*d + k*d^2)), a large part of each light's bounding volume receives almost no light, so I switched to a simpler linear falloff model (e.g. 1 - lightDistance/lightRange, which can be raised to an exponent to control the falloff) to give a tighter bound.
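
The difference between the two falloff models can be seen numerically. The sketch below uses the two formulas from the paragraph above; the value of k, the visibility threshold, and the light range are made-up numbers for illustration only.

```python
import math

def quadratic_attenuation(d, k):
    """Attenuation 1 / (1 + k*d + k*d^2); never reaches zero."""
    return 1.0 / (1.0 + k * d + k * d * d)

def linear_falloff(d, light_range, exponent=1.0):
    """Falloff (1 - d / range)^exponent; exactly zero at the light range."""
    t = max(0.0, 1.0 - d / light_range)
    return t ** exponent

def quadratic_cutoff_distance(k, threshold):
    """Distance at which the quadratic attenuation drops to `threshold`."""
    # Solve k*d^2 + k*d + (1 - 1/threshold) = 0 for the positive root.
    c = 1.0 - 1.0 / threshold
    return (-k + math.sqrt(k * k - 4.0 * k * c)) / (2.0 * k)

if __name__ == "__main__":
    k = 1.0                    # hypothetical attenuation constant
    threshold = 1.0 / 256.0    # hypothetical "no longer visible" level
    radius = quadratic_cutoff_distance(k, threshold)
    print("bounding radius needed:", round(radius, 2))
    # Most of that radius is spent on very dim lighting:
    print("attenuation at half radius:", round(quadratic_attenuation(radius / 2, k), 4))
    print("linear falloff at half range:", linear_falloff(5.0, 10.0))
```

With these numbers the quadratic model needs a radius of about 15.5 units, yet contributes under 2% of full intensity at half that distance, whereas the linear model still gives 50% at half range and is exactly zero at the edge of the volume.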

[Image: light buffer]

Combining post-processing passes

Combining the full-screen render passes can help performance. In the test scene, the bloom result was originally additively blended with the tone-mapped scene render target, followed by a photo filter pass that rendered to the back buffer. I combined these passes by calculating the additive blend with the tone-mapped scene inside the photo filter shader, which is faster than before.

Resolution

The program runs at a low resolution, with a back buffer of 480x320 pixels. The G-buffer and the post-processing textures are further scaled down to 360x300 pixels. This reduces the number of fragments that need to be shaded by the pixel shaders.
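
To put those sizes in perspective, the fragment counts can be compared directly. The sketch below uses the two resolutions from the paragraph above plus the iPhone 4S display resolution of 960x640, which is added here for reference and is not mentioned in the text.

```python
def pixel_count(width, height):
    """Number of fragments in one full-screen pass at this resolution."""
    return width * height

native = pixel_count(960, 640)       # iPhone 4S display (reference figure)
back_buffer = pixel_count(480, 320)  # back buffer size from the text
g_buffer = pixel_count(360, 300)     # G-buffer / post-processing size from the text

if __name__ == "__main__":
    print("back buffer vs native:", back_buffer / native)
    print("G-buffer vs back buffer:", g_buffer / back_buffer)
```

The back buffer covers a quarter of the native pixel count, and each G-buffer or post-processing pass shades roughly 70% of the back buffer's fragments.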

Shadow

In the scene, a cascaded shadow map is used with four cascades (resolution = 256×256). I tried using the GL_EXT_shadow_samplers extension, hoping that it would help the frame rate. But the result was disappointing, as the extension is no faster than performing the comparison inside the shader.
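
For readers unfamiliar with the technique, the per-pixel logic of a cascaded shadow lookup with a manual depth comparison can be sketched as below. The split distances and the bias are hypothetical values, and the code is a generic illustration rather than the shader used in the test.

```python
import bisect

def select_cascade(view_depth, split_far):
    """Index of the first cascade whose far distance covers view_depth.

    `split_far` is a sorted list of far distances, one per cascade.
    Returns None when the point lies beyond the last cascade.
    """
    i = bisect.bisect_left(split_far, view_depth)
    return i if i < len(split_far) else None

def shadow_factor(receiver_depth, stored_depth, bias=0.002):
    """Manual comparison: 1.0 if lit, 0.0 if in shadow."""
    return 1.0 if receiver_depth - bias <= stored_depth else 0.0

if __name__ == "__main__":
    split_far = [5.0, 15.0, 40.0, 100.0]  # hypothetical far distances for four cascades
    for depth in (3.0, 20.0, 150.0):
        print(depth, select_cascade(depth, split_far))
    print(shadow_factor(0.5, 0.6), shadow_factor(0.7, 0.6))
```

A shadow sampler extension moves the comparison in `shadow_factor` into the texture fetch; the measurement above suggests that on this device the fetch costs about the same either way.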


It takes around 8 ms to calculate the shadow and blur it. If a basic shadow map (i.e. without cascades) is used instead, with blurring, it gives a small performance boost, depending on how many point lights are on screen. Of course, switching off the blur speeds up the shadow calculation a lot.

[Images: basic shadow map; basic shadow map with blur; cascaded shadow map; cascaded shadow map with blur]

Conclusion

In this post, I described the methods used to make a light pre-pass renderer run on the iPhone at 30 fps with 30 dynamic lights. However, resolution is sacrificed in order to keep the dynamic lights, HDR lighting, and the post-processing filters.

Also, no anti-aliasing is done in the test, as the frame rate is not good enough. MSAA might be feasible if the basic shadow map is used instead of the cascaded one, but we will leave that for future investigation.

References
[1] Light Pre Pass Renderer: http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html
[2] A bit more deferred – CryEngine 3: http://www.crytek.com/sites/default/files/A_bit_more_deferred_-_CryEngine3.ppt
[3] Filmic tone mapping operators: http://filmicgames.com/archives/75
[4] Crysis Next Gen Effects: http://www.crytek.com/sites/default/files/GDC08_SousaT_CrysisEffects.ppt
[5] Position From Depth 3: Back In The Habit: http://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/
[6] Fast Mobile Shaders: http://blogs.unity3d.com/wp-content/uploads/2011/08/FastMobileShaders_siggraph2011.pdf
[7] GLSL Optimizer: http://aras-p.info/blog/2010/09/29/glsl-optimizer/
[8] Deferred Cascaded Shadow Maps: http://aras-p.info/blog/2009/11/04/deferred-cascaded-shadow-maps/
[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]