Saturday, July 4, 2015

happy (gpu) independence day

So, I realized it has been a while since I last posted about freedreno progress, so in honor of US Independence Day I figured it was as good an excuse as any for an update on independence from the gpu blob driver for snapdragon/adreno.

Back at the end of March 2015, I gave a freedreno update presentation at ELC, listing the following major tasks left for gles3 support:
  • Uniform Buffer Objects (UBO)
  • Transform Feedback (TF)
  • Multi-Render-Target (MRT)
  • advanced flow control in shader compiler
 and additionally for gl3:
  • Multisample anti-aliasing (MSAA)
  • NV_conditional_render
  • 32b depth (z32 and z32_s8) (which I forgot to mention in the presentation)
EDIT: Ilia pointed out that 32b depth is needed for gles3 too, and that gl3 additionally needs clipdist/etc (which we'll have to emulate, but hopefully can do in a generic nir pass) and rgtc (which will need sw decompression, hopefully in mesa core so that other drivers for gles-class hw can reuse it).  The original list was based on what mesa's compute_version() code was checking quite some time back.
Since then, we've gained support for UBOs (a3xx by Ilia Mirkin, and a4xx), MRT (for a3xx and core, again thanks to Ilia.. still needs to be wired up for a4xx), 32b depth (a3xx and core, again thanks to Ilia), and I've finished up shader compiler support for loops/flow-control in ir3 (a3xx/a4xx).  The shader compiler work was a somewhat larger task than I expected (and I did expect it to be a lot of work), in part because it also involved moving over to NIR, in addition to rewriting the scheduler and register allocation passes, as well as a lot of re-org in ir3 in order to support multiple basic blocks.  The move to NIR was not strictly required, but it brings a lot of benefits in the form of shared support for conversion to SSA, scalarizing, CSE, DCE, constant folding, and algebraic optimizations.  And I figured it was less work in the long run to move to NIR first and drop the TGSI frontend, before doing all the refactoring needed to support loops and non-lowerable flow-control.  Incidentally, the compiler work should make the shader-compiler part of TF easier, since we need to generate a conditional write to the TF buffer that only happens if it would not write past the end of the TF buffer.
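To make that conditional TF write a bit more concrete, here is a minimal CPU-side sketch of the logic. This is not ir3 or freedreno code; `struct tf_buffer` and `tf_emit_vertex()` are hypothetical names, and it assumes a tightly packed buffer of 32-bit components with one fixed-size record per vertex. The point is just the shape of the check: compute where the vertex would land, and skip the store entirely if any part of it would fall past the end of the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a transform-feedback buffer (not a real driver API). */
struct tf_buffer {
    uint32_t *data;
    uint32_t size;   /* capacity in 32-bit components (dwords) */
};

/* Write one vertex's outputs (ncomp dwords) as record number 'vtx'.
 * Returns 1 if the vertex was written, 0 if it would overflow the buffer. */
static int tf_emit_vertex(struct tf_buffer *buf, uint32_t vtx,
                          const uint32_t *outputs, uint32_t ncomp)
{
    assert(buf != NULL && outputs != NULL);

    /* Use 64-bit math so the offset computation itself cannot wrap. */
    uint64_t offset = (uint64_t)vtx * ncomp;

    /* The conditional part: only write if the whole vertex fits. */
    if (offset + ncomp > buf->size)
        return 0;

    memcpy(&buf->data[offset], outputs, ncomp * sizeof(uint32_t));
    return 1;
}
```

In a shader the same comparison would become a branch (or predicated store) around the buffer write, which is why having real flow-control support in the compiler helps.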

In the meantime, freedreno and drm/msm have also gained support for the a306 gpu found in the new dragonboard 410c.  This board is a nice new low-cost ($75) snapdragon community board based on the 64-bit snapdragon 410.  And thanks to a lot of work by linaro and qualcomm, the upstream kernel situation for this board is looking pretty good.  It is initially shipping with a 4.0 based kernel (with patches on top for stuff that hadn't yet been merged for 4.0, including a lot of stuff backported from 4.1 and 4.2), including gpu/display/audio/video-codec/etc.  I believe that the 4.1 kernel was the first version where a vanilla kernel could boot on db410c with basic stuff (like serial console) working.  The kernel support for the gpu and display (other than the adv7533 hdmi bridge chip) landed in 4.2.  There is still more work to get *everything* (including audio, vidc, etc) merged upstream, but work continues in that direction, making this quite an exciting board.
Also, we have a GSoC student, Varad, working on freedreno support for android.  It is still in early stages, with some debugging still to do, but he has made a lot of progress and things are starting to work.
And since no blog post is complete without some nice screenshots...  the other day someone pointed me at a post in the dolphin forums about dolphin running on a420 (the same gpu as in the ifc6540).  We all had a good laugh about the rendering issues with the blob driver.  But since dolphin was the first gl3 game that worked with freedreno, I was curious how freedreno would do.. so I fired up the ifc6540 and replayed some dolphin fifo logs that let me render approximately the same scenes:

Yoshi looks to be rendering pretty well.. digimon has a bit of corruption, but nowhere near as bad as with the blob driver.  I suspect the issue with digimon is an instruction scheduling problem in the shader compiler (well, no rest for the gpu driver writers), but it is nice to see that it is already in pretty good shape.

Now we just need steam store or some unigine demos for arm linux :-P