mesa/src
Jason Ekstrand c217ee8d35 nir: Insert b2b1s around booleans in nir_lower_to
By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics
generated by nir_lower_io, we can ensure that the intrinsic has the
correct destination bit size.  Not having the right size can mess up
passes which try to optimize access.  In particular, it was causing
brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans which meant
that boolean uniforms weren't getting pushed as push constants.  I
don't think this is an actual functional bug anywhere, hence no CC to
stable, but it may improve perf somewhere.
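The idea can be illustrated with a small toy model in C. This is a sketch under stated assumptions, not Mesa code. `toy_load`, `toy_pass_accepts` and `toy_read_bool_uniform` are hypothetical names. `b2b1` and `b2b32` are plain C functions that mimic the NIR opcodes of the same name. The model assumes the usual convention that a true 32-bit boolean is `~0`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of NIR booleans; these are NOT Mesa's nir_builder helpers.
 * Assumption: a 32-bit boolean is stored as 0 (false) or ~0 (true); a
 * 1-bit boolean is the value the rest of the shader operates on. */

/* b2b32: widen a 1-bit boolean to its 32-bit storage form. */
static uint32_t b2b32(bool b)
{
   return b ? ~0u : 0u;
}

/* b2b1: narrow a 32-bit boolean to a 1-bit boolean (non-zero is true). */
static bool b2b1(uint32_t v)
{
   return v != 0u;
}

/* Hypothetical description of a load intrinsic: only the destination
 * bit size matters for this illustration. */
struct toy_load {
   unsigned dest_bit_size;
};

/* Hypothetical stand-in for an optimization pass that, like the case
 * described above, ignores loads whose destination is a 1-bit boolean. */
static bool toy_pass_accepts(const struct toy_load *load)
{
   assert(load->dest_bit_size == 1 || load->dest_bit_size == 32);
   return load->dest_bit_size != 1;
}

/* After the change: the load keeps its 32-bit destination, so the pass
 * accepts it, and a b2b1 yields the 1-bit boolean for later use. */
static bool toy_read_bool_uniform(uint32_t stored)
{
   struct toy_load load = { .dest_bit_size = 32 };
   assert(toy_pass_accepts(&load));
   return b2b1(stored);
}
```

In this model, the loaded value's meaning is unchanged. Only the bit size that a pass sees on the load itself differs.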

Shader-db results on ICL with iris:

    total instructions in shared programs: 16076707 -> 16075246 (<.01%)
    instructions in affected programs: 129034 -> 127573 (-1.13%)
    helped: 487
    HURT: 0
    helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
    helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36%
    95% mean confidence interval for instructions value: -3.00 -3.00
    95% mean confidence interval for instructions %-change: -1.37% -1.29%
    Instructions are helped.
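The instruction numbers are internally consistent, and a few lines of C can confirm it. All constants are copied from the report above. The variable and function names are illustrative only.

```c
#include <assert.h>

/* Constants copied from the shader-db instruction report above. */
static const long instr_total_before = 16076707;
static const long instr_total_after = 16075246;
static const long instr_affected_before = 129034;
static const long instr_affected_after = 127573;
static const long instr_helped = 487;
static const long instr_saved_each = 3; /* helped min == max == 3 */

/* Relative change, in percent, across the affected programs. */
static double instr_affected_pct(void)
{
   return 100.0 * (double)(instr_affected_after - instr_affected_before) /
          (double)instr_affected_before;
}

/* Checks that (a) the whole-suite delta equals the affected-program
 * delta and (b) it equals helped programs * 3 saved instructions. */
static void instr_check(void)
{
   long delta = instr_affected_after - instr_affected_before;
   assert(instr_total_after - instr_total_before == delta);
   assert(delta == -instr_helped * instr_saved_each);
}
```

Both deltas come to -1461 instructions (487 × 3), which is about -1.13% of the affected programs.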

    total cycles in shared programs: 338015639 -> 337983311 (<.01%)
    cycles in affected programs: 971986 -> 939658 (-3.33%)
    helped: 362
    HURT: 110
    helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43
    helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60%
    HURT stats (abs)   min: 1 max: 554 x̄: 26.55 x̃: 18
    HURT stats (rel)   min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96%
    95% mean confidence interval for cycles value: -79.97 -57.01
    95% mean confidence interval for cycles %-change: -4.60% -3.47%
    Cycles are helped.
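The cycle numbers can be reconciled from the per-program means. The net change should match (hurt programs × mean cost) minus (helped programs × mean gain), up to the rounding of the printed means. This is a sketch with illustrative names. Only the constants come from the report.

```c
#include <assert.h>

/* Constants copied from the shader-db cycle report above. */
static const long cyc_affected_before = 971986;
static const long cyc_affected_after = 939658;
static const long cyc_helped = 362, cyc_hurt = 110;
static const double cyc_helped_mean = 97.37; /* mean cycles saved */
static const double cyc_hurt_mean = 26.55;   /* mean cycles added */

/* Net change estimated from the per-program means. */
static double cyc_estimated_delta(void)
{
   return cyc_hurt * cyc_hurt_mean - cyc_helped * cyc_helped_mean;
}

/* The means are printed to two decimals, so each may be off by up to
 * 0.005; the estimate can then be off by 0.005 per affected program. */
static void cyc_check(void)
{
   double reported = (double)(cyc_affected_after - cyc_affected_before);
   double err = cyc_estimated_delta() - reported;
   double tolerance = 0.005 * (double)(cyc_helped + cyc_hurt);
   if (err < 0.0)
      err = -err;
   assert(err <= tolerance);
}
```

With the reported figures, the estimate is about -32327.4 cycles against a measured -32328. That is well inside the rounding tolerance of about 2.4 cycles. The whole-suite delta (338015639 -> 337983311) is the same -32328.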

    total sends in shared programs: 815037 -> 814550 (-0.06%)
    sends in affected programs: 5701 -> 5214 (-8.54%)
    helped: 487
    HURT: 0

    LOST:   2
    GAINED: 0
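Because no program was hurt, the send numbers pin down the per-program effect exactly. Each of the 487 helped programs must have lost at least one send, and the total reduction is also 487, so each lost exactly one. Here is a short C check of that pigeonhole argument, with illustrative names and constants copied from the report:

```c
#include <assert.h>
#include <stdbool.h>

/* Constants copied from the shader-db send report above. */
static const long sends_total_before = 815037, sends_total_after = 814550;
static const long sends_affected_before = 5701, sends_affected_after = 5214;
static const long sends_helped = 487, sends_hurt = 0;

/* With no hurt programs, every helped program drops at least one send,
 * so the reduction is at least sends_helped. Equality means each helped
 * program dropped exactly one send. */
static bool sends_exactly_one_each(void)
{
   long saved = sends_affected_before - sends_affected_after;
   assert(sends_hurt == 0);
   assert(saved >= sends_helped);
   assert(sends_total_before - sends_total_after == saved);
   return saved == sends_helped;
}
```

The instruction report shows the same helped count, 487, and every helped program there saved exactly three instructions.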

The two lost programs were SIMD16 shaders in CS:GO.  However, CS:GO was
also one of the most helped titles: the change shaves sends off 134 of
its programs.  This seems to reduce GPU core clocks by about 4% on the
first 1000 frames of the PTS benchmark.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
2020-03-30 15:46:19 +00:00
amd aco: Implement b2b32 and b2b1 2020-03-30 15:46:19 +00:00
broadcom meson: inline inc_common 2020-03-28 21:36:54 +01:00
compiler nir: Insert b2b1s around booleans in nir_lower_to 2020-03-30 15:46:19 +00:00
drm-shim meson: inline inc_common 2020-03-28 21:36:54 +01:00
egl scons: Prune out unnecessary targets. 2020-03-30 13:38:01 +00:00
etnaviv meson: inline inc_common 2020-03-28 21:36:54 +01:00
freedreno meson: inline inc_common 2020-03-28 21:36:54 +01:00
gallium etnaviv: compiled_framebuffer_state: get rid of SE_SCISSOR_* 2020-03-30 15:30:15 +00:00
gbm
getopt
glx scons: Prune out unnecessary targets. 2020-03-30 13:38:01 +00:00
gtest
hgl scons: Prune out unnecessary targets. 2020-03-30 13:38:01 +00:00
imgui
intel intel/nir: Run copy-prop and DCE after lower_bool_to_int32 2020-03-30 15:46:19 +00:00
loader
mapi scons: Prune out unnecessary targets. 2020-03-30 13:38:01 +00:00
mesa scons: Prune out unnecessary targets. 2020-03-30 13:38:01 +00:00
panfrost meson: inline inc_common 2020-03-28 21:36:54 +01:00
util meson: inline inc_common 2020-03-28 21:36:54 +01:00
vulkan vulkan: drop unused include directories 2020-03-28 21:36:54 +01:00
meson.build meson: inline inc_common 2020-03-28 21:36:54 +01:00
SConscript scons: Prune out unnecessary targets. 2020-03-30 13:38:01 +00:00