On GFX6-7, the 8 and 16-bit integer add reductions use the 32-bit v_add
instruction, which clobbers the VCC register.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10346>
This is used when printing the program and to avoid updating register
demand during post-RA liveness analysis.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
p_extract_vector copy-propagation can create byte sels for v2b operands.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
It isn't intended to be accurate after RA, so num_waves can become zero,
breaking the sgpr_limit calculation.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10315>
We can just check whether tex_instr is NULL instead.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10036>
Make sure to preserve signed zeroes.
Fixes dEQP-VK.spirv_assembly.instruction.compute.opquantize.flush_to_zero
on GFX6 (Pitcairn). Untested on GFX7.
Fixes: 54a09545ec ("aco: optimize a*0.0")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10319>
It doesn't support Navi1x and the removal enables this nice code cleanup.
v2: rebase - mareko
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (v1)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10199>
clang generates a warning if there's no explicit break or fall-through
annotation. The latter would be kind of silly in this case, and not
robust against any future changes turning the fall-through invalid.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
One exception is src/amd/addrlib/, for which -Wimplicit-fallthrough is
explicitly disabled.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10220>
Otherwise, handle_loop_phis() might pass it to handle_live_in() and then
we could have two phis for this variable.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 7c64623e94 ("aco/ra: refactor SSA repairing during register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10236>
Late export is theoretically better if used with LATE_ALLOC,
but in practice, the early export has an advantage of
lower register usage, therefore more concurrent waves.
The idea of this commit is that "small" shaders benefit from early
primitive export more, due to being able to launch much more waves.
Let's consider a NIR shader "small" when it has only 1 block.
This yields both better performance, and better stats, than always
using late export.
Fossil DB on Sienna:
Totals from 12807 (8.76% of 146265) affected shaders:
VGPRs: 609128 -> 620216 (+1.82%); split: -0.01%, +1.83%
SpillSGPRs: 1458 -> 1538 (+5.49%)
CodeSize: 37028204 -> 37019320 (-0.02%); split: -0.17%, +0.14%
MaxWaves: 282902 -> 278516 (-1.55%)
Instrs: 7163142 -> 7162925 (-0.00%); split: -0.18%, +0.18%
VClause: 169285 -> 169547 (+0.15%); split: -1.15%, +1.30%
SClause: 267373 -> 267151 (-0.08%); split: -0.24%, +0.16%
Copies: 446442 -> 444567 (-0.42%); split: -2.68%, +2.26%
Branches: 156245 -> 156195 (-0.03%); split: -0.30%, +0.26%
PreSGPRs: 434701 -> 447396 (+2.92%)
PreVGPRs: 527783 -> 540527 (+2.41%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
The user-set priority of shaders matters very little, but we hope
this might still help speed up VS input loads especially.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
We learned that the gs_alloc_req is not actually when the export
space allocation happens. So it makes no sense to prioritize it.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10106>
Previously, every wave had multiple active lanes read the LDS, and
the data was processed by VALU DPP instructions.
Now, only the first lane reads the LDS in order to avoid bank
conflicts, and the results are processed by SALU.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10155>
The previous pass was based on misconceptions and
rounded up with bug fixes. The new pass is entirely
rewritten and basically just one-to-one from the paper:
"Revisiting Out-of-SSA Translation for Correctness, CodeQuality, and Efficiency"
by B. Boissinot et al.
It also incorporates the value-equality testing.
The regressions are mainly due to creating parallelcopies for
exec phis at loop headers (mitigated in the next commit).
Totals from 4933 (3.61% of 136546) affected shaders (Raven):
SpillSGPRs: 16249 -> 16527 (+1.71%); split: -0.28%, +1.99%
SpillVGPRs: 1771 -> 1595 (-9.94%)
CodeSize: 57544436 -> 58280304 (+1.28%); split: -0.00%, +1.28%
Scratch: 176128 -> 179200 (+1.74%)
Instrs: 11265783 -> 11445884 (+1.60%); split: -0.00%, +1.60%
Latency: 552596156 -> 555880540 (+0.59%); split: -0.53%, +1.13%
InvThroughput: 271431862 -> 273097423 (+0.61%); split: -0.18%, +0.79%
VClause: 160240 -> 160241 (+0.00%); split: -0.02%, +0.02%
SClause: 386863 -> 386685 (-0.05%); split: -0.07%, +0.02%
Copies: 1180801 -> 1345633 (+13.96%); split: -0.02%, +13.98%
Branches: 379129 -> 393052 (+3.67%); split: -0.01%, +3.69%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>
To decide which variables to spill, we must use the distances at the
beginning of the loop-header, and not the distances at the end of the
loop-preheader. The difference are that the former includes phis which are
viable to be spilled as opposed to the phi operands which would be reloaded
by add_coupling_code(), ending up in potentially too high register pressure
before the loop.
Totals from 206 (0.15% of 136546) affected shaders (Raven):
SpillSGPRs: 5154 -> 5000 (-2.99%)
CodeSize: 3654072 -> 3647184 (-0.19%); split: -0.19%, +0.00%
Instrs: 701482 -> 700526 (-0.14%); split: -0.14%, +0.00%
Latency: 40988780 -> 40872506 (-0.28%); split: -0.29%, +0.00%
InvThroughput: 20364560 -> 20306006 (-0.29%)
SClause: 20192 -> 20198 (+0.03%)
Copies: 77732 -> 77688 (-0.06%); split: -0.08%, +0.03%
Branches: 24204 -> 24050 (-0.64%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>
This also fixes some hypothetical issue for loops without phis
and for loops with higher register pressure at the end of the
loop preheader.
No fossil-db changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>
This handles the case of
exec = p_linear_phi %a, %b
where %a or %b might have been spilled.
By directly reloading these variables into the exec mask register,
we can avoid additional CSSA parallelcopies.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9196>
This shouldn't sign-extend.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10081>
This allows to force the VRS rates via RADV_FORCE_VRS, the supported
values are 2x2, 1x2 and 2x1. This supports the primitive shading rate
mode for non GUI elements.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7794>
The upper 32 bits are truncated before converting, which still produces
correct results since they never meaningfully contribute to the result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
The previous code produced incorrect results for inputs outside the
range [INT16_MIN, INT16_MAX].
A problematic case is e.g. i2f16 32768, which previously would be
converted to -32768.0 instead of returning the exactly representable
floating point result.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
This function has evolved to be a generic helper function used throughout
the file, so having those assumptions written down explicitly and document
unsupported edge cases should help prevent incorrect use.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
This doesn't change semantics but allows us to reject this potentially
ambiguous configuration in convert_int in a later change.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>
Previously, inputs such as 0x100000000 would have their upper 32-bits
ignored despite being representable by 32-bit floats.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9597>