Commit graph

365 commits

Author SHA1 Message Date
Eric Anholt
eea6f21cbd freedreno: Fix invalid read when a block has no instructions.
We can't deref list_(first/last)_entries unless we know we have at least
one.  Instead, just use our IP we've been tracking as we go to set up the
start ip, and fill in the end IP as we walk instructions.

Fixes a complaint in valgrind on
dEQP-GLES3.functional.transform_feedback.* which sometimes has an
empty main (non-END) block when the VS inputs are just directly mapped
to outputs without any ALU ops.

Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-09-16 22:02:43 +00:00
Rob Clark
b4df115d3f freedreno/a6xx: pre-calculate userconst stateobj size
The AnTuTu "garden" benchmark overflows the fixed size constbuffer
stateobject, so lets be more clever and calculate (a potentially
slightly pessimistic) actual size.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-09-12 18:07:20 -07:00
Kristian H. Kristensen
30ab3e39fd freedreno/a6xx: Implement primitive count queries on GPU
The driver can't determine PIPE_QUERY_PRIMITIVES_GENERATED or
PIPE_QUERY_PRIMITIVES_EMITTED once we support geometry or
tessellation, since these stages add primitives at runtime.  Use the
WRITE_PRIMITIVE_COUNTS event to write back the primitive counts and
implement a hw query for this.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-09-06 09:53:28 -07:00
Jonathan Marek
feea5986a9 freedreno/a2xx: formats update
For render formats, update fd2_pipe2color to only work with HW supported
render formats, and remove the format whitelist is_format_supported. This
patch enables float render formats (which work).

For vertex/texture formats, use a generic function which translates using
the bitsize of the channels. Since we fake support for some vertex formats,
check for these in is_format_supported to avoid enabling them as sampler
formats.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-09-06 02:24:29 +00:00
Jonathan Marek
88ca73bcd0 freedreno/a2xx: implement polygon offset
Fixes failures in the following deqp tests:
dEQP-GLES2.functional.polygon_offset.*

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-06 02:24:29 +00:00
Vasily Khoruzhick
9367d2ca37 nir: allow specifying filter callback in lower_alu_to_scalar
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-06 01:51:28 +00:00
Rob Clark
9baa72b7fc freedreno/ir3: allow copy propagation for relative
This appears to work fine (with the additional constraint of keeping the
indirect load in the same block that a0.x was loaded).

We can probably lift this restriction on earlier gens after testing.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-06 00:13:44 +00:00
Rob Clark
d9ad6f54dc freedreno/ir3: fix cp cmps.s opt
Need to use ir3_instr_set_address(), otherwise the instruction might not
get added to the indirects table.  This becomes a problem when we turn
on copy propagation for relative accesses, as check_instr() in the sched
pass won't realize there is an indirect consumer of address register
load that is ready to be scheduled.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-06 00:13:44 +00:00
Rob Clark
e59bfc820b freedreno/ir3: assert that only single address
An instruction can reference only a single address register value.
Add an assert to catch bugs.

Also, address value should also be local to the same block as the
instruction.

(The one spot where changing the instruction address is actually legit
needs to clear the address first.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-06 00:13:44 +00:00
Rob Clark
f94f22e87a freedreno/ir3: fix mad copy propagation special case
After the next patch enabling copy propagation for relative sources,
we'll need to dereference the n'th src in valid_flags(), so we actually
need to swap the sources before calling valid_flags().

But the logic was already a bit cumbersome, so move it into a helper
function.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-06 00:13:44 +00:00
Rob Clark
1fd6a91d4a freedreno/ir3: fix addr/pred spilling
The live_values and use_count was not being properly updated.  This
starts triggering problems with the next patch, where we allow copy
propagation for RELATIV access.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-06 00:13:44 +00:00
Rob Clark
50a91fbf87 freedreno/ir3: cleanup "partially const" ubo srcs
Move the constant part of the indirect offset into nir intrinsic base.
When we have multiple indirect accesses with different constant offsets,
this lets other opt passes clean up things to use a single address
register value.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-06 00:13:44 +00:00
Eric Engestrom
c4969b0a25 freedreno/drm-shim: fix mem leak
Fixes: 494ecef6b4 ("freedreno: Add support for drm-shim.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-04 00:18:37 +01:00
Rob Clark
1ef459297c freedreno/ir3: use uniform base
When lowering from ubo, use the constant base field in the load_uniform
instruction for the constant part of the offset.  Doesn't change much
for constant indexing, but this will help for indirect indexing because
constant-folding can't completely clean up the result.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-03 14:10:57 -07:00
Rob Clark
305bcdf992 freedreno/drm: fix 64b iova shifts
Should shift before splitting 64b iova into dwords

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-09-03 14:10:57 -07:00
Alyssa Rosenzweig
3f9dc97124 freedreno/ir3: Link directly to Sethi-Ullman paper
Allow a direct link to the PDF itself from the authors themselves,
rather than a paywall splash page.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-08-30 15:50:22 -07:00
Rob Clark
6167a63839 freedreno/ir3: do better job of marking convergence points
Fixes:
dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic_vertex
dEQP-GLES3.functional.shaders.switch.switch_in_do_while_loop_dynamic_fragment

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-28 15:25:27 -07:00
Rob Clark
6af70aa2b4 freedreno/ir3: maintain predecessors/successors
While resolving jumps to skip intermediate jumps from the structured
CFG, maintain the successors and predecessors correctly.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-28 15:25:25 -07:00
Rob Clark
06bc4875ff freedreno/ir3: convert block->predecessors to set
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-28 15:25:19 -07:00
Jason Ekstrand
951cf94521 nir: Add explicit signs to image min/max intrinsics
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode.  Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle.  In SPIR-V, signed min/max are separate
opcodes from unsigned.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 17:19:55 +00:00
Rob Clark
882d53d8e3 freedreno/ir3+a6xx: same VBO state for draw/binning
Worth ~+20% on gl_driver2

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
4a188e4215 freedreno/ir3: track # of driver params
To avoid emitting unneeded const state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:11:26 -07:00
Rob Clark
5722149bf1 freedreno/ir3: drop unneeded ir3_ra() args
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-13 08:08:07 -07:00
Rhys Perry
da8ed68aca nir: replace nir_move_load_const() with nir_opt_sink()
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).

v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Caio Marcelo de Oliveira Filho
5ed4e31c08 spirv: Drop lower_workgroup_access_to_offsets
Intel drivers are not using this anymore, and turnip still don't have
Compute Shaders, so won't make a difference to stop using this option.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Rob Clark <robdclark@chromium.org>
2019-08-10 22:15:35 -07:00
John Stultz
fcfa2d1447 mesa: freedreno: Android.registers.mk: Fix up register xml.h file generation
The current Androdi.registers.mk file causes build failures that
look like:
 FAILED:
 external/mesa3d/src/freedreno/Android.registers.mk:49: error: implicit rules are obsolete: out/target/product/linaro_db845c/gen/STATIC_LIBRARIES/libfreedreno_registers_intermediates/registers/%.xml.h

Caused by the following Android build rule change:
https://android.googlesource.com/platform/build/+/HEAD/Changes.md#implicit_rules

I tried to replace this with something similar to the static
pattern suggested in the URL above, but ended up getting all the
xml.h files generated using only the first a2xx.xml source file.

So I've fallen back to explicitly defining the make rules for
each.

Additionally, we needed to provide the proper
LOCAL_EXPORT_C_INCLUDE_DIRS and add the defined static library
to the components that depend on the register headers.

Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-08-07 02:18:38 +00:00
John Stultz
96baf052b2 mesa: Add ir3/ir3_nir_imul.c generation to Android.mk
With current master we're seeing build failures with AOSP:
  error: undefined symbol: ir3_nir_lower_imul

This is due to the ir3_nir_imul.c file not being generated
in the Android.mk files.

This patch simply adds it to the Android build, after which
thigns build and book ok on db410c.

Cc: Rob Clark <robdclark@chromium.org>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Alistair Strachan <astrachan@google.com>
Cc: Greg Hartman <ghartman@google.com>
Cc: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2019-08-07 02:18:19 +00:00
Eric Engestrom
d2d85b950d meson: replace libmesa_util with idep_mesautil
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.

Next commit will remove the ones that were only added for that reason.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Rob Clark
44f3c1cf01 freedreno: update registers
Pull in some updates of VSC regs

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-02 10:24:14 -07:00
Rob Clark
e2bb3e84ab freedreno/drm: convert ring_pool to child_pool
Worth another couple percent at driver2

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-02 10:24:14 -07:00
Rob Clark
9ac23794c9 freedreno/drm: remove idx_lock
Since it ends up contended, it is a bit of a bottleneck for workloads
with high driver overhead.  Worth nearly +10% at gfxbench driver2.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-08-02 10:24:14 -07:00
Jonathan Marek
d8584c5cf2 freedreno: a2xx: implement texture tiling
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-08-02 15:58:22 +00:00
Rob Clark
73cc2dc084 freedreno/ir3: fix for array/reg store vs meta instructions
fishgl.com has a shader which does roughly:

   foo = texture(...);
   if (bar)
      foo = texture(...);

after lowering phi webs to regs we end up w/ a vec4 reg (array).  But
since it was not an indirect access, we try to skip the extra mov.  This
results that the per-component fanout (split) meta instructions store
directly to the reg (array).  Which doesn't work out in RA.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-07-29 15:15:31 -07:00
Eric Anholt
91986fbbdb freedreno: Fix data race on making the shader's id.
The value is only used for IR3_DBG_DISASM, but it cleans up the
helgrind output.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:50:49 -07:00
Eric Anholt
6f0521b78c freedreno: Take a lock around shader variant creation.
Shaders are shared across contexts in gallium (part of making it so
that you get shader compile at link time, for shader-db and to reduce
compiles at draw time).  So, we need to protect from variant creation
for a shader from multiple threads at the same time.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:50:49 -07:00
Eric Anholt
6e3b220ad3 freedreno: Fix data races with allocating/freeing struct ir3.
There is a single ir3_compiler in the screen, and each context may be
compiling ir3 shaders, which call ir3_create.  ralloc doesn't do any
locking on its own, so eventually you can end up racing to break
ralloc's linked lists.

We really don't want struct ir3 to live as long as the compiler (maybe
struct ir3_shader's lifetime, if anything), so you'd better be freeing
it anyway.

Fixes: 8fe2076243 ("freedreno/ir3: convert over to ralloc")
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-29 12:50:49 -07:00
Eric Engestrom
d2de5b6ba2 anv+tu+radv: delete unusable dev_icd.json
As per previous commit, Meson doesn't support using uninstalled libs,
they're simply not ready until `ninja install` is ran, so delete them.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> # for anv
Reviewed-by: Eric Anholt <eric@anholt.net> # for tu
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # for radv
2019-07-26 14:47:53 +00:00
Eric Anholt
494ecef6b4 freedreno: Add support for drm-shim.
I'm using this for shader-db analysis on x86_64 systems.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-07-25 08:56:19 -07:00
Eric Anholt
0d8a4c67cf freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper.
Cuts a bunch of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
56f4ede73d freedreno: Convert load_barycentric_at_sample to the NIR lowering helper.
Cuts out a ton of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Eric Anholt
61098baf42 freedreno: Convert load_barycentric_at_offset to the NIR lowering helper.
Cuts out a ton of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-07-18 11:28:56 -07:00
Kristian H. Kristensen
e03259974e freedreno: Generate headers from xml files
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Acked-by: Rob Clark <robdclark@gmail.com>
2019-07-10 22:05:02 +00:00
Eric Engestrom
1abae9e54a tu: add exported symbols check
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2019-07-10 11:27:51 +00:00
Sagar Ghuge
456557a837 nir: Add lower_rotate flag and set to true in all drivers
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
Rob Clark
02893fe73a freedreno: update generated registers
Corrects the a3xx texconst state for TILE_MODE.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-07-01 06:15:52 -07:00
Rob Clark
9753d7381c freedreno/ir3: small cleanup
`target` cannot be NULL here.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-06-28 13:02:59 -07:00
Rob Clark
016a9ab2f9 freedreno/ir3: fix missing (ss) in dummy bary.f case
In case we need to insert a dummy bary.f for the (ei) flag, it also
needs (ss) so we don't release varying storage to the next VS wave
before the ldlv completed.  Fixes random failures in:

dEQP-GLES3.functional.transform_feedback.random.interleaved.lines.*

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-28 13:02:59 -07:00
Eric Anholt
5c4289dd4b freedreno: Only upload the used part of UBO0 to the constant buffer.
We were pessimistically uploading all of it in case of indirection,
but we can just bump that when we encounter indirection.

total constlen in shared programs: 2529623 -> 2485933 (-1.73%)

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-24 14:23:07 -07:00
Eric Anholt
852704976a freedreno: Stop treating UBO 0 specially in UBO uploading.
ir3_nir_analyze_ubo_ranges() has already told us how much of cb0 we
need to upload (all of it, since it will lower indirect UBO 0 accesses
from load_ubo back to indirection on the constant buffer).

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-24 14:23:07 -07:00
Daniel Schürmann
165b7f3a44 nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.
That is: the five least significant bits provide the values of
'bits' and 'offset' which is the case for all hardware currently
supported by NIR and using the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.

Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-24 18:42:20 +02:00