ReinUsesLisp
4e35177e23
shader_ir: Implement VOTE
...
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.
To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:
* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true
ballotARB, also known as "uint64_t(activeThreadsNV())", emits
VOTE.ANY Rd, PT, PT;
on nouveau's compiler. This doesn't match exactly to Nvidia's code
VOTE.ALL Rd, PT, PT;
Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00
bunnei
cedc1aab4a
Merge pull request #2753 from FernandoS27/float-convert
...
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
2019-08-21 10:27:57 -04:00
ReinUsesLisp
2ff8044806
shader_ir: Implement NOP
2019-08-04 03:02:55 -03:00
bunnei
52f54c728d
Merge pull request #2592 from FernandoS27/sync1
...
Implement GPU Synchronization Mechanisms & Correct NVFlinger
2019-07-26 14:26:44 -04:00
Fernando Sahmkow
a452ff983d
MaxwellDMA: Fixes, corrections and relaxations.
...
This commit fixes offsets on Linear -> Tiled copies, corrects z pos
fortiled->linear copies, corrects bytes_per_pixel calculation in tiled
-> linear copies and relaxes some limitations set by latest dma fixes
refactors.
2019-07-25 20:41:42 -04:00
bunnei
31e8a61527
Merge pull request #2743 from FernandoS27/surpress-assert
...
Downgrade and suppress a series of GPU asserts and debug messages.
2019-07-25 12:34:36 -04:00
bunnei
9be9600bdc
Merge pull request #2704 from FernandoS27/conditional
...
maxwell3d: Implement Conditional Rendering
2019-07-24 17:07:57 -04:00
bunnei
f601f25bcc
Merge pull request #2734 from ReinUsesLisp/compute-shaders
...
gl_rasterizer: Implement compute shaders
2019-07-22 11:12:55 -04:00
bunnei
27e10e0442
Merge pull request #2735 from FernandoS27/pipeline-rework
...
Rework Dirty Flags in GPU Pipeline, Optimize CBData and Redo Clearing mechanism
2019-07-21 00:59:52 -04:00
Fernando Sahmkow
11f4e739bd
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
...
This commit takes care of implementing the F16 Variants of the
conversion instructions and makes sure conversions are done.
2019-07-20 17:38:25 -04:00
Fernando Sahmkow
7a35178ee2
Maxwell3D: Reorganize and address feedback
2019-07-20 10:18:35 -04:00
ReinUsesLisp
6c4985edc9
shader/half_set_predicate: Implement missing HSETP2 variants
2019-07-19 22:20:47 -03:00
Fernando Sahmkow
3a3fee5abf
MaxwellDMA/KeplerCopy: Downgrade DMA log message to Trace.
...
This log was just to know which games used DMA. It's no longer
important.
2019-07-18 08:31:38 -04:00
Fernando Sahmkow
4be61013a1
GL_State: Feedback and fixes
2019-07-17 17:29:56 -04:00
Fernando Sahmkow
5ad889f6fd
Maxwell3D: Address Feedback
2019-07-17 17:29:55 -04:00
Fernando Sahmkow
8cdbfe69b1
GL_Rasterizer: Corrections to Clearing.
2019-07-17 17:29:54 -04:00
Fernando Sahmkow
0ff4a5fa39
Maxwell3D: Correct marking dirtiness on CB upload
2019-07-17 17:29:53 -04:00
Fernando Sahmkow
fec32fed18
GL_Rasterizer: Rework RenderTarget/DepthBuffer clearing
2019-07-17 17:29:52 -04:00
Fernando Sahmkow
a081dea8ab
Maxwell3D: Implement State Dirty Flags.
2019-07-17 17:29:51 -04:00
Fernando Sahmkow
0d3db58657
Maxwell3D: Rework CBData Upload
2019-07-17 17:29:50 -04:00
Fernando Sahmkow
f2e7b29c14
Maxwell3D: Rework the dirty system to be more consistant and scaleable
2019-07-17 17:29:49 -04:00
Fernando Sahmkow
e42bcf2314
maxwell3d: Implement Conditional Rendering
...
Conditional Rendering takes care of conditionaly clearing or drawing
depending on a set of queries. This PR implements the query checks to
stablish if things can be rendered or not.
2019-07-17 17:13:19 -04:00
ReinUsesLisp
725ba6cf63
gl_rasterizer: Implement compute shaders
2019-07-15 17:38:25 -03:00
Fernando Sahmkow
1bdb59fc6e
Merge pull request #2695 from ReinUsesLisp/layer-viewport
...
gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
2019-07-15 16:28:07 -04:00
bunnei
3477b92289
Merge pull request #2675 from ReinUsesLisp/opengl-buffer-cache
...
buffer_cache: Implement a generic buffer cache and its OpenGL backend
2019-07-14 19:03:43 -04:00
Fernando Sahmkow
0ec9da2f9f
Merge pull request #2692 from ReinUsesLisp/tlds-f16
...
shader/texture: Add F16 support for TLDS
2019-07-14 08:44:38 -04:00
Fernando Sahmkow
8a6fc529a9
shader_ir: Implement BRX & BRA.CC
2019-07-09 08:14:37 -04:00
ReinUsesLisp
c9d886c84e
gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
...
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.
In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
2019-07-07 20:42:55 -03:00
ReinUsesLisp
d0966b9f7c
shader/texture: Add F16 support for TLDS
2019-07-07 16:05:56 -03:00
ReinUsesLisp
7ecf64257a
gl_rasterizer: Minor style changes
2019-07-06 00:37:55 -03:00
Fernando Sahmkow
82b829625b
video_core: Implement GPU side Syncpoints
2019-07-05 15:49:11 -04:00
ReinUsesLisp
4d63f97945
shader_bytecode: Include missing <array>
2019-06-24 01:51:02 -03:00
Fernando Sahmkow
082740d34d
surface: Correct format S8Z24
2019-06-20 21:38:34 -03:00
Fernando Sahmkow
7232a1ed16
decoders: correct block calculation
2019-06-20 21:38:34 -03:00
Fernando Sahmkow
cb728797b0
fermi2d: Correct Origin Mode
2019-06-20 21:38:34 -03:00
Fernando Sahmkow
175aa343ff
texture_cache: Fermi2D reform and implement View Mirage
...
This also does some fixes on compressed textures reinterpret and on the
Fermi2D engine in general.
2019-06-20 21:38:33 -03:00
ReinUsesLisp
06c4ce8645
shader: Decode SUST and implement backing image functionality
2019-06-20 21:38:33 -03:00
ReinUsesLisp
b8c75a845b
maxwell_3d: Partially implement texture buffers as 1D textures
2019-06-20 21:36:12 -03:00
ReinUsesLisp
4e81fc8296
shader: Implement texture buffers
2019-06-20 21:36:12 -03:00
Fernando Sahmkow
d267948a73
texture_cache: loose TryReconstructSurface when accurate GPU is not on.
...
Also corrects some asserts.
2019-06-20 21:36:12 -03:00
Fernando Sahmkow
6bd034eae9
engine_upload: Addapt to new Texture Cache
2019-06-20 21:36:12 -03:00
ReinUsesLisp
345e73f2fe
video_core: Use un-shifted block sizes to avoid integer divisions
...
Instead of storing all block width, height and depths in their shifted
form:
block_width = 1U << block_shift;
Store them like they are provided by the emulated hardware (their
block_shift form). This way we can avoid doing the costly
Common::AlignUp operation to align texture sizes and drop CPU integer
divisions with bitwise logic (defined in Common::AlignBits).
2019-06-20 21:36:12 -03:00
bunnei
c7b5c245e1
Merge pull request #2562 from ReinUsesLisp/split-cbuf-upload
...
video_core/engines: Move ConstBufferInfo out of Maxwell3D
2019-06-17 22:35:04 -04:00
ReinUsesLisp
528c15051c
kepler_compute: Use std::array for cbuf info
2019-06-07 20:36:22 -03:00
ReinUsesLisp
17d5fb6d06
kepler_compute: Fix block_dim_x encoding
2019-06-07 20:35:46 -03:00
ReinUsesLisp
2f2a61887a
video_core/engines: Move ConstBufferInfo out of Maxwell3D
2019-06-07 19:47:15 -03:00
Fernando Sahmkow
a32c52b1d8
shader_bytecode: Mark EXIT as flow instruction
2019-06-04 12:18:35 -04:00
ReinUsesLisp
75e7b45d69
shader/memory: Implement ST (generic memory)
2019-05-20 22:41:53 -03:00
ReinUsesLisp
f78ef617b6
shader/memory: Implement LD (generic memory)
2019-05-20 22:38:59 -03:00
bunnei
d49efbfb4a
Merge pull request #2441 from ReinUsesLisp/al2p
...
shader: Implement AL2P and ALD.PHYS
2019-05-19 14:02:58 -04:00
Hexagon12
b54bd3f018
Merge pull request #2472 from FernandoS27/tic
...
maxwell_3d: reduce severity of different component formats assert.
2019-05-19 15:04:47 +01:00
Hexagon12
3bd5f01240
Merge pull request #2469 from lioncash/copyable
...
video_core/engines/maxwell_3d: Add is_trivially_copyable_v check for Regs
2019-05-19 15:02:17 +01:00
Sebastian Valle
a6ed792ac4
Merge pull request #2470 from lioncash/ranged-for
...
video_core/engines/maxwell_3d: Simplify for loops into ranged for loops within InitializeRegisterDefaults()
2019-05-19 09:01:19 -05:00
Fernando Sahmkow
fc975e9021
maxwell_3d: reduce sevirity of different component formats assert.
...
This was reduced due to happening on most games and at such constant
rate that it affected performance heavily for the end user. In general,
we are well aware of the assert and an implementation is already
planned.
2019-05-14 17:12:54 -04:00
Lioncash
b01cce716e
video_core/engines/engine_upload: Amend constructor initializer list order
...
Silences a -Wreorder warning.
2019-05-14 13:43:28 -04:00
Lioncash
9b6d993e52
video_core/engines/engine_upload: Default destructor in the cpp file
...
Avoids inlining destruction logic where applicable, and also makes
forward declarations not cause unexpected compilation errors depending
on where the State class is used.
2019-05-14 13:41:41 -04:00
Lioncash
ec1c69258a
video_core/engines/engine_upload: Remove unnecessary const on parameters in function declarations
...
These only apply in the definition of the function. They can be omitted
from the declaration.
2019-05-14 13:40:09 -04:00
Lioncash
0f83c8dffa
video_core/engines/engine_upload: Remove unnecessary includes
2019-05-14 13:39:04 -04:00
Lioncash
5db1b54b58
video_core/engines/maxwell3d: Get rid of three magic values in CallMethod()
...
We can use the named constant instead of using 32 directly.
2019-05-14 09:02:47 -04:00
Lioncash
48ce5880a0
video_core/engines/maxwell_3d: Simplify for loops into ranged for loops within InitializeRegisterDefaults()
...
Lessens the amount of code that needs to be read, and gets rid of the
need to introduce an indexing variable. Instead, we just operate on the
objects directly.
2019-05-14 08:53:19 -04:00
Lioncash
c212fc9b2c
video_core/engines/maxwell_3d: Add is_trivially_copyable_v check for Regs
...
std::memset is used to clear the entire register structure, which
requires that the Regs struct be trivially copyable (otherwise undefined
behavior is invoked). This prevents the case where a non-trivial type is
potentially added to the struct.
2019-05-14 08:47:56 -04:00
bunnei
c27b81cb85
Merge pull request #2429 from FernandoS27/compute
...
Corrections and Implementation on GPU Engines
2019-05-09 13:19:22 -04:00
ReinUsesLisp
d4df803b2b
shader_ir/other: Implement IPA.IDX
2019-05-02 21:46:37 -03:00
ReinUsesLisp
71aa9d0877
shader_ir/memory: Implement physical input attributes
2019-05-02 21:46:25 -03:00
ReinUsesLisp
bd81a03d9d
gl_shader_decompiler: Declare all possible varyings on physical attribute usage
2019-05-02 21:46:25 -03:00
ReinUsesLisp
7632a7d6d2
shader_bytecode: Add AL2P decoding
2019-05-02 21:46:25 -03:00
Fernando Sahmkow
e64c41efe8
Refactors and name corrections.
2019-05-01 15:31:39 -04:00
bunnei
c52233ec8b
Merge pull request #2322 from ReinUsesLisp/wswitch
...
video_core: Silent -Wswitch warnings
2019-04-28 22:24:58 -04:00
Fernando Sahmkow
b3118ee316
Fixes and Corrections to DMA Engine
2019-04-23 15:28:18 -04:00
Fernando Sahmkow
f1e5314f1a
Add Swizzle Parameters to the DMA engine
2019-04-23 11:21:00 -04:00
Fernando Sahmkow
e140e2ebc6
Add Documentation Headers to all the GPU Engines
2019-04-23 08:44:52 -04:00
Fernando Sahmkow
021d28c9b8
Corrections and styling
2019-04-23 08:02:24 -04:00
Fernando Sahmkow
701ce1c9d0
Implement Maxwell3D Data Upload
2019-04-22 19:27:36 -04:00
Fernando Sahmkow
e4ff140b99
Introduce skeleton of the GPU Compute Engine.
2019-04-22 19:05:43 -04:00
Fernando Sahmkow
a91d3fc639
Revamp Kepler Memory to use a subegine to manage uploads
2019-04-22 18:50:56 -04:00
bunnei
68b707711a
Merge pull request #2411 from FernandoS27/unsafe-gpu
...
GPU Manager: Implement ReadBlockUnsafe and WriteBlockUnsafe
2019-04-22 17:09:00 -04:00
bunnei
01100f8afd
Merge pull request #2400 from FernandoS27/corret-kepler-mem
...
Implement Kepler Memory on both Linear and BlockLinear.
2019-04-22 16:47:05 -04:00
bunnei
da0c3bc658
Merge pull request #2407 from FernandoS27/f2f
...
Do some corrections in conversion shader instructions.
2019-04-20 00:42:34 -04:00
ReinUsesLisp
fbe8d1ceaa
video_core: Silent -Wswitch warnings
2019-04-18 15:54:39 -03:00
bunnei
5bd5140bde
Merge pull request #2348 from FernandoS27/guest-bindless
...
Implement Bindless Textures on Shader Decompiler and GL backend
2019-04-17 20:59:49 -04:00
bunnei
0cfbd3325b
Merge pull request #2315 from ReinUsesLisp/severity-decompiler
...
shader_ir/decode: Reduce the severity of common assertions
2019-04-16 22:21:19 -04:00
Fernando Sahmkow
ef381e6924
Use ReadBlockUnsafe on TIC and TSC reading
...
Use ReadBlockUnsafe on TIC and TSC reading as memory is never flushed
from host GPU there.
2019-04-15 23:10:24 -04:00
Fernando Sahmkow
3e96c367bd
Use WriteBlock and ReadBlock.
2019-04-15 22:42:34 -04:00
Fernando Sahmkow
bec28d692d
Implement Block Linear copies in Kepler Memory.
2019-04-15 21:22:16 -04:00
Fernando Sahmkow
aa471274d9
Do some corrections in conversion shader instructions.
...
Corrects encodings for I2F, F2F, I2I and F2I
Implements Immediate variants of all four conversion types.
Add assertions to unimplemented stuffs.
2019-04-15 19:16:27 -04:00
Fernando Sahmkow
8a099ac99f
Correct Kepler Memory on Linear Pushes.
2019-04-15 14:51:36 -04:00
ReinUsesLisp
5c280e6ff0
shader_ir: Implement STG, keep track of global memory usage and flush
2019-04-14 00:25:32 -03:00
bunnei
353a099481
Merge pull request #2366 from FernandoS27/xmad-fix
...
Correct XMAD mode, psl and high_b on different encodings.
2019-04-09 19:15:01 -04:00
Fernando Sahmkow
5c55ae4e18
Correct LOP_IMN encoding
2019-04-08 13:39:12 -04:00
Fernando Sahmkow
16adc735a5
Correct XMAD mode, psl and high_b on different encodings.
2019-04-08 13:01:17 -04:00
Fernando Sahmkow
492040bd9c
Move ConstBufferAccessor to Maxwell3d, correct mistakes and clang format.
2019-04-08 11:36:11 -04:00
Fernando Sahmkow
4841440382
Implement TXQ_B
2019-04-08 11:29:52 -04:00
Fernando Sahmkow
ac3ba9a33e
Corrections to TEX_B
2019-04-08 11:28:44 -04:00
Fernando Sahmkow
7af82ca022
Implement Bindless Handling on SetupTexture
2019-04-08 11:23:46 -04:00
Fernando Sahmkow
e28fd3d0a5
Implement Bindless Samplers and TEX_B in the IR.
2019-04-08 11:23:42 -04:00
ReinUsesLisp
ddcb711ee8
maxwell_3d: Reduce severity of ProcessSyncPoint
2019-04-06 02:18:20 -03:00
bunnei
864280fabc
Merge pull request #2317 from FernandoS27/sync
...
Implement SyncPoint Register in the GPU.
2019-04-05 23:50:54 -04:00
Fernando Sahmkow
fc91e21206
Implement SyncPoint Register in the GPU.
2019-04-05 19:19:30 -04:00
Lioncash
22f02076c6
video_core/engines: Make memory manager members private
...
These aren't used externally by anything, so they can be made private
data members.
2019-04-05 18:26:43 -04:00
Lioncash
26223f8124
video_core/engines: Remove unnecessary inclusions where applicable
...
Replaces header inclusions with forward declarations where applicable
and also removes unused headers within the cpp file. This reduces a few
more dependencies on core/memory.h
2019-04-05 18:26:32 -04:00