Add code required to use OpenGL assembly programs based on
NV_gpu_program5. Decompilation for ARB programs is intended to be added
in a follow up commit. This does **not** include ARB decompilation and
it's not in an usable state.
The intention behind assembly programs is to reduce shader stutter
significantly on drivers supporting NV_gpu_program5 (and other required
extensions). Currently only Nvidia's proprietary driver supports these
extensions.
Add a UI option hidden for now to avoid people enabling this option
accidentally.
This code path has some limitations that OpenGL compatibility doesn't
have:
- NV_shader_storage_buffer_object is limited to 16 entries for a single
OpenGL context state (I don't know if this is an intended limitation, an
specification issue or I am missing something). Currently causes issues
on The Legend of Zelda: Link's Awakening.
- NV_parameter_buffer_object can't bind buffers using an offset
different to zero. The used workaround is to copy to a temporary buffer
(this doesn't happen often so it's not an issue).
On the other hand, it has the following advantages:
- Shaders build a lot faster.
- We have control over how floating point rounding is done over
individual instructions (SPIR-V on Vulkan can't do this).
- Operations on shared memory can be unsigned and signed.
- Transform feedbacks are dynamic state (not yet implemented).
- Parameter buffers (uniform buffers) are per stage, matching NVN and
hardware's behavior.
- The API to bind and create assembly programs makes sense, unlike
ARB_separate_shader_objects.
"Not equal" operators on GLSL seem to behave as unordered when we expect
an ordered comparison.
Manually emulate this checking for LGE values (numbers, not-NaNs).
This should fix grass interactions on Breath of the Wild on Vulkan.
It is currently untested against validation layers.
Nvidia's Windows 443.09 beta driver or Linux 440.66.12 is required for
now.
In file included from src/video_core/renderer_opengl/renderer_opengl.cpp:25:
In file included from src/./video_core/renderer_opengl/gl_rasterizer.h:26:
In file included from src/./video_core/renderer_opengl/gl_fence_manager.h:11:
src/./video_core/fence_manager.h:91:32: error: use 'template' keyword
to treat 'Write' as a dependent template name
memory_manager.Write<u32>(current_fence->GetAddress(), current_fence->GetPayload());
^
template
src/./video_core/fence_manager.h:137:32: error: use 'template'
keyword to treat 'Write' as a dependent template name
memory_manager.Write<u32>(current_fence->GetAddress(), current_fence->GetPayload());
^
template
Reduces some header churn and reduces rebuilds when some header
internals change.
While we're at it we can also resolve a missing include in buffer_cache.
Xenoblade 2 invokes a draw call with zero vertices.
This is likely due to indirect drawing (glDrawArraysIndirect).
This causes a crash in the staging buffer pool when trying to create a
buffer with a size of zero. To workaround this, skip index buffer setup
entirely when the number of indices is zero.
Drop MemoryBarrier from the buffer cache and use Maxwell3D's register
WaitForIdle.
To implement this on OpenGL we just call glMemoryBarrier with the
necessary bits.
Vulkan lacks this synchronization primitive, so we set an event and
immediately wait for it. This is not a pretty solution, but it's what
Vulkan can do without submitting the current command buffer to the queue
(which ends up being more expensive on the CPU).
Using deko3d as reference:
4e47ba0013/source/maxwell/gpu_3d_state.cpp (L42)
We were using bits 3 and 4 to determine depth clamping, but these are
the same both enabled and disabled:
state->depthClampEnable ? 0x101A : 0x181D
The same happens on Nvidia's OpenGL driver, where they do something like
this (default capabilities, GL 4.5 compatibility):
(state & DEPTH_CLAMP) != 0 ? 0x201a : 0x281c
There's always a difference between the first bits in this register, but
bit 11 is consistently disabled on both deko3d/NVN and OpenGL. This
commit changes yuzu's behaviour to use bit 11 to determine depth
clamping.
- Fixes depth issues on Super Mario Odyssey's intro.
This reverts commit 94b0e2e5da.
preserve_contents proved to be a meaningful optimization. This commit
reintroduces it but properly implemented on OpenGL.
We have to make sure the clear removes all the previous contents of the
image.
It's not currently implemented on Vulkan because we can do smart things
there that's preferred to be introduced in a separate commit.
Deduplicate code shared between vk_pipeline_cache and gl_shader_cache as
well as shader decoder code.
While we are at it, fix a bug in gl_shader_cache where compute shaders
had an start offset of a stage shader.
Signed integer addition overflow might be undefined behavior. It's free
to change operations to UAdd and use unsigned integers to avoid
potential bugs.