Commit Graph

5813 Commits

Author SHA1 Message Date
Fernando S
3843995ceb Merge pull request #6919 from ameerj/vk-int8-capability
vulkan_device: Add a check for int8 support
2021-08-25 23:46:08 +02:00
Ameer J
de71a4d70d Merge pull request #6894 from FernandoS27/bunneis-left-ear
GPU_MemoryManger: Fix GetSubmappedRange.
2021-08-25 16:50:03 -04:00
ameerj
4d535799eb vulkan_device: Add a check for int8 support
Silences validation errors when shaders use int8 without specifying its support to the API
2021-08-24 21:22:41 -04:00
ameerj
e0397f00d0 vk_rasterizer: Only clear depth and stencil buffers when set in attachment aspect mask
Silences validation errors for clearing the depth/stencil buffers of framebuffer attachments that were not specified to have depth/stencil usage.
2021-08-21 02:37:15 -04:00
Ameer J
bde6b899a1 Merge pull request #6888 from v1993/patch-3
video_core: eliminate constant ternary
2021-08-21 00:16:18 -04:00
Fernando Sahmkow
ef2066b272 GPU_MemoryManger: Fix GetSubmappedRange. 2021-08-19 22:57:22 +02:00
Valeri
4fd655cb46 video_core: eliminate constant ternary
`via_header_index` is already checked above, so it would never be true in this branch
2021-08-19 21:22:05 +03:00
ameerj
b384129c63 h264: Lower max_num_ref_frames
GPU decoding seems to be more picky when it comes to the maximum number of reference frames.
2021-08-16 14:40:53 -04:00
ameerj
cd016d3cb5 configure_graphics: Add GPU nvdec decoding as an option
Some system configurations may see visual regressions or lower performance using GPU decoding compared to CPU decoding. This setting provides the option for users to specify their decoding preference.

Co-Authored-By: yzct12345 <87620833+yzct12345@users.noreply.github.com>
2021-08-16 14:40:53 -04:00
ameerj
a832aa699f codec: Improve libav memory alloc and cleanup 2021-08-16 14:40:53 -04:00
ameerj
bc3efb79cc codec: Fallback to CPU decoding if no compatible GPU format is found 2021-08-16 14:40:53 -04:00
lat9nq
92bc51b66a cmake: Add VDPAU and NVDEC support to FFmpeg
Adds {h264_,vp9_}{nvdec,vdpau} hwaccels.
2021-08-16 14:40:52 -04:00
ameerj
537c6ac8fe vk_blit_screen: Fix non-accelerated texture size calculation
Addresses the potential OOB access in UnswizzleTexture.
2021-08-16 14:28:10 -04:00
Merry
1770503185 xbyak: Update include path 2021-08-15 19:26:38 +01:00
bunnei
87d63b858a Merge pull request #6861 from yzct12345/const-mempy-is-all-the-speed
decoders: Optimize memcpy for the other functions
2021-08-15 02:38:12 -07:00
bunnei
0509fe3377 Merge pull request #6838 from ameerj/sws-align
vic: Specify sws_scale height stride.
2021-08-12 11:28:33 -07:00
ameerj
356e10898f codec: Replace deprecated av_init_packet usage 2021-08-12 01:28:01 -04:00
ameerj
659039ca6d nvdec: Implement GPU accelerated decoding for all platforms
Supplements the VAAPI intel gpu decoder by implementing the D3D11VA decoder for Windows, and CUVID/VDPAU for Nvidia and AMD on drivers linux respectively.
2021-08-12 01:28:01 -04:00
yzct12345
430255caf8 decoders: Templates allow memcpy optimizations 2021-08-12 04:45:25 +00:00
Fernando S
6a082df427 Merge pull request #6820 from yzct12345/split-cache
texture_cache: Split out template definitions
2021-08-10 12:23:05 +02:00
ameerj
a779cede7c vic: Specify sws_scale height stride.
Silences a sws_scale runtime warning about unaligned strides.
2021-08-09 23:24:16 -04:00
Mai M
2da91ec75b Merge pull request #6844 from ameerj/vp9-empty-frame
vp9: Ensure the first frame is complete
2021-08-08 19:02:39 -04:00
ameerj
fa22695705 vp9: Ensure the first frame is complete
Silences a runtime error due to the first frame missing the frame data, and being set to hidden despite being a key-frame.
2021-08-08 13:49:00 -04:00
yzct12345
c4eafcc861 texture_cache: Address ameerj's review 2021-08-08 11:02:51 +00:00
Fernando S
859deda3bb Merge pull request #6834 from K0bin/buffer-image-granularity
Respect Vulkan bufferImageGranularity
2021-08-08 11:57:40 +02:00
bunnei
bd0e1d3a25 Merge pull request #6830 from ameerj/nvdec-unimpld-codec
nvdec: Better logging for unimplemented codecs
2021-08-07 12:37:39 -07:00
Robin Kertels
bb29dcb7f2 vulkan_memory_allocator: Respect bufferImageGranularity 2021-08-07 15:28:05 +02:00
ameerj
928b64d2ce nvdec: Better logging for unimplemented codecs 2021-08-07 01:08:33 -04:00
bunnei
268b5764c7 Merge pull request #6791 from ameerj/astc-opt
astc_decoder: Various performance and memory optimizations
2021-08-06 21:45:24 -07:00
yzct12345
e80323b8b0 texture_cache: Address ameerj's review 2021-08-07 01:27:47 +00:00
bunnei
f183668a87 Merge pull request #6799 from ameerj/vp9-fixes
nvdec: Fix VP9 reference frame refreshes
2021-08-06 17:46:46 -07:00
ameerj
e3688f0627 vp9: Cleanup unused variables
With reference frames refreshes fix, we no longer need to buffer two frames in advance.
We can also remove other unused or otherwise unneeded variables.
2021-08-06 20:08:11 -04:00
ameerj
a3f80a97a3 vp9: Fix reference frame refreshes
This resolves the artifacting when decoding VP9 streams.
2021-08-06 20:08:08 -04:00
yzct12345
02e98f6c93 texture_cache: Don't change copyright year 2021-08-05 20:52:12 +00:00
yzct12345
5566f3dbc0 texture_cache: Address ameerj's review 2021-08-05 20:46:24 +00:00
yzct12345
f9563c8f24 texture_cache: Split templates out 2021-08-05 13:52:30 +00:00
yzct12345
2868d4ba84 nvdec: Implement VA-API hardware video acceleration (#6713)
* nvdec: VA-API

* Verify formatting

* Forgot a semicolon for Windows

* Clarify comment about AV_PIX_FMT_NV12

* Fix assert log spam from missing negation

* vic: Remove forgotten debug code

* Address lioncash's review

* Mention VA-API is Intel/AMD

* Address v1993's review

* Hopefully fix CMakeLists style this time

* vic: Improve cache locality

* vic: Fix off-by-one error

* codec: Async

* codec: Forgot the GetValue()

* nvdec: Address ameerj's review

* codec: Fallback to CPU without VA-API support

* cmake: Address lat9nq's review

* cmake: Make VA-API optional

* vaapi: Multiple GPU

* Apply suggestions from code review

Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>

* nvdec: Address ameerj's review

* codec: Use anonymous instead of static

* nvdec: Remove enum and fix memory leak

* nvdec: Address ameerj's review

* codec: Remove preparation for threading

Co-authored-by: Ameer J <52414509+ameerj@users.noreply.github.com>
2021-08-03 23:43:11 -04:00
yzct12345
f56d0db5bd decoders: Optimize swizzle copy performance (#6790)
This makes UnswizzleTexture up to two times faster. It is the main bottleneck in NVDEC video decoding.
2021-08-02 11:18:58 -04:00
Fernando S
30f0b7cf31 Merge pull request #6720 from ameerj/vk-screenshot
renderer_vulkan: Implement screenshots
2021-08-01 13:31:33 +02:00
Ameer J
db32c3762b Merge pull request #6765 from ReinUsesLisp/y-negate-vk
vk_rasterizer: Flip viewport on Y_NEGATE
2021-08-01 01:47:37 -04:00
ameerj
c439fc9be9 astc_decoder: Reduce workgroup size
This reduces the amount of over dispatching when there are odd dimensions (i.e. ASTC 8x5), which rarely evenly divide into 32x32.
2021-08-01 01:22:27 -04:00
ameerj
5ab8053511 astc_decoder: Compute offset swizzles in-shader
Alleviates the dependency on the swizzle table and a uniform which is constant for all ASTC texture sizes.
2021-08-01 01:22:26 -04:00
ameerj
b2862e4772 astc_decoder: Make use of uvec4 for payload data 2021-07-31 22:28:04 -04:00
ameerj
a75d70fa90 astc_decoder: Simplify Select2DPartition 2021-07-31 21:36:26 -04:00
ameerj
5665d05547 astc_decoder: Optimize the use EncodingData
This buffer was a list of EncodingData structures sorted by their bit length, with some duplication from the cpu decoder implementation.
We can take advantage of its sorted property to optimize its usage in the shader.

Thanks to wwylele for the optimization idea.
2021-07-31 21:36:26 -04:00
ameerj
15c0c213b1 astc.h: Move data to cpp implementation
Moves leftover values that are no longer used by the gpu decoder back to the cpp implementation.
2021-07-31 21:26:42 -04:00
bunnei
7530594602 Merge pull request #6759 from ReinUsesLisp/pipeline-statistics
renderer_vulkan: Add setting to log pipeline statistics
2021-07-30 11:18:52 -07:00
ReinUsesLisp
b185567a03 vk_rasterizer: Flip viewport on Y_NEGATE
Matches OpenGL's behavior. I don't believe this register flips geometry,
but we have to try to match behavior on both backends.
2021-07-29 02:17:53 -03:00
ameerj
7ac99bb127 renderers: Add explicit invert_y bool to screenshot callback
OpenGL and Vulkan images render in different coordinate systems. This allows us to specify the coordinate system of the screenshot within each renderer
2021-07-28 21:46:08 -04:00
ameerj
75e7f54fb0 renderer_vulkan: Implement screenshots 2021-07-28 21:45:55 -04:00