For each draw, Citra rebinds every descriptor set slot, so it may redundantly re-bind descriptor-sets that were already bound. Instead it should only bind the descriptor-sets that have either changed or have had their buffer-offsets changed. This also allows entire calls to `vkCmdBindDescriptorSets` to be removed in the case that nothing has changed between draw calls.
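As a rough illustration of the idea above, the sketch below tracks what was last bound and reports only the slots that differ. It is not Citra's code: `SetBinding`, `BindTracker` and `kNumSets` are hypothetical names, and plain integers stand in for `VkDescriptorSet` handles so the example compiles without the Vulkan headers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one descriptor set binding: a handle plus its
// dynamic buffer offsets. Real code would hold a VkDescriptorSet here.
struct SetBinding {
    std::uint64_t handle = 0;
    std::vector<std::uint32_t> dynamic_offsets;

    bool operator==(const SetBinding& rhs) const {
        return handle == rhs.handle && dynamic_offsets == rhs.dynamic_offsets;
    }
};

constexpr std::size_t kNumSets = 3; // assumed slot count, for illustration only

class BindTracker {
public:
    // Returns the slots that must be re-bound for the next draw and records
    // the new state. An empty result means the bind call can be skipped.
    std::vector<std::size_t> Update(const std::array<SetBinding, kNumSets>& wanted) {
        std::vector<std::size_t> dirty;
        for (std::size_t i = 0; i < kNumSets; ++i) {
            if (!valid_ || !(bound_[i] == wanted[i])) {
                dirty.push_back(i);
            }
        }
        bound_ = wanted;
        valid_ = true;
        return dirty;
    }

private:
    std::array<SetBinding, kNumSets> bound_{};
    bool valid_ = false; // nothing is bound on a fresh command buffer
};
```

A caller would issue one bind per dirty slot (or per contiguous run of dirty slots) and skip the call entirely when `Update` returns an empty list.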
* android: Android 14 support
* android: New home UI flow
Port of the yuzu-android home UI with a few Citra specific tweaks.
A few important things to note
- New and existing Citra users will be guided through the new setup flow
- Existing game directory location is discarded and will have to be reselected
- Protections around making sure the user has selected a user directory were reworked to fit this new UI. I removed async directory init and DirectoryStateReceivers and check during MainActivity's onResume callback.
- Removed Citra premium. The light/dark theme is now available for everyone.
* android: New blue app theme
* android: Extend UI into status/navigation bar area
* android: Remove yellow theme specific styles
* android: Disable status/navigation bar contrast enforcement
We handle it ourselves so there's no need to use a contrasty background on the system bars
* android: GPU Driver Manager
Includes a rewrite of FileUtil with some helper functions for the manager
* android: Rework NativeLibrary in Kotlin
Besides the rewrite this cleans up the alert dialogs that are used for system errors. Generally removes unused JNI code and makes things a little more consistent.
* android: Home menu support + downloader
* android: Enable minify and resource shrinking
* android: Remove premium page and expose texture filtering modes
* android: Update AGP to 8.1.2
* android: Don't display emulation in cutout area
We don't currently handle the notch properly in the emulation fragment so just don't render under it for now.
* android: native.cpp ClangFormat fixes
* core: SystemTitles: Include std::optional
Without it, the android build would fail
* vk: android: Properly override GetDriverLibrary
* vk_instance: Blacklist timeline semaphore ext on turnip
* vk_platform: Hardcode apiVersion to VK_API_VERSION_1_3
* android: native: Use const where applicable
* android: native: Array pointer access style fix
* android: Share relevant log
Shares the old log if it exists and you haven't booted a game yet and shares the current log if you have booted a game.
* android: Apply dark theme color for software keyboard text
---------
Co-authored-by: GPUCode <geoster3d@gmail.com>
* android: Unify DocumentNode's `key` and `name`
They're effectively the same data, just obtained in different ways.
* android: Remove getFilenameWithExtensions method
After the previous commit, there's only one remaining use of
getFilenameWithExtensions. Let's get rid of that one in favor of
DocumentFile.getName so we no longer need to do manual URI parsing.
* android: Use case insensitivity in DocumentsTree
External storage on Android is case insensitive. This is still the case
when accessing it through SAF. (Of course, SAF makes no guarantees about
whether the storage location picked by the user is backed by external
storage or whether it's case insensitive, but I'm just going to ignore
that for now because I am *so tired of SAF*)
Because the underlying file system is case insensitive, Citra's caching
layer that had to be implemented because SAF's performance is atrocious
also needs to be case insensitive. Otherwise, we get a problem in the
following scenario:
1. Citra wants to check if a particular folder exists in sdmc, and if
not, create it.
2. The folder does exist, but it has a different capitalization than
Citra expects, due to a mismatch between Citra's code and (typically)
files dumped from a real 3DS using ThreeSD.
3. Citra tries to open the folder, but DocumentsTree fails to find it,
because the case doesn't match.
4. Citra then tries to create the folder, but creating the folder fails,
because the underlying filesystem considers the folder to exist.
5. The game fails to start.
(Sorry, did I say creating the folder fails? Actually, a new folder does
get created, with " (1)" appended to the end of the name. SAF makes no
guarantees whatsoever about what happens in this situation – it's all
determined by the storage provider!)
This commit makes the caching layer case insensitive so that the
described scenario will work better.
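A minimal sketch of a case-insensitive lookup of this kind, written in C++ for illustration (the actual DocumentsTree lives in the Android frontend). `FoldCase` and `ChildCache` are hypothetical names, and only ASCII is folded here, which is an assumption rather than a statement about how the real cache treats other characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <unordered_map>

// Lower-cases ASCII only; a real implementation would need to decide how to
// treat non-ASCII file names.
inline std::string FoldCase(std::string name) {
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Hypothetical cache of one directory's children, keyed by folded name.
class ChildCache {
public:
    void Add(const std::string& actual_name) {
        children_.emplace(FoldCase(actual_name), actual_name);
    }

    // Returns the name as it exists in storage, whatever case the caller used.
    std::optional<std::string> Find(const std::string& requested) const {
        const auto it = children_.find(FoldCase(requested));
        if (it == children_.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<std::string, std::string> children_;
};
```

With this shape, step 3 of the scenario finds the existing folder even when the capitalization differs, so step 4 never runs.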
* externals: Add oaknut submodule
Used for emitting ARM64 assembly
* common: Implement aarch64 ABI
Utilize oaknut to implement a stack frame.
* tests: Allow shader-jit tests for x64 and a64
Run the shader-jit tests for both x86_64 and arm64 targets
* video_core: Initialize arm64 shader-jit backend
Passes all current unit tests!
* shader_jit_a64: protect/unprotect memory when jit-ing
Required on macOS. Memory needs to be fully unprotected and then
re-protected when writing, or there will be memory access errors.
* shader_jit_a64: Fix ARM64-Imm overflow
These conditionals were throwing exceptions since the immediate values
were overflowing the available space in the `EOR` instructions. Instead
they are generated from `MOV` and then `EOR`-ed after.
* shader_jit_a64: Fix Geometry shader conditional
* shader_jit_a64: Replace `ADRL` with `MOVP2R`
Fixes some immediate-generation exceptions.
* common/aarch64: Fix CallFarFunction
* shader_jit_a64: Optimize `SanitizedMul`
Co-authored-by: merryhime <merryhime@users.noreply.github.com>
* shader_jit_a64: Fix address register offset behavior
Based on https://github.com/citra-emu/citra/pull/6942
Passes unit tests.
* shader_jit_a64: Fix `RET` address offset
A64 stack is 16-byte aligned rather than 8. So a direct port of the x64
code won't work. Fixes weird branches into invalid memory for any
shaders with subroutines.
* shader_jit_a64: Increase max program size
Tuned for A64 program size.
* shader_jit_a64: Use `UBFX` for extracting loop-state
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Optimize `SUB+CMP` to `SUBS`
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Optimize `CMP+B` to `CBNZ`
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Use `FMOV` for `ONE` vector
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Remove x86-specific documentation
* shader_jit_a64: Use `UBFX` to extract exponent
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit_a64: Remove redundant MIN/MAX `SRC2`-NaN check
Special handling only needs to check SRC1 for NaN, not SRC2.
It would work as follows in the four possible cases:
No NaN: No special handling needed.
Only SRC1 is NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Only SRC2 is NaN: FMAX automatically picks SRC2 because it always picks the NaN if there is one.
Both SRC1 and SRC2 are NaN: The special handling is triggered because SRC1 is NaN, and SRC2 is picked.
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
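The four cases listed for MIN/MAX can be checked with a small scalar model. This is only a sketch: `Fmax` imitates the NaN-propagating behavior the message attributes to `FMAX`, and `ShaderMax` is a hypothetical name for the combined operation, not a function from the JIT.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of FMAX as described above: if either operand is NaN, the
// result is NaN.
inline float Fmax(float a, float b) {
    if (std::isnan(a)) return a;
    if (std::isnan(b)) return b;
    return a > b ? a : b;
}

// MAX with the single SRC1 check: whenever a NaN is involved, SRC2 is picked.
inline float ShaderMax(float src1, float src2) {
    if (std::isnan(src1)) {
        return src2; // special handling, covers "only SRC1" and "both"
    }
    return Fmax(src1, src2); // "only SRC2 is NaN" falls out of FMAX itself
}
```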
* shader_jit/tests: Add catch-stringifier for vec2f/vec3f
* shader_jit/tests: Add Dest Mask unit test
* shader_jit_a64: Fix Dest-Mask `BSL` operand order
Passes the dest-mask unit tests now.
* shader_jit_a64: Use `MOVI` for DestEnable mask
Accelerate certain cases of masking with MOVI as well
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
* shader_jit/tests: Add source-swizzle unit test
This is not exhaustive. Generating all `4^4` cases seems to make Catch2
crash. So I've added some component-masking (non-reordering) tests based
on the Dest-Mask unit-test and some additional ones to test
broadcasts/splats and component re-ordering.
* shader_jit_a64: Fix swizzle index generation
This was still generating `SHUFPS` indices and not the ones that we wanted for the `TBL` instruction. Passes all unit tests now.
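To show why the two index formats are not interchangeable, here is a sketch of both encodings for the same four component selectors. It assumes a vector of four 32-bit lanes with lane 0 in the lowest bytes; `TblIndices` and `ShufpsImm` are hypothetical helper names, not the JIT's own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Expands four component selectors (0 = x ... 3 = w) into the sixteen byte
// indices that a byte-wise table lookup such as TBL consumes.
inline std::array<std::uint8_t, 16> TblIndices(const std::array<std::uint8_t, 4>& sel) {
    std::array<std::uint8_t, 16> out{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        for (std::size_t byte = 0; byte < 4; ++byte) {
            out[lane * 4 + byte] = static_cast<std::uint8_t>(sel[lane] * 4 + byte);
        }
    }
    return out;
}

// For comparison: SHUFPS packs the same selectors into one 8-bit immediate,
// two bits per destination lane.
inline std::uint8_t ShufpsImm(const std::array<std::uint8_t, 4>& sel) {
    return static_cast<std::uint8_t>(sel[0] | (sel[1] << 2) | (sel[2] << 4) | (sel[3] << 6));
}
```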
* shader_jit/tests: Add `ShaderSetup` constructor to `ShaderTest`
Rather than using the direct output of `CompileShaderSetup`, allow a
`ShaderSetup` object to be passed in directly. This makes it possible to
emit assembly that is not directly supported by nihstro.
* shader_jit/tests: Add `CALL` unit-test
Tests nested `CALL` instructions to eventually reach an `EX2`
instruction.
EX2 is picked in particular since it is implemented as an even deeper
dispatch and ensures subroutines are properly implemented between `CALL`
instructions and implementation-calls.
* shader_jit_a64: Fix nested `BL` subroutines
`lr` was getting written over by nested calls to `BL`, causing undefined
behavior with mixtures of `CALL`, `EX2`, and `LG2` instructions.
Each usage of `BL` is now protected with a stack push/pop to preserve
and restore the `lr` register to allow nested subroutines to work
properly.
* shader_jit/tests: Allocate generated tests on heap
Each of these generated shader-test objects were causing the stack to
overflow. Allocate each of the generated tests on the heap and use
unique_ptr so they only exist within the life-time of the `REQUIRE`
statement.
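The pattern can be sketched as follows. `BigShaderTest` is a hypothetical stand-in for a generated test object, sized arbitrarily to show why many of them on the stack would be a problem; the real test type and its constructor arguments differ.

```cpp
#include <array>
#include <cassert>
#include <memory>

// Hypothetical stand-in for a generated shader test that is too large to
// create many times on the stack.
struct BigShaderTest {
    std::array<float, 1 << 16> scratch{}; // 256 KiB of state

    explicit BigShaderTest(float bias) {
        scratch[0] = bias;
    }

    float Run(float input) const {
        return input + scratch[0];
    }
};

inline float RunOnHeap(float bias, float input) {
    // The unique_ptr temporary owns the heap object and is destroyed at the
    // end of this full expression, so nothing outlives the statement.
    return std::make_unique<BigShaderTest>(bias)->Run(input);
}
```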
* shader_jit_a64: Preserve `lr` register from external function calls
`EMIT` makes an external function call, and should be preserving `lr`
* shader_jit/tests: Add `MAD` unit-test
The Inline Asm version requires an upstream fix:
https://github.com/neobrain/nihstro/issues/68
Instead, the program code is manually configured and added.
* shader_jit/tests: Fix uninitialized instructions
These `union`-type instruction-types were uninitialized, causing tests
to fail intermittently.
* shader_jit_a64: Remove unneeded `MOV`
Residue from the direct-port of x64 code.
* shader_jit_a64: Use `std::array` for `instr_table`
Add some type-safety and const-correctness around this type as well.
* shader_jit_a64: Avoid c-style offset casting
Add some more const-correctness to this function as well.
* video_core: Add arch preprocessor comments
* common/aarch64: Use X16 as the veneer register
https://developer.arm.com/documentation/102374/0101/Procedure-Call-Standard
* shader_jit/tests: Add uniform reading unit-test
Particularly to ensure that addresses are being properly truncated
* common/aarch64: Use `X0` as `ABI_RETURN`
`X8` is used as the indirect return result value in the case that the
result is bigger than 128-bits. Principally `X0` is the general-case
return register though.
* common/aarch64: Add veneer register note
`LR` is generally overwritten by `BLR` anyways, and would also be a safe
veneer to utilize for far-calls.
* shader_jit_a64: Remove unneeded scratch register from `SanitizedMul`
* shader_jit_a64: Fix CALLU condition
Should be `EQ` not `NE`. Fixes the regression on Kid Icarus.
No known regressions anymore!
---------
Co-authored-by: merryhime <merryhime@users.noreply.github.com>
Co-authored-by: JosJuice <JosJuice@users.noreply.github.com>
Adds the current viewport and scissor to the dynamic pipeline state to
reduce redundant viewport/scissor assignments in the command buffer.
This greatly reduces the number of API calls to `vkCmdSetViewport` and
`vkCmdSetScissor` by only emitting the API call when the state actually
changes.
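A minimal sketch of this kind of state caching, shown for the viewport only. `Viewport` and `DynamicState` here are hypothetical types for illustration; the real pipeline state holds more fields and calls `vkCmdSetViewport` where this sketch just returns `true`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified viewport description.
struct Viewport {
    float x, y, width, height;

    bool operator==(const Viewport& rhs) const {
        return x == rhs.x && y == rhs.y && width == rhs.width && height == rhs.height;
    }
};

class DynamicState {
public:
    // Returns true when the caller needs to emit the set-viewport command.
    bool SetViewport(const Viewport& vp) {
        if (viewport_ && *viewport_ == vp) {
            return false; // unchanged, skip the API call
        }
        viewport_ = vp;
        return true;
    }

    // Cached state does not carry over to a new command buffer.
    void Invalidate() {
        viewport_.reset();
    }

private:
    std::optional<Viewport> viewport_;
};
```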
* vk_stream_buf: Avoid protected memory heaps
* Add an "Exclude" argument when finding a memory-type that avoids
`VK_MEMORY_PROPERTY_PROTECTED_BIT` by default
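A sketch of a memory-type search with such an exclusion mask. To keep it compilable without the Vulkan SDK, the flag constants below are local copies of the `VkMemoryPropertyFlagBits` values and the memory types are passed in as a plain vector; `FindMemoryType` and its parameter names are assumptions, not the actual signature in vk_stream_buf.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Local copies of the VkMemoryPropertyFlagBits values of the same names.
constexpr std::uint32_t kDeviceLocal = 0x1;
constexpr std::uint32_t kHostVisible = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;
constexpr std::uint32_t kProtected = 0x20;

// Returns the first memory type that is allowed by `allowed_type_bits`, has
// every flag in `wanted`, and has none of the flags in `excluded`.
inline std::optional<std::uint32_t> FindMemoryType(const std::vector<std::uint32_t>& type_flags,
                                                   std::uint32_t allowed_type_bits,
                                                   std::uint32_t wanted,
                                                   std::uint32_t excluded = kProtected) {
    for (std::uint32_t i = 0; i < type_flags.size(); ++i) {
        const bool allowed = (allowed_type_bits & (1u << i)) != 0;
        const bool has_wanted = (type_flags[i] & wanted) == wanted;
        const bool has_excluded = (type_flags[i] & excluded) != 0;
        if (allowed && has_wanted && !has_excluded) {
            return i;
        }
    }
    return std::nullopt;
}
```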
* vk_stream_buf: Utilize dedicated allocations when preferred by driver
`VK_KHR_dedicated_allocation` is part of the core Vulkan 1.1
specification and should be utilized when `prefersDedicatedAllocation`
is set.
* game_list: Treat demos as applications
Allows the dumping of RomFS from demos.
* game_list: Add TODO about using bitmasks for title ID high checks.
---------
Co-authored-by: Steveice10 <1269164+Steveice10@users.noreply.github.com>
`DebugScope` was capturing a `string_view` in a lambda which is only
valid during the scope of this ctor. When the lambda gets invoked at a
later time, it will read undefined garbage. The lambda needs to make a
deep copy of this `string_view` into a `string` so that it is valid by
the time the scheduler invokes this lambda.
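The fix described here can be sketched with a toy scheduler. `Scheduler` and `BeginLabel` are hypothetical stand-ins for the real scheduler and the `DebugScope` constructor; the point is only the init-capture that copies the `string_view` into an owning `std::string`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduler that runs recorded work later.
class Scheduler {
public:
    void Record(std::function<void()> fn) {
        pending_.push_back(std::move(fn));
    }

    void Flush() {
        for (auto& fn : pending_) {
            fn();
        }
        pending_.clear();
    }

private:
    std::vector<std::function<void()>> pending_;
};

inline void BeginLabel(Scheduler& sched, std::string_view name, std::vector<std::string>& log) {
    // Copy into an owning std::string. Capturing `name` itself would only
    // copy a pointer and a length, which may dangle by the time Flush() runs.
    sched.Record([owned = std::string{name}, &log] { log.push_back(owned); });
}
```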
* Implement missing http:c functionality.
* More implementation details and cleanup.
* Organize code
* Disable treat errors as warnings for httplib
* Fix defines
* Remove pragmas that do nothing and mark as SYSTEM
* Make httplib system
* Try to fix issue from httplib
* Apply suggestions
* Fix header ordering
* Fix compilation issue
* Create and use ctx.CommandID()
* Add and use Common::TruncateString
* Apply more suggestions
* Apply suggestions
* Fix compilation
* Apply suggestions
* Fix format
* Revert SplitURL to previous version
* Apply suggestions
* qt: Partially fix Wayland on NVIDIA.
* qt: Fix Vulkan under Wayland.
Showing and hiding the window here messes up the surface,
causing an instant crash on load.
* qt: Properly set up GLES context when requested.
* video_core: Abstract shader generators.
* shader: Extract common generator structures and move generators to specific namespaces.
* shader: Minor fixes and clean-up.
* code: Prepare frontend for vulkan support
* citra_qt: Add vulkan options to the GUI
* vk_instance: Collect tooling info
* renderer_vulkan: Add vulkan backend
* qt: Fix fullscreen and resize issues on macOS. (#47)
* qt: Fix bugged macOS full screen transition.
* renderer/vulkan: Fix swapchain recreation destroying in-use semaphore.
* renderer/vulkan: Make gl_Position invariant. (#48)
This fixes an issue with black artifacts in Pokemon games on Apple GPUs.
If the vertex calculations differ slightly between render passes, it can
cause parts of model faces to fail depth test.
* vk_renderpass_cache: Bump pixel format count
* android: Custom driver code
* vk_instance: Set moltenvk configuration
* rasterizer_cache: Proper surface unregister
* citra_qt: Fix invalid characters
* vk_rasterizer: Correct special unbind
* android: Allow async presentation toggle
* vk_graphics_pipeline: Fix async shader compilation
* We were actually waiting for the pipelines regardless of the setting, oops
* vk_rasterizer: More robust attribute loading
* android: Move PollEvents to OpenGL window
* Vulkan does not need this and it causes problems
* vk_instance: Enable robust buffer access
* Improves stability on mali devices
* vk_renderpass_cache: Bring back renderpass flushing
* externals: Update vulkan-headers
* gl_rasterizer: Separable shaders for everyone
* vk_blit_helper: Correct depth to color conversion
* renderer_vulkan: Implement reinterpretation with copy
* Allows reinterpretation with a simple copy on AMD
* vk_graphics_pipeline: Only fast compile if no shaders are pending
* With this shaders weren't being compiled in parallel
* vk_swapchain: Ensure vsync doesn't lock framerate
* vk_present_window: Match guest swapchain size to vulkan image count
* Less latency and fixes crashes that were caused by images being deleted before free
* vk_instance: Blacklist VK_EXT_pipeline_creation_cache_control with nvidia gpus
* Resolves crashes when async shader compilation is enabled
* vk_rasterizer: Bump async threshold to 6
* Many games have fullscreen quads with 6 vertices. Fixes pokemon textures missing with async shaders
* android: More robust surface recreation
* renderer_vulkan: Fix dynamic state being lost
* vk_pipeline_cache: Skip cache save when no pipeline cache exists
* This is the case when loading a save state
* sdl: Fix surface initialization on macOS. (#49)
* sdl: Fix surface initialization on macOS.
* sdl: Fix render window events not being handled under Vulkan.
* renderer/vulkan: Fix binding/unbinding of shadow rendering buffer.
* vk_stream_buffer: Respect non coherent access alignment
* Required by NVIDIA GPUs on macOS
* renderer/vulkan: Support VK_EXT_fragment_shader_interlock for shadow rendering. (#51)
* renderer_vulkan: Port some recent shader fixes
* vk_pipeline_cache: Improve shadow detection
* vk_swapchain: Add missing check
* renderer_vulkan: Fix hybrid screen
* Revert "gl_rasterizer: Separable shaders for everyone"
Causes crashes on mali GPUs, will need separate PR
This reverts commit d22d556d30.
* renderer_vulkan: Fix flipped screenshot
---------
Co-authored-by: Steveice10 <1269164+Steveice10@users.noreply.github.com>
* sw_framebuffer: Take factors into account for min/max blending
* renderer_gl: Take factors into account for min/max blending
* Address review comments
* gl_shader_gen: Fix framebuffer fetch on qcom and mali
* renderer_opengl: Add fallback path for mesa
* gl_shader_gen: Avoid emitting blend emulation if minmax_factor is present
* renderer_software: Multi-thread processing
* Doubles the performance in most cases
* renderer_software: Move memory access out of the raster loop
* Profiling shows this has a significant impact
* savestates: add a build_name field to the header
* savestates: display build name on save/load menu
* savestates: add zero member to header just in case of UB from an older save state
* savestates: add legacy hash lookup
* savestate_data: update hash database
* rasterizer_cache: Don't consider res_scale during recycle
* rasterizer_cache: Switch to plain erase loop
* rasterizer_cache: Fix crash due to memory corruption
* renderer_gl: Make rasterizer normal class member
* It doesn't need to be heap allocated anymore
* gl_rasterizer: Remove default_texture
* It's unused
* gl_rasterizer: General cleanup
* gl_rasterizer: Lower case lambdas
* Match style with review comments from vulkan backend
* rasterizer_cache: Prevent memory leak
* Since the switch from shared_ptr these surfaces were no longer being destroyed properly. Use our garbage collector for that purpose to destroy it safely for both backends
* rasterizer_cache: Make temp copy of old surface
* The custom surface would overwrite the memory region of the old surface, resulting in garbage data. This ensures the custom surface is constructed correctly
* citra_qt: Manually create dialog tabs
* Allows for custom constructors which is very useful. While at it, global state is now eliminated from configuration
* citra_qt: Eliminate global system usage
* core: Remove global system usage in memory and HIO
* citra_qt: Use qOverload
* tests: Run clang format
* gl_texture_runtime: Fix surface scaling
* Move mii to own namespace and add checksummed mii data
* Fix compile issues
* Make mii classes trivial and add cast operator
* Fix Android side
* Add new line at the end of files.
* Make miidata a struct and crc16 a u32_be as per switch code.
* Apply suggestions
* Change back crc to u16 and set padding to 0.
* rasterizer_cache: Sentence surfaces
* gl_texture_runtime: Remove runtime side allocation cache
* rasterizer_cache: Adjust surface scale during reinterpretation
* Fixes pixelated outlines. Also allows removing the d24s8-specific hack and is more generic overall
* rasterizer_cache: Remove Expand flag
* Begone!
* rasterizer_cache: Cache framebuffers with surface id
* rasterizer_cache: Sentence texture cubes
* renderer_opengl: Move texture mailbox to separate file
* Makes renderer_opengl cleaner overall and allows to report removal threshold from runtime instead of hardcoding. Vulkan requires this
* rasterizer_cache: Don't flush cache on layout change
* rasterizer_cache: Overhaul framebuffer management
* video_core: Remove duplicate
* rasterizer_cache: Sentence custom surfaces
* Vulkan cannot destroy images immediately so this ensures we use our garbage collector for that purpose
* service/gsp: Implement saving of framebuffers in SaveVramSysArea.
* Address review comments.
* service/apt: Separate capture info and capture buffer info.
The former is used with the RequestForSysApplet message and GetCaptureInfo.
The latter is used with SendCaptureBufferInfo and ReceiveCaptureBufferInfo.
* service/apt: Add and implement more service commands.
* service/apt: Implement power button.
* Address review comments and fix GetApplicationRunningMode bug.
* kernel: Properly clean up process threads on exit.
* kernel: Track process-owned memory and free on destruction.
* apt: Implement DoApplicationJump via home menu when available.
* kernel: Move TLS allocation management to owning process.
When we targeted API <32, the notification permission would automatically be requested on startup. This restores that behavior temporarily while we work on new UX.
* shader_jit/tests: Add support for multiple inputs
Allows for multiple `Vec4f` inputs for each run
* shader_jit/tests: Add additional shader-jit tests
Add some more expansive tests for each of the shader-instructions for
regression-testing. `MAD`/`MADI` is not added due to an upstream bug in
nihstro:
https://github.com/neobrain/nihstro/issues/68
* android: Migrate to Kotlin DSL
Includes updates to all android dependencies/ndk (minus billing) and adds support for Kotlin, Android 13, and view binding.
* android: Remove unused tests
* android: Remove unused dependencies
Xbyak has a complete utility-class for determining the host-processor's
ISA-features such as SSE4.1, AVX, AVX2, AVX512{F,VL,DQ,VBMI,etc}, and so
on for further potential optimizations.
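Was getting an unhandled `invalid_argument` [exception](https://en.cppreference.com/w/cpp/thread/thread/join) during
shutdown on my Linux machine. This removes the need for a `StopBackendThread` function entirely since `jthread`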
[automatically handles both checking if the thread is joinable and stopping the token before attempting to join](https://en.cppreference.com/w/cpp/thread/jthread/~jthread) in the case that `StartBackendThread` was never called.
Loop on stop_token and remove final_entry in Entry.
Move Backend thread out of Impl Constructor to its own function.
Add Start function for backend thread.
Use stop token in PopWait and check if entry filename is nullptr before logging.
This fixes a lost wakeup in SPSCQueue. If the reader is in just the right position, the writer's notification will be lost and this will be a problem if the writer then does something to wait on the reader.
This was discovered to affect my upcoming stacktrace PR. I don't think any performance decrease will be noticeable because an uncontended mutex is smart enough to skip the syscall. This PR might also resolve some rare deadlocks but I don't know of any examples.
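For readers unfamiliar with lost wakeups, the sketch below shows the general shape of a writer that takes the mutex before notifying. It is a simplified, fully mutex-guarded queue and not Citra's `SPSCQueue`; `WaitQueue` is a hypothetical name and the real change may differ in detail.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class WaitQueue {
public:
    void Push(T value) {
        {
            // Holding the mutex while publishing means a reader cannot sit
            // between "checked the predicate" and "went to sleep" at this
            // moment, so the notification below cannot be missed.
            std::lock_guard<std::mutex> lock{mutex_};
            items_.push(std::move(value));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available, then removes and returns it.
    T PopWait() {
        std::unique_lock<std::mutex> lock{mutex_};
        cv_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<T> items_;
};
```

When there is no contention the lock in `Push` is cheap, which matches the expectation above that no performance decrease should be noticeable.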