Adds detection of additional CPU flags to cpu_detect and additions to telemetry output.
This is not exhaustive but guided by features that [dynarmic utilizes](bcfe377aaa/src/dynarmic/backend/x64/host_feature.h (L12-L33)) as well as features that are currently utilized but not reported to telemetry(invariant_tsc). This is intended to guide future optimizations.
AVX512 in particular is broken up into its individual subsets and some other processor features such as [sha](https://en.wikipedia.org/wiki/Intel_SHA_extensions) and [gfni](https://en.wikipedia.org/wiki/AVX-512#GFNI) are added to have some forward-facing data-points.
What used to be a single `CPU_Extension_x64_AVX512` telemetry field
is also broken up into individual `CPU_Extension_x64_AVX512{F,VL,CD,...}` fields.
Set the zero-enum value to Unknown
Move the Manufacterer enum into the CPUCaps structure namespace
Add "ParseManufacturer" utility-function
Fix cpu/brand string buffer sizes(!)
It is possible for virtual_offset to not be 0 when the iterator is at the beginning, and thus, std::prev(it) may be evaluated, leading to a crash in debug mode.
Co-Authored-By: Fernando S. <1731197+FernandoS27@users.noreply.github.com>
Was getting an unhandled `invalid_argument` [exception](https://en.cppreference.com/w/cpp/thread/thread/join) during
shutdown on my linux machine. This removes the need for a `StopBackendThread` function entirely since `jthread`
[automatically handles both checking if the thread is joinable and stopping the token before attempting to join](https://en.cppreference.com/w/cpp/thread/jthread/~jthread) in the case that `StartBackendThread` was never called.
Inlines implementation of exclusive instructions into JITted code,
improving performance of applications relying heavily on these
instructions.
We also fastmem these instructions for additional speed, with
support for appropriate recompilation on fastmem failure.
An unsafe optimization to disable the intercore global_monitor is also
provided, should one wish to rely solely on cmpxchg semantics for
safety.
See also: merryhime/dynarmic#664
Addresses https://github.com/yuzu-emu/yuzu/issues/7881 to fix linux
builds.
`YUZU_NON_COPYABLE` deletes the `T(const T&)` constructor which will
cause the implicitly defined default ctor/dtor to no-longer generate.
These functions allow to construct a string view from an input buffer, avoiding the copy done by the non string view counterparts. However, callers must be cognizant of the viewed buffer's lifetime to avoid a use-after-free.
The string constructor of UUID states:
Should the input string not meet the above requirements, an assert will be triggered and an invalid UUID is set instead.
This is a fixed and revised implementation of UUID that uses an array of bytes as its internal representation of a UUID instead of a u128 (which was an array of 2 u64s).
In addition to this, the generation of RFC 4122 Version 4 compliant UUIDs is also implemented.
In addition to requiring nanosecond precision, using the native clock requires that the hardware TSC has a precision greater than the emulated CPU and its clock counter.
If the build is from a non-repository, these functions will return empty. This
patch allows using defines to CMake to set version info such as
-DGIT_BRANCH=master.
CallbackStatus instances aren't the cheapest things to copy around
(relative to everything else), given that they're currently 520 bytes in
size and are currently copied numerous times when callbacks are invoked.
Instead, we can pass the status by const reference to avoid all the
copying.
In my testing, waiting for 200ms provided the same level of precision as the previous implementation when estimating the RDTSC frequency.
This significantly improves the yuzu executable launch times since we reduced the wait time from 3 seconds to 200 milliseconds.
Loop on stop_token and remove final_entry in Entry.
Move Backend thread out of Impl Constructor to its own function.
Add Start function for backend thread.
Use stop token in PopWait and check if entry filename is nullptr before logging.
VS2022 seems to introduce an optimization when moving vectors to check for equality of the element values. AlignmentAllocator needed to overload the equality operator to fix compilation of its usage in vector moving.
To keep the TAS inputs synced to the game speed even through lag spikes and loading zones, deeper access is required.
First, the `TAS::UpdateThread` has to be executed exactly once per frame. This is done by connecting it to the service method the game calls to pass parameters to the GPU: `Service::VI::QueueBuffer`.
Second, the loading time of new subareas and/or kingdoms (SMO) can vary. To counteract that, the `CPU_BOOST_MODE` can be detected: In the `APM`-interface, the call to enabling/disabling the boost mode can be caught and forwarded to the TASing system, which can pause the script execution if neccessary and enabled in the settings.
First of all, TASing requires a script to play back. The user can select the parent directory at `System -> Filesystem`, next to an option to pause TAS during loads: This requires a "hacky" setup deeper in the code and will be added in the last commit.
Also, Hotkeys are being introduced: CTRL+F5 for playback start/stop, CTRL+F6 for re-reading the script and CTRL+F7 for recording a new script.
The base playback system supports up to 8 controllers (specified by `PLAYER_NUMBER` in `tas_input.h`), which all change their inputs simulataneously when `TAS::UpdateThread` is called.
The recording system uses the controller debugger to read the state of the first controller and forwards that data to the TASing system for recording. Currently, this process sadly is not frame-perfect and pixel-accurate.
Co-authored-by: Naii-the-Baf <sfabian200@gmail.com>
Co-authored-by: Narr-the-Reg <juangerman-13@hotmail.com>
The log filter was being ignored on initialization due to the logging instance being initialized before the config instance, so the log filter was set to its default value.
This fixes that oversight, along with using descriptive exceptions instead of abort() calls.
Some system configurations may see visual regressions or lower performance using GPU decoding compared to CPU decoding. This setting provides the option for users to specify their decoding preference.
Co-Authored-By: yzct12345 <87620833+yzct12345@users.noreply.github.com>