Super FX speed test #340
Comments
Nice test! Would it be possible to color the text green/red to reflect correct/incorrect results, based on the hardware captures?
That would get a bit complicated with the 2 different chip versions and 8 setting combinations. It is also hard to decide whether a small difference is acceptable, because some tests, like the plot tests, need to be accurate to 1/8th of a cycle. The cycle count can change after 8 plot instructions because the chip then writes the pixel data to RAM.
Here is a comparison of some differences. These are not all the differences but I think it is enough for now :)
The cycles mentioned below for the 10MHz tests are 10MHz cycles, so 1 cycle = 2x 21MHz cycles.

MiSTer vs Stunt Race FX (GSU1):

10MHz, MS0, No cache
10MHz, MS0, Cache on
10MHz, MS1, Cache on
10MHz PLOT, Cache on

The PLOT -> LOOP -> NOP loop takes 3 cycles, so 8 plots take 8x3 = 24 cycles. This is enough cycles to save the secondary pixel cache to RAM for 4 & 16 color data without waiting, so PLOT should only take 1 cycle.
For 256 color, PLOT is 0.125 cycles slower ($280 vs $266), so it seems to wait 1 cycle every 8 plots.
PLOT with color #$FC should be treated as no-plot in 4 color transparent mode since the low 2 bits are zero.

21MHz, MS0, No cache
21MHz, MS1, No cache
21MHz, MS0, Cache on
21MHz, MS1, Cache on
21MHz PLOT, Cache on
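The plot arithmetic above can be checked with a small sketch. It only uses figures quoted in this thread (3 cycles per loop iteration, 8 plots per pixel cache save, about $66 loops per 21MHz cycle); the function names are hypothetical and the linear loops-to-cycles conversion is an assumption, not a property of the hardware.

```python
# Figures quoted in this thread. The linear conversion is an assumption.
LOOPS_PER_21MHZ_CYCLE = 0x66                       # about 102 counter loops
LOOPS_PER_10MHZ_CYCLE = 2 * LOOPS_PER_21MHZ_CYCLE  # 1 cycle at 10MHz = 2 at 21MHz

PLOTS_PER_SAVE = 8        # the pixel cache is saved after 8 plots
CYCLES_PER_ITERATION = 3  # PLOT -> LOOP -> NOP


def cycles_between_saves(plots=PLOTS_PER_SAVE, per_iteration=CYCLES_PER_ITERATION):
    """Cycles available to save the pixel cache before it fills again."""
    return plots * per_iteration


def average_stall_per_plot(stall_cycles, plots=PLOTS_PER_SAVE):
    """Average slowdown per PLOT if the chip waits stall_cycles per save."""
    return stall_cycles / plots


def loops_to_10mhz_cycles(loop_delta):
    """Convert a difference in counter loops to 10MHz cycles."""
    return loop_delta / LOOPS_PER_10MHZ_CYCLE


if __name__ == "__main__":
    print(cycles_between_saves())                 # cycle budget for 8 plots
    print(average_stall_per_plot(1))              # 1 wait cycle spread over 8 plots
    print(loops_to_10mhz_cycles(0x280 - 0x266))   # measured 256-color difference
```

The measured difference of $1A loops comes out close to 1/8 of a 10MHz cycle, which is consistent with one wait cycle per 8 plots.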
If I remember right, the GSU code was written as a functional analog, not a cycle-accurate one. So it most likely needs to be reworked for cycle accuracy.
With this list it may seem that not much is accurate, but many of the instructions in 21MHz mode (and 10MHz with cache) are accurate. Almost all of the inaccurate instructions are the ones that read/write ROM/RAM and the multiplier instructions.
Fixed some timings. I do not yet understand the logic of instructions
Some from https://en.wikibooks.org/wiki/Super_NES_Programming/Super_FX_tutorial#Instruction_Set
ROM/RAM/Cache columns are execution time in cycles. LJMP seems pretty tight. o_O
Thanks @srg320. I have some findings.
SNES_MiSTer/rtl/chip/GSU/GSU.vhd Lines 680 to 681 in a6daf9b
4-color transparency should only check the lower 2 bits so this should be added:
SNES_MiSTer/rtl/chip/GSU/GSU.vhd Lines 1123 to 1131 in a6daf9b
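To make the 4-color rule concrete, here is a minimal sketch of the check as described in this thread. It is not the core's VHDL; the function name is hypothetical, and it assumes transparency checking is enabled and covers only the 4-color case.

```python
def is_no_plot_4color(color):
    """Return True if PLOT should be skipped in 4-color transparent mode.

    Assumption from this thread: only the low 2 bits of the color
    register are compared against zero in 4-color mode.
    """
    return (color & 0x03) == 0


if __name__ == "__main__":
    for color in (0x00, 0xFC, 0xFD, 0x03):
        print(f"${color:02X}: no-plot={is_no_plot_4color(color)}")
```

With this rule, color #$FC is skipped even though its upper bits are set, which is the case the test exercises.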
I did some tests to figure out the PLOT pixel cache save logic:
If executing from ROM or Cache and the pixel cache is being saved to RAM and it executes an
For example, executing the loop STB -> PLOT -> LOOP -> NOP only takes 5 cycles at 10MHz because it doesn't wait for the RAM writes. The chip must be interrupting the pixel cache save at the end of writing a byte, because otherwise both pixel caches would fill up and PLOT would go into a wait state.
Here are some test ROMs.
That's interesting. Thanks.
I agree: executing a RAM write instruction does not stall the following instructions until another RAM access appears. This is implemented in the core in the latest commit.
I am also interested in the ROM access time when the cache is being loaded. I suspect that it is shorter than the time to load a single byte from ROM.
The tests on the first page at 21MHz with Cache on all seem to be fixed. The plot tests also look good. 21MHz without Cache and 10MHz still need to be fixed.
However, the latest fixes made everything executing from ROM at 21MHz 2 cycles too slow: it went from 5 to 7 cycles per byte. I have attached a test ROM that runs the SFX code from ROM. Most results without cache should match the version that runs from Cart RAM, except for instructions that access RAM/ROM. For example
Unfortunately I am unable to make reference captures for the ROM versions because that would need a modified Super FX cartridge.
OK, but I meant that the RAM write buffer will have priority and will pause the pixel cache write. I will give an example from my test:
ibt R0, #$34
iwt R3, #$1031
plots 7
cache
plot ; 8th plot, start pixel cache write (256-color 8 bytes)
stb (R3) ; pause pixel cache write, RAM buffer will write $34 to $701031
inc R0
; pixel cache will overwrite $701031 ($34) with $FF
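The write ordering in the example above can be illustrated with a toy model. This is a sketch under several assumptions that are not confirmed in this thread: the 8 pixel cache bytes follow the SNES 8bpp tile layout, the tile row starts at offset $1000 in Cart RAM, the bytes are written in bitplane order, and the STB lands after a hypothetical number of bytes (`bytes_before_store`). All names are made up for illustration.

```python
def tile_row_addresses_8bpp(tile_base, row):
    """Addresses of the 8 bitplane bytes for one tile row (SNES 8bpp layout).

    Bitplanes are stored in pairs, each pair 16 bytes apart.
    """
    return [tile_base + 16 * pair + 2 * row + half
            for pair in range(4) for half in range(2)]


def simulate_writes(flush, store, bytes_before_store):
    """Apply writes in time order; the store interrupts the flush at a byte boundary.

    flush: list of (address, value) pixel cache writes, in order.
    store: (address, value) written by the RAM buffer (STB).
    bytes_before_store: hypothetical count of flush bytes already written.
    """
    order = flush[:bytes_before_store] + [store] + flush[bytes_before_store:]
    ram = {}
    for address, value in order:
        ram[address] = value  # the last write to an address wins
    return ram


if __name__ == "__main__":
    # Color $FF sets every bitplane byte to $FF (assuming all 8 pixels are $FF).
    flush = [(a, 0xFF) for a in tile_row_addresses_8bpp(0x1000, 0)]
    ram = simulate_writes(flush, (0x1031, 0x34), bytes_before_store=1)
    print(f"${ram[0x1031]:02X}")
```

Under these assumptions $1031 is the last byte of the pixel cache save, so any STB that pauses the save before that byte is overwritten once the save resumes.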
Which ROM access do you mean? As far as I know, ROM access takes the same time as RAM access: 3 cycles at 10MHz and 5 cycles at 21MHz. The
Ok. I wonder what the result would be if you add one or two
From this test you can see that in the Load/Store Word to/from RAM commands the second (MSB) access is 1 cycle shorter. Perhaps when loading the cache (16 bytes of sequential access) the access time is less than 5 cycles (some kind of burst mode).
Here is a test that measures how long it takes for an instruction to complete. It counts in a loop until the SFX is stopped, so higher numbers mean it took longer. Small differences don't matter so much. One 21MHz cycle results in a difference of around $66 (102) loops. For example, nop is $0134 in 21MHz cache mode, which is 1 cycle. add # is 2 cycles and results in $0199 loops.
It can be run on an original Super FX cart by swapping the cartridge while the console is on. The code runs in WRAM on SNES and Cart RAM/Cache on Super FX.
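As a rough aid for reading the captures, the loop counts can be turned into approximate cycle counts. This is a minimal sketch that assumes the counter scales linearly with execution time and uses the nop result in 21MHz cache mode as the reference point; the function name and defaults are hypothetical.

```python
LOOPS_PER_CYCLE = 0x66    # about 102 loops per 21MHz cycle, as stated above
NOP_LOOPS = 0x0134        # nop in 21MHz cache mode
NOP_CYCLES = 1


def estimate_cycles(loops, ref_loops=NOP_LOOPS, ref_cycles=NOP_CYCLES,
                    loops_per_cycle=LOOPS_PER_CYCLE):
    """Estimate 21MHz cycles from a loop count, relative to a known result."""
    return ref_cycles + (loops - ref_loops) / loops_per_cycle


if __name__ == "__main__":
    for name, loops in (("nop", 0x0134), ("add #", 0x0199)):
        print(name, round(estimate_cycles(loops), 2))
```

Because the counter is only approximate, the estimate should be rounded to the nearest cycle (or the nearest 1/8 cycle for the plot tests) rather than read literally.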
Here are reference captures of a StarFox cart (Mario Chip), Stunt Race FX (GSU1) and Yoshi's Island (GSU2):
https://drive.google.com/drive/folders/15ac9U-x__n0AgOlWa3FGo5eEMShZYl5g?usp=sharing
The Mario Chip (v1) is unstable with reading/writing to Cart RAM. Some tests timeout which doesn't happen with the GSU chips.
Another difference with the Mario Chip is that the cache opcode will work immediately with GSU while it seems the Mario Chip needs 16 bytes to fill first so not all instructions are faster in this test with the StarFox cart.
The ljmp instruction is also quite weird. It takes much longer on the GSU chip than on the Mario Chip. Not sure what's going on there.
With cache off the MiSTer core runs faster in 10MHz than in 21MHz, which is strange.
Buttons:
Left/Right: Switch to different tests
Select: Toggle 10/21Mhz
Y: Toggle High speed multiplier
B: Toggle Cache
MiSTer captures:
sfx_test_MiSTer_captures.zip
https://drive.google.com/drive/folders/1noo2pRPoexCtVPgqSbzaexr61WOvjHNW?usp=sharing
Test rom:
SuperFX.sfc.zip
Source:
https://github.com/paulb-nl/sfx_speed_test