-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fork fixes issue where RP would not launch #283
base: master
Are you sure you want to change the base?
Conversation
QString truncates strings after the first occurance of a null character. If a sysfs file contains null characters in the middle of the contents, the output of getValueFromSysFsFile() is incomplete. This replaces all occurances of null chars in the QByteArray returned from QFile::readAll() before converting it to a QString. This also fixes segfaults on startup when incomplete pp_od_clk_voltage tables are being parsed.
If the PC returns from sleep/hibernation we can enforce restoration of the correct fan control mode by checking if the correct value is set for pwm_enable and restore the value if necessary
To get around the issue that PWM isn't restored after hibernation it's enough to check the pwm_enable file - we don't need to also store the current PWM state into a variable.
Make sure an item is actually selected and can be moved in the desired direction. I.e. don't try to move left if it is already the first item and don't move right if it is the last. Without this bounds check clicking on move left/right leads to crashes due to out of range acccesses on ui->listWidget.
…same ValueID type Changed ValueID from a simple enum to a class that contains two components: the ValueID (as before) and an instance id. This allows to store and address multiple instances of the same value type in gpu::gpuData using unique keys. This commit only adds the new class and updates syntax for some uses of ValueID, without changing any program logic. Overall the new ValueID class should be a drop in for the old enum if only one instance of a ValueID type is needed. Some explicit static_cast's to ints and type conversion to ints in settings.cpp and dialogs/dialog_deinetopbaritem.cpp needed to be updated though. This is prep work for introducing support for multiple temperature sensors when available by hwmon. Sensors will be distinguished by each having a unique ValueID::instance value per sensor for each of the TEMPERATURE_* valueID types.
When temperature sensors via the HWMON backend are supported, detect all available temperature sensors. If the labels provided by hwmon are known, we assign known instance ids to the ValueIDs of the sensor data. As of now, this includes edge, junction and mem temperature, which will be assigned the intance of T_EDGE, T_JUNCTION and T_MEM for all TEMPERATURE_* values. For those instances, getNameOfValueID() now returns the appropriate localized strings for UI labels. For all sensors detected with unknown labels, a free instance id is assigned, the label provided by HWMON is used to construct labels for the UI, as returned by getNameOfValueID().
makes data list aware of potential multiple instances of TEMPERATURE_CURRENT values from multiple temperature sensors. Readings from all sensors will now be listed properly.
makes radeon_profile::resetMinMax() aware of multiple temperature sensors. When the user requests a reset of the min/max values in the settings tab, now the min/max values for all temperature sensors will be reset.
Make topbar items aware of multiple instances of TEMPERATURE_CURRENT values, so user can display sensors readings from any sensor.
When a default topbar schema is created, make sure the main large label for temperature is using the T_EDGE sensor. Also conditionally generate a label pair for T_JUNCTION and T_MEM sensors, when available by the driver.
allow user to select the temperature sensor to monitor for each TEMPERATURE type event.
add a third column, display crit and emergency constants for temp sensors, if available.
Hysteresis was a global setting, applied independently of fan profile. This is somewhat confusing, as one could expect it to be an aspect of a fan profile, since one sets it in the fan profile tab. This adds a new FanProfile struct that can store additional settings besides the temperature steps and adds hysteresis as one of those settings. The global hysteresis value stored in older radeon-profile-settings will still be read and applied to all loaded profiles. Once new config files are written, this values is moved to the xml file and stored as attributes of each fan profile respectively.
Adds a new combo box to the fan profile tab to allow the user to select a temperature sensor as data source for this fan curve. radeon_profile::adjustFanSpeed() will now read the current temperature of the temperature sensor selected in the currently active fan profile. This setting is a per fan profile setting. It is stored in the newly added FanProfile struct and written to and read from the xml config file for each fan profile. By default, the edge temperature sensor is selected. This also applies when an older config file is read without the new config attribute.
Are you sure this merge would help? I tried it myself, and it doesn't work, failing with the same error message |
It does the trick for me, but could you provide more details, like driver version, distro, kernel parameters, etc. radeon-profile.mp4 |
True, the no file name specified might not be the actual problem. I also get a segmentation fault. My details are (i did also test before with mesa 21.2.x with the same result):
|
If my fork fixed your startup issue then it was probably this bug #279 , which is fixed in my branch. I made a pull request (#280) for my fixes as well, but marazmista seems to be inactive at the moment. Feel free to use my branch though, I commited a number of bug fixes in there. The "QFSFileEngine::open" error message from #278 seems unrelated to any of my fixes. It also is probably not the cause of the segfault, but a call to "QString::toUInt()" as Natim commented in #278. I'll see if I can reproduce that and find the bug. |
@emerge-e-world sadly your fork does not fix the issue... Without sudo your version starts, but with sudo i get the segfault... |
could you compile it with debug messages enabled and post the output leading up to the segfault?
Then run the resulting |
|
Thanks. It looks like there the error happens when parsing one of the pp tables (containing the power level / clock descriptions for your card). I suspect that the amdgpu drivers changed the output for navi 2x cards somewhat, and this may happen with other Navi 2x based cards as well. |
|
Thanks for the help! Really happy about getting some feedback, and hoping for the application to soon work again =). |
Parsing tables in pp_od_clk_voltage is somewhat fragile in the current implementation, as assumptions made on the contents of that file tend to not apply with newer cards. This may lead to segfaults that are easier to debug when users can submit debug output that somewhat shows where the code breaks.
Preliminary fixes to support navi2x cards. This is the first step to get radeon-profile functional with navi2x based cards, by working around pp_od_clk_voltage file without FreqVolt pair tables, but OD_SCLK/OD_MCLK ables like vega20. Proper voltage control may still require support for the new OD_VDDGFX_OFFSET value that is not implemented yet. OC features might be missing, but startup without segfaulting should work.
This is a temporary workaround to make the incomplete pp_od_clk_voltage parsing logic for navi2x work without segfaulting, when no voltage steps information is available.
When parsing the OD_RANGE or volt/freq pairs, three colums are expected in those tables. However, bounds checks only tested for the case when only one column (i.e. a new table header) is found. Properly abort parsing when less than 3 columns are found and ignore those lines.
You're welcome! I hope I'll be able to make it work for your card, and in extension navi cards overall. It is a bit trickier without the hardware to test - originally I started my fork to collect some general bug fixes and fix things that broke with my radeon VII in particular, but I think I have a good idea what is going wrong. Anyhow, I've got a preliminary version that should fix the segfault, and work around some things missing that may need some more work. Assuming that particular segfault is the only major problem with your card, it should now start and then be mostly functional. The part that may need more work is manual voltage/frequency settings in the overclocking tab. It may show nothing or not apply correctly. The slider for automatic percent overclock should likely work though. Anything else I expect to work. You find those fixes in this branch: You can clone it with: and then compile and run as usual:
In case it still crashes elsewhere, I added some more debug output to pin it down better. So if that happens, posting the output again would help. |
Thanks, so this actually works. I can now set the power cap as well. This is the log during that:
My card is water cooled, so i don't have any fans on them. I'll need to play around with the settings and see if i the overclock sliders work. Thanks anyhow for this start. |
How can i now enable the daemon, so i don't have to run as root? |
awesome, that looks very good. The fan control should most likely not be any different with navi, the debug output suggests this is detected correctly. So I'd assume this would work if you had an air cooler attached. We'll have to see if someone with another card can test this. The daemon is a separate package. I did not make any changes to the daemon for my fork, so you can just use the latest binary package from your distro if it has one (the package should be called "radeon-profile-daemon"), or compile it yourself from marazmista's repo here: To build follow the instructions from there (it's the same qmake && make steps) and then |
the changes to fix the startup segfault for navi2x (RX 6000 series cards) are now merged into the master branch in my repo. So anybody who had the same issue as @eitch, please use the master branch from: |
@emerge-e-world right, that worked. I installed it from the repo and am now able to start the client without sudo. Overclocking doesn't seem to do anything. Looking at the screen: I can't seem to be able to change the clocks manually, and the percent overclock does nothing, my clock always stays at 2475MHz. Any ideas? |
I think there may be a rather confusing UI bug in the General OC tab. Could you try this: enable both percent overclock and manual frequency control. Then move the core and/or mem clock percent slider up, then push apply at least twice. Now if the frequency values shown in the tables change it may have worked. If so you may try to run some workload and see if the clocks actually go higher. (They may just spike for short bursts – if your temps are getting to high. You can plot and monitor both in the Graphs tab.) Another idea: If you click on the states table tab, is there a "Set Ranges" button at the bottom shown? I expect not, but if there is, can you try to click on it and increase the "maximum core clock" slider there, then hit Save and Apply, and check if your clocks increase. |
I had a look at the amdgpu kernel driver doc to find any changes in the clock/voltage control for navi2x to figure out what needs changing. It seems – except for a new voltage offset feature – it should work the same as with navi1x and vega20 cards. And therefore in principle work with the existing code. However, your Maybe it is as simple as it being disabled. Did you boot with a amdgpu.ppfeaturemask kernel parameter? It is a parameter for the amdgpu driver to unlock some by default disabled driver features, including some overclocking bits. If you didn't, could you try the following? Add the parameter
then reboot.
The "amdgpu.ppfeaturemask=0xffffffff" string should be somewhere in there. If that worked, please post the output of:
again. If there is now an OD_VDDC_CURVE section in there, you can try again applying an overclock. There should then also be the "set ranges" button now available under the states table tab, where you can increase the max allowed clock speed. |
Sadly this is not the case, as i have had this feature activated quite a while already:
|
OK, well that would have been too easy. I'll see if I can work on getting the OD_RANGES settings to work (the set ranges button), since the values are there. Maybe not setting those caps the max allowed clocks. As well as exposing the new VDDGFX_OFFSET to the UI. It may take a couple of days since I can get to it though. In the mean time, could you check of the amdgpu drivers reports any errors when you try to apply OC settings? This should get you any kernel messages from the driver.
|
Sorry for the delay, i've been on holidays. Here is the listing:
|
It doesn't matter which combinations of buttons i press, the logs don't change. I'm on kernel 5.16.12 |
Does this also help you:
|
Tried it out on RX580 and I don't think that frequency curves are getting applied. Only "frequency control" toggles seems to work for sure (because it activates 1000 MHz intermediate memory state which is disabled by default in drivers due to causing flickering). "Percent overlock" might work but "GPU max core clk: / max mem clk: " startup message always shows default ones UNLESS
|
I was having this issue which prevented the application from starting altogether.
Since this seems to have been fixed in this fork, I think it would make sense to merge it upstream.