Performance troubleshooting

    • #29301
      einar.hjortdal
      Participant

      When I began my journey with openmamba I experienced very slow performance. I suspected it was exclusively due to the nouveau driver not working right.

      The nvidia-470 package was then fixed and it seems to be running correctly on my system right now.

      Unfortunately, while nvidia-470 did improve performance significantly, I am still experiencing very poor performance, and I am not sure what is causing it.

      I have looked at the logs using KSystemLog: there do not appear to be any entries at error level or above besides Failed to start colord.service. and [drm:drm_new_set_master [drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership. I do not believe the nvidia error is problematic; a Google search suggests it is nothing to worry about.

      I develop web applications using VSCode and Edge. Starting VSCode takes minutes. Dragging tabs in Edge freezes the interface for a few seconds and animations are all bugging out as if the system could not render them. I need to work on some animations for a web frontend and the lag & bugginess is making it impossible.

      I am not sure how to figure out why I am experiencing this bad performance. I have verified that the issue is limited to openmamba, as I don’t experience it in Windows or on a Fedora live CD. Please give me directions on how to figure out the source of the problem.

    • #29302
      einar.hjortdal
      Participant

      I have noticed that this issue seems to go mostly away after waking up the system from sleep, and logging back in with sddm.

    • #29304
      Silvan
      Keymaster

      The described symptoms of poor performance are related to VSCode and Edge. According to an earlier report, VSCode is manually installed in the user's home directory, and that report showed a lot of processes running.

      To help troubleshoot poor performance in the openmamba installation itself, it is advised to report the issue when no custom software is installed and running, i.e. no scripts or processes from third-party software. Otherwise, please recognize that the troubleshooting help requested relates to running the custom-installed VSCode and Edge. Additionally, openmamba provides VSCode through the package visual-studio-code-bin, which I frequently use for software development.

      General comments follow.
      When observing slow performance with an openmamba component, e.g. extracting a tar archive as described in the earlier report, the operation can be monitored for CPU and I/O usage. If it is a CPU performance problem, tools would report 100% usage of one or more cores. A tool like cpupower-gui might help to check and set the CPU clock frequency for appropriate performance. When I/O is causing slowness, different things might be checked, which I’m skipping here. When no system log errors are shown and CPU and I/O usage are reported as idle while uncompressing, but the user still sees slowness, it would be the kernel that is not working correctly. openmamba uses the LTS 6.6 kernel; if the hardware is recent, it might be better supported by a more recent kernel. This (which kernel version works better) can be cross-checked, since you have reported a list of other Linux distributions where you don’t have the issue.

       

      • #29305
        einar.hjortdal
        Participant

        To help troubleshoot poor performance in the openmamba installation itself, it is advised to report the issue when no custom software is installed and running

        I would like to clarify: the issue shows even without user-installed VSCode and/or Microsoft Edge.

        The slowdown can be clearly experienced with Dolphin and KSystemLog, for example.

        It is however very obvious when VSCode and Edge are running because they’re more complex programs that use more system resources than anything else my openmamba installation has, as far as I know.

        openmamba uses the LTS 6.6 kernel; if the hardware is recent, it might be better supported by a more recent kernel

        The hardware is relatively old: Intel i7-4790K and NVIDIA GTX 780. I do not believe kernel support is the issue.

        So far I can consistently replicate the disappearance of the issue after one single sleep/wake cycle. Something clearly happens with that, but I don’t know what.

        Every time I boot openmamba, I immediately put the system to sleep and wake it up, this resolves the issue until the next reboot.

        Given that sleep/wake reliably solves the issue, do you have any guess for what to investigate?

        A tool like cpupower-gui might help to check and set the CPU clock frequency for appropriate performance. When I/O is causing slowness, different things might be checked, which I’m skipping here. When no system log errors are shown and CPU and I/O usage are reported as idle while uncompressing, but the user still sees slowness, it would be the kernel that is not working correctly.

        I can verify if cpupower-gui shows anything before/after a sleep/wake cycle.

    • #29307
      Silvan
      Keymaster

      You may want to try removing the cpupower-gui package, which is currently the only suspect for the CPU frequency being improperly set.
      For further investigations from this side you may also want to send two reports: one after boot when the system is slow and another after a sleep/resume cycle.

      • #29308
        einar.hjortdal
        Participant

        Actually, I think cpupower-gui showed a lot! Thanks for the suggestion.

        After a boot, cpupower-gui shows each core stuck at 800 MHz; after a sleep/wake it behaves as expected and boosts up to 4 GHz when needed.

        • #29309
          einar.hjortdal
          Participant

          Today after a system update and a clean boot, the problem was not replicable any longer. Maybe a patch fixed it?

        • #29310
          Silvan
          Keymaster

          Nothing was specifically patched. Generally speaking, the only relevant update might be the kernel update, but if you update daily it was not in today's updates.

          I find it more likely that by running cpupower-gui you fixed its behaviour at startup, because in the old report there was this in the logs:

          cpupower-gui[5372]: Applying configuration...
          systemd[5330]: cpupower-gui-user.service: Main process exited, code=exited, status=255/EXCEPTION
          systemd[5330]: cpupower-gui-user.service: Failed with result 'exit-code'.
          systemd[5330]: Failed to start cpupower-gui-user.service.

          These considerations are based on and limited by the information I have.

        • #29311
          einar.hjortdal
          Participant

          I’ll try to be as precise as possible regarding what I did then. When you suggested to check the CPU frequency with cpupower-gui I did the following:

          1) Fresh boot of openmamba. Ran cpupower-gui and noticed the CPU being stuck at 800 MHz on all threads.

          2) Sleep/wake the system. cpupower-gui showed the CPU clock boosting correctly. At this point I ran the update, and a kernel update was included. This doesn’t help with isolating causes, I agree.

          3) Rebooted. Ran cpupower-gui and noticed the CPU boosting correctly.

          I was not aware that there is a cpupower-gui-user service. I do not know cpupower at all, I thought it was just a utility to show CPU behavior, I do not know if it actually does act on the CPU behavior.

          All I can say besides that is that this issue appeared in every installation of openmamba I have done on this system. I hope it’s gone for good on this installation!

    • #29575
      einar.hjortdal
      Participant

      This issue returned after reinstalling openmamba and removing .config.

      I have not installed and run cpupower-gui yet, maybe we can investigate and find the root cause?

      • #29576
        einar.hjortdal
        Participant

        I f***ed around. This is before sleep-wake:

        
        $ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_* 2>/dev/null | sed -n '1,40p'
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:800068
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:800087
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:800145
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:799928
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:800049
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:800000
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:4400000
        

        This is after sleep-wake:

        
        $ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_* 2>/dev/null | sed -n '1,40p'
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:4000372
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:800000
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:4000339
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:4000242
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:800000
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_max_freq:4400000
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_min_freq:800000
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_setspeed:800000
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_available_governors:userspace performance schedutil
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:4001780
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_cpufreq
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:userspace
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_max_freq:4400000
        

        The CPU is definitely stuck at 800 MHz; after sleep-wake it can boost like normal. Do you know what could cause this?
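A compact way to compare the two snapshots above is to print one line per core instead of the full sysfs dump. The following is a minimal sketch and not something posted in this thread: the function name cpufreq_summary and its optional directory argument are made up for illustration, and it assumes the standard cpufreq sysfs layout visible in the output above.

```shell
#!/bin/sh
# Print one line per CPU: name, governor, current frequency in MHz.
# The optional first argument is the sysfs CPU directory (hypothetical
# convenience so the function can be tried on a copy of the tree).
cpufreq_summary() {
    root="${1:-/sys/devices/system/cpu}"
    for dir in "$root"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without cpufreq data (also covers an unmatched glob).
        [ -r "$dir/scaling_governor" ] || continue
        [ -r "$dir/scaling_cur_freq" ] || continue
        cpu="${dir%/cpufreq}"
        cpu="${cpu##*/}"
        governor="$(cat "$dir/scaling_governor")"
        khz="$(cat "$dir/scaling_cur_freq")"
        # scaling_cur_freq is reported in kHz; integer division gives MHz.
        printf '%s %s %d MHz\n' "$cpu" "$governor" "$((khz / 1000))"
    done
}
```

Calling cpufreq_summary with no argument once after boot and once after a sleep/wake cycle should make a stuck 800 MHz state easy to spot.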

    • #29577
      Silvan
      Keymaster

      The data points to thermald as a likely cause: the governor is set to “userspace” with scaling_setspeed=800000, which is how thermald controls CPU frequency for thermal management. It typically sets the minimum at boot as a starting point and should then raise it — but in your case it may not be doing so until the sleep-wake cycle triggers a thermal re-evaluation.

      It may be worth checking thermald’s logs to confirm:

      journalctl -u thermald

      In the meantime, switching the governor manually should help:

      echo "schedutil" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
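Before switching, it can be worth confirming that the target governor is actually offered by the driver. This is a small sketch that assumes the scaling_available_governors file seen in the sysfs output earlier in the thread; the helper name governor_available and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Return 0 if the named governor is listed for cpu0, 1 otherwise.
# $1: governor name, $2: optional sysfs CPU directory (for trying on a copy).
governor_available() {
    wanted="$1"
    root="${2:-/sys/devices/system/cpu}"
    file="$root/cpu0/cpufreq/scaling_available_governors"
    [ -r "$file" ] || return 1
    # The file holds a space-separated list such as
    # "userspace performance schedutil".
    for name in $(cat "$file"); do
        [ "$name" = "$wanted" ] && return 0
    done
    return 1
}
```

A possible use is to guard the switch: governor_available schedutil && echo schedutil | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor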

      Note that cpupower-gui is no longer installed by default and is no longer recommended. It has been replaced by thermald for Intel CPUs, which handles thermal management — keeping the CPU within safe temperature limits. Performance profile management (switching between power-saving, balanced and performance) is a separate concern, handled by power-profiles-daemon.

      From the distribution side, power-profiles-daemon has just been packaged and is currently in the devel-makedist repository, where it will go through testing before being made available in the base repository and installed automatically as a dependency of powerdevil for all users.

      • #29579
        einar.hjortdal
        Participant

        This is right after a boot

        
        $ sudo journalctl -u thermald
        [...]
        -- Boot 55d42ef6be3b4e66ad022368c0fba8f4 --
        mag 31 11:26:35 ms7816 systemd[1]: Starting thermald.service...
        mag 31 11:26:35 ms7816 thermald[4773]: NO RAPL sysfs present
        mag 31 11:26:35 ms7816 thermald[4773]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
        mag 31 11:26:35 ms7816 thermald[4773]: NO RAPL sysfs present
        mag 31 11:26:35 ms7816 thermald[4773]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
        mag 31 11:26:36 ms7816 thermald[4773]: Using config file /etc/thermald/thermal-conf.xml
        mag 31 11:26:36 ms7816 thermald[4773]: Polling mode is enabled: 4
        mag 31 11:26:36 ms7816 systemd[1]: Started thermald.service.
        
        
        $ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_* 2>/dev/null | sed -n '1,40p' | grep scaling_cur_freq
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:800019
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:800068
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:800313
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:800067
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:799757
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:800054
        

        After a sleep/wake, thermald logs don’t have any new lines

        
        $ sudo journalctl -u thermald
        [...]
        -- Boot 55d42ef6be3b4e66ad022368c0fba8f4 --
        mag 31 11:26:35 ms7816 systemd[1]: Starting thermald.service...
        mag 31 11:26:35 ms7816 thermald[4773]: NO RAPL sysfs present
        mag 31 11:26:35 ms7816 thermald[4773]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
        mag 31 11:26:35 ms7816 thermald[4773]: NO RAPL sysfs present
        mag 31 11:26:35 ms7816 thermald[4773]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
        mag 31 11:26:36 ms7816 thermald[4773]: Using config file /etc/thermald/thermal-conf.xml
        mag 31 11:26:36 ms7816 thermald[4773]: Polling mode is enabled: 4
        mag 31 11:26:36 ms7816 systemd[1]: Started thermald.service.
        
        
        $ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_* 2>/dev/null | sed -n '1,40p' | grep scaling_cur_freq
        /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:4000319
        /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:4000283
        /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:800000
        /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:4000328
        /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:4002378
        /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:4000768
        

        I did notice that cpupower-gui is no longer installed by default. I'll wait for further instructions to debug this issue.

        • #29581
          Silvan
          Keymaster

          The thermald log suggests a likely cause: “NO RAPL sysfs present” means thermald cannot find the Intel RAPL (Running Average Power Limit) interface it uses to monitor power consumption. Without it, thermald falls back to polling mode and sets the CPU to minimum frequency as a conservative measure, without knowing when it is safe to raise it.

          First, check whether the RAPL kernel module is available:

          ls /sys/class/powercap/

          If the directory is empty or missing, try loading it manually:

          sudo modprobe intel_rapl_common

          Then restart thermald and check whether the CPU frequency recovers:

          sudo systemctl restart thermald

          If RAPL is not available on your kernel build, an alternative is to disable thermald entirely — the i7-4790K has its own built-in thermal protection and will throttle itself if temperatures become critical:

          sudo systemctl disable --now thermald
          echo "schedutil" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

          Note that power-profiles-daemon is now available in the repository, but it would not resolve this issue on its own as long as thermald is overriding the governor.

        • #29585
          einar.hjortdal
          Participant

          Interestingly, the directory does exist

          
          $ ls /sys/class/powercap/
          intel-rapl  intel-rapl:0  intel-rapl:0:0  intel-rapl:0:1  intel-rapl:0:2
          
        • #29586
          Silvan
          Keymaster

          Interesting — RAPL is present, so thermald’s “NO RAPL sysfs present” message is unexpected. It may be a permissions issue accessing the files inside, or a bug in the installed version of thermald.

          It would be useful to see more of thermald’s log:

          journalctl -u thermald --no-pager | head -80

          In the meantime, since thermald is clearly not managing thermals correctly and the i7-4790K has its own built-in thermal protection, disabling it is likely a safe option:

          sudo systemctl disable --now thermald
          echo "schedutil" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

          If the CPU frequency is normal after this, that would confirm thermald was the cause.

        • #29587
          einar.hjortdal
          Participant
          
          $ sudo journalctl -u thermald --no-pager | head -80
          mag 26 20:19:37 ms7816 systemd[1]: Starting thermald.service...
          mag 26 20:19:38 ms7816 thermald[5176]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 26 20:19:38 ms7816 thermald[5176]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 26 20:19:38 ms7816 thermald[5176]: Using config file /etc/thermald/thermal-conf.xml
          mag 26 20:19:38 ms7816 thermald[5176]: Polling mode is enabled: 4
          mag 26 20:19:38 ms7816 systemd[1]: Started thermald.service.
          mag 26 21:38:58 ms7816 systemd[1]: Stopping thermald.service...
          mag 26 21:38:58 ms7816 thermald[5176]: Terminating ...
          mag 26 21:38:59 ms7816 thermald[5176]: terminating on user request ..
          mag 26 21:39:00 ms7816 systemd[1]: thermald.service: Deactivated successfully.
          mag 26 21:39:00 ms7816 systemd[1]: Stopped thermald.service.
          -- Boot e47ce3f9dccd46b9b18d9904e1a3badf --
          mag 26 21:39:35 ms7816 systemd[1]: Starting thermald.service...
          mag 26 21:39:35 ms7816 thermald[5098]: NO RAPL sysfs present
          mag 26 21:39:35 ms7816 thermald[5098]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 26 21:39:35 ms7816 thermald[5098]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 26 21:39:35 ms7816 thermald[5098]: Using config file /etc/thermald/thermal-conf.xml
          mag 26 21:39:35 ms7816 thermald[5098]: Polling mode is enabled: 4
          mag 26 21:39:35 ms7816 systemd[1]: Started thermald.service.
          mag 26 22:25:03 ms7816 systemd[1]: Stopping thermald.service...
          mag 26 22:25:03 ms7816 thermald[5098]: Terminating ...
          mag 26 22:25:04 ms7816 thermald[5098]: terminating on user request ..
          mag 26 22:25:05 ms7816 systemd[1]: thermald.service: Deactivated successfully.
          mag 26 22:25:05 ms7816 systemd[1]: Stopped thermald.service.
          -- Boot 3258e618cb914964a9fbf6125a131c9d --
          mag 26 22:25:42 ms7816 systemd[1]: Starting thermald.service...
          mag 26 22:25:43 ms7816 thermald[5117]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 26 22:25:43 ms7816 thermald[5117]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 26 22:25:43 ms7816 thermald[5117]: Using config file /etc/thermald/thermal-conf.xml
          mag 26 22:25:43 ms7816 thermald[5117]: Polling mode is enabled: 4
          mag 26 22:25:43 ms7816 systemd[1]: Started thermald.service.
          mag 26 22:48:20 ms7816 systemd[1]: Stopping thermald.service...
          mag 26 22:48:20 ms7816 thermald[5117]: Terminating ...
          mag 26 22:48:21 ms7816 thermald[5117]: terminating on user request ..
          mag 26 22:48:22 ms7816 systemd[1]: thermald.service: Deactivated successfully.
          mag 26 22:48:22 ms7816 systemd[1]: Stopped thermald.service.
          -- Boot 06b3845c283d478fb146c9e590047bf4 --
          mag 26 22:49:36 ms7816 systemd[1]: Starting thermald.service...
          mag 26 22:49:36 ms7816 thermald[5068]: NO RAPL sysfs present
          mag 26 22:49:36 ms7816 thermald[5068]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 26 22:49:36 ms7816 thermald[5068]: NO RAPL sysfs present
          mag 26 22:49:36 ms7816 thermald[5068]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 26 22:49:36 ms7816 thermald[5068]: Using config file /etc/thermald/thermal-conf.xml
          mag 26 22:49:37 ms7816 thermald[5068]: Polling mode is enabled: 4
          mag 26 22:49:37 ms7816 systemd[1]: Started thermald.service.
          mag 27 10:30:40 ms7816 thermald[5068]: Terminating ...
          mag 27 10:30:40 ms7816 systemd[1]: Stopping thermald.service...
          mag 27 10:30:41 ms7816 thermald[5068]: terminating on user request ..
          mag 27 10:30:42 ms7816 systemd[1]: thermald.service: Deactivated successfully.
          mag 27 10:30:42 ms7816 systemd[1]: Stopped thermald.service.
          -- Boot a9d80632af6a403bb89c1e67873c0fec --
          mag 27 16:42:49 ms7816 systemd[1]: Starting thermald.service...
          mag 27 16:42:49 ms7816 thermald[5014]: NO RAPL sysfs present
          mag 27 16:42:49 ms7816 thermald[5014]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 27 16:42:49 ms7816 thermald[5014]: NO RAPL sysfs present
          mag 27 16:42:49 ms7816 thermald[5014]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 27 16:42:49 ms7816 thermald[5014]: Using config file /etc/thermald/thermal-conf.xml
          mag 27 16:42:49 ms7816 thermald[5014]: Polling mode is enabled: 4
          mag 27 16:42:49 ms7816 systemd[1]: Started thermald.service.
          mag 31 11:25:58 ms7816 systemd[1]: Stopping thermald.service...
          mag 31 11:25:58 ms7816 thermald[5014]: Terminating ...
          mag 31 11:25:59 ms7816 thermald[5014]: terminating on user request ..
          mag 31 11:26:00 ms7816 systemd[1]: thermald.service: Deactivated successfully.
          mag 31 11:26:00 ms7816 systemd[1]: Stopped thermald.service.
          -- Boot 55d42ef6be3b4e66ad022368c0fba8f4 --
          mag 31 11:26:35 ms7816 systemd[1]: Starting thermald.service...
          mag 31 11:26:35 ms7816 thermald[4773]: NO RAPL sysfs present
          mag 31 11:26:35 ms7816 thermald[4773]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 31 11:26:35 ms7816 thermald[4773]: NO RAPL sysfs present
          mag 31 11:26:35 ms7816 thermald[4773]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
          mag 31 11:26:36 ms7816 thermald[4773]: Using config file /etc/thermald/thermal-conf.xml
          mag 31 11:26:36 ms7816 thermald[4773]: Polling mode is enabled: 4
          mag 31 11:26:36 ms7816 systemd[1]: Started thermald.service.
          mag 31 16:18:54 ms7816 thermald[4773]: Terminating ...
          mag 31 16:18:54 ms7816 systemd[1]: Stopping thermald.service...
          mag 31 16:18:55 ms7816 thermald[4773]: terminating on user request ..
          mag 31 16:18:56 ms7816 systemd[1]: thermald.service: Deactivated successfully.
          mag 31 16:18:56 ms7816 systemd[1]: Stopped thermald.service.
          

          I have applied these changes and rebooted, as suggested. The situation hasn’t changed, maybe the cause is something else?

          
          sudo systemctl disable --now thermald
          echo "schedutil" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
          
          
          grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_* 2>/dev/null | sed -n '1,40p' | grep scaling_cur_freq
          /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:800047
          /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:800056
          /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:800050
          /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:800027
          /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:800052
          /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:800040
          

          Maybe the command was the issue.

          
          
          $ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
          userspace
          userspace
          userspace
          userspace
          userspace
          userspace
          userspace
          userspace
          

          I believe this should have been changed to schedutil, is that right? The governor reverts to userspace at reboot.
          Without rebooting, after running the command, the CPU frequency is as expected:

          
          $ grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_* 2>/dev/null | sed -n '1,40p' | grep scaling_cur_freq
          /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:3165508
          /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:3197528
          /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:3197465
          /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:2782051
          /sys/devices/system/cpu/cpu4/cpufreq/scaling_cur_freq:4260972
          /sys/devices/system/cpu/cpu5/cpufreq/scaling_cur_freq:4221910
          

          Therefore your suspicion was correct.

    • #29590
      Silvan
      Keymaster

      The echo schedutil command at runtime restored normal frequency (3-4 GHz), which points to the userspace governor with 800 MHz as the cause. Since the governor still reverts to userspace after each reboot even with thermald disabled, it would be useful to check whether thermald is truly not running after the reboot:

      systemctl status thermald

      Also useful:

      cat /sys/devices/system/cpu/intel_pstate/status

      If thermald is stopped and disabled, the userspace governor being set at boot may be related to the intel_pstate driver mode. If the status is passive, the cpufreq framework is in use and something may be defaulting to userspace on this hardware.
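As a rough aid for reading the status file, the sketch below maps the values documented for intel_pstate (active, passive, off) to a one-line description. The function name describe_pstate_status is hypothetical, and the descriptions are a simplified summary rather than output of any real tool.

```shell
#!/bin/sh
# Map an intel_pstate status string to a short human-readable description.
# $1: contents of /sys/devices/system/cpu/intel_pstate/status
describe_pstate_status() {
    case "$1" in
        active)
            echo "active: intel_pstate picks P-states itself (performance/powersave only)" ;;
        passive)
            echo "passive: intel_cpufreq driver, generic cpufreq governors such as schedutil apply" ;;
        off)
            echo "off: intel_pstate is not registered with the cpufreq core" ;;
        *)
            echo "unknown status: $1" ;;
    esac
}
```

It could be called as describe_pstate_status "$(cat /sys/devices/system/cpu/intel_pstate/status)". The passive case matches the scaling_driver value intel_cpufreq seen earlier in this thread.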

      • #29591
        einar.hjortdal
        Participant
        
        $ sudo systemctl status thermald
        ○ thermald.service - Thermal Daemon Service
        Loaded: loaded (/usr/lib/systemd/system/thermald.service; disabled; preset: enabled)
        Active: inactive (dead)
        $ cat /sys/devices/system/cpu/intel_pstate/status
        passive
        
        • #29592
          Silvan
          Keymaster

          thermald is confirmed disabled and inactive, and intel_pstate is in passive mode (expected on Haswell without HWP support). The userspace governor being set at every boot may be related to the kernel default governor configuration, which is something that may need to be addressed at the distribution level.

          As a test that should also resolve the issue, you can add cpufreq.default_governor=schedutil to the kernel command line. Edit /etc/default/grub and add it to GRUB_CMDLINE_LINUX, then run:

          sudo update-grub

          After the next reboot the CPU governor should default to schedutil instead of userspace, and the frequency should behave normally without any manual intervention.

          From the distribution side, a kernel update is being prepared that sets schedutil as the default governor at boot, which should resolve the issue for all users without requiring any manual configuration.
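After rebooting with the cpufreq.default_governor parameter, one way to confirm that it actually reached the kernel is to look for it in /proc/cmdline. The following is only a sketch: the helper name cmdline_default_governor is made up, and it accepts an optional command-line string so it can be tried without rebooting.

```shell
#!/bin/sh
# Print the value of cpufreq.default_governor from a kernel command line.
# $1: optional command-line string; defaults to the running kernel's.
# Returns 1 if the parameter is absent.
cmdline_default_governor() {
    cmdline="${1:-$(cat /proc/cmdline 2>/dev/null)}"
    for arg in $cmdline; do
        case "$arg" in
            cpufreq.default_governor=*)
                # Strip the parameter name, leaving only the governor.
                printf '%s\n' "${arg#cpufreq.default_governor=}"
                return 0 ;;
        esac
    done
    return 1
}
```

If cmdline_default_governor prints schedutil but scaling_governor still reads userspace after boot, something in userspace is probably changing the governor later, which would be worth reporting back here.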
