Thinkpad X280 and i915 driver: Screen freezes for seconds occasionally #2023

Closed
Letterus opened this issue Aug 14, 2024 · 19 comments

Comments

@Letterus

What Happened?

Using the current OS 8 preview/daily with a Wayland session on a ThinkPad X280 with Intel graphics, I observe a complete freeze of the whole screen (including the mouse pointer) for several seconds. During this time the display hangs and does not update, but the system seems to keep processing the last action in the background and presents the result after the freeze (e.g. scrolling down, opening a new window). The freeze is not tied to a particular triggering action and appears from time to time. I think I experienced the same behaviour on my X250 using an X11 session with OS 6.1/7.1 as well.

Any hints on how to debug this? I did not find any helpful messages in /var/log/syslog around the time the issue occurred.

Steps to Reproduce

Just use the Pantheon desktop. Behaviour does not seem to relate to any specific action.

I am using the Wayland session with fractional scaling enabled and set to 125%.

Expected Behavior

The UI/screen should not freeze.

OS Version

8.x (Early Access)

Software Version

Latest release (I have run all updates)

Log Output

No response

Hardware Info

ThinkPad X280, Intel graphics.

@leolost2605
Member

Just noting things down for future reference:

I don't currently observe, and never have observed, any of the described behavior on gala/wingpanel/dock main, so maybe it is a hardware issue?

However, during development I've seen similar symptoms arise when misusing the pantheon wayland protocol from the client side.

@Letterus
Author

@leolost2605 Thanks for commenting. Think I experienced the same with X11 and the X250 at least. Don't know if there's anything specific about the Lenovo X series architecture.

Do you think any relation to #2024 is possible, some kind of overflow?

@Letterus
Author

Letterus commented Aug 14, 2024

Further note: I am/was running the Nextcloud desktop client on both machines. Currently it seems the freezes don't occur when I quit the client, but occur more often when I edit and save or move a local file synced via Nextcloud. Does this sound reasonable to you?

Edit: After checking this twice this really seems related to the Nextcloud desktop client. But don't know where to start debugging yet.

@Letterus
Author

I'm not experiencing this issue anymore since the last updates. I don't know if this is by coincidence. But I'm closing it for now and going to reopen it in case it occurs again.

@Letterus
Author

Reopening this as occasional hangs keep occurring. I think it's not only the Nextcloud client but any task with heavy IO that leads to the screen becoming stuck for some seconds. I don't know the code, but it seems to me there is some piece connected with IO that does not run asynchronously.

@Letterus Letterus reopened this Aug 22, 2024
@leolost2605
Member

Hmm that could very well be. I think KDE had a similar problem about doing heavy caching. Are you running an HDD by chance?

@Letterus
Author

Nope, only SSD.

Is there a good way of debugging this? Where in the code could I start digging and maybe add some debug messages?

@Letterus
Author

Found the following log messages in /var/log/syslog close to the last hang:

2024-08-28T11:25:17.419787+02:00 XinkPad280 geoclue[1485]: Failed to query location: Query location SOUP error: Not Found
2024-08-28T11:26:21.179612+02:00 XinkPad280 kernel: workqueue: delayed_fput hogged CPU for >13333us 128 times, consider switching to WQ_UNBOUND
2024-08-28T11:27:56.455675+02:00 XinkPad280 geoclue[1485]: message repeated 4 times: [ Failed to query location: Query location SOUP error: Not Found]
2024-08-28T11:28:10.571720+02:00 XinkPad280 zeitgeist-datah[2119]: zeitgeist-datahub.vala:210: Error during inserting events: GDBus.Error:org.gnome.zeitgeist.EngineError.InvalidArgument: Incomplete event: interpretation, manifestation and actor are required

I don't know whether any of these could cause the issue. geoclue or zeitgeist-datahub?
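
One way to narrow the syslog down to the moments around a hang is to filter by timestamp. The following is a minimal sketch, assuming the ISO 8601 stamp format shown in the lines above and a constant UTC offset throughout the file; the function name `syslog_window` is made up for illustration.

```shell
# Print log lines whose timestamp (the first field) lies within [start, end].
# Plain string comparison is enough because all stamps share one format.
# $1 = start stamp, $2 = end stamp, $3 = log file (default: /var/log/syslog)
syslog_window() {
    awk -v start="$1" -v end="$2" \
        '($1 "") >= (start "") && ($1 "") <= (end "")' \
        "${3:-/var/log/syslog}"
}
```

For example, `syslog_window 2024-08-28T11:25 2024-08-28T11:29` would print everything logged in those four minutes. Where journald keeps the logs, `journalctl --since "2024-08-28 11:25" --until "2024-08-28 11:29"` gives a similar view.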

@Letterus
Author

During the next hang this appeared again:

2024-08-28T11:50:27.210917+02:00 XinkPad280 zeitgeist-datah[2141]: zeitgeist-datahub.vala:210: Error during inserting events: GDBus.Error:org.gnome.zeitgeist.EngineError.InvalidArgument: Incomplete event: interpretation, manifestation and actor are required

@Letterus
Author

Letterus commented Aug 29, 2024

Edit: It still freezes, even without zeitgeist-datahub running.

@Letterus
Author

I freshly installed and just started GNOME Contacts, which had to load quite a few address books and lots of my contacts, and the whole screen froze again for several seconds. It seems to be related to IO, but it may also be synchronous waits, in Zeitgeist as well as in Evolution Data Server.

@Letterus
Author

I made it freeze again by using the Starfish app and opening a domain (which was somehow hanging and using lots of CPU cycles, which led to the "app is not answering do you want to kill it?" dialogue).

Log:

2024-08-30T09:21:18.814890+02:00 XinkPad280 gala[1785]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
2024-08-30T09:21:43.049638+02:00 XinkPad280 xdg-desktop-por[2351]: g_application_get_resource_base_path: assertion 'G_IS_APPLICATION (application)' failed
2024-08-30T09:21:43.190290+02:00 XinkPad280 xdg-desktop-por[2351]: GtkDialog mapped without a transient parent. This is discouraged.
2024-08-30T09:21:43.199444+02:00 XinkPad280 gala[1785]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
2024-08-30T09:21:43.199673+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_below_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:43.199761+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:43.199874+02:00 XinkPad280 gala[1785]: message repeated 2 times: [ clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed]
2024-08-30T09:21:43.303313+02:00 XinkPad280 gala[1785]: WindowManager.vala:916: No transient found
2024-08-30T09:21:50.461054+02:00 XinkPad280 xdg-desktop-por[2351]: GtkDialog mapped without a transient parent. This is discouraged.
2024-08-30T09:21:50.470182+02:00 XinkPad280 gala[1785]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
2024-08-30T09:21:50.470628+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_below_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:50.470817+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:50.471103+02:00 XinkPad280 gala[1785]: message repeated 2 times: [ clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed]
2024-08-30T09:21:50.514123+02:00 XinkPad280 gala[1785]: WindowManager.vala:916: No transient found
2024-08-30T09:21:57.936407+02:00 XinkPad280 xdg-desktop-por[2351]: GtkDialog mapped without a transient parent. This is discouraged.
2024-08-30T09:21:57.955717+02:00 XinkPad280 gala[1785]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
2024-08-30T09:21:57.956572+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_below_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:57.956833+02:00 XinkPad280 gala[1785]: clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed
2024-08-30T09:21:57.957192+02:00 XinkPad280 gala[1785]: message repeated 2 times: [ clutter_actor_set_child_above_sibling: assertion 'child->priv->parent == self' failed]
2024-08-30T09:21:57.992184+02:00 XinkPad280 gala[1785]: WindowManager.vala:916: No transient found
2024-08-30T09:22:15.391481+02:00 XinkPad280 systemd[1572]: app-flatpak-hr.from.josipantolis.starfish-23144.scope: Consumed 11.872s CPU time.

@Letterus
Author

Letterus commented Sep 1, 2024

Two further freezes occurred while doing ordinary things like scrolling through mails and opening the browser. In the syslog I only found entries about rtkit-daemon, which I have now disabled to see if it is causing the issues. If it is, would that support the theory that the freezing is related to synchronously handled DBus calls/events?

@Letterus
Author

Letterus commented Sep 1, 2024

Still observing freezes. Maybe they are shorter now, and there are no log messages at those times anymore…

@Letterus
Author

Letterus commented Sep 1, 2024

Coming back to @leolost2605's first proposal: Hardware/driver issue.

From time to time dmesg reports:

[ 1094.058291] workqueue: delayed_fput hogged CPU for >13333us 4 times, consider switching to WQ_UNBOUND
[ 1417.338303] workqueue: delayed_fput hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND
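
To see how often these warnings fire and how the counter grows, the relevant fields can be pulled out of the kernel log. This is a rough sketch that assumes the dmesg line format shown above (syslog lines carry a different prefix and are not matched); `workqueue_hogs` is a hypothetical helper name.

```shell
# Read kernel log text on stdin and print
# "<seconds since boot> <work function> <count>" for each "hogged CPU" warning.
workqueue_hogs() {
    sed -n 's/^\[ *\([0-9.]*\)] workqueue: \([^ ]*\) hogged CPU for >[0-9]*us \([0-9]*\) times.*/\1 \2 \3/p'
}
```

Typical use would be `sudo dmesg | workqueue_hogs`; all other kernel messages are dropped.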

Symptoms look like the i915 driver hanging issue documented here:
https://bbs.archlinux.org/viewtopic.php?id=246841&p=2

Edit: Currently I'm trying to add i915.enable_psr=0 to the kernel parameters, but I had to do it manually during boot time. Changing it in /etc/default/grub and executing update-grub2 and update-initramfs -u -k all had no effect. I don't know yet why.

Edit 2, note: Check the effect with cat /proc/cmdline and sudo cat /sys/module/i915/parameters/enable_psr.
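
The two checks from the note above can be combined into a small script. This is only a sketch: the helper name `cmdline_has_param` is made up here, and the sysfs file exists only while the i915 module is loaded and may be readable by root only.

```shell
# Succeed if the kernel command line contains exactly the parameter in $1.
# $2 is the command line text; it defaults to the running kernel's /proc/cmdline.
cmdline_has_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

if cmdline_has_param i915.enable_psr=0; then
    echo "i915.enable_psr=0 is on the kernel command line"
else
    echo "i915.enable_psr=0 is NOT on the kernel command line"
fi

# The value the loaded driver actually uses.
psr_file=/sys/module/i915/parameters/enable_psr
if [ -r "$psr_file" ]; then
    echo "enable_psr = $(cat "$psr_file")"
else
    echo "cannot read $psr_file (i915 not loaded, or root required)"
fi
```

If the command line lacks the parameter after a reboot, the boot loader did not pick up the change, which is a separate question from whether the driver honours the setting.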

@Letterus
Author

Letterus commented Sep 1, 2024

The latter is interesting: the kernel options do not end up as boot parameters in GRUB, and the updated kernel (-41) is not booted either. It's still the old one (-40). What's going on there? /boot/grub/grub.cfg looks updated and correct though.
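
A quick way to confirm such a mismatch is to compare the running kernel with the newest image in /boot. The sketch below assumes the Debian/Ubuntu layout of /boot/vmlinuz-<version> and a `sort` that supports `-V`; `newest_kernel` is a hypothetical helper name.

```shell
# Print the highest version from a list of kernel version strings on stdin.
newest_kernel() {
    sort -V | tail -n 1
}

running=$(uname -r)
# Strip the path and the "vmlinuz-" prefix, then pick the highest version.
installed=$(ls /boot/vmlinuz-* 2>/dev/null | sed 's|.*/vmlinuz-||' | newest_kernel)

echo "running:          $running"
echo "newest installed: ${installed:-unknown}"
if [ -n "$installed" ] && [ "$running" != "$installed" ]; then
    echo "The running kernel is not the newest installed one."
fi
```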

@Letterus
Author

Letterus commented Sep 2, 2024

I have been working on the machine with the i915.enable_psr=0 kernel boot parameter set for some time now, with no freezes at all so far. So it looks as if this really is the driver issue mentioned above (which apparently does not occur with every DE).

I now need to figure out how to make the fix permanent, as configuring grub does not work, as pointed out above. Maybe that is a separate issue for another repo?

@Letterus
Author

Letterus commented Sep 2, 2024

Permanent fix works by creating the file /etc/modprobe.d/i915.conf

and entering:

options i915 enable_psr=0

Afterwards make sure to execute:
sudo update-initramfs -u -k all

Then reboot.

Check the effect with
sudo cat /sys/module/i915/parameters/enable_psr
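
The steps above can be sketched as a small helper. The function name `write_i915_psr_conf` is made up for illustration, and the system-changing commands are left as comments so that sourcing the snippet changes nothing on its own.

```shell
# Write the modprobe option described above to a config file.
# $1 = target file; defaults to /etc/modprobe.d/i915.conf (needs root).
write_i915_psr_conf() {
    conf=${1:-/etc/modprobe.d/i915.conf}
    printf '%s\n' 'options i915 enable_psr=0' > "$conf"
}

# Usage, as root:
#   write_i915_psr_conf
#   update-initramfs -u -k all
#   reboot
# After the reboot:
#   cat /sys/module/i915/parameters/enable_psr
```

Passing a different path as the first argument allows a dry run that leaves /etc untouched.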

Still don't know why grub parameters don't work and why it would load the older kernel.

@Letterus Letterus changed the title Screen freezes for seconds occasionally Thinkpad X280 and i915 driver: Screen freezes for seconds occasionally Sep 2, 2024
@Letterus
Author

The resolution to the graphics hangs is described above.

Boot issues are resolved by the last updates as documented in elementary/switchboard-plug-about#335.

Closing this issue as resolved.

2 participants