Make timeout configurable #6936

Open
2 of 4 tasks
samcofer opened this issue Aug 7, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@samcofer

samcofer commented Aug 7, 2024

Edit: Summary of timeouts to make configurable:

  • Inner process spawn
  • Web socket

Original issue:

Is there an existing issue for this?

  • I have searched the existing issues

OS/Web Information

  • Web Browser: Chrome
  • Local OS: Windows 10
  • Remote OS: Ubuntu 22
  • Remote Architecture: x86_64
  • code-server --version:

4.14.1 5c19962 with Code 1.79.2

Steps to Reproduce

  1. Install the code-server version above, and launch it using this command: /usr/lib/rstudio-server/bin/code-server/bin/code-server --verbose --host=0.0.0.0 --port=40400 --disable-getting-started-override --disable-update-check --disable-telemetry

Expected

code-server should start as expected on port 40400.

Actual

code-server fails to start and logs the following messages:

[2024-07-24T15:47:42.792Z] debug parent:3106566 spawned child process 3107105
[2024-07-24T15:47:52.794Z] error timed out
[2024-07-24T15:47:52.795Z] debug parent:3106566 disposing {}

Included a strace of the startup:
108342_2024-08-05-14_18_30_24Jul24Codeserver.log

Logs

No response

Screenshot/Video

No response

Does this bug reproduce in native VS Code?

This cannot be tested in native VS Code

Does this bug reproduce in GitHub Codespaces?

This cannot be tested in GitHub Codespaces

Are you accessing code-server over a secure context?

  • I am using a secure context.

Notes

This was previously working, and the only time-related change that I've been able to identify is a restriction on outbound traffic for most URLs. The only clue I have at the moment is that the way outbound traffic has been blocked doesn't immediately show as connection blocked/no route to host. curl commands to blocked addresses get stuck at Trying and essentially never time out. I've used code-server in a completely offline context, but I'm wondering if there's some unexpected bug or error happening because some outbound connectivity test being executed by a child process just never returns.

@samcofer added the bug (Something isn't working) and triage (This issue needs to be triaged by a maintainer) labels on Aug 7, 2024
@code-asher
Member

code-asher commented Aug 7, 2024

Does this happen with the latest code-server as well? (4.91.1)

I think this is probably unrelated to the network. I believe that timeout is for the IPC handshake with the child process after it is spawned, and there are no network calls happening there. So either the spawn itself has failed or Node has somehow failed to set up IPC correctly between the child and parent processes, but honestly I have no idea how that would happen.

@code-asher removed the triage (This issue needs to be triaged by a maintainer) label on Aug 7, 2024
@samcofer
Author

samcofer commented Aug 7, 2024

This environment is disconnected from the internet, so getting new versions of software in there is understandably tough. I can take a crack at using the latest version and see if that helps with this scenario. I can also try to devise a way to determine whether the child process spawn is failing, maybe because of ulimits or AppArmor or something. If I can ask, what is the child process here attempting to do? That might help me narrow down the weirdness.

@code-asher
Member

Updating probably will not help anyway, I do not believe the spawning code has changed in a very long time. But it is probably worth double-checking just in case.

The child process is the "real" code-server process. The flow is:

  1. code-server checks for a CODE_SERVER_PARENT_PID environment variable
  2. Sees it does not exist, so it spawns the child (by forking) and sets CODE_SERVER_PARENT_PID
  3. The parent waits for the child to send a message and times out if it does not get a message in 10 seconds
  4. The child sees CODE_SERVER_PARENT_PID, so it sends a message to the parent
  5. The parent gets the message and sends one back
  6. The child gets the parent's message and moves on to actually start up code-server (set up the web server, import VS Code, etc)
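For illustration, step 3 of the flow above (the parent waiting for the child's first message and giving up after a timeout) could be sketched roughly as follows. This is a minimal sketch, not code-server's actual implementation: `waitForHandshake` is a hypothetical name, and a plain `EventEmitter` stands in for the forked child process, which in Node also emits `message` and `error` events.

```typescript
import { EventEmitter } from "node:events"

// Hypothetical helper, not code-server's real code.
// Resolves with the first "message" emitted by `child`, rejects if the child
// emits "error" first, or if nothing arrives within `timeoutMs`.
// The 10 second default mirrors the timeout described in the thread.
function waitForHandshake(child: EventEmitter, timeoutMs = 10_000): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const onMessage = (message: unknown) => {
      clearTimeout(timer)
      child.off("error", onError)
      resolve(message)
    }
    const onError = (error: Error) => {
      clearTimeout(timer)
      child.off("message", onMessage)
      reject(error)
    }
    const timer = setTimeout(() => {
      child.off("message", onMessage)
      child.off("error", onError)
      reject(new Error("timed out"))
    }, timeoutMs)
    child.once("message", onMessage)
    child.once("error", onError)
  })
}

// Usage: a stand-in "child" that sends its handshake after 50 ms.
const fakeChild = new EventEmitter()
setTimeout(() => fakeChild.emit("message", { type: "handshake" }), 50)
waitForHandshake(fakeChild, 1_000).then(
  () => console.log("handshake received"),
  (error: Error) => console.error(error.message),
)
```

Listening for `error` as well as `message` means a failed spawn would surface as its own error rather than only as a generic timeout.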

The reason for this architecture is so that code-server can restart itself without having to spawn an entirely new process (so you can send it a USR2 signal for example to restart the inner process). Actually the original reason was to restart itself after an update but code-server no longer auto-updates so it is just the signal thing now.

@code-asher
Member

I wonder if the fork call fails and we are not properly catching the error. 🤔

@UnCor3

UnCor3 commented Aug 17, 2024

@code-asher I believe that the issue is related to this

const timeoutInterval = 10000 // 10s, matches VS Code's timeouts.

I ran into this issue in ISH Shell (an iOS Unix-like terminal app), and I fixed it by increasing the timeout to 100 seconds.

Sees it does not exist, so it spawns the child (by forking) and sets CODE_SERVER_PARENT_PID

As you stated here, the parent spawns the child and waits for it. On relatively old hardware or in ISH Shell (tried on multiple devices: ip6s, ip8+, ip11, ipse2ndgen) this step generally takes more than 10 seconds.

To fix this, my suggestion would be to introduce a new flag, for example --timeout.
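As a rough sketch of what such a flag could look like, the snippet below resolves a timeout from the command line and falls back to the current 10 second value. Note that `--timeout` is only the flag proposed in this thread and does not exist in code-server today; the function name, the choice of seconds as the unit, and the validation rules are all assumptions made for illustration.

```typescript
// Current default from the thread: 10 seconds.
const DEFAULT_TIMEOUT_MS = 10_000

// Hypothetical parser for the proposed `--timeout <seconds>` flag.
// Accepts both `--timeout 100` and `--timeout=100`.
function resolveTimeoutMs(argv: string[], defaultMs: number = DEFAULT_TIMEOUT_MS): number {
  const prefix = "--timeout="
  const inline = argv.find((arg) => arg.startsWith(prefix))
  const index = argv.indexOf("--timeout")
  if (inline === undefined && index < 0) {
    return defaultMs // Flag not given: keep the existing behavior.
  }
  const raw = inline !== undefined ? inline.slice(prefix.length) : argv[index + 1]
  const seconds = Number(raw)
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new Error(`invalid --timeout value: ${raw}`)
  }
  return Math.round(seconds * 1000)
}

// Usage: the 100 second value that worked in ISH Shell.
const timeoutInterval = resolveTimeoutMs(["--port=40400", "--timeout", "100"])
console.log(timeoutInterval)
```

Keeping 10 seconds as the default would leave existing setups unchanged, while slow environments could opt in to a longer wait.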

It wasn't the only issue I've had when running this in ISH Shell

await fs.writeFile(configPath, defaultConfigFile(generatedPassword), {

  • fs does not throw an error on the line above if the file exists, so it writes the default file again on every start*

  • When you load the page there seems to be a timeout for the websocket connection; you may also need to increase that timeout

  • The installation steps for iOS are outdated; you need at least Node v20 in order to install with npm

  • I got many Node zlib errors, which cause the websocket timeouts to kick in on the client side*

Please do let me know if you want me to submit an issue for the issues listed above

My OS/Web Information

Web Browser: Chrome
Local OS: Windows 11
Remote OS: Alpine v3.18 (ISH Shell App)
Remote Architecture: x86 (tried with multiple devices ip6s, ip8+, ip11, ipse2ndgen)
Nodejs Version : 20.8.1-r0 (https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86/nodejs-current-20.8.1-r0.apk)

* indicates that the issue is most likely not related to code-server

@code-asher
Member

Interesting! I had not considered the possibility it was actually taking > 10 seconds because that seemed so long already.

Please do let me know if you want me to submit an issue for the issues listed above

I will rename this issue so we can use it for --timeout. We should definitely open a new issue for the config file being overwritten. Interesting that it does not throw EEXIST. Maybe it will work if we add { flag: "wx" } (https://nodejs.org/api/fs.html#file-system-flags).
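To make the `wx` idea concrete, here is a minimal sketch of writing a file only when it does not already exist. `writeIfMissing` is a hypothetical helper name and the file path in the usage is made up; the only part taken from the Node documentation is that the `wx` flag makes the write fail with `EEXIST` if the path exists. Whether that behaves correctly under ISH Shell's emulation is untested here.

```typescript
import { promises as fs } from "node:fs"

// Hypothetical helper: writes `contents` to `filePath` only if no file exists there.
// Returns true when the file was created and false when it was already present.
async function writeIfMissing(filePath: string, contents: string): Promise<boolean> {
  try {
    // "wx" opens the file for writing but fails with EEXIST if the path exists.
    await fs.writeFile(filePath, contents, { flag: "wx" })
    return true
  } catch (error) {
    if ((error as { code?: string }).code === "EEXIST") {
      return false // Leave the existing file untouched.
    }
    throw error // Anything else (permissions, missing directory) is a real failure.
  }
}

// Usage with an illustrative path: only the first call for a given path writes.
writeIfMissing("example-config.yaml", "password: changeme\n").then((created) => {
  console.log(created ? "wrote default config" : "config already exists")
})
```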

@code-asher changed the title from 'code-server fails to start with "error timed out"' to 'Make timeout configurable' on Aug 19, 2024
@code-asher
Member

Ah as for the zlib issues, feel free to open an issue for that as well but I have no idea what we can do there.

I am not sure exactly what needs to change in the iOS docs to use Node v20 (is it the repositories step?) but happy to take a PR for that or just let me know what the right steps are and I can make the change.

@UnCor3

UnCor3 commented Aug 21, 2024

@code-asher I will submit a PR for the updated iOS docs, but I am currently trying to figure out how to run this outside of ISH Shell on a jailbroken device so that people can run 64-bit. As for the errors, I believe they are all caused by ISH Shell's x86 emulation, because later on I tried a VM running Alpine 3.18 x86 and it worked fine.
