Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a note on a kind issue to README #120

Merged
merged 1 commit into from
May 24, 2024
Merged

Conversation

yuanchen8911
Copy link
Collaborator

@yuanchen8911 yuanchen8911 commented May 15, 2024

This PR adds a note on a kind issue to README.

Checking a pod's log in a kind cluster may report the following error:

failed to create fsnotify watcher: too many open files.

@yuanchen8911 yuanchen8911 force-pushed the kind-bug branch 3 times, most recently from 1b388aa to 6b7b90f Compare May 15, 2024 17:18
@yuanchen8911 yuanchen8911 changed the title Update a note on a kind issue to README Add a note on a kind issue to README May 15, 2024
@yuanchen8911 yuanchen8911 requested a review from klueska May 15, 2024 17:19
README.md Outdated
Comment on lines 18 to 24
1. `kind` is installed. See the official documentation [here](https://kind.sigs.k8s.io/docs/user/quick-start/#installation). Note that there is a [known issue with kind](https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files). You may see the error while trying to tail the log of a running pod in a kind cluster: `failed to create fsnotify watcher: too many open files.` The issue can be resolved by increasing the default values for the configuraton parameters in `/proc/sys/fs/inotify`.
```console
sudo sysctl -w fs.inotify.max_user_watches=<a larger value>
sudo sysctl -w fs.inotify.max_user_instances=<a large value>
sudo sysctl -w fs.inotify.max_queued_events=<a large value>
```

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave out the solution, and just include the link (the solution is well documented in the link itself).

Can we also move this note to after all of the instructions for installing the prerequisites. I feel like it distracts from the flow to have thid detail here, and it can be done after kind is installed (it doesn't have to be done together with it).

Copy link
Collaborator Author

@yuanchen8911 yuanchen8911 May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it after the kubectl log command to put it in the context.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've applied the suggested change. I put it here because line 133 (right before this) is a command showing the pod log, which provides the right context. I'm okay with moving it to the beginning of this section too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work to update fs.inotify.max_user_watchesafter the kind cluster has been started? If so, then let's move this to line 106, i.e. before the log calls are made (so people see the warning before things suddenly aren't working).

If you have to update fs.inotify.max_user_watches before the kind cluster is started, then I still think my original suggestion is the right place to put this (i.e. before the kind cluster is started).

Copy link
Collaborator Author

@yuanchen8911 yuanchen8911 May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Does it work to update fs.inotify.max_user_watchesafter the kind cluster has been started?"

Yes, that's what I did. There's no need to recreate the kind cluster. How about moving it to line 98 (before the kubectl commands)? It reads like:


Run the examples by following the steps in the demo script

Finally, you can run the various examples contained in the demo/specs/quickstart folder. The README in that directory shows the full script of the demo you can walk through.

cat demo/specs/quickstart/README.md

Note: there is a known issue with kind. You may see an error while trying to tail the log of a running pod in the kind cluster: failed to create fsnotify watcher: too many open files. The issue may be resolved by increasing the value for fs.inotify.max_user_watches.

@yuanchen8911 yuanchen8911 force-pushed the kind-bug branch 2 times, most recently from afca8f4 to dbb4278 Compare May 15, 2024 20:44
@yuanchen8911 yuanchen8911 requested a review from klueska May 15, 2024 20:44
@yuanchen8911 yuanchen8911 requested a review from klueska May 17, 2024 20:17
@yuanchen8911 yuanchen8911 force-pushed the kind-bug branch 4 times, most recently from 5027241 to a921075 Compare May 18, 2024 13:46
README.md Outdated
Finally, you can run the various examples contained in the `demo/specs/quickstart` folder.
The `README` in that directory shows the full script of the demo you can walk through.

Finally, you can run the various examples contained in the `demo/specs/quickstart` folder. The `README` in that directory shows the full script of the demo you can walk through.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to change anything other than pull the second sentence onto the same line as the first (which I'd like to avoid).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.

README.md Outdated
Comment on lines 98 to 101
**Note:** there is a [known issue with kind](https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files). You may see an error while trying to tail the log of a running pod in the kind cluster: `failed to create fsnotify watcher: too many open files.` The issue may be resolved by increasing the value for `fs.inotify.max_user_watches`.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this belongs further down at line 106, i.e. just after the sentence:

Get the pods' statuses. Depending on which GPUs are available, running the first three examples will produce output similar to the following.

Copy link
Collaborator Author

@yuanchen8911 yuanchen8911 May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it's been moved to line 106 as suggested.

@yuanchen8911 yuanchen8911 force-pushed the kind-bug branch 3 times, most recently from b3fdf68 to 9f99f30 Compare May 20, 2024 15:12
@yuanchen8911 yuanchen8911 requested a review from klueska May 20, 2024 15:12
Signed-off-by: Yuan Chen <[email protected]>

Fix a format in README

Signed-off-by: Yuan Chen <[email protected]>

Rremove .DS_Store

Signed-off-by: Yuan Chen <[email protected]>

Update README

Signed-off-by: Yuan Chen <[email protected]>

Update README.md

Signed-off-by: Yuan Chen <[email protected]>

Update README.md

Signed-off-by: Yuan Chen <[email protected]>

Update README

Signed-off-by: Yuan Chen <[email protected]>

Update README

Signed-off-by: Yuan Chen <[email protected]>

Update README.md

Co-authored-by: Kevin Klues <[email protected]>
Signed-off-by: Yuan Chen <[email protected]>
Signed-off-by: Yuan Chen <[email protected]>

Update README

Signed-off-by: Yuan Chen <[email protected]>

Update README

Signed-off-by: Yuan Chen <[email protected]>

Update README

Signed-off-by: Yuan Chen <[email protected]>

Fix a format issue in README

Signed-off-by: Yuan Chen <[email protected]>

Fix a format issue in README

Signed-off-by: Yuan Chen <[email protected]>

Fix a format issue in README

Signed-off-by: Yuan Chen <[email protected]>

Update README.md

Co-authored-by: Kevin Klues <[email protected]>
Signed-off-by: Yuan Chen <[email protected]>
Signed-off-by: Yuan Chen <[email protected]>
@klueska klueska merged commit 176ad35 into NVIDIA:main May 24, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants