Create documentation on how to debug the agent #163

Merged 1 commit on Feb 20, 2025

AvielSegev
Collaborator

Hi Everyone,

I’ve created a document called debugging.md as a follow-up to ECOPROJECT-2656.

I'm sure there's plenty of room for new ideas, improvements, and changes in the document, so let me know your thoughts!

@app-sre-bot
Collaborator

Can one of the admins verify this patch?

Collaborator

@machacekondra machacekondra left a comment

Looks great! I would maybe not use "Agent" everywhere. Can we use "Discovery VM" or "Discovery Machine" in most places, unless you are referring to the planner-agent container itself?

@AvielSegev force-pushed the docs/debugging.md branch 3 times, most recently from cb032c8 to c9b2837 (February 17, 2025 12:49)
@AvielSegev
Collaborator Author

> Looks great! I would maybe not use "Agent" everywhere. Can we use "Discovery VM" or "Discovery Machine" in most places, unless you are referring to the planner-agent container itself?

Sure, it's done :)

@AvielSegev
Collaborator Author

Should we mention additional cases or take a different approach to those already presented?

@machacekondra
Collaborator

> Should we mention additional cases or take a different approach to those already presented?

What do you mean by additional cases?

@AvielSegev
Collaborator Author

> > Should we mention additional cases or take a different approach to those already presented?
>
> What do you mean by additional cases?

The document presents two edge cases from the customer side. Are there any other potential issues, aside from the disconnected environment and container-related problems?

@machacekondra
Collaborator

> The document presents two edge cases from the customer side. Are there any other potential issues, aside from the disconnected environment and container-related problems?

The most likely ones are network issues and containers failing to start. I haven't encountered any other problems yet. Maybe issues logging into vCenter.

@AvielSegev force-pushed the docs/debugging.md branch 4 times, most recently from 1d5b12b to c782ce4 (February 18, 2025 13:07)
@AvielSegev
Collaborator Author

> > The document presents two edge cases from the customer side. Are there any other potential issues, aside from the disconnected environment and container-related problems?
>
> The most likely ones are network issues and containers failing to start. I haven't encountered any other problems yet. Maybe issues logging into vCenter.

Logging into vCenter is handled smoothly by the UI. If the username or password is incorrect, the user is notified. If the /sdk suffix is missing, it is automatically appended. Can you think of any additional specific cases regarding login?
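
To illustrate the /sdk handling described above, here is a minimal shell sketch. The function name `normalize_vcenter_url` and the example host are hypothetical; the UI's actual logic is not shown in this thread and may differ.

```shell
#!/bin/sh
# Hypothetical helper: append /sdk to a vCenter URL when it is missing.
# Illustration only, not the planner UI's actual implementation.
normalize_vcenter_url() {
  url="${1%/}"                            # drop a single trailing slash, if present
  case "$url" in
    */sdk) printf '%s\n' "$url" ;;        # suffix already present
    *)     printf '%s\n' "$url/sdk" ;;    # append the missing suffix
  esac
}

normalize_vcenter_url "https://vcenter.example.com"
normalize_vcenter_url "https://vcenter.example.com/sdk"
```

Both calls print the same URL ending in /sdk, so a user who omits the suffix ends up at the same endpoint as one who includes it.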

@AvielSegev force-pushed the docs/debugging.md branch 3 times, most recently from 68c8ce0 to 8565113 (February 18, 2025 13:50)
doc/debugging.md Outdated
**Case: The planner-agent container is not running:**
Follow these steps to troubleshoot and resolve the issue:
1. Inspect the logs of the planner-agent container to identify any errors or issues that might have caused it to stop.
`podman logs planner-agent`
Collaborator

Can you please use systemctl instead of podman? For logs it's fine, but start and stop could be problematic, as we have pre/post scripts defined in the systemd services.

Collaborator Author

Could you please provide the full command using systemctl? It is not working for me with the planner-agent container as a service.
But I looked at the ignition file and noticed a retry every 5 seconds to restart in case of a failure. Perhaps we can remove it?

Collaborator

@machacekondra machacekondra Feb 18, 2025

Should be:

journalctl --user -u planner-agent
systemctl --user status planner-agent
systemctl --user restart planner-agent

> But I looked at the ignition file and noticed a retry every 5 seconds to restart in case of a failure. Perhaps we can remove it?

Why do you think we should remove it? It's there to make sure that if something goes wrong at startup, we retry. Do you think we should not retry?
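
For reference, the commands above could be wrapped roughly as follows. This is a minimal sketch, assuming the unit is named planner-agent and runs under the current user's systemd instance, as in the commands above. The `run` helper, the `DRY_RUN` variable, and the function names are hypothetical and only there so the commands can be previewed without being executed.

```shell
#!/bin/sh
# Sketch: inspect (and optionally restart) the planner-agent user service via
# systemd rather than podman, so the unit's pre/post scripts still run.
UNIT="planner-agent"   # assumed unit name, taken from the commands in this thread

# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Show the service state, then the last 50 journal lines for the unit.
debug_planner_agent() {
  run systemctl --user status "$UNIT" --no-pager
  run journalctl --user -u "$UNIT" --no-pager -n 50
}

# Restart through systemd so the unit's pre/post scripts are honored.
restart_planner_agent() {
  run systemctl --user restart "$UNIT"
}

# Preview the commands without executing them.
DRY_RUN=1 debug_planner_agent
```

Calling `debug_planner_agent` without `DRY_RUN=1` on the Discovery VM runs the status and log commands for real; `restart_planner_agent` is kept separate because the unit already has a restart policy.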

Collaborator Author

No, the ignition is fine. I meant to remove the troubleshooting step where the user manually restarts the container, as we've already defined a restart policy, making manual intervention redundant.

Collaborator Author

Thanks Ondra. I changed the commands.

Collaborator

@machacekondra machacekondra left a comment

Great work! Thanks!

@machacekondra machacekondra merged commit 9c057bf into kubev2v:main Feb 20, 2025
4 checks passed
5 participants