Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: migrate contextual logging #1613

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

googs1025
Copy link
Member

Fix: #1608

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 25, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign ingvagabund for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@googs1025 googs1025 force-pushed the contextual_logging branch 3 times, most recently from 92cee90 to 6377812 Compare January 25, 2025 12:35
@googs1025
Copy link
Member Author

@a7i @ingvagabund /PTAL

Copy link
Contributor

@ricardomaraschini ricardomaraschini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, what an amazing work you have done. Thanks for taking the time to get this in. I have left a few comments.

@@ -46,7 +47,7 @@ func NewLeaderElection(
id = hostname + "_" + string(uuid.NewUUID())
}

klog.V(3).Infof("Assigned unique lease holder id: %s", id)
logger.V(3).Info(fmt.Sprintf("Assigned unique lease holder id: %s", id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be better ?

logger.V(3).Info("Assigned unique lease holder id", "id", id)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

},
OnNewLeader: func(identity string) {
// Just got the lock
if identity == id {
return
}
klog.V(1).Infof("New leader elected: %v", identity)
logger.V(1).Info(fmt.Sprintf("New leader elected: %v", identity))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -93,11 +93,12 @@ func (mc *MetricsCollector) AllNodesUsage() (map[string]map[v1.ResourceName]*res
}

func (mc *MetricsCollector) NodeUsage(node *v1.Node) (map[v1.ResourceName]*resource.Quantity, error) {
logger := klog.FromContext(context.TODO())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure we should have this TODO() added to the code. The enhancement proposal mentions that the logger should either come in as part of a Context or as an explicit argument. Any reason for not receiving it as part of an existing Context ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return false
}*/
}
// Ignore nodes that are marked unschedulable
/*if node.Spec.Unschedulable {
klog.V(4).InfoS("Ignoring node since it is unschedulable", "node", klog.KObj(node.Name))
logger..V(4).Info("Ignoring node since it is unschedulable", "node", klog.KObj(node.Name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be commented out but there is an extra . we should get rid of.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -157,7 +158,7 @@ func ConvertToPods(objs []interface{}, filter FilterFunc) []*v1.Pod {
if !ok {
continue
}
if filter == nil || filter(pod) {
if filter == nil || filter(context.TODO(), pod) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't GetPodsAssignedToNodeFunc type also receive a Context as argument ? I mean, now that FilterFunc also requires one to be passed in ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -57,8 +59,9 @@ func Register(
pluginArgDefaulter PluginArgDefaulter,
registry Registry,
) {
logger := klog.FromContext(context.TODO())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this logger be received through an argument ? Or maybe a Context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -72,13 +72,15 @@ func New(args runtime.Object, handle frameworktypes.Handle) (frameworktypes.Plug
return nil, fmt.Errorf("want args to be of type defaultEvictorFilterArgs, got %T", args)
}

logger := klog.FromContext(context.TODO())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. I don't think we should be introducing new TODO() here. We could pass in the logger or a context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -23,13 +24,14 @@ import (

func ValidateDefaultEvictorArgs(obj runtime.Object) error {
args := obj.(*DefaultEvictorArgs)
logger := klog.FromContext(context.TODO())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if err := validateCanEvict(pod, failedPodsArgs); err != nil {
klog.V(4).InfoS(fmt.Sprintf("ignoring pod for eviction due to: %s", err.Error()), "pod", klog.KObj(pod))
logger.V(4).Info(fmt.Sprintf("ignoring pod for eviction due to: %s", err.Error()), "pod", klog.KObj(pod))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be getting the logger from the Context received as an argument here ? More, if we do that then we can get rid of the line 52 above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if err := validateCanEvict(pod, tooManyRestartsArgs); err != nil {
klog.V(4).InfoS(fmt.Sprintf("ignoring pod for eviction due to: %s", err.Error()), "pod", klog.KObj(pod))
logger.V(4).Info(fmt.Sprintf("ignoring pod for eviction due to: %s", err.Error()), "pod", klog.KObj(pod))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here as my previous comment. I think we want to get the logger from the context received as an argument here as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 18, 2025
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 19, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 19, 2025
@googs1025 googs1025 force-pushed the contextual_logging branch 6 times, most recently from 2bb3baf to 627da1c Compare February 19, 2025 03:16
@ricardomaraschini
Copy link
Contributor

@googs1025 This one looks good to me, most of the changes are really straightforward. We need a review from someone else now (as the number of changes is quite big the more eyes we get here the better).

@googs1025
Copy link
Member Author

/cc @ingvagabund @a7i @seanmalloy
/PTAL

Copy link
Contributor

@ingvagabund ingvagabund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not we be using logger.WithName at some places at least? This way there's nothing added to the structured logging besides what's already provided through the key/value pairs. Also, not every place is suitable for klog.FromContext. The high level places (some I commented on) can probably keep the original klog.

@@ -324,7 +328,7 @@ func Run(ctx context.Context, rs *options.DeschedulerServer) error {

// Add k8s compatibility warnings to logs
if err := validateVersionCompatibility(rs.Client.Discovery(), version.Get()); err != nil {
klog.Warning(err.Error())
logger.Error(err, "validate version error")
Copy link
Contributor

@ingvagabund ingvagabund Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to stay as a warning/info.

@@ -57,7 +58,7 @@ func PodMatchesTermsNamespaceAndSelector(pod *v1.Pod, namespaces sets.Set[string
}

// PodMatchNodeSelector checks if a pod node selector matches the node label.
func PodMatchNodeSelector(pod *v1.Pod, node *v1.Node) (bool, error) {
func PodMatchNodeSelector(ctx context.Context, pod *v1.Pod, node *v1.Node) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx is unused here

@@ -274,7 +277,7 @@ func GetNodeWeightGivenPodPreferredAffinity(pod *v1.Pod, node *v1.Node) (int32,
preferredNodeSelector := &v1.NodeSelector{NodeSelectorTerms: []v1.NodeSelectorTerm{prefSchedulTerm.Preference}}
match, err := corev1.MatchNodeSelectorTerms(node, preferredNodeSelector)
if err != nil {
klog.ErrorS(err, "error parsing node selector", "selector", preferredNodeSelector)
logger.Error(err, "error parsing node selector", "selector", preferredNodeSelector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Hups. We forgot to continue. Normally, match should always be false when err != nil. Yet, noone never really knows :)

if _, ok := registry[name]; ok {
klog.V(10).InfoS("Plugin already registered", "plugin", name)
logger.V(10).Info("Plugin already registered", "plugin", name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can sacrifice the context for keeping klog here to avoid passing the extra ctx argument. This is quite how level.

if err != nil {
klog.Error(err, "Failed to get threshold priority from args")
logger.Error(err, "Failed to get threshold priority from args")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err should be handled here too. Returning an error instead of profile.

Copy link
Member Author

@googs1025 googs1025 Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the consistency , I will submit another PR to complete these two things. (error handle) (add continue) ...

#1613 (comment)

mc.mu.RLock()
defer mc.mu.RUnlock()

if _, exists := mc.nodes[node.Name]; !exists {
klog.V(4).InfoS("unable to find node in the collected metrics", "node", klog.KObj(node))
logger.V(4).Info("unable to find node in the collected metrics", "node", klog.KObj(node))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be kept with klog as there's no additional value in the context.

@ingvagabund
Copy link
Contributor

Also, passing context into some of the functions like filter makes the wiring quite heavy. There's too many context arguments for the moment. I wonder if there's a way to reduce it.

return false
} /*else if cond.Type == v1.NodeOutOfDisk && cond.Status != v1.ConditionFalse {
klog.V(4).InfoS("Ignoring node with condition status", "node", klog.KObj(node.Name), "condition", cond.Type, "status", cond.Status)
logger.V(4).Info("Ignoring node with condition status", "node", klog.KObj(node.Name), "condition", cond.Type, "status", cond.Status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. how this log line will look different from the current one? The log line already has node in it. Is there more context that needs to be added here?

@ingvagabund
Copy link
Contributor

@googs1025 this looks like tremendous amount of work you have done here. Thank you for the patience and effort.

@googs1025
Copy link
Member Author

Also, passing context into some of the functions like filter makes the wiring quite heavy. There's too many context arguments for the moment. I wonder if there's a way to reduce it.

Indeed, when I work on this pr, I also found filter prefilter method in particular would become heavy. At present, it seems that pass ctx is a simple way. 🤔 Or, maybe we can initialize a common ctx in the struct so that we don't need to pass ctx frequently. I'll try this way

@googs1025 googs1025 marked this pull request as draft February 25, 2025 01:24
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 25, 2025
@ingvagabund
Copy link
Contributor

ingvagabund commented Feb 25, 2025

Hint: Context for any filter or similar function within a plugin is the plugin name and the extension point (Deschedule/Balance). Context that can be set by the framework. The pod and node names are "just" data.

Before moving with any changes it's worth considering the actual context that each log represents. E.g. there are logs that print an error related to a plugin defaulting, conversions. Should these log a context or should they rather return an error instead that gets logged outside of them?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feature: Migrate to contextual logging
4 participants