gcm.arrow_strength providing different ranking #1130

Closed
ankur-tutlani opened this issue Jan 8, 2024 · 6 comments
Labels: question, stale

ankur-tutlani commented Jan 8, 2024
I am using the arrow_strength function to identify the top nodes contributing to the variation in the target node (Growth).

import pandas as pd
from dowhy import gcm

arrow_strengths = gcm.arrow_strength(scm, target_node='Growth', num_samples_conditional=5000,
                                     difference_estimation_func=gcm.divergence.estimate_kl_divergence_continuous_knn)
arrow_strength_pd = pd.DataFrame(list(arrow_strengths.items()), columns=['edge', 'importance'])
arrow_strength_pd = arrow_strength_pd.sort_values('importance', ascending=False)

There are ~40 nodes. After sorting, I get different rankings across runs: a node ranked 10th in one iteration can move to 30th in another iteration with the same causal graph and data, or vice versa. Is this behavior expected? Does it depend on the causal graph structure?

Version information:

  • DoWhy version [e.g. 0.11.1]
bloebp (Member) commented Jan 8, 2024

The arrow strength estimation involves sampling, which leads to variation between runs. You can reduce this by adjusting some parameters, such as tolerance (setting it to a smaller number).
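For example, a minimal sketch reusing the scm from above (tolerance is an existing parameter of gcm.arrow_strength; the value below is just an illustration):

# Lowering tolerance makes the internal sampling loop run longer before it
# stops, trading runtime for less run-to-run variation.
arrow_strengths = gcm.arrow_strength(scm, target_node='Growth',
                                     num_samples_conditional=5000,
                                     tolerance=0.001,  # smaller value -> more stable estimates
                                     difference_estimation_func=gcm.divergence.estimate_kl_divergence_continuous_knn)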

Generally, if the rankings change that much between runs, the connections are likely either all roughly equally strong or too weak overall (or the model simply isn't capturing them accurately enough). What is the range of the values?

You can also take a look at estimating confidence intervals; they might provide better insight:
https://www.pywhy.org/dowhy/v0.11.1/user_guide/modeling_gcm/estimating_confidence_intervals.html#conveniently-bootstrapping-graph-training-on-random-subsets-of-training-data
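Following that page, a minimal sketch (assuming data is the training DataFrame used to fit scm):

# Refit the model on random subsets of the data and compute arrow strengths
# each time; returns median strengths plus bootstrap confidence intervals.
median_strengths, intervals = gcm.confidence_intervals(
    gcm.fit_and_compute(gcm.arrow_strength, scm,
                        bootstrap_training_data=data,
                        target_node='Growth'))

If the intervals of two edges overlap heavily, their relative ranking is not meaningful, which would explain the instability you are seeing.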

ankur-tutlani (Author) commented

Thanks for sharing the link; this is helpful.
Does the library have any recommendations on the following?

  1. If the causal graph structure is not very certain: the "auto" option takes care of assigning causal mechanisms, but is there anything similar for the graph itself?
  2. What are the recommendations for improving this if we get, say, the following evaluation result?

The overall average KL divergence between the generated and observed distribution is 0.6444021604490836
The estimated KL divergence indicates a good representation of the data distribution, but might indicate some smaller mismatches between the distributions.
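(For reference, a summary like this comes from the model evaluation routine; a minimal sketch, assuming the scm and training data from my first comment:)

# Prints a summary that includes the average KL divergence between
# the generated and observed distributions.
print(gcm.evaluate_causal_model(scm, data))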

github-actions bot commented Jan 26, 2024

This issue is stale because it has been open for 14 days with no activity.
bloebp (Member) commented Jan 26, 2024

Sorry for the late reply!

1. If the causal graph structure is not very certain: the "auto" option takes care of assigning causal mechanisms, but is there anything similar for the graph itself?

You can take a look at https://github.com/py-why/causal-learn, a package for inferring the causal graph from data.
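For example, a minimal sketch using the PC algorithm from causal-learn (data stands for your observations as a pandas DataFrame; the choice of algorithm here is just an illustration):

from causallearn.search.ConstraintBased.PC import pc

# Run the PC algorithm on the raw observations to get a candidate graph,
# which you can then inspect and refine before handing it to DoWhy.
causal_graph = pc(data.to_numpy())
print(causal_graph.G)  # the estimated graph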

2. What are the recommendations for improving this if we get, say, the following evaluation result?

You could try setting the quality parameter in the auto assignment function to BETTER (see the docstring of that function). Let me know if this improves the results (i.e., gives a lower KL divergence). Otherwise, you might need to manually check which causal mechanisms can be improved; the per-node performance results may give some insight.
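For example, a minimal sketch (assuming the scm and data from before):

# BETTER searches over a larger set of candidate models per node than the
# default quality; remember to refit after reassigning mechanisms.
gcm.auto.assign_causal_mechanisms(scm, data,
                                  quality=gcm.auto.AssignmentQuality.BETTER,
                                  override_models=True)  # replace previously assigned mechanisms
gcm.fit(scm, data)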

github-actions bot commented Feb 10, 2024

This issue is stale because it has been open for 14 days with no activity.
github-actions bot commented Feb 18, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot closed this as not planned on Feb 18, 2024.