disparate impact remover has no effect on disparate impact metric #547

Open
JorritMontijn opened this issue Jan 21, 2025 · 0 comments

First of all, thank you for all the work that has gone into this package! I hope you can help me with the following problem. I'm using the R interface, and after some initial trouble getting it set up (the default installation pulls in incompatible versions of Python and TensorFlow), I can now access the AIF360 functions. However, either the documentation is unclear about how to use it, or the function disparate_impact_remover is broken.

In the following code block I repair the data at two levels, but the disparate impact metric for the fully repaired data (repair_level = 1.0) is identical to the one for the unrepaired data (repair_level = 0):

load_aif360_lib()
ad <- adult_dataset()
p <- list("race", 1)
u <- list("race", 0)

# subselect a few columns from the underlying data frame
pd_conv = ad$convert_to_dataframe()
data0 = pd_conv[[1]]
data0sub = data0[,c('race','age','sex','income-per-year')]

# turn the subset back into an AIF360 dataset
aif_df = binary_label_dataset(
  data_path = data0sub,
  favor_label=1, unfavor_label=0, 
  unprivileged_protected_attribute=1, 
  privileged_protected_attribute=0,
  target_column='income-per-year', protected_attribute='race')

# repair: full repair (1.0) vs. no repair (0)
di1 <- disparate_impact_remover(repair_level = 1.0, sensitive_attribute = "race")
rp1 <- di1$fit_transform(aif_df)

di2 <- disparate_impact_remover(repair_level = 0, sensitive_attribute = "race")
rp2 <- di2$fit_transform(aif_df)


# disparate impact on the fully repaired data
bm1 = binary_label_dataset_metric(rp1, list('race', 1), list('race',0))
fl_disparate_impact1 = bm1$disparate_impact()


# disparate impact on the unrepaired data
bm2 = binary_label_dataset_metric(rp2, list('race', 1), list('race',0))
fl_disparate_impact2 = bm2$disparate_impact()

> fl_disparate_impact1
[1] 0.6037688
> fl_disparate_impact2
[1] 0.6037688

Note that the subselection isn't strictly necessary; I included it to rule out an error in converting between R data frames and the AIF360 format, since I first noticed this problem in my own data set.

So my question is: am I doing something wrong, or are these functions broken?

Thank you in advance!
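
A possible explanation, offered as a sketch rather than a confirmed diagnosis: as far as I understand the AIF360 documentation, the disparate impact remover edits feature values only and leaves the labels untouched, while the dataset-level disparate impact metric is the ratio of favorable-label rates between the unprivileged and privileged groups. If that reading is right, the metric cannot change under repair at any level. The base R snippet below illustrates the idea with made-up toy data; the disparate_impact helper, the toy columns, and the rank-based "repair" are hypothetical stand-ins, not AIF360 functions.

```r
# Disparate impact as a ratio of favorable-outcome rates:
# P(label == favorable | unprivileged) / P(label == favorable | privileged)
# (hypothetical helper written for this sketch, not part of AIF360)
disparate_impact <- function(labels, group, privileged, unprivileged, favorable = 1) {
  rate_unpriv <- mean(labels[group == unprivileged] == favorable)
  rate_priv   <- mean(labels[group == privileged] == favorable)
  rate_unpriv / rate_priv
}

# Toy data, made up for illustration (not the Adult dataset)
toy <- data.frame(
  race   = c(1, 1, 1, 1, 0, 0, 0, 0),
  age    = c(25, 38, 47, 52, 23, 31, 44, 60),
  income = c(1, 1, 1, 0, 1, 0, 0, 0)
)

di_before <- disparate_impact(toy$income, toy$race, privileged = 1, unprivileged = 0)

# Stand-in for a feature repair: change a feature column, leave the labels alone
repaired <- toy
repaired$age <- rank(toy$age)

di_after <- disparate_impact(repaired$income, repaired$race, privileged = 1, unprivileged = 0)

# The metric only reads labels and group membership, so it is unchanged
identical(di_before, di_after)
```

Under that reading, any effect of the repair would show up in the feature columns of rp1 versus rp2, or in the disparate impact of predictions from a classifier trained on the repaired features, rather than in the dataset-level metric computed above.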
