Feature request: Diverging stacked bar #3823

Open
Ownezx opened this issue Jan 29, 2025 · 9 comments

@Ownezx

Ownezx commented Jan 29, 2025

Hi, I'm currently analysing Likert data, and it's common practice to present the responses as a diverging stacked bar chart.

Since there is no implementation of this in the library, I wanted to propose it as an enhancement.

Expected problems: an even and an odd number of bins have to be centered differently (at least in my experience with MATLAB and its bar charts).

Here is an example of a diverging stacked bar chart I made for data analysis.
[Image: the author's diverging stacked bar chart]

@thuiop
Contributor

thuiop commented Jan 29, 2025

This can likely be achieved in some form through the objects interface, but I am confused as to what the data looks like exactly. Could you provide an example of the raw data?

@Ownezx
Author

Ownezx commented Jan 30, 2025

In my case, using Microsoft Forms, the data is a list of categorical strings: "Not intuitive, slightly not intuitive, neutral, slightly intuitive, intuitive".
The diverging stacked bar chart is then built from the count in each category.

The idea behind a diverging stacked bar chart is to show responses on a spectrum while keeping them aligned on the central position.

Here is an example pandas DataFrame holding such raw data:

import pandas as pd

dataFrame = pd.DataFrame()

# In this case 5 categories
dataFrame["Intuitive"] = [
    "Very Intuitive",
    "Very Intuitive",
    "Very Intuitive",
    "Very Intuitive",
    "Very Intuitive",
    "Slightly Intuitive",
    "Slightly Intuitive",
    "Slightly Intuitive",
    "Slightly Intuitive",
    "Neutral",
    "Neutral",
    "Neutral",
    "Slightly Unintuitive",
    "Slightly Unintuitive",
    "Very Unintuitive",
]
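Before plotting, these strings have to be turned into counts in scale order. Below is a minimal, dependency-free sketch using `collections.Counter`; the `RANKS` order and the `ordered_counts` name are assumptions for illustration, not part of any library. In pandas, `dataFrame["Intuitive"].value_counts().reindex(RANKS, fill_value=0)` should give the same counts.

```python
from collections import Counter

# Scale order from most negative to most positive (assumed for this sketch).
RANKS = [
    "Very Unintuitive",
    "Slightly Unintuitive",
    "Neutral",
    "Slightly Intuitive",
    "Very Intuitive",
]


def ordered_counts(responses, ranks=RANKS):
    """Count responses per scale level, in scale order, with 0 for unused levels."""
    tally = Counter(responses)
    # Reject labels that are not on the scale (e.g. strings accidentally
    # concatenated by a missing comma in a list literal).
    unknown = set(tally) - set(ranks)
    if unknown:
        raise ValueError(f"responses not on the scale: {sorted(unknown)}")
    return [tally[level] for level in ranks]
```

With five "Very Intuitive", four "Slightly Intuitive", three "Neutral", two "Slightly Unintuitive" and one "Very Unintuitive" response, this returns `[1, 2, 3, 4, 5]`, the same counted and ordered values used as `y` in the matplotlib examples in this thread.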

@thuiop
Contributor

thuiop commented Jan 30, 2025

Ah, got it: so grey/Neutral is centered on 0 and the other responses are stacked to the left or right, correct? I will see if I can replicate that in a not-too-contrived way; it does seem like a pretty specific use case, though.

@Ownezx
Author

Ownezx commented Jan 30, 2025

That is correct.

Ideally, the x axis would be positive on both sides, since both sides are counts.

So far I have not found an easy way to do it without significant fiddling, at least in MATLAB.

Here is a small example (it turns out to be much simpler in matplotlib than in MATLAB):

# importing package
import matplotlib.pyplot as plt

# create data
label = ["test"]
# Counted and ordered values
y = [1, 2, 3, 4, 5]

# Stack the bars so that the middle (neutral) bar is centered on 0
plt.bar(label, y[0], bottom=-y[0]-y[1]-y[2]/2)
plt.bar(label, y[1], bottom=-y[1]-y[2]/2)
plt.bar(label, y[2], bottom=-y[2]/2)
plt.bar(label, y[3], bottom=y[2]/2)
plt.bar(label, y[4], bottom=y[2]/2+y[3])
plt.show()

@Ownezx
Author

Ownezx commented Jan 30, 2025

Sometimes you also want to force respondents not to be neutral, which is why odd and even numbers of options are a slight difficulty for the implementation: the bars are not centered the same way. In that case the options would be:
"Very unintuitive, Unintuitive, slightly not intuitive, slightly intuitive, intuitive, very intuitive"

The center in that case would be on the point between "slightly not intuitive" and "slightly intuitive".

EDIT: Corresponding example:

# importing package
import matplotlib.pyplot as plt

# create data
label = ["test"]
# Counted and ordered values
y = [1, 2, 3, 4, 5, 6]

# Stack the bars so that 0 falls on the boundary between the two middle bars
plt.bar(label, y[0], bottom=-y[0]-y[1]-y[2])
plt.bar(label, y[1], bottom=-y[1]-y[2])
plt.bar(label, y[2], bottom=-y[2])
plt.bar(label, y[3], bottom=0)
plt.bar(label, y[4], bottom=y[3])
plt.bar(label, y[5], bottom=y[3]+y[4])
plt.show()
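Both examples follow the same rule, so the bottoms can be computed for any number of categories instead of being written out by hand. A minimal sketch, assuming the counts are already ordered from most negative to most positive; `diverging_bottoms` is a hypothetical helper, not an existing library function:

```python
from itertools import accumulate


def diverging_bottoms(counts):
    """Return the bottom (start) of each stacked segment so that the centre
    of the scale sits at 0.

    Odd number of categories: the middle (neutral) segment is centered on 0.
    Even number of categories: 0 is the boundary between the two halves.
    """
    n = len(counts)
    # Start of each segment if the bars were simply stacked from 0.
    starts = [0, *accumulate(counts)][:n]
    # Total of the categories strictly below the middle.
    half = sum(counts[: n // 2])
    middle = half + counts[n // 2] / 2 if n % 2 else half
    return [start - middle for start in starts]
```

Each example then reduces to a loop such as `for value, bottom in zip(y, diverging_bottoms(y)): plt.bar(label, value, bottom=bottom)`. For `y = [1, 2, 3, 4, 5]` this gives the same bottoms as the first example (`-4.5, -3.5, -1.5, 1.5, 5.5`), and for `y = [1, 2, 3, 4, 5, 6]` the same as the second (`-6, -5, -3, 0, 4, 9`).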

@Ownezx
Author

Ownezx commented Jan 30, 2025

As further examples of such graphs, see population pyramids with 2 or 4 values: https://en.wikipedia.org/wiki/Population_pyramid
or political leaning by age with 2 or 4 values: https://www.pewresearch.org/politics/2024/04/09/age-generational-cohorts-and-party-identification/

@thuiop
Contributor

thuiop commented Jan 30, 2025

OK, you can find the result below. I had to create a custom Move class, which seaborn does not officially support yet, so this might break in the future. But hey, it works. In a real context the class would of course live in another module; the second group of imports is only needed to define the DivergingStack class. You also need a little pandas manipulation to get the counts into the shape my implementation expects, but it is quite manageable.

[Image: resulting diverging stacked bar plot]

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn.objects as so

from dataclasses import dataclass
from functools import partial
from pandas import DataFrame
from seaborn._core.groupby import GroupBy
from seaborn._core.moves import Move
from seaborn._core.scales import Scale

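@dataclass
class DivergingStack(Move):
    """Stack bars like so.Stack, then shift each stack so the centre of the scale sits at 0."""

    def _stack(self, df, orient, order=None):
        # Reorder the rows to follow the semantic order (e.g. the color order given by `ranks`)
        df = GroupBy(order).apply(df, lambda x: x)
        if df["baseline"].nunique() > 1:
            err = "Stack move cannot be used when baselines are already heterogeneous"
            raise RuntimeError(err)

        other = {"x": "y", "y": "x"}[orient]
        # End of each segment, and start of each segment, when stacked from 0
        stacked_lengths = (df[other] - df["baseline"]).dropna().cumsum()
        offsets = stacked_lengths.shift(1).fillna(0)

        if len(df) % 2 == 0:
            # Even number of levels: center on the boundary between the two halves
            middle = stacked_lengths[len(df) // 2 - 1]
        else:
            # Odd number of levels: center on the midpoint of the central segment
            middle = (stacked_lengths[len(df) // 2 - 1] + stacked_lengths[len(df)//2]) / 2

        df[other] = stacked_lengths - middle
        df["baseline"] = df["baseline"] + offsets - middle

        return df

    def __call__(
        self, data: DataFrame, groupby: GroupBy, orient: str, scales: dict[str, Scale],
    ) -> DataFrame:

        # Stack separately within each facet and at each position on the categorical axis
        groupers = ["col", "row", orient]
        return GroupBy(groupers).apply(data, partial(self._stack, order=groupby.order), orient)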


ranks = ["Very Unintuitive", "Slightly Unintuitive", "Neutral", "Slightly Intuitive", "Very Intuitive"]
colors = ["red", "indianred", "grey", "limegreen", "green"]

# Random example responses for five groups
df = pd.DataFrame({
    "Intuitive": np.random.choice(ranks, size=500),
    "category": np.random.choice(["A", "B", "C", "D", "E"], size=500),
})
# Count per (category, answer) pair; the index levels "category" and "Intuitive" are used as plot variables below
grouped_df = df.groupby(["category", "Intuitive"]).size().to_frame(name="count")

fig, ax = plt.subplots()
p = (
    so.Plot(data=grouped_df, x="count", y="category", color="Intuitive")
    .add(so.Bar(edgewidth=0), DivergingStack())
    # `order` puts the levels in scale order, which DivergingStack relies on for centering
    .scale(color=so.Nominal(values=colors, order=ranks))
)
p.on(ax).plot()
plt.show()
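Because DivergingStack builds on seaborn's private `seaborn._core` modules, here is a hedged alternative sketch that computes the same centering directly in pandas, so the result can be drawn with plain matplotlib. It assumes a long-format table with columns named `category`, `response` and `count` (hypothetical names; `grouped_df.reset_index()` above would call the response column `Intuitive`), at most one row per pair, and a `ranks` list ordered from most negative to most positive. `diverging_offsets` is a hypothetical helper, not a seaborn or pandas function.

```python
import pandas as pd


def diverging_offsets(counts: pd.DataFrame, ranks: list[str]) -> pd.DataFrame:
    """Return one row per (category, response), in scale order, with a 'left'
    column giving where each segment starts so the scale's centre sits at 0."""
    # Fill in missing (category, response) pairs with 0 and sort into scale order.
    full = pd.MultiIndex.from_product(
        [counts["category"].unique(), ranks], names=["category", "response"]
    )
    out = (
        counts.set_index(["category", "response"])["count"]
        .reindex(full, fill_value=0)
        .reset_index()
    )

    def centre(values):
        # Odd number of levels: midpoint of the central segment.
        # Even number of levels: boundary between the two halves.
        n = len(values)
        half = values[: n // 2].sum()
        return half + values[n // 2] / 2 if n % 2 else half

    by_cat = out.groupby("category", sort=False)["count"]
    # Start of each segment when stacked from 0 within its category.
    start = by_cat.cumsum() - out["count"]
    # Per-category shift, broadcast back to every row of that category.
    middle = by_cat.transform(lambda s: centre(s.to_numpy()))
    out["left"] = start - middle
    return out
```

The result could then be drawn with something like `plt.barh(out["category"], out["count"], left=out["left"], color=out["response"].map(dict(zip(ranks, colors))).tolist())`, at the cost of building the legend by hand.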

@Ownezx
Author

Ownezx commented Jan 30, 2025

Oh wow, that is really nice work!

Here are a few things I can think of that could improve it:

  • Integration with seaborn's color palettes (I haven't used them yet, so I'm unsure how easy that is to do)
  • Having a legend with each category
  • Having a positive axis on both sides, which is probably achievable by setting each x tick label to the absolute value of its tick.
    This might be good to have only as an option.
  • Being able to group by a third variable, much like hue in a boxplot, with a color palette for each variable (a bit like my first example).
    This is probably too specific and not worth implementing.

If you need help testing or documenting things at some point, I'll gladly help.

@thuiop
Contributor

thuiop commented Jan 30, 2025

  • This is up to the user; you can specify whatever you like in the color scale. Just set colors to something like sns.diverging_palette(220, 20, n=len(ranks)) (do not forget to import seaborn as sns). You would probably need to tinker with the colors; see the documentation for that.
  • The plot generates one automatically; in the figure I posted earlier it is outside the plot itself (it is actually the grey thing on the right), but depending on how you make the plot you can put it somewhere else (although positioning it precisely is still a rough point of the objects interface; see e.g. Plot legend needs more customizability #2994)
  • Yes, you would need to manipulate the tick labels; something like ax.xaxis.set_major_formatter(lambda x, pos: str(abs(x))) should do the trick
  • This is a bit of a pain; you would probably need to stitch several axes together to get something like your original figure. I could do it if I really needed to, but I am not sure it is worth the effort.
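A small variant of the tick formatter mentioned above, as a sketch: tick locations are floats, so `str(abs(x))` prints labels such as `10.0`, while the `g` format drops the trailing `.0`. The name `abs_tick_label` is hypothetical.

```python
def abs_tick_label(x, pos=None):
    """Tick label showing only the magnitude, so both sides read as counts.

    `pos` is the tick index matplotlib passes to formatter functions; unused here.
    """
    # ":g" prints 10.0 as "10" and 2.5 as "2.5"; abs() also turns -0.0 into 0.0.
    return f"{abs(x):g}"
```

It would be attached after plotting with `ax.xaxis.set_major_formatter(abs_tick_label)`; matplotlib 3.3 and later wrap a plain function passed there in a `FuncFormatter`.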

In any case, I did the heavy lifting here; I will leave the details to you.
