Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERROR: MethodError: no method matching JointOrderStatistics(::DiscreteNonParametric{Float64, Float64, Vector{Float64}, Vector{Float64}}, ::Int64) #1839

Open
bvancil opened this issue Feb 20, 2024 · 3 comments

Comments

@bvancil
Copy link

bvancil commented Feb 20, 2024

Hello,

I was blithely trying to construct JointOrderStatistics for a discrete univariate distribution, ignoring the very clear documentation that only continuous univariate distributions are supported, and found that it hadn't been defined. Specifically, I'm trying to create such a distribution for the DiscreteNonParametric type of distribution. Is there any interest in building out an implementation for discrete distributions on sets with an order? I would be willing to give it a try.

Code to replicate error:

using Random

using Distributions

function generate_distribution(rng, n = 100)
    # How spiky?
    Spikiness = Uniform(1, 10)
    spikiness = rand(rng, Spikiness)
    unnormalized_weights = rand(rng, Uniform(), n + 1) .^ spikiness
    xs = collect(range(0, 1, n + 1))
    ps = unnormalized_weights ./ sum(unnormalized_weights)
    DiscreteNonParametric(xs, ps)
end

rng = MersenneTwister(20240218)
n = 3
MyDist = generate_distribution(rng)
μ = mean(MyDist)
θ₂ = var(MyDist)
γ₁ = skewness(MyDist)
κ = kurtosis(MyDist)
S = entropy(MyDist)

SampleDist = JointOrderStatistics(MyDist, n)
@sethaxen
Copy link
Contributor

I originally didn't implement JointOrderStatistics for discrete distributions because its pmf looked really complicated (from https://doi.org/10.1137/1.9780898719062.ch3):
image

But on a closer look, $r_s - r_{s-1}$ is just the count of $x_s$ in $x$, so it's pretty simple if ranks is 1:n:

function logpdf(d::JointOrderStatistics{<:DiscreteUnivariateDistribution}, x::AbstractVector{<:Real})
    n = d.n
    ranks = d.ranks
    @assert ranks == 1:n

    # below is equivalent to the following but faster since x should be sorted
    # sum(StatsBase.countmap(x)) do (x_i, c_i)
    #     c_i * logpdf(d, x_i) - loggamma(c_i + 1)
    # end + loggamma(n + 1)

    x_current = first(x)
    count_current = 1
    lp_current = logpdf(d.dist, x_current)
    T = typeof(lp_current)
    lp = zero(lp_current) + loggamma(T(n + 1))
    for i in Iterators.drop(eachindex(x), 1)
        x_i = x[i]
        if x_i == x_current
            count_current += 1
        elseif x_i < x_current || !insupport(d.dist, x_i)
            return oftype(lp, -Inf)
        else
            lp += count_current * lp_current - loggamma(T(count_current + 1))
            x_current = x_i
            count_current = 1
            lp_current = logpdf(d.dist, x_i)
        end
    end
    lp += count_current * lp_current - loggamma(T(count_current + 1))
    return lp
end

I'm not the best way to support arbitrary subsets of ranks. I suspect it would require being able to enumerate all elements in the support of the wrapped distribution between two bounds.

If you'd like to put this together into a PR or see a better way to go about it, I'd be happy to review!

@bvancil
Copy link
Author

bvancil commented Feb 20, 2024

Thanks for your input. Knowing that you're open to a PR, I'd like to try my hand at it.

@sethaxen
Copy link
Contributor

Just dropping here that while logpdf is nontrivial for discrete univariate distributions in general, the mean and variance of joint order statistics for discrete distributions with finite support are trivial. The reverse is true for continuous distributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants