Dirichlet Distribution Models

The Dirichlet distribution model is parameterised by a vector of candidate qualities, with one quality score per candidate. To sample a ranking, the quality scores are used as the parameters of a Dirichlet distribution, from which a number of points is drawn for each candidate. The ranking then lists the candidates by decreasing number of points.

didi(num_voters: int, num_candidates: int, alphas: list[float], seed: int = None) → list[list[int]]

Generates ordinal votes from the DiDi (Dirichlet Distribution) model.

This model is parameterised by a vector alphas intuitively indicating a quality for each candidate. Moreover, the higher the sum of the alphas, the more correlated the votes are (the more concentrated the Dirichlet distribution is). To sample a vote, we sample a set of points—one per candidate—from a Dirichlet distribution parameterised by alphas. The vote then corresponds to the candidates ordered by decreasing order of points.

A collection of num_voters votes is generated independently and identically following the process described above.
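The sampling process described above can be sketched in a few lines of NumPy. This is only an illustrative re-implementation under stated assumptions, not the library's actual code: the names didi_sketch and share_of_ranking are hypothetical, and the example alphas are chosen purely for demonstration. It also compares two alpha vectors with the same proportions but different sums, to illustrate how a larger sum concentrates the votes.

```python
import numpy as np


def didi_sketch(num_voters, alphas, seed=None):
    """Hypothetical sketch of the DiDi sampling process (not the library code)."""
    rng = np.random.default_rng(seed)
    # One Dirichlet draw per voter: array of shape (num_voters, num_candidates).
    points = rng.dirichlet(alphas, size=num_voters)
    # Each vote lists the candidates by decreasing number of points.
    return np.argsort(-points, axis=1).tolist()


def share_of_ranking(votes, ranking):
    """Fraction of the votes that are exactly equal to the given ranking."""
    return sum(vote == ranking for vote in votes) / len(votes)


# Same proportions between qualities, but very different sums of alphas.
low_sum_votes = didi_sketch(2000, [0.5, 0.2, 0.1], seed=0)
high_sum_votes = didi_sketch(2000, [50, 20, 10], seed=0)

share_low = share_of_ranking(low_sum_votes, [0, 1, 2])
share_high = share_of_ranking(high_sum_votes, [0, 1, 2])
print(f"share of 0 > 1 > 2 with small alphas: {share_low:.3f}")
print(f"share of 0 > 1 > 2 with large alphas: {share_high:.3f}")
```

With the larger alphas, a much greater share of the votes should coincide with the ranking induced by the qualities.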

This model is very similar in spirit to the plackett_luce() model.

Parameters:

num_voters (int) – Number of Voters.
num_candidates (int) – Number of Candidates.
alphas (list[float]) – List of model parameters (qualities), one strictly positive value per candidate.
seed (int, default: None) – Seed for numpy random number generator.

Returns:

Ordinal votes.

Return type:

list[list[int]]

Examples

from prefsampling.ordinal import didi

# Sample from a DiDi model with 2 voters and 3 candidates, the qualities of
# candidates are 0.5, 0.2, and 0.1.
didi(2, 3, (0.5, 0.2, 0.1))

# For reproducibility, you can set the seed.
didi(2, 3, (5, 2, 0.1), seed=1002)

# Don't forget to provide a quality for all candidates
try:
    didi(2, 3, (0.5, 0.2))
except ValueError:
    pass

# And all quality scores need to be strictly positive
try:
    didi(2, 3, (0.5, 0.2, -0.4))
except ValueError:
    pass
try:
    didi(2, 3, (0.5, 0.2, 0))
except ValueError:
    pass

Validation

The probability distribution guiding the DiDi model is not known in general. Since it depends on the order of the values in a Dirichlet sample, the general computation is involved. Still, we can check some special cases.

First, when all qualities are the same, we are supposed to obtain a uniform distribution over all rankings.

Observed versus theoretical frequencies for a DiDi model with alpha=[0.1, 0.1, 0.1, 0.1, 0.1]
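As a rough illustration of this uniformity check, the sketch below draws Dirichlet samples directly with NumPy and tallies the resulting rankings. It is not the validation code used to produce the figures: the choice of 3 candidates, the common quality of 1.0, the sample size, and the seed are all assumptions made for the example.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(42)
num_samples = 60_000

# Equal qualities for 3 candidates (values assumed for this illustration).
points = rng.dirichlet([1.0, 1.0, 1.0], size=num_samples)
# Rank the candidates by decreasing number of points in each sample.
rankings = np.argsort(-points, axis=1)

counts = Counter(map(tuple, rankings.tolist()))
frequencies = {r: c / num_samples for r, c in sorted(counts.items())}

# With equal qualities, each of the 3! = 6 rankings should be close to 1/6.
for ranking, frequency in frequencies.items():
    print(ranking, round(frequency, 3))
```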

Second, in the special case of 2 candidates, the probability distribution of the model has a simple expression. Assume we have two candidates with qualities \alpha_0 and \alpha_1. The probability of observing the ranking 0 \succ 1 is the probability that two values x_0, x_1 sampled from a Dirichlet distribution with parameters \alpha_0 and \alpha_1 satisfy x_0 > x_1. Since x_1 = 1 - x_0, this is the event x_0 > 0.5, and x_0 follows a Beta distribution with parameters \alpha_0 and \alpha_1. We thus have:

\mathbb{P}(x_0 > x_1) = \mathbb{P}(x_0 > 0.5) = \frac{1}{B(\alpha_0, \alpha_1)} \int_{0.5}^1 x_0^{\alpha_0 - 1} (1 - x_0)^{\alpha_1 - 1} dx_0,

where B(\alpha_0, \alpha_1) is the Beta function, which normalises the density.

We can compute an approximate value of this integral using scipy.
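One possible way to do this, shown here as a sketch rather than as the code behind the figures, relies on the fact that for two candidates x_0 follows a Beta distribution with parameters \alpha_0 and \alpha_1. The probability of 0 \succ 1 is then the Beta survival function evaluated at 0.5. The helper name prob_first_beats_second is hypothetical.

```python
from scipy.stats import beta


def prob_first_beats_second(alpha_0, alpha_1):
    """P(x_0 > 0.5) when x_0 follows a Beta(alpha_0, alpha_1) distribution."""
    # sf is the survival function, i.e. 1 - cdf.
    return float(beta.sf(0.5, alpha_0, alpha_1))


p_equal = prob_first_beats_second(0.1, 0.1)
p_skewed = prob_first_beats_second(1, 0.3)
print(p_equal, p_skewed)
```

By symmetry, equal qualities give a probability of 0.5. When \alpha_0 = 1 the integral has the closed form 0.5^{\alpha_1}, which offers a quick sanity check for the second value.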

Observed versus theoretical frequencies for a DiDi model with alpha=[0.1, 0.1]

Observed versus theoretical frequencies for a DiDi model with alpha=[1, 0.3]

In the general case, we obtain the following frequencies.

Observed versus theoretical frequencies for a DiDi model with alpha=[0.2, 0.5, 0.3, 0.7, 0.2]

Observed versus theoretical frequencies for a DiDi model with alpha=[1, 0.3, 0.3, 0.3, 0.3]

References

The DiDi model has not been referenced in any publication. Stanisław Szufa introduced it out of curiosity.

See the Wikipedia page of the Dirichlet distribution for more details.
