Dirichlet Distribution Models

The Dirichlet distribution model is parameterised by a vector of candidate qualities, with one quality score per candidate. When sampling a ranking, the quality scores are used to sample a number of points for each candidate (using a Dirichlet distribution). The ranking then corresponds to the candidates ordered by decreasing number of points.
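As a small illustration of the last step, the snippet below turns a hypothetical vector of points (made-up values standing in for one Dirichlet draw) into a ranking with NumPy. It is only a sketch of the idea, not the prefsampling implementation.

```python
import numpy as np

# Hypothetical points for candidates 0, 1 and 2 (illustrative values that
# sum to 1, as a Dirichlet draw would).
points = np.array([0.2, 0.5, 0.3])

# argsort sorts in ascending order, so sorting the negated points gives the
# candidates by decreasing number of points.
ranking = np.argsort(-points)
print(ranking)  # [1 2 0]
```

Candidate 1 has the most points, then candidate 2, then candidate 0, so the ranking is 1 \succ 2 \succ 0.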

didi(num_voters: int, num_candidates: int, alphas: list[float], seed: int = None) → ndarray

Generates ordinal votes from the DiDi (Dirichlet Distribution) model.

This model is parameterised by a vector alphas, intuitively indicating a quality for each candidate. Moreover, the higher the sum of the alphas, the more correlated the votes are (the more concentrated the Dirichlet distribution is around its mean, the alphas divided by their sum). To sample a vote, we sample a set of points, one per candidate, from a Dirichlet distribution parameterised by alphas. The vote then lists the candidates in decreasing order of points.
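The effect of the sum of the alphas can be checked empirically. The sketch below uses NumPy's `Generator.dirichlet` directly (not prefsampling), scales the qualities (0.5, 0.2, 0.1) by 1, 10 and 100, and measures the share of votes equal to the ranking 0 \succ 1 \succ 2, i.e. the candidates by decreasing quality. The helper name `share_matching_quality_order` is hypothetical.

```python
import numpy as np


def share_matching_quality_order(alphas, num_votes, rng):
    """Fraction of sampled votes equal to the candidates sorted by decreasing alpha."""
    alphas = np.asarray(alphas, dtype=float)
    points = rng.dirichlet(alphas, size=num_votes)  # shape (num_votes, num_candidates)
    votes = np.argsort(-points, axis=1)             # one ranking per row
    reference = np.argsort(-alphas)                 # candidates by decreasing quality
    return np.mean(np.all(votes == reference, axis=1))


rng = np.random.default_rng(42)
base = np.array([0.5, 0.2, 0.1])
shares = {}
for scale in (1, 10, 100):
    shares[scale] = share_matching_quality_order(scale * base, 20_000, rng)
    print(f"sum of alphas = {scale * base.sum():5.1f}: share of 0 > 1 > 2 = {shares[scale]:.3f}")
```

The share should grow with the scale, since the Dirichlet draws concentrate around the normalized qualities (0.625, 0.25, 0.125).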

A collection of num_voters votes is generated independently and identically following the process described above.
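Putting the description together, a minimal NumPy sketch of the generation process could look as follows. The function name `didi_sketch` is hypothetical: it mirrors the documented signature and the input checks illustrated in the examples, but it is not the prefsampling source and may differ from it (for instance in how the random generator is seeded).

```python
import numpy as np


def didi_sketch(num_voters, num_candidates, alphas, seed=None):
    """Hypothetical re-implementation of the DiDi sampling process (not the library code)."""
    alphas = np.asarray(alphas, dtype=float)
    # Same checks as in the examples: one quality per candidate, all strictly positive.
    if alphas.shape != (num_candidates,):
        raise ValueError("alphas must contain exactly one value per candidate")
    if np.any(alphas <= 0):
        raise ValueError("all alphas must be strictly positive")
    rng = np.random.default_rng(seed)
    # One independent Dirichlet draw per voter: shape (num_voters, num_candidates).
    points = rng.dirichlet(alphas, size=num_voters)
    # Each row becomes a vote: candidates sorted by decreasing number of points.
    return np.argsort(-points, axis=1)


votes = didi_sketch(4, 3, [0.5, 0.2, 0.1], seed=1002)
print(votes)
```

Each row of the returned array is one vote, listing candidate indices from most to least preferred.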

This model is very similar in spirit to the plackett_luce() model.

Parameters:
  • num_voters (int) – Number of voters.

  • num_candidates (int) – Number of candidates.

  • alphas (list[float]) – List of model parameters, one strictly positive value per candidate.

  • seed (int) – Seed for the NumPy random number generator.

Returns:

Ordinal votes.

Return type:

np.ndarray

Examples

from prefsampling.ordinal import didi

# Sample from a DiDi model with 2 voters and 3 candidates, the qualities of
# candidates are 0.5, 0.2, and 0.1.
didi(2, 3, (0.5, 0.2, 0.1))

# For reproducibility, you can set the seed.
didi(2, 3, (5, 2, 0.1), seed=1002)

# Don't forget to provide a quality for all candidates
try:
    didi(2, 3, (0.5, 0.2))
except ValueError:
    pass

# And all quality scores need to be strictly positive
try:
    didi(2, 3, (0.5, 0.2, -0.4))
except ValueError:
    pass
try:
    didi(2, 3, (0.5, 0.2, 0))
except ValueError:
    pass

Validation

The probability distribution guiding the DiDi model is not known in general. Since it depends on the order of the values in a Dirichlet sample, the general computation is involved. Still, we can check some special cases.
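One standard fact can simplify such computations: a Dirichlet sample can be obtained by drawing independent Gamma(\alpha_i, 1) variables and dividing them by their sum. Dividing by a positive sum does not change their order, so the ranking produced by DiDi has the same distribution as the ordering of independent Gamma(\alpha_i, 1) variables. The NumPy sketch below checks this with alpha=[0.2, 0.5, 0.3, 0.7, 0.2]; it is an illustration, not part of prefsampling.

```python
import numpy as np

rng = np.random.default_rng(0)
alphas = np.array([0.2, 0.5, 0.3, 0.7, 0.2])

# Independent Gamma(alpha_i, 1) draws, one row per vote.
gammas = rng.gamma(shape=alphas, scale=1.0, size=(10_000, alphas.size))
# Normalizing each row gives a Dirichlet(alphas) sample.
points = gammas / gammas.sum(axis=1, keepdims=True)

# Dividing by a positive row sum preserves order, so both give the same rankings.
rank_from_points = np.argsort(-points, axis=1)
rank_from_gammas = np.argsort(-gammas, axis=1)
print(np.array_equal(rank_from_points, rank_from_gammas))  # True
```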

First, when all qualities are the same, we expect to obtain a uniform distribution over all rankings (by symmetry, the Dirichlet distribution is then invariant under any permutation of the candidates).
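A quick Monte Carlo check of this property could look like this. It uses NumPy directly rather than prefsampling, and three candidates (so 3! = 6 rankings) to keep the output short; the seed and the number of votes are arbitrary choices.

```python
import itertools
from collections import Counter

import numpy as np

rng = np.random.default_rng(7)
num_candidates, num_votes = 3, 60_000
alphas = np.full(num_candidates, 0.1)  # all qualities equal

points = rng.dirichlet(alphas, size=num_votes)
votes = np.argsort(-points, axis=1)

# Count how often each of the 3! = 6 rankings appears.
counts = Counter(tuple(vote.tolist()) for vote in votes)
frequencies = {
    ranking: counts[ranking] / num_votes
    for ranking in itertools.permutations(range(num_candidates))
}
for ranking, freq in frequencies.items():
    print(ranking, f"{freq:.4f}")  # each should be close to 1/6
```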

Observed versus theoretical frequencies for a DiDi model with alpha=[0.1, 0.1, 0.1, 0.1, 0.1]

Second, in the special case of 2 candidates, we can easily compute an expression for the probability distribution of the model. Assume we have two candidates with qualities \alpha_0 and \alpha_1. Then, the probability of observing the ranking 0 \succ 1 is the probability of sampling two values x_0, x_1 from a Dirichlet distribution with parameters \alpha_0 and \alpha_1 such that x_0 > x_1. We thus have:

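\mathbb{P}(x_0 > x_1) = \mathbb{P}(x_0 > 0.5) = \frac{1}{B(\alpha_0, \alpha_1)} \int_{0.5}^1 x_0^{\alpha_0 - 1} (1 - x_0)^{\alpha_1 - 1} \, dx_0,

where B(\alpha_0, \alpha_1) is the Beta function. With two candidates, x_1 = 1 - x_0 and x_0 follows a Beta(\alpha_0, \alpha_1) distribution, which gives both equalities.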

We can compute an approximate value of this integral using SciPy.
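A sketch of this computation is shown below, assuming SciPy and NumPy are available; the helper name `prob_0_before_1` is hypothetical. The integral of x_0^{\alpha_0 - 1} (1 - x_0)^{\alpha_1 - 1} over [0.5, 1] is normalized by the Beta function B(\alpha_0, \alpha_1) so that the density integrates to 1. The factor (1 - x_0)^{\alpha_1 - 1}, singular at x_0 = 1 when \alpha_1 < 1, is handed to `scipy.integrate.quad` as an algebraic weight (`weight="alg"`). The result is cross-checked against `scipy.stats.beta.sf` (the Beta survival function) and a Monte Carlo estimate.

```python
import numpy as np
from scipy import integrate, special, stats


def prob_0_before_1(alpha_0, alpha_1):
    """P(x_0 > x_1) for (x_0, x_1) ~ Dirichlet(alpha_0, alpha_1), by numerical integration."""
    # quad with weight="alg" and wvar=(0, alpha_1 - 1) integrates
    # f(x) * (x - 0.5)**0 * (1 - x)**(alpha_1 - 1) over [0.5, 1],
    # which handles the endpoint singularity at x = 1.
    value, _abs_err = integrate.quad(
        lambda x: x ** (alpha_0 - 1), 0.5, 1, weight="alg", wvar=(0, alpha_1 - 1)
    )
    # Normalize by the Beta function B(alpha_0, alpha_1).
    return value / special.beta(alpha_0, alpha_1)


rng = np.random.default_rng(3)
for a0, a1 in [(0.1, 0.1), (1, 0.3)]:
    theory = prob_0_before_1(a0, a1)
    closed_form = stats.beta.sf(0.5, a0, a1)  # same quantity, x_0 ~ Beta(a0, a1)
    points = rng.dirichlet([a0, a1], size=100_000)
    observed = np.mean(points[:, 0] > points[:, 1])
    print(f"alpha=[{a0}, {a1}]: integral={theory:.4f}, "
          f"beta.sf={closed_form:.4f}, observed={observed:.4f}")
```

For alpha=[1, 0.3] the exact value is 0.5^{0.3} \approx 0.812, and for alpha=[0.1, 0.1] it is 0.5 by symmetry.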

Observed versus theoretical frequencies for a DiDi model with alpha=[0.1, 0.1]

Observed versus theoretical frequencies for a DiDi model with alpha=[1, 0.3]

In the general case, we obtain the following frequencies.

Observed versus theoretical frequencies for a DiDi model with alpha=[0.2, 0.5, 0.3, 0.7, 0.2]

Observed versus theoretical frequencies for a DiDi model with alpha=[1, 0.3, 0.3, 0.3, 0.3]

References

The DiDi model has not been referenced in any publication. Stanisław Szufa introduced it out of curiosity.

See the Wikipedia page of the Dirichlet distribution for more details.