Source code for prefsampling.ordinal.didi

"""
The Dirichlet distribution (DiDi) model is parameterised by a vector of candidate qualities:
a quality score is associated with each candidate. When sampling a ranking, the quality scores
are used to sample a number of points for each candidate (using a Dirichlet distribution).
The ranking then corresponds to the candidates ordered by decreasing number of points.
"""

from __future__ import annotations

import numpy as np

from prefsampling.inputvalidators import validate_num_voters_candidates


@validate_num_voters_candidates
def didi(
    num_voters: int, num_candidates: int, alphas: list[float], seed: int = None
) -> list[list[int]]:
    """
    Generates ordinal votes from the DiDi (Dirichlet Distribution) model.

    This model is parameterised by a vector `alphas` intuitively indicating a quality for each
    candidate. Moreover, the higher the sum of the `alphas`, the more correlated the votes are
    (the more concentrated the Dirichlet distribution is).

    To sample a vote, we sample a set of points, one per candidate, from a Dirichlet
    distribution parameterised by `alphas`. The vote then corresponds to the candidates ordered
    by decreasing number of points.

    A collection of `num_voters` votes is generated independently and identically following the
    process described above.

    This model is very similar in spirit to the
    :py:func:`~prefsampling.ordinal.plackettluce.plackett_luce` model.

    Parameters
    ----------
        num_voters : int
            Number of voters.
        num_candidates : int
            Number of candidates.
        alphas : list[float]
            List of model parameters, one strictly positive value per candidate.
        seed : int, default: :code:`None`
            Seed for the numpy random number generator.

    Returns
    -------
        list[list[int]]
            Ordinal votes.

    Examples
    --------

        .. testcode::

            from prefsampling.ordinal import didi

            # Sample from a DiDi model with 2 voters and 3 candidates, the qualities of
            # candidates are 0.5, 0.2, and 0.1.
            didi(2, 3, (0.5, 0.2, 0.1))

            # For reproducibility, you can set the seed.
            didi(2, 3, (5, 2, 0.1), seed=1002)

            # Don't forget to provide a quality for all candidates
            try:
                didi(2, 3, (0.5, 0.2))
            except ValueError:
                pass

            # And all quality scores need to be strictly positive
            try:
                didi(2, 3, (0.5, 0.2, -0.4))
            except ValueError:
                pass
            try:
                didi(2, 3, (0.5, 0.2, 0))
            except ValueError:
                pass

    Validation
    ----------

        The probability distribution guiding the DiDi model is not known in general. Since it
        depends on the order of the values in a Dirichlet sample, the general computation is
        involved. Still, we can check some special cases.

        First, when all qualities are the same, we are supposed to obtain a uniform distribution
        over all rankings.

        .. image:: ../validation_plots/ordinal/didi__0_1_0_1_0_1_0_1_0_1_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[0.1, 0.1, 0.1, 0.1, 0.1]

        Second, in the special case of 2 candidates, we can easily compute an expression for the
        probability distribution of the model. Assume we have two candidates with qualities
        :math:`\\alpha_0` and :math:`\\alpha_1`. Then, the probability of observing the ranking
        :math:`0 \\succ 1` is the probability of sampling two values :math:`x_0`, :math:`x_1`
        from a Dirichlet distribution with parameters :math:`\\alpha_0` and :math:`\\alpha_1`
        such that :math:`x_0 > x_1`. Since :math:`x_1 = 1 - x_0`, the value :math:`x_0` follows
        a Beta distribution with parameters :math:`\\alpha_0` and :math:`\\alpha_1`. We thus
        have:

        .. math::

            \\mathbb{P}(x_0 > x_1) = \\mathbb{P}(x_0 > 0.5) =
            \\frac{1}{B(\\alpha_0, \\alpha_1)}
            \\int_{0.5}^1 x_0^{\\alpha_0 - 1} \\times (1 - x_0)^{\\alpha_1 - 1} dx_0,

        where :math:`B` is the beta function, the normalising constant of the density.
        We can compute an approximate value for this integral using scipy.

        .. image:: ../validation_plots/ordinal/didi__1_0_0_3_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[1, 0.3]

        .. image:: ../validation_plots/ordinal/didi__0_1_0_1_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[0.1, 0.1]

        In the general case, we obtain the following frequencies.

        .. image:: ../validation_plots/ordinal/didi__0_2_0_5_0_3_0_7_0_2_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[0.2, 0.5, 0.3, 0.7, 0.2]

        .. image:: ../validation_plots/ordinal/didi__1_0_0_3_0_3_0_3_0_3_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[1, 0.3, 0.3, 0.3, 0.3]

    References
    ----------

        The DiDi model has not been referenced in any publication. Stanisław Szufa introduced
        it out of curiosity.

        See the `Wikipedia page <https://en.wikipedia.org/wiki/Dirichlet_distribution>`_ of the
        Dirichlet distribution for more details.
    """
    if len(alphas) != num_candidates:
        raise ValueError(
            "Incorrect length of alphas vector. Should be equal to num_candidates."
        )
    if not all(a > 0 for a in alphas):
        raise ValueError(
            "The values of the alpha vector should all be strictly positive."
        )

    rng = np.random.default_rng(seed)

    votes = []
    for _ in range(num_voters):
        # One Dirichlet draw assigns a number of points to each candidate.
        points = rng.dirichlet(alphas)
        # Rank the candidates by decreasing number of points.
        votes.append(list(reversed(points.argsort())))

    return votes
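The two-candidate check described in the Validation section of the docstring can be sketched as follows. This is a minimal illustration, not the code used to produce the validation plots. The helper names `two_candidate_probability` and `empirical_frequency` are hypothetical, and the sketch draws directly from numpy's Dirichlet sampler rather than calling `didi`, so that it stays self-contained. It assumes the marginal of :math:`x_0` is a Beta distribution, so the integral is divided by the beta function.

```python
import numpy as np
from scipy import integrate, special


def two_candidate_probability(alpha_0: float, alpha_1: float) -> float:
    """Probability that candidate 0 is ranked above candidate 1 (hypothetical helper)."""

    # Unnormalised Beta(alpha_0, alpha_1) density of x_0.
    def integrand(x):
        return x ** (alpha_0 - 1) * (1 - x) ** (alpha_1 - 1)

    # x_0 > x_1 is equivalent to x_0 > 0.5, since x_1 = 1 - x_0.
    integral, _ = integrate.quad(integrand, 0.5, 1)
    # Divide by the beta function so that the density integrates to 1 on [0, 1].
    return integral / special.beta(alpha_0, alpha_1)


def empirical_frequency(
    alpha_0: float, alpha_1: float, num_samples: int = 100_000, seed: int = 0
) -> float:
    """Observed frequency of the ranking 0 > 1 over Dirichlet draws (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    points = rng.dirichlet((alpha_0, alpha_1), size=num_samples)
    return float(np.mean(points[:, 0] > points[:, 1]))


theoretical = two_candidate_probability(1, 0.3)
observed = empirical_frequency(1, 0.3)
print(f"theoretical={theoretical:.4f}, observed={observed:.4f}")
```

For `alpha_0 = 1` the integral has the closed form `0.5 ** alpha_1`, which gives a quick sanity check of the numerical value.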