Source code for prefsampling.ordinal.didi

"""
The Dirichlet distribution model is a model parameterised by a vector of candidate qualities.
A quality score is associated to each candidate. When sampling a ranking, the quality scores
are used to sample a number of points for each candidate (using a Dirichlet distribution).
The ranking then corresponds to the candidates ordered by decreasing number of points.
"""

from __future__ import annotations

import numpy as np

from prefsampling.inputvalidators import validate_num_voters_candidates



@validate_num_voters_candidates
def didi(
    num_voters: int, num_candidates: int, alphas: list[float], seed: int | None = None
) -> list[list[int]]:
    """
    Generates ordinal votes from the DiDi (Dirichlet Distribution) model.

    This model is parameterised by a vector `alphas` intuitively indicating a quality for each
    candidate. Moreover, the higher the sum of the `alphas`, the more correlated the votes are
    (the more concentrated the Dirichlet distribution is). To sample a vote, we sample a set of
    points---one per candidate---from a Dirichlet distribution parameterised by `alphas`. The
    vote then corresponds to the candidates ordered by decreasing number of points.

    A collection of `num_voters` votes is generated, each vote being sampled independently
    following the process described above.

    This model is very similar in spirit to the
    :py:func:`~prefsampling.ordinal.plackettluce.plackett_luce` model.

    Parameters
    ----------
        num_voters : int
            Number of Voters.
        num_candidates : int
            Number of Candidates.
        alphas : list[float]
            List of quality parameters, one strictly positive value per candidate.
        seed : int, default: :code:`None`
            Seed for numpy random number generator.

    Returns
    -------
        list[list[int]]
            Ordinal votes.

    Examples
    --------

        .. testcode::

            from prefsampling.ordinal import didi

            # Sample from a DiDi model with 2 voters and 3 candidates, the qualities of
            # candidates are 0.5, 0.2, and 0.1.
            didi(2, 3, (0.5, 0.2, 0.1))

            # For reproducibility, you can set the seed.
            didi(2, 3, (5, 2, 0.1), seed=1002)

            # Don't forget to provide a quality for all candidates
            try:
                didi(2, 3, (0.5, 0.2))
            except ValueError:
                pass

            # And all quality scores need to be strictly positive
            try:
                didi(2, 3, (0.5, 0.2, -0.4))
            except ValueError:
                pass
            try:
                didi(2, 3, (0.5, 0.2, 0))
            except ValueError:
                pass

    Validation
    ----------

        The probability distribution guiding the DiDi model is not known in general. Since it
        depends on the order of the values in a Dirichlet sample, the general computation is
        involved. Still, we can check some special cases.

        First, when all qualities are the same, we are supposed to obtain a uniform distribution
        over all rankings.

        .. image:: ../validation_plots/ordinal/didi__0_1_0_1_0_1_0_1_0_1_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[0.1, 0.1, 0.1, 0.1, 0.1]

        Second, in the special case of 2 candidates, we can easily compute an expression for the
        probability distribution of the model. Assume we have two candidates with quality
        :math:`\\alpha_0` and :math:`\\alpha_1`. Then, the probability of observing the ranking
        :math:`0 \\succ 1` is the probability of sampling two values :math:`x_0`, :math:`x_1`
        from a Dirichlet distribution with parameters :math:`\\alpha_0` and :math:`\\alpha_1` such
        that :math:`x_0 > x_1`. We thus have:

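        .. math::

            \\mathbb{P}(x_0 > x_1) = \\mathbb{P}(x_0 > 0.5) =
            \\frac{1}{B(\\alpha_0, \\alpha_1)} \\int_{0.5}^1 x_0^{\\alpha_0 - 1}
            \\times (1 - x_0)^{\\alpha_1 - 1} dx_0,

        where :math:`B` is the Beta function, which normalises the density. The first equality
        holds because :math:`x_1 = 1 - x_0`.

        We can compute an approximate value of this integral using scipy.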

        .. image:: ../validation_plots/ordinal/didi__1_0_0_3_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[1, 0.3]

        .. image:: ../validation_plots/ordinal/didi__0_1_0_1_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[0.1, 0.1]

        In the general case, we obtain the following frequencies.

        .. image:: ../validation_plots/ordinal/didi__0_2_0_5_0_3_0_7_0_2_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[0.2, 0.5, 0.3, 0.7, 0.2]

        .. image:: ../validation_plots/ordinal/didi__1_0_0_3_0_3_0_3_0_3_.png
            :width: 800
            :alt: Observed versus theoretical frequencies for a DiDi model with alpha=[1, 0.3, 0.3, 0.3, 0.3]

    References
    ----------

        The DiDi model has not been referenced in any publication. Stanisław Szufa introduced it
        out of curiosity.

        See the `wikipedia page <https://en.wikipedia.org/wiki/Dirichlet_distribution>`_ of the
        Dirichlet distribution for more details.
    """
    if len(alphas) != num_candidates:
        raise ValueError(
            "Incorrect length of alphas vector. Should be equal to num_candidates."
        )

    if not all(a > 0 for a in alphas):
        raise ValueError(
            "The values of the alpha vector should all be strictly positive."
        )

    rng = np.random.default_rng(seed)

    votes = []

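    for _ in range(num_voters):
        # One point value per candidate; the values sum to 1.
        points = rng.dirichlet(alphas)
        # argsort is ascending, so reverse it to rank by decreasing number of points.
        votes.append(list(reversed(points.argsort())))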

    return votes