Sampling quantile paths#
__all__ = ["generate_quantile_paths", "generate_path_hierarchical_beta"]
We want to generate paths \(\lnLh\) for the quantile function \(l(Φ)\), with \(Φ \in [0, 1]\), from a stochastic process \(\pathP\) determined by \(\lnLtt(Φ)\) and \(\emdstd(Φ)\). This process must satisfy the following requirements:
It must generate only monotone paths, since quantile functions are monotone.
The process must be heteroscedastic, with variability at \(Φ\) given by \(\emdstd(Φ)\).
Generated paths should track the function \(\lnLtt(Φ)\), and track it more tightly when variability is low.
Paths should be “temporally uncorrelated”: each stop \(Φ\) corresponds to a different ensemble of data points, which can be drawn in any order. So we don’t expect any special correlation between \(\lnLh(Φ)\) and \(\lnLh(Φ')\), beyond the requirement of monotonicity.
In particular, we want to avoid defining a stochastic process which starts at one point and accumulates variance, like the \(\sqrt{t}\) envelope characteristic of a Gaussian white noise.
Concretely, we require the process to be “\(Φ\)-symmetric”: replacing \(\lnLtt(Φ) \to \lnLtt(1-Φ)\) and \(\emdstd(Φ) \to \emdstd(1-Φ)\) should define the same process, just inverted along the \(Φ\) axis.
Hierarchical beta process#
Because the joint requirements of monotonicity, non-stationarity and \(Φ\)-symmetry are uncommon for a stochastic process, some care is required to define an appropriate \(\pathP\). The approach we choose here is to first select the end points \(\lnLh(0)\) and \(\lnLh(1)\), then fill the interval by successive binary partitions: first \(\bigl\{\lnLh\bigl(\tfrac{1}{2}\bigr)\bigr\}\), then \(\bigl\{\lnLh\bigl(\tfrac{1}{4}\bigr), \lnLh\bigl(\tfrac{3}{4}\bigr)\bigr\}\), \(\bigl\{\lnLh\bigl(\tfrac{1}{8}\bigr), \lnLh\bigl(\tfrac{3}{8}\bigr), \lnLh\bigl(\tfrac{5}{8}\bigr), \lnLh\bigl(\tfrac{7}{8}\bigr)\bigr\}\), etc. (Below we will denote these ensembles \(\{\lnLh\}^{(1)}\), \(\{\lnLh\}^{(2)}\), \(\{\lnLh\}^{(3)}\), etc.) Thus integrating over paths becomes akin to a path integral with variable end points. Moreover, instead of drawing quantile values directly, we draw increments

\[Δ q_{2^{-n}}(Φ) := \lnLh\bigl(Φ + 2^{-n}\bigr) - \lnLh(Φ)\,.\]
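To make the refinement scheme concrete, the following small sketch (illustrative only, plain NumPy) lists the abscissae added at each refinement level, i.e. the ensembles \(\{\lnLh\}^{(n)}\) above:

import numpy as np

# Level n adds the odd multiples of 2**-n, i.e. the ensemble {lnLh}^(n).
for n in range(1, 4):
    new_points = np.arange(1, 2**n, 2) / 2**n   # odd k / 2**n
    print(f"n={n}: {new_points}")
# n=1: [0.5]
# n=2: [0.25 0.75]
# n=3: [0.125 0.375 0.625 0.875]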
Given two initial end points \(\lnLh(0)\) and \(\lnLh(1)\), we therefore first draw the pair \(\bigl\{Δ q_{2^{-1}}(0),\; Δ q_{2^{-1}}\bigl(2^{-1}\bigr)\bigr\}\), which gives us

\[\lnLh\bigl(\tfrac{1}{2}\bigr) = \lnLh(0) + Δ q_{2^{-1}}(0) = \lnLh(1) - Δ q_{2^{-1}}\bigl(2^{-1}\bigr)\,.\]
Then \(\bigl\{\lnLh(0), \lnLh\bigl(\frac{1}{2}\bigr) \bigr\}\) and \(\bigl\{ \lnLh\bigl(\frac{1}{2}\bigr), \lnLh(1) \bigr\}\) serve as end points to draw \(\bigl\{Δ q_{2^{-2}}\bigl(0\bigr),\; Δ q_{2^{-2}}\bigl(2^{-2}\bigr) \bigr\}\) and \(\bigl\{Δ q_{2^{-2}}\bigl(2^{-1}\bigr),\; Δ q_{2^{-2}}\bigl(2^{-1} + 2^{-2}\bigr) \bigr\}\). We repeat the procedure as needed, sampling smaller and smaller increments, until the path has the desired resolution. Since the increments are constrained to be non-negative,

\[Δ q_{2^{-n}}(Φ) \ge 0\,,\]
the path thus sampled is always monotone. Note also that increments must be drawn in pairs (or more generally, in combinations) of values constrained by their sum:

\[Δ q_{2^{-n}}(Φ) + Δ q_{2^{-n}}\bigl(Φ+2^{-n}\bigr) = Δ q_{2^{-n+1}}(Φ)\,.\]
The possible increments therefore lie on a 1-simplex, for which a natural choice is a beta distribution[1], with the random variable \(x_1\) corresponding to the first increment \(Δ q_{2^{-n}}(Φ)\). The density function of a beta random variable has the form

\[p(x_1) = \frac{x_1^{α-1}\,(1-x_1)^{β-1}}{B(α, β)}\,,\]
with \(α\) and \(β\) parameters to be determined.
Important
An essential property of a stochastic process is consistency: it must not matter exactly how we discretize the interval [1]. Let \(\{\lnLh\}^{(n)}\) denote the steps which are added when we refine the discretization from steps of \(2^{-n+1}\) to steps of \(2^{-n}\):

\[\{\lnLh\}^{(n)} := \bigl\{\, \lnLh\bigl(k\,2^{-n}\bigr) \,\big|\, k \text{ odd},\; 0 < k < 2^n \,\bigr\}\,.\]
A necessary condition for consistency is that coarsening the discretization from steps of \(2^{-n}\) to steps of \(2^{-n+1}\) (i.e. marginalizing over the points in \(\{\lnLh\}^{(n)}\)) does not substantially change the probability law:

\[p\bigl(\{\lnLh\}^{(n)}\bigr) = \int p\bigl(\{\lnLh\}^{(n)} \,\big|\, \{\lnLh\}^{(n+1)}\bigr)\, p\bigl(\{\lnLh\}^{(n+1)}\bigr)\, d\{\lnLh\}^{(n+1)} + ε\,,\]
with \(ε\) vanishing as \(n\) is increased to infinity.
This requirement is necessary in order to compute integrals over \(\lnLh\): without it, they do not converge as \(ΔΦ\) is reduced.
Conditions for choosing the beta parameters#
To draw an increment \(Δ q_{2^{-n}}\), we need to convert \(\lnLtt(Φ)\) and \(\emdstd(Φ)\) into beta distribution parameters \(α\) and \(β\). If \(x_1\) follows a beta distribution, then its first two cumulants are given by

\[\EE[x_1] = \frac{α}{α+β}\,, \qquad \VV[x_1] = \frac{αβ}{(α+β)^2\,(α+β+1)}\,.\]
However, as discussed by Mateu-Figueras et al. (2021, 2011), to properly account for the geometry of a simplex, one should instead use statistics with respect to the Aitchison measure, sometimes referred to as the center and metric variance. Defining \(x_2 = 1-x_1\), these can be written (Mateu-Figueras et al., 2021)

\[\EE_a[x_1] = \frac{e^{ψ(α)}}{e^{ψ(α)} + e^{ψ(β)}}\,, \qquad \Mvar[x_1, x_2] = \frac{1}{2}\bigl(ψ_1(α) + ψ_1(β)\bigr)\,. \tag{6}\]
Here \(ψ\) and \(ψ_1\) are the digamma and trigamma functions respectively. (In addition to being more appropriate, the center and metric variance are also better suited for defining a consistent stochastic process. For example, since the metric variance is unbounded, we can always scale it with \(\emdstd(Φ)\) without exceeding its domain.)
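These expressions can be checked by Monte Carlo. The sketch below (illustrative; the values of \(α\) and \(β\) are arbitrary) computes the center as the closure of componentwise geometric means, and the metric variance as half the variance of the log-ratio \(\ln(x_1/x_2)\):

import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(0)
α, β = 2.5, 4.0
x1 = rng.beta(α, β, size=200_000)
x2 = 1 - x1

# Center: closure of the componentwise geometric means
g1, g2 = np.exp(np.log(x1).mean()), np.exp(np.log(x2).mean())
center_mc = g1 / (g1 + g2)
center_th = np.exp(digamma(α)) / (np.exp(digamma(α)) + np.exp(digamma(β)))

# Metric variance: half the variance of the log-ratio ln(x1/x2)
mvar_mc = 0.5 * np.log(x1/x2).var()
mvar_th = 0.5 * (polygamma(1, α) + polygamma(1, β))

print(center_mc, center_th)   # both ≈ 0.365
print(mvar_mc, mvar_th)       # both ≈ 0.39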
Since we want the sum to be \(d := \lnLh(Φ+2^{-n+1}) - \lnLh(Φ)\), we define

\[x_1 := \frac{Δ q_{2^{-n}}(Φ)}{d}\,, \qquad x_2 := \frac{Δ q_{2^{-n}}\bigl(Φ+2^{-n}\bigr)}{d}\,.\]
Then

\[Δ q_{2^{-n}}(Φ) = d\,x_1\,, \qquad Δ q_{2^{-n}}\bigl(Φ+2^{-n}\bigr) = d\,x_2\,, \qquad x_1 + x_2 = 1\,.\]
We now choose to define the parameters \(α\) and \(β\) via the following relations:

\[\begin{aligned}
d\,\EE_a[x_1] &=^* \lnLtt\bigl(Φ+2^{-n}\bigr) - \lnLtt(Φ)\,, \\
d\,\EE_a[x_2] &=^* \lnLtt\bigl(Φ+2^{-n+1}\bigr) - \lnLtt\bigl(Φ+2^{-n}\bigr)\,, \\
\Mvar[x_1, x_2] &=^* c\,\emdstd\bigl(Φ+2^{-n}\bigr)^2\,.
\end{aligned}\]
These follow from interpreting \(\lnLtt\) and \(\emdstd\) as estimators for the center and square root of the metric variance. We use \(=^*\) to indicate equality in spirit rather than true equality, since strictly speaking these are three equations for two unknowns. To reduce the two \(\EE_a\) equations to one, we use instead their ratio:

\[\frac{\EE_a[x_1]}{\EE_a[x_2]} =^* \frac{\lnLtt\bigl(Φ+2^{-n}\bigr) - \lnLtt(Φ)}{\lnLtt\bigl(Φ+2^{-n+1}\bigr) - \lnLtt\bigl(Φ+2^{-n}\bigr)}\,.\]
Remarks
We satisfy the necessary condition for consistency by construction:
\[p\bigl(\{l\}^{(n)}\bigr) = \int p\bigl(\{l\}^{(n)} \,\big|\, \{l\}^{(n+1)}\bigr)\, p\bigl(\{l\}^{(n+1)}\bigr) \,d\{l\}^{(n+1)}\,.\]The stochastic process is not Markovian, so successive increments are not independent. The variance of a larger increment therefore need not equal the sum of the variances of its constituent smaller ones; in other words,
\[Δ q_{2^{-n+1}}\bigl(Φ\bigr) = Δ q_{2^{-n}}\bigl(Φ\bigr) + Δ q_{2^{-n}}\bigl(Φ+2^{-n}\bigr)\]does not imply
\[\VV\bigl[Δ q_{2^{-n+1}}\bigl(Φ\bigr)\bigr] = \VV\bigl[Δ q_{2^{-n}}\bigl(Φ\bigr)\bigr] + \VV\bigl[Δ q_{2^{-n}}\bigl(Φ+2^{-n}\bigr)\bigr]\,.\]Our defining equations make equivalent use of the pre (\(Δ q_{2^{-n}}(Φ)\)) and post (\(Δ q_{2^{-n}}(Φ+2^{-n})\)) increments, thus preserving symmetry in \(Φ\).
Step sizes of the form \(2^{-n}\) have exact representations in binary. Thus even small step sizes should not introduce additional numerical errors.
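A quick numerical illustration of this remark (a sketch, not part of the module):

import numpy as np

# Powers of two and their sums are exact in binary floating point,
# so refining the grid introduces no rounding error in the abscissae.
assert 2**-3 + 2**-3 == 2**-2
assert np.linspace(0, 1, 2**6 + 1)[2**5] == 0.5   # exact midpoint of the Φ grid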
Formulation of the parameter equations as a root-finding problem#
Define

\[r := \frac{\lnLtt\bigl(Φ+2^{-n}\bigr) - \lnLtt(Φ)}{\lnLtt\bigl(Φ+2^{-n+1}\bigr) - \lnLtt\bigl(Φ+2^{-n}\bigr)}\,, \qquad v := 2\,c\,\emdstd\bigl(Φ+2^{-n}\bigr)^2\,.\]
The first value, \(r\), is the ratio of the two subincrements within \(Δ q_{2^{-n+1}}(Φ)\). Setting \(\frac{e^{ψ(α)}}{e^{ψ(β)}} = r\), the two equations we need to solve for \(α\) and \(β\) can be written

\[\begin{aligned}
ψ(α) - ψ(β) &= \ln r\,, \\
\ln\bigl(ψ_1(α) + ψ_1(β)\bigr) &= \ln v\,.
\end{aligned} \tag{11}\]
Note that these equations preserve the \(Φ\)-symmetry: inverting the path maps \(r \to 1/r\) and swaps \(α \leftrightarrow β\), which simply changes the sign on both sides of the first equation. The use of the logarithm in the equation for \(v\) helps to stabilize the numerics.
def f(lnαβ, lnr_v, _array=np.array, _exp=np.exp, _log=np.log,
digamma=scipy.special.digamma, polygamma=scipy.special.polygamma):
α, β = _exp(lnαβ).reshape(2, -1) # scipy's `root` always flattens inputs
lnr, v = lnr_v
return _array((
digamma(α) - digamma(β) - lnr,
_log(polygamma(1, α) + polygamma(1, β)) - _log(v)
)).flat
def f_mid(lnα, v, _exp=np.exp, _log=np.log, polygamma=scipy.special.polygamma):
"1-d equation, for the special case α=β (equiv to r=1)"
return _log(2*polygamma(1, _exp(lnα))) - _log(v)
def jac(lnαβ, lnr_v, _exp=np.exp, polygamma=scipy.special.polygamma):
    α, β = _exp(lnαβ).reshape(2, -1)  # scipy's `root` always flattens inputs
    ψ1α, ψ1β = polygamma(1, α), polygamma(1, β)
    # Chain rule: d/d lnα = α d/dα (and likewise for β)
    j = np.block([[np.diagflat(ψ1α*α),                       np.diagflat(-ψ1β*β)],
                  [np.diagflat(polygamma(2, α)*α/(ψ1α+ψ1β)), np.diagflat(polygamma(2, β)*β/(ψ1α+ψ1β))]])
    return j
The functions \(ψ\) and \(ψ_1\) diverge at zero, so \(α\) and \(β\) should remain positive. Therefore it makes sense to fit their logarithm: this enforces the lower bound, and improves the resolution where the derivative is highest. The two objective functions (up to scalar shift) are plotted below: the region for low \(\ln α\) and \(\ln β\) shows sharp variation around \(α=β\), suggesting that this area may be most challenging for a numerical optimizer. In practice this is indeed what we observed.
We found however that we can make the fits much more reliable by first choosing a suitable initialization point along the \(\ln α = \ln β\) diagonal. In practice this means setting \(α_0 = α = β\) and solving the second equation of Eqs. (11) for \(α_0\). (We use the implementation of Brent’s method in SciPy.) Then we solve the full 2-d problem of Eqs. (11), with \((α_0, α_0)\) as initial value. This procedure was successful for all values of \(r\) and \(v\) we encountered in our experiments.
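As a concrete illustration of this two-stage procedure, the following sketch assumes the definitions of `f` and `f_mid` above; the test values of `r` and `v` are arbitrary:

import math
import numpy as np
import scipy.optimize
from scipy.special import digamma, polygamma

r, v = 3.0, 0.5                                        # arbitrary test condition
# Stage 1: solve the 1-d problem along the α=β diagonal
lnα0 = scipy.optimize.brentq(f_mid, -5, 20, args=(v,))
# Stage 2: solve the full 2-d problem, initialized at (α0, α0)
res = scipy.optimize.root(f, (lnα0, lnα0), args=[math.log(r), v])
α, β = np.exp(res.x)
# At the root, the fitted parameters reproduce r and v
assert np.isclose(np.exp(digamma(α) - digamma(β)), r)
assert np.isclose(polygamma(1, α) + polygamma(1, β), v)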
Fig. 1 Characterization of the digamma (\(ψ\)) and trigamma (\(ψ_1\)) functions, and of the metric variance \(\Mvar\).#
Plotting the eigenvalues of the Jacobian (specifically, the real part of its dominant eigenvalue) in fact highlights three regions with a center at roughly \((\ln α, \ln β) = (0, 0)\). (The Jacobian does not depend on \(r\) or \(v\), so this is true for all fit conditions).
Note that the color scale is clipped, to better resolve values near zero. Eigenvalues quickly increase by multiple orders of magnitude away from \((0,0)\).
Note
Empirically we found that initializing fits at \((0, 0)\) resulted in robust fits for a large number of \((r,v)\) tuples, even when \(r > 100\). We hypothesize that this is because it is difficult for the fit to move from one region to another; by initializing where the Jacobian is small, fits are able to find the desired values before getting stuck in the wrong region.
That said, one case where \((0, 0)\) is not a good initial vector for the root solver is when \(\boldsymbol{r \approx 1}\). This can be resolved by choosing a better initial value along the \((α_0, α_0)\) diagonal, as described above. In practice we found no detriment to always using the 1-d problem to select the initial vector, so we use that approach in all cases.
Fig. 2 The objective function has a saddle point around (0,0). After rewriting Eqs. (11) in terms of \(\ln α\) and \(\ln β\), we compute the Jacobian \(J\). Plotted is the real part of the eigenvalue \(λ\) of \(J\) for which \(\lvert\mathop{\mathrm{Re}}(λ)\rvert\) is largest; this gives an indication of how quickly the fit moves away from a given point. In most cases, a root-finding algorithm initialized at (0,0) will find a solution.#
Special cases for extreme values#
For extreme values of \(r\) or \(v\), the beta distribution becomes degenerate and numerical optimization may break. We identified four cases requiring special treatment.
\(\boldsymbol{r \to 0}\)
This corresponds to stating that \(Δ q_{2^{-n}}(Φ)\) is infinitely smaller than \(Δ q_{2^{-n}}(Φ+2^{-n})\). Thus we set \(x_1 = 0\), which is equivalent to setting

\[p(x_1) = δ(x_1)\,.\]
\(\boldsymbol{r \to \infty}\)
The converse of the previous case: \(Δ q_{2^{-n}}(Φ)\) is infinitely larger than \(Δ q_{2^{-n}}(Φ+2^{-n})\). We set \(x_1 = 1\).
\(\boldsymbol{v \to 0}\)
As \(v\) vanishes, the distribution for \(x_1\) approaches a Dirac delta centered on \(\tfrac{r}{r+1}\). In our implementation, we replace \(x_1\) by a constant when \(v < 10^{-8}\).
\(\boldsymbol{v \to \infty}\)
Having \(v\) go to infinity requires that \(α\) and/or \(β\) go to \(0\) (see Eq. (11) and Fig. 1). The probability density of \(x_1\) is then a Dirac delta: placed at \(x_1=0\) if \(α \to 0\), or placed at \(x_1 = 1\) if \(β \to 0\). If both \(α\) and \(β\) go to \(0\), the PDF must be the sum of two weighted deltas:

\[p(x_1) = w_1\,δ(x_1) + w_2\,δ(x_1 - 1)\,.\]
The weights \(w_i\) can be determined by requiring that

\[\frac{\EE[x_1]}{\EE[x_2]} = r\,,\]
which yields

\[w_1 = \frac{1}{r+1}\,, \qquad w_2 = \frac{r}{r+1}\,.\]
(For this special case, we revert to writing the condition in terms of a standard (Lebesgue) expectation, since the center (Eq. (6)) is undefined when \(α, β \to 0\).)
Since we have already considered the special cases \(r \to 0\) and \(r \to \infty\), we can assume \(0 < r < \infty\). Then both \(α\) and \(β\) go to zero, and \(x_1\) should be a Bernoulli random variable with success probability \(p = w_2 = \frac{r}{r+1}\).
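A quick numerical sanity check of this limit (illustrative only; it verifies that Bernoulli draws with \(p = \frac{r}{r+1}\) reproduce the required ratio of expectations):

import numpy as np

rng = np.random.default_rng(0)
r = 5.0
x1 = rng.binomial(1, r/(r+1), size=100_000)   # Bernoulli draws for the v → ∞ limit
x2 = 1 - x1
print(x1.mean() / x2.mean())                  # ≈ r = 5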
def get_beta_rv(r: Real, v: Real):
    """
    Return a frozen scipy random variable (a beta, or a Bernoulli in the
    degenerate cases) corresponding to `r` and `v`.
    This function is not exported: it is used only for illustrative purposes
    within the notebook.
    """
    # Special cases for extreme values of r
    if r < 1e-12:
        return scipy.stats.bernoulli(0)  # Dirac delta at 0
    elif r > 1e12:
        return scipy.stats.bernoulli(1)  # Dirac delta at 1
    # Special cases for extreme values of v
    elif v < 1e-8:
        return get_beta_rv(r, 1e-8)  # Approximate the near-deterministic case with a tight beta
    elif v > 1e4:
        # (The actual draw function replaces the beta by a Bernoulli in this case)
        return scipy.stats.bernoulli(r/(r+1))
    # (A previous workaround rounded pathological values like
    #  r=2.2715995006941436, v=6.278153793994013e-08 to binary-friendly ones;
    #  the 1-d initialization below made this unnecessary.)
    # Improve initialization by first solving the 1-d problem α=β (equiv. to r=1)
    x0 = scipy.optimize.brentq(f_mid, -5, 20, args=(v,))
    x0 = (x0, x0)
    res = scipy.optimize.root(f, x0, args=[math.log(r), v])
    if not res.success:
        logger.error("Failed to determine α & β parameters for beta distribution. "
                     f"Conditions were:\n {r=}\n {v=}")
    α, β = np.exp(res.x)
    return scipy.stats.beta(α, β)
def scipy_mvroot_solver(fun, x0, args, method, root=scipy.optimize.root):
res = root(fun, x0, args, method)
return res.x, res.success
Examples of different fitted beta distributions#
Plotted below are the beta distributions for different values of \(r\) and \(v\).
%%opts Curve [title="Fitted beta distributions", ylim=(None,7)]
%%opts Table [title="Empirical statistics (4000 samples)"]
%%opts Layout [sublabel_format=""]

curves = {}
stats = {}
xarr = np.linspace(0, 1, 400)
for (r, v), c in zip(
    [(0.2, 1e-32), (0.2, 1e-16), (0.2, 1e-8), (0.2, 1e-4), (0.2, 1e-2), (0.2, 0.5),
     (0.5, 0.5), (0.5, 0.1), (0.5, 1),
     (1, 0.5), (1, 1e1), (1, 30), (1, 50), (1, 70), (1, 1e2), (1, 1e3), (1, 1e5), (1, 1e6),
     (5, 0.5), (5, 8), (5, 1e4), (5, 2e4), (5, 4e4), (5, 1e6), (5, 1e8), (5, 1e16), (5, 1e32),
     (6.24122778821756, 414.7130462762959),
     (2.2715995006941436, 6.278153793994013e-08),
     (2.271457193328191, 6.075242708902806e-08),
     (2.269182419251242, 6.794061846449025e-08),
     (2.691033486949275e-17, 0.02930210834055045)
    ],
    itertools.cycle(config.viz.colors.bright.cycle)):
    rv = get_beta_rv(r, v)
    if isinstance(rv.dist, scipy.stats.rv_discrete):
        # Dirac delta distribution
        p, = rv.args
        if p == 0:
            α, β = 0, np.inf   # delta at x1 = 0
        elif p == 1:
            α, β = np.inf, 0   # delta at x1 = 1
        else:
            α, β = 0, 0
    else:
        # rv is a beta random variable
        α, β = rv.args
    x = draw_from_beta(r, v, n_samples=4000)
    pdf = rv.pmf(xarr) if isinstance(rv.dist, scipy.stats.rv_discrete) else rv.pdf(xarr)
    curves[(r,v)] = hv.Curve(zip(xarr, pdf), label=f"{r=}, {v=}",
                             kdims=[hv.Dimension("x1", label="$x_1$")],
                             vdims=[hv.Dimension("px1", label="p($x_1$)")]  # non-TeX brackets to avoid legend issue
                            ).opts(color=c)
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", "invalid value encountered", category=RuntimeWarning)
        stats[(r,v)] = tuple(f"{y:.3f}" for y in
                             (α, β, x.mean(), x.std(),
                              np.exp(digamma(α)) / (np.exp(digamma(α)) + np.exp(digamma(β))),
                              0.5 * (polygamma(1, α) + polygamma(1, β))
                             ))

hmap = hv.HoloMap(curves, kdims=["r", "v"])
dists = hmap.overlay()
dists.opts(legend_position="right")
dists.opts(width=500, backend="bokeh")
dists.opts(fig_inches=7, aspect=2.5, legend_cols=2, backend="matplotlib")
Statistics for the fitted beta distributions. \(\mathbb{E}[x_1]\) and \(\mathrm{std}[x_1]\) are computed from 4000 samples. \(\mathbb{E}_a[x_1]\) and \(\mathrm{Mvar}[x_1,x_2]\) are computed using the expressions above.
The table below reports timing benchmarks for the different implementations (columns) under various \(r\), \(v\) conditions (rows).

| L | r | v | No JIT, NumPy | No JIT, JAX (imperative) | JIT-ed *gamma, NumPy | JIT-ed *gamma, JAX (imperative) | JIT-ed bisec, JAX (functional) |
|---|---|---|---|---|---|---|---|
| 1 | 0.2 | 1e-32 | 1.14 μs ± 16.2 ns (±2%) | 508 ns ± 0.614 ns (±2%) | 1.1 μs ± 0.677 ns (±4%) | 501 ns ± 0.388 ns (±1%) | 1.57 ms ± 59.6 µs |
| 1 | 0.2 | 1e-16 | 1.11 μs ± 2.95 ns (±2%) | 507 ns ± 0.56 ns (±0%) | 1.1 μs ± 5.78 ns (±1%) | 501 ns ± 0.651 ns (±0%) | 1.55 ms ± 13.9 µs |
| 1 | 0.2 | 1e-08 | 550 μs ± 6.47 μs (±1%) | 2.52 ms ± 12.4 μs (±1%) | 560 μs ± 782 ns (±3%) | 2.07 ms ± 16.8 μs (±5%) | 62.3 ms ± 489 µs |
| 1 | 0.2 | 0.0001 | 717 μs ± 12 μs (±1%) | 2.84 ms ± 7.44 μs (±3%) | 698 μs ± 2.97 μs (±1%) | 2.22 ms ± 17.4 μs (±2%) | 75.6 ms ± 556 µs |
| 1 | 0.2 | 0.01 | 737 μs ± 8.1 μs (±1%) | 2.91 ms ± 8.94 μs (±1%) | 735 μs ± 5.79 μs (±3%) | 2.23 ms ± 12.4 μs (±3%) | 76.5 ms ± 1.42 ms |
| 1 | 0.2 | 0.5 | 804 μs ± 6.82 μs (±1%) | 3.02 ms ± 16.4 μs (±1%) | 805 μs ± 16.5 μs (±2%) | 2.31 ms ± 19.4 μs (±1%) | 83.3 ms ± 1.88 ms |
| 1 | 0.5 | 0.5 | 757 μs ± 8.25 μs (±1%) | 2.88 ms ± 10.5 μs (±7%) | 764 μs ± 857 ns (±1%) | 2.26 ms ± 12 μs (±1%) | 76.1 ms ± 886 µs |
| 1 | 0.5 | 0.1 | 735 μs ± 10.2 μs (±2%) | 2.86 ms ± 5.1 μs (±2%) | 754 μs ± 3.06 μs (±2%) | 2.28 ms ± 42 μs (±3%) | 75.7 ms ± 496 µs |
| 1 | 0.5 | 1.0 | 736 μs ± 8.01 μs (±1%) | 2.86 ms ± 11.1 μs (±3%) | 727 μs ± 3.3 μs (±4%) | 2.26 ms ± 21.9 μs (±0%) | 75.6 ms ± 406 µs |
| 1 | 1.0 | 0.5 | 512 μs ± 6.09 μs (±2%) | 2.36 ms ± 9 μs (±2%) | 559 μs ± 2 μs (±2%) | 2.06 ms ± 13 μs (±2%) | 49.2 ms ± 517 µs |
| 1 | 1.0 | 10.0 | 516 μs ± 7.56 μs (±2%) | 2.36 ms ± 7.87 μs (±2%) | 557 μs ± 2.22 μs (±3%) | 2.05 ms ± 19.1 μs (±2%) | 49.1 ms ± 301 µs |
| 1 | 1.0 | 100.0 | 479 μs ± 7.3 μs (±1%) | 2.36 ms ± 15.4 μs (±1%) | 498 μs ± 1.66 μs (±2%) | 1.99 ms ± 17.1 μs (±1%) | 48.9 ms ± 252 µs |
| 1 | 1.0 | 1000.0 | 461 μs ± 10.3 μs (±2%) | 2.31 ms ± 2.92 μs (±2%) | 483 μs ± 3.23 μs (±1%) | 1.98 ms ± 8.52 μs (±1%) | 49 ms ± 312 µs |
| 1 | 1.0 | 100000.0 | 2.11 μs ± 62.1 ns (±8%) | 810 μs ± 1.51 μs (±1%) | 2.07 μs ± 33.3 ns (±4%) | 810 μs ± 2.65 μs (±2%) | 1.29 ms ± 19.6 µs |
| 1 | 1.0 | 1000000.0 | 2.07 μs ± 18.5 ns (±4%) | 811 μs ± 2.22 μs (±1%) | 2.05 μs ± 15.6 ns (±3%) | 809 μs ± 1.35 μs (±0%) | 1.28 ms ± 5.65 µs |
| 1 | 5.0 | 0.5 | 805 μs ± 10 μs (±1%) | 3.02 ms ± 4.95 μs (±2%) | 790 μs ± 628 ns (±5%) | 2.27 ms ± 8.23 μs (±2%) | 85.4 ms ± 2.6 ms |
| 1 | 5.0 | 8.0 | 853 μs ± 11.5 μs (±0%) | 3.16 ms ± 34.4 μs (±1%) | 829 μs ± 3.2 μs (±2%) | 2.31 ms ± 17.3 μs (±1%) | 90.6 ms ± 1.18 ms |
| 1 | 5.0 | 10000.0 | 587 μs ± 9.9 μs (±3%) | 2.67 ms ± 28.2 μs (±0%) | 573 μs ± 990 ns (±1%) | 2.07 ms ± 18.3 μs (±1%) | 62 ms ± 490 µs |
| 1 | 5.0 | 20000.0 | 2.06 μs ± 19.8 ns (±2%) | 813 μs ± 5.46 μs (±2%) | 2.05 μs ± 16.6 ns (±1%) | 808 μs ± 1.38 μs (±2%) | 1.28 ms ± 6 µs |
| 1 | 5.0 | 40000.0 | 2.16 μs ± 10.2 ns (±3%) | 819 μs ± 7.11 μs (±1%) | 2.07 μs ± 9.88 ns (±3%) | 806 μs ± 1.95 μs (±0%) | 1.28 ms ± 11.1 µs |
| 1 | 5.0 | 1000000.0 | 2.06 μs ± 13.3 ns (±3%) | 834 μs ± 1.26 μs (±2%) | 2.06 μs ± 15.5 ns (±2%) | 814 μs ± 2.21 μs (±1%) | 1.28 ms ± 13.6 µs |
| 1 | 5.0 | 100000000.0 | 2.08 μs ± 13 ns (±3%) | 835 μs ± 1.88 μs (±2%) | 2.05 μs ± 12 ns (±2%) | 813 μs ± 1.25 μs (±1%) | 1.27 ms ± 10.6 µs |
| 1 | 5.0 | 1e+16 | 2.06 μs ± 24 ns (±3%) | 827 μs ± 2.65 μs (±5%) | 2.05 μs ± 15.4 ns (±4%) | 812 μs ± 2.74 μs (±0%) | 1.27 ms ± 5.78 µs |
| 1 | 5.0 | 1e+32 | 2.11 μs ± 38.9 ns (±2%) | 822 μs ± 9.73 μs (±1%) | 2.06 μs ± 13.8 ns (±6%) | 807 μs ± 2.03 μs (±1%) | 1.28 ms ± 33.4 µs |
Implementation#
Generate a single path#
Now that we know how to construct a sampling distribution for the increments, sampling an entire path is just a matter of applying the process recursively until we reach the desired resolution.
def generate_path_hierarchical_beta(
qstar: Callable, deltaEMD: Callable, c: float,
qstart: float, qend: float, res: int=8, rng=None,
*, Phistart: float=0., Phiend: float=1.
) -> Tuple[Array[float,1], Array[float,1]]:
"""
Returned path has length ``2**res + 1``.
`qstar` and `deltaEMD` are evaluated at the abscissae of the returned array
`Φhat`, so any callables defined on [0, 1] are acceptable.
Parameters
----------
qstar: Empirical (mixed) quantile function. The generated path will
be centered on this trace. Although any callable mapping [0, 1] to R
is accepted, in all practical use cases this will be the output of
:func:`emd.make_empirical_risk_ppf`.
deltaEMD: Callable returning discrepancy values (i.e. {math}`δ^\mathrm{EMD}`).
Like `qstar`, this must map [0, 1] to R+.
c: Proportionality constant relating δEMD to
the square root of the metric variance.
qstart, qend: The end point ordinates. The hierarchical beta process "fills in"
the space between a start and an end point, but it needs the value of _q_
to be given at those points. These end points are by default Φ=0 and Φ=1,
which correspond to the bounds of a quantile function.
res: Returned paths have length ``2**res+1``, and therefore ``2**res`` increments.
Typical values of `res` are 6, 7 and 8, corresponding to paths of length
64, 128 and 256. Smaller may be useful to accelerate debugging. Larger values
increase the computation cost with typically negligible improvements in accuracy.
Must be at least 1.
rng: Any argument accepted by `numpy.random.default_rng` to initialize an RNG.
Phistart, Phiend: (Keyword only) The abscissa corresponding to the ordinates
`qstart` and `qend`. The default values are 0 and 1, appropriate for generating
a quantile function.
Returns
-------
The pair Φhat, qhat.
Φhat: Array of equally spaced values between 0 and 1, with step size ``2**-res``.
qhat: The generated path, evaluated at the values listed in `Φhat`.
"""
# Validation
res = int(res) # sourcery skip: remove-unnecessary-cast
if Phistart >= Phiend:
raise ValueError("`Phistart` must be strictly smaller than `Phiend`. "
f"Received:\n {Phistart=}\n {Phiend=}")
if res < 1:
raise ValueError("`res` must be greater or equal to 1.")
rng = np.random.default_rng(rng) # No-op if `rng` is already a Generator
# Interpolation
N = 2**res + 1
Φarr = np.linspace(Phistart, Phiend, N)
qsarr = qstar(Φarr)
# Pre-computations
Mvar = c * deltaEMD(Φarr)**2
# Algorithm
qhat = np.empty(N)
qhat[0] = qstart
qhat[-1] = qend
for n in range(1, res+1):
Δi = 2**(res-n)
i = np.arange(Δi, N, 2*Δi) # Indices for the new values to insert
d = qhat[i+Δi] - qhat[i-Δi] # Each pair of increments must sum to `d`
r = (qsarr[i] - qsarr[i-Δi]) / (qsarr[i+Δi]-qsarr[i]) # Ratio of first/second increments
v = 2*Mvar[i]
# Prevent failure in pathological cases where `q` is flat in some places
if ((qsarr[i+Δi] - qsarr[i-Δi]) == 0).any(): # If both increments are zero, then the calculation for `r` gives 0/0 => NaN
logger.warning("The quantile function is not strictly increasing. Non-increasing intervals were skipped. The sampling of EMD paths may be less accurate.")
keep = ((qsarr[i+Δi] - qsarr[i-Δi]) > 0) # `draw_from_beta` can deal with `r=0` and `r=inf`, but not `r=NaN`.
x1 = np.zeros_like(r)
x1[keep] = draw_from_beta(r[keep], v[keep], rng=rng)
else:
x1 = draw_from_beta(r, v, rng=rng)
qhat[i] = qhat[i-Δi] + d * x1
return Φarr, qhat
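A minimal usage sketch (the linear `qstar` and constant `deltaEMD` below are placeholder toy inputs, and the module's `draw_from_beta` must be in scope):

import numpy as np

qstar    = lambda Φ: 2*Φ - 1                   # toy increasing quantile function on [0, 1]
deltaEMD = lambda Φ: 0.1 * np.ones_like(Φ)     # constant discrepancy, for illustration

Φhat, qhat = generate_path_hierarchical_beta(
    qstar, deltaEMD, c=1.0, qstart=-1.0, qend=1.0, res=6, rng=42)
assert len(qhat) == 2**6 + 1
assert np.all(np.diff(qhat) >= 0)              # the generated path is monotone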
Generate ensemble of sample paths#
Note
generate_quantile_paths and generate_path_hierarchical_beta are the only two public functions exposed by this module.

To generate \(M\) paths, we repeat the following \(M\) times:

1. Select the end points by sampling \(\lnLh(0) \sim \nN\bigl(\lnLtt(0),\, c\,\emdstd(0)^2\bigr)\) and \(\lnLh(1) \sim \nN\bigl(\lnLtt(1),\, c\,\emdstd(1)^2\bigr)\), redrawing as needed so that \(\lnLh(0) < \lnLh(1)\).
2. Call generate_path_hierarchical_beta.
def generate_quantile_paths(qstar: Callable, deltaEMD: Callable, c: float,
M: int, res: int=8, rng=None,
*, Phistart: float=0., Phiend: float=1,
progbar: Union[Literal["auto"],None,tqdm,"mp.queues.Queue"]="auto",
previous_M: int=0
) -> Generator[Tuple[Array[float,1], Array[float,1]], None, None]:
r"""
Generate `M` distinct quantile paths, with trajectory and variability determined
by `qstar` and `deltaEMD`.
Paths are generated using the hierarchical beta algorithm, with normal distributions
for the end points and beta distributions for the increments.
Returned paths have length ``2**res + 1``.
Typical values of `res` are 6, 7 and 8, corresponding to paths of length
64, 128 and 256. Smaller values may be useful to accelerate debugging. Larger values
increase the computation cost with (typically) negligible improvements in accuracy.
.. Note:: When using multiprocessing to call this function multiple times,
use either a `multiprocessing.Queue` or `None` for the `progbar` argument.
Parameters
----------
qstar: Empirical (mixed) quantile function. The generated paths will
be centered on this trace. Although any callable mapping [0, 1] to R
is accepted, in all practical use cases this will be the output of
:func:`emd.make_empirical_risk_ppf`.
deltaEMD: Callable returning scaled discrepancy values
(i.e. {math}`δ^\mathrm{EMD}`). Like `qstar`, this must map [0, 1] to R+.
c: Proportionality constant relating δEMD to
the square root of the metric variance.
M: Number of paths to generate.
res: Returned paths have length ``2**res + 1``.
Typical values of `res` are 6, 7 and 8, corresponding to paths of length
64, 128 and 256. Smaller values may be useful to accelerate debugging, but larger
values are unlikely to be useful.
rng: Any argument accepted by `numpy.random.default_rng` to initialize an RNG.
progbar: Control whether to create a progress bar or use an existing one.
- With the default value 'auto', a new tqdm progress is created.
This is convenient, however it can lead to many bars being created &
destroyed if this function is called within a loop.
- To prevent this, a tqdm progress bar can be created externally (e.g. with
``tqdm(desc="Generating paths")``) and passed as argument.
Its counter will be reset to zero, and its total set to `M` + `previous_M`.
- (Multiprocessing): To support updating progress bars within child processes,
a `multiprocessing.Queue` object can be passed, in which case no
progress bar is created or updated. Instead, each time a quantile path
is sampled, a value is added to the queue with ``put``. This way, the
parent process can update a progress bar by consuming the queue; e.g.
``while not q.empty(): progbar.update()``.
The value added to the queue is `M`+`previous_M`, which can be
used to update the total value of the progress bar.
- A value of `None` prevents displaying any progress bar.
previous_M: Used only to improve the display of the progress bar:
the indicated total on the progress bar will be `M` + `previous_M`.
Use this when adding paths to a preexisting ensemble.
Yields
------
Tuples of two 1-d arrays: (Φhat, qhat).
Notes
-----
Returned paths always have an odd number of points (an even number of increments),
which as a side benefit makes them suitable for integration with Simpson's rule.
"""
rng = np.random.default_rng(rng)
total = M + previous_M
progbar_is_queue = ("multiprocessing.queues.Queue" in str(type(progbar).mro())) # Not using `isinstance` avoids having to import multiprocessing & multiprocessing.queues
if isinstance(progbar, str) and progbar == "auto":
progbar = tqdm(desc="Sampling quantile paths", leave=False,
total=total)
elif progbar is not None and not progbar_is_queue:
progbar.reset(total)
if previous_M:
# Dynamic miniters don’t work well with a restarted prog bar: use whatever miniter was determined on the first run (or 1)
progbar.miniters = max(progbar.miniters, 1)
progbar.dynamic_miniters = False
progbar.n = previous_M
progbar.refresh()
for r in range(M): # sourcery skip: for-index-underscore
for _ in range(100): # In practice, this should almost always work on the first try; 100 failures would mean a really pathological probability
qstart = rng.normal(qstar(Phistart) , math.sqrt(c)*deltaEMD(Phistart))
qend = rng.normal(qstar(Phiend), math.sqrt(c)*deltaEMD(Phiend))
if qstart < qend:
break
else:
raise RuntimeError("Unable to generate start and end points such that "
"start < end. Are you sure `qstar` is compatible "
"with monotone paths ?")
Φhat, qhat = generate_path_hierarchical_beta(
qstar, deltaEMD, c, qstart=qstart, qend=qend,
res=res, rng=rng, Phistart=Phistart, Phiend=Phiend)
yield Φhat, qhat
if progbar_is_queue:
progbar.put(total)
elif progbar is not None:
progbar.update()
time.sleep(0.05) # Without a small wait, the progbar might not update
Usage example#
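The original example output is not preserved here; below is a minimal sketch of how the public functions might be called. The placeholder data, the constant `c=1.0` and the constant `deltaEMD` are assumptions for illustration; in practice `qstar` would come from `emd.make_empirical_risk_ppf`.

import numpy as np

rng = np.random.default_rng(123)
samples = np.sort(rng.lognormal(mean=0.0, sigma=1.0, size=400))   # placeholder data

# Simple empirical PPF and a constant discrepancy function, for illustration
qstar    = lambda Φ: np.quantile(samples, Φ)
deltaEMD = lambda Φ: 0.05 * np.ones_like(np.asarray(Φ, dtype=float))

for Φhat, qhat in generate_quantile_paths(qstar, deltaEMD, c=1.0,
                                          M=10, res=6, rng=rng, progbar=None):
    assert np.all(np.diff(qhat) >= 0)   # each path is a monotone quantile function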