\(\require{mathtools} \newcommand{\notag}{} \newcommand{\tag}{} \newcommand{\label}[1]{} \newcommand{\sfrac}[2]{#1/#2} \newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\num}[1]{#1} \newcommand{\qty}[2]{#1\,#2} \renewenvironment{align} {\begin{aligned}} {\end{aligned}} \renewenvironment{alignat} {\begin{alignedat}} {\end{alignedat}} \newcommand{\pdfmspace}[1]{} % Ignore PDF-only spacing commands \newcommand{\htmlmspace}[1]{\mspace{#1}} % Ignore PDF-only spacing commands \newcommand{\scaleto}[2]{#1} % Allow to use scaleto from scalerel package \newcommand{\RR}{\mathbb R} \newcommand{\NN}{\mathbb N} \newcommand{\PP}{\mathbb P} \newcommand{\EE}{\mathbb E} \newcommand{\XX}{\mathbb X} \newcommand{\ZZ}{\mathbb Z} \newcommand{\QQ}{\mathbb Q} \newcommand{\fF}{\mathcal F} \newcommand{\dD}{\mathcal D} \newcommand{\lL}{\mathcal L} \newcommand{\gG}{\mathcal G} \newcommand{\hH}{\mathcal H} \newcommand{\nN}{\mathcal N} \newcommand{\pP}{\mathcal P} \newcommand{\BB}{\mathbb B} \newcommand{\Exp}{\operatorname{Exp}} \newcommand{\Binomial}{\operatorname{Binomial}} \newcommand{\Poisson}{\operatorname{Poisson}} \newcommand{\linop}{\mathcal{L}(\mathbb{B})} \newcommand{\linopell}{\mathcal{L}(\ell_1)} \DeclareMathOperator{\trace}{trace} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\Span}{span} \DeclareMathOperator{\proj}{proj} \DeclareMathOperator{\col}{col} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\gt}{>} \definecolor{highlight-blue}{RGB}{0,123,255} % definition, theorem, proposition \definecolor{highlight-yellow}{RGB}{255,193,7} % lemma, conjecture, example \definecolor{highlight-orange}{RGB}{253,126,20} % criterion, corollary, property \definecolor{highlight-red}{RGB}{220,53,69} % criterion \newcommand{\logL}{\ell} \newcommand{\eE}{\mathcal{E}} \newcommand{\oO}{\mathcal{O}} \newcommand{\defeq}{\stackrel{\mathrm{def}}{=}} \newcommand{\Bspec}{\mathcal{B}} % Spectral radiance \newcommand{\X}{\mathcal{X}} % X space \newcommand{\Y}{\mathcal{Y}} % Y space \newcommand{\M}{\mathcal{M}} % Model \newcommand{\Tspace}{\mathcal{T}} \newcommand{\Vspace}{\mathcal{V}} \newcommand{\Mtrue}{\mathcal{M}_{\mathrm{true}}} \newcommand{\MP}{\M_{\mathrm{P}}} \newcommand{\MRJ}{\M_{\mathrm{RJ}}} \newcommand{\qproc}{\mathfrak{Q}} \newcommand{\D}{\mathcal{D}} % Data (true or generic) \newcommand{\Dt}{\tilde{\mathcal{D}}} \newcommand{\Phit}{\widetilde{\Phi}} \newcommand{\Phis}{\Phi^*} \newcommand{\qt}{\tilde{q}} \newcommand{\qs}{q^*} \newcommand{\qh}{\hat{q}} \newcommand{\AB}[1]{\mathtt{AB}~\mathtt{#1}} \newcommand{\LP}[1]{\mathtt{LP}~\mathtt{#1}} \newcommand{\NML}{\mathrm{NML}} \newcommand{\iI}{\mathcal{I}} \newcommand{\true}{\mathrm{true}} \newcommand{\dist}{D} \newcommand{\Mtheo}[1]{\mathcal{M}_{#1}} % Model (theoretical model); index: param set \newcommand{\DL}[1][L]{\mathcal{D}^{(#1)}} % Data (RV or generic) \newcommand{\DLp}[1][L]{\mathcal{D}^{(#1')}} % Data (RV or generic) \newcommand{\DtL}[1][L]{\tilde{\mathcal{D}}^{(#1)}} % Data (RV or generic) \newcommand{\DpL}[1][L]{{\mathcal{D}'}^{(#1)}} % Data (RV or generic) \newcommand{\Dobs}[1][]{\mathcal{D}_{\mathrm{obs}}^{#1}} % Data (observed) \newcommand{\calibset}{\mathcal{C}} \newcommand{\N}{\mathcal{N}} % Normal distribution \newcommand{\Z}{\mathcal{Z}} % Partition function \newcommand{\VV}{\mathbb{V}} % Variance \newcommand{\T}{\mathsf{T}} % Transpose \newcommand{\EMD}{\mathrm{EMD}} \newcommand{\dEMD}{d_{\mathrm{EMD}}} \newcommand{\dEMDtilde}{\tilde{d}_{\mathrm{EMD}}} 
\newcommand{\dEMDsafe}{d_{\mathrm{EMD}}^{\text{(safe)}}} \newcommand{\e}{ε} % Model confusion threshold \newcommand{\falsifythreshold}{ε} \newcommand{\bayes}[1][]{B_{#1}} \newcommand{\bayesthresh}[1][]{B_{0}} \newcommand{\bayesm}[1][]{B^{\mathcal{M}}_{#1}} \newcommand{\bayesl}[1][]{B^l_{#1}} \newcommand{\bayesphys}[1][]{B^{{p}}_{#1}} \newcommand{\Bconf}[1]{B^{\mathrm{epis}}_{#1}} \newcommand{\Bemd}[1]{B^{\mathrm{EMD}}_{#1}} \newcommand{\BQ}[1]{B^{Q}_{#1}} \newcommand{\Bconfbin}[1][]{\bar{B}^{\mathrm{conf}}_{#1}} \newcommand{\Bemdbin}[1][]{\bar{B}_{#1}^{\mathrm{EMD}}} \newcommand{\bin}{\mathcal{B}} \newcommand{\Bconft}[1][]{\tilde{B}^{\mathrm{conf}}_{#1}} \newcommand{\fc}{f_c} \newcommand{\fcbin}{\bar{f}_c} \newcommand{\paramphys}[1][]{Θ^{{p}}_{#1}} \newcommand{\paramobs}[1][]{Θ^{ε}_{#1}} \newcommand{\test}{\mathrm{test}} \newcommand{\train}{\mathrm{train}} \newcommand{\synth}{\mathrm{synth}} \newcommand{\rep}{\mathrm{rep}} \newcommand{\MNtrue}{\mathcal{M}^{{p}}_{\text{true}}} \newcommand{\MN}[1][]{\mathcal{M}^{{p}}_{#1}} \newcommand{\MNA}{\mathcal{M}^{{p}}_{Θ_A}} \newcommand{\MNB}{\mathcal{M}^{{p}}_{Θ_B}} \newcommand{\Me}[1][]{\mathcal{M}^ε_{#1}} \newcommand{\Metrue}{\mathcal{M}^ε_{\text{true}}} \newcommand{\Meobs}{\mathcal{M}^ε_{\text{obs}}} \newcommand{\Meh}[1][]{\hat{\mathcal{M}}^ε_{#1}} \newcommand{\MNa}{\mathcal{M}^{\mathcal{N}}_a} \newcommand{\MeA}{\mathcal{M}^ε_A} \newcommand{\MeB}{\mathcal{M}^ε_B} \newcommand{\Ms}{\mathcal{M}^*} \newcommand{\MsA}{\mathcal{M}^*_A} \newcommand{\MsB}{\mathcal{M}^*_B} \newcommand{\Msa}{\mathcal{M}^*_a} \newcommand{\MsAz}{\mathcal{M}^*_{A,z}} \newcommand{\MsBz}{\mathcal{M}^*_{B,z}} \newcommand{\Msaz}{\mathcal{M}^*_{a,z}} \newcommand{\MeAz}{\mathcal{M}^ε_{A,z}} \newcommand{\MeBz}{\mathcal{M}^ε_{B,z}} \newcommand{\Meaz}{\mathcal{M}^ε_{a,z}} \newcommand{\zo}{z^{0}} \renewcommand{\lL}[2][]{\mathcal{L}_{#1|{#2}}} % likelihood \newcommand{\Lavg}[2][]{\mathcal{L}^{/#2}_{#1}} % Geometric average of likelihood \newcommand{\lLphys}[2][]{\mathcal{L}^{{p}}_{#1|#2}} \newcommand{\Lavgphys}[2][]{\mathcal{L}^{{p}/#2}_{#1}} % Geometric average of likelihood \newcommand{\lLL}[3][]{\mathcal{L}^{(#3)}_{#1|#2}} \newcommand{\lLphysL}[3][]{\mathcal{L}^{{p},(#3)}_{#1|#2}} \newcommand{\lnL}[2][]{l_{#1|#2}} % Per-sample log likelihood \newcommand{\lnLt}[2][]{\widetilde{l}_{#1|#2}} \newcommand{\lnLtt}{\widetilde{l}} % Used only in path_sampling \newcommand{\lnLh}[1][]{\hat{l}_{#1}} \newcommand{\lnLphys}[2][]{l^{{p}}_{#1|#2}} \newcommand{\lnLphysL}[3][]{l^{{p},(#3)}_{#1|#2}} \newcommand{\Elmu}[2][1]{μ_{{#2}}^{(#1)}} \newcommand{\Elmuh}[2][1]{\hat{μ}_{{#2}}^{(#1)}} \newcommand{\Elsig}[2][1]{Σ_{{#2}}^{(#1)}} \newcommand{\Elsigh}[2][1]{\hat{Σ}_{{#2}}^{(#1)}} \newcommand{\pathP}{\mathop{{p}}} % Path-sampling process (generic) \newcommand{\pathPhb}{\mathop{{p}}_{\mathrm{Beta}}} % Path-sampling process (hierarchical beta) \newcommand{\interval}{\mathcal{I}} \newcommand{\Phiset}[1]{\{\Phi\}^{\small (#1)}} \newcommand{\Phipart}[1]{\{\mathcal{I}_Φ\}^{\small (#1)}} \newcommand{\qhset}[1]{\{\qh\}^{\small (#1)}} \newcommand{\Dqpart}[1]{\{Δ\qh_{2^{#1}}\}} \newcommand{\LsAzl}{\mathcal{L}_{\smash{{}^{\,*}_A},z,L}} \newcommand{\LsBzl}{\mathcal{L}_{\smash{{}^{\,*}_B},z,L}} \newcommand{\lsA}{l_{\smash{{}^{\,*}_A}}} \newcommand{\lsB}{l_{\smash{{}^{\,*}_B}}} \newcommand{\lsAz}{l_{\smash{{}^{\,*}_A},z}} \newcommand{\lsAzj}{l_{\smash{{}^{\,*}_A},z_j}} \newcommand{\lsAzo}{l_{\smash{{}^{\,*}_A},z^0}} \newcommand{\leAz}{l_{\smash{{}^{\,ε}_A},z}} \newcommand{\lsAez}{l_{\smash{{}^{*ε}_A},z}} 
\newcommand{\lsBz}{l_{\smash{{}^{\,*}_B},z}} \newcommand{\lsBzj}{l_{\smash{{}^{\,*}_B},z_j}} \newcommand{\lsBzo}{l_{\smash{{}^{\,*}_B},z^0}} \newcommand{\leBz}{l_{\smash{{}^{\,ε}_B},z}} \newcommand{\lsBez}{l_{\smash{{}^{*ε}_B},z}} \newcommand{\LaszL}{\mathcal{L}_{\smash{{}^{*}_a},z,L}} \newcommand{\lasz}{l_{\smash{{}^{*}_a},z}} \newcommand{\laszj}{l_{\smash{{}^{*}_a},z_j}} \newcommand{\laszo}{l_{\smash{{}^{*}_a},z^0}} \newcommand{\laez}{l_{\smash{{}^{ε}_a},z}} \newcommand{\lasez}{l_{\smash{{}^{*ε}_a},z}} \newcommand{\lhatasz}{\hat{l}_{\smash{{}^{*}_a},z}} \newcommand{\pasz}{p_{\smash{{}^{*}_a},z}} \newcommand{\paez}{p_{\smash{{}^{ε}_a},z}} \newcommand{\pasez}{p_{\smash{{}^{*ε}_a},z}} \newcommand{\phatsaz}{\hat{p}_{\smash{{}^{*}_a},z}} \newcommand{\phateaz}{\hat{p}_{\smash{{}^{ε}_a},z}} \newcommand{\phatseaz}{\hat{p}_{\smash{{}^{*ε}_a},z}} \newcommand{\Phil}[2][]{Φ_{#1|#2}} % Φ_{\la} \newcommand{\Philt}[2][]{\widetilde{Φ}_{#1|#2}} % Φ_{\la} \newcommand{\Philhat}[2][]{\hat{Φ}_{#1|#2}} % Φ_{\la} \newcommand{\Philsaz}{Φ_{\smash{{}^{*}_a},z}} % Φ_{\lasz} \newcommand{\Phileaz}{Φ_{\smash{{}^{ε}_a},z}} % Φ_{\laez} \newcommand{\Philseaz}{Φ_{\smash{{}^{*ε}_a},z}} % Φ_{\lasez} \newcommand{\mus}[1][1]{μ^{(#1)}_*} \newcommand{\musA}[1][1]{μ^{(#1)}_{\smash{{}^{\,*}_A}}} \newcommand{\SigsA}[1][1]{Σ^{(#1)}_{\smash{{}^{\,*}_A}}} \newcommand{\musB}[1][1]{μ^{(#1)}_{\smash{{}^{\,*}_B}}} \newcommand{\SigsB}[1][1]{Σ^{(#1)}_{\smash{{}^{\,*}_B}}} \newcommand{\musa}[1][1]{μ^{(#1)}_{\smash{{}^{*}_a}}} \newcommand{\Sigsa}[1][1]{Σ^{(#1)}_{\smash{{}^{*}_a}}} \newcommand{\Msah}{{\color{highlight-red}\mathcal{M}^{*}_a}} \newcommand{\Msazh}{{\color{highlight-red}\mathcal{M}^{*}_{a,z}}} \newcommand{\Meah}{{\color{highlight-blue}\mathcal{M}^{ε}_a}} \newcommand{\Meazh}{{\color{highlight-blue}\mathcal{M}^{ε}_{a,z}}} \newcommand{\lsazh}{{\color{highlight-red}l_{\smash{{}^{*}_a},z}}} \newcommand{\leazh}{{\color{highlight-blue}l_{\smash{{}^{ε}_a},z}}} \newcommand{\lseazh}{{\color{highlight-orange}l_{\smash{{}^{*ε}_a},z}}} \newcommand{\Philsazh}{{\color{highlight-red}Φ_{\smash{{}^{*}_a},z}}} % Φ_{\lasz} \newcommand{\Phileazh}{{\color{highlight-blue}Φ_{\smash{{}^{ε}_a},z}}} % Φ_{\laez} \newcommand{\Philseazh}{{\color{highlight-orange}Φ_{\smash{{}^{*ε}_a},z}}} % Φ_{\lasez} \newcommand{\emdstd}{\tilde{σ}} \DeclareMathOperator{\Mvar}{Mvar} \DeclareMathOperator{\AIC}{AIC} \DeclareMathOperator{\epll}{epll} \DeclareMathOperator{\elpd}{elpd} \DeclareMathOperator{\MDL}{MDL} \DeclareMathOperator{\comp}{COMP} \DeclareMathOperator{\Lognorm}{Lognorm} \DeclareMathOperator{\erf}{erf} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator{\Image}{Image} \DeclareMathOperator{\sgn}{sgn} \DeclareMathOperator{\SE}{SE} % standard error \DeclareMathOperator{\Unif}{Unif} \DeclareMathOperator{\Poisson}{Poisson} \DeclareMathOperator{\SkewNormal}{SkewNormal} \DeclareMathOperator{\TruncNormal}{TruncNormal} \DeclareMathOperator{\Exponential}{Exponential} \DeclareMathOperator{\exGaussian}{exGaussian} \DeclareMathOperator{\IG}{IG} \DeclareMathOperator{\NIG}{NIG} \DeclareMathOperator{\Gammadist}{Gamma} \DeclareMathOperator{\Lognormal}{Lognormal} \DeclareMathOperator{\Beta}{Beta} \newcommand{\sinf}{{s_{\infty}}}\)

4.4. Calibration experiments#

As described in our Results, the goal of calibration is twofold. First we want to correlate the value of \(\Bemd{AB;c}\) with the probability \(\Bconf{AB;Ω}\) that, across variations of the data-generating process \(\Mtrue\), model \(A\) has lower risk than model \(B\). We use an epistemic distribution \(Ω\) to describe those variations. Second, we want to ensure that a decision to reject either model based on \(\Bemd{AB}\) is robust: it should hold for any reasonable epistemic distribution \(Ω\) of experimental conditions (and therefore hopefully also for unanticipated experimental variations).

The calibration procedure involves fixing two candidate models and an epistemic distribution \(Ω\). We then sample multiple data-generating models \(\Mtrue\) from \(Ω\), generate a replicate dataset \(\D_\test^\rep\) from each, and compute both \(\Bemd{}\) and \(\Bconf{}\) on that replicate dataset.
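The sampling loop can be sketched as follows. Here `Omega.sample`, `M_true.generate_dataset`, `compute_Bemd` and `true_risk` are hypothetical helpers standing in for the quantities defined in the Results; the sketch only records the raw \(\bigl(\Bemd{}, [R_A < R_B]\bigr)\) pairs consumed by the binning procedure described further below.

```python
import numpy as np

rng = np.random.default_rng(2024)

def calibration_samples(Omega, model_A, model_B, c, n_experiments=4096):
    """Draw data-generating models from Ω and record (Bemd, [R_A < R_B]) pairs.

    `Omega.sample`, `M_true.generate_dataset`, `compute_Bemd` and `true_risk`
    are placeholders for the corresponding quantities in the main text.
    """
    records = []
    for _ in range(n_experiments):
        M_true = Omega.sample(rng)                        # one simulated experiment
        data   = M_true.generate_dataset(rng)             # replicate dataset D_test^rep
        bemd   = compute_Bemd(model_A, model_B, data, c)  # EMD criterion for this replicate
        R_A, R_B = true_risk(model_A, M_true), true_risk(model_B, M_true)
        records.append((bemd, float(R_A < R_B)))          # Iverson bracket [R_A < R_B]
    return np.array(records)
```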

This process can be repeated for as many epistemic distributions and as many different pairs of candidate models as desired, until we are sufficiently confident in the robustness of \(\Bemd{}\). In particular, since the same criterion must serve across all of these models and conditions, we cannot expect perfect correlation between \(\Bemd{}\) and \(\Bconf{}\).

However, we do not actually require perfect correlation, because this is an asymmetric test: a decision to reject neither model is much preferable to a decision to reject the wrong one. What we especially want to ensure, therefore, is that for any pair of models \(A\), \(B\) and any reasonable epistemic distribution \(Ω\), we have

\[\Bigl\lvert \Bemd{AB;c} - 0.5 \Bigr\rvert \lesssim \Bigl\lvert \Bconf{AB;Ω} - 0.5 \Bigr\rvert \,.\]

This is given as either Eq. 2.30 or Eq. 2.31 in the Results.
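In code, this desideratum reduces to a simple per-bin check. The tolerance `tol` below is illustrative only, reflecting the fact that the bound is approximate (\(\lesssim\)) rather than strict.

```python
def satisfies_calibration_bound(bemd_bar, bconf_bar, tol=0.05):
    """Check |Bemd - 0.5| <~ |Bconf - 0.5| for one calibration bin.

    `tol` is an illustrative slack: the text only asks the bound to hold
    approximately, not exactly.
    """
    return abs(bemd_bar - 0.5) <= abs(bconf_bar - 0.5) + tol
```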

Note that \(\Bconf{AB;Ω}\) is defined as the fraction of experiments (i.e. data-generating models \(\Mtrue\)) for which \(R_A\) is smaller than \(R_B\). Therefore it cannot actually be calculated for a single experiment; only over sets of experiments. Since our goal is to identify a correlation between \(\Bconf{AB;Ω}\) and \(\Bemd{AB;c}\), we compute the former as a function of the latter:

(4.4)#\[\begin{split}\begin{aligned} \Bconf{AB;Ω}\Bigl(\Bemd{AB;c}\Bigr) &\coloneqq P \Bigl(R_A < R_B \,\Bigm|\, Ω, \Bemd{AB;c}\Bigr) \\ &~= P_{\Mtrue \sim Ω | \Bemd{AB;c}} \Bigl(R_A < R_B\Bigr) \\ &~= \EE_{\Mtrue \sim Ω | \Bemd{AB;c}} \Bigl[R_A < R_B\Bigr] \,. \end{aligned}\end{split}\]

(On the last line, \([\;]\) is the Iverson bracket, which is 1 when the clause it contains is true, and 0 otherwise.) In other words, we partition the experiments in the domain of \(Ω\) into subsets having the same value \(\Bemd{AB;c}\), then compute \(\Bconf{AB;Ω}\) over each subset. In practice we do this by binning the experiments according to \(\Bemd{AB;c}\), as we explain below.

4.4.1. Calibration and computation of \(\Bconf{}\) for the black body radiation models#

For the black body models, we consider one epistemic distribution \(Ω\) with two sources of experimental variability: the true physical model (\(\Bspec_{\mathrm{P}}\) or \(\Bspec_{\mathrm{RJ}}\)) and the bias \(\Bspec_0\). All other parameters are fixed to the standard values used throughout the text (\(s=10^{5}\,{\mathrm{m}^{2} \cdot \mathrm{nm} \cdot \mathrm{photons} \cdot \mathrm{sr}} / {\mathrm{kW}}\), \(T=4000\,\mathrm{K}\)).

To generate each replicate dataset \(\D_\test^\rep\), we select

(4.5)#\[\begin{align} \Bspec_{\mathrm{phys}} &\sim \Unif\Bigl(\Bigl\{\Bspec_{\mathrm{P}}, \Bspec_{\mathrm{RJ}}\Bigr\}\Bigr) \\ \Bspec_0 &\sim \Unif\Bigl( \bigl[-10^{-4}, 10^{-4}\bigr] \Bigr) \cdot {\mathrm{m}^{2} \cdot \mathrm{nm} \cdot \mathrm{photons} \cdot \mathrm{sr}} / {\mathrm{kW}} \end{align}\]

We then generate 4096 data points for each dataset using the Poisson observation model of Eq. 2.34, substituting \(\Bspec_{\mathrm{phys}}\) for \(\Bspec_{\mathrm{P}}\).
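As an illustration, one replicate dataset could be drawn as in the sketch below. The callables `planck` and `rayleigh_jeans`, the wavelength grid `lam`, and the assumed observation form \(\Bspec_{\mathrm{obs}} = \Poisson\bigl(s\,\Bspec_{\mathrm{phys}}(λ)\bigr)/s + \Bspec_0\) are our stand-ins; the exact observation model is the one given in Eq. 2.34.

```python
import numpy as np

rng = np.random.default_rng(2024)
s_scale = 1e5      # m² · nm · photons · sr / kW, fixed as in the text
T       = 4000.0   # K

def sample_blackbody_replicate(lam, planck, rayleigh_jeans, n_points=4096):
    """Draw one replicate dataset according to the epistemic distribution of Eq. 4.5.

    `planck` and `rayleigh_jeans` are hypothetical callables returning the
    spectral radiance B(λ, T); the Poisson observation form below is our
    assumption; see Eq. 2.34 for the model actually used.
    """
    B_phys = planck if rng.random() < 0.5 else rayleigh_jeans   # B_phys ~ Unif({B_P, B_RJ})
    B0     = rng.uniform(-1e-4, 1e-4)                           # bias B_0, same units as B
    lam_i  = rng.choice(lam, size=n_points)                     # observed wavelengths
    B_obs  = rng.poisson(s_scale * B_phys(lam_i, T)) / s_scale + B0
    return lam_i, B_obs, (B_phys is planck)
```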

As in the main text, both candidate models assume Gaussian noise, so for each replicate dataset and each candidate model we fit the standard deviation of that noise to the observation data by maximum likelihood.
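For a Gaussian noise model with fixed predictions, the maximum-likelihood standard deviation is simply the root-mean-square residual, so this fit reduces to one line:

```python
import numpy as np

def fit_gaussian_sigma(y_obs, y_pred):
    """Maximum-likelihood estimate of the Gaussian noise standard deviation.

    For residuals r_i = y_i - f(x_i), the MLE is sigma_hat = sqrt(mean(r_i**2)).
    """
    residuals = np.asarray(y_obs) - np.asarray(y_pred)
    return np.sqrt(np.mean(residuals**2))
```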

We then compute \(\Bemd{\mathrm{P},\mathrm{RJ};c} \in [0, 1]\) for each value of \(c\) being tested. We also compute a binary variable — represented again with the Iverson bracket — which is 1 if the true risk of \(\M_\mathrm{P}\) is lower than that of \(\M_\mathrm{RJ}\), and 0 otherwise:

(4.6)#\[\begin{equation} \bigl[R_{\mathrm{P}} < R_{\mathrm{RJ}}\bigr] := \begin{cases} 1 & \text{if } R_{\mathrm{P}} < R_{\mathrm{RJ}} \,, \\ 0 & \text{otherwise.} \end{cases} \end{equation}\]

This produces a long list of \(\Bigl(\Bemd{\mathrm{P},\mathrm{RJ};c}, \bigl[R_{\mathrm{P}} < R_{\mathrm{RJ}}\bigr] \Bigr)\) pairs, one for each dataset \(\D_\test^\rep\). Formally, we can then compute \(\Bconf{\mathrm{P},\mathrm{RJ};Ω} \bigl(\Bemd{\mathrm{P},\mathrm{RJ};c}\bigr)\) as per Eq. 4.4: by keeping only those pairs with matching \(\Bemd{\mathrm{P},\mathrm{RJ};c}\) value and averaging the corresponding \(\bigl[R_{\mathrm{P}} < R_{\mathrm{RJ}}\bigr]\) values. In practice, all \(\Bemd{\mathrm{P},\mathrm{RJ};c}\) values will be slightly different, so we bin neighbouring values together into a histogram. To make best use of our statistical power, we assign the same number of pairs to each bin, which means that bins have different widths:

Procedure for calibration histograms
  1. Sort all \(\bigl(\Bemd{\mathrm{P},\mathrm{RJ};c}, [R_{\mathrm{P}} < R_{\mathrm{RJ}}] \bigr)\) pairs according to \(\Bemd{\mathrm{P},\mathrm{RJ};c}\).

  2. Bin the experiments by combining those with similar values of \(\Bemd{\mathrm{P},\mathrm{RJ};c}\). Each bin should contain a similar number of experiments, so that they have similar statistical power.

  3. Within each bin, average the values of \(\Bemd{\mathrm{P},\mathrm{RJ};c}\) and \([R_{\mathrm{P}} < R_{\mathrm{RJ}}]\), to obtain a single pair \(\bigl(\bar{B}^{\mathrm{EMD}}_{\mathrm{P},\mathrm{RJ};c}, \bar{B}^{\mathrm{epis}}_{\mathrm{P},\mathrm{RJ}}\bigr)\). These are the values we plot as calibration curves.
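A minimal sketch of the three steps above, assuming the \(\bigl(\Bemd{}, [R_{\mathrm{P}} < R_{\mathrm{RJ}}]\bigr)\) pairs are stored in two arrays:

```python
import numpy as np

def calibration_curve(bemd, correct, n_bins=128):
    """Equal-count binning of (Bemd, [R_P < R_RJ]) pairs into a calibration curve.

    bemd    : array of Bemd values, one per replicate dataset
    correct : array of 0/1 Iverson-bracket values [R_P < R_RJ]
    Returns the per-bin averages (Bemd_bar, Bepis_bar).
    """
    order   = np.argsort(bemd)                                 # step 1: sort by Bemd
    bemd    = np.asarray(bemd)[order]
    correct = np.asarray(correct)[order]
    bins    = np.array_split(np.arange(len(bemd)), n_bins)     # step 2: equal-count bins
    bemd_bar  = np.array([bemd[idx].mean() for idx in bins])   # step 3: per-bin averages
    bepis_bar = np.array([correct[idx].mean() for idx in bins])
    return bemd_bar, bepis_bar
```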

The unequal bin widths are clearly visible in Fig. 4.1, where points indicate the \(\bigl(\bar{B}^{\mathrm{EMD}}_{\mathrm{P},\mathrm{RJ};c}, \bar{B}^{\mathrm{epis}}_{\mathrm{P},\mathrm{RJ}}\bigr)\) pair corresponding to each bin. Larger gaps between points are indicative of larger bins, which result from fewer experiments having \(\Bemd{}\) values close to \(\bar{B}^{\mathrm{EMD}}\).

[Figure: uv_calibration_effect-c_spaced.svg]

Fig. 4.1 Calibration plots for the \(\Bemd{\mathrm{P},\mathrm{RJ}}\) criterion comparing the Planck and Rayleigh-Jeans models. The value of \(c\) increases from left to right and from dark to bright. Each curve consists of 4096 experiments aggregated into 128 bins. Wide gaps between points indicate that few experiments produced similar values of \(\Bemd{}\); this is especially salient in the left panel, where almost all experiments have either \(\Bemd{} = 0\) or \(\Bemd{} = 1\).#

Fig. 4.1 also provides a particularly clear illustration of the effect of the sensitivity factor \(c\). When \(c\) is too small (\(c \leq 2^{-9}\)), \(R\)-distributions almost never overlap, and \(\Bemd{\mathrm{P},\mathrm{RJ};c}\) is mostly either 0 or 1: the criterion is overconfident, with many points located in the red regions. As \(c\) increases (\(2^{-6} \leq c \leq 2^{3}\)), \(\Bemd{\mathrm{P},\mathrm{RJ};c}\) values become more evenly distributed over the entire \([0, 1]\) interval and either don’t encroach at all into the overconfident regions, or do so only sparingly. These curves show a nearly monotone relation between \(\Bemd{\mathrm{P},\mathrm{RJ};c}\) and \(\Bconf{\mathrm{P},\mathrm{RJ};Ω}\), indicating that the former is strongly predictive of the latter; the corresponding \(c\) values would likely be suitable for model comparison. In contrast, when \(c\) is too large (\(c \geq 2^6\)), \(\Bemd{}\) becomes decorrelated from \(\Bconf{}\): all points are near \(\Bconf{} \approx 0.5\), indicating that \(\Bemd{\mathrm{P},\mathrm{RJ};c}\) with such large values of \(c\) would not be a useful comparison criterion.

4.4.2. Calibration for the neural response models#

For the neuron model, we consider four sources of experimental variability: variations in the distribution used to model observation noise \(ξ\), variations in the strength (\(σ_o\)) of observation noise, as well as variations in the strength (\(σ_i\)) and correlation time (\(τ\)) of the external input \(I_{\mathrm{ext}}\). Each epistemic distribution \(Ω\) is therefore described by four distributions over hyperparameters:

Observation noise model

Either Gaussian or Cauchy. Both distributions are centred, and \(σ_o\) is drawn from the distribution defined below. Note that this is the model used to generate a dataset \(\D_\test^\rep\); the candidate models always evaluate their loss assuming a Gaussian observation model.

\[\begin{split}\begin{align*} &\qquad\text{\textit{Gaussian}} & &\qquad\text{\textit{Cauchy}} \\ p(ξ) &= \frac{1}{\sqrt{2πσ_o^2}} \exp\left(-\frac{ξ^2}{2σ_o^2}\right) & \; p(ξ) &= \frac{2}{πσ_o \left[ 1 + \left( \frac{ξ}{σ_o/2} \right)^2 \right]} \end{align*}\end{split}\]

Observation noise strength \(σ_o\)

\[\begin{split}\begin{aligned} \text{\textit{Low noise:}} && \log σ_o &\sim \nN(0.0 \mathrm{mV}, (0.5 \mathrm{mV})^2) \\ \text{\textit{High noise:}} && \log σ_o &\sim \nN(1.0 \mathrm{mV}, (0.5 \mathrm{mV})^2) \end{aligned}\end{split}\]

External input strength \(σ_i\)

The parameter \(σ_i\) sets the strength of the input noise such that \(\langle I_{\mathrm{ext}}^2 \rangle = σ_i^2\).

\[\begin{split}\begin{aligned} \text{\textit{Weak input:}} && \log σ_i &\sim \nN(-15.0 \mathrm{mV}, (0.5 \mathrm{mV})^2) \\ \text{\textit{Strong input:}} && \log σ_i &\sim \nN(-10.0 \mathrm{mV}, (0.5 \mathrm{mV})^2) \end{aligned}\end{split}\]

External input correlation time \(τ\)

The parameter \(τ\) sets the correlation time of the input noise such that \(\langle I_{\mathrm{ext}}(t)\, I_{\mathrm{ext}}(t+s) \rangle = σ_i^2 e^{-s^2/2τ^2}\); a sampling sketch is given after this list.

\[\begin{split}\begin{aligned} \text{\textit{Short correlation:}} && \log_{10} τ &\sim \Unif([0.1 \mathrm{ms}, 0.2 \mathrm{ms}]) \\ \text{\textit{Long correlation:}} && \log_{10} τ &\sim \Unif([1.0 \mathrm{ms}, 2.0 \mathrm{ms}]) \end{aligned}\end{split}\]
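One way to generate an input trace with exactly this squared-exponential autocorrelation is to sample a zero-mean Gaussian process on the simulation time grid, as sketched below. This is an illustration only; the paper may well generate \(I_{\mathrm{ext}}\) by a different (e.g. sequential) method.

```python
import numpy as np

rng = np.random.default_rng(2024)

def sample_external_input(t, sigma_i, tau):
    """Sample I_ext on time grid `t` with <I(t) I(t+s)> = sigma_i**2 exp(-s**2/2 tau**2).

    Draws from a zero-mean Gaussian process with a squared-exponential
    covariance; an illustration only, not necessarily the generator used
    in the paper.
    """
    s   = t[:, None] - t[None, :]                      # pairwise lags
    cov = sigma_i**2 * np.exp(-s**2 / (2 * tau**2))    # SE covariance matrix
    cov += 1e-12 * np.eye(len(t))                      # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(len(t)), cov)
```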

We thus define two possible distributions for \(ξ\), two for \(σ_o\), two for \(τ\), and two for \(σ_i\). Combined with three model pairs (see Fig. 2.5), this makes a total of \(48 = 3 \times 2 \times 2 \times 2 \times 2\) possible epistemic distributions \(Ω\), each of which can be identified by a tuple such as \(\bigl(\M_A \,\mathtt{vs}\, \M_B\), \(\mathtt{Cauchy}\), \(\mathtt{Low}\,\mathtt{noise}\), \(\mathtt{Strong}\,\mathtt{input}\), \(\mathtt{Long}\,\mathtt{correlation}\bigr)\). Calibration results for each of these conditions are given in Fig. 6.1.
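The hyperparameter part of such a tuple translates into a sampler like the one below. The unit and logarithm conventions (natural logarithm for \(σ_o\) and \(σ_i\); base-10 for \(τ\), with \(τ\) in ms) are our reading of the distributions listed above and should be checked against the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(2024)

def sample_omega(noise_model="Cauchy", obs_noise="Low",
                 input_strength="Strong", input_corr="Long"):
    """Draw one epistemic parameter vector ω from the Ω identified by a tuple.

    Unit / logarithm conventions are assumptions based on the distributions
    listed above.
    """
    mu_o    = 0.0   if obs_noise == "Low"        else 1.0    # mean of log sigma_o
    mu_i    = -15.0 if input_strength == "Weak"  else -10.0  # mean of log sigma_i
    tau_rng = (0.1, 0.2) if input_corr == "Short" else (1.0, 2.0)

    sigma_o = np.exp(rng.normal(mu_o, 0.5))     # observation noise strength
    sigma_i = np.exp(rng.normal(mu_i, 0.5))     # external input strength
    tau     = 10 ** rng.uniform(*tau_rng)       # correlation time (ms, assumed)
    return {"noise_model": noise_model, "sigma_o": sigma_o,
            "sigma_i": sigma_i, "tau": tau}
```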

During calibration against an epistemic distribution \(Ω\), drawing a dataset \(\D_\test^\rep\) happens in two steps. First, we randomly draw a vector \(ω\) of epistemic parameters (i.e. hyperparameters) from \(Ω\); for example \(ω = (\underbrace{-0.27 \mathrm{mV}}_{σ_o}, \underbrace{47 \mathrm{ms}}_{τ}, \underbrace{0.003 \mathrm{mV}}_{σ_i})\). Second, we use those parameters to generate the dataset \(\D_\test^\rep\), composed in this case of data points \((t, \tilde{V}^{\LP{}}(t))\). We can then evaluate the loss \(Q\) (given by Eq. 2.6) on those data points. Note that the loss does not depend on the epistemic parameters \(ω\) directly, but in general will involve parameters which are fitted to the simulated data. In short, the vector \(ω\) describes the parameters of a simulated experiment, while \(\D_\test^\rep\) describes a particular outcome of that experiment. In theory we could generate multiple datasets with the same vector \(ω\), but in practice it is more statistically efficient to draw a new experiment for each new dataset.
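Schematically, the two-step draw then reads as follows, reusing `sample_omega` and `sample_external_input` from the sketches above; `simulate_neuron` is a hypothetical stand-in for the true neuron model, and the Cauchy scale of \(σ_o/2\) is the same assumption as before.

```python
def draw_replicate_dataset(t, Omega_tuple=("Cauchy", "Low", "Strong", "Long")):
    """Two-step draw of one replicate dataset D_test^rep (schematic).

    Step 1: draw the epistemic parameters ω from Ω.
    Step 2: simulate the experiment those parameters describe.
    """
    omega = sample_omega(*Omega_tuple)                                 # step 1
    I_ext = sample_external_input(t, omega["sigma_i"], omega["tau"])   # step 2a: input trace
    V     = simulate_neuron(t, I_ext)                                  # step 2b: hypothetical model
    if omega["noise_model"] == "Gaussian":                             # step 2c: observation noise
        xi = rng.normal(0.0, omega["sigma_o"], size=len(t))
    else:                                                              # Cauchy, scale sigma_o/2 assumed
        xi = omega["sigma_o"] / 2 * rng.standard_cauchy(size=len(t))
    return t, V + xi
```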

When choosing epistemic distributions, it is worth remembering that the goal of calibration is to empirically approximate a probability over experimental conditions. Thus choosing a distribution which can generate a large number of conditions — ideally an infinite number — will lead to better estimates. Here the use of continuous distributions for \(σ_o\), \(σ_i\) and \(τ\), and the fact that \(I_{\mathrm{ext}}\) is a continuous process, helps us achieve this goal. Also important is to generate data where the model selection problem is ambiguous, so as to resolve calibration curves around \(\Bemd{} = 0.5\).

Calibration is an imperfect procedure, and how well it works depends on the quality of the candidate models and the choice of loss function. Here, for example, the choice of a pointwise loss makes it sensitive to the timing of spikes; this tends to favour models which produce fewer spikes, since the penalty on a mistimed spike is high. This is why, in the \(\M_A \,\mathtt{vs}\, \M_D\) comparison of Fig. 2.5, we see a floor on the values of \(\Bconf{AD}\). In short, some models have consistently lower loss even on random data, and so their risk — which is the expectation of their loss — is a priori lower. The bias we see in the \(\M_A \,\mathtt{vs}\, \M_B\) comparison is likely due to a similar effect. (In this case, because \(\M_B\) is slightly better than \(\M_A\) on average.)