This paper aimes to explain what models are implemented in this package. To do so, it suffices to show the trial from which data arise and its likelihoods and *R*-codes which is very simple as follows.

In the following, we use the conventional likelihood notation;

\[ f(y|\theta), \]

where \(\LARGE{\color{red}{y}}\) denotes data, \(\LARGE{\color{blue}{\theta}}\) a model parameter.

First of all, we introduce a trial from which we obtain a dataset to be fitted a model defined lator.

Suppose that there are a *physician*, *radiographs* and a *data*-*analyst.* In the trial, the physician localizes shadows in radiographs if and only if he thinks shadows are lesions. In addition, the physician assigns to each localized shadow a *confidence* *level* which is a number as the following table.

Confidence Level | meaning |
---|---|

3 | definitely present |

2 | subtle |

1 | questionable |

E.g., if the physician locarized a shadow with his highest confidence, then he would assign the number 3 to the locarized shodow, which means that, in his point of view, the shadow is definitely a lesion.

After the physician’s lesion finding task, the data-analyst evaluates physician’s localizations, i.e. the data-analyst counts physician’s truly localized positions (hits) and incorrect guy (false alarms) for each confidence level. Here, we assume that data-analysist knows the truth (Gold-Standard of Diagnosis). Consequently, we will obtain the following table indicating the physician’s regonition performance ability.

Confidence Level | Number of Hits | Number of False alarms |
---|---|---|

3 = definitely present | \(H_{3}\) | \(F_{3}\) |

2 = subtle | \(H_{2}\) | \(F_{2}\) |

1 = questionable | \(H_{1}\) | \(F_{1}\) |

In this table, each component \(H_c\) and \(F_c\) are non negative integer valued random variables satisfying \(\Sigma_{c=1,2,3} H_c \leq N_L\). For example, \(H_{3}\) denotes the number of hit with reader’s confidence level \(3^\text{rd}\).

So, in conventional notation we may write

\[\large{\color{red}{y}} = (H_0,H_1,H_2,H_3;F_1,F_2,F_3 ;N_L,N_I),\]

where we set \(H_\color{red}{0}:=N_L-(H_1+H_2+H_3).\) and \(N_L\) denotes the number of lesions among all radiographs and \(N_I\) the number of radiographs.

In the next section, we provide the probability law of these random variables \(H_c\) and \(F_c\) as the image measure or push-forward measure, namely, likelihood function.

For model of *1* reader, *1* modalitie and *3* confidence levels.

Define the model by \[ \{H_{c};c=\color{red}{0},1,2,\cdots C\} \sim \color{green}{\text{Multinomial}}(\{p_{c,}(\theta)\}_{c=\color{red}{0},1,2,\cdots C}),\\ F_{c} \sim \text{Poisson}(q_c(\theta)N_I),\\ \]

where

\[ p_{c}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu,\sigma)dx,\\ q_c(\theta) := \int_{\theta_c}^{\theta_{c+1}} \biggl( \frac{d}{dz} \biggr) \log \Phi(z)dz. \]

from which, we can calculate the most important characteristic indicating the observer recoginition performance ability as

\[ AUC := \Phi (\frac{\mu/\sigma}{\sqrt{(1/\sigma)^2+1}}), \\ \]

Note that model parameter is \(\LARGE{\color{blue}{\theta}}\) \(= (\theta_1,\theta_2,\theta_3,...\theta_C;\mu,\sigma)\) which should be estimated and \(\Phi\) denotes the cumulative distribution functions of the canonical Gaussian. Note that \(\theta_{C+1} = \infty\) and \(\theta_{0} = -\infty\).

The definition of the rates \(p,q\) are based on the logic of latent panty theory, so you know what is latent, I am exhausted! because I am a xxxxx! Aches, MCS aches!

\[ dz_c := z_{c+1}-z_{c},\\ dz_c, \sigma \sim \text{Uniform}(0,3),\\ z_{c} \sim \text{Uniform}( -3,3),\\ \]

```
# > viewdata(d)
# * Number of Lesions: 259
# * Number of Images : 57
# . Confidence.Level False.Positives True.Positives
# ------------------- ----------------- ---------------- ---------------
# Obviouly present 3 1 97
# Relatively obvious 2 14 32
# Subtle 1 74 31
fit_a_model_to(d) # Here, a model is fitted to the data "d"
# Or GUI by Shiny
fit_GUI_Shiny()
```

\(( \dot{} \omega \dot{})\) Love love me do! \(('; \omega ;)\) You know I love you \(\star\)

I always …

What is the difference between FROC and ROC? In ROC case, both of hits \(H_c\) and false alarms \(F_c\) have upper bounds, namely, there are \(N_H,N_F\) such that

\(\Sigma H_c \leq N_H\) and \(\Sigma F_c \leq N_F\), On the other hand, in FROC case, \(F_c\) dose not have any upper bounds.

If we assume such an upper bound for \(F_c\), say \(N_F\), then, by putting \(F_0 := N_F - \Sigma F_c\), the following ROC model is available.

\[ \{H_{c};c=\color{red}{0},1,2,\cdots C\} \sim \color{green}{\text{Multinomial}}(\{p_{c,}(\theta)\}_{c=\color{red}{0},1,2,\cdots C}),\\ \{F_{c};c=\color{red}{0},1,2,\cdots C\} \sim \color{green}{\text{Multinomial}}(\{q_{c,}(\theta)\}_{c=\color{red}{0},1,2,\cdots C}),\\ \]

where

\[ p_{c}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu,\sigma)dx,\\ q_{c}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x| 0,1)dx.\\ \]

Unfortunately, it is not so good according to the following paper, which suggests that reductions of parmater makes MCMC to be more stable. E.g., by regarding \(\mu\) and \(\sigma\) as functions of parmaeter \(\theta\), i.e., \(\mu=\mu(\theta), \sigma=\sigma(\theta)\), we can reduce the parameter. Aww yeah, I guess so, … but if you ask me, first of all, I will try to implement the non convergent models.

**Reference** *Bayesian analysis of a ROC curve for categorical data using a skew-binormal model; Balgobin Nandram and Thelge Buddika Peiris (This paper is nice!); 2018 volume 11 369-384; Statistics and its interface*

If the cute author could have a stable life and address, then I would implement it in this pkg. But you know, now, the author’s life is painful in many face, healthy, life work, etc.

**Reference** As another reference of ROC or FROC, see page 109 equation (6.3.5) of the book: *Chakraborty, Dev P - Observer performance methods for diagnostic imaging * foundations, modeling, and applications with R-based examples-CRC Press (2017)__

we consider the comparison of imaging modality, such as MRI, CT, PET, …

This model also can be interpreted as a subject-specific random effect model if we forget the modality and use it a image instead.

I implement this because Bayesian is suitable for including individual differences,…. the author think this is a venefit of Bayesian ….

*2* readers, *2* modalities and *3* confidence levels.

Confidence Level | Modality ID | Reader ID | Number of Hits | Number of False alarms |
---|---|---|---|---|

3 = definitely present | 1= MRI | 1= Bob | \(H_{3,1,1}\) | \(F_{3,1,1}\) |

2 = equivocal | 1= MRI | 1= Bob | \(H_{2,1,1}\) | \(F_{2,1,1}\) |

1 = questionable | 1= MRI | 1= Bob | \(H_{1,1,1}\) | \(F_{1,1,1}\) |

3 = definitely present | 1= MRI | 2=Alice | \(H_{3,1,2}\) | \(F_{3,1,2}\) |

2 = equivocal | 1= MRI | 2=Alice | \(H_{2,1,2}\) | \(F_{2,1,2}\) |

1 = questionable | 1= MRI | 2=Alice | \(H_{1,1,2}\) | \(F_{1,1,2}\) |

3 = definitely present | 2= CT | 1= Bob | \(H_{3,2,1}\) | \(F_{3,2,1}\) |

2 = equivocal | 2= CT | 1= Bob | \(H_{2,2,1}\) | \(F_{2,2,1}\) |

1 = questionable | 2= CT | 1= Bob | \(H_{1,2,1}\) | \(F_{1,2,1}\) |

3 = definitely present | 2= CT | 2=Alice | \(H_{3,2,2}\) | \(F_{3,2,2}\) |

2 = equivocal | 2= CT | 2=Alice | \(H_{2,2,2}\) | \(F_{2,2,2}\) |

1 = questionable | 2= CT | 2=Alice | \(H_{1,2,2}\) | \(F_{1,2,2}\) |

where, each component \(H\) and \(F\) are non negative integers. By the multi-index notation, for example, \(H_{3,2,1}\) denotes the number of hit of the \(1^\text{st}\) reader over all images taken by \(2^\text{nd}\) modality with reader’s confidence level is \(3^\text{rd}\).

So, in conventional notation we may write

\[y = (H_{c,m,r},F_{c,m,r} ;N_L,N_I).\]

Note that the above data format is available in case of single modality and multiple readers. Then, in this case, the modality corresponds to a subject-specific random effect. Thus, reinterpreting this model, we also introduce, in FROC models, subject-specific random variables for heterogeneous hit and false rates among images (radiographs). You know, I am a patient, and tired, no want to write this. Anyway, in this context, it would be

*2* readers, *2* radiographs (too small haha) and *3* confidence levels.

Confidence Level | radiograph ID | Reader ID | Number of Hits | Number of False alarms |
---|---|---|---|---|

3 = definitely present | 1 | 1 | \(H_{3,1,1}\) | \(F_{3,1,1}\) |

2 = equivocal | 1 | 1 | \(H_{2,1,1}\) | \(F_{2,1,1}\) |

1 = questionable | 1 | 1 | \(H_{1,1,1}\) | \(F_{1,1,1}\) |

3 = definitely present | 1 | 2 | \(H_{3,1,2}\) | \(F_{3,1,2}\) |

2 = equivocal | 1 | 2 | \(H_{2,1,2}\) | \(F_{2,1,2}\) |

1 = questionable | 1 | 2 | \(H_{1,1,2}\) | \(F_{1,1,2}\) |

3 = definitely present | 2 | 1 | \(H_{3,2,1}\) | \(F_{3,2,1}\) |

2 = equivocal | 2 | 1 | \(H_{2,2,1}\) | \(F_{2,2,1}\) |

1 = questionable | 2 | 1 | \(H_{1,2,1}\) | \(F_{1,2,1}\) |

3 = definitely present | 2 | 2 | \(H_{3,2,2}\) | \(F_{3,2,2}\) |

2 = equivocal | 2 | 2 | \(H_{2,2,2}\) | \(F_{2,2,2}\) |

1 = questionable | 2 | 2 | \(H_{1,2,2}\) | \(F_{1,2,2}\) |

The following model with multinomal is not implemeted yet, cuz the author is tired by MCS disease! Ha,,, above me only sky! This package dose not help me! :’-D

\[ \{H_{c,m,r};c=1,2,\cdots C\} \sim \text{Multinomial}(\{p_{c,m,r}(\theta);c=1,2,\cdots C\},N_L),\\ F_{c,m,r} \sim \text{Poisson}(q_c(\theta)).\\ \]

\[ p_{c,m,r}(\theta) := \int_{\theta_c}^{\theta_{c+1}}\text{Gaussian}_{}(x|\mu_{m,r},\sigma_{m,r})dx,\\ q_c(\theta) := \int_{\theta_c}^{\theta_{c+1}} \frac{d \log \Phi(z)}{dz}dz. \]

\[ A_{m,r} := \Phi (\frac{\mu_{m,r}/\sigma_{m,r}}{\sqrt{(1/\sigma_{m,r})^2+1}}), \\ A_{m,r} \sim \text{Normal} (A_{m},\sigma_{r}^2), \\ \]

where model parameter is is \(\LARGE{\color{blue}{\theta}}\) \(= (\theta_1,\theta_2,\theta_3,...\theta_C;\mu_{m,r},\sigma_{m,r})\) which should be estimated and \(\Phi\) denotes the cumulative distribution functions of the canonical Gaussian. Note that \(\theta_{C+1} = \infty\).

Someone might consider that this is not a suitable model cuz it dose not includes full heterogenity. However, in MCMC algorithm, such full model leads the author non-convergent issues. So, I am tired. I do not implement this multinomial model yet, but what is the difficulty in Bayesian? The author begin R language in 2~3 years ago with Stan, then first, I implement the full heterogeneity model. But against my effort, it never converges. So, next, the bitch author tried to reduce this individual differences so that the model converges. Then I found this model. So, please do not bother me such questions.

\[ dz_c := z_{c+1}-z_{c},\\ dz_c, \sigma_{m,r} \sim \text{Uniform}(0,3),\\ z_{c} \sim \text{Uniform}( -3,3),\\ A_{m} \sim \text{Uniform}(0,1).\\ \]

This is only example, and in this package I implement proper priors. The author thinks the above prior is intuitively the simplest non informative priors without the coordinate free property.

```
fit <- fit_Bayesian_FROC(
ite = 1111,
cha = 1,
summary = TRUE,
Null.Hypothesis = F,
dataList = dd # example data to be fitted a model
)
# Or GUI by Shiny
fit_GUI_Shiny_MRMC()
```

Poisson rate and multinomial Bernoulli rates should be contained the *regular* interval \([ \epsilon, 1-\epsilon]\) for some fixed small \(\epsilon\), i.e., \(p_c,q_c \in [ \epsilon, 1-\epsilon]\)

Monotonicity \(p_1<p_2<\cdots\) and \(q_1>q_2>\cdots\) also reasonable, where subscripts means rating and a high number indicates a high confidence level.

The author found simultaneous zero hits or false alarms cause a bias in MCMC sampling. If prior is not suitable or non-informative, then such phenomenon occurs and SBC detects it.

So, we have to find the prior to satisfy the monotonicity and the *regular* interval condition.

The following SBC shows our prior is not good, because for some parameter, the rank statistics is not uniformly distributed.

```
stanModel <- stan_model_of_sbc()
Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
stanModel = stanModel,
ite = 233,
M = 111, # or 1111 which tooks a lot of times
epsilon = 0.04,BBB = 1.1,AAA =0.0091,sbc_from_rstan = TRUE)
# or
stanModel <- stan_model_of_sbc()
Simulation_Based_Calibration_single_reader_single_modality_via_rstan_sbc(
stanModel = stanModel,
ite = 233,
M = 1111,
epsilon = 0.04,BBB = 1.1,AAA =0.0091,sbc_from_rstan = TRUE)
```

In my view, SBC dose not give me practical priors. Even if SBS has good performance, those priors cannot be implemented in this pkg, because such a model dose not fit practical data.

*Bayesian analysis of a ROC curve for categorical data using a skew-binormal model; Balgobin Nandram and Thelge Buddika Peiris (This paper is nice!); 2018 volume 11 369-384; Statistics and its interface*

In the following, the author pointed out why frequentist p value is problematic. Of course, under some condition, Bayesian p -value coincides with frequentist p value, so the scheme statistical test is problematic. We shall show the reason in the following simple example.

To tell the truth, I want to use epsilon delta manner, but for non-mathematics people, I do not use it.

- The next section
*proves*the monotonicity of p value for the most simple statistical test.

I want to publish this proof, but all reviewers are against it. There is a free space to speech here, so please enjoy my logic. I really like this because My heart is in math.

The methods of statistical testing are widely used in medical research. However, there is a well-known problem, which is that a large sample size gives a small p-value. In this section, we will provide an explicit explanation of this phenomenon with respect to simple hypothesis tests.

Consider the following null hypothesis \(H_0\) and its alternative hypothesis \(H_1\); \[\begin{eqnarray*} H_0: \mathbb E[ X_i] &=&m_0, \\ H_1: \mathbb E[ X_i] &>&m_0, \\ \end{eqnarray*}\] where \(\mathbb E[ X_i]\) means the expectation of random samples \(X_i\) from a normal distribution whose variance \(\sigma _0 ^2\) is known. In this situation, the test statistic is given by \[ Z^{\text{test}} := \frac{\overline{X_{n}} -m_0 }{\sqrt{\sigma _0 ^2/n}}, \] where \(\overline{X_{n}} := \sum_{i=1,\cdots,n} X_i /n\) is normally distributed with mean \(m_0\) and standard deviation \(\sigma_0/\sqrt{n}\). Under the null hypothesis, \(Z^{\text{test}}\) is normally distributed with mean \(0\) and a standard deviation \(1\) (standard normal distribution). The null hypothesis is rejected if \(Z^{\text{test}} >z_{2\alpha}\) , where \(z_{2\alpha}\) is a percentile point of the normal distribution, e.g., \(z_{0.025}=1.96 .\)

Suppose that the true distribution of \(X_1, \cdots, X_n\) is a normal distribution with mean \(m_0 + \epsilon\) and variance \(\sigma _0 ^2\), where \(\epsilon\) is an arbitrary fixed positive number. Then \[\begin{eqnarray*} Z^{\text{test}} &=&\frac{\overline{X_{n}} -(m_0+\epsilon -\epsilon) }{\sqrt{\sigma _0 ^2/n}}\\ &=& Z^{\text{Truth}} + \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}}\\ \end{eqnarray*}\] where \(Z^{\text{Truth} }:=(\overline{X_{n}} -(m_0+\epsilon ))/\sqrt{\sigma _0 ^2/n}\).

In the following, we calculate the probability with which we reject the null hypothesis \(H_0\) with confidence level \(\alpha\). \[\begin{eqnarray*} \text{Prob}(Z^{\text{test}} >z_{2\alpha} ) &=&\text{Prob} (Z^{\text{Truth} } + \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}} >z_{2\alpha})\\ &=&\text{Prob} (Z^{\text{Truth} } >z_{2\alpha} - \frac{\epsilon}{\sqrt{\sigma _0 ^2/n}})\\ &=&\text{Prob} (Z^{\text{Truth} } >z_{2\alpha} - \frac{\epsilon}{\sigma _0 }\sqrt{n} )\\ \end{eqnarray*}\] Note that \(\epsilon /\sigma _0\) is called the effect size.

Thus, if \(z_{2\alpha} - \epsilon \sqrt{n} /\sigma _0 < z_{2(1-\beta)}\), i.e., if \(n > ( z_{2\alpha}- z_{2(1-\beta)})^2 \sigma _0 ^2 \epsilon ^{-2}\), then the probability that the null hypothesis is rejected is greater than \(1 - \beta\).

For example, consider the case \(\sigma _0 =1\), \(\alpha =0.05\), and \((1-\beta) =\alpha\), then \(z_{2\alpha}=1.28\) and in this case, for all \(\epsilon>0\), if \(n > 7 \epsilon ^{-2}\) then the probability in which above hypothesis test concludes that the difference of the observed mean from the hypothesized mean is significant is greater than \(0.95\). This means that almost always the p-value is less than 0.05. Thus a large sample size induces a small p-value.

For example,

if \(\epsilon =1\) then by taking a sample size such that \(n > 7\), then almost always the conclusion of the test will be that the observed difference is statistically significant. Similarly,

if \(\epsilon =0.1\) then by taking a sample size such that \(n > 700\), then almost all tests will reach the conclusion that the difference is significant; and

if \(\epsilon =0.01\) then by taking sample size so that \(n > 70000\), then the same problem will arise.

This phenomenon also means that in large samples statistical tests will detect very small differences between populations.

By above consideration we can get the result ``significance difference’’ with respect to any tiny difference \(\epsilon\) by collecting a large enough sample \(n\), and thus we must not use the statistical test.