0. Getting a Feel for It

1. Key Concepts

2. Example Scenario

A coin-minting factory

Q: Suppose a coin is flipped 9 times and comes up heads 6 times. How do we compute the posterior probability that the coin came from a __head-biased factory__, and the posterior probability that it came from a __tail-biased factory__?

2.1 Solution by Formal Analysis

# The function as naively written (note: a ratio of beta functions, not exp of a difference)
pD <- function(z,N,a,b){beta(z+a,N-z+b) / beta(a,b)}

# A more numerically robust version using lbeta, the log of the beta function
pD <- function(z,N,a,b){exp(lbeta(z+a,N-z+b) - lbeta(a,b))}
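
A minimal sketch applying pD to the factory question, assuming the priors used in the JAGS code later in these notes (\(\omega=0.25\) for the tail-biased factory, \(\omega=0.75\) for the head-biased one, \(\kappa=12\)) and equal prior model probabilities:

z = 6 ; N = 9
pD1 = pD( z , N , a=0.25*(12-2)+1 , b=(1-0.25)*(12-2)+1 )  # p(D|m=1), tail-biased factory
pD2 = pD( z , N , a=0.75*(12-2)+1 , b=(1-0.75)*(12-2)+1 )  # p(D|m=2), head-biased factory
pM1 = pD1*0.5 / ( pD1*0.5 + pD2*0.5 )  # posterior p(m=1|D), with p(m)=.5
pM2 = 1 - pM1                          # posterior p(m=2|D)
show( c(pM1, pM2) )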

2.2 Solution by Grid Approximation

(Figure: grid approximation of the two-model posterior; image not recoverable.)
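
Since the original figure is lost, here is a minimal grid-approximation sketch of the same computation (my reconstruction, using the same priors as above): discretize \(\theta\), put the joint prior mass on \((m, \theta)\), multiply by the likelihood, and normalize.

thetaGrid = seq( 0.0005 , 0.9995 , length=1000 )   # grid over theta, cell width ~ 0.001
like = thetaGrid^6 * (1-thetaGrid)^(9-6)           # Bernoulli likelihood of 6 heads in 9 flips
# Joint prior mass per grid cell: p(m)=.5 times the beta prior density times cell width
prior1 = 0.5 * dbeta( thetaGrid , 0.25*(12-2)+1 , (1-0.25)*(12-2)+1 ) / length(thetaGrid)
prior2 = 0.5 * dbeta( thetaGrid , 0.75*(12-2)+1 , (1-0.75)*(12-2)+1 ) / length(thetaGrid)
post1 = like * prior1   # unnormalized joint posterior for m=1
post2 = like * prior2   # unnormalized joint posterior for m=2
pM1 = sum(post1) / sum( post1 + post2 )   # posterior model probabilities
pM2 = 1 - pM1
show( c(pM1, pM2) )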

3. Solution by MCMC

3.1 Nonhierarchical MCMC Computation

  • Let's look at how to compute \(p(D|m)\) for a model \(m\).
  • This approach is hard to apply to complex models, so it is rarely used in practice.
  • First, the basic Monte Carlo method for approximating an expectation over a probability distribution is \[ \int f(\theta)\,p(\theta)\,d\theta \;\approx\; \frac{1}{N}\sum_{\theta_i \sim p(\theta)}^{N} f(\theta_i) \]
  • Applying this to the model-comparison setting (see the sketch after this list): \[ p(D) = \int p(D|\theta)\,p(\theta)\,d\theta \;\approx\; \frac{1}{N}\sum_{\theta_i \sim p(\theta)}^{N} p(D|\theta_i) \]
    • For nearly all values sampled from the prior, \(p(D|\theta_i)\) is nearly zero.
    • An enormous number of samples would be needed for the estimate to converge to a stable value.
  • Therefore, instead of sampling from the prior, let's use MCMC samples from the posterior.
    • Bayes' rule in its original form: \[ p(\theta|D) = \frac{p(D|\theta)\,p(\theta)}{p(D)} \]
    • Rearranged: \[ \frac{1}{p(D)} = \frac{p(\theta|D)}{p(D|\theta)\,p(\theta)} \]
    • Now a bit of magic: multiply by \(\int h(\theta)\,d\theta = 1\) for any probability density \(h\), and note that the ratio on the right is constant in \(\theta\), so it can be moved inside the integral: \[ \frac{1}{p(D)} = \frac{p(\theta|D)}{p(D|\theta)\,p(\theta)} = \frac{p(\theta|D)}{p(D|\theta)\,p(\theta)} \int h(\theta)\,d\theta = \int \frac{h(\theta)}{p(D|\theta)\,p(\theta)}\,p(\theta|D)\,d\theta \;\approx\; \frac{1}{N}\sum_{\theta_i \sim p(\theta|D)}^{N} \frac{h(\theta_i)}{p(D|\theta_i)\,p(\theta_i)} \]
  • \(h(\theta)\) should be chosen to mimic the posterior, so that the ratio inside the sum stays well behaved.
  • Even so, this can fail to work well for complex models.
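
As a quick sketch of the naive prior-sampling estimator above (it happens to work fine for this one-parameter problem; the trouble arises in complex, high-dimensional models):

set.seed(47405)  # arbitrary seed for reproducibility
thetaPrior = rbeta( 10000 , 0.75*(12-2)+1 , (1-0.75)*(12-2)+1 )  # samples from the m=2 prior
mean( thetaPrior^6 * (1-thetaPrior)^(9-6) )  # Monte Carlo estimate of p(D|m=2)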

3.1.1 Using JAGS

  • Use the MCMC results to compute \(p(D)\).
  • Modify the code in Jags-Ydich-Xnom1subj-Mbernbeta.R to fit our situation.
  • For \(m=2\), replace the prior with one having \(\omega=0.75\) and \(\kappa=12\):
source("Jags-Ydich-Xnom1subj-Mbernbeta_ver1.R")
## 
## *********************************************************************
## Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
## A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
## *********************************************************************
## Loading required package: coda
## Linked to JAGS 3.4.0
## Loaded modules: basemod,bugs
modelString = "
model {
  for (i in 1:Ntotal) {
    y[i] ~ dbern(theta)
  }
  theta ~ dbeta( 0.75*(12-2)+1 , (1-0.75)*(12-2)+1 )
}
"
  • Generate the MCMC posterior sample
myData = c(rep(0,9-6), rep(1,6))
mcmcCoda = genMCMC(data=myData, numSavedSteps=10000)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 21
## 
## Initializing model
## 
## Burning in the MCMC chain...
## Sampling final MCMC chain...
theta = as.matrix(mcmcCoda)[,"theta"]
  • Compute \(p(D)\)
meanTheta = mean(theta)
sdTheta = sd(theta)

# Choose h(theta) to be a beta density matched to the posterior sample by
# moments (method-of-moments estimates of the beta shape parameters):
aPost = meanTheta * (meanTheta*(1-meanTheta)/sdTheta^2 - 1)
bPost = (1 - meanTheta) * (meanTheta*(1-meanTheta)/sdTheta^2 - 1)

# 1/p(D) is approximated by the posterior-sample mean of
# h(theta) / ( p(D|theta) * p(theta) ):
oneOverPD = mean( dbeta(theta, aPost, bPost) /
                  ( theta^6 * (1-theta)^(9-6) *
                    dbeta(theta, 0.75*(12-2)+1, (1-0.75)*(12-2)+1) ) )

PD = 1/oneOverPD

show(PD)
## [1] 0.002337925
  • This closely matches the result from the formal analysis.
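
Indeed, the pD function from section 2.1 gives nearly the same value:

pD( 6 , 9 , a=0.75*(12-2)+1 , b=(1-0.75)*(12-2)+1 )  # ~ 0.00234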

3.2 Hierarchical MCMC Computation of Relative Model Probability

  • Now use the full hierarchical structure, with the model index \(m\) as a top-level parameter.
  • We will use Jags-Ydich-Xnom1subj-MbernBetaModelComp.R.
  • The joint posterior over the model index and its parameters is \[ p(\theta_m, m \mid D) = \frac{ p_m(D|\theta_m, m)\, p_m(\theta_m|m)\, p(m) }{ \sum_{m'} \int p_{m'}(D|\theta_{m'}, m')\, p_{m'}(\theta_{m'}|m')\, p(m')\; d\theta_{m'} } \]
  • This is implemented in JAGS as follows.
## 
## *********************************************************************
## Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
## A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
## *********************************************************************
  • The data
#------------------------------------------------------------------------------
# THE DATA.

N=9
z=6
y = c( rep(0,N-z) , rep(1,z) )
dataList = list(
  y = y ,
  N = N 
)

#------------------------------------------------------------------------------
  • The model
  • Because both models here use the same Bernoulli likelihood and the same beta-prior form (differing only in \(\omega\)), they can share a single \(\theta\); this structure is not required in general.
  • The model index \(m\) picks out the corresponding omega.
modelString = "
model {
  for ( i in 1:N ) {
    y[i] ~ dbern( theta )
  }
  theta ~ dbeta( omega[m]*(kappa-2)+1 , (1-omega[m])*(kappa-2)+1 ) 
  omega[1] <- .25
  omega[2] <- .75
  kappa <- 12
  m ~ dcat( mPriorProb[] )
  mPriorProb[1] <- .5
  mPriorProb[2] <- .5
}
"
writeLines( modelString , con="TEMPmodel.txt" )
  • Run the chains
# RUN THE CHAINS.

fileNameRoot = "ModelComp-"   # output file prefix (defined elsewhere in the original script; value assumed)
parameters = c("theta","m") 
adaptSteps = 1000             # Number of steps to "tune" the samplers.
burnInSteps = 1000           # Number of steps to "burn-in" the samplers.
nChains = 4                   # Number of chains to run.
numSavedSteps=50000          # Total number of steps in chains to save.
thinSteps=1                   # Number of steps to "thin" (1=keep every step).
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
jagsModel = jags.model( "TEMPmodel.txt" , data=dataList , # inits=initsList , 
                        n.chains=nChains , n.adapt=adaptSteps )
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 27
## 
## Initializing model
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
## Burning in the MCMC chain...
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
## Sampling final MCMC chain...
codaSamples = coda.samples( jagsModel , variable.names=parameters , 
                            n.iter=nPerChain , thin=thinSteps )
# resulting codaSamples object has these indices: 
#   codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]

save( codaSamples , file=paste0(fileNameRoot,"Mcmc.Rdata") )
  • Diagnose the chains
# Display diagnostics of chain:

parameterNames = varnames(codaSamples) # get all parameter names
for ( parName in parameterNames ) {
  diagMCMC( codaSamples , parName=parName ,
            saveName=fileNameRoot , saveType="eps" )
}
  • Results
# Convert coda-object codaSamples to matrix object for easier handling.
mcmcMat = as.matrix( codaSamples , chains=TRUE )
m = mcmcMat[,"m"]
theta = mcmcMat[,"theta"]

# Compute the proportion of m at each index value:
pM1 = sum( m == 1 ) / length( m )
pM2 = 1 - pM1

# Extract theta values for each model index:
thetaM1 = theta[ m == 1 ]
thetaM2 = theta[ m == 2 ]

# Plot histograms of sampled theta values for each model,
# with pM displayed.
openGraph(width=7,height=5)
par( mar=0.5+c(3,1,2,1) , mgp=c(2.0,0.7,0) )
layout( matrix(c(1,1,2,3),nrow=2,byrow=FALSE) , widths=c(1,2) )
plotPost( m , breaks=seq(0.9,2.1,0.2) , cenTend="mean" , xlab="m" , main="Model Index" )
##        ESS   mean median     mode hdiMass hdiLow hdiHigh compVal
## m 14152.98 1.8264      2 2.001472    0.95      1       2      NA
##   pGtCompVal ROPElow ROPEhigh pLtROPE pInROPE pGtROPE
## m         NA      NA       NA      NA      NA      NA
plotPost( thetaM1 , 
          main=bquote( theta*" when m=1" * " ; p(m=1|D)" == .(signif(pM1,3)) ) , 
          cex.main=1.75 , xlab=bquote(theta) , xlim=c(0,1) )
##            ESS      mean    median      mode hdiMass    hdiLow   hdiHigh
## theta 7908.974 0.4519298 0.4510206 0.4554968    0.95 0.2561034 0.6635806
##       compVal pGtCompVal ROPElow ROPEhigh pLtROPE pInROPE pGtROPE
## theta      NA         NA      NA       NA      NA      NA      NA
plotPost( thetaM2 , 
          main=bquote( theta*" when m=2" * " ; p(m=2|D)" == .(signif(pM2,3)) ) , 
          cex.main=1.75 , xlab=bquote(theta) , xlim=c(0,1) )

##         ESS      mean    median      mode hdiMass    hdiLow   hdiHigh
## theta 41320 0.6901628 0.6966432 0.7027959    0.95 0.4986758 0.8764728
##       compVal pGtCompVal ROPElow ROPEhigh pLtROPE pInROPE pGtROPE
## theta      NA         NA      NA       NA      NA      NA      NA
saveGraph( file=paste0(fileNameRoot,"Post") , type="eps" )
## quartz_off_screen 
##                 2

3.2.1 Reducing Autocorrelation with Pseudo-Priors

  • At each step of the MCMC random walk, JAGS generates values of \(m\), \(\theta_1\), and \(\theta_2\) given the data.
  • At each step of the chain, only one parameter describes the data: \(\theta_1\) when \(m=1\), \(\theta_2\) when \(m=2\) (each also being influenced by its prior); the other parameter just drifts along its prior, unconstrained by the data.

  • The MCMC model as used so far:

modelString = "
model {
  for (i in 1:N){
    y[i] ~ dbern(theta)
  }
  theta <- equals(m,1)*theta1 + equals(m,2)*theta2
  theta1 ~ dbeta(omega1*(kappa1-2)+1, (1-omega1)*(kappa1-2)+1)
  omega1 <- .25
  kappa1 <- 12
  
  theta2 ~ dbeta(omega2*(kappa2-2)+1, (1-omega2)*(kappa2-2)+1)
  omega2 <- .75
  kappa2 <- 12
  
  m ~ dcat(mPriorProb[])
  mPriorProb[1] <- .5
  mPriorProb[2] <- .5
}
"
  • The chain for the model index can be highly autocorrelated: it may linger in one model for many steps before jumping to the other.

  • Pseudo-priors can be used to help the chain transition between models more readily.
  • See "Jags-Ydich-Xnom1subj-MbernBetaModelCompPseudoPrior.R".

## 
## *********************************************************************
## Kruschke, J. K. (2015). Doing Bayesian Data Analysis, Second Edition:
## A Tutorial with R, JAGS, and Stan. Academic Press / Elsevier.
## *********************************************************************
  • The data
# THE DATA.

N=30
z=ceiling(.55*N)
y = c( rep(0,N-z) , rep(1,z) )
dataList = list(
  y = y ,
  N = N 
)
modelString = "
model {
  for ( i in 1:N ) {
    y[i] ~ dbern( theta )
  }
  theta <- equals(m,1)*theta1 + equals(m,2)*theta2
  theta1 ~ dbeta( omega1[m]*(kappa1[m]-2)+1 , (1-omega1[m])*(kappa1[m]-2)+1 ) 
  omega1[1] <- .10 # true prior value
  omega1[2] <- .40 # pseudo prior value
  kappa1[1] <- 20 # true prior value
  kappa1[2] <- 50 # pseudo prior value
  theta2 ~ dbeta( omega2[m]*(kappa2[m]-2)+1 , (1-omega2[m])*(kappa2[m]-2)+1 ) 
  omega2[1] <- .70 # pseudo prior value
  omega2[2] <- .90 # true prior value
  kappa2[1] <- 50 # pseudo prior value
  kappa2[2] <- 20 # true prior value
  m ~ dcat( mPriorProb[] )
  mPriorProb[1] <- .5
  mPriorProb[2] <- .5
}
" 
writeLines( modelString , con="TEMPmodel.txt" )
  • Run the chains
# RUN THE CHAINS.

fileNameRoot = "ModelCompPseudoPrior-"   # output file prefix (defined elsewhere in the original script; value assumed)
parameters = c("m","theta1","theta2") 
adaptSteps = 1000            # Number of steps to "tune" the samplers.
burnInSteps = 1000           # Number of steps to "burn-in" the samplers.
nChains = 4                  # Number of chains to run.
numSavedSteps=10000          # Total number of steps in chains to save.
thinSteps=1                  # Number of steps to "thin" (1=keep every step).
nPerChain = ceiling( ( numSavedSteps * thinSteps ) / nChains ) # Steps per chain.
# Create, initialize, and adapt the model:
jagsModel = jags.model( "TEMPmodel.txt" , data=dataList , # inits=initsList , 
                        n.chains=nChains , n.adapt=adaptSteps )
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
##    Graph Size: 68
## 
## Initializing model
# Burn-in:
cat( "Burning in the MCMC chain...\n" )
## Burning in the MCMC chain...
update( jagsModel , n.iter=burnInSteps )
# The saved MCMC chain:
cat( "Sampling final MCMC chain...\n" )
## Sampling final MCMC chain...
codaSamples = coda.samples( jagsModel , variable.names=parameters , 
                            n.iter=nPerChain , thin=thinSteps )
# resulting codaSamples object has these indices: 
#   codaSamples[[ chainIdx ]][ stepIdx , paramIdx ]

save( codaSamples , file=paste0(fileNameRoot,"Mcmc.Rdata") )
  • Results
  • For comparison, this is what happens when pseudo-priors are not used: (figure of the chain diagnostics; image not recoverable)
# EXAMINE THE RESULTS.

# Convert coda-object codaSamples to matrix object for easier handling.
mcmcMat = as.matrix( codaSamples , chains=TRUE )
m = mcmcMat[,"m"]
# Compute the proportion of m at each index value:
pM1 = sum( m == 1 ) / length( m )
pM2 = 1 - pM1
# Extract theta values for each model index:
theta1M1 = mcmcMat[,"theta1"][ m == 1 ] # true theta1
theta1M2 = mcmcMat[,"theta1"][ m == 2 ] # pseudo theta1
theta2M1 = mcmcMat[,"theta2"][ m == 1 ] # pseudo theta2
theta2M2 = mcmcMat[,"theta2"][ m == 2 ] # true theta2

# Plot histograms of sampled theta values for each model,
# with pM displayed.
openGraph(width=7,height=7)
par( mar=0.5+c(3,1,2,1) , mgp=c(2.0,0.7,0) )
layout( matrix(c(1,1,2,3,4,5),nrow=3,byrow=TRUE)   )
plotPost( m , breaks=seq(0.95,2.05,0.1) , xlim=c(0.75,2.25) , 
          cenTend="mean" , xlab="m" , cex.main=1.75 ,
          main=bquote( "Model Index." * 
                         " p(m=1|D) =" * .(signif(pM1,3)) *
                         ", p(m=2|D) =" * .(signif(pM2,3)) ) )
##     ESS   mean median     mode hdiMass hdiLow hdiHigh compVal pGtCompVal
## m 10000 1.9195      2 2.000622    0.95      1       2      NA         NA
##   ROPElow ROPEhigh pLtROPE pInROPE pGtROPE
## m      NA       NA      NA      NA      NA
plotPost( theta1M1 , 
          main=bquote( theta[1]*" when m=1 (using true prior)" ) , 
          cex.main=1.75 , xlab=bquote(theta[1]) , xlim=c(0,1) )
##          ESS      mean    median      mode hdiMass    hdiLow   hdiHigh
## theta[1] 805 0.3977709 0.3964047 0.3884233    0.95 0.2662324 0.5169071
##          compVal pGtCompVal ROPElow ROPEhigh pLtROPE pInROPE pGtROPE
## theta[1]      NA         NA      NA       NA      NA      NA      NA
plotPost( theta2M1 , 
          main=bquote( theta[2]*" when m=1; pseudo-prior" ) , 
          cex.main=1.75 , xlab=bquote(theta[2]) , xlim=c(0,1) )
##               ESS      mean    median      mode hdiMass    hdiLow
## theta[2] 708.2775 0.6900603 0.6920265 0.7013501    0.95 0.5528739
##            hdiHigh compVal pGtCompVal ROPElow ROPEhigh pLtROPE pInROPE
## theta[2] 0.7953801      NA         NA      NA       NA      NA      NA
##          pGtROPE
## theta[2]      NA
plotPost( theta1M2 , 
          main=bquote( theta[1]*" when m=2; pseudo-prior" ) , 
          cex.main=1.75 , xlab=bquote(theta[1]) , xlim=c(0,1) )
##               ESS      mean    median      mode hdiMass    hdiLow
## theta[1] 6059.944 0.4041653 0.4018718 0.3921399    0.95 0.2768463
##            hdiHigh compVal pGtCompVal ROPElow ROPEhigh pLtROPE pInROPE
## theta[1] 0.5427986      NA         NA      NA       NA      NA      NA
##          pGtROPE
## theta[1]      NA
plotPost( theta2M2 , 
          main=bquote( theta[2]*" when m=2 (using true prior)" ) , 
          cex.main=1.75 , xlab=bquote(theta[2]) , xlim=c(0,1) )

##               ESS      mean    median      mode hdiMass    hdiLow
## theta[2] 5976.625 0.6846584 0.6867314 0.6884611    0.95 0.5587032
##            hdiHigh compVal pGtCompVal ROPElow ROPEhigh pLtROPE pInROPE
## theta[2] 0.8133595      NA         NA      NA       NA      NA      NA
##          pGtROPE
## theta[2]      NA
saveGraph( file=paste0(fileNameRoot,"Post") , type="eps" )
## quartz_off_screen 
##                 2

3.3 Models with Different "Noise" Distributions

  • In the examples so far, every model shared the same likelihood function \(p(D|\theta)\).
  • The probability distribution in the likelihood is sometimes called the noise distribution, because it describes the random variability in the data.
  • In real applications, however, different models may assume different noise distributions for the same data.
  • During MCMC, the likelihood under each distributional assumption can be computed and compared, letting the chain select between models.
  • In JAGS this can be written with placeholder densities pdf1, pdf2, etc., via the "ones trick" (a concrete instance follows the pseudocode below):
#data {
#  C <- 10000
#  for (i in 1:N) {
#    ones[i] <- 1
#  }
#}

#model (pseudocode: pdf1/pdf2 stand for the two models' density formulas,
#dprior1/dprior2 for their priors; C must be large enough that spy[i] < 1)
modelString = "
model {
  for (i in 1:N) {
    spy1[i] <- pdf1( y[i] , parameters1 )/C   # scaled density under model 1
    spy2[i] <- pdf2( y[i] , parameters2 )/C   # scaled density under model 2
    spy[i] <- equals(m,1)*spy1[i] + equals(m,2)*spy2[i]
    ones[i] ~ dbern(spy[i])                   # the ones trick
  }
  parameters1 ~ dprior1...
  parameters2 ~ dprior2...
  m ~ dcat( mPriorProb[] )
  mPriorProb[1] <- .5
  mPriorProb[2] <- .5
}
"

4. Prediction: Model Averaging

Rather than predicting from the single best model, we can average the predictions of all models, weighting each by its posterior probability: \[ p(\hat{y}|D) = \sum_m p(\hat{y}|D,m)\,p(m|D) = \sum_m \int p_m(\hat{y}|\theta_m, m)\, p_m(\theta_m|D, m)\, p(m|D)\; d\theta_m \]
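
A minimal sketch using the hierarchical MCMC samples from section 3.2 (re-run that section's results code first if the pseudo-prior example has overwritten its objects): each saved theta value is already paired with a sampled model index m, so averaging over all steps weights each model by \(p(m|D)\) automatically.

# Model-averaged posterior predictive probability of heads:
pHeadAvg = mean( theta )
# Equivalently, combining the per-model estimates by hand:
pHeadByHand = pM1 * mean(thetaM1) + pM2 * mean(thetaM2)
show( c(pHeadAvg, pHeadByHand) )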

5. Model Complexity

  • Bayesian model comparison automatically penalizes complexity: a flexible model spreads its prior over many values of \(\theta\), so it attains a high \(p(D)\) only when the data demand the flexibility. Compare a simple must-be-fair model (a=b=500) with a flexible one (a=b=1, uniform):
z=15; N=20; pD(z,N,a=500,b=500)/pD(z,N,a=1,b=1)  # extreme data: Bayes factor favors the flexible model
z=11; N=20; pD(z,N,a=500,b=500)/pD(z,N,a=1,b=1)  # near-fair data: Bayes factor favors the simple model

5.1 Caveats of Nested Model Comparison

  • A nested model is a reduced (restricted) model compared against the full model that contains it.
  • As with model complexity in general, a comparison based purely on best fit would always favor the full model, since it can never fit worse than its restriction.
  • In Bayesian model comparison, however, if the data are well described by the restricted model, the restricted model will be preferred (a minimal sketch follows).
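
A minimal sketch (my example, not from the text): a restricted model that fixes \(\theta = 0.5\) exactly, nested inside a full model with a uniform prior on \(\theta\):

z=11; N=20
BF = 0.5^z * 0.5^(N-z) / pD(z, N, a=1, b=1)   # p(D|restricted) / p(D|full)
show(BF)   # > 1 here: the restricted model is preferred for near-fair data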

6. Excessive Sensitivity to the Prior Distribution

  • The Bayes factor can change dramatically with a seemingly innocuous choice of "vague" prior for the flexible model:
z=65; N=100; pD(z,N,a=500,b=500)/pD(z,N,a=1,b=1)        # flexible model with a uniform prior
z=65; N=100; pD(z,N,a=500,b=500)/pD(z,N,a=0.01,b=0.01)  # flexible model with a near-Haldane prior

6.1 Priors of different models should be equally informed

  • So how do we fix this?
  • Hold out a small portion of the data and use it to inform the prior of every model.
  • In the example above, suppose we hold out 10% of the data (10 flips, of which 6 were heads). Each prior is then updated with those 6 heads and 4 tails, and the remaining 90 flips with 59 heads are used for the comparison.
  • For prior 1 (uniform):
z=65-6; N=100-10; pD(z,N,a=500+6,b=500+10-6)/pD(z,N,a=1+6,b=1+10-6)
  • For prior 2 (near-Haldane):
z=65-6; N=100-10; pD(z,N,a=500+6,b=500+10-6)/pD(z,N,a=0.01+6,b=0.01+10-6)
  • Now the Bayes factor barely changes between the two choices of vague prior.

<End>