Title: | Functions and Datasets for the book "Bayesian Data Analysis" |
---|---|
Description: | Functions for Bayesian Data Analysis, with datasets from the book "Bayesian data Analysis (second edition)" by Gelman, Carlin, Stern and Rubin. Not all datasets yet, hopefully completed soon. |
Authors: | Compiled by Kjetil Halvorsen |
Maintainer: | Kjetil Halvorsen <[email protected]> |
License: | GPL (>= 2) |
Version: | 2012.04-1 |
Built: | 2025-01-31 05:23:58 UTC |
Source: | https://github.com/cran/BayesDA |
Contingency table describing a survey of sources and quality of information about cancer for 1729 people.
data(contingency)
data(contingency)
A data frame with 32 observations on the following 6 variables.
source.1
a factor with levels N
Y
source.2
a factor with levels N
Y
source.3
a factor with levels N
Y
source.4
a factor with levels N
Y
knowledge
a factor with levels Good
Poor
count
number of respondents with this pattern
The sources of information are: (1) news media, (2) light reading, (3) solid reading, (4) lectures. The book (page 437) is fitting Bayesian loglinear models.
data(contingency)
data(contingency)
An experiment was conducted on 50 cows to estimate the effect of a feed additive (methionine hydroxy analog) on six outcomes related to the amount of milk fat produced by each cow. Four diets (treatments) were considered, corresponding to different levels of the additive, and three variables were recorded before treatment assignment: lactation number ( seasons of lactation), age, and initial weight of cow. Multiple randomizations were calculated, and choosen that one with ‘best balane’, however that was defined.
data(cow)
data(cow)
A data frame with 50 observations on the following 10 variables.
level
diet, treatment
lactation
lactation number, pretreatment
age
age of cow, pretreatment variable
initial.weight
initial weight, pretreatment
dry
response
milk
response
fat
response
solids
response
final.weight
response
protein
response
data(cow) summary(cow) names(cow) # Investigating balance on pretreatment variables: with(cow, tapply(lactation, level, mean)) with(cow, tapply(age, level, mean))
data(cow) summary(cow) names(cow) # Investigating balance on pretreatment variables: with(cow, tapply(lactation, level, mean)) with(cow, tapply(age, level, mean))
A serial dilution assay to study concentration of cockroach allergen in homes of asthma sufferes.
data(dilution)
data(dilution)
The format is: List of 4 \$ unknowns : num [1:8, 1:10] matrix with optical densities for unknowns in each column \$ standards : num [1:16] vector of standard solution optical densities \$ dil.unknowns : num [1:8] dilution factor for unknowns \$ dil.standards: num [1:16] dilution factor for standard solution
Concentration of standard was 0.64
data(dilution) str(dilution) unknowns <- dilution$unknowns standards <- dilution$standards dil.unknowns <- dilution$dil.unknowns dil.standards <- dilution$dil.standards plot(dil.standards, standards) matplot(dil.unknowns, unknowns, type="b")
data(dilution) str(dilution) unknowns <- dilution$unknowns standards <- dilution$standards dil.unknowns <- dilution$dil.unknowns dil.standards <- dilution$dil.standards plot(dil.standards, standards) matplot(dil.unknowns, unknowns, type="b")
Numbers of faults found in each of 32 rolls of fabric produced in a particular factory. Also given is the length of the roll.
data(fabric)
data(fabric)
A data frame with 32 observations on the following 2 variables.
length
length of roll
faults
number of faults in roll
The book uses this for exercise 5. page 441
data(fabric) str(fabric) names(fabric) # Identity link: with(fabric, plot(faults ~ length)) # log link: with(fabric, plot(faults ~ length, log="y")) # Fitting poisson regression models: mod1 <- glm(faults ~ length-1, data=fabric, family=poisson) OK <- require(MCMCpack) if(OK) mod2 <- MCMCpoisson(faults ~ length-1, data=fabric, b0=0, B0=0.0001) summary(mod1) confint(mod1) if(OK) summary(mod2) # The exercise is to investigate overdispersion ...
data(fabric) str(fabric) names(fabric) # Identity link: with(fabric, plot(faults ~ length)) # log link: with(fabric, plot(faults ~ length, log="y")) # Fitting poisson regression models: mod1 <- glm(faults ~ length-1, data=fabric, family=poisson) OK <- require(MCMCpack) if(OK) mod2 <- MCMCpoisson(faults ~ length-1, data=fabric, b0=0, B0=0.0001) summary(mod1) confint(mod1) if(OK) summary(mod2) # The exercise is to investigate overdispersion ...
A factorial designed experiment from chemistry. Three experimental variables representing reactor conditions, and the response, conversion (%) from n-Heptane to acetylene.
data(factorial)
data(factorial)
A data frame with 16 observations on the following 4 variables.
temperature
reactor temperature
ratio
ratio of H2 to n-heptane (mole ratio)
contact
Contact time (sec)
conversion
the response, conversion from n-heptane to acetylene
This data is used in an exercise on regression with many explanatory variables, page 413 of second edition. Original authors assume a quadratic functional form.
data(factorial) summary(factorial) # non-Bayesian analysis: fac.mod1 <- lm(conversion ~ temperature+ratio+contact+ I(temperature*ratio)+I(temperature*contact)+ I(ratio*contact)+I(temperature^2)+I(ratio^2)+I(contact^2), data=factorial) summary(fac.mod1)
data(factorial) summary(factorial) # non-Bayesian analysis: fac.mod1 <- lm(conversion ~ temperature+ratio+contact+ I(temperature*ratio)+I(temperature*contact)+ I(ratio*contact)+I(temperature^2)+I(ratio^2)+I(contact^2), data=factorial) summary(fac.mod1)
Worldwide airline fatalities, 1976–1985. Death rate is passenger deaths per 100 million passenger miles.
data(fatalities)
data(fatalities)
A data frame with 10 observations on the following 4 variables.
year
year
facc
number of fatal accidents
pdeaths
number of passenger deaths
rdeath
death rate
Source: Statistical Abstracts of the United States
data(fatalities) summary(fatalities)
data(fatalities) summary(fatalities)
Data on football point spreads and game outcomes (north american football) for ten seasons, 1981, 1983-1986, 1988-1992, each season are 224 games and they are strung together. Only three first seasons are used in chapter one of book.
data(football)
data(football)
A data frame with 2240 observations on the following 7 variables.
home
home indicator
favorite
favorite score
underdog
underdog score
spread
point spread
favorite.name
a factor with levels ATL
BUF
CHI
CIN
CLE
DAL
DEN
DET
GB
HOU
IND
KC
LAA
LAN
MIA
MIN
NE
NO
NYG
NYJ
PHA
PHX
PIT
SD
SEA
SF
TB
WAS
underdog.name
a factor with levels ATL
BUF
CHI
CIN
CLE
DAL
DEN
DET
GB
HOU
IND
KC
LAA
LAN
MIA
MIN
NE
NO
NYG
NYJ
PHA
PHX
PIT
SD
SEA
SF
TB
WAS
week
a numeric vector
Football experts provide the point spread as a measure of the difference in ability between the two teams. For example, team A might be a 3.5 favourite to team B. The implication of this is that the proposition that team A, the favourite, defeats team B, the underdog, by 4 or more points, are considered a fair bet. In other words, the probability that A wins by more than 3.5 points is 0.5. If the point spread are an integer, then the implication is that team A is as likely to win by more points than the point spread as it is to win by fewer points than the point spread (or to loose). If the win is by exactly the point spread then neither side is paid off.
data(football) summary(football) names(football) # In chapter 1 only three first seasons are used: cap1 <- football[1:672, ]
data(football) summary(football) names(football) # In chapter 1 only three first seasons are used: cap1 <- football[1:672, ]
Number of attempts and successes of golf putts, by distance from the hole, for a sample of professional golfers.
data(golf)
data(golf)
A data frame with 19 observations on the following 3 variables.
distance
Distance from hole in feet
n
number of attempts
y
number of successes
This is used for an exercise on nonlinear modelling on page 515 in the second edition.
data(golf) names(golf) comment(golf) with(golf, plot(distance, y/n))
data(golf) names(golf) comment(golf) with(golf, plot(distance, y/n))
Simon Newcomb's measurements (1882) to measure the speed of light. The data are recordes as deviations from 24800 nanoseconds.
data(light)
data(light)
The format is: atomic [1:66] 28 26 33 24 34 -44 27 16 40 -2 ... - attr(*, "comment")= chr "Units: deviations from 24800 nanoseconds"
The currently accepted value for the speed of light on this scale is 33.0.
data(light) comment(light) hist(light, breaks=40) abline(v=33.0, col="red")
data(light) comment(light) hist(light, breaks=40) abline(v=33.0, col="red")
Results of 22 clinical trials of beta-blockers for reducing mortality after myocardial infection. Used for meta-analysis.
data(meta)
data(meta)
A data frame with 22 observations on the following 5 variables.
study
id code of study
control.deaths
number of deaths in control group
control.total
total number of patients in control group
treated.deaths
number of deaths in treatment group
treated.total
total number of patients in treatment group
The 22 clinical trials each consist of two groups of heart attack patients randomly allocated to receive or not receive beta-blockers ( a family of drugs that affect the central nervous system and can relax the heart musckles).
data(meta) names(meta) # Calculating empirical log-odds and its sampling variances: y <- apply(meta, 1, function(x) log( (x[4]/(x[5]-x[4]))/(x[2]/(x[3]-x[2])) ) ) s2 <- apply(meta, 1, function(x) 1/(x[5]-x[4]) + 1/x[4] +1/(x[3]-x[2]) + 1/x[2] ) cbind("Study number"=meta[,1], "empirical log odds"=y, "empirical sampling variance of y"=s2) #if(require(meta)){ # funnel(y, sqrt(s2)) # radial(y, sqrt(s2)) #}
data(meta) names(meta) # Calculating empirical log-odds and its sampling variances: y <- apply(meta, 1, function(x) log( (x[4]/(x[5]-x[4]))/(x[2]/(x[3]-x[2])) ) ) s2 <- apply(meta, 1, function(x) 1/(x[5]-x[4]) + 1/x[4] +1/(x[3]-x[2]) + 1/x[2] ) cbind("Study number"=meta[,1], "empirical log odds"=y, "empirical sampling variance of y"=s2) #if(require(meta)){ # funnel(y, sqrt(s2)) # radial(y, sqrt(s2)) #}
Population in all 804 municipalities in new York state in 1960, and two independent random samples from it.
data(newyork)
data(newyork)
A data frame with 804 observations on the following 2 variables.
population
a numeric vector Population
code
400 if in sample 1, 300 if in sample 2, 200 if in both, and 100 if in neither
Discussed on page 265 of second edition.
data(newyork) str(newyork)
data(newyork) str(newyork)
This is a tree-way array showing responses to 15 possible reactions in 23 situations for 54 persons. It is used in the book as an example for posterior predictive checks.
data(personality)
data(personality)
The format is: num [1:15, 1:23, 1:54] 0 0 0 1 0 0 1 2 0 0 ... - attr(*, "dimnames")=List of 3 ..\$ response : chr [1:15] "1" "2" "3" "4" ... ..\$ situation: chr [1:23] "1" "2" "3" "4" ... ..\$ person : chr [1:54] "1" "2" "3" "4" ...
data(personality) str(personality) # Following code adapted from file personality3.R on the book's webpage: nsubjects <- 6 nrep <- 7 test <- function (a){ output <- as.vector(a)>0 glm.data.frame <- data.frame (output, response, situation, person) glm0 <- glm (output ~ factor(response) + factor(situation) + factor(person), family=binomial(link=logit), data=glm.data.frame) pred0 <- predict.glm (glm0, type="response") mean (ifelse(output, (1-pred0)^2, pred0^2)) } data <- personality[,,1:nsubjects] attrs <- attributes(data) data <- ifelse (data>0, 1, 0) attributes(data) <- attrs
data(personality) str(personality) # Following code adapted from file personality3.R on the book's webpage: nsubjects <- 6 nrep <- 7 test <- function (a){ output <- as.vector(a)>0 glm.data.frame <- data.frame (output, response, situation, person) glm0 <- glm (output ~ factor(response) + factor(situation) + factor(person), family=binomial(link=logit), data=glm.data.frame) pred0 <- predict.glm (glm0, type="response") mean (ifelse(output, (1-pred0)^2, pred0^2)) } data <- personality[,,1:nsubjects] attrs <- attributes(data) data <- ifelse (data>0, 1, 0) attributes(data) <- attrs
Respondents to the CBS telephone survey classified by opinion, number of residential telephone lines, and number of adults in the household.
data(phones)
data(phones)
A data frame with 27 observations on the following 7 variables.
adults
number of adults in household
preference
a factor with levels Bush
Dukakis
No opinion/other
lines.1
number with one tele line
lines.2
number with 2 tele lines
lines.3
number with 3 tele lines
lines.4
number with 4 tele lines
lines.Q
number with unknown tele lines
This is used in exercises pages 242–243 in the book.
data(phones) summary(phones)
data(phones) summary(phones)
In the evaluation of drugs for possible clinical application, studies are routinely performed on rodents. For a particular study drawn from the statistical literature, suppose the immediate aim is to estimate theta, the probability of tumor in a population of female laboratory rats of type 'F344' that receive a zero dose of the drug — a control group. This gives data from historical control groups, and one current experimental control group.
data(rats)
data(rats)
A data frame with 71 observations on the following 3 variables.
y
number of rats with tumors
N
number of rats in experiment
Current
a factor with levels 0
1
data(rats) summary(rats) # moment estimate of (alfa, beta) in beta distribution is (1.4, 8.6) with(subset(rats, Current=="0"), hist( y/N, freq=FALSE)) plot(function(x) dbeta(x, 1.4, 8.6), from=0, to=1, col="red", add=TRUE) # plotting posterior in same plot: plot(function(x) dbeta(x, 5.4, 18.6), from=0, to=1, col="blue", add=TRUE)
data(rats) summary(rats) # moment estimate of (alfa, beta) in beta distribution is (1.4, 8.6) with(subset(rats, Current=="0"), hist( y/N, freq=FALSE)) plot(function(x) dbeta(x, 1.4, 8.6), from=0, to=1, col="red", add=TRUE) # plotting posterior in same plot: plot(function(x) dbeta(x, 5.4, 18.6), from=0, to=1, col="blue", add=TRUE)
Response times (in milliseconds) for 11 non-schizophrenic and 6 schizophrenic individuals.
data(schiz)
data(schiz)
The format is: num [1:30, 1:17] 312 272 350 286 268 328 298 356 292 308 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr [1:17] "nonsch1" "nonsch2" "nonsch3" "nonsch4" ...
A numerical matrix with individuals as columns.
Psychological theory from the last half century and before suggests a model in which schizophrenics suffer from an attentional deficit on some trials, as well as a general motor reflex retardation.
data(schiz) str(schiz) # Making figure 18.1 in the book: opar <- par(no.readonly=TRUE) par( mar=c(2.0, 1,1,1)) par(mfrow=c(5,4)) for (i in 1:11) hist( log(schiz[,i]), main="", xlab="", ylab="", xlim=c(5.4, 7.5)) par( mfg=c(4, 1)) for (i in 1:6) hist( log(schiz[,11+i]), main="", xlab="", ylab="", xlim=c(5.4, 7.5)) par(opar)
data(schiz) str(schiz) # Making figure 18.1 in the book: opar <- par(no.readonly=TRUE) par( mar=c(2.0, 1,1,1)) par(mfrow=c(5,4)) for (i in 1:11) hist( log(schiz[,i]), main="", xlab="", ylab="", xlim=c(5.4, 7.5)) par( mfg=c(4, 1)) for (i in 1:6) hist( log(schiz[,11+i]), main="", xlab="", ylab="", xlim=c(5.4, 7.5)) par(opar)
Results of CBS News survey of 1447 adults in the United States, divided into
16 strata. The sampling is assumed to be proportional, so that the population
proportions , are approximately equal to the sampling proportions,
.
data(stratified)
data(stratified)
A data frame with 16 observations on the following 5 variables.
region
a character vector
bush
proportio declaring to vote for Bush
dukakis
proportion declaring to vote for Dukakis
other
proportion declaring to vote for other
proportion
sample proportion
data(stratified) str(stratified)
data(stratified) str(stratified)