Title: | Datasets from Sen & Srivastava |
---|---|
Description: | Collection of datasets from Sen & Srivastava: "Regression Analysis, Theory, Methods and Applications", Springer. Sources for individual data files are more fully documented in the book. |
Authors: | Kjetil B Halvorsen <[email protected]> |
Maintainer: | Kjetil B Halvorsen <[email protected]> |
License: | GPL (>= 2) |
Version: | 2015.6.25.1 |
Built: | 2025-03-06 03:17:15 UTC |
Source: | https://github.com/cran/SenSrivastava |
The E1.1 data frame has 24 rows and 2 columns.
data(E1.1)
This data frame contains the following columns:
DENSITY: a numeric vector, vehicles per mile.
SPEED: a numeric vector, miles per hour.
Example 1.1 page 2 in Sen and Srivastava.
Huber, M.J. (1957) Effect of temporary bridge on Parkway performance. Highway Research Board Bulletin 167, 63–74.
data(E1.1)
attach(E1.1)
plot(DENSITY, sqrt(SPEED))
E1.1.m1 <- lm(sqrt(SPEED) ~ DENSITY + I(DENSITY^2), data=E1.1)
summary(E1.1.m1)
detach()
The E1.11 data frame has 23 rows and 4 columns.
data(E1.11)
This data frame contains the following columns:
a character vector, names and state of each metropolitan area.
a numeric vector, units of measurement not given.
a numeric vector, units of measurement not given.
a numeric vector, in thousands.
Dacey, M.F. (1983) Social Science Theories and Methods I: Models of Data. Evanston: Northwestern University.
data(E1.11)
attach(E1.11)
plot(Population, Violent.Crimes)
detach()
The E1.15 data frame has 10 rows and 3 columns.
Stevens (1956) asked a number of persons to compare notes of various decibel levels against a standard (80 decibels) and to assign them a loudness rating, with the standard note being a 10. logy is the response variable and x the stimulus.
data(E1.15)
This data frame contains the following columns:
x: a numeric vector, the stimulus.
y: a numeric vector, the median response at x.
logy: a numeric vector, the log of y.
Dacey, M.F. (1983) Social Science Theories and Methods I: Models of Data. Evanston: Northwestern University, from Stevens (1956).
data(E1.15)
attach(E1.15)
plot(x, logy)
abline(lm(logy ~ x, data=E1.15))
detach()
The E1.16 data frame has 10 rows and 3 columns.
data(E1.16)
This data frame contains the following columns:
a character vector, name of the company
a numeric vector, 1972 earnings per share, in dollars.
a numeric vector, price per share, in dollars, in May 1973.
Dacey (1983, ch. 1) from Moody's Stock Survey, June 4, 1973, p. 610.
with(E1.16, plot(Price.Share, Earn.Share))
The E1.17 data frame has 18 rows and 3 columns.
data(E1.17)
This data frame contains the following columns:
a numeric vector, district of Chicago. 1 is downtown Chicago.
a numeric vector, population density of each district.
a numeric vector, vehicle thefts per thousand residents.
Mark Buslik, Chicago Police Department.
data(E1.17)
attach(E1.17)
plot(pd, vtt)
cat("Use the mouse to identify the outlier in the plot (click on the outlier)\n")
## Not run: identify(pd, vtt)
detach()
The E1.18 data frame has 8 rows and 3 columns with data on the number of marriages (ma) that occurred between residents of each of 8 annular zones and residents of Simsbury, Connecticut, for the period 1930–39. The number of residents of each zone is given as pop and the midpoint of distance between Simsbury and the band is given as d.
data(E1.18)
This data frame contains the following columns:
d: a numeric vector, distance between Simsbury and midpoint of annular zone.
pop: a numeric vector, population of annular zone.
ma: a numeric vector, number of marriages.
Dacey (1983, ch 4) from Ellsworth (1948).
data(E1.18)
summary(E1.18)
The E1.19 data frame has 20 rows and 3 columns.
Compiled from the catalog of one publisher of American Government books.
data(E1.19)
This data frame contains the following columns:
a numeric vector, price of book.
a numeric vector, number of pages of book.
a factor with levels c and p; c is cloth and p is paperback.
Compiled by one of the authors.
data(E1.19)
summary(E1.19)
The E1.20 data frame has 13 rows and 7 columns.
data(E1.20)
This data frame contains the following columns:
a character vector, name of state.
a numeric vector, Physical Quality of Life Index, a measure of average wealth.
a numeric vector, combined infant mortality rate.
a numeric vector, rural male infant mortality rate.
a numeric vector, rural female infant mortality rate.
a numeric vector, urban male infant mortality rate.
a numeric vector, urban female infant mortality rate.
Dr. T.N.K. Raju, Department of Neonatology, University of Illinois at Chicago.
data(E1.20)
## Some data reorganization before analysis:
## Maybe reshape could have been used here?
e1.20 <- data.frame(rbind(as.matrix(E1.20[,c(2,4)]),
                          as.matrix(E1.20[,c(2,5)]),
                          as.matrix(E1.20[,c(2,6)]),
                          as.matrix(E1.20[,c(2,7)])), row.names=1:52)
attr(e1.20,"names")[[2]] <- "IMR"
e1.20$Female <- c(rep(0,13), rep(1,13), rep(0,13), rep(1,13))
e1.20$Urban <- c(rep(0,26), rep(1,26))
## Now the analysis can start.
summary(e1.20)
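The E1.20 example asks whether reshape could have done the reorganization. The following is a minimal sketch of that alternative, not the package author's code. It assumes the SenSrivastava package is installed and that columns 4 to 7 hold the rural male, rural female, urban male and urban female rates in that order, as the column list states. The names wide, long, group, Female and Urban are introduced here for illustration only.

```r
## Sketch only: assumes the SenSrivastava package is installed.
if (requireNamespace("SenSrivastava", quietly = TRUE)) {
  data(E1.20, package = "SenSrivastava")
  ## Keep column 2 (the index) and columns 4 to 7 (the four rates).
  wide <- E1.20[, c(2, 4:7)]
  ## Stack the four rate columns (positions 2 to 5 of 'wide') into one IMR column.
  long <- reshape(wide, direction = "long",
                  varying = list(2:5), v.names = "IMR",
                  timevar = "group", times = 1:4)
  ## Assumed coding of group: 1 = rural male, 2 = rural female,
  ## 3 = urban male, 4 = urban female.
  long$Female <- as.numeric(long$group %in% c(2, 4))
  long$Urban  <- as.numeric(long$group >= 3)
  print(summary(long))
}
```

The rows come out stacked in the same order as the rbind() version, so the Female and Urban indicators match the ones built by hand in the example.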
The E1.21 data frame has 24 rows and 2 columns. Data are on loads, in pounds, and corresponding deformation, in inches, of a mild steel bar of length 8 inches and average diameter 0.564 inches.
data(E1.21)
This data frame contains the following columns:
L: a numeric vector, load, in pounds.
D: a numeric vector, corresponding deformation, in inches.
M.R. Khavanin, Department of Mechanical Engineering, University of Illinois at Chicago.
data(E1.21)
attach(E1.21)
plot(L, D)
detach()
The E1.7 data frame has 6 rows and 2 columns. The relation between population and number of telephones has been used to estimate the population in non-census years.
data(E1.7)
This data frame contains the following columns:
RES: a numeric vector, number of residents.
MAINS: a numeric vector, number of telephones.
Prof. Edwin Thomas, Department of Geography, University of Illinois at Chicago.
data(E1.7)
attach(E1.7)
plot(RES, MAINS)
plot(sqrt(RES), sqrt(MAINS))
detach()
The E10.1 data frame has 10 rows and 5 columns.
The responses were obtained by adding a N(0, 0.01) pseudorandom variate to x.1 + 0.5 x.2. The data were made up by the authors.
data(E10.1)
This data frame contains the following columns:
a numeric vector, predictor 1.
a numeric vector, predictor 2.
a numeric vector, response 1.
a numeric vector, response 2.
a numeric vector, response 3.
The data were made up by the authors.
data(E10.1)
attach(E10.1)
plot(x.1, x.2)
names(E10.1)
hascar <- require(car)
if (hascar) {
  mod <- lm(y.1 ~ x.1+x.2, data=E10.1)
  vif(mod)
}
detach()
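The description of E10.1 says the responses were generated as x.1 + 0.5 x.2 plus N(0, 0.01) noise. The following is a minimal sketch of that construction under stated assumptions: the predictor values are made up here and are not the values in E10.1, the near-collinearity of x.2 with x.1 is an assumption suggested by the vif() call in the example, and 0.01 is read as a variance (standard deviation 0.1).

```r
## Sketch only: the predictor values are made up for illustration
## and are not the values stored in E10.1.
set.seed(1)
x.1 <- 1:10
x.2 <- x.1 + rnorm(10, sd = 0.1)   # assumption: nearly collinear with x.1
## Assumption: N(0, 0.01) is read as variance 0.01, i.e. sd = 0.1.
y <- x.1 + 0.5 * x.2 + rnorm(10, mean = 0, sd = sqrt(0.01))
sim <- data.frame(x.1, x.2, y)
fit <- lm(y ~ x.1 + x.2, data = sim)
print(coef(fit))
print(cor(sim$x.1, sim$x.2))
```

With predictors this strongly correlated, the individual coefficient estimates can sit far from 1 and 0.5 even though the fitted values track y closely.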
The E10.11 data frame has 16 rows and 7 columns.
This is a selection of Longley's multicollinear data (1967).
data(E10.11)
This data frame contains the following columns:
a numeric vector, a price index.
a numeric vector, gross national product.
a numeric vector, unemployment rate.
a numeric vector, employment in the armed forces.
a numeric vector, noninstitutional population.
a numeric vector, the year.
a numeric vector, the response, total employment.
Reproduced from the Journal of the American Statistical Association, 62.
data(E10.11)
summary(E10.11)
plot(E10.11)
The E10.3 data frame has 30 rows and 6 columns.
This is part of a larger data set gathered for other purposes. The six variables are each composites obtained from responses to a questionnaire. The dependent variable y is a composite of responses towards the respondent's supervisor and on job satisfaction. The highest possible score is 20. The predictor variables are defined below.
data(E10.3)
This data frame contains the following columns:
a numeric vector, measures the level of social contact each respondent felt he or she had with the supervisor. Based on questions like "Do you see your supervisor outside of your work place?"
a numeric vector, measures the perceived level of interest from the supervisor in the employee's personal life. Based on questions like "Would you discuss a personal problem with your supervisor?"
a numeric vector, measures the level of support the employee feels from the supervisor. Based on questions like "Is your supervisor supportive of your work?"
a numeric vector, together with x.5 measures the drive of the supervisor. Based on the employee's perception of this drive.
a numeric vector, based on questions like "Does your supervisor encourage you to learn new skills?"
a numeric vector, the response.
Sen and Srivastava (1990) Regression Analysis, Theory, Methods and Applications. Springer-Verlag.
data(E10.3)
summary(E10.3)
plot(E10.3)
The E11.1 data frame has 20 rows and 5 columns.
data(E11.1)
This data frame contains the following columns:
a numeric vector, predictor 1.
a numeric vector, predictor 2.
a numeric vector, predictor 3.
a numeric vector, predictor 4.
a numeric vector, response.
Data made up by the authors.
data(E11.1)
exleaps <- require("leaps", quietly=TRUE)
if (exleaps) {
  E11.1.m1 <- regsubsets(y ~ x.1+x.2+x.3+x.4, data=E11.1)
  summary(E11.1.m1)
  plot(E11.1.m1)
}
The E2.1 data frame has 9 rows and 3 columns.
data(E2.1)
This data frame contains the following columns:
a numeric vector, grade point average (maximum=4)
a numeric vector, SAT verbal score.
a numeric vector, SAT mathematical score.
Dacey (1983).
data(E2.1)
summary(E2.1)
The E2.11 data frame has 50 rows and 27 columns; it combines exhibits E2.10 and E2.11 in the book. The data are for 1980 except as noted.
data(E2.11)
This data frame contains the following columns:
a character vector, two-letter state code.
a numeric vector, total population (1000's).
a numeric vector, per mil of population living in urban areas.
a numeric vector, per mil who moved between 1965 and 1970.
a numeric vector, number of blacks (1000's).
a numeric vector, number of Spanish speaking (1000's).
a numeric vector, number of Native Americans (100's).
a numeric vector, number of inmates of all institutions (correctional, mental, TB, etc) in 1970, (1000's).
a numeric vector, number of inmates of correctional institutions in 1970 (100's)
a numeric vector, Homes and schools for the mentally handicapped (100's)
a numeric vector, births per thousand.
a numeric vector, death rate from heart disease per 100000 residents.
a numeric vector, suicide rate, 1978, per 100000.
a numeric vector, death rate from diabetes, 1978, per 100000.
a numeric vector, marriage rate, per 10000.
a numeric vector, divorce rate, per 10000.
a numeric vector, physicians per 100000.
a numeric vector, dentists per 100000.
a numeric vector, per mil high school grads.
a numeric vector, crime rate per 100000 population.
a numeric vector, murder rate per 100000 population.
a numeric vector, prison rate (federal and state) per 100000 residents.
a numeric vector,
a numeric vector,
a numeric vector, telephones per 100 (1979).
a numeric vector, per capita income in 1972 dollars.
a numeric vector, per mil of population below poverty level.
Compiled by Prof. Siim Soot, Department of Geography, University of Illinois at Chicago, from Statistical Abstract of the United States, 1981, U.S. Bureau of the Census, Washington, D.C.
data(E2.11)
summary(E2.11)
The E2.2 data frame has 26 rows and 14 columns, with data on house prices in different zones of Chicago.
data(E2.2)
This data frame contains the following columns:
a numeric vector, selling price of house in thousands of dollars.
a numeric vector, number of bedrooms.
a numeric vector, floor space in sq. feet.
a numeric vector, number of fireplaces.
a numeric vector, number of rooms.
a numeric vector, storm windows (1 present, 0 absent).
a numeric vector, front footage of lot in feet.
a numeric vector, annual taxes.
a numeric vector, number of bathrooms.
a numeric vector, construction (0 if frame, 1 if brick).
a numeric vector, garage size (0=no garage, 1=one-car garage, etc.).
a numeric vector, condition (1=needs work, 0 otherwise).
a numeric vector, indicator for zone A.
a numeric vector, indicator for zone B.
Ms. Terry Tasch of Long-Kogan Realty, Chicago.
data(E2.2)
summary(E2.2)
The E2.4 data frame has 24 rows and 8 columns; all data are for 1978.
data(E2.4)
This data frame contains the following columns:
a character vector, name of each country.
a numeric vector, cars per person.
a numeric vector, population of country in millions.
a numeric vector, population density.
a numeric vector, per capita income in U.S. dollars.
a numeric vector, gasoline price in U.S. cents per liter.
a numeric vector, Tonnes of gasoline consumed per car per year.
a numeric vector, thousands of passenger-kilometers per person of bus and rail use.
Develop a model with AO as the response variable.
OECD (1982)
data(E2.4)
summary(E2.4)
The E2.6 data frame has 10 rows and 2 columns.
data(E2.6)
This data frame contains the following columns:
V.a: a numeric vector, actual voltage.
V.c: a numeric vector, voltage computed from the measured power output (using light output from electronic flash).
A definition of efficiency is the ratio V.c/V.a. Obtain a model for efficiency E as a regression on V.a. Use a quadratic polynomial. Examine the fit.
Armin Lehning, Speedotron Corporation.
data(E2.6)
E2.6.m1 <- lm(V.c/V.a ~ V.a + I(V.a^2), data=E2.6)
plot(E2.6.m1)
The E2.7 data frame has 10 rows and 5 columns.
data(E2.7)
This data frame contains the following columns:
a numeric vector, year.
a numeric vector, number of cars per person.
a numeric vector, per capita GNP in thousands of Korean won.
a numeric vector, average car price in thousands of Korean won.
a numeric vector, gasoline price after taxes, in won per liter.
KRIHS, (1985) Study of Road User Charges. Seoul: Korea Research Institute for Human Settlements.
data(E2.7)
summary(E2.7)
The E2.8 data frame has 17 rows and 4 columns, with data for 17 factories in Shanghai.
data(E2.8)
This data frame contains the following columns:
a numeric vector, per capita output in Chinese Yuan.
a numeric vector, number of workers in the factory.
a numeric vector, land area of the factory in sq. meters per worker.
a numeric vector, investments in Yuan per worker.
Prof. Zhang Tingwei of Tongji University, Shanghai.
data(E2.8)
summary(E2.8)
The E2.9 data frame has 15 rows and 10 columns.
The three sectors are "20": Food and kindred products, "36": Equipment and supplies, and "37": Transportation equipment.
data(E2.9)
This data frame contains the following columns:
a numeric vector, year without first two digits "19".
a numeric vector, capital of sector 20.
a numeric vector, capital of sector 36.
a numeric vector, capital of sector 37.
a numeric vector, labour of sector 20.
a numeric vector, labour of sector 36.
a numeric vector, labour of sector 37.
a numeric vector, real value added of sector 20.
a numeric vector, real value added of sector 36.
a numeric vector, real value added of sector 37.
Dr. Phillip Israelovich of the Federal Reserve Bank.
data(E2.9)
summary(E2.9)
The E3.4 data frame has 13 rows and 2 columns.
World record times as of 1974.
data(E3.4)
This data frame contains the following columns:
a numeric vector, distance in meters.
a numeric vector, time in seconds.
Encyclopædia Britannica, 15th Edition, 1974, Micropædia, IX, page 485.
See also E3.5, the records for women.
data(E3.4)
summary(E3.4)
The E3.5 data frame has 6 rows and 2 columns.
Records are for 1974.
data(E3.5)
This data frame contains the following columns:
a numeric vector, distance run, in meters.
a numeric vector, time used, in seconds.
Encyclopædia Britannica, 15th Edition, 1974, Micropædia, IX, page 487.
See also E3.4, for the men's records.
data(E3.5)
data(E3.4)
summary(E3.5)
summary(E3.4)
records <- rbind(E3.5, E3.4)
sex <- factor(c(rep("F", 6), rep("M", 13)))
records$sex <- sex
summary(records)
The E3.6 data frame has 50 rows and 6 columns.
data(E3.6)
This data frame contains the following columns:
a numeric vector, salary 1984, in dollars.
a numeric vector, salary 1983, in dollars.
a numeric vector, number of shares the chairman holds.
a numeric vector, total revenue of the company.
a numeric vector, total income of the company.
a numeric vector, age of chairman, in years.
Reprinted with permission from the May 13, 1985, issue of Crain's Chicago Business. Copyright 1985 by Crain's Communications, Inc. The data given are a portion of the original table.
data(E3.6)
summary(E3.6)
The E3.7 data frame has 20 rows and 7 columns.
data(E3.7)
This data frame contains the following columns:
a numeric vector, day of measurement, all measurements are on the same sample.
a numeric vector, biological oxygen demand, mg/liter.
a numeric vector, total Kjeldahl nitrogen, mg/liter.
a numeric vector, total solids, mg/liter.
a numeric vector, total volatile solids, a component of x.3, in mg/liter.
a numeric vector, chemical oxygen demand, mg/liter.
a numeric vector, the response, log of oxygen demand, mg oxygen per minute.
This is data from an experiment to construct a model for total oxygen demand in dairy wastes as a function of five laboratory measurements. Data were collected on samples kept in suspension in water in a laboratory for 220 days. All observations given here were taken on the same sample over time, so they are probably dependent.
Moore (1975) Total Biochemical Oxygen Demand of Animal Manures. Ph. D. thesis, University of Minnesota, Dept. of Agricultural Engineering.
data(E3.7)
summary(E3.7)
The E3.8 data frame has 20 rows and 3 columns. 20 student volunteers were given a map reading test and a test of route finding on transit maps.
data(E3.8)
This data frame contains the following columns:
a numeric vector, ability to find routes to a given destination on a transit route map, scored as y.
a numeric vector, scores on a map reading ability test.
a factor with levels Non.users and Users, non-users and users of transit.
Prof. Siim Soot, Department of Geography, University of Illinois at Chicago.
data(E3.8)
summary(E3.8)
The E3.9 data frame has 18 rows and 4 columns.
All the observations are for the same person.
data(E3.9)
This data frame contains the following columns:
a numeric vector, cardiac output.
a numeric vector, carbon dioxide level in the blood.
a numeric vector, blood flow velocity in the brain.
a factor with levels no and with, aminophylline used or not. The hypothesis is that aminophylline retards blood flow.
Tonse Raju, M.D., Department of Neonatology, University of Illinois at Chicago.
data(E3.9)
summary(E3.9)
The E4.1 data frame has 10 rows and 3 columns.
Deaths are in deaths per 100 million vehicle miles.
data(E4.1)
This data frame contains the following columns:
a numeric vector, the year.
a numeric vector, number of deaths.
a numeric vector, deaths.t - deaths.(t-1), the change in deaths from the previous year.
The interest is in possible changes once new safety regulations were in effect after 1966.
Illinois Department of Transportation (1972).
data(E4.1)
summary(E4.1)
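The third column of E4.1 is described as the year-on-year difference deaths.t - deaths.(t-1). The following is a small sketch of how such a column can be computed with diff(). The vector deaths and its values are made up for illustration and are not taken from E4.1; the name change is likewise hypothetical.

```r
## Sketch only: 'deaths' holds made-up values, not the E4.1 data.
deaths <- c(5.5, 5.2, 5.6, 5.1)
## Year-on-year change; the first year has no predecessor, so it is NA.
change <- c(NA, diff(deaths))
print(change)
```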
The E4.10 data frame has 27 rows and 7 columns.
data(E4.10)
This data frame contains the following columns:
a numeric vector, precinct number.
a numeric vector, number of latin voters.
a numeric vector, number of non-latin voters.
a numeric vector, total number of votes cast.
a numeric vector, number of votes for Garcia.
a numeric vector, number of votes for Martinez.
a numeric vector, number of votes for Yanez.
Note that the votes for the three candidates may not add to the total turnout because of write-in votes, spoilt ballots, etc.
Ray Flores, The Latino Institute, Chicago.
data(E4.10)
summary(E4.10)
The E4.11 data frame has 133 rows and 2 columns.
data(E4.11)
This data frame contains the following columns:
a numeric vector, the repair cost in dollars.
a factor with levels Both, Ring gear and Starter, the type of part being repaired.
M.R. Khavanin, Department of Mechanical Engineering, University of Illinois at Chicago.
data(E4.11)
E4.11.m1 <- lm(Cost ~ Part - 1, data=E4.11)
summary(E4.11.m1)
The E4.12 data frame has 24 rows and 6 columns. Each row gives the activities and time taken by one dietician.
data(E4.12)
This data frame contains the following columns:
a numeric vector, sum of time taken for all activities.
a numeric vector, number of patient contacts for screening.
a numeric vector, number of patient contacts for diet class.
a numeric vector, number of patient contacts for meal rounds.
a numeric vector, number of patient contacts for team rounds.
a factor with levels Intern and Prof, dietician is intern or professional.
The data were made available to one of the authors by a student.
data(E4.12)
m1 <- lm(Time ~ SC+DC+MR+TR-1, data=E4.12, subset=Dietician=="Prof")
summary(m1)
The E4.13 data frame has 49 rows and 5 columns. Data on hospital charges for patients with an identical diagnosis.
data(E4.13)
This data frame contains the following columns:
a factor with levels F and M, female and male.
a factor with levels 499, 730 and 1021, three different medical doctors.
a factor with levels 1, 2, 3 and 4, severity of illness.
a numeric vector, total hospital charge in dollars.
a numeric vector, age of patient in years.
Dr. Joseph Feinglass, Northwestern Memorial Hospital, Chicago.
data(E4.13)
summary(E4.13)
The E4.4 data frame has 40 rows and 3 columns.
data(E4.4)
This data frame contains the following columns:
QUAL: a numeric vector, a quality measure made using psychometric methods from results of questionnaires.
X.1: a numeric vector, an indicator variable for private ownership.
X.2: a numeric vector, an indicator variable for private for-profit ownership.
The quality data, QUAL, is constructed from questionnaires given to users of such services in the state of Illinois. Multiple services in the state of Illinois were scored using this method. The indicator variables were constructed to give first (X.1) a comparison between private and public services, then (X.2) a comparison between private not-for-profit and private for-profit services.
Slightly modified version of data supplied by Ms. Claire McKnight of the Department of Civil Engineering, City University of New York.
data(E4.4)
summary(E4.4)
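The two indicator variables in E4.4 are described as giving a private versus public comparison and a not-for-profit versus for-profit comparison. The following is a minimal sketch of a regression that uses them, assuming the SenSrivastava package is installed and that the columns are named QUAL, X.1 and X.2 as in the description. The name fit_qual is introduced here for illustration.

```r
## Sketch only: assumes the SenSrivastava package is installed and that
## the columns are named QUAL, X.1 and X.2 as in the description.
if (requireNamespace("SenSrivastava", quietly = TRUE)) {
  data(E4.4, package = "SenSrivastava")
  ## Quality regressed on the two ownership indicators.
  fit_qual <- lm(QUAL ~ X.1 + X.2, data = E4.4)
  print(summary(fit_qual))
}
```

If both columns are plain 0/1 indicators, the intercept estimates the mean quality of public services, the X.1 coefficient the difference between private not-for-profit and public services, and the X.2 coefficient the difference between private for-profit and private not-for-profit services.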
The E4.7 data frame has 101 rows and 3 columns.
data(E4.7)
This data frame contains the following columns:
a character vector, containing names of the countries.
a numeric vector, life expectancy, years. Early 1970's.
a numeric vector, per capita income in 1974 dollars. Early 1970's.
From the New York Times (September 28, 1975, p. E-3).
data(E4.7)
attach(E4.7)
plot(INC, LIFE)
plot(log(INC), LIFE)
detach()
The E6.1 data frame has 62 rows and 2 columns.
data(E6.1)
This data frame contains the following columns:
d.: a numeric vector, distance covered to come to a standstill after braking.
sp.: a numeric vector, speed before braking.
From Ezekiel, M. and F. A. Fox, Methods of Correlation and Regression Analysis. Copyright 1959 John Wiley and Sons, Inc.
data(E6.1)
attach(E6.1)
plot(sp., d.)
detach()
The E6.10 data frame has 32 rows and 3 columns.
data(E6.10)
This data frame contains the following columns:
n: a numeric vector, number of respondents, weights for the linear regression.
x: a numeric vector, computed travel times between a pair of zones in Chicago.
y: a numeric vector, perceived travel times, as reported to the U.S. Census Bureau.
x was computed from bus timetables, adding an average waiting time at the stop and an average walking time from zone center to bus stop. y is the average reported by n travelers to the U.S. Census Bureau. The variable t introduced in the example below is the one for multiple bus transfers, used in example 8.1, page 161.
The data were selected by one of the authors from a larger data set compiled by Cæsar Singh from census tapes, timetables and maps.
data(E6.10)
## Manipulations of the data for example 8.1, page 161:
t <- c(0,1,rep(0,20),1,rep(0,5),1,rep(0,3))
e6.10 <- data.frame(E6.10, t=t)
rm(t)
summary(e6.10)
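The column list for E6.10 names the number of respondents as weights for the linear regression. The following is a minimal sketch of such a weighted fit, assuming the SenSrivastava package is installed and that the columns are named n, x and y as the description suggests. It is not the analysis from the book, and the name fit_w is introduced here for illustration.

```r
## Sketch only: assumes the SenSrivastava package is installed and that
## the columns are named n, x and y as in the description.
if (requireNamespace("SenSrivastava", quietly = TRUE)) {
  data(E6.10, package = "SenSrivastava")
  ## y is an average over n travelers, so its variance is taken to be
  ## proportional to 1/n; lm() then uses n as the weight.
  fit_w <- lm(y ~ x, data = E6.10, weights = n)
  print(summary(fit_w))
}
```

Weighting by n gives zone pairs with many respondents more influence, on the reasoning that an average over more travelers is less variable.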
The E6.11 data frame has 12 rows and 3 columns.
data(E6.11)
This data frame contains the following columns:
a numeric vector, height of father to the nearest inch.
a numeric vector, average heights of sons.
a numeric vector, number of fathers in each group.
Dacey (1983, ch. 1) from McNemar (1969, p. 130), Psychological Statistics.
data(E6.11)
summary(E6.11)
The E6.8 data frame has 54 rows and 7 columns. It has 7 variables describing 54 dial-a-ride services in the U.S. and Canada. It needs weighted regression.
data(E6.8)
This data frame contains the following columns:
a numeric vector, population of the area where the service was operating.
a numeric vector, area of the place where the service was provided.
a numeric vector, number of riders using the system.
a numeric vector, hours of operation.
a numeric vector, number of vehicles in operation.
a numeric vector, the fare used.
a numeric vector, a composite index, 1 when several ridership-enhancing features were present, and 0 otherwise.
Collected by Louise Stanton-Maston, from 54 services in U.S. and Canada.
data(E6.8)
summary(E6.8)
The E7.1 data frame has 4 rows and 12 columns.
Dental measurements for girls from 8 to 14 years old. Each measurement is the distance, in mm, from the center of the pituitary to the pterygomaxillary fissure.
data(E7.1)
This data frame contains the following columns:
a numeric vector, age of girl when measurement was taken.
a numeric vector, measurements for girl 1.
a numeric vector, measurements for girl 2.
a numeric vector, measurements for girl 3.
a numeric vector, measurements for girl 4.
a numeric vector, measurements for girl 5.
a numeric vector, measurements for girl 6.
a numeric vector, measurements for girl 7.
a numeric vector, measurements for girl 8.
a numeric vector, measurements for girl 9.
a numeric vector, measurements for girl 10.
a numeric vector, measurements for girl 11.
Potthoff and Roy (1964).
data(E7.1)
summary(E7.1)
The E7.2 data frame has 32 rows and 5 columns.
Prices are in 1972 U.S. cents per 1000 BTU.
data(E7.2)
This data frame contains the following columns:
a numeric vector, year of observation.
a numeric vector, price of oil.
a numeric vector, price of Gas.
a numeric vector, price of Bituminous Coal and Lignite.
a numeric vector, price of Anthracite.
Darrel Sala, Institute of Gas Technology, Chicago.
data(E7.2)
summary(E7.2)
The E7.3 data frame has 19 rows and 6 columns.
It gives the ratios u of fluid intake to urine output over five consecutive 8-hour periods for 19 babies divided into a control and a treatment group.
data(E7.3)
This data frame contains the following columns:
a factor with levels surfactant and placebo.
a numeric vector, u for time period 1.
a numeric vector, u for time period 2.
a numeric vector, u for time period 3.
a numeric vector, u for time period 4.
a numeric vector, u for time period 5.
Rama Bhat, M.D., Department of Pediatrics, University of Illinois at Chicago. This data is part of a larger data set.
data(E7.3)
summary(E7.3)
The E7.4 data frame has 5 rows and 11 columns.
Five baby chimpanzees were injected with a heavy dose of HIV. After six months, the radioactive microsphere technique was used to measure brain blood flow, in ml per 100 grams of brain tissue, in five regions of the brain. The partial pressure of carbon dioxide, in millimeters of mercury, was also obtained.
data(E7.4)
This data frame contains the following columns:
a numeric vector, id number of chimpanzee.
a numeric vector, Frontal partial pressure of carbon dioxide.
a numeric vector, Frontal blood flow.
a numeric vector, Parietal partial pressure of carbon dioxide.
a numeric vector, Parietal blood flow.
a numeric vector, Occipital partial pressure of carbon dioxide.
a numeric vector, Occipital blood flow.
a numeric vector, Temporal partial pressure of carbon dioxide.
a numeric vector, Temporal blood flow.
a numeric vector, Cerebellum partial pressure of carbon dioxide.
a numeric vector, Cerebellum blood flow.
Tonse Raju, M.D., Department of Pediatrics, University of Illinois at Chicago.
data(E7.4)
summary(E7.4)
The E7.5 data frame has 26 rows and 6 columns.
data(E7.5)
This data frame contains the following columns:
a numeric vector, static weight of axle 1.
a numeric vector, weight in motion of axle 1.
a numeric vector, static weight of axles 2–3.
a numeric vector, weight in motion of axles 2–3.
a numeric vector, static weight of axles 4–5.
a numeric vector, weight in motion of axles 4–5.
Trucks can be weighed by two methods. In one, a truck needs to go into a weighing station and each axle is weighed by conventional means. The other is a newer and somewhat experimental method where a thin pad is placed on the highway and axles are weighed as trucks pass over it. The former weights are called static weights (sw) while the latter are called weights in motion (wim).
Saleh Mumayiz, Urban Transportation Center, University of Illinois at Chicago, who compiled the data from a data set provided by the Illinois Department of Transportation.
data(E7.5)
summary(E7.5)
plot(E7.5)
The E7.6 data frame has 34 rows and 5 columns.
data(E7.6)
This data frame contains the following columns:
a character vector, name of area.
a numeric vector, percentage of population that is black.
a numeric vector, percentage of population that is Spanish speaking.
a numeric vector, percentage of population over 65.
a numeric vector, median family income for each area.
The data set was constructed by Prof. Siim Soot, Dept. of Geography, University of Illinois at Chicago.
E7.7, which is the adjacency matrix for the 34 areas.
data(E7.6)
summary(E7.6)
This is the contiguity matrix for the 34 areas in northern Chicago
given in E7.6. It contains only 0's and 1's, with the
obvious interpretation.
data(E7.7)
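A contiguity matrix of this kind is typically used to count neighbours or to build spatial weights. The sketch below uses a small made-up 4-area matrix `W` (not the E7.7 values) to show the checks one might run; the same calls should carry over to `as.matrix(E7.7)`, assuming it loads as a square 0/1 table.

```r
# Hypothetical 4-area contiguity matrix (illustration only, not E7.7 itself):
# W[i, j] = 1 when areas i and j share a border, 0 otherwise.
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 1,
              0, 1, 0, 1,
              0, 1, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("A", 1:4), paste0("A", 1:4)))

# Contiguity is mutual, so the matrix should be symmetric with a zero diagonal.
is_symmetric  <- isSymmetric(W)
zero_diagonal <- all(diag(W) == 0)

# Row sums give the number of neighbours of each area.
n_neighbours <- rowSums(W)

# Row-standardised weights: every row sums to 1, as used in spatial lag terms.
W_std <- W / n_neighbours
```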
The E8.12 data frame has 11 rows and 3 columns.
data(E8.12)
This data frame contains the following columns:
a character vector, the country.
a numeric vector, male deaths in 1950 for lung cancer, per million.
a numeric vector, per capita cigarette consumption in 1930.
Tufte, E. R. (1974) Data Analysis for Politics and Policy. Englewood Cliffs, N.J.: Prentice-Hall. Data are adapted.
data(E8.12)
summary(E8.12)
The E8.13 data frame has 20 rows and 7 columns, giving
data on the effects of cloud seeding by silver iodide
crystals on precipitation. Each data point is one day.
data(E8.13)
This data frame contains the following columns:
a factor with levels NoSeed and Seed.
a numeric vector, number of days after the first day of the experiment.
a numeric vector, relates to heights of clouds.
a numeric vector, percentage of cloud cover in the experimental area.
a numeric vector, total rainfall in the study area one hour before seeding (in $10^7$ cubic meters).
a factor with levels Moving and Stationary, indicating whether the radar echo was moving or not.
a numeric vector, the response, natural logarithm of precipitation in the target area in a 6-hour period (in $10^7$ cubic meters).
Woodley, et al. (1977) Rainfall Results 1970–1975: Florida Area Cumulus Experiment. Science 195 735–742. Copyright 1977 by the AAAS.
data(E8.13)
summary(E8.13)
plot(E8.13)
The E9.11 data frame has 17 rows and 10 columns.
data(E9.11)
This data frame contains the following columns:
a numeric vector, average capacity of buses in service.
a numeric vector, ratio of buses in use during non-peak periods to those in use in peak periods.
a numeric vector, average speed.
a numeric vector, vehicle-miles contracted.
a numeric vector, distance of center from metropolitan area.
a numeric vector, population of metropolitan area.
a numeric vector, percentage of work trips in the metropolitan area that are made by transit.
a numeric vector, ratio of buses owned by sponsor to buses owned by contractor.
a numeric vector, per capita income for metropolitan area.
a numeric vector, percent savings that occurred when some transit lines were given to private companies.
Prof. E. K. Morlok, Dept. of Systems Engineering, University of Pennsylvania.
data(E9.11)
summary(E9.11)
plot(E9.11)
The E9.18 data frame has 51 rows and 4 columns.
data(E9.18)
This data frame contains the following columns:
a numeric vector, travel time by car, in tenths of minutes.
a numeric vector, travel time by public transportation, in tenths of minutes.
a numeric vector, number of those who used a car or van either as driver or passenger.
a numeric vector, number of people using any kind of public transportation.
Travel times modified by one of the authors to reflect the cost of parking. For downtown zones (Chicago) this amounted to about 60 minutes.
Selected by Robert Drozd from Census (US) Urban Transportation Planning Package, for the Chicago area.
data(E9.18)
summary(E9.18)
plot(E9.18)
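Counts of car users and transit users per zone, together with the two travel times, lend themselves to a grouped binomial (logit) model of mode choice. The sketch below is an assumption about how such data might be analysed, run on simulated zones rather than on E9.18; all variable names (`car_time`, `transit_time`, `car_users`, `transit_users`) and the simulation parameters are hypothetical.

```r
# Simulated zones (illustration only, not the E9.18 data).
set.seed(3)
n_zones      <- 51
car_time     <- round(runif(n_zones, 200, 900))   # tenths of minutes
transit_time <- car_time + round(runif(n_zones, -200, 400))

# Convert the time difference from tenths of minutes to minutes.
diff_min <- (transit_time - car_time) / 10

# Simulate mode choice: car becomes more likely as transit gets slower.
total         <- rpois(n_zones, 200) + 1
p_car         <- plogis(0.5 + 0.05 * diff_min)
car_users     <- rbinom(n_zones, size = total, prob = p_car)
transit_users <- total - car_users

# Grouped binomial logit: successes are car users, failures transit users.
fit <- glm(cbind(car_users, transit_users) ~ diff_min, family = binomial)
coef(fit)
```

The `cbind(successes, failures)` response is the same form used in the Ec.8 example of this package.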
The E9.19 data frame has 50 rows and 4 columns.
data(E9.19)
This data frame contains the following columns:
a numeric vector, Acceleration of different vehicles.
a numeric vector, weight-to-horsepower ratio.
a numeric vector, speed at which they were travelling.
a numeric vector, Grade of road, G=0 implies road was horizontal.
Raj Tejwaney, Department of Civil Engineering, University of Illinois at Chicago.
data(E9.19)
summary(E9.19)
plot(E9.19)
The E9.20 data frame has 16 rows and 3 columns.
data(E9.20)
This data frame contains the following columns:
a numeric vector, cost of cleanup. Units forgotten.
a numeric vector, sales at hot-dog stands. Units forgotten.
a numeric vector, sales at beer stands. Units forgotten.
The authors of the book.
data(E9.20)
summary(E9.20)
plot(E9.20)
The E9.21 data frame has 11 rows and 2 columns.
data(E9.21)
This data frame contains the following columns:
a numeric vector, units not given, probably years.
a numeric vector, units not given.
Diamond-Star Motors, Normal, IL. Gary Shultz, General Counsel, made this data available.
data(E9.21)
summary(E9.21)
plot(E9.21)
The E9.3 data frame has 50 rows and 3 columns.
Made by random sampling numbers.
data(E9.3)
This data frame contains the following columns:
a numeric vector, area of the rectangle.
a numeric vector, length of the rectangle.
a numeric vector, width of the rectangle.
data(E9.3)
E9.3.m1 <- lm(y ~ x1 + x2, data=E9.3)
attach(E9.3)
plot(x1, resid(E9.3.m1))
plot(x2, resid(E9.3.m1))
detach(E9.3)
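Because area is the product of length and width, a model that is additive in the two sides is misspecified, which is what residual plots like those above are meant to reveal. A minimal sketch with freshly simulated rectangles (not the E9.3 values; the names `len`, `wid`, and `area` are made up here) shows that taking logarithms turns the relationship into an exactly linear one.

```r
# Simulated rectangles (illustration only, not the E9.3 data).
set.seed(1)
n    <- 50
len  <- runif(n, 1, 10)
wid  <- runif(n, 1, 10)
area <- len * wid

# Additive model: misspecified, so the residuals carry systematic structure.
fit_additive <- lm(area ~ len + wid)

# Log-log model: log(area) = log(len) + log(wid) holds exactly,
# so the intercept is 0, both slopes are 1, and the residuals vanish.
fit_loglog <- lm(log(area) ~ log(len) + log(wid))
coef(fit_loglog)

max_abs_resid_additive <- max(abs(resid(fit_additive)))
max_abs_resid_loglog   <- max(abs(resid(fit_loglog)))
```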
The E9.8 data frame has 27 rows and 3 columns.
data(E9.8)
This data frame contains the following columns:
a numeric vector, Monthly rent in dollars.
a numeric vector, annual income in .
a numeric vector, household size.
Example 9.8 in Sen and Srivastava, page 201.
Selected by one of the authors from a much larger data set, collected from several sources about 20 years ago.
data(E9.8)
attach(E9.8)
E9.8.m1 <- lm(R ~ I + S, data=E9.8)
summary(E9.8.m1)
plot(I, resid(E9.8.m1, type="partial")[,"I"])
plot(S, resid(E9.8.m1, type="partial")[,"S"])
detach()
The Ec.8 data frame has 112 rows and 5 columns.
data(Ec.8)
This data frame contains the following columns:
a character vector, country of origin of the applicant.
a numeric vector, number of successful applications.
a numeric vector, number of denied applications.
a numeric vector, 1 if country is considered hostile to the U.S., 0 otherwise.
a numeric vector, 1 if country is European or mainly inhabited by people of European descent, 0 otherwise.
Prof. Barbara Yarnold, Dept. of Political Science, Saginaw Valley State University, Saginaw, Michigan.
data(Ec.8)
summary(Ec.8)
attach(Ec.8)
Ec.8.m1 <- glm(cbind(APR, DEN) ~ E + H, data=Ec.8, family=binomial)
summary(Ec.8.m1)
detach()
The Ex.7.7 data frame has 19 rows and 2 columns.
data(Ex.7.7)
This data frame contains the following columns:
a numeric vector, U.S. population in thousands.
a numeric vector, year.
Sen and Srivastava.
data(Ex.7.7)
with(Ex.7.7, plot(y ~ t))
summary(Ex.7.7)
The Ex4.4 data frame has 24 rows and 3 columns.
An experiment was conducted to examine the effects of air pollution
on interpersonal attraction. Twenty-four subjects were each placed
with a stranger for a 15-minute period in a room which was either
odor free or contaminated with ammonium sulfide. The stranger came
from a culture which was similar or dissimilar to that of the subject.
At the end of the encounter, each subject was asked to assess his degree
of attraction towards the stranger on a Likert scale of 1–10 with
10 indicating strong attraction.
data(Ex4.4)
This data frame contains the following columns:
a numeric vector, attraction on a Likert scale.
a factor with levels Free and Odor, indicating whether the room was contaminated or not.
a factor with levels Dissimilar and Similar, indicating a similar or dissimilar culture.
The full data set is given in Srivastava and Carter (1983).
data(Ex4.4)
summary(Ex4.4)
plot(Ex4.4)
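The experiment described above is a 2 x 2 factorial design (room odor by cultural similarity), so a two-way analysis of variance with an interaction term is one plausible way to analyse it. The sketch below runs on simulated ratings, not on Ex4.4; the column names `attraction`, `room`, and `culture`, and the assumption of six subjects per cell, are hypothetical.

```r
# Simulated balanced 2 x 2 design with 6 subjects per cell
# (illustration only, not the Ex4.4 data).
set.seed(7)
design <- expand.grid(rep     = 1:6,
                      room    = c("Free", "Odor"),
                      culture = c("Dissimilar", "Similar"))

# Hypothetical ratings, rounded and clipped to the 1-10 scale.
design$attraction <- pmin(10, pmax(1, round(rnorm(nrow(design), mean = 5.5, sd = 2))))

# Two-way ANOVA with interaction: does the effect of odor depend on culture?
fit <- aov(attraction ~ room * culture, data = design)
tab <- summary(fit)[[1]]   # ANOVA table: room, culture, room:culture, Residuals
tab
```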