A study that asks questions of a sample drawn from some population in the hope of learning something about the entire population.
A systematic error that favors a particular segment of the population or that tends to encourage only certain outcomes in the data.
The best defense against bias; each individual is given a fair, random chance of selection.
The number of individuals in a sample; it determines how well the sample represents the population.
Used to measure a variable for every unit of a population.
Numerically valued attribute of a model for a population.
A sample is said to be _______________________ if the stats computed from it accurately reflect the corresponding population parameters.
Simple Random Sample (SRS)
A sample in which each set of "n" elements in the population has an equal chance of selection.
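As a minimal sketch of this definition, Python's standard `random.sample` draws n distinct elements so that every set of n elements is equally likely. The numbered population and the helper name `simple_random_sample` are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every set of n elements is equally likely."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical population of 100 numbered individuals.
population = range(1, 101)
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```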
Simple Random Sample
The list of possible subjects who could be selected in a sample.
The natural tendency of randomly drawn samples to differ, one from another.
stratified random sample
A sampling design in which the population is divided into several subpopulations, or strata, and random samples are then drawn from each stratum.
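One way to picture a stratified design is to draw a separate simple random sample inside each stratum. The sketch below assumes equal sample sizes per stratum; the strata names and the helper `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by year.
strata = {
    "first_year": [f"F{i}" for i in range(50)],
    "second_year": [f"S{i}" for i in range(30)],
}
picked = stratified_sample(strata, 5, seed=2)
print(picked)
```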
A sampling design in which entire groups are chosen at random.
Sampling schemes that combine several sampling methods.
A sample drawn by selecting individuals systematically from a sampling frame.
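A systematic sample is often taken by choosing a random starting point and then every k-th entry of the sampling frame. The sketch below assumes that approach; the frame and the helper `systematic_sample` are made up for illustration.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start among the first k entries, then take every k-th one."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]

frame = list(range(1, 101))  # hypothetical sampling frame of 100 individuals
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```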
A small trial run of a survey to check whether questions are clear.
voluntary response bias
Bias introduced to a sample when individuals can choose on their own whether to participate in the sample.
Consists of the individuals who are conveniently available to sample.
A sampling scheme that biases the sample in a way that gives a part of the population less representation.
Bias introduced when a large fraction of those sampled fails to respond.
Anything in a survey design that influences response.
Outcomes occur at random if each outcome is equally likely to occur.
Models a real-world situation by using random-digit outcomes to mimic the uncertainty of a response variable of interest.
A component uses equally likely random digits to model simple random occurrences whose outcomes may not be equally likely.
Trial (Chapter 11)
The sequence of several components representing events that we are pretending will take place.
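The component and trial ideas can be sketched as a small simulation. The set-up is an assumption chosen for illustration: each component is one random digit, digits 0 to 2 stand for a "success" with 30% chance, and the response variable is the number of successes in a trial of five components.

```python
import random

def component(rng):
    """One component: a random digit 0-9; digits 0, 1, 2 count as a success."""
    return rng.randrange(10) < 3

def trial(rng, n_components=5):
    """One trial: run several components and record the response variable."""
    return sum(component(rng) for _ in range(n_components))

rng = random.Random(4)
results = [trial(rng) for _ in range(1000)]
# Average number of successes per trial; it should land near 5 * 0.3.
print(sum(results) / len(results))
```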
We _______________ data by taking the logarithm, the square root, the reciprocal, or some other mathematical operation on all values of a variable.
ladder of powers
Places in order the effects that many re-expressions have on the data.
If σ₁ and σ₂ are the variances of Y₁ and Y₂, the correlation coefficient is:
A graph used to determine whether there is a relationship between paired data. Scatter plots can show trends in data.
A variable other than x and y that simultaneously affects both variables, accounting for the correlation between the two.
An equation or formula that simplifies and represents reality.
An equation of a line. To interpret a linear model, we need to know the variables and their units.
The value of ŷ found for a given x-value in the data. It is found by substituting the x-value into the regression equation.
The vertical deviation between the observations and the LSRL
Specifies the unique line that minimizes the variance of the residuals or, equivalently, the sum of the squared residuals.
Regression to the mean
Because the correlation is always less than 1.0 in magnitude, each predicted ŷ tends to be fewer standard deviations from its mean than its corresponding x was from its mean.
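A small numeric illustration may help here. For a least squares line fitted to standardized data, the predicted y in standard deviation units is r times the x in standard deviation units. The correlation of 0.6 and the helper name `predicted_z` are assumptions for the example.

```python
def predicted_z(r, z_x):
    """Predicted y (in SDs from its mean) for an x that is z_x SDs from its mean."""
    return r * z_x

# Hypothetical correlation of 0.6: an x that is 2 SDs above its mean
# predicts a y that is fewer SDs above its own mean.
print(predicted_z(0.6, 2.0))
```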
The intercept, b₀, gives a starting value in y-units. It's the ŷ-value when x = 0.
Predicting a y-value by extending the model to x-values outside the range of the original data.
Data points whose x-values are far from the mean of x are said to exert _____________________________ on a linear model.
A point that influences where the LSRL is located; if removed, it will significantly change the slope of the LSRL
2 events share no outcomes in common.
no two samples are linked or related
What does it mean if sample data is independent?
What does it mean if sample data is dependent/related?
e.g. take ten people and measure height and weight; each person's weight is related/connected to their height.
drawing conclusions about some population based on a sample.
What are descriptive statistics?
simply describing the sample/data, they don't prove anything!
When looking at descriptive statistics for a sample do you account for the population?
some value computed from the data,
some value associated with the population,
some measured quantity, e.g. "number of words recalled".
a particular outcome e.g. "19 words".
Define independent variable
the variable which you change.
Define the dependent variable
the variable which you are observing, the thing that changes because of your altered IV.
Define Nominal Data
Categorize subjects and count how many are in each category, e.g. how many old and how many young.
Define Ordinal Data
Categorize subjects into ranked categories (doesn't need to be same interval between each rank)
Define Interval Data
When data can be ordered with equal intervals but there is no true zero, so values can fall below 0, e.g. temperature in °C.
Define Ratio Data
The same as interval except there is a true zero, so data can't fall below 0, e.g. height or weight.
What is a numerical summary (sample statistics)?
can give info such as the typical magnitude or the amount of variation in the data.
What are the measures of location (magnitude)?
mean - arithmetic average (mean = (∑x)/n)
What are the measures of spread?
raw range - highest to lowest
(∑(x²) - ((∑x)²)/n)/(n-1)
What is the formula for variance?
What is the formula for standard deviation?
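The computational form of the variance shown above, (∑(x²) - ((∑x)²)/n)/(n-1), can be sketched in a few lines. The standard deviation is taken here as the square root of that variance; the data values and function names are made up for illustration.

```python
import math

def sample_variance(xs):
    """Computational form: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def sample_sd(xs):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
print(sample_variance(data), sample_sd(data))
```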
What is a histogram?
The range of values is divided into classes (usually of equal width) and the number of observations in each class is shown. The AREA, not the height, of each bar represents the number of observations.
What are scatter plots used for?
to show relationships between multiple variables.
Exploratory Data Analysis.
What does EDA stand for?
What is EDA?
looking carefully at the sample of data, after EDA you can do statistical inference - see what the sample tells us about the population.
What does EDA emphasise?
robust measures and plots.
What is robustness?
when measures are not overly affected by a few very extreme observations,
How do you calculate the mid-spread?
Upper hinge (Q₃) - the lower hinge (Q₁)
(median position + 1)/2
How do you calculate the lower hinge position?
How do you calculate the upper hinge position?
take the lower hinge position from the other end.
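The hinge and mid-spread rules above can be sketched as follows. Two assumptions are made that the cards do not state: the median position is (n + 1)/2, and it is truncated to a whole number before applying (median position + 1)/2. A position ending in .5 is read as the average of its two neighbouring values.

```python
import math

def value_at(sorted_xs, position):
    """Value at a 1-based position; a .5 position averages the two neighbours."""
    lo = math.floor(position)
    hi = math.ceil(position)
    return (sorted_xs[lo - 1] + sorted_xs[hi - 1]) / 2

def hinges(xs):
    s = sorted(xs)
    median_pos = (len(s) + 1) / 2
    # Assumption: truncate the median position before applying (position + 1) / 2.
    hinge_pos = (math.floor(median_pos) + 1) / 2
    lower = value_at(s, hinge_pos)
    upper = value_at(s[::-1], hinge_pos)  # same position counted from the other end
    return lower, upper

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # made-up values
lower, upper = hinges(data)
mid_spread = upper - lower
print(lower, upper, mid_spread)
```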
What must you include in a stem and leaf diagram?
- extends from lower to upper hinge
What are the aspects of a box plot?
How do you calculate the inner fence?
1.5 × box length, above/below appropriate hinge.
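As a rough sketch of the fence rule, the helper below places each inner fence 1.5 box lengths beyond its hinge and lists the observations that fall outside. The hinge values and data are made up, and `inner_fences` is a hypothetical name.

```python
def inner_fences(lower_hinge, upper_hinge):
    """Inner fences sit 1.5 box lengths below the lower and above the upper hinge."""
    box_length = upper_hinge - lower_hinge
    step = 1.5 * box_length
    return lower_hinge - step, upper_hinge + step

low_fence, high_fence = inner_fences(4, 10)  # hinges are made-up values
data = [1, 3, 4, 6, 7, 8, 10, 12, 25]
outside = [x for x in data if x < low_fence or x > high_fence]
print(low_fence, high_fence, outside)
```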
How do you see Skew on a box plot?
- if the median line is in the middle it's symmetric, no skew
as many as you want, side-by-side comparison.
How many samples can you compare through box plots?
Confirmatory Data Analysis.
What does CDA stand for?
a numerical description of how likely it is that a particular outcome will occur,
Define random variable
a numerical outcome of an experiment.
What does probability distribution specify?
the probability of each outcome of a random variable.
if there are only a few possible values it can take.
if there are very many possible values it can take.
What is normal distribution?
the most important continuous probability distribution,
mean and standard deviation.
What are the parameters for normal distribution (parametric data)?
What is the standard normal distribution?
has a mean 0 and an SD 1.
What does correlation measure?
the strength of the linear relationship between two variables,
What is the first thing you do with paired data?
draw a scatterplot and see if there is a clear pattern, e.g. do the points lie along a straight line?
What are the main measures of correlation?
- Spearman's Rank Correlation (ρ, rho or rₛ) -> ordinal (rank) data
-1 and +1
What are correlation values ALWAYS within?
When is Spearman's rho most appropriate?
When the data only consists of two (paired) sets of ranks.
Pearson product-moment correlation coefficient.
What is the full name of Pearson's coefficient?
no it is scale independent.
Does correlation have a scale? Is it affected by scale?
a linear relationship.
What kind of relationship is it appropriate for?
r = covariance(x, y)/(SDx × SDy)
What is the formula for Pearson's Coefficient for two variables?
(∑xy - (1/n)(∑x)(∑y))/√([∑x² - (1/n)(∑x)²][∑y² - (1/n)(∑y)²])
What is the more general formula for Pearson's Coefficient?
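The sums form of Pearson's coefficient translates fairly directly into code. The sketch below assumes the square root covers the product of both bracketed terms; the paired heights and weights are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the sums-of-products form."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

heights = [150, 160, 170, 180, 190]  # made-up paired data
weights = [52, 58, 69, 74, 86]
print(pearson_r(heights, weights))
```

The result always lies between -1 and +1, and rescaling either variable (say, centimetres to metres) leaves it unchanged.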
skewed data, non-parametric.
On what type of data do you use a Spearman's correlation?
normally distributed data, parametric.
On what type of data do you use a Pearson's correlation?
Y = a + bX
How do you represent a straight line?
What is linear regression?
finding the line of best fit to a set of points,
What is the most common method for linear regression?
least squares, which minimises the sum of the squared vertical deviations of the points from the line.
ŷ = a + bX
For any given x value we can use the fitted line to estimate/predict a y value (y hat), what is the formula for this?
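A minimal least squares sketch follows, using the standard closed-form estimates b = Sxy/Sxx and a = mean(y) - b × mean(x). These expressions are not given on the cards and are supplied here as an assumption; the data and the helper name are made up.

```python
def least_squares_fit(xs, ys):
    """Return intercept a and slope b of the fitted line y_hat = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_fit(xs, ys)
y_hat = a + b * 6  # prediction at x = 6, just outside the data range
print(a, b, y_hat)
```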
a bivariate linear regression.
As there are only two variables what is this sometimes called?
a multivariate regression.
What is it called if there are more than two variables?
- decide on a critical value, c
How do you test a null hypothesis, H₀?