statistics

The science of collecting, organizing, analyzing, and interpreting data in order to make decisions

Types of Observational Studies

Simulation, survey, census, experiment, sampling

Descriptive Statistics

The methods of organizing and summarizing data

Inferential Statistics

Involves making generalizations from a sample to a population

Population (N)

The entire collection of individuals or objects about which information is desired

Sample (n)

A subset of the population, selected for study in some prescribed manner

Variable

A characteristic that can take different values for different individuals in a study. (In algebra: a letter such as x or y that stands for a number that is arbitrary, not fully specified, or unknown.)

data

A collection of information gathered for a purpose. Data may be in the form of either words or numbers.

Not counting/measuring

Categorical (qualitative) Data

Numerical (quantitative) Data

Counting (discrete) and measuring (continuous)

Discrete Data

A list-able set of values; usually counts of items

Continuous Data

Data can take on any values in the domain of the variable; usually measurements of something

Numerical and discrete

What type of data is the income of adults in your city?

Categorical

What type of data is the color of M&M's?

Numerical and continuous

What type of data is the birth weights of female babies born in a particular hospital?

Bar graphs

What graphs are appropriate for categorical data?

Bar graph

Bars do not touch; categorical data is typically on the horizontal axis; to describe: comment on which occurred the most often or least often

Measures of Center

Mean, median, mode

Mean

the sum of all the values divided by the number of values

Median

The middle value of the data when the values are listed in order (the average of the two middle values when the number of values is even)

Mode

the most common value

x̄

Sample mean (average)

μ

Population mean (average)

Resistant

Not significantly affected by extreme values

Is either the mean or the median resistant?

The median is resistant; the mean is not resistant
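
To see resistance in numbers, here is a minimal sketch in Python. The data values are made up for illustration; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data set (not taken from the cards above)
values = [10, 12, 13, 15, 20]
# The same data with one extreme value added
with_outlier = values + [200]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this example one extreme value moves the mean from 14 to 45, while the median only moves from 13 to 14.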

Measures of Spread (Variation)

Range, interquartile range (IQR), variance, and standard deviation (σ)

Outliers, gaps, clusters

What do unusual features include?

Mean is bigger than the median

Is the mean or median bigger in right-skewed data?

Median is bigger than the mean

Is the mean or median bigger in left-skewed data?

parameter

Number that describes a population

p

Population proportion

σ^2

Population variance

Statistic

A number that describes a sample

When should you report the mean and standard deviation?

When the graph is symmetrical and there are no outliers in the data

When should you report the median and IQR?

When the graph is skewed right/left OR the data has outliers

Population Variance

σ^2: average of the squared deviations

Population Standard Deviation

The square root of variance

Standard Deviation

Measure of the average distance from the mean
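
The variance and standard deviation cards above can be checked by hand on a small data set. This is a sketch with made-up values, treating the list as a whole population; `pvariance` and `pstdev` are the population versions in Python's standard `statistics` module.

```python
from statistics import pvariance, pstdev

# Hypothetical population of eight values
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population mean
mu = sum(data) / len(data)

# Population variance: the average of the squared deviations from the mean
variance = sum((x - mu) ** 2 for x in data) / len(data)

# Population standard deviation: the square root of the variance
sd = variance ** 0.5

print(mu, variance, sd)
# The library functions should agree with the hand calculation
print(pvariance(data), pstdev(data))
```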

Yes

Is the IQR resistant?

What happens to a data set if we add a number, x, to every value?

The measures of center are increased by x while the measures of spread are not changed
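
A quick check of this card, assuming the same number is added to every value. The data and the name `shift` (standing in for the card's x) are made up for illustration.

```python
from statistics import mean, median, pstdev

# Hypothetical data and the constant added to every value
data = [3, 5, 8, 10]
shift = 7
shifted = [value + shift for value in data]

# Measures of center move up by the constant
print(mean(data), "->", mean(shifted))
print(median(data), "->", median(shifted))

# Measures of spread stay the same
range_before = max(data) - min(data)
range_after = max(shifted) - min(shifted)
print(range_before, "->", range_after)
print(pstdev(data), "->", pstdev(shifted))
```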

Density

Can be created by smoothing histograms; ALWAYS on or above the horizontal axis; has an area of exactly ONE

Z score

Standardized score
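
A z-score is usually computed as the value minus the mean, divided by the standard deviation. A small sketch with made-up numbers; the helper name `z_score` is hypothetical.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10
z = z_score(85, 70, 10)
print(z)  # 1.5
```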

Normal Curve

Bell-shaped, symmetrical curve; as the standard deviation increases, the curve flattens and spreads

Empirical Rule

Approximately 68% of the data are within 1 σ of the mean; approximately 95% of the data are within 2 σ of the mean; approximately 99.7% of the data are within 3 σ of the mean

When can the empirical rule be used?

Only when the graph is a normal curve
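
The empirical rule percentages can be recovered from the standard normal curve. This sketch uses `NormalDist` from Python's standard `statistics` module; the three-standard-deviation figure (about 99.7%) is the usual third part of the rule.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard = NormalDist()

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

within_1 = area_within(1)
within_2 = area_within(2)
within_3 = area_within(3)

# Prints values close to 0.68, 0.95 and 0.997
print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
```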

Center, shape, spread, unusual features, context

What should you use to describe a box plot?

Continuous Random Variables

A variable that may take on any value in an interval, so it has an infinite number of potential outcomes.

Uniform Distribution

f(x) =
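
The formula on this card is cut off in the source. For reference only, the standard density of a continuous uniform distribution on an interval is shown here; the endpoint names a and b are assumed labels, not taken from the card.

```latex
f(x) =
\begin{cases}
\dfrac{1}{b - a}, & a \le x \le b \\
0, & \text{otherwise}
\end{cases}
```

With this density the total area under the curve is (b - a) times 1/(b - a), which is 1.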

Standard Normal Density Curves

Do not show actual values; written in terms of z scores; always has a mean of 0 and a SD of 1

The units found in the problem

What units are always used for SD?

Normal PDF

Graphing only

Normal CDF

Will find the probability (the area under the curve) from the lower bound to the upper bound

InvNormal

Will find the z-score for a given probability (the area to the left of that z-score)

Normal PDF (X)

Standard normal curve
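
The calculator commands above have close counterparts in Python's standard `statistics.NormalDist` class. This is an illustrative mapping, not the calculator's own implementation; the mean of 100 and SD of 15 are made-up values.

```python
from statistics import NormalDist

standard = NormalDist()                 # standard normal: mean 0, SD 1
scores = NormalDist(mu=100, sigma=15)   # hypothetical normal distribution

# Like Normal PDF: the height of the curve at a point (useful for graphing)
height_at_zero = standard.pdf(0)

# Like Normal CDF: the area between a lower bound and an upper bound
area = scores.cdf(115) - scores.cdf(85)

# Like InvNormal: the z-score that has a given area to its left
z_cut = standard.inv_cdf(0.975)

print(round(height_at_zero, 4), round(area, 4), round(z_cut, 2))
```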

A scatterplot

What type of graph should be used for bivariate, numerical data?

Sample Correlation Coefficient (r)

A quantitative assessment (measurement) of the STRENGTH & DIRECTION of the LINEAR relationship between bivariate, quantitative data

Properties of r

Legitimate values include: [-1,1]; 0 implies no linear correlation

What does r tell you?

It is a measure of the extent to which x & y are linearly related
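
A minimal sketch of how r can be computed from its usual formula, with made-up data. The function name `correlation` is hypothetical. The last data set is a perfect curve whose r is 0, which shows that r measures only the linear relationship.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 6, 8, 10])     # perfect positive line
r_down = correlation(xs, [10, 8, 6, 4, 2])   # perfect negative line
r_curve = correlation(xs, [4, 1, 0, 1, 4])   # symmetric U shape, no linear trend

print(r_up, r_down, r_curve)
```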

Least Squares Regression Line (LSRL)

Line of best fit; minimizes the sum of the squares of the deviations from the line

X Variable

Independent or explanatory variable

Y Variable

Dependent or response variable

What is the interpretation for slope?

For each unit increase in x, we predict an approximate mean increase/decrease of b in y

Extrapolation

Predicting y values for x values that are outside the range of the original data set

The LSRL and r are both non-resistant

Are the correlation coefficient (r) and the LSRL resistant or non-resistant?

Residuals

The vertical deviation between the observations and the LSRL

The sum is always zero

What is the sum of the residuals for the LSRL?

Observed - expected (y - ŷ)

How do you find the residual?
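
The LSRL and residual cards can be tied together with a small worked example. This is a sketch using the usual least-squares formulas and made-up data; the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical bivariate data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)

# Residual = observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(a, b)
print(residuals)
print(sum(residuals))  # zero, up to floating-point rounding
```

Plotting the pairs (x, residual) from `residuals` would give the residual plot for this line.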

Residual Plot

A scatterplot of the (x, residual) pairs

Yes

Can residuals be graphed against other statistics besides x?

What is the purpose of a residual plot?

To tell if a linear association exists between the x & y variables

What happens if no pattern exists in the residual plot?

It is called random scatter and the relationship is linear

Negative

Counting from the decimal point to the left makes the exponent ___________________________________

Positive

Counting from the decimal point to the right makes the exponent ___________________________________________

Variation

Difference of values; spread

Variance

A measure of spread; the average of the squared deviations from the mean

Coefficient of Determination (r^2)

Remains the same no matter which variable is labeled x; just because we know r^2 doesn't mean we know the sign of r

Interpretation of r^2

Approximately r^2% of the variation in y can be explained by the linear relationship of x & y
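
The two r^2 cards above can be checked numerically. This sketch uses made-up data and a hypothetical helper `r_squared` built from the usual sums of squares; it shows that swapping x and y, or flipping the direction of the relationship, leaves r^2 unchanged.

```python
def r_squared(xs, ys):
    """Coefficient of determination for the linear relationship of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical bivariate data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

original = r_squared(xs, ys)
swapped = r_squared(ys, xs)                   # x and y labels exchanged
flipped = r_squared(xs, [-y for y in ys])     # negative relationship, same strength

print(original, swapped, flipped)
```

Here r^2 is 0.6 in all three cases, so about 60% of the variation in y is explained by the linear relationship, and the value alone does not reveal whether r is positive or negative.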

outlier

a data value that is either much greater or much less than the median

Influential Point

A point that influences where the LSRL is located; if removed, it will significantly change the slope of the LSRL

No

Is the coefficient of determination resistant?