The science of collecting, organizing, analyzing, and interpreting data in order to make decisions
Types of Observational Studies
Simulation, survey, census, experiment, sampling
The methods of organizing and summarizing data
Involves making generalizations from a sample to a population
The entire collection of individuals or objects about which information is desired
A subset of the population, selected for study in some prescribed manner
An alphabetic character representing a number, called the value, which is arbitrary, not fully specified, or unknown. It is usually a letter such as x or y.
A collection of information gathered for a purpose. Data may be in the form of either words or numbers.
Categorical (qualitative) Data
Numerical (quantitative) Data
Counting (discrete) and measuring (continuous)
A list-able set of values; usually counts of items
Data can take on any values in the domain of the variable; usually measurements of something
Numerical and discrete
What type of data is the income of adults in your city?
What type of data is the color of M&M's?
Numerical and continuous
What type of data is the birth weights of female babies born in a particular hospital?
What graphs are appropriate for categorical data?
Bars do not touch; categorical data is typically on the horizontal axis; to describe: comment on which occurred the most often or least often
Measures of Center
Mean, median, mode
The sum of all the values divided by the number of values
The middle value of the ordered data; with an even number of values, the mean of the two middle values
The most common value
Sample mean (average)
Population mean (average)
Not significantly affected by extreme values
Is either the mean or the median resistant?
The median is resistant; the mean is not resistant
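A minimal sketch of resistance, assuming a small made-up data set: adding one extreme value moves the mean a long way but barely moves the median.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]          # made-up values for illustration
with_outlier = data + [100]     # same data plus one extreme value

# The mean is pulled toward the extreme value (4 becomes 20).
mean_before, mean_after = mean(data), mean(with_outlier)

# The median barely moves (4 becomes 4.5), so it is resistant.
median_before, median_after = median(data), median(with_outlier)

print(mean_before, mean_after)
print(median_before, median_after)
```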
Measures of Spread (Variation)
Range, interquartile range (IQR), variance, and standard deviation (σ)
Outliers, gaps, clusters
What do unusual features include?
Mean is bigger than the median
Is the mean or median bigger in right-skewed data?
Median is bigger than the mean
Is the mean or median bigger in left-skewed data?
Number that describes a population
A number that describes a sample
When should you report the mean and standard deviation?
When the graph is symmetrical and there are no outliers in the data
When should you report the median and IQR?
When the graph is skewed right/left OR the data has outliers
σ^2: average of the squared deviations
Population Standard Deviation
The square root of variance
Measure of the average distance from the mean
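As a rough illustration with made-up data, the population variance and standard deviation can be computed directly from their definitions and checked against Python's `statistics` module:

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]                  # made-up population
mu = sum(data) / len(data)                       # population mean
squared_devs = [(v - mu) ** 2 for v in data]     # squared deviations from the mean
variance = sum(squared_devs) / len(data)         # average squared deviation
std_dev = sqrt(variance)                         # square root of the variance

print(variance, std_dev)
print(pvariance(data), pstdev(data))             # library versions for comparison
```

Dividing by the number of values gives the population versions; the sample versions divide by one less than the number of values.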
Is the IQR resistant?
What happens to a data set if we add a number, x?
The measures of center are increased by x while the measures of spread are not changed
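A quick check of this rule, assuming made-up data and an arbitrary shift of 5: the mean and median move by the shift, while the range and standard deviation stay the same.

```python
from math import isclose
from statistics import mean, median, pstdev

data = [1, 2, 3, 4, 10]              # made-up values
x = 5                                # arbitrary constant added to every value
shifted = [v + x for v in data]

# Measures of center move by exactly x.
mean_shift = mean(shifted) - mean(data)
median_shift = median(shifted) - median(data)

# Measures of spread are unchanged.
same_range = (max(shifted) - min(shifted)) == (max(data) - min(data))
same_sd = isclose(pstdev(shifted), pstdev(data))

print(mean_shift, median_shift, same_range, same_sd)
```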
Can be created by smoothing histograms; ALWAYS on or above the horizontal axis; has an area of exactly ONE
Bell-shaped, symmetrical curve; as the standard deviation increases, the curve flattens and spreads
Approximately 68% of the data are within 1 σ of the mean; approximately 95% of the data are within 2 σ of the mean; approximately 99.7% of the data are within 3 σ of the mean
When can the empirical rule be used?
Only when the graph is a normal curve
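The empirical rule percentages come from areas under the normal curve. A small sketch using `statistics.NormalDist` from the Python standard library recovers them; the same calculation for 3 σ gives roughly 99.7%.

```python
from statistics import NormalDist

z = NormalDist()                      # standard normal: mean 0, SD 1

within_1 = z.cdf(1) - z.cdf(-1)       # area within 1 SD of the mean
within_2 = z.cdf(2) - z.cdf(-2)       # area within 2 SDs of the mean
within_3 = z.cdf(3) - z.cdf(-3)       # area within 3 SDs of the mean

print(round(within_1, 4), round(within_2, 4), round(within_3, 4))
```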
Center, shape, spread, unusual features, context
What should you use to describe a box plot?
Continuous Random Variables
A variable that may take on any value in an interval, so it has infinitely many possible outcomes.
Standard Normal Density Curves
Do not show actual values; written in terms of z scores; always has a mean of 0 and a SD of 1
The units found in the problem
What units are always used for SD?
Will find the probability of an area from the lower bound to the upper bound
Will find the z-score for probability
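The two calculator operations described above can be sketched in Python. The helper names `normal_cdf` and `inv_norm` are hypothetical, chosen here for illustration; they wrap `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

def normal_cdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def inv_norm(area, mu=0.0, sigma=1.0):
    """Value with the given area to its left (hypothetical helper)."""
    return NormalDist(mu, sigma).inv_cdf(area)

p = normal_cdf(-1.96, 1.96)   # probability between two z-scores, about 0.95
z = inv_norm(0.975)           # z-score with 97.5% of the area to its left, about 1.96

print(round(p, 4), round(z, 4))
```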
Normal PDF (X)
Standard normal curve
What type of graph should be used for bivariate, numerical data?
Sample Correlation Coefficient (r)
A quantitative assessment (measurement) of the STRENGTH & DIRECTION of the LINEAR relationship between bivariate, quantitative data
Properties of r
Legitimate values lie in [-1, 1]; 0 implies no linear correlation
What does r tell you?
It is a measure of the extent to which x & y are linearly related
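A minimal sketch of computing r from its definition, using made-up data. The second data set follows a perfect curve, yet r is 0, which illustrates that r measures only the linear relationship.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation coefficient r, computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Points that lie exactly on the line y = 2x + 1: perfect positive correlation.
xs = [1, 2, 3, 4, 5]
r_line = correlation(xs, [2 * x + 1 for x in xs])

# Points that lie exactly on the curve y = x^2: strong relationship, but not linear.
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_line, r_curve)
```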
Least Squares Regression Line (LSRL)
Line of best fit; minimizes the sum of the squares of the deviations from the line
Independent or explanatory variable
Dependent or response variable
What is the interpretation for slope?
For each unit increase in x, we predict an approximate mean increase/decrease of b in y
Predicting y values by using x values outside the range of the original data set
The LSRL and r are both non-resistant
Are the correlation coefficient (r) and the LSRL resistant or non-resistant?
The vertical deviation between the observations and the LSRL
The sum is always zero
What is the sum of the residuals for the LSRL?
Observed - predicted (y - ŷ)
How do you find the residual?
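A short sketch, assuming made-up data, that fits a least squares line, computes each residual as observed minus predicted, and confirms that the residuals sum to zero (up to floating-point rounding). The function name `lsrl` is hypothetical.

```python
def lsrl(xs, ys):
    """Return (intercept a, slope b) of the least squares line y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx          # the line always passes through (mean of x, mean of y)
    return a, b

xs = [1, 2, 3, 4, 5]         # made-up explanatory values
ys = [2, 4, 5, 4, 5]         # made-up response values

a, b = lsrl(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]   # observed - predicted

print(a, b)
print(sum(residuals))        # zero, apart from rounding error
```

Plotting the (x, residual) pairs from this output would give the residual plot.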
A scatterplot of the (x, residual) pairs
Can residuals be graphed against other statistics besides x?
What is the purpose of a residual plot?
To tell whether a linear model is appropriate for the relationship between the x & y variables
What happens if no pattern exists in the residual plot?
It is called random scatter and the relationship is linear
Counting from the decimal point to the left makes the exponent ___________________________________
Counting from the decimal point to the right makes the exponent ___________________________________________
Difference of values; spread
Measures of spread
Coefficient of Determination (r^2)
Remains the same no matter which variable is labeled x; just because we know r^2 doesn't mean we know the sign of r
Interpretation of r^2
Approximately r^2% of the variation in y can be explained by the linear relationship of x & y
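One way to see this interpretation, sketched with made-up data: r^2 equals 1 minus the ratio of the variation left over after fitting the line (sum of squared residuals) to the total variation in y. The variable names below are illustrative only.

```python
xs = [1, 2, 3, 4, 5]         # made-up explanatory values
ys = [2, 4, 5, 4, 5]         # made-up response values

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)       # total variation in y

b = sxy / sxx                               # LSRL slope
a = my - b * mx                             # LSRL intercept
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # unexplained variation

r_squared = 1 - sse / syy                   # proportion of variation explained
r_squared_from_r = sxy ** 2 / (sxx * syy)   # square of the correlation coefficient

print(r_squared, r_squared_from_r)
```

For this data set about 60% of the variation in y is explained by the linear relationship, and both calculations agree.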
A data value that is either much greater or much less than the median
A point that influences where the LSRL is located; if removed, it will significantly change the slope of the LSRL
Is the coefficient of determination resistant?