77 words to learn

Population
the collection of all possible outcomes of interest.
Population Parameter
an unknown fixed value that describes the population.
Experiment
any procedure that can (at least in theory) be infinitely repeated and has a well-defined set of outcomes or values.
Random variable
any variable whose value is unknown until it is observed. It has a well-defined set of possible values (or outcomes) it can take. Can only think of it in terms of probabilities.
Discrete RV
Takes on only a finite (or countably infinite) number of values.
Continuous RV
can take any one of a continuous range of values.
Probability Density function
The PDF for a discrete RV X indicates the probability of each possible value occurring, that is, the probability that the RV X takes on the value x: f(x)=P(X=x) for all values of x. For a continuous RV, we use the PDF only to compute the probability of the RV lying within a given range of values: areas under the PDF represent probabilities that X falls in an interval.
Cumulative Distribution Function
The CDF of a RV X, denoted F(x), gives the probability that X is less than or equal to a specific value x: F(x)=P(X≤x).
Joint PDF
denoted by f(x,y), shows the probability that X takes on the value x and Y takes on the value y: f(x,y)=P(X=x and Y=y ) for all values of x and y.
Marginal PDF of X
The probability of X without reference to the values of Y.
Conditional PDF
denoted by f(y|x), is the probability that Y takes on the value y when the value of X is fixed at x: f(y|x)=P(Y=y|X=x).
Expected value
a measure of the average value or location of the PDF of X. It is denoted by µX or simply µ when the RV is understood. It is defined as the weighted average value of X in an infinite number of repetitions of the underlying experiment.
Variance
a measure of dispersion around the population mean. It is denoted by σ²X or simply σ² when the RV is understood. It is defined as the weighted average of the squares of the possible deviations of X from its population mean. It is a fixed value that is usually unknown.
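As a rough illustration of the expected value and variance definitions, the sketch below computes both as probability-weighted averages for a discrete RV. The fair six-sided die and the names `pmf`, `expected_value` and `variance` are assumptions made for this example, not part of the original notes.

```python
from fractions import Fraction

# Hypothetical example: PDF of a fair six-sided die, f(x) = P(X = x).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(pmf):
    """Weighted average of the possible values, with probabilities as weights."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Weighted average of the squared deviations from the population mean."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = expected_value(pmf)
sigma2 = variance(pmf)
print(mu, sigma2)
```

Exact fractions are used so that the weights sum to exactly 1 and no rounding error enters the result.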
Covariance
a measure of association between two RVs X and Y. Denoted σXY, it measures the extent to which they move together. It indicates the direction of the linear relationship between the two RVs, but not its strength. It is a fixed and usually unknown value.
Correlation Coefficient
denoted by ρXY or simply ρ, measures the strength of the linear relationship between X and Y, with the measure being limited to the range [-1,1]. This is also a fixed and unknown parameter.
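A minimal sketch of how covariance and the correlation coefficient could be computed from a joint PDF of two discrete RVs. The probabilities in `joint` are made up for illustration, and the helper name `mean_of` is hypothetical.

```python
import math

# Hypothetical joint PDF f(x, y) = P(X = x and Y = y); values are illustrative.
joint = {
    (0, 0): 0.4,
    (0, 1): 0.1,
    (1, 0): 0.1,
    (1, 1): 0.4,
}

def mean_of(index):
    """Expected value of X (index 0) or Y (index 1) under the joint PDF."""
    return sum(xy[index] * p for xy, p in joint.items())

mu_x, mu_y = mean_of(0), mean_of(1)

# Covariance: weighted average of the products of deviations from the means.
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())

# Correlation: covariance rescaled by the standard deviations, so it lies in [-1, 1].
rho_xy = cov_xy / math.sqrt(var_x * var_y)
print(cov_xy, rho_xy)
```

The sign of `cov_xy` gives the direction of the linear relationship, while `rho_xy` is unit-free and so can be read as its strength.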
Conditional Expectation
E(Y|x) is the expected or weighted average value of Y given that X takes on a specific value x. It can be a linear or non-linear function of x.
Conditional variance
the variance associated with the conditional PDF of Y given X.
Statistically independent
X and Y are statistically independent if knowing the outcome of X does not change the probabilities of the possible outcomes of Y and vice versa. This means that the conditional PDF of Y given X is the same as the unconditional (or marginal) PDF of Y alone and vice versa. This implies that the joint PDF of X and Y is equal to the product of their marginal PDFs.
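The product rule for independence can be checked directly on a small joint PDF. This is a sketch under assumed data: the two coin-flip tables and the function names `marginal` and `is_independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example 1: two separate fair coin flips, each coded 0 or 1.
two_flips = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

# Hypothetical example 2: Y is a copy of X, so the two are clearly dependent.
copied_flip = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

def marginal(joint, index):
    """Marginal PDF: sum the joint PDF over the other variable."""
    out = {}
    for xy, p in joint.items():
        out[xy[index]] = out.get(xy[index], 0) + p
    return out

def is_independent(joint):
    """True if f(x, y) = f(x) * f(y) for every pair of possible values."""
    fx, fy = marginal(joint, 0), marginal(joint, 1)
    return all(joint.get((x, y), 0) == fx[x] * fy[y] for x in fx for y in fy)

print(is_independent(two_flips), is_independent(copied_flip))
```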
Normal Distribution
This PDF has the following characteristics: bell-shaped; symmetrical; asymptotic (the tails get closer to the horizontal axis but never touch it); areas below the curve define probabilities; the total area under the curve is 1.
Standard normal distribution
a normal distribution with mean equal to zero and variance equal to 1. This means that all normal probabilities can be obtained from the single table of probabilities of the standard normal distribution.
Standardised random variable
a RV transformed by subtracting off its expected value and dividing the result by its standard deviation. That is, by applying the Z score formula: Z = (X − µ)/σ.
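A short sketch of standardisation using Python's standard library. The mean of 100, standard deviation of 15 and the value 130 are assumptions chosen for the example; `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Subtract the expected value, then divide by the standard deviation."""
    return (x - mu) / sigma

# Hypothetical normal RV with mean 100 and standard deviation 15.
mu, sigma = 100.0, 15.0
z = z_score(130.0, mu, sigma)

# P(X <= 130) computed directly, and again through the standard normal.
p_direct = NormalDist(mu, sigma).cdf(130.0)
p_standard = NormalDist().cdf(z)
print(z, p_direct, p_standard)
```

The two probabilities agree, which is why a single standard normal table is enough for any normal distribution.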
Random sample
a sample that is selected by sampling randomly from the population. A random sampling technique implies that: i) for each element in the sample, each member of the population is equally likely to be selected and ii) the elements are selected independently.
Independent and identically distributed RVs
if a sample of n observations (Y1, Y2, …, Yn) is randomly selected from a population Y with PDF f(y,θ), then the sample elements (Y1, Y2, …, Yn) are independent of one another and all have the same PDF as that of Y: f(y,θ).
Sample statistic
Any statistic that we calculate from a sample
Sampling distribution
the PDF of a sample statistic
sample mean
Average value in a sample: an unbiased and consistent estimator of the population mean.
Sample variance
an unbiased and consistent estimator of the population variance
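For illustration, the sample mean and sample variance can be computed as below. The data in `sample` are made up. The divisor n − 1 (rather than n) is what makes the sample variance unbiased, and it is also what `statistics.variance` uses.

```python
from statistics import mean, variance

# Hypothetical sample of n = 6 observations.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(sample)

# Sample mean: the average value in the sample.
y_bar = mean(sample)

# Sample variance by hand: squared deviations from the sample mean over n - 1.
s2_manual = sum((y - y_bar) ** 2 for y in sample) / (n - 1)

# The same quantity from the standard library.
s2 = variance(sample)
print(y_bar, s2_manual, s2)
```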
estimator
A sample statistic which is constructed to provide information about an unknown population parameter. It is a rule or procedure that we use to calculate an estimate of the unknown population parameter.
Estimate
the numerical value taken on by an estimator for a particular sample of data.
Unbiased estimator
an estimator whose expected value is equal to the population parameter
A minimum variance unbiased estimator
an unbiased estimator with the smallest variance amongst all unbiased estimators.
Asymptotically unbiased
as the sample size (n) increases without bound, the sequence of the estimator’s mean converges to the population parameter.
Consistent estimator
an estimator that converges in probability to the population parameter as the sample size increases without bound.
Point Estimate
a single value is used to estimate the unknown population parameter
Confidence interval
a range of values is used to estimate an unknown population parameter.
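A minimal sketch of a confidence interval for a population mean, assuming the sample is large enough for the normal approximation to apply (with a small sample, a critical value from the t distribution would normally be used instead). The data and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Point estimate plus/minus a critical value times the standard error."""
    n = len(sample)
    y_bar = mean(sample)                  # point estimate of the population mean
    se = stdev(sample) / sqrt(n)          # estimated standard error of the mean
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return y_bar - z_crit * se, y_bar + z_crit * se

# Hypothetical data, kept short only so the arithmetic is easy to follow.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
low, high = mean_confidence_interval(sample)
print(low, high)
```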
Hypothesis testing
the procedure of testing whether a particular hypothetical value of the population parameter is supported by the observed data
Simple linear regression model
Examines the relationship between two variables x and y
Population regression function
The relationship between Y and X that holds on average across different households in the population
error term (e)
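The difference between the actual value of Y for a hypothetical unit i from the population and the value given by the PRF.
Systematic component
B1+B2xi: the part of yi given by the PRF (it takes you up to the PRF).
Random component
ei: the part of yi that takes you from the PRF to the actual value.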
Homoskedasticity
The conditional PDFs of y for each value of x all have the same variance.
Ordinary least squares estimator
Creates the line of best fit from sample data. It minimises the sum (equivalently, the average) of the squared differences between the actual Y values and the predictions from the estimated line.
sample regression function
estimate of PRF depending on particular sample of data
Residual
e hat: the difference between the actual y and the fitted value y hat.
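The OLS estimator, sample regression function and residuals can be sketched for the simple (one X) case using the standard closed-form formulas. The data are invented for the example and `ols_simple` is a hypothetical function name; b1 and b2 denote the estimated intercept and slope.

```python
def ols_simple(x, y):
    """Closed-form OLS for y = b1 + b2*x: minimises the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx           # estimated slope
    b1 = y_bar - b2 * x_bar    # estimated intercept
    return b1, b2

# Hypothetical sample data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b1, b2 = ols_simple(x, y)
fitted = [b1 + b2 * xi for xi in x]                 # y hat
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # e hat = y - y hat
print(b1, b2, residuals)
```

With an intercept in the model, the OLS residuals sum to zero, which is a quick check on the arithmetic.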
Y level X level
A one “unit” increase in X will increase Y by b2 “units”.
Y log X log
A 1% increase in X will increase Y by b2%.
Y log X level
A one “unit” increase in X will increase Y by 100*b2%.
Y level X log
A 1% increase in X will increase Y by b2/100 “units”.
Gauss-Markov Theorem
under certain assumptions of the linear regression model, the OLS estimators are the Best Linear Unbiased Estimators (BLUE): they have the smallest variance among all linear unbiased estimators.
Central Limit Theorem
as the sample size gets large, the sampling distribution of the estimators can be approximated by the normal distribution even if the population itself is not normal.
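A small simulation can make the Central Limit Theorem concrete. This is only a sketch: the exponential population (mean 1, strongly skewed), the sample size of 50, the 2000 replications and the name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(n, replications, seed=0):
    """Draw repeated samples of size n from a skewed (exponential, mean 1)
    population and return the sample mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replications)
    ]

means = simulate_sample_means(n=50, replications=2000)
centre = mean(means)   # should be near the population mean, 1
spread = stdev(means)  # should be near 1 / sqrt(50), about 0.141

# Under a normal approximation, about 95% of the sample means should fall
# within 1.96 standard deviations of the centre.
share_within = sum(abs(m - centre) <= 1.96 * spread for m in means) / len(means)
print(centre, spread, share_within)
```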
Point Estimate
a single value is used to estimate an unknown population parameter
Interval Estimate
a range of values is used to estimate an unknown population parameter. This range of values is probably where the population parameter lies.
single slope coefficient
measures the marginal effect on Y of a unit change in one variable (X) holding the value of the others constant.
perfect collinearity
An exact linear relationship between Xs
multicollinearity
A high degree of linear relationship between the Xs (for example, between X1 and X2), which can lead to large variances for the OLS slope estimators.
VIF
Variance inflation factor: the factor by which the variance of the estimator is higher because its X is correlated with other explanatory variables.
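The usual formula for the VIF of explanatory variable j is 1/(1 − R²j), where R²j is the R2 from regressing Xj on the other explanatory variables. The sketch below assumes that R²j has already been obtained; the function name `vif` is hypothetical.

```python
def vif(r_squared_j):
    """Variance inflation factor for X_j, given the R2 from regressing
    X_j on the other explanatory variables."""
    if not 0.0 <= r_squared_j < 1.0:
        # R2 = 1 would mean perfect collinearity, where the VIF is undefined.
        raise ValueError("r_squared_j must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)

print(vif(0.0))  # X_j unrelated to the other Xs: no inflation
print(vif(0.9))  # strong linear relationship: variance about ten times larger
```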
Coefficient of determination (R2)
the proportion of variation in the dependent variable explained by all the explanatory variables included in the linear model.
adjusted R2
attempts to compensate for the automatic upward shift in R2 that occurs whenever explanatory variables are added, by imposing a penalty for increasing the number of explanatory variables.
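A sketch of R2 and adjusted R2, assuming a model with an intercept and using the common form adjusted R2 = 1 − (1 − R2)(n − 1)/(n − K), where K counts all estimated coefficients including the intercept. The data, fitted values and function names are hypothetical.

```python
def r_squared(y, fitted):
    """Proportion of the variation in y explained by the model: 1 - RSS/TSS."""
    y_bar = sum(y) / len(y)
    tss = sum((yi - y_bar) ** 2 for yi in y)              # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    return 1 - rss / tss

def adjusted_r_squared(r2, n, k):
    """Penalise R2 for the number of estimated coefficients k (intercept included)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Hypothetical actual values and fitted values from a two-coefficient model.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, fitted)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=2)
print(r2, adj_r2)
```

Adjusted R2 is never larger than R2, and the gap widens as more coefficients are estimated from the same number of observations.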
Qualitative factors
factors not measurable in numerical terms
Dummy variables
also called indicator, binary or dichotomous variables, because they take just two values, usually one or zero.
Intercept dummy variable
D is introduced as an explanatory variable on its own.
Slope dummy variable
D is introduced as an explanatory variable interacted with the relevant X variable.
F-test
assesses whether the reduction in RSS from including extra variables is sufficiently large to be significant
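The F statistic for comparing a restricted model with an unrestricted one can be sketched as follows. The argument names and the RSS values are assumptions for illustration: `j` is the number of restrictions (extra variables), `n` the sample size and `k` the number of coefficients in the unrestricted model.

```python
def f_statistic(rss_restricted, rss_unrestricted, j, n, k):
    """F = [(RSS_R - RSS_U) / J] / [RSS_U / (n - K)]."""
    numerator = (rss_restricted - rss_unrestricted) / j
    denominator = rss_unrestricted / (n - k)
    return numerator / denominator

# Hypothetical values: adding 2 variables lowers the RSS from 120 to 100.
f_stat = f_statistic(rss_restricted=120.0, rss_unrestricted=100.0, j=2, n=53, k=3)
print(f_stat)
```

The statistic would then be compared with a critical value from the F distribution with (J, n − K) degrees of freedom, a step not shown here.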
Chow test
an F-test for the equivalence of two regressions.
Marginal effect
Differentiating the equation with respect to X, one obtains the change in Y per unit change in X
Interaction variable
The product of two variables
Ramsey's RESET test
provides a simple indicator of evidence of nonlinearities
Internally valid
if the inferences about the causal effects are valid for the population being studied.
Externally valid
if the inferences about the causal effects can be generalized from the population and the setting studied to other populations and settings
Heteroscedasticity
the dispersion of the values of Y about their mean changes with the level of X (for example, income).
Robust standard errors
help us to avoid computing incorrect confidence intervals or incorrect test statistics for hypothesis testing in the presence of heteroscedasticity
Endogeneity
When the Xs are correlated with e we say that the Xs are endogenous
Omitted variable bias
arises when a variable that affects Y and is correlated with one or more explanatory variables is omitted.
Simultaneous causality bias
arises when, in addition to the causal link of interest from X to Y, there is a causal link from Y to X.
Misspecification of the functional form
arises when we fail to include relevant non-linear terms in our estimated equation.
Bias due to measurement errors
when an independent variable is measured imprecisely.