# Econometrics

Econometrics is the application of mathematics and statistical methods to economic data, and is described as the branch of economics that aims to give empirical content to economic relations.[1] More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference."[2] An influential introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships."[3] The first known use of the term "econometrics" (in cognate form) was by Pawel Ciompa in 1910. Ragnar Frisch is credited with coining the term in the sense in which it is used today.[4]

Econometrics is the unification of economics, mathematics, and statistics. This unification produces more than the sum of its parts.[5] Econometrics adds empirical content to economic theory, allowing theories to be tested and used for forecasting and policy evaluation.

## Basic econometric models: linear regression

The basic tool for econometrics is the linear regression model. In modern econometrics other statistical tools are frequently used, but linear regression is still the most common starting point for an analysis.[7] Estimating a linear regression on two variables can be visualized as fitting a line through data points representing paired values of the independent and dependent variables.

Figure: Okun's law, representing the relationship between GDP growth and the unemployment rate. The fitted line is found using regression analysis.

For example, consider Okun's law, which relates GDP growth to the unemployment rate. This relationship is represented in a linear regression where the change in the unemployment rate ($\Delta\text{Unemployment}$) is a function of an intercept ($\beta_0$), a given value of GDP growth multiplied by a slope coefficient $\beta_1$, and an error term $\varepsilon$:

$$\Delta\text{Unemployment} = \beta_0 + \beta_1\,\text{Growth} + \varepsilon$$

The unknown parameters $\beta_0$ and $\beta_1$ can be estimated. Here $\beta_1$ is estimated to be -1.77 and $\beta_0$ is estimated to be 0.83.
This means that if GDP growth increased by one percentage point, the unemployment rate would be predicted to fall by 1.77 points, the size of the slope coefficient $\beta_1$. At a growth rate of exactly 1 percent, the predicted change in the unemployment rate is -0.94 points ($-1.77 \times 1 + 0.83$). The model could then be tested for statistical significance as to whether an increase in growth is associated with a decrease in unemployment, as hypothesized. If the estimate of $\beta_1$ were not significantly different from 0, we would fail to find evidence that changes in the growth rate and the unemployment rate were related.
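
The estimation step in the Okun's law example can be sketched in a few lines of Python. This is a minimal illustration, not the estimation behind the figures quoted above: the data are hypothetical values constructed to scatter around a line with intercept 0.83 and slope -1.77, and the function name `fit_simple_ols` is an assumption introduced here.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = sample covariance of x and y divided by sample variance of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data for illustration only (not real observations).
# The noise is chosen to sum to zero and to be uncorrelated with growth,
# so the fit recovers the coefficients used to build the data.
growth = [0.0, 0.5, 1.0, 1.5, 2.0]
noise = [0.1, -0.1, 0.0, -0.1, 0.1]
d_unemployment = [0.83 - 1.77 * g + e for g, e in zip(growth, noise)]

beta0_hat, beta1_hat = fit_simple_ols(growth, d_unemployment)

# Predicted change in unemployment when growth is exactly 1 percent.
predicted_at_1 = beta0_hat + beta1_hat * 1.0

print(f"intercept: {beta0_hat:.2f}, slope: {beta1_hat:.2f}")
print(f"predicted change at 1% growth: {predicted_at_1:.2f}")
```

The slope is the predicted effect of one additional point of growth, while `predicted_at_1` combines slope and intercept to give the predicted change at a specific growth rate. These are the two quantities distinguished in the paragraph above.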

## Theory

See also: Estimation theory

Econometric theory uses statistical theory to evaluate and develop econometric methods. Econometricians try to find estimators that have desirable statistical properties, including unbiasedness, efficiency, and consistency. An estimator is unbiased if its expected value is the true value of the parameter; it is consistent if it converges to the true value as the sample size gets larger; and it is efficient if it has a lower standard error than other unbiased estimators for a given sample size. Ordinary least squares (OLS) is often used for estimation since it provides the BLUE or "best linear unbiased estimator" (where "best" means the most efficient among linear unbiased estimators) given the Gauss-Markov assumptions. When these assumptions are violated or other statistical properties are desired, other estimation techniques such as maximum likelihood estimation, generalized method of moments, or generalized least squares are used. Estimators that incorporate prior beliefs are advocated by those who favor Bayesian statistics over traditional, classical or "frequentist" approaches.

### Gauss-Markov theorem

The Gauss-Markov theorem shows that the OLS estimator is the best (minimum variance) linear unbiased estimator, assuming the model is linear, the expected value of the error term is zero, errors are homoskedastic and not autocorrelated, and there is no perfect multicollinearity.

#### Linearity

The dependent variable is assumed to be a linear function of the variables specified in the model. The specification must be linear in its parameters. This does not mean that there must be a linear relationship between the independent and dependent variables. The independent variables can take non-linear forms as long as the parameters enter linearly. The equation $y = \alpha + \beta x^2$ qualifies as linear, while $y = \alpha + \beta^2 x$ does not. Data transformations can be used to convert an equation into a linear form.
For example, the Cobb-Douglas equation, often used in economics, is nonlinear:

$$Y = A L^{\alpha} K^{\beta} \varepsilon$$

But it can be expressed in linear form by taking the natural logarithm of both sides:[8]

$$\ln Y = \ln A + \alpha \ln L + \beta \ln K + \ln \varepsilon$$

This assumption also covers specification issues: it assumes that the proper functional form has been selected and that there are no omitted variables.

#### Expected error is zero

The expected value of the error term is assumed to be zero. This assumption can be violated if the measurement error in the dependent variable is consistently positive or negative. The mismeasurement will bias the estimate of the intercept parameter, but the slope parameters will remain unbiased.[9]

The intercept may also be biased if there is a logarithmic transformation, as in the Cobb-Douglas equation above. The transformed error term, $\ln \varepsilon$, will generally not have a mean of 0, so this assumption will be violated.[10]

This assumption can also be violated in limited dependent variable models. In such cases, both the intercept and slope parameters may be biased.[11]

#### Spherical errors

Error terms are assumed to be spherical; otherwise the OLS estimator is inefficient. The OLS estimator remains unbiased, however. Errors are spherical when they have uniform variance (homoskedasticity) and are uncorrelated with each other.[12] The term "spherical errors" describes the multivariate normal distribution: if $\operatorname{Var}[\varepsilon \mid X] = \sigma^2 I$ in the multivariate normal density, then the equation $f(x) = c$ is the formula for a "ball" centered at $\mu$ with radius $\sigma$ in n-dimensional space.[13]

Heteroskedasticity occurs when the variance of the error is correlated with an independent variable. For example, in a regression of food expenditure on income, the size of the error is correlated with income. Low-income people generally spend a similar amount on food, while high-income people may spend a very large amount or as little as low-income people spend. Heteroskedasticity can also be caused by changes in measurement practices.
For example, as statistical offices improve their data, measurement error decreases, so the variance of the error term declines over time.

The spherical-errors assumption is also violated when there is autocorrelation. Autocorrelation can be visualized on a data plot when a given observation is more likely to lie above the fitted regression line if adjacent observations also lie above it. Autocorrelation is common in time series data, where a data series may experience "inertia"[14] if a dependent variable takes a while to fully absorb a shock. Spatial autocorrelation can also occur when geographic areas are likely to have similar errors. Autocorrelation may be the result of misspecification, such as choosing the wrong functional form. In these cases, correcting the specification is the preferred way to deal with autocorrelation.

In the presence of non-spherical errors, the generalized least squares estimator can be shown to be BLUE.[15]

#### Exogeneity of independent variables

The independent variables are assumed to be exogenous. This assumption is violated if the variables are endogenous. Endogeneity can be the result of simultaneity, where causality flows back and forth between the dependent and independent variables. Instrumental variable techniques are commonly used to address this problem.

#### Full rank

The sample data matrix must have full rank or OLS cannot be estimated. There must be at least one observation for every parameter being estimated, and the data cannot have perfect multicollinearity.[16] Perfect multicollinearity will occur in a "dummy variable trap" when a base dummy variable is not omitted, resulting in perfect correlation between the dummy variables and the constant term. Multicollinearity (as long as it is not "perfect") can be present, resulting in a less efficient, but still unbiased, estimate.
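
The dummy variable trap can be checked numerically by comparing the rank of the data matrix with its number of columns. The sketch below is illustrative only: the region labels are hypothetical, and `matrix_rank` is a small helper written for this example (a numerical library would normally be used instead).

```python
def matrix_rank(rows, tol=1e-10):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Partial pivoting: use the largest remaining entry in this column.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no pivot here: column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Hypothetical categorical regressor with three categories.
regions = ["north", "south", "west", "north", "south", "west"]

# Trap: a constant plus a dummy for every category.
# The three dummies sum to the constant column in every row.
trap = [[1, r == "north", r == "south", r == "west"] for r in regions]

# Fix: omit the base category ("north"), keeping the constant.
fixed = [[1, r == "south", r == "west"] for r in regions]

print("trap:  rank", matrix_rank(trap), "of", len(trap[0]), "columns")
print("fixed: rank", matrix_rank(fixed), "of", len(fixed[0]), "columns")
```

When the rank is smaller than the number of columns, the columns are perfectly collinear and the OLS coefficients are not uniquely determined. Dropping the base dummy restores full column rank.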