Regression
Regression is a statistical method used to model the quantitative relationship between one or more explanatory variables (predictors or regressors) and a target variable (dependent variable or regressand). When there is more than one explanatory variable, the method is referred to as multiple regression.
Linear Regression
Linear regression is the simplest form of regression. It models linear relationships such as: "For every degree Celsius increase in the daily maximum temperature (explanatory variable), the number of ice cream sales per day (target variable) increases by a fixed amount." A linear regression estimates a suitable coefficient (factor) for each explanatory variable so that their combined effect best describes the target variable. Graphically, the mechanism of simple linear regression (i.e., regression with only one explanatory variable) can be illustrated as follows: when the data points are plotted in a coordinate system (x-axis: explanatory variable; y-axis: target variable), the goal is to find the line that best approximates these points, typically by minimizing the sum of squared vertical distances between the points and the line (ordinary least squares).
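This mechanism can be sketched in a few lines of Python. The following is a minimal illustration, assuming scikit-learn and NumPy are available; the temperature and sales data are synthetic, and all names and values are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: daily maximum temperature (°C) as the explanatory
# variable, ice cream sales per day as the target variable.
rng = np.random.default_rng(seed=0)
temperature = rng.uniform(15, 35, size=100).reshape(-1, 1)
sales = 50 + 12 * temperature.ravel() + rng.normal(0, 20, size=100)

# Fit the line that best approximates the data points
# (ordinary least squares).
model = LinearRegression()
model.fit(temperature, sales)

print(f"Estimated coefficient (additional sales per °C): {model.coef_[0]:.2f}")
print(f"Estimated intercept: {model.intercept_:.2f}")
```

The estimated coefficient is the "fixed amount" from the example above: the expected increase in daily sales per additional degree Celsius.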
Collinearity
Collinearity (also called stochastic collinearity) refers to a situation in which one explanatory variable is highly correlated with another explanatory variable. Collinearity is a common issue in regression models. When two variables are strongly correlated, the data alone cannot reveal which of them truly influences the dependent variable: one of them may be redundant, or both may be relevant, in which case an appropriate weighting has to be justified on substantive grounds. Collinearity leads to unstable estimates of the model coefficients and generally complicates the interpretation of the model. If an explanatory variable is correlated not just with one but with several other explanatory variables, this is referred to as multicollinearity.
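The instability can be made visible with synthetic data. This is a minimal sketch, assuming scikit-learn and NumPy are available; all variable names and values are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# x2 is almost a copy of x1, so the two explanatory variables are
# nearly collinear; only x1 truly drives the target variable y.
rng = np.random.default_rng(seed=1)
x1 = rng.normal(0, 1, size=200)
x2 = x1 + rng.normal(0, 0.01, size=200)
y = 3 * x1 + rng.normal(0, 1, size=200)
X = np.column_stack([x1, x2])

# Fit the same model on two random halves of the data: the individual
# coefficients swing wildly between fits, even though their sum stays
# close to the true combined effect of 3.
for trial in range(2):
    idx = rng.choice(len(y), size=100, replace=False)
    model = LinearRegression().fit(X[idx], y[idx])
    print(f"Fit {trial + 1}: coefficients = {model.coef_.round(2)}, "
          f"sum = {model.coef_.sum():.2f}")
```

The model cannot tell the two variables apart, so the data support many different coefficient combinations almost equally well.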
Regularized Regression
Regularized regressions are special forms of regression in which model complexity is penalized, with the aim of producing a model that is as robust and generalizable as possible and of avoiding overfitting.
To account for model complexity when estimating the model, the cost function considers not only the model's deviations from the actual data but also the magnitudes of the model coefficients, which are thereby kept under control. Examples of regularized regressions include Ridge Regression, Lasso Regression, and Elastic Nets; a sketch of all three follows the list below.
- In Ridge Regression, the cost function used to estimate the model includes, in addition to the squared model errors, the sum of the squared coefficients (an L2 penalty).
- In Lasso Regression, the absolute values of the coefficients are penalized instead of their squares (an L1 penalty); this can shrink individual coefficients exactly to zero and thereby remove variables from the model.
- Elastic Net combines both types of regularization; Ridge and Lasso Regression are the special cases in which one of the two penalties receives zero weight.
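The following minimal sketch fits all three regularized models, again assuming scikit-learn and NumPy and reusing the hypothetical nearly collinear data from the collinearity example above:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

# Same hypothetical setup as before: two nearly collinear predictors.
rng = np.random.default_rng(seed=2)
x1 = rng.normal(0, 1, size=200)
x2 = x1 + rng.normal(0, 0.01, size=200)
y = 3 * x1 + rng.normal(0, 1, size=200)
X = np.column_stack([x1, x2])

# alpha controls the overall strength of the penalty; l1_ratio blends
# the L1 (Lasso) and L2 (Ridge) penalties within the Elastic Net.
models = {
    "Ridge (squared coefficients penalized)": Ridge(alpha=1.0),
    "Lasso (absolute coefficients penalized)": Lasso(alpha=0.1),
    "Elastic Net (both penalties combined)": ElasticNet(alpha=0.1, l1_ratio=0.5),
}

for name, model in models.items():
    model.fit(X, y)
    print(f"{name}: coefficients = {model.coef_.round(2)}")
```

Unlike the unregularized fit in the collinearity example, all three models keep the coefficient estimates small and stable, and the Lasso penalty typically sets one of the two redundant coefficients exactly to zero.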