What Is Regression in Statistics?


Regression is a statistical data analysis technique that is widely used in the financial sector, although its applications extend well beyond it. The purpose of regression is to determine the nature of the relationship between one or more independent variables and a dependent variable, which makes it an important tool for studying how variables move together. Statistical regression is commonly categorised into five major types: linear, logistic, ridge, lasso and polynomial regression.

At its simplest, regression is linear, either simple or multiple linear regression. Non-linear regressions also exist; they are more complex and are used to model complicated relationships between multiple variables. Regression helps in making predictions from economic data and reports such as GDP growth figures, economic surveys, sales data or prevailing conditions in the economy. It is not exclusively a tool for people from a finance background and is used in other professions too. This article gives a basic idea of what regression in statistics is.

Before moving to the major types of regression, it is important to know the basic terminology of the regression methodology. Getting familiar with these terms makes it easier to understand the basic concept of regression and its various types. They are not complicated and can be grasped easily. Key terms used in regression:

  • Multicollinearity – This occurs when independent variables are highly correlated with one another. Highly correlated predictors make the analysis difficult because it becomes hard to rank variables by importance or to isolate the effect of each one. Absence of multicollinearity is usually a precondition for reliable regression estimates.
  • Outliers – An outlier is a data point that differs markedly from the rest of the data; it can be understood as the odd one out in the crowd. Outliers can distort regression estimates.
  • Heteroscedasticity – This refers to a situation in which the variance of the errors (residuals) is not constant across values of the independent variable, violating one of the standard assumptions of linear regression.
  • Overfit and underfit – Overfitting (high variance) occurs when a model captures noise in the data rather than the underlying relationship; the algorithm then performs well on the training set but poorly on the test set. Conversely, when a model is too simple to perform well even on the training set, it is said to be underfitting.
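The contrast between fitting the trend and fitting the noise can be sketched with NumPy. This is an illustrative example with synthetic data (not from the article): a flexible degree-5 polynomial will always achieve a lower training error than a straight line on the same points, which is precisely the symptom of overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a straight line y = 2x + 1 with added noise.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 0.2, size=10)

# A degree-1 fit captures the trend; a degree-5 fit on only 10 points
# has enough flexibility to start memorising the noise as well.
simple = np.polyfit(x_train, y_train, deg=1)
flexible = np.polyfit(x_train, y_train, deg=5)

train_err_simple = np.mean((np.polyval(simple, x_train) - y_train) ** 2)
train_err_flexible = np.mean((np.polyval(flexible, x_train) - y_train) ** 2)

# The overfit model looks "better" on the data it was trained on.
print(train_err_flexible < train_err_simple)  # True
```

The lower training error of the flexible model is exactly why training-set performance alone is misleading; a held-out test set is needed to detect overfitting.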

With the basic terminology covered, the next step is the different types of regression. The first and simplest of all is linear regression.

  1. Linear regression – Linear regression involves two kinds of variables: the dependent variable, whose value is to be determined, and one or more independent variables, whose values are used to determine it.

An example of linear regression.
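As a concrete sketch (the data below are made up for illustration), simple linear regression can be performed with NumPy's least-squares polynomial fit:

```python
import numpy as np

# Noise-free data lying exactly on the line y = 3x + 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

# Fitting a degree-1 polynomial is linear regression:
# it returns the estimated [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 6), round(intercept, 6))  # 3.0 2.0
```

Because the data are noise-free, the fit recovers the true slope and intercept exactly; with real, noisy data the estimates would only approximate them.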

Linear regression is divided into two parts:

  • Simple linear regression – This method uses two quantitative variables for prediction: one independent variable, and one dependent variable whose value is to be determined.
  • Multiple linear regression – In contrast to simple linear regression, multiple linear regression uses several independent variables to determine the value of the target variable.
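Multiple linear regression can be sketched with NumPy's least-squares solver. The coefficients below are invented for the example; in this noise-free setting the solver recovers them exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent variables; true model: y = 1.5*x1 - 2.0*x2 + 0.5.
X = rng.normal(size=(50, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.5

# Append a column of ones so the intercept is estimated too.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(coef, [1.5, -2.0, 0.5]))  # True
```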
  2. Logistic regression – This method of statistical analysis predicts a categorical dependent variable from a predefined set of observations of one or more independent variables. Its main variants are binary, multinomial and ordinal logistic regression.

An example of logistic regression.
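A minimal from-scratch sketch of binary logistic regression, using gradient descent on synthetic one-dimensional data (real projects would typically use a library such as scikit-learn rather than this hand-rolled loop):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two well-separated 1-D clusters: class 0 near -2, class 1 near +2.
x = np.concatenate([rng.normal(-2.0, 0.5, 50), rng.normal(2.0, 0.5, 50)])
y = np.concatenate([np.zeros(50), np.ones(50)])
X = np.column_stack([x, np.ones_like(x)])  # feature + intercept column

# Gradient descent on the logistic log-loss.
w = np.zeros(2)
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)  # gradient step

# Classify with the usual 0.5 probability threshold.
preds = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
accuracy = np.mean(preds == y)
print(accuracy >= 0.95)  # True
```

Because the classes barely overlap, the fitted model separates them almost perfectly; on overlapping real-world classes the accuracy would be lower.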

  3. Ridge regression – Ridge regression is specifically used for analysing multicollinear data, i.e. data sets whose predictors are highly correlated and therefore difficult to handle with the regular methods of regression. Applying conventional methods to multicollinear data produces unstable, unrealistic results; ridge regression stabilises the estimates by penalising large coefficients.

Ridge regression example.
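Ridge regression has a simple closed form: add a penalty term alpha·I to the normal equations before solving. The sketch below builds deliberately multicollinear data (one predictor is almost a copy of the other) and shows that the ridge coefficients are no larger in norm than the ordinary least-squares ones, which is the shrinkage effect described above.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two nearly identical (multicollinear) predictors.
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0.0, 0.01, size=100)  # almost a copy of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(0.0, 0.1, size=100)

# Ordinary least squares: unstable when columns are nearly collinear.
ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: penalise coefficient size via the term alpha * I.
alpha = 1.0
ridge = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)

# Shrinkage: the penalised solution never has a larger norm than OLS.
print(np.linalg.norm(ridge) <= np.linalg.norm(ols))  # True
```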

  4. Lasso regression – Lasso is a regularisation method that produces more accurate predictions by using shrinkage: the regression coefficients are shrunk towards zero, and some are set exactly to zero. This simplifies the model, since it effectively selects only the most relevant variables.

An example of lasso regression.
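One way to see lasso's shrinkage is the special case of an orthonormal design, where the lasso solution is simply the ordinary least-squares estimate passed through a soft-thresholding function. This is only a sketch of that special case (the coefficient values are hypothetical); general lasso problems are solved with iterative methods such as coordinate descent.

```python
import numpy as np

def soft_threshold(b, lam):
    """Shrink towards zero by lam; values inside [-lam, lam] become exactly 0."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

# Hypothetical OLS coefficients: two strong effects, two near-zero ones.
ols_coef = np.array([3.2, -2.5, 0.15, -0.08])

# With an orthonormal design, lasso reduces to soft-thresholding.
lasso_coef = soft_threshold(ols_coef, lam=0.5)
print(lasso_coef)
```

The two small coefficients are driven exactly to zero while the strong ones are merely shrunk, which is how lasso performs variable selection.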

  5. Polynomial regression – This can be considered a special case of multiple linear regression. Instead of a straight line, the relationship between the independent and dependent variable is modelled as a higher-order polynomial, which allows the fitted curve to follow the data more closely.

An example of polynomial regression.
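A short sketch of polynomial regression with NumPy, on made-up data lying exactly on a parabola: fitting a degree-2 polynomial is just linear regression on the derived features x², x and 1, so the true coefficients are recovered exactly here.

```python
import numpy as np

# Noise-free data on the parabola y = x**2 - 2x + 1.
x = np.linspace(-3.0, 3.0, 20)
y = x**2 - 2.0 * x + 1.0

# A degree-2 polynomial fit, i.e. linear regression on [x**2, x, 1].
coef = np.polyfit(x, y, deg=2)
print(np.allclose(coef, [1.0, -2.0, 1.0]))  # True
```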


Regression is a very useful method in data analysis. It can be used to determine how relevant various factors are to a dependent variable. Based on these observations it becomes easier to prioritise factors according to their respective impact, and irrelevant factors can be excluded on the basis of the regression results. This article should have given you a general idea of what regression in statistics is.

