Multiple Linear Regression
Contact: Haralambos Sarimveis
|Multiple Linear Regression|
|Input:||Instances, feature vectors, real-valued target values|
|Input format:||Dependent on implementation, e.g., Weka's ARFF format|
|Output format:||Dependent on implementation, e.g., Weka: plain text; binary models|
|Reporting information:||Besides the model coefficients, MLR reports several statistics computed on the training data: the coefficient of determination (R²), the adjusted coefficient of determination (adjusted R²), the F-statistic, a t-statistic for each independent variable, confidence intervals, residuals and variance inflation factors (VIF).|
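Several of the reported statistics follow directly from the residuals. Below is a minimal Java sketch of R², adjusted R² and the overall F-statistic. It assumes a model with an intercept, n training instances and p independent variables, and for the F-statistic it assumes the fitted values come from an ordinary least-squares fit. The class and method names are hypothetical and do not belong to Weka or any other library.

```java
/** Goodness-of-fit statistics for an MLR model with an intercept (illustrative sketch). */
public final class FitStatistics {

    /** Coefficient of determination: R^2 = 1 - SS_res / SS_tot. */
    static double rSquared(double[] y, double[] yHat) {
        double mean = 0.0;
        for (double v : y) mean += v;
        mean /= y.length;
        double ssRes = 0.0, ssTot = 0.0;
        for (int i = 0; i < y.length; i++) {
            ssRes += (y[i] - yHat[i]) * (y[i] - yHat[i]); // residual sum of squares
            ssTot += (y[i] - mean) * (y[i] - mean);       // total sum of squares
        }
        return 1.0 - ssRes / ssTot;
    }

    /** Adjusted R^2 for n instances and p independent variables (intercept not counted in p). */
    static double adjustedRSquared(double r2, int n, int p) {
        return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1);
    }

    /** Overall F-statistic with (p, n - p - 1) degrees of freedom; assumes OLS fitted values. */
    static double fStatistic(double r2, int n, int p) {
        return (r2 / p) / ((1.0 - r2) / (n - p - 1));
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0, 4.0};
        double[] yHat = {1.1, 1.9, 3.2, 3.8}; // illustrative fitted values
        double r2 = rSquared(y, yHat);
        System.out.println("R^2          = " + r2);                         // approximately 0.98
        System.out.println("adjusted R^2 = " + adjustedRSquared(r2, 4, 1)); // approximately 0.97
        System.out.println("F            = " + fStatistic(r2, 4, 1));       // approximately 98
    }
}
```

Adjusted R² penalizes additional predictors through the (n - 1) / (n - p - 1) factor. For that reason it is the more useful of the two when comparing models with different numbers of descriptors.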
MLR (Multiple Linear Regression) is a simple and popular statistical technique that uses several explanatory
(independent) variables to predict the outcome of a response (dependent) variable. The model fits a linear
relationship (a hyperplane in the space of the explanatory variables) that best approximates all the individual data points in the least-squares sense.
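To illustrate how the best-fitting linear relationship can be found, here is a minimal Java sketch. It estimates the coefficients of y = b0 + b1*x1 + ... + bp*xp by ordinary least squares, solving the normal equations (X^T X) b = X^T y with Gaussian elimination and partial pivoting. The names `OlsSketch`, `fit` and `predict`, as well as the 1e-12 singularity threshold, are hypothetical choices for this illustration and are not the API of Weka or any other package.

```java
import java.util.Arrays;

/** Minimal ordinary-least-squares fit for MLR via the normal equations (illustrative sketch). */
public final class OlsSketch {

    /**
     * Fits y = b0 + b1*x1 + ... + bp*xp by least squares.
     * @param x n-by-p matrix of independent variables (no intercept column)
     * @param y length-n response vector
     * @return coefficients {b0, b1, ..., bp}
     */
    static double[] fit(double[][] x, double[] y) {
        int n = x.length, p = x[0].length, k = p + 1;
        // Accumulate the augmented matrix [X^T X | X^T y]; each design row is [1, x_i1, ..., x_ip].
        double[][] a = new double[k][k + 1];
        for (int i = 0; i < n; i++) {
            double[] row = new double[k];
            row[0] = 1.0; // intercept term
            System.arraycopy(x[i], 0, row, 1, p);
            for (int r = 0; r < k; r++) {
                for (int c = 0; c < k; c++) a[r][c] += row[r] * row[c];
                a[r][k] += row[r] * y[i];
            }
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            if (Math.abs(a[pivot][col]) < 1e-12) // hypothetical tolerance
                throw new ArithmeticException("X^T X is singular: predictors are linearly dependent");
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= k; c++) a[r][c] -= f * a[col][c];
            }
        }
        // Back substitution.
        double[] b = new double[k];
        for (int r = k - 1; r >= 0; r--) {
            double s = a[r][k];
            for (int c = r + 1; c < k; c++) s -= a[r][c] * b[c];
            b[r] = s / a[r][r];
        }
        return b;
    }

    /** Prediction for one instance: b0 + sum_j b_j * x_j. */
    static double predict(double[] b, double[] xi) {
        double yHat = b[0];
        for (int j = 0; j < xi.length; j++) yHat += b[j + 1] * xi[j];
        return yHat;
    }

    public static void main(String[] args) {
        // Noise-free data generated from y = 1 + 2*x1 - 3*x2.
        double[][] x = {{0, 0}, {1, 0}, {0, 1}, {1, 1}, {2, 1}};
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) y[i] = 1 + 2 * x[i][0] - 3 * x[i][1];
        double[] b = fit(x, y);
        System.out.println(Arrays.toString(b)); // approximately [1.0, 2.0, -3.0]
    }
}
```

Forming X^T X squares the condition number of the problem. Production implementations therefore often use a QR or singular value decomposition instead. The normal equations appear here only because they keep the sketch short.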
Background (publication date, popularity/level of familiarity, rationale of approach, further comments)
Multiple linear regression (MLR) is one of the most widely used mathematical techniques in quantitative structure-activity relationship (QSAR) analysis.
Bias (instance-selection bias, feature-selection bias, combined instance-selection/feature-selection bias, independence assumptions?, ...)
The error is assumed to be a random variable with a mean of zero conditional on the explanatory variables.
The independent variables are assumed to be measured without error.
The predictors must be linearly independent, i.e. it must not be possible to express any predictor as a linear combination of the others.
The errors are uncorrelated and homoscedastic, that is, the variance-covariance matrix of the errors is diagonal and every diagonal element equals the same error variance.
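The linear-independence assumption is usually checked with the variance inflation factors listed among the reported statistics. VIF_j = 1 / (1 - R_j²), where R_j² is the coefficient of determination obtained by regressing predictor j on all the other predictors. Equivalently, VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix. The minimal Java sketch below uses that second form, with a Gauss-Jordan inverse. The class name, method names and the 1e-12 tolerance are hypothetical.

```java
import java.util.Arrays;

/** Variance inflation factors via the inverse predictor correlation matrix (illustrative sketch). */
public final class VifSketch {

    /** Pearson correlation matrix of the columns of x (n-by-p). */
    static double[][] correlation(double[][] x) {
        int n = x.length, p = x[0].length;
        double[] mean = new double[p], norm = new double[p];
        for (double[] row : x)
            for (int j = 0; j < p; j++) mean[j] += row[j] / n;
        // Root sums of squared deviations; the 1/(n-1) factors cancel in the correlation.
        for (double[] row : x)
            for (int j = 0; j < p; j++) norm[j] += (row[j] - mean[j]) * (row[j] - mean[j]);
        for (int j = 0; j < p; j++) norm[j] = Math.sqrt(norm[j]);
        double[][] r = new double[p][p];
        for (int a = 0; a < p; a++)
            for (int b = 0; b < p; b++) {
                double s = 0.0;
                for (double[] row : x) s += (row[a] - mean[a]) * (row[b] - mean[b]);
                r[a][b] = s / (norm[a] * norm[b]);
            }
        return r;
    }

    /** VIF_j = [R^-1]_jj; a singular R signals exact linear dependence among predictors. */
    static double[] vif(double[][] x) {
        double[][] r = correlation(x);
        int p = r.length;
        double[][] a = new double[p][2 * p]; // augmented [R | I]
        for (int i = 0; i < p; i++) {
            System.arraycopy(r[i], 0, a[i], 0, p);
            a[i][p + i] = 1.0;
        }
        // Gauss-Jordan elimination turns [R | I] into [I | R^-1].
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int i = col + 1; i < p; i++)
                if (Math.abs(a[i][col]) > Math.abs(a[pivot][col])) pivot = i;
            if (Math.abs(a[pivot][col]) < 1e-12) // hypothetical tolerance
                throw new ArithmeticException("predictors are (numerically) linearly dependent");
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            double d = a[col][col];
            for (int c = 0; c < 2 * p; c++) a[col][c] /= d;
            for (int i = 0; i < p; i++) {
                if (i == col) continue;
                double f = a[i][col];
                for (int c = 0; c < 2 * p; c++) a[i][c] -= f * a[col][c];
            }
        }
        double[] out = new double[p];
        for (int j = 0; j < p; j++) out[j] = a[j][p + j];
        return out;
    }

    public static void main(String[] args) {
        // Two predictors with correlation 0.8, so each VIF is 1 / (1 - 0.64).
        double[][] x = {{1, 1}, {2, 3}, {3, 2}, {4, 4}};
        System.out.println(Arrays.toString(vif(x))); // approximately [2.78, 2.78]
    }
}
```

A VIF of 1 means a predictor is uncorrelated with the others. Large values indicate near-collinearity that inflates the variance of the coefficient estimates. Exactly collinear predictors make the correlation matrix singular, and the sketch then throws an exception.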
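Lazy learning/eager learning
Eager learning: the coefficients are estimated once from the training data, and predictions then require only the fitted coefficients.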
Interpretability of models (black box model?, ...)
Good (linear model, i.e., produces a simple linear weighting of the given features). If the variables are standardized to have a mean of zero and a standard deviation of one, the resulting regression coefficients (beta coefficients) allow a comparison of the relative contribution of each independent variable to the prediction of the dependent variable.
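Refitting is not strictly necessary to obtain the beta coefficients. For an ordinary least-squares model with an intercept, the coefficient of predictor j after standardization equals b_j * sd(x_j) / sd(y), where b_j is the unstandardized coefficient. The minimal Java sketch below applies that identity. The class and method names are hypothetical, and the example coefficients are those of the noise-free data y = 1 + 2*x1 - 3*x2.

```java
import java.util.Arrays;

/** Standardized (beta) coefficients from unstandardized MLR coefficients (illustrative sketch). */
public final class BetaCoefficients {

    /** Sample standard deviation (n - 1 denominator). */
    static double sampleSd(double[] v) {
        double mean = 0.0;
        for (double d : v) mean += d;
        mean /= v.length;
        double ss = 0.0;
        for (double d : v) ss += (d - mean) * (d - mean);
        return Math.sqrt(ss / (v.length - 1));
    }

    /**
     * beta_j = b_j * sd(x_j) / sd(y), i.e. the slope obtained by refitting on variables
     * standardized to mean 0 and standard deviation 1.
     * @param b coefficients {b0, b1, ..., bp}; the intercept b0 is not used
     * @param x n-by-p matrix of independent variables
     * @param y response vector
     */
    static double[] beta(double[] b, double[][] x, double[] y) {
        int p = b.length - 1;
        double sdY = sampleSd(y);
        double[] out = new double[p];
        for (int j = 0; j < p; j++) {
            double[] col = new double[x.length];
            for (int i = 0; i < x.length; i++) col[i] = x[i][j];
            out[j] = b[j + 1] * sampleSd(col) / sdY;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {1, 0}, {0, 1}, {1, 1}, {2, 1}};
        double[] y = {1, 3, -2, 0, 2}; // y = 1 + 2*x1 - 3*x2
        double[] b = {1, 2, -3};
        System.out.println(Arrays.toString(beta(b, x, y))); // approximately [0.87, -0.85]
    }
}
```

The raw coefficients (2 and -3) suggest that x2 dominates. The beta coefficients show that the two predictors contribute with similar magnitude once their different spreads are taken into account.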
Type of Descriptor:
Programming language(s): Java
Operating system(s): Linux, Windows, Mac OS