DS by DB


Looking at the world through data

Practical Applications of Statistical Tests

If you want to learn data science, you’re going to have to learn statistics.


Attempts and Techniques to Detect and Eliminate Multicollinearity

When we discuss large-scale datasets, we often talk about how hard it is to decide which columns to omit and which to keep, and about the reasoning behind those decisions. One reason to eliminate a column is that it is merely a placeholder, such as an arbitrary ID number, and will certainly have no effect, linear or otherwise, on our target variable. Sometimes the data in one or more columns is so mangled, with so little hope of restoring it, that it is in everyone’s best interest if we put those columns out of our misery. Another particularly common reason for eliminating columns, especially when employing linear regression, is multicollinearity.
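As a rough first look at what that means in practice, the sketch below flags pairs of columns whose Pearson correlation is close to ±1, which is one common warning sign that predictors carry overlapping information. Everything in it is an assumption made for illustration: the toy columns (`sqft`, `sqft_m2`, `age_years`), the helper names `pearson` and `correlated_pairs`, and the 0.9 cutoff are hypothetical, and pairwise correlation will not catch every form of multicollinearity.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (a constant column has zero
    variance and would cause a division by zero here).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy data: sqft_m2 is just sqft converted to square metres,
# so the two columns describe the same quantity in different units.
columns = {
    "sqft": [850, 1200, 1500, 2100, 2600],
    "sqft_m2": [79.0, 111.5, 139.4, 195.1, 241.5],
    "age_years": [30, 5, 42, 12, 25],
}

flagged = correlated_pairs(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

With this toy data, only the `sqft` and `sqft_m2` pair crosses the cutoff, since one is a unit conversion of the other. On a real dataset held in a pandas DataFrame, `DataFrame.corr()` produces the same pairwise Pearson matrix in a single call.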