What are the advantages of the Bayesian approach?

Kim Larson at Stitch Fix Technology has written the following regarding the Bayesian approach:

The frequentist paradigm enjoys the most widespread acceptance for statistical analysis. Frequentist concepts such as confidence intervals and p-values dominate introductory statistics curriculums from science departments to business schools, and frequentist methods are still the go-to tools for most practitioners. Most of them, however, would benefit from integrating Bayesian methods into their methodology.

Two Commonly Encountered Problems in Standard Regression Analysis

Most of us have heard of multicollinearity (correlated regressors/drivers) and overfitting (e.g., having too many parameters compared to observations). Multicollinearity can cause inflated coefficient standard errors, nonsensical coefficients, and a singular sample covariance matrix. Overfitting leads to models that describe random noise instead of real relationships. Simply put, both issues can lead to models that do not make sense.

So how do we deal with this in a classical (non-regularized) frequentist model? We can try techniques such as principal component analysis, changing time lags (for time series), or even removing business drivers that have non-intuitive coefficients. But such strategies offer little chance of getting an interpretable and robust model if the input data is thin and/or plagued by collinearity. We need to use a more forceful tool: we need to take control of the assumptions of our model.

Bayesian Regression and Regularization: Closely Related Cousins

Like regularization techniques, Bayesian regression offers a potent (but not surefire) weapon against overfitting and multicollinearity. By influencing our model with outside information, we can ensure that the model stays within reason.

For example, let’s say that I am building a regression model to analyze the impact of pricing changes across time. But history is limited, price is correlated with other drivers, and the resulting model is counterintuitive. We don’t have another year to wait for the data to mature, but we do have an outside study which suggests that the average industry price elasticity is −0.8. In a classical frequentist model, we would have no way of combining these two pieces of information to ensure a useful model, but in a Bayesian model we do.
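
To make the pricing example concrete, here is a minimal sketch of how an outside estimate can be combined with thin data. It assumes a conjugate normal prior on the coefficients with the noise variance treated as known, so the posterior mean has a closed form. The simulated data, the promo driver, the function name posterior_mean and the prior precision values in A are all hypothetical choices for illustration, not taken from the original post.

```python
import numpy as np

def posterior_mean(X, y, prior_mean, A):
    """Posterior mean of beta in y = X @ beta + noise.

    Assumes beta ~ N(prior_mean, sigma^2 * inv(A)), where A is the prior
    precision matrix and sigma^2 is the (known) noise variance.
    """
    return np.linalg.solve(X.T @ X + A, X.T @ y + A @ prior_mean)

# Hypothetical data: short history, price strongly correlated with promo.
rng = np.random.default_rng(0)
n = 12
log_price = rng.normal(size=n)
promo = 0.95 * log_price + 0.05 * rng.normal(size=n)
X = np.column_stack([np.ones(n), log_price, promo])
y = 1.0 - 0.8 * log_price + 0.3 * promo + rng.normal(scale=0.5, size=n)

# Classical least squares uses the data only.
ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Prior: elasticity near -0.8 (the outside study), nearly flat elsewhere.
prior_mean = np.array([0.0, -0.8, 0.0])
A = np.diag([1e-6, 50.0, 1e-6])
bayes = posterior_mean(X, y, prior_mean, A)

print("OLS coefficients:     ", ols)
print("Posterior mean (Bayes):", bayes)
```

The larger the entry of A for the price coefficient, the more the estimate leans on the outside study; as A goes to zero the result falls back to ordinary least squares.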

Now let’s consider the use case where we don’t have a prior, but just want to make sure the model does not go crazy by reducing the prior variance (shrinking). Tightening the variance increases the diagonal of the A matrix, which in turn increases the diagonal of the X′X matrix during estimation (see formula for the posterior above). This makes our model less susceptible to being plagued by variance inflation and singularity of the sample covariance matrix. This is the same idea employed by the regularization techniques described earlier.
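
The shrinkage idea can be sketched numerically. The example below assumes a zero prior mean and a prior precision of the form A = lam * I (lam is a hypothetical name for the tightening knob), in which case the posterior mean coincides with the ridge regression estimate. It prints the condition number of X′X + A for a nearly collinear design, which should fall as the prior is tightened. The data are simulated purely for illustration.

```python
import numpy as np

# Hypothetical design with two almost identical regressors.
rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.5, size=n)
XtX = X.T @ X

def ridge_posterior_mean(X, y, lam):
    """Posterior mean with prior mean 0 and prior precision A = lam * I."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def condition_number(lam):
    """Condition number of X'X + lam * I; large values signal near-singularity."""
    return np.linalg.cond(XtX + lam * np.eye(XtX.shape[0]))

for lam in [0.0, 0.1, 1.0, 10.0]:
    beta = ridge_posterior_mean(X, y, lam)
    print(f"lam={lam:5.1f}  cond={condition_number(lam):14.1f}  beta={beta}")
```

Adding lam to the diagonal raises every eigenvalue of X′X by the same amount, so the smallest eigenvalue moves away from zero and the matrix becomes safer to invert.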

A third use case is the Hierarchical Bayes model where we leverage the hierarchical structure of a dataset (if present) to create a prior structure, and the amount of shrinkage induced is informed by the data. This technique is particularly powerful when estimating effects at a granular level, e.g., at the individual level, because the model allows the slope estimate for an individual to "borrow" information from similar individuals.
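
The "borrowing" effect can be illustrated with a deliberately simplified stand-in for a full hierarchical Bayes model. The sketch below assumes each individual's slope estimate is normally distributed around a true slope, that the true slopes share a normal population distribution, and that the between-individual variance tau2 is known. A real hierarchical model would estimate tau2 from the data, typically by MCMC. The function name partial_pool and all numbers are hypothetical.

```python
import numpy as np

def partial_pool(estimates, sampling_var, tau2):
    """Shrink per-individual estimates toward a shared mean.

    estimates    : per-individual slope estimates
    sampling_var : sampling variance of each estimate
    tau2         : assumed between-individual variance of the true slopes
    Returns (pooled_estimates, population_mean).
    """
    estimates = np.asarray(estimates, dtype=float)
    sampling_var = np.asarray(sampling_var, dtype=float)
    # Precision-weighted estimate of the population mean.
    precision = 1.0 / (sampling_var + tau2)
    mu = np.sum(precision * estimates) / np.sum(precision)
    # Weight on the individual's own estimate; noisy estimates get less.
    weight = tau2 / (tau2 + sampling_var)
    return weight * estimates + (1.0 - weight) * mu, mu

# Hypothetical individuals: the first has very little data (large variance).
raw = np.array([-2.0, -0.5, -0.9])
var = np.array([4.0, 0.1, 0.1])
pooled, mu = partial_pool(raw, var, tau2=0.25)
print("population mean:", mu)
print("raw estimates:  ", raw)
print("pooled estimates:", pooled)
```

The individual with the noisiest estimate is pulled furthest toward the population mean, while well-measured individuals keep estimates close to their own data.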

Last Word

Bayesian statistics offers a framework to handle uncertainty that is based on a more intuitive mental model than the frequentist paradigm. Moreover, Bayesian regression shares many features with its powerful regularization cousins while also providing a principled approach to explicitly express prior beliefs.

One criticism of Bayesian methods is that priors can be subjective assumptions. This is true, but it is hard to avoid assumptions in statistical analysis. For example, frequentist methods are typically based on several assumptions such as normally distributed errors, independent observations, and asymptotic convergence across infinite repetitions. These can be hard to verify, and violations can impact results in a non-trivial manner. The good news is that the Bayesian paradigm is transparent due to its simplicity. We have Bayes' theorem, some priors, and that’s it. You may make the wrong assumption, but it is stated loud and clear and you can work from there as you learn more.

Read the entire post here.