# Simple Regression in Stata

Jeremy Albright


This tutorial shows how to fit a simple regression model (that is, a linear regression with a single independent variable) in Stata. The details of the underlying calculations can be found in our simple regression tutorial. The data come from the study More Tweets, More Votes: Social Media as a Quantitative Indicator of Political Behavior by DiGrazia J, McKelvey K, Bollen J, Rojas F (2013), which investigated the relationship between social media mentions of candidates in the 2010 and 2012 US House elections and the actual vote results. The replication data in Stata format can be downloaded from our GitHub repo.

In this example, we will assess the relationship between the percentage of social media posts that mention a Congressional candidate and how well that candidate did in the next election. The variables of interest are:

• vote_share (dependent variable): The percent of votes for a Republican candidate
• mshare (independent variable): The percent of social media posts for a Republican candidate

Both variables are measured as percentages ranging from zero to 100.
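
Before plotting, it can help to confirm that both variables actually fall within the stated 0 to 100 range. The following is a minimal sketch, not part of the original analysis. The file name `replication_data.dta` is a placeholder assumption; substitute the path to the file downloaded from the repo.

```stata
* Hypothetical file name: replace with the path to the downloaded replication data
use "replication_data.dta", clear

* Storage types, display formats, and labels for the two variables
describe vote_share mshare

* Number of observations, mean, standard deviation, minimum, and maximum
summarize vote_share mshare

* Count non-missing values that fall outside the 0-100 range
count if !inrange(vote_share, 0, 100) & !missing(vote_share)
count if !inrange(mshare, 0, 100) & !missing(mshare)
```

If both `count` commands report zero, the variables are consistent with the percentage scale described above. The `!missing()` condition is needed because `inrange()` evaluates to 0 for missing values, which would otherwise be counted as out of range.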

## Data Visualization

It is always a good idea to begin any statistical modeling with a graphical assessment of the data. This allows you to quickly examine the distributions of the variables and check for possible outliers. The following code attaches descriptive labels to both variables and then draws a histogram of vote_share, our outcome of interest, with a kernel density estimate overlaid.

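```stata
* Attach descriptive labels so graphs show readable axis titles
label var vote_share "Vote Share"
label var mshare "Tweet Share"

* Histogram of the outcome scaled to frequencies, with a kernel density overlay
histogram vote_share, freq kdensity
```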