# Simple Regression in R

This tutorial shows how to fit a simple regression model (that is, a linear regression with a single independent variable) in R. The details of the underlying calculations can be found in our simple regression tutorial. The data used in this post come from the study "More Tweets, More Votes: Social Media as a Quantitative Indicator of Political Behavior" by DiGrazia J, McKelvey K, Bollen J, and Rojas F (2013), which investigated the relationship between social media mentions of candidates in the 2010 and 2012 US House elections and the actual vote results. The replication data in R format (.rds) can be downloaded from our GitHub repo.

In this example, we will assess the relationship between the percentage of social media posts that mention a Congressional candidate and how well that candidate did in the following election. The variables of interest are:

• vote_share (dependent variable): The percent of votes for a Republican candidate
• mshare (independent variable): The percent of social media posts for a Republican candidate

Both variables are measured as percentages ranging from zero to 100.
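
To make the idea of "a single independent variable" concrete, here is a minimal sketch of the model form in R. It runs on simulated stand-in data, not the study's data: the object names `sim_data` and `sim_fit`, and the intercept and slope used to generate the numbers, are assumptions chosen purely for illustration.

```r
# Simulated stand-in data (NOT the replication data from the study)
set.seed(42)
n <- 200
sim_data <- data.frame(mshare = runif(n, min = 0, max = 100))

# Assumed relationship, chosen only so the example has something to recover
sim_data$vote_share <- 30 + 0.4 * sim_data$mshare + rnorm(n, sd = 5)

# Dependent variable on the left of ~, the single independent variable on the right
sim_fit <- lm(vote_share ~ mshare, data = sim_data)

# Estimated intercept and slope
coef(sim_fit)
```

The formula `vote_share ~ mshare` is what makes this a simple regression: exactly one term appears on the right-hand side.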

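# Load the packages used in this tutorial
library(tidyverse)  # attaches dplyr, ggplot2, readr, and the other core packages
library(knitr)
library(readr)      # already attached by tidyverse; kept here for explicitness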

Next we will load the data. We use the select function from dplyr to keep only the variables of interest.

# Read the replication data and keep only the two variables of interest
twitter_data <- read_rds("data/twitter_data.rds") %>%
  select(vote_share, mshare)
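
Before plotting, it can be worth confirming that the loaded columns look the way the variable descriptions say they should. The sketch below uses a small made-up data frame, `check_data`, so that it runs on its own; the values are invented and the helper `in_percent_range()` is a hypothetical name, not part of any package. In practice you would pass `twitter_data` instead.

```r
# Made-up stand-in with the same two columns; swap in twitter_data in practice
check_data <- data.frame(
  vote_share = c(51.1, 59.5, 0, 100, NA),
  mshare     = c(26.1, 0, 100, 16.9, 43.2)
)

# Hypothetical helper: TRUE if every non-missing value lies between 0 and 100
in_percent_range <- function(x) all(x >= 0 & x <= 100, na.rm = TRUE)

# Check each column against the stated 0 to 100 range
sapply(check_data, in_percent_range)

# Count missing values per column, since lm() drops incomplete rows by default
colSums(is.na(check_data))
```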

## Data Visualization

It is always a good idea to begin any statistical modeling with a graphical assessment of the data. This allows you to quickly examine the distributions of the variables and check for possible outliers. The following code returns a histogram for the vote_share variable, our outcome of interest.

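# Histogram of vote_share on the density scale, with a kernel density overlay
twitter_data %>%
  ggplot(aes(x = vote_share)) +
  geom_histogram(aes(y = after_stat(density)),  # replaces the deprecated ..density..
                 color = 'black',
                 fill = 'firebrick') +
  geom_line(stat = "density") +
  labs(x = "Vote Share", y = "Density")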
## stat_bin() using bins = 30. Pick better value with binwidth.
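
The message above is ggplot2 noting that it fell back to its default of 30 bins. One way to respond is to set the bin width explicitly, as in the sketch below. It uses simulated percentages in place of the real data, and the width of 5 percentage points is an arbitrary choice for illustration rather than a recommendation from the study; `sim_votes` and `p` are illustrative names. It assumes a version of ggplot2 recent enough to provide `after_stat()` (3.3.0 or later).

```r
library(ggplot2)

# Simulated percentages standing in for vote_share (NOT the study's data)
set.seed(1)
sim_votes <- data.frame(vote_share = runif(300, min = 0, max = 100))

p <- ggplot(sim_votes, aes(x = vote_share)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 5,   # each bin spans 5 percentage points
                 boundary = 0,   # align bin edges at 0, 5, 10, ...
                 color = "black",
                 fill = "firebrick") +
  geom_line(stat = "density") +
  labs(x = "Vote Share", y = "Density")

p
```

Because `binwidth` is supplied, `stat_bin()` no longer has to guess and the message is not printed. Trying a few widths is a reasonable way to see whether an apparent feature of the distribution is real or an artifact of the binning.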