Analyzing CV data (The Linear Probability Model)

This blog post focuses on how to estimate a linear probability model with contingent valuation data.

The Deepwater Horizon CV study

For this blog post, I will be using data from the CV study on the Deepwater Horizon oil spill, which was published in 2017 [2].

Oil spills generally involve not only loss of use values, but also loss of nonuse values. Nonuse value losses exist if respondents attach value to a resource just from knowing it exists, even though they are neither direct nor indirect users of the resource. In such cases, contingent valuation or other stated preference methods are well-suited to estimate losses in value.

In the Deepwater Horizon study, a random sample of American citizens was presented with a program to prevent oil spill damages in the next 15 years. Then they were asked to vote for or against this program, given a mandatory increase in taxes. By analyzing people’s choices, the researchers were able to estimate a lower-bound WTP per household of $136 for a smaller set of damages, and $153 for a larger set of damages.

If you want to follow this tutorial using the same data, feel free to download it by clicking “Main and Non-Response Follow-Up Survey Data”. This is a zip file that includes a Stata data file and an Excel file. Please save one of these as a CSV file, so we can use R to analyze it.

Elicitation Formats in Contingent Valuation Studies

Contingent valuation studies may elicit the WTP using different types of questions. Practitioners can use open-ended questions or a payment card, which result in a continuous variable, or a referendum question, which yields a binary variable. In a referendum question, a respondent is asked if (s)he is willing to pay a given amount over a certain period of time to get the environmental good (or prevent some environmental damages). The respondent says either yes or no. This is the type of data I will be using in this blog post.

This is what the question looked like in the Deepwater Horizon study:


Usually, the “Vote” binary variable is coded as 1 if the respondent is willing to pay a given monetary amount to obtain an environmental good/service (or avoid a set of damages), and as 0 if the respondent is not willing to pay that amount. Hence, I am estimating the probability of answering yes to this question, rather than a marginal effect on a continuous variable.

One can use different models to analyze this variable. Some of the most well-known are the probit and logit models. Instead, I will use the linear probability model as a first step to analyze binary data, because I want to replicate the results reported in [1].

Estimating a Linear Probability Model

The linear probability model is a simple linear regression model with a binary variable as its dependent variable.[3] Each estimated coefficient is best interpreted as the change in the probability of observing a success (where success is defined as Y=1) due to a marginal change in one independent variable, ceteris paribus. Because this is a linear regression model, the standard lm function in R is all we need to estimate a linear probability model.
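
To make this interpretation concrete, here is a minimal sketch on simulated data (not the Deepwater Horizon survey). The names x, y and lpm, as well as the true intercept of 0.7 and slope of -0.4, are made up for the illustration.

```r
# Simulated illustration only: hypothetical data, not the survey data
set.seed(123)
n <- 5000
x <- runif(n, 0, 1)                 # hypothetical regressor on [0, 1]
p <- 0.7 - 0.4 * x                  # true probability of a "yes" (stays between 0.3 and 0.7)
y <- rbinom(n, size = 1, prob = p)  # binary outcome drawn with that probability

# A linear probability model is just lm() with a 0/1 dependent variable
lpm <- lm(y ~ x)
coef(lpm)
```

The estimated slope should land close to -0.4, meaning that a one-unit increase in x lowers the probability of a “yes” by roughly 40 percentage points. Keep in mind that nothing forces the fitted values of a linear probability model to stay between 0 and 1.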

As always, I start by importing my data:

deepwater <- read.csv("DIRECTORY/deepwater.csv", 
                      stringsAsFactors = FALSE)

The dataset has many variables, but all we need for this exercise are the “q24”, “version”, “flag” and “bidvalue” variables. First, I need to create a proper binary variable as the dependent variable:

# Start from NA so the new column always has one entry per row
deepwater$Vote <- NA
deepwater$Vote[deepwater$q24=="Against"] <- 0
deepwater$Vote[deepwater$q24=="For"] <- 1

q24 indicates whether the respondent voted for or against the program. With the code above I created a variable called “Vote” that takes the value 1 if the respondent was for the program, and 0 if (s)he was against.

To see how the respondents voted, I use the table function:

> table(deepwater$Vote)

   0    1 
2304 1616

2304 respondents voted against the program, and 1616 voted in favor of the program.
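
If you prefer shares over counts, prop.table does the job. In this small sketch the vector votes is rebuilt from the counts reported above instead of being read from the survey file, so it is only a stand-in for deepwater$Vote.

```r
# Stand-in for deepwater$Vote, rebuilt from the counts in the table above
votes <- rep(c(0, 1), times = c(2304, 1616))

# Convert the counts into shares of the total
shares <- prop.table(table(votes))
round(shares, 3)
```

With these counts, about 41% of the respondents voted for the program.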

The “version” variable indicates whether the respondent faced the survey with a smaller or a larger set of damages. I create a “damages” variable that takes the value 1 if version B (with a larger set of damages) was presented to the respondent, and 0 otherwise.

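# Dummy for the larger set of damages (version B); everything else is 0
deepwater$damages <- 0
deepwater$damages[deepwater$version=="B"] <- 1
# Safeguard: treat any remaining missing values as 0
deepwater$damages[is.na(deepwater$damages)] <- 0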
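
The same kind of dummy can also be built in a single step with ifelse. Below is a small sketch on a made-up data frame called toy (it is not part of the survey data), including a missing version to show how it is handled.

```r
# Hypothetical toy data, including one missing value
toy <- data.frame(version = c("A", "B", "B", NA, "A"),
                  stringsAsFactors = FALSE)

# 1 for version B, 0 otherwise; a missing version is treated as 0
toy$damages <- ifelse(!is.na(toy$version) & toy$version == "B", 1, 0)
toy$damages
```

Whether a missing version should count as 0 or be dropped is a judgment call; here I simply mirror the default of 0 used above.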

Because I am replicating the results from [1], I also need to transform the bid variable, as they did:

deepwater$log_bid <- log(deepwater$bidvalue)

Finally, the fourth variable that we have to modify is the “flag” variable. It just indicates whether the respondent should be dropped from the sample or not.

deepwater$flag[deepwater$flag=="Yes"] <- 1
deepwater$flag[deepwater$flag=="No"] <- 0

Now, I am ready to run my linear probability model. First, I am dropping the respondents who were flagged:

DATA <- deepwater[deepwater$flag==0,]

The DATA dataframe only has the observations that I want. To run the linear probability model, use the lm function:

OLS_A <- lm(Vote ~ log_bid, data=DATA[DATA$damages==0,])
OLS_B <- lm(Vote ~ log_bid, data=DATA[DATA$damages==1,])

OLS_A holds the results for the smaller set of damages, and OLS_B holds the results for the larger set of damages.

> summary(OLS_A)

             Estimate    Std. Error   t value    Pr(>|t|) 
(Intercept)  0.792944    0.046148     17.183     <2e-16 ***
log_bid     -0.084939    0.009567     -8.879     <2e-16 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> summary(OLS_B)

             Estimate    Std. Error   t value   Pr(>|t|) 
(Intercept)  0.83181     0.04695      17.718    <2e-16 ***
log_bid     -0.08397     0.00974      -8.621    <2e-16 ***
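
To read these numbers, it helps to plug them back into the fitted line. The sketch below copies the OLS_A coefficients from the output above; the helper p_yes and the bid of 100 are my own illustrative choices, and 100 is not necessarily one of the amounts used in the survey.

```r
# Coefficients copied from the summary(OLS_A) output above
b0 <- 0.792944
b1 <- -0.084939

# Hypothetical helper: predicted probability of a "For" vote at a given tax amount
p_yes <- function(bid) b0 + b1 * log(bid)

bid <- 100                               # illustrative amount
p_at_bid <- p_yes(bid)                   # predicted share voting "For" at this bid
effect_double <- p_yes(2 * bid) - p_at_bid  # doubling the bid changes it by b1 * log(2)

c(p_at_bid = p_at_bid, effect_double = effect_double)
```

At a bid of 100 the predicted probability of a “For” vote is roughly 0.40, and doubling the tax amount lowers it by about 5.9 percentage points. Because the bid enters in logs, that second number is the same whatever the starting amount.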

In [1], if you look at tables 6 and 7 (first column) you’ll see that we got (almost) the same estimates for the constant and the log of tax amount.
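
One caveat on the standard errors: with a binary dependent variable the error variance depends on the probability itself, so the errors of a linear probability model are heteroskedastic by construction. A common remedy is to report heteroskedasticity-robust standard errors. I do not know which standard errors [1] reports, so take the following only as a sketch: it computes HC1 standard errors by hand in base R on simulated data, where the bid levels and coefficients are made up.

```r
# Simulated illustration only: hypothetical bid levels and coefficients
set.seed(42)
n <- 2000
log_bid <- log(sample(c(10, 50, 100, 200, 400), n, replace = TRUE))
p <- 0.8 - 0.085 * log_bid            # true probability, stays inside (0, 1)
vote <- rbinom(n, size = 1, prob = p)
fit <- lm(vote ~ log_bid)

# HC1 sandwich estimator: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1 * n / (n - k)
X <- model.matrix(fit)
e <- residuals(fit)
k <- ncol(X)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)              # each row of X is scaled by its residual
vcov_hc1 <- bread %*% meat %*% bread * n / (n - k)
robust_se <- sqrt(diag(vcov_hc1))

cbind(estimate = coef(fit),
      default_se = sqrt(diag(vcov(fit))),
      robust_se = robust_se)
```

In practice the sandwich package offers the same calculation through vcovHC(fit, type = "HC1"); I spell it out here only to avoid assuming the package is installed.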

And that is the linear probability model. In some applications, the bid variable does not need to be transformed to its logarithm. Soon I will follow up with an application of a probit or logit model.



[2] Bishop, R. C., Boyle, K. J., Carson, R. T., Chapman, D., Hanemann, W. M., Kanninen, B., … & Paterson, R. (2017). Putting a value on injuries to natural assets: The BP oil spill. Science, 356(6335), 253-254.



