# Analyzing CV data (non-parametric)

When analyzing CV data, one can do so by parametrically, or non-parametrically. When using the parametric approach, we need to assume some distribution for the willingnesss-to-pay (WTP). Two widely used distributions are the normal and logistic distributions. In this previous post, I showed how to obtain the mean WTP if we do so parametrically.

I am using the estimates from the previous post to provide an example of distributional assumptions. I estimated the coefficients for the bid, damages and the intercept, and their respective standard errors.

I will generate a normal distribution for each of the bid, damages and intercept parameters, given their mean and standard errors. For example, the mean for the intercept was 0.029 and the standard error was 0.039. Assuming the intercept is drawn from a normal distribution, I generate a vector of intercept estimates with 1000 observations by typing:

`Intercept <- rnorm(1000, 0.0291990 , 0.0390835)`

I will do the same for the bid and damages coefficients.

```Bid     <- rnorm(1000, -0.0016771 , 0.0001446)
Damages <- rnorm(1000,  0.1151763 , 0.0425382)```

I have created three vectors of length 1000. What I want however is estimates for the WTP, which is a function of these three variables. The WTP_Small is the WTP to avoid the smaller set of damages and WTP_Large is the WTP to avoid a larger set of damages. Just like last week’s post, I estimate the WTP by typing:

```WTP_Small <- -(Intercept)/Bid
hist(WTP_Small)

WTP_Large <- -(Intercept+Damages)/Bid
hist(WTP_Large)```

Unlike last week, in this exercise, intercept, bid and damages are vectors rather than a single number, so WTP_Small and WTP_Large are also vectors.

These commands result in the following histograms:

This is not exactly the right way to plot the distribution of WTP, but it gives an idea of how it looks like when forcing a specific distribution on it. This assumption may be correct… or it may be completely off and I may seriously be making a big mistake by assuming anything.

Instead, in this post, I will be showing how to estimate the WTP using a non-parametric approach, thus not assuming any distribution on WTP.

I use the same data as the previous posts [1] and [2]. To recap, I created a variable called Vote by running the following commands:

```deepwater\$Vote[deepwater\$q24=="Against"] <- 0
deepwater\$Vote[deepwater\$q24=="For"] <- 1```

The first step is to calculate the proportion of people who voted yes for each of the bid values presented. The deepwater CV data has five different bid amounts: 15, 65, 135, 265 and 435 US dollars. First, I can check how many respondents were faced with each bid:

```> table(deepwater\$bidvalue)

15  65 135 265 435
793 812 795 785 780```

The number of respondents from one bid amount to the other are quite consistently at around 800 respondents.

I want to know how many of the 800 respondents voted yes.

```YES=table(deepwater\$Vote,deepwater\$bidvalue)[2,1:5]
NO =table(deepwater\$Vote,deepwater\$bidvalue)[1,1:5]
Prob=YES/(NO+YES)```

The YES vector has the number of people voting yes for each of the five bids:

```> YES
15    65    135   265   435
445   378   320   264   209```

As we can see, the number of people voting yes drops slightly as the bid amount increases. This conforms with economic expectations: as the price of the program to avoid oil spills increases, people are less likely to be willing to pay.

The probabilities of voting yes is simply the number of people that voted yes divided by the total number of respondents that got that bid amount:

```> Prob
15        65        135       265       435
0.5654384 0.4689826 0.4066074 0.3410853 0.2728460```

The same pattern arises: as the bif value increases, the proportion of people that vote yes decreases.

For illustration purposes, here are the survival functions. The blue one is simply a linear interpolation of the five probabilities, and the red one has steps for each probability.

```plot (Prob~ c(15,65,135,265,435),type="l", col="darkturquoise")
lines(Prob~ c(15,65,135,265,435),type="S", col="darksalmon")```

Fortunately, the survival functions are decreasing and monotonic. If the probabilities were to increase at any point in the survival function, some corrections to the probabilities using the pooled adjacent violators algorithm would have to be done.

Basically, the way to calculate the mean WTP is to calculate the area below either the blue line or the pink line. If we calculate the area below the pink function, then we are using the Kaplan-Meier-Turnbull method. This should generate conservative WTP estimates. If we calculate the area below the blue line, the WTP estimates will be higher and it is the approach of Kristom (1990).

I will follow the steps outlined in Aizaki et al. (2014). They use the DCchoice package and the functions kristrom and turnbull.sb for single-bounded data (as is the case with the deepwater CV data).

```install.packages("DCchoice",
repos = c("http://www.bioconductor.org/packages/release/bioc",
"https://cran.rstudio.com/"),
dep = TRUE)
library(DCchoice)```

The Kristrom command yields the following results:

```> summary(kristrom(Vote ~ bidvalue, data=deepwater))
Survival probability:
Upper Prob.
1 0 1.0000
2 15 0.5654
3 65 0.4690
4 135 0.4066
5 265 0.3411
6 435 0.2728
7 Inf 0.0000

WTP estimates:
Mean: 151.118120 (Kaplan-Meier)
Mean: 261.760929 (Spearman-Karber)
Median: 48.921450```

As expected, the calculated probabilities given the bid are the same as the ones we “manually” calculated before.

There are two means reported after the kristrom command. Both of these are the means of the WTP under the blue line we plotted above. The first mean (“Kaplan-Meier”) assumes that the upper bound of the bid (the point where the blue line intercepts the x-axis) is equal to the maximum bid offered (Aizaki et al., 2014). The second mean (“Spearman-Karber”) assumes that this upper bound bid amount is where the blue line in the plot above intercepts the x-axis.

The syntax of using the turnbull.sb command is basically the same. The only difference is to change the name of the command.

```> summary(turnbull.sb(Vote ~ bidvalue, data=deepwater))
Survival probability:
Upper Prob.
1 0 1.0000
2 15 0.5654
3 65 0.4690
4 135 0.4066
5 265 0.3411
6 435 0.2728
7 Inf 0.0000

WTP estimates:
Mean: 151.118120 (Kaplan-Meier)
Mean: 169.031139 (Spearman-Karber)
Median in: [ 15 , 65 ]```

If we want to be conservative, it is best to go with the WTP of the Kaplan-Meier estimator: 151 US dollars per respondent. What is missing is to calculate different WTP for different sets of damages. To do so, I just changed the data= part of the code. Otherwise, it’s identical to the one above.

```> summary(turnbull.sb(Vote ~ bidvalue,
data=deepwater[deepwater\$damages==1,]))
Survival probability:
Upper Prob.
1 0 1.0000
2 15 0.5929
3 65 0.4903
4 135 0.4049
5 265 0.3740
6 435 0.2917
7 Inf 0.0000

WTP estimates:
Mean: 159.959740 (Kaplan-Meier)
Mean: 177.575027 (Spearman-Karber)
Median in: [ 15 , 65 ]
> summary(turnbull.sb(Vote ~ bidvalue,
data=deepwater[deepwater\$damages==0,]))
Survival probability:
Upper Prob.
1 0 1.0000
2 15 0.5381
3 65 0.4464
4 135 0.4084
5 265 0.3098
6 435 0.2539
7 Inf 0.0000

WTP estimates:
Mean: 142.423499 (Kaplan-Meier)
Mean: 160.668041 (Spearman-Karber)
Median in: [ 15 , 65 ]```

By following a consevrative approach, we can conclude that the WTP to avoid a larger set of damages is around 160 US dollars per person and to avoid a smaller set of damages is around 142 US dollars per person. These estimates are slightly higher but very similar to the ones reported in [3]. One difference is that we did not exclude ‘flagged’ respondents.

References:

Aizaki, H., Nakatani, T., & Sato, K. (2014). Stated preference methods using R. Chapman and Hall/CRC.Aizaki, H., Nakatani, T., & Sato, K. (2014). Stated preference methods using R. Chapman and Hall/CRC.

Kriström, B. (1990). A non-parametric approach to the estimation of welfare measures in discrete response valuation studies. Land economics66(2), 135-139.