In this previous blog post, I estimated the willingness to pay to establish a marine protected area in two different areas. Using a conditional logit model, I estimated the WTP to establish a marine protected area in Western shoal to be within the 95% confidence interval of [7.01 , 13.94]. This WTP for Apollo shoal is within the 95% confidence interval of [2.97 , 9.07]. I concluded that, since the confidence intervals overlap, the two WTP are not statistically different.
Poe et al. (2015) argues that simply comparing the confidence intervals and checking whether these overlap is not appropriate. To check if the the difference between two WTP distributions is statistically different than zero, Poe et al. (2005) proposes a convolution test instead. Poe et al. (2005) defines the convolution as “the probability of each possible outcome, considering all possible combinations of the two independent distributions”. In this blog post I will show how to use the convolution test to estimate whether the WTP using the same data as Karlõševa et al. (2016) are different.
To conduct the convolution test, I am using the mded package in R (Aizaki, 2014). As usual, I start by installing the package and calling the library function.
In order to implement the convolution test, I first simulate the coefficients of the cost variable, and the coefficients associated with a marine protected area in western and Apollo shoal. Each of the coefficients is a random distribution with the mean set at the estimated mean from the model (see previous post to see how), and standard deviation as the squared root of the corresponding value in the diagonal of the variance-covariance matrix.
coef_C <- rnorm(1000, coef(model), sqrt(vcov(model)[1,1])) coef_ws <- rnorm(1000, coef(model), sqrt(vcov(model)[3,3])) coef_as <- rnorm(1000, coef(model), sqrt(vcov(model)[6,6]))
I simulate a vector of WTP to establish a marine protected area in Western shoal and Apollo shoal as the coefficient of WS or AS over the cost coefficient:
wtp_ws <- c(-coef_ws/coef_C) wtp_as <- c(-coef_as/coef_C)
wtp_ws and wtp_as are vectors of length 1000 representing the distribution of willingness to pay to establish a marine protected area. They are constructed by dividing the coef_ws or coef_as vector by the coef_C vector. Like in the previous post, I report the mean WTP for the Western Shoal to be 10.5 euros, and for the Apollo shoal to be 6 euros. The WTP in the Western shoal ranges from 4.7 and 17.2, while for the Apollo shoal it ranges from 2.3 to 11.0. The descriptive statistics are below:
> summary(wtp_ws) Min. 1st Qu. Median Mean 3rd Qu. Max. 4.684 9.273 10.476 10.510 11.675 17.184 > summary(wtp_as) Min. 1st Qu. Median Mean 3rd Qu. Max. 2.264 4.989 5.988 6.009 7.040 11.030
To visualize the histograms of the WTP distributions, I use the ggplot package:
library(ggplot2) plot_ws <- ggplot(data = df.data, aes(x = X.coef_ws.coef_C)) + geom_histogram(binwidth = 1, fill = "seagreen", color = "green") plot_as <- ggplot(data = df.data, aes(x = X.coef_as.coef_C)) + geom_histogram(binwidth = 1, fill = "blue", color = "skyblue1")
If I look at the histograms, I can see that the two distributions overlap to some extent.
The idea of the convolution test is to combine all 1000 draws for the wtp_ws distribution with all 1000 draws from the wtp_as distribution. If the two distributions are independent, then we use the complete combinatorial method by doing 1000×1000= 1 000 000 possible combinations. I then compare how many cases of the difference between the two vectors have a negative sign. The p-value is the proportion of negative values of the difference between the two distributions.
In other words, let’s look at the first 5 observations from each vector:
> wtp_ws[1:5]  9.180609 10.309286 9.690702 8.260901 11.385065 > wtp_as[1:5]  5.356340 6.341630 8.578429 5.335886 7.908616
Let’s imagine than the vectors were only 5 elements long. To perform the complete combinatorial method, I combine each of the elements from wtp_ws with those from wtp_as. This gives 25 possible combinations. For example, I combine 9.180609 (the first element from wtp_ws with each of the five elements in wtp_as, and calculate the difference. 9.180609 is larger than all of the five elements below, so that yields five comparisons with the same outcome. However, when I compare the forth element from wtp_ws, it is lower than the third element from wtp_as. So out of 25 possible combinations, wtp_ws is lower than wtp_as in only one case.
The p-value is just the number of negative outcomes, which is one, divided by all possible combinations, that is 25. This yields a p-value of 0.034, which is lower than the conventional 5% significance level.
The mded package does all possible combinations for me. Let’s use the full vectors of 1000 elements.
> mded(wtp_as, wtp_ws, detail = TRUE, independent = TRUE) Test: H0 wtp_ws = wtp_as H1 wtp_ws > wtp_as significance level = 0.03418 Data: distr1 = wtp_ws distr2 = wtp_as Means: means n wtp_ws 10.558 1000 wtp_as 6.149 1000 Cases in the difference: n true 34183 false 965817 total 1000000
The mded package reports a significance level of 0.03418. This means than we conclude that the two distributions are in fact different and WTP in the Western shoal is statistically higher than in the Apollo shoal. This is the opposite conclusion from comparing overlapping confidence intervals. Hence, even if the confidence intervals overlap, using the convolution test may indicate that the two distributions are statistically different.
The mded package also reports all cases (“total 1000000”) and how many comparisons yielded wtp_ws>wtp_as, that is 965817, and how many comparisons yielded wtp_ws<wtp_as, that is 34183. The p-value is just 34183/1000000, which equals 0.034.
If the distributions are not independent from each other, then independent = FALSE, when running the mded function (Aizaki, 2014).
Aizaki H (2014). mded: Measuring the difference between two empirical distributions, R package version 0.1-1. URL http://CRAN.R-project.org/package=mded.
Karlõševa, A., Nõmmann, S., Nõmmann, T., Urbel-Piirsalu, E., Budziński, W., Czajkowski, M., & Hanley, N. (2016). Marine trade-offs: Comparing the benefits of off-shore wind farms and marine protected areas. Energy Economics, 55, 127-134.
Poe GL, Giraud KL, Loomis JB (2005). Computational methods for measuring the difference of empirical distributions. American Journal of Agricultural Economics, 87, 353–365.