In the spirit of replicating studies, I decided to dedicate a blog post to replicating the results from Sundt and Rehdanz (2015). I chose this study because its data is publicly available (and can be found here) and because it is an application of meta-analysis.
Another reason I want to replicate this study is that I am convinced the results are misreported in Table 3. All the coefficients show up as negative, whereas in the discussion some of these variables are described as having a positive effect on the dependent variable. For example, in the text the authors mention that controlling for education (att_edu) has a “large positive effect” on WTP, but the coefficient reported for att_edu is -2.2437.
Sundt and Rehdanz (2015) estimated a meta-regression on consumer preferences regarding the share of renewable energy in the electricity mix. The authors assembled a series of peer-reviewed and non-peer-reviewed studies that applied the contingent valuation and choice experiment methods to estimate the willingness to pay (WTP) for an increase in renewable energy in the current electricity mix. The estimated WTP in each paper was converted into a single metric: either the mean WTP per household per month or the WTP per kilowatt-hour. All values are in 2010 US dollars. Because the welfare estimates had to be converted into a single metric, only 18 of the 101 studies initially identified were actually used in the meta-regression. This yielded 85 WTP estimates, or an average of 4.72 estimates per study.
The authors find that a one percentage point increase in green electricity production increases the WTP per household by a factor of 1.36. Meanwhile, a one percentage point increase in hydropower changes the WTP by a factor of 0.93, that is, it decreases it.
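Effects stated as factors are what you get from a model with a logged dependent variable: a coefficient b on a regressor corresponds to multiplying the unlogged outcome by exp(b) for a one-unit increase. Here is a minimal sketch of that conversion, assuming a log-linear specification; the helper names coef_to_factor and factor_to_coef are mine and appear neither in the paper nor in the data.

```r
# Convert between a log-linear coefficient and the multiplicative factor
# it implies for the unlogged dependent variable.
# Assumption: the model has the form ln(WTP) = ... + b * x + ...
coef_to_factor <- function(b) exp(b)
factor_to_coef <- function(f) log(f)

# The two factors quoted above, expressed on the log scale
factor_to_coef(c(green = 1.36, hydro = 0.93))
```

A factor above 1 maps to a positive coefficient and a factor below 1 to a negative one, which is why the signs in a coefficient table matter for the interpretation.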
To do the replication, I downloaded the data and imported it into R.
The dependent variable (WTP) is the natural logarithm of the WTP. Here are the descriptive statistics of the dependent variable:
> summary(meta_data$ln_wtp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.517   2.327   2.063   2.819   3.761
Since the dependent variable (the log of WTP) is continuous, a linear regression is enough to estimate the effect of the independent variables. This is generally the case in meta-analyses.
Since one study could report several WTP estimates, the authors used a weighted linear regression. In essence, this means running a standard linear regression after giving each WTP estimate a weight equal to 1 divided by the number of WTP estimates within that study, so that studies with many estimates do not dominate the fit.
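The weighting rule just described can be sketched on toy data. The data frame below is made up for illustration (its id_paper values are hypothetical and unrelated to the real data set); it only shows how a weight of 1 over the number of estimates per study can be computed in base R.

```r
# Toy data: hypothetical paper ids, not the real data set.
# Paper 1 contributes 1 estimate, paper 2 contributes 2, paper 3 contributes 4.
toy <- data.frame(id_paper = c(1, 2, 2, 3, 3, 3, 3))

# ave() applies length() within each id_paper group and returns the
# group size aligned with the original rows.
n_per_paper <- ave(rep(1, nrow(toy)), toy$id_paper, FUN = length)

# Weight = 1 / number of estimates from the same paper
toy$weight <- 1 / n_per_paper
toy
```

With this rule the weights within each paper sum to one, so every paper contributes equally to the regression regardless of how many estimates it reports.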
For example, a study in Greece yielded one WTP estimate for this meta-regression. This paper is the one with id_paper=19.
Therefore, the weight of this WTP estimate is equal to 1 divided by the number of WTP estimates extracted from this study, which is one. The weight should be one, and it is:
> meta_data$weight[meta_data$id_paper==19]
[1] 1
On the other hand, one study conducted in the US (id_paper=5) contributed 19 WTP estimates to the meta-analysis. Hence, the weight of each WTP estimate extracted from this study should be 1/19 ≈ 0.053.
> meta_data$weight[meta_data$id_paper==5]
 [1] 0.250 0.250 0.125 0.125 0.250 0.250 0.250 0.125 0.250 0.250 0.125 0.125
[13] 0.125 0.250 0.250 0.250 0.125 0.250 0.250
Apparently the authors decided to attribute different weights to this paper: its estimates carry weights of 0.250 or 0.125 rather than 1/19. Unfortunately, in my opinion the paper does not document well how the weights were created.
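To see how far these weights are from the 1/19 rule, it helps to tabulate them. The sketch below simply retypes the 19 weights printed above for id_paper 5 (the vector name w5 is mine), counts each distinct value, and compares the total with what the simple rule would give.

```r
# Weights for id_paper == 5, copied from the output above
w5 <- c(0.250, 0.250, 0.125, 0.125, 0.250, 0.250, 0.250, 0.125, 0.250, 0.250,
        0.125, 0.125, 0.125, 0.250, 0.250, 0.250, 0.125, 0.250, 0.250)

# How many estimates carry each weight
table(w5)

# Total weight of this paper; it would be 1 under the 1/19 rule
sum(w5)

# The weight each estimate would get under the 1/19 rule
1 / length(w5)
```

Twelve estimates carry a weight of 0.250 and seven carry 0.125, for a total of 3.875 instead of 1. The two values equal 1/4 and 1/8, so the weights may have been defined over smaller groups of estimates than the whole paper, but that is only a guess on my part.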
It is easy to run a weighted linear regression in R. The lm function, which fits a linear regression, just needs one extra argument specifying the weights: weights = VECTOR.
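To make concrete what the weights argument does, here is a small sketch on simulated data (nothing below comes from the study, and the weights are arbitrary). With weights, lm minimizes the weighted sum of squared residuals, which gives the same estimates as an ordinary regression on data rescaled by the square root of the weights.

```r
set.seed(1)  # simulated data, purely illustrative
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
w <- runif(n, min = 0.1, max = 1)  # hypothetical weights

# Weighted fit: minimizes sum(w * residual^2)
fit_w <- lm(y ~ x, weights = w)

# Same estimates from an unweighted fit on data rescaled by sqrt(w);
# the intercept column becomes sqrt(w), so the default intercept is dropped
sw <- sqrt(w)
fit_manual <- lm(I(y * sw) ~ 0 + sw + I(x * sw))

# Side-by-side comparison of the two sets of coefficients
cbind(weighted = coef(fit_w), manual = coef(fit_manual))
```

The two columns agree up to numerical precision, which is a quick way to convince yourself that the weights only change how much each observation counts in the fit.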
I use the exact same explanatory variables as the paper I am trying to replicate: year, re_share, hydro_share, usa, method_cv, un_spec, att_hh, att_inc, att_info, and att_edu (the variable att_info is actually the att_know variable in the paper). The vector of weights is just called weight.
model1 <- lm(ln_mean_wtp ~ year + re_share + hydro_share + usa + method_cv +
               un_spec + att_hh + att_inc + att_info + att_edu,
             data = meta_data, weights = weight)
And the results are as follows:
> summary(model1)

Call:
lm(formula = ln_mean_wtp ~ year + re_share + hydro_share + usa +
    method_cv + un_spec + att_hh + att_inc + att_info + att_edu,
    data = meta_data, weights = weight)

Weighted Residuals:
     Min       1Q   Median       3Q      Max
-0.64588 -0.14647 -0.03215  0.03058  1.31568

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.58491    0.25285  18.133  < 2e-16 ***
year        -0.51022    0.03490 -14.621  < 2e-16 ***
re_share     0.27340    0.02001  13.666  < 2e-16 ***
hydro_share -0.34368    0.02722 -12.625  < 2e-16 ***
usa          2.49727    0.18866  13.237  < 2e-16 ***
method_cv    1.01033    0.17036   5.931 8.95e-08 ***
un_spec     -2.40566    0.20740 -11.599  < 2e-16 ***
att_hh      -1.15126    0.19292  -5.967 7.69e-08 ***
att_inc      0.71805    0.17415   4.123 9.66e-05 ***
att_info    -0.36031    0.15800  -2.280   0.0255 *
att_edu      2.14779    0.20982  10.236 8.01e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2606 on 74 degrees of freedom
Multiple R-squared:  0.8671,    Adjusted R-squared:  0.8491
F-statistic: 48.26 on 10 and 74 DF,  p-value: < 2.2e-16
I can now compare the results I obtained with Table 3 from Sundt and Rehdanz (2015). If you ignore the negative signs in Table 3, the coefficients I estimated are close in magnitude to the coefficients reported in the paper (see their Model 1 without interactions). For example, my estimated coefficient for att_hh (a dummy denoting whether household variables were included in the original study) is -1.15126, and the coefficient reported in Sundt and Rehdanz (2015) is -1.2335.
The coefficients of att_edu, att_inc, method_cv, usa, re_share and the intercept all have positive signs in my estimates, but they were misreported in the paper as having a negative sign.
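The comparison can be made explicit for the two coefficients whose reported values are quoted in this post. The sketch below uses only numbers that appear above (my estimates from the regression output, and the Table 3 values for att_edu and att_hh); the data frame and its column names are mine.

```r
# Values quoted in this post: my estimates vs. Table 3 of the paper
cmp <- data.frame(
  term       = c("att_edu", "att_hh"),
  replicated = c(2.14779, -1.15126),
  reported   = c(-2.2437, -1.2335)
)

# Do the signs agree, and how far apart are the magnitudes?
cmp$same_sign <- sign(cmp$replicated) == sign(cmp$reported)
cmp$abs_gap   <- abs(abs(cmp$replicated) - abs(cmp$reported))
cmp
```

For att_edu the magnitudes differ by less than 0.1 but the signs disagree, while for att_hh both the sign and the magnitude line up. That is the pattern you would expect if positive coefficients were printed with a minus sign in the table.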
If you ignore the fact that the signs are wrong in their table, I was able to successfully replicate their results.
Sundt, S., & Rehdanz, K. (2015). Consumers’ willingness to pay for green electricity: A meta-analysis of the literature. Energy Economics, 51, 1-8.