Power Calculations for Binary Outcomes: The Effect of Cheap Talk Scripts (2)

In my previous blog post, I illustrated how to perform power calculations for continuous outcomes. I then performed ex-post power calculations for the study by Martinsson and Carlsson (2006), who conducted a survey that included an open-ended question eliciting willingness to pay (WTP). Their dependent variable was therefore continuous (“how much are you willing to pay?”). This time, I illustrate how to perform power calculations when the dependent variable is binary.

I will use the same context as before: estimating the effect of a cheap talk script on WTP. Concerns about hypothetical bias have led economists to include cheap talk scripts in contingent valuation surveys to attenuate yea-saying. A cheap talk script is a body of text that asks people to think of the hypothetical scenario as if it were real. The script is expected to reduce average WTP.

However, as Murphy et al. (2005) point out, the evidence on cheap talk scripts in the literature is mixed. Scripts differ across surveys in length and content, which affects how well they attenuate hypothetical bias. In particular, Murphy et al. (2005) find that cheap talk reduces hypothetical bias only at high payment levels. Testing such a hypothesis requires several sub-samples, obtained by varying both the payment level and the cheap talk treatment.

My research question is therefore: did Murphy et al. (2005) have enough statistical power in their survey experiment to detect the effect of the cheap talk script at high payment levels? To answer it, I perform power calculations using the information reported in Murphy et al. (2005).

The Murphy et al. (2005) study

To estimate people’s willingness to pay, Murphy et al. (2005) used a single-bounded referendum question instead of an open-ended one:

‘‘Are you willing to contribute $—— to the Nature Conservancy so that signs can be placed in and around Mt. Toby identifying the trails and rare species?’’

In a referendum-type question, an individual answers “yes” if they are willing to pay the stated amount to obtain the proposed level of environmental protection, that is, if the utility of saying “yes” exceeds the utility of saying “no”. The dependent variable is therefore binary: it takes the value one if the respondent answered “yes” (they would pay the fixed price to obtain more environmental quality) and zero if they answered “no”. The expected outcome is a higher propensity to answer “yes” without the cheap talk script.
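
To make the coding concrete, here is a minimal R sketch with made-up numbers. The latent `wtp` values and the `bid` of 10 are hypothetical, not taken from Murphy et al.; the point is only that each answer becomes a 1/0 indicator whose mean is the share of “yes” answers.

```r
# Hypothetical latent WTP of five respondents (never observed directly)
wtp <- c(4, 12, 25, 8, 15)
# Hypothetical payment amount shown in the referendum question
bid <- 10

# Answer "yes" (1) if WTP is at least the bid, "no" (0) otherwise
yes <- as.integer(wtp >= bid)
print(yes)

# Share of "yes" answers in this made-up sample
print(mean(yes))
```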

Power Calculations

A power calculation involves four quantities: power, effect size, sample size and significance level. Given any three of them, one can solve for the fourth. By convention, power is set at 80% and the significance level at 5%.

Sample sizes are as follows: 68 respondents received the cheap talk script, while 60 were in the control group. Only 11 respondents were willing to pay with the cheap talk script (11/68 ≈ 16%), compared with 19 without it (19/60 ≈ 32%). Because the dependent variable is binary, a natural first measure of the effect is the difference between the probabilities of answering “yes” with and without the treatment. That difference is substantial: 16 percentage points.
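
The shares and the gap can be recomputed from the counts in a few lines of R. This sketch assumes, as the reported percentages imply, that the 19 “yes” answers come from 60 control respondents and the 11 from 68 cheap talk respondents; with unrounded shares the gap is about 15.5 points rather than 16.

```r
# Counts reported in the text; group sizes as implied by the percentages
yes_control <- 19; n_control <- 60   # no cheap talk script
yes_ct      <- 11; n_ct      <- 68   # cheap talk script

p_control <- yes_control / n_control  # share of "yes", control
p_ct      <- yes_ct / n_ct            # share of "yes", cheap talk

# Difference in percentage points
diff_pp <- 100 * (p_control - p_ct)

print(round(c(p_control = p_control, p_ct = p_ct, diff_pp = diff_pp), 3))
```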

However, the difference between the proportions is not an appropriate effect size for power calculations (Cohen, 2013, p. 181). The reason is that how easily a given difference can be detected depends on the levels of the two probabilities of saying “yes” (here 16% and 32%), not only on their gap. Instead, Cohen (2013, p. 181) proposed a non-linear (arcsine) transformation of the proportions of saying “yes” in the control and treatment groups, popularly known as Cohen’s h:

$$ h = \left| 2\arcsin\sqrt{p_1} - 2\arcsin\sqrt{p_2} \right| $$

where $p_1$ and $p_2$ are the proportions answering “yes” in the control and cheap talk groups.

Here, h is the effect size needed for the power calculations. To compute it in R, I entered the equation above with the two proportions, 32% and 16%:

> h = abs(2*asin(sqrt(0.32))-2*asin(sqrt(0.16)))
> h
[1] 0.3794947

The estimated effect size (0.38) is considerably larger than the one from my previous post (0.07). Again, it is only an estimate from this particular sample.
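
Cohen’s argument that the gap alone is not enough can be checked with a small helper. The function name `cohens_h` is my own, and the second pair of proportions (50% versus 34%) is hypothetical: it keeps the same 16-point gap but moves it towards the middle of the scale, where the arcsine transformation yields a smaller h (about 0.33 instead of 0.38).

```r
# Cohen's h: difference between arcsine-transformed proportions
cohens_h <- function(p1, p2) {
  abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))
}

# Proportions used in this post (16-point gap)
h_obs <- cohens_h(0.32, 0.16)
# Hypothetical proportions with the same 16-point gap, nearer 50%
h_mid <- cohens_h(0.50, 0.34)

print(round(c(h_obs = h_obs, h_mid = h_mid), 3))
```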

According to Duflo et al. (2007), the power of a design is “the probability that, for a given size and a given statistical significance level, we will be able to reject the hypothesis of zero effect”. If the goal of the researcher is to estimate the needed sample size, significance level or minimum detectable effect size, one would set power at 80% by convention (Duflo et al., 2007). Instead, my aim is to estimate the power of this experiment.
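
As an illustration of solving for a different unknown, the normal approximation on the arcsine scale can be inverted by hand. This is a sketch assuming a one-sided test at the 5% level and, for part (a), equal group sizes. It suggests roughly 86 respondents per group for 80% power at h = 0.38, and a minimum detectable h of about 0.44 with the actual groups of 60 and 68. With the pwr package loaded, `pwr.2p.test(h = 0.38, sig.level = 0.05, power = 0.8, alternative = "greater")` should give a similar sample size.

```r
# Assumptions: one-sided test, normal approximation on the arcsine scale
alpha        <- 0.05       # significance level
target_power <- 0.80       # conventional power
h            <- 0.3794947  # Cohen's h estimated above

# Sum of the critical value and the z-quantile of the target power
z_sum <- qnorm(1 - alpha) + qnorm(target_power)

# (a) Respondents per group (equal groups) needed for 80% power
n_per_group <- ceiling(2 * (z_sum / h)^2)

# (b) Smallest h detectable with 80% power given the actual groups
n1 <- 60; n2 <- 68
mde_h <- z_sum / sqrt(n1 * n2 / (n1 + n2))

print(n_per_group)
print(round(mde_h, 3))
```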

To estimate the power of the experiment, I use the pwr package in R (Champely et al., 2018):

> library(pwr)

Next, I set the sample size of each group:

> n1 = 60
> n2 = 68

I then run the power calculation as follows:

> pwr.2p2n.test(h=h, n1=n1, n2=n2, sig.level = 0.05, power = NULL, 
                alternative = c("greater"))

     difference of proportion power calculation for binomial distribution (arcsine transformation) 

              h = 0.3794947
             n1 = 60
             n2 = 68
      sig.level = 0.05
          power = 0.6906508
    alternative = greater

NOTE: different sample sizes
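
The number reported by pwr.2p2n.test can be reproduced in base R, which helps show where it comes from. Under the alternative, the arcsine-transformed difference divided by its standard error, sqrt(1/n1 + 1/n2), is approximately normal with mean h·sqrt(n1·n2/(n1+n2)) and variance one, so power is the probability that it exceeds the one-sided critical value. This sketch of the textbook normal approximation returns about 0.6907, matching the output above.

```r
# Inputs from the calculation above
h     <- abs(2 * asin(sqrt(0.32)) - 2 * asin(sqrt(0.16)))
n1    <- 60     # control group
n2    <- 68     # cheap talk group
alpha <- 0.05   # one-sided significance level

# Expected value of the z statistic under the alternative
ncp <- h * sqrt(n1 * n2 / (n1 + n2))

# Power: probability that z exceeds the one-sided critical value
pow <- 1 - pnorm(qnorm(1 - alpha) - ncp)
print(pow)
```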

Despite the small sample, the power of the experiment is about 69%, not far below the conventional value of 80%. In other words, if the true effect size equals the estimated h of 0.38, there is roughly a 69% chance of obtaining a statistically significant difference in a one-sided test at the 5% level.
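
This interpretation can also be checked by simulation. The sketch below draws many hypothetical replications of the experiment, assuming the true shares are exactly 32% and 16%, applies a one-sided arcsine z-test to each, and counts how often the null of no effect is rejected. The seed and the number of replications are arbitrary choices; the share should land close to the analytical 0.69, with small differences due to simulation noise and the discreteness of the binomial counts.

```r
set.seed(123)            # arbitrary seed, for reproducibility
n_sims <- 10000          # number of simulated experiments
n1 <- 60; p1 <- 0.32     # control group: assumed true share of "yes"
n2 <- 68; p2 <- 0.16     # cheap talk group: assumed true share of "yes"
crit <- qnorm(0.95)      # one-sided 5% critical value

# Number of "yes" answers in each group, for every simulated experiment
y1 <- rbinom(n_sims, n1, p1)
y2 <- rbinom(n_sims, n2, p2)

# Arcsine-transformed difference divided by its approximate standard error
z <- (2 * asin(sqrt(y1 / n1)) - 2 * asin(sqrt(y2 / n2))) /
  sqrt(1 / n1 + 1 / n2)

# Share of experiments in which H0 (no effect) is rejected
sim_power <- mean(z > crit)
print(sim_power)
```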

Murphy et al. (2005) find that their cheap talk dummy is statistically significant in one of their estimated OLS regressions but not in the other.

However, the hypothesis of Murphy et al. (2005) is that the cheap talk script affects responses at high payment levels but not at low ones. Testing it requires splitting the sample by payment level, which reduces the number of observations per treatment even further: responses must be compared separately for individuals who faced high and low payment levels. Their experiment does not appear to have enough observations to answer that research question reliably.
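
To get a rough sense of how much power is lost, the sketch below reruns the analytical calculation on a hypothetical split. The assumptions are purely illustrative: each group is halved (30 controls and 34 cheap talk respondents at the high payment level) and the effect size stays at 0.38. Under these assumptions, power falls to roughly 45%. If, as Murphy et al. hypothesise, the effect is concentrated at high payment levels, the true h in that cell could be larger than 0.38, which would partly offset the loss.

```r
# One-sided power for Cohen's h (normal approximation)
power_one_sided <- function(h, n1, n2, alpha = 0.05) {
  1 - pnorm(qnorm(1 - alpha) - h * sqrt(n1 * n2 / (n1 + n2)))
}

h <- abs(2 * asin(sqrt(0.32)) - 2 * asin(sqrt(0.16)))

full  <- power_one_sided(h, 60, 68)  # whole sample, as in pwr above
split <- power_one_sided(h, 30, 34)  # hypothetical high-payment cell only

print(round(c(full = full, split = split), 2))
```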

In conclusion, their experiment has moderate power (about 69%) if the true effect of cheap talk is as large as their estimate. I am doubtful, however, that any further split-sample comparisons have enough power, so conclusions drawn from them should be treated with caution.



Champely, S., Ekstrom, C., Dalgaard, P., Gill, J., Weibelzahl, S., Anandkumar, A., Ford, C., Volcic, R., De Rosario, H., 2018. pwr: Basic Functions for Power Analysis. R package version 1.2.

Cohen, J., 2013. Statistical Power Analysis for the Behavioral Sciences. Routledge.

Duflo, E., Glennerster, R., Kremer, M., 2007. Using Randomization in Development Economics Research: A Toolkit. Discussion Paper Series No. 6059.

Martinsson, P., Carlsson, F., 2006. Do experience and cheap talk influence willingness to pay in an open-ended contingent valuation survey? Working Papers in Economics No. 190.

Murphy, J.J., Stevens, T., Weatherhead, D., 2005. Is cheap talk effective at eliminating hypothetical bias in a provision point mechanism? Environmental and Resource Economics 30 (3), 327–343.

