# Back to Basics: Extracting Consumer Surplus estimates from count data models

Summary: In this post, I show how to obtain consumer surplus estimates. I check whether calculating consumer surplus using the standard formula $CS = - \frac{x}{\beta_1}$, where $\beta_1$ is the travel cost coefficient, yields the same estimate as the one obtained by integrating under the demand curve for trips. I provide an R script to calculate consumer surplus after estimating any count data model by integrating under the demand curve.

One major method in environmental valuation (and a personal favorite!) is the travel cost method. It is used to understand the demand for recreation and it can be applied to various sites: beaches, mountains, rivers, or even to activity-based recreation (water sports, hiking, etc). To implement the travel cost method, we start by sampling a given population, then estimate a model and extract welfare estimates. This blog post will focus on the last step.

The travel cost method was first proposed by Harold Hotelling in a letter dated June 18, 1947. The reasoning set out in this letter is what we now call the zonal travel cost model. Hotelling proposed drawing “concentric zones (…) around each park so that the cost of travel to the park (…) is approximately constant” (Hotelling, 1947). If a person decides to visit the park, then “their service of the park is at least worth the cost” (Hotelling, 1947). The consumer surplus associated with a visit to a park is the integral of the estimated demand curve for park visits, which is the measure of “benefits to the public in the particular year” (Hotelling, 1947).

The proposed travel cost model has gone through major improvements over time, but the main reasoning remains: the goal is to use revealed preference data to plot a demand curve for visits to recreational sites. We always expect the number of trips demanded to decrease as the travel cost to get to the site increases. The other element that remains is the researcher’s aim to estimate some form of welfare measure, which is frequently consumer surplus.

In my exposition, I follow the framework of Haab and McConnell (2002), which is an excellent textbook. Let the cost of travel to several sites be defined by a vector $c_{ij}$, where i denotes the individual and j denotes the site. Income is defined by $y_i$. An individual maximizes utility by choosing the number of visits to site j given income and the travel costs (s)he faces. Solving for the optimal number of visits yields the demand function for the number of trips to site j, which is denoted by $x_{ij}$:

$x_{ij} = f(c_{ij} , y_i)$

To estimate this model in an empirical setting, we have to make some kind of assumption about what the function f looks like. For example, I may assume a simple linear model, which I can estimate by ordinary least squares, as follows:

$E(x_{ij}) = \beta_0 + \beta_1 * c_{ij} + \beta_2 * y_i$

Instead, to restrict the predicted number of trips to be non-negative, we may assume that the probability of a given count (for example, of taking two trips) follows a Poisson distribution. This implies that the expected number of trips can be given as:

$E(x_{ij}) = exp( \beta_0 + \beta_1 * c_{ij} + \beta_2 * y_i )$
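To make the difference between the two specifications concrete, here is a minimal sketch on simulated data. The variable names (`trips`, `cost`, `income`) and the parameter values used to generate the data are assumptions for illustration only and are not taken from any real survey.

```r
# Simulated (hypothetical) travel cost data
set.seed(123)
n <- 500
cost <- runif(n, 5, 150)        # hypothetical travel cost
income <- rnorm(n, 4, 1)        # hypothetical income
trips <- rpois(n, exp(1.0 - 0.02 * cost + 0.1 * income))
sim <- data.frame(trips, cost, income)

# Linear specification, estimated by ordinary least squares
fit_ols <- lm(trips ~ cost + income, data = sim)

# Poisson specification with the default log link
fit_pois <- glm(trips ~ cost + income, family = poisson, data = sim)

# The linear model can predict a negative number of trips at high costs;
# the Poisson model cannot, because its predictions are exp(linear index)
range(fitted(fit_ols))
range(fitted(fit_pois))
```

With the log link, the fitted values from `fit_pois` are always strictly positive, which is the restriction described above.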

Once we estimate these models, we can estimate mean consumer surplus by non-linear combination of parameters:

$CS = - \frac{x}{\beta_1}$ ,

where x is the mean number of trips and $\beta_1$ is the travel cost coefficient. Consumer surplus per unit of good (i.e. per trip) is given by:

$CS = - \frac{1}{\beta_1}$ .
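For the Poisson specification, the fractional expression can be traced back to the integral under the demand curve. The following is a sketch of that step, assuming $\beta_1 < 0$ is the travel cost coefficient, income is held fixed at $y_i$, and the integration starts at the travel cost $c_{ij}$ the individual currently faces:

$\int_{c_{ij}}^{\infty} exp( \beta_0 + \beta_1 * c + \beta_2 * y_i ) \, dc = \left[ \frac{exp( \beta_0 + \beta_1 * c + \beta_2 * y_i )}{\beta_1} \right]_{c_{ij}}^{\infty} = - \frac{E(x_{ij})}{\beta_1}$

The upper limit vanishes because $\beta_1 < 0$. Under these assumptions, the quantity in the numerator is the expected number of trips evaluated at the lower limit of integration.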

Several studies, such as Czajkowski et al. (2019) and Kipperberg et al. (2019), use this expression to calculate consumer surplus. Czajkowski et al. (2019) find consumer surplus estimates of around 160 to 390 PLN for a visit to a stork breeding colony in Poland. Kipperberg et al. (2019) find consumer surplus estimates between 70 and 150 NOK for a hiking trip in Norway.

My aim in this post is to check whether using the equation above yields the same consumer surplus as doing what microeconomic theory tells us, that is, integrating under the demand curve. In principle, the two approaches should yield the same consumer surplus estimate. It is also useful to have R code to calculate consumer surplus, especially if we are estimating non-standard count data models.

Let us use the data on boating trips from Cameron and Trivedi (2013)’s econometrics textbook. The data include a TRIPS variable, which is the number of boating trips to Lake Somerville, East Texas, in 1980, and other variables, such as C3, which denotes the travel cost to Lake Somerville. The average number of trips is 2.24. This data is publicly available here. For an explanation of the variables in the data set, check this link. If you want to use the same data, you should import it into R after downloading it.

To estimate the demand for boating trips, calculate consumer surplus and plot the demand curve, we need the ggplot2 package, which I have already installed and load with the library function:

library(ggplot2)

I estimate a simple count data (Poisson model) by using the glm function, as I explained in this previous post:

model_p <- glm(TRIPS ~ C3 + SO + SKI + I + FC3 , family="poisson", data=data_boating)
> summary(model_p)

Call:
glm(formula = TRIPS ~ C3 + SO + SKI + I + FC3, family = "poisson",
data = data_boating)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-7.9504  -1.2343  -0.9828  -0.4961  19.9452

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.586097   0.091906   6.377 1.80e-10 ***
C3          -0.015315   0.001014 -15.098  < 2e-16 ***
SO           0.540831   0.015942  33.924  < 2e-16 ***
SKI          0.454188   0.056463   8.044 8.69e-16 ***
I           -0.157829   0.019502  -8.093 5.82e-16 ***
FC3          1.101518   0.079901  13.786  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

Null deviance: 4849.7  on 658  degrees of freedom
Residual deviance: 2687.5  on 653  degrees of freedom
AIC: 3452.6

Number of Fisher Scoring iterations: 7

Fortunately, the coefficient associated with travel cost (-0.015315) is negative and statistically significant. Mean consumer surplus is:

$E(CS) = - \frac{2.24}{-0.015} = 149.33$

Consumer surplus per trip is:

$CS = - \frac{1}{-0.015} = 66.67$
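The two figures above use the travel cost coefficient rounded to -0.015. As a quick check on how much the rounding matters, the sketch below repeats the calculation with the unrounded estimate from the regression output (-0.015315) and the reported mean of 2.24 trips. The object names are my own.

```r
# Values copied from the regression output and the text above
beta_c3 <- -0.015315   # coefficient on C3 (travel cost)
mean_trips <- 2.24     # average number of trips in the sample

# Fractional formulas for consumer surplus
cs_mean <- -mean_trips / beta_c3   # mean consumer surplus
cs_per_trip <- -1 / beta_c3        # consumer surplus per trip

round(c(cs_mean = cs_mean, cs_per_trip = cs_per_trip), 2)
```

With the unrounded coefficient, mean consumer surplus is roughly 146.3 and consumer surplus per trip is roughly 65.3, so rounding the coefficient shifts both figures by about two percent.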

That was easy. Let us now do the complicated version: calculating consumer surplus as the integral of the demand function for boating trips.

To calculate consumer surplus this way, you need to do two things: 1) define the demand function for trips, and 2) integrate under the demand curve. To create the demand function, I use the coefficients estimated by the Poisson regression above (model_p$coefficients). I use the function command to define the expected number of trips (the y variable) as a function of travel cost (the x variable), holding the other covariates at their sample means:

fun.poisson <- function(x) exp(model_p$coefficients[1] +
  model_p$coefficients[2]*x +
  model_p$coefficients[3]*mean(data_boating$SO) +
  model_p$coefficients[4]*mean(data_boating$SKI) +
  model_p$coefficients[5]*mean(data_boating$I) +
  model_p$coefficients[6]*mean(data_boating$FC3))
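Writing out each coefficient by hand becomes tedious with other specifications. The sketch below builds the same kind of demand function from `predict()`, so it should carry over to fitted models that support `predict(..., type = "response")`. The helper name `cs_by_integration`, its arguments and the simulated data are hypothetical, and the helper assumes that all covariates are numeric and enter the formula untransformed.

```r
# Hypothetical helper: integrate predicted trips over travel cost,
# holding every other covariate at its sample mean
cs_by_integration <- function(model, data, cost_var, lower = 0) {
  covars <- all.vars(formula(model))[-1]           # right-hand-side variables
  base <- as.data.frame(lapply(data[covars], mean))
  demand <- function(cost) {
    newdata <- base[rep(1, length(cost)), , drop = FALSE]
    newdata[[cost_var]] <- cost
    as.numeric(predict(model, newdata = newdata, type = "response"))
  }
  integrate(demand, lower, Inf)$value
}

# Simulated (hypothetical) data to try the helper on
set.seed(42)
n <- 400
sim <- data.frame(cost = runif(n, 10, 150), income = rnorm(n, 4, 1))
sim$trips <- rpois(n, exp(1.2 - 0.02 * sim$cost + 0.1 * sim$income))
fit <- glm(trips ~ cost + income, family = poisson, data = sim)

# Numerical integral from a zero price upwards
cs_numeric <- cs_by_integration(fit, sim, cost_var = "cost", lower = 0)

# Closed form for comparison: predicted trips at zero price over -beta
b <- coef(fit)
x0 <- exp(b[["(Intercept)"]] + b[["income"]] * mean(sim$income))
cs_closed <- -x0 / b[["cost"]]

c(numeric = cs_numeric, closed_form = cs_closed)
```

For the boating model, a call along the lines of `cs_by_integration(model_p, data_boating, "C3", lower = 0)` should correspond to integrating `fun.poisson` from zero, assuming `data_boating` contains the variables used in the formula.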

Before calculating consumer surplus, I would like to plot the demand function. Here’s my R code using the ggplot2 package (note that I flipped the coordinates using coord_flip() to show a “standard” demand curve):

p <- ggplot(data = data.frame(x = 0), mapping = aes(x = x))
p + stat_function(fun = fun.poisson, size=1.5, mapping = aes(color = "Poisson Model"), show.legend = FALSE) +
theme_bw(base_size = 15) + coord_flip() + xlab("Price") + ylab("Quantity") +
scale_y_continuous(limits = c(0, 4), expand = c(0,0)) +
scale_x_continuous(limits = c(0, 200), expand = c(0,0)) +
geom_vline(xintercept = 59.928 , linetype="dashed", size=1) +
labs(title="Boating Trips") 

This yields the following demand function for boating trips:

The horizontal axis is the quantity, i.e. the number of boating trips, and the vertical axis is price, i.e. the travel cost.

Finally, to find consumer surplus per trip, I use the integrate function:

> integrate(fun.poisson, 59.928, Inf)
66.35559 with absolute error < 0.0078

The result (66.36) is almost identical to the consumer surplus per trip (66.67) that we estimated previously.

Mean consumer surplus is the integral of the demand function from zero price to infinity:

> integrate(fun.poisson, 0, Inf)
166.1362 with absolute error < 0.019

This value (166.14) is slightly higher than the consumer surplus estimated with the simple fractional form (149.33). I conclude that $CS = - \frac{x}{\beta_1}$, with $\beta_1$ the travel cost coefficient, is a good approximation of consumer surplus but may not always provide the most accurate estimates. It may be interesting to check whether consumer surplus estimates reported in previous studies lose accuracy by over-simplifying the consumer surplus expression.
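One way to look at the gap between the two approaches is to ask which number of trips each integral implies. For a demand curve of the form exp(a + b*x) with b < 0, the area from a given price to infinity equals the predicted number of trips at that price divided by -b. The sketch below backs out those implied quantities from the two integrals reported above, using the unrounded C3 coefficient. The object names are my own, and the reconstructed `demand` function is an assumption based on that functional form, not the fitted model itself.

```r
# Values copied from the output above
beta_c3 <- -0.015315     # coefficient on C3 (travel cost)
cs_from_zero <- 166.1362 # integrate(fun.poisson, 0, Inf)
cs_from_p0 <- 66.35559   # integrate(fun.poisson, 59.928, Inf)

# Predicted trips implied by each integral: area * (-beta)
implied_x_zero <- -beta_c3 * cs_from_zero
implied_x_p0 <- -beta_c3 * cs_from_p0
round(c(at_zero_price = implied_x_zero, at_59.928 = implied_x_p0), 2)

# Rebuild the demand curve from the implied quantity at a zero price
demand <- function(x) implied_x_zero * exp(beta_c3 * x)

# Both integrals should be reproduced up to rounding of the coefficient
cs_zero <- integrate(demand, 0, Inf)$value
cs_p0 <- integrate(demand, 59.928, Inf)$value
c(cs_zero = cs_zero, cs_p0 = cs_p0)
```

Under these assumptions, the integral from a zero price corresponds to about 2.54 predicted trips, compared with the sample mean of 2.24 used in the fractional formula, and the integral from 59.928 corresponds to about 1.02 trips. This suggests that much of the difference between 166.14 and 149.33 may come from the quantity plugged into the numerator, together with the rounding of the coefficient, rather than from the integration itself.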

References:

Cameron, A. C., & Trivedi, P. K. (2013). Regression analysis of count data (Vol. 53). Cambridge University Press.

Czajkowski, M., Giergiczny, M., Kronenberg, J., & Englin, J. (2019). The individual travel cost method with consumer-specific values of travel time savings. Environmental and Resource Economics, 74(3), 961-984.

Haab, T. C., & McConnell, K. E. (2002). Valuing environmental and natural resources: the econometrics of non-market valuation. Edward Elgar Publishing.

Hotelling, H. (1947). Letter to the National Park Service in Economics of Outdoor Recreation–The Prewitt Report.

Kipperberg, G., Onozaka, Y., Bui, L. T., Lohaugen, M., Refsdal, G., & Sæland, S. (2019). The impact of wind turbines on local recreation: Evidence from two travel cost method–contingent behavior studies. Journal of Outdoor Recreation and Tourism, 25, 66-75.