Estimating an MDCEV model: the rmdcev package (1)

It has been quite a while since I wrote something on this blog (my aim is to write every two weeks). I have to admit I had a rough week last week, so it was challenging to prepare something good enough to share. This week I want to introduce a new package for those particularly interested in modelling discrete and/or continuous variables: the rmdcev package. This package was developed by Patrick Lloyd-Smith, whom I have had the opportunity to meet at various conferences. I will first explain how to prepare the data for an MDCEV model, and then follow up with modelling examples in future blog posts.

Any type of consumer behavior data has two dimensions. These are the choice of a specific product/location/brand/etc, as well as how many units of a product/location/brand/etc to purchase. Depending on the research focus, the respective models to analyze consumer choices are discrete choice models or count data models.

Discrete choice models focus on the choice of one or more specific goods among a large set of alternative goods. Alternatively, one can focus on explaining how often a specific good, or a bundle/category of goods, is chosen, which one can do by using count data models. These are the most popular approaches to explain consumer behavior.

A natural extension of the modelling effort is to jointly explain choice of good as well as number of purchases. Some models can capture this duality of consumer choice, such as repeated discrete choice models, linked seasonal demand models or Kuhn-Tucker models (Parsons, 2017; Parsons et al., 2009).

Kuhn-Tucker (KT) models are a relatively recent solution that combines the number of purchases and the choice of good (Kuriyama et al., 2010). Not purchasing a product represents a corner solution, and purchasing one or more units represents an interior solution (Kuriyama et al., 2010). KT models are derived from a theoretically consistent demand system model (von Haefen et al., 2004).

The KT approach starts by specifying a random utility function, just like site choice models, adding the number of purchases as part of the utility function and allowing for diminishing marginal utility (Phaneuf and Siderelis, 2003; von Haefen et al., 2004). The solution to the utility maximization problem results in the Kuhn-Tucker conditions, from which the method takes its name (von Haefen et al., 2004). Given a change in the price or quality of the goods, an individual can adjust their behavior, both in the number of purchases and in which goods to purchase (Phaneuf and Siderelis, 2003).
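To make the corner/interior distinction concrete, here is a generic sketch of the problem. The notation is my own assumption rather than taken from the cited papers: x_j is the quantity of good j, p_j its price, z a numeraire good with unit price that is always consumed, and y is income.

```latex
\max_{x_1,\dots,x_M,\,z} \; U(x_1,\dots,x_M,z)
\quad \text{s.t.} \quad \sum_{j=1}^{M} p_j x_j + z = y, \qquad x_j \ge 0 .

% Kuhn-Tucker conditions for each good j = 1,...,M (assuming z > 0):
\frac{\partial U}{\partial x_j} - p_j \frac{\partial U}{\partial z} \le 0, \qquad
x_j \ge 0, \qquad
x_j \left( \frac{\partial U}{\partial x_j} - p_j \frac{\partial U}{\partial z} \right) = 0 .
```

When x_j = 0 the inequality can be strict (a corner solution); when x_j > 0 it must hold with equality (an interior solution).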

However, applications of the KT approach are scarce because the model is computationally burdensome. Running a KT model requires a system of 2^M comparisons between the indirect utility function of pairs of alternatives, where M represents the size of the choice set (i.e. the number of goods) (von Haefen et al., 2004). If the size of the choice set is large, solving the KT model is computationally difficult (Kuriyama et al., 2010; von Haefen et al., 2004). von Haefen et al. (2004) and Kuriyama et al. (2010) provide alternative algorithm iterations to run KT models more efficiently.

The good news is that the rmdcev package (and other packages) closes the gap between the theoretical potential of KT models and the difficulty of applying them in practice. In this blog post, I will go through the data preparation needed to use the rmdcev package.

Different frameworks have been developed to apply KT models. To date, the most popular of these is the multiple discrete-continuous extreme value (MDCEV) model, which I will focus on. This model was first introduced by Bhat (2008). Similarly to discrete choice models, the MDCEV can account for preference heterogeneity, a desirable property for discrete choice modellers.

Prior to installing the rmdcev package, you might have to install the rstan package as well. This webpage provides a guide as to how to install it.

Using the usual command to install packages (i.e. install.packages) did not work for me, so I used the alternative command to install the rmdcev package, as shared on the rmdcev GitHub webpage:

if (!require(devtools)) {
  install.packages("devtools")
  library(devtools)
}
install_github("plloydsmith/rmdcev", dependencies = TRUE, INSTALL_opts = "--no-multiarch")

Note: in order to successfully install the rmdcev package, I had to update R and RStudio. I highly advise doing the same.
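Before moving on, it can help to confirm which of the required packages R can actually find. The following is a minimal sketch using only base R; the list of package names is my assumption about what this post needs (rstan, rmdcev and reshape2).

```r
# Packages used in this post (assumed list)
pkgs <- c("rstan", "rmdcev", "reshape2")

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise,
# without attaching it to the search path
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Named logical vector: one entry per package
print(installed)

# Names of any packages that still need to be installed
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)
```

Any package listed in missing_pkgs still needs to be installed before the rest of the code will run.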

Once installed, I need to get the data in the right format. This can be quite the time-consuming task, so I will dedicate the rest of this post to illustrating how to prepare the data.

As documented in Lloyd-Smith (2020), the data should be in long format. In other words, each row corresponds to a particular individual and a particular alternative. The variable of interest is a “choice” variable, which records the number of times that person chose that particular alternative.

This already complicates things for me. My data is not in long format but in “wide” format: each row represents one individual, and my 12 alternatives are in 12 columns. I have 1354 rows, and what I need is a data frame with 1354 (individuals) * 12 (alternatives) = 16248 rows.

What I have looks like this:

To transform this data, I found the melt function within the reshape2 package that can do the trick. I installed the reshape2 package and then used the following code to obtain a dataframe in long format:

library(reshape2)
data_long <- melt(data, id.vars = c(1:7, 14, 21:64))

where id.vars are the variables that you want to keep “intact”. The melt function will assume all other variables are to be “melted”. The new dataframe looks like this:

So, for every individual, I can see how many purchases they made of each option (which I have labelled 1 to 12).
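For readers who want to try the reshaping step without my data, here is a minimal sketch on a toy data frame. It assumes the reshape2 package is installed; the column names (alt1 to alt3, alt) and all values are hypothetical, and only RESPID and INCOME mirror names from my own data.

```r
library(reshape2)

# Hypothetical wide data: one row per respondent, one column per alternative
wide <- data.frame(
  RESPID = c(101, 102, 103),
  INCOME = c(50000, 62000, 41000),
  alt1 = c(2, 0, 1),
  alt2 = c(0, 3, 0),
  alt3 = c(1, 1, 4)
)

# Keep RESPID and INCOME intact; every other column is melted into
# a "variable" column (alternative name) and a "value" column (purchases)
long <- melt(wide, id.vars = c("RESPID", "INCOME"))

# Relabel alternatives as integers 1..J (order follows the factor levels)
long$alt <- as.integer(long$variable)

# Sanity checks: one row per respondent-alternative pair
n_alts <- nlevels(long$variable)
stopifnot(nrow(long) == nrow(wide) * n_alts)
stopifnot(all(table(long$RESPID) == n_alts))

# View the result grouped by respondent
long[order(long$RESPID, long$alt), ]
```

Naming the id columns, rather than giving their positions, makes the call a little more robust if the column order changes later.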

Finally, before analyzing the data, I have to convert my dataframe into an mdcev.data object. Note: it turns out that to create an mdcev.data object, you need to at least create a price variable and indicate the income variable. Additional explanations can be found here.
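How the price variable is built depends entirely on the application. As a minimal sketch in base R, assuming each alternative has a single price shared by all respondents (the alt_prices vector, the price column name and all values are hypothetical):

```r
# Toy long-format data (hypothetical values), alternatives labelled 1 to 3
data_long <- data.frame(
  RESPID = rep(1:2, each = 3),
  variable = rep(1:3, times = 2),
  value = c(2, 0, 1, 0, 3, 1),
  INCOME = rep(c(50000, 62000), each = 3)
)

# Hypothetical price of each alternative, indexed by alternative label
alt_prices <- c(10, 25, 40)

# Look up the price of the alternative on each row
data_long$price <- alt_prices[data_long$variable]

# Total expenditure per respondent, to compare against income
spend <- tapply(data_long$price * data_long$value, data_long$RESPID, sum)
income <- tapply(data_long$INCOME, data_long$RESPID, max)
all(spend <= income)
```

Comparing total expenditure with income is a cheap check to run before handing the data over, since the model is built around a budget constraint.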

I finally run the code and obtain the following message:

> data_mdcev <- mdcev.data(data_long,
+                          id.var = "RESPID",
+                          alt.var = "variable",
+                          choice = "value",
+                          income = "INCOME")
Sorting data by id.var then alt...
Checking data...
Data is good

The resulting data_mdcev object is a dataframe with 11580 rows and 55 columns. Each row pertains to one individual and one specific alternative.

I am now ready to start estimating MDCEV models. I will pursue this endeavour in my next blog post.

References:

https://cran.r-project.org/web/packages/rmdcev/index.html

Bhat, C.R. (2008) “The Multiple Discrete-Continuous Extreme Value (MDCEV) Model: Role of Utility Function Parameters, Identification Considerations, and Model Extensions” Transportation Research Part B, 42(3): 274-303.

Kuriyama, K., Hanemann, W. M., & Hilger, J. R. (2010). A latent segmentation approach to a Kuhn–Tucker model: An application to recreation demand. Journal of Environmental Economics and Management, 60(3), 209-220.

Lloyd-Smith, P. (2020). Kuhn-Tucker and Multiple Discrete-Continuous Extreme Value Model Estimation and Simulation in R: The rmdcev Package. The R Journal.

Parsons, G. R. (2017). Travel Cost Models. In: Champ P., Boyle K., Brown T. (eds) A Primer on Nonmarket Valuation. The Economics of Non-Market Goods and Resources, vol 13. Springer, Dordrecht.

Parsons, G. R., Kang, A. K., Leggett, C. G., & Boyle, K. J. (2009). Valuing Beach Closures on the Padre Island National Seashore. Marine Resource Economics, 24(3), 213-235.

Phaneuf, D. J., & Siderelis, C. (2003). An application of the Kuhn-Tucker model to the demand for water trail trips in North Carolina. Marine Resource Economics, 18(1), 1-14.

von Haefen, R. H., Phaneuf, D. J., & Parsons, G. R. (2004). Estimation and welfare analysis with large demand systems. Journal of Business & Economic Statistics, 22(2), 194-205.