R has a very useful package, gmnl, for analyzing discrete choice data. While gmnl itself will be the focus of future posts, this post focuses on getting the data into a format that gmnl can analyze. More precisely, it shows how to expand repeated choices so that R can analyze the data properly.
The problem is that data does not always come in the format that software and packages are programmed to interpret, and this is true for many packages, not just gmnl. When that happens, we have to reshape the data before we can run our models.
Sometimes we have information on repeated choices by the same individual. Such a scenario is more informative: by observing repeated choices by the same individual, we obtain a panel of responses and can make inferences about preference heterogeneity across respondents. If a respondent prefers one attribute over all others, it will become obvious because (s)he always goes for the options with higher levels of that attribute.
As an example, we use the choice experiment design presented in Aanesen et al. (2018). Imagine a scenario with four attributes (cost, new jobs, catch reduction and litter increase) and only three alternatives:
| Attribute | Alternative 1 | Alternative 2 | Alternative 3 |
|---|---|---|---|
| Catch reduction | -5 kg | -2 kg | 0 kg |
| Litter increase | +50% | +25% | 0% |
In choice experiments, the data for a choice task with these three alternatives usually sits in a single row: the choice itself is in one column (taking the value 1, 2 or 3), and the attribute levels of all alternatives follow in the next columns. This is the wide format.
But what if this exact choice task was repeated 10 times? For example, the same respondent might be given extra background information each time and asked to choose an alternative again, 10 times in total.
In such a situation there are 10 choice occasions. By choice occasion I mean one instance in which the individual is asked to choose among the alternatives. For example, the respondent might choose Alternative1 five times, Alternative2 four times, and Alternative3 one time. Ideally, the data should have ten rows, one for each choice occasion. It should look like this, which is in wide format:
This “wide” format is exactly what R needs to run the models. However, the data might instead come in a different format, where a REP column records the number of times each alternative was selected. Basically, the data could look like this:
While it looks more organized, R cannot interpret the data in this format. The objective of this blog post is to show how to go from the second data format to the first.
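To make the compact format concrete, here is a small sketch of what such a data set could look like for one respondent. The name RAW.TOY and the values are invented for illustration; only the column names Choice and REP follow this post, and a real data set would also carry the attribute columns.

```r
# Hypothetical compact data for one respondent (values invented for illustration).
RAW.TOY <- data.frame(
  Choice = c(1, 2, 3),  # the alternative that was chosen
  REP    = c(5, 4, 1)   # number of choice occasions on which it was chosen
)

# The REP column should add up to the number of choice occasions.
sum(RAW.TOY$REP)
# [1] 10
```

Three rows here stand for ten choice occasions, which is why the data has to be expanded before estimation.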
One way to do this is to use the rep function in R, which replicates the elements of a vector a given number of times. The solution I found is summarized in . “Choice” is the vector of alternatives chosen by the individual, and REP is the number of times each alternative was chosen, i.e. how many times each row has to be repeated in the wide format.
DATA <- rep(RAW.DATA$Choice,RAW.DATA$REP)
If you run the R command above, you should obtain a DATA vector that has ten elements, which correctly represent the ten choice occasions.
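The reason this works is that rep behaves differently depending on whether its second argument is a single number or a vector. The sketch below uses two invented vectors, choice and times, in place of the Choice and REP columns.

```r
# Hypothetical stand-ins for RAW.DATA$Choice and RAW.DATA$REP.
choice <- c(1, 2, 3)
times  <- c(5, 4, 1)

# A single number repeats the whole vector.
whole <- rep(choice, 2)
whole
# [1] 1 2 3 1 2 3

# A vector of the same length repeats each element its own number of times.
expanded <- rep(choice, times)
expanded
# [1] 1 1 1 1 1 2 2 2 2 3

# Tabulating the result recovers the original counts.
table(expanded)
```

The second form is the one used above: each chosen alternative is repeated as many times as it was selected.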
If you happen to run into the problem that you have several repeated decisions, it might be useful to create a function. I am going to call this function “expandFUNC”.
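A minimal definition could look like the sketch below. It assumes the function does nothing more than pass its two arguments on to rep; a longer version might also check the inputs.

```r
# A possible definition of expandFUNC (a sketch, not necessarily the only way to write it).
# x: the vector to expand, e.g. the chosen alternatives
# n: how many times each element of x should be repeated
expandFUNC <- function(x, n) {
  rep(x, n)
}
```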
This function has two arguments: x and n. Both are used in the expression inside the function. Every time I call the function with these two arguments, the rep function is applied to them. Wrapping code in functions like this is very useful, especially when the code gets long, because it can then be “called” with a much shorter expression: “expandFUNC”.
DATA <- expandFUNC(RAW.DATA$Choice,RAW.DATA$REP)
This should result in the exact same vector as before.
Now I just need to join the DATA vector with the data frame describing the attributes to get the wide format:
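# unlist() turns the one-row data frame of attribute levels into a plain vector before repeating it
EXPANDED.DATA <- matrix(rep(unlist(RAW.DATA[1, 3:14]), 10), nrow = 10, byrow = TRUE)
EXPANDED.DATA <- data.frame(DATA, EXPANDED.DATA)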
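An alternative that avoids the hard-coded 10 and keeps the column names is to repeat the row indices instead of the values. This is a sketch on invented data: RAW.TOY, COST1 and COST2 are hypothetical names, and it assumes the attribute levels are stored on every row of the compact data.

```r
# Hypothetical compact data for one respondent; attribute columns are invented.
RAW.TOY <- data.frame(
  Choice = c(1, 2, 3),
  REP    = c(5, 4, 1),
  COST1  = c(100, 100, 100),
  COST2  = c(50, 50, 50)
)

# Repeat each row index REP times, then use the result to select rows.
row.index <- rep(seq_len(nrow(RAW.TOY)), RAW.TOY$REP)
EXPANDED.TOY <- RAW.TOY[row.index, ]

# Each row is now one choice occasion, so REP is no longer needed.
EXPANDED.TOY$REP <- NULL
rownames(EXPANDED.TOY) <- NULL

nrow(EXPANDED.TOY)
# [1] 10
```

Because the number of rows comes from the REP column, the same lines work for any number of choice occasions.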
And finally we have a data format that R likes! We can now analyze these repeated choices with the gmnl package, which we will do in a future post.
 Aanesen, M., Falk-Andersson, J., Vondolia, G. K., Borch, T., Navrud, S., & Tinch, D. (2018). Valuing coastal recreation and the visual intrusion from commercial activities in Arctic Norway. Ocean & Coastal Management, 153, 157-167.