Repeated choices and data format

R has a very useful package, gmnl, for analyzing discrete choice data. While gmnl will be the focus of future posts, this post focuses on getting the data into a format that gmnl can analyze. More precisely, it shows how to expand repeated choices so that R can analyze the data properly.

The problem is that data does not always come in the format that a package is programmed to interpret, and this is true for many packages. When that happens, we have to reshape the data before we can run our models.

Sometimes we have information on repeated choices made by the same individual. In a way, this scenario is more informative: repeated choices by the same individual give us a panel of responses, from which we can make inferences about preference heterogeneity across respondents. If a respondent prefers one attribute over all others, this becomes obvious because (s)he always goes for the options with higher levels of that attribute.

As an example, we use the choice experiment design presented in [1]. Imagine a scenario with the following four attributes (cost, new jobs, catch reduction and litter increase) and only three alternatives:

Attribute         Alternative1   Alternative2   Alternative3
Cost              0              500            1000
New jobs          500            350            250
Catch reduction   -5 kg          -2 kg          0 kg
Litter increase   +50%           +25%           0%

In choice experiment data, a choice task with these three alternatives is usually stored as a single row: one column holds the choice itself (taking the values 1, 2 or 3), and the following columns hold the attribute levels of every alternative. This is called the wide format.
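As a minimal sketch of this layout in R, the single choice task above can be stored as a one-row data frame. The column names (Alt1.Att1 for the cost of Alternative1, and so on) follow the naming used in the tables of this post; storing catch reduction in kg and litter increase in percentage points as plain numbers is an assumption made for illustration.

```r
# Attribute levels: rows = attributes, columns = Alternative1..Alternative3.
# Assumption: catch reduction (kg) and litter increase (%) stored as plain numbers.
att_levels <- rbind(
  Att1 = c(0, 500, 1000),   # cost
  Att2 = c(500, 350, 250),  # new jobs
  Att3 = c(-5, -2, 0),      # catch reduction (kg)
  Att4 = c(50, 25, 0)       # litter increase (%)
)

# Flatten attribute by attribute: Alt1.Att1, Alt2.Att1, Alt3.Att1, Alt1.Att2, ...
values    <- as.vector(t(att_levels))
col_names <- paste0("Alt", rep(1:3, times = 4), ".",
                    rep(rownames(att_levels), each = 3))

# One choice task = one row: the chosen alternative, then all attribute levels
task <- data.frame(Choice = 1, t(setNames(values, col_names)))
print(task)
```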

But what if this exact choice task were repeated 10 times? For example, the same respondent is given extra background information each time and asked to choose an alternative again, 10 times in total.

In such a situation there are 10 choice occasions, where a choice occasion is one instance in which the individual is asked to choose among the alternatives. For example, the respondent might choose Alternative1 five times, Alternative2 four times and Alternative3 once. Ideally, the data should have ten rows, one for each choice occasion. In wide format it looks like this (Att1 to Att4 are cost, new jobs, catch reduction and litter increase, in that order):

Choice Alt1.Att1 Alt2.Att1 Alt3.Att1 Alt1.Att2 Alt2.Att2 Alt3.Att2 Alt1.Att3 Alt2.Att3 Alt3.Att3 Alt1.Att4 Alt2.Att4 Alt3.Att4
1 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
1 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
1 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
1 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
1 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
2 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
2 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
2 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
2 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
3 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%

This “wide” format is what we want as input for the models. However, the data might instead come in a more compact format, where a REP column gives the number of times the alternative in that row was selected:

Choice REP Alt1.Att1 Alt2.Att1 Alt3.Att1 Alt1.Att2 Alt2.Att2 Alt3.Att2 Alt1.Att3 Alt2.Att3 Alt3.Att3 Alt1.Att4 Alt2.Att4 Alt3.Att4
1 5 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
2 4 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%
3 1 0 500 1000 500 350 250 -5 -2 0 50% 25% 0%

While it looks more organized, this is not a format that R's model functions can interpret correctly, because each row stands for several choice occasions. The objective of this blog post is to show how to go from the second data format to the first.
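To make the code that follows reproducible, here is a minimal sketch of the compact format as an R data frame. The name RAW.DATA matches the code below; building it inline, and storing the attributes as numbers (catch reduction in kg, litter increase in percentage points), are assumptions for illustration.

```r
# Attribute levels of the (identical) choice task, in wide order:
# Alt1.Att1, Alt2.Att1, Alt3.Att1, Alt1.Att2, ..., Alt3.Att4
att <- c(0, 500, 1000,    # cost
         500, 350, 250,   # new jobs
         -5, -2, 0,       # catch reduction (kg)
         50, 25, 0)       # litter increase (%)
att_names <- paste0("Alt", rep(1:3, times = 4), ".Att", rep(1:4, each = 3))

# One row per chosen alternative; REP = number of times it was chosen.
# The 12 attribute values are recycled so every row gets the same levels.
RAW.DATA <- data.frame(
  Choice = c(1, 2, 3),
  REP    = c(5, 4, 1),
  matrix(att, nrow = 3, ncol = 12, byrow = TRUE,
         dimnames = list(NULL, att_names))
)
str(RAW.DATA)
```

With this layout the attribute columns are columns 3 to 14, which is what the indexing further down assumes.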

One way to do this is to use R's rep() function, which replicates the values of a vector a given number of times. The solution I found is summarized in [2]. Here, Choice is the vector of alternatives chosen by the individual and REP is the number of times each alternative was chosen, i.e. how many times each row has to be repeated in the wide format.

# Repeat each chosen alternative as many times as given in REP
DATA <- rep(RAW.DATA$Choice, times = RAW.DATA$REP)

If you run the R command above, you should obtain a DATA vector that has ten elements, which correctly represent the ten choice occasions.
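To see what rep() does when `times` is a vector, here is a small self-contained check using the Choice and REP values from the compact table, typed in directly rather than read from RAW.DATA:

```r
choice <- c(1, 2, 3)  # chosen alternative in each row of the compact data
reps   <- c(5, 4, 1)  # REP: number of times each alternative was chosen

# With a vector 'times' of the same length, rep() repeats choice[i] reps[i] times
DATA <- rep(choice, times = reps)
print(DATA)         # 1 1 1 1 1 2 2 2 2 3
print(table(DATA))  # counts per alternative: 5, 4, 1
```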

If you have to expand several sets of repeated decisions, it can be useful to wrap this in a function. I am going to call this function “expandFUNC”.

# Repeat each element of x the number of times given by the matching element of n
expandFUNC <- function(x, n) rep(x, times = n)

This function has two arguments, x and n, which are passed on to rep(). Every time I call the function with these two arguments, rep() is applied to them. Functions like this are especially useful when their body gets long, because the code can then be “called” with a short expression: “expandFUNC”.

DATA <- expandFUNC(RAW.DATA$Choice,RAW.DATA$REP)

This should result in the exact same vector as before.
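One caveat: rep() silently truncates fractional counts, and a single count is read as "repeat the whole vector" rather than raising an error. A slightly more defensive variant is sketched below, assuming counts should be non-negative whole numbers, one per element of x; the name expandChecked is hypothetical and not part of the original post.

```r
# Hypothetical helper: same result as expandFUNC, but fails early on bad input
expandChecked <- function(x, n) {
  stopifnot(length(x) == length(n),           # exactly one count per value
            all(n >= 0), all(n == round(n)))  # counts are whole numbers >= 0
  rep(x, times = n)
}

expandChecked(c(1, 2, 3), c(5, 4, 1))  # 1 1 1 1 1 2 2 2 2 3
```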

Now I just need to join the DATA vector with the data frame describing the attributes to get the wide format:

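# Repeat the attribute columns (3 to 14) of the first row once per choice occasion;
# this works here because every row of RAW.DATA has the same attribute levels
EXPANDED.DATA <- RAW.DATA[rep(1, length(DATA)), 3:14]

# Put the expanded choice vector in front and reset the row names
EXPANDED.DATA <- data.frame(Choice = DATA, EXPANDED.DATA, row.names = NULL)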
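Copying the first row works here only because every row of RAW.DATA has the same attribute levels. A more general sketch, closer to what Stata's expand command does, repeats whole rows by index, so Choice and the attributes of each row always move together. RAW.DATA is rebuilt inline so the block runs on its own, with the attributes assumed numeric as in the tables above.

```r
# Compact data as in the table above: Choice, REP, then 12 attribute columns
att <- c(0, 500, 1000, 500, 350, 250, -5, -2, 0, 50, 25, 0)
att_names <- paste0("Alt", rep(1:3, times = 4), ".Att", rep(1:4, each = 3))
RAW.DATA <- data.frame(
  Choice = c(1, 2, 3),
  REP    = c(5, 4, 1),
  matrix(att, nrow = 3, ncol = 12, byrow = TRUE,
         dimnames = list(NULL, att_names))
)

# Repeat row i of RAW.DATA REP[i] times and drop the REP column;
# this also works when attribute levels differ from row to row
idx <- rep(seq_len(nrow(RAW.DATA)), times = RAW.DATA$REP)
EXPANDED.DATA <- RAW.DATA[idx, names(RAW.DATA) != "REP"]
rownames(EXPANDED.DATA) <- NULL  # replace the 1, 1.1, 1.2, ... row names

dim(EXPANDED.DATA)  # 10 13
table(EXPANDED.DATA$Choice)
```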

And finally we have a data format that R likes! We can now analyze this repeated data with the gmnl package, which we will do in a future post.


References:

[1] Aanesen, M., Falk-Andersson, J., Vondolia, G. K., Borch, T., Navrud, S., & Tinch, D. (2018). Valuing coastal recreation and the visual intrusion from commercial activities in Arctic Norway. Ocean & Coastal Management, 153, 157-167.

[2] https://stackoverflow.com/questions/46298148/equivalent-of-statas-expand-in-r
