Creating an experimental design with the idefix package: the parameter values (1)

To the best of my knowledge, there are three packages for designing choice experiments in R: the package by Aizaki (2012), which produces full factorial designs, and the packages by Horne (2018) and by Traets et al. (2020), which produce optimal designs. Alternatives outside R include Ngene, the best-known software; the RDG, which is available to download and documented in van Cranenburgh & Collins (2019); and the user-written Stata command fsdesign. Since I have already covered Aizaki (2012)’s package, I will be exploring the most recent one: Traets et al. (2020)’s idefix package.

Aizaki (2012)’s package generates full factorial designs. As I showed previously, when the number of attributes and attribute levels increases, so does the number of choice sets necessary in a full factorial design. For a design with 6 attributes and several levels for each, the number of choice cards would have to be 72, which is unreasonably large. The convention nowadays is to use statistically optimal designs rather than full factorial designs.
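To make the growth concrete, here is a minimal sketch that counts the profiles (candidate alternatives) in a full factorial as the product of the level counts. The six-attribute level counts are hypothetical and chosen only for illustration; the count refers to profiles, not to the number of choice cards in any particular design.

```r
# Number of profiles in a full factorial = product of the level counts
n.profiles <- function(lvls) prod(lvls)

small <- c(3, 3, 6)            # three attributes
large <- c(3, 3, 6, 4, 4, 2)   # six attributes (hypothetical level counts)

n.small <- n.profiles(small)   # 54
n.large <- n.profiles(large)   # 1728

# Number of distinct pairs of profiles that could form a two-alternative card
n.pairs <- choose(n.small, 2)  # 1431
```

Even with only three attributes, there are far more possible pairs than anyone would show to a respondent, which is why a selection rule is needed.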

Unlike full factorial designs, optimal designs can make use of prior information, namely the sign and magnitude of the coefficients of interest (Traets et al., 2020). For example, I expect the coefficient on the cost attribute to be negative, so I can inform the experimental design of that expectation. From reading the literature, I can also predict the value of certain parameters. If I had no information about the coefficients of interest, I might want to run a pilot study first, for which I can set the parameter values to zero in the experimental design.

The package idefix is a recently developed package specializing in experimental design for choice experiments. I will go through the steps in idefix to set up an experimental design, focusing on providing the necessary information, exploring the importance of choosing good initial parameters, and explaining some changes I made to the code. Let us start with an experimental design similar to last time's. It has three attributes: number of employed fishermen, number of vessels, and cost per household per year. Each attribute has several levels.

Attribute                       Levels
Number of employed fishermen    300, 600, 900
Number of vessels               207, 600, 800
Cost per household per year     0, 5, 10, 20, 35, 60

I start by defining how many choice sets the choice experiment will have (12):

library(idefix)
n.sets <- 12

I also define the attributes and number of attribute levels:

at.lvls <- c(3, 3, 6)

There are three elements in the at.lvls vector, one per attribute, and each element gives that attribute's number of levels: the first two attributes have three levels and the last has six. We also need to specify how each attribute is coded: effects-coded (“E”), dummy (“D”) or continuous (“C”). I have found that the code works poorly with continuous attributes but does a good job with dummy or effects-coded attributes. So, in this example, even though I have two continuous variables, I define their type in the c.type vector as dummy (since it is more intuitive than effects-coded):

c.type <- c("D","D", "D")

After all of this, we define the candidate set of alternatives to feature in the choice cards.

> cs <- Profiles(lvls = at.lvls, coding = c.type)  # c.lvls is only needed for continuous ("C") attributes
> head(cs)
  Var12 Var13 Var22 Var23 Var32 Var33 Var34 Var35 Var36
1     0     0     0     0     0     0     0     0     0
2     1     0     0     0     0     0     0     0     0
3     0     1     0     0     0     0     0     0     0
4     0     0     1     0     0     0     0     0     0
5     1     0     1     0     0     0     0     0     0
6     0     1     1     0     0     0     0     0     0

The cs object contains 54 rows: these are alternatives to include in a choice card. I show the first 6 lines above. For example, alternative 1 has all dummies coded to zero, meaning alternative 1 represents 300 employed fishermen, 207 vessels and 0 cost/payment.
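To read a dummy-coded row back as attribute levels, a small helper can help. The sketch below is my own illustration in base R, not a function from idefix; the names levels.list and decode.row are hypothetical, and it assumes the columns are ordered attribute by attribute with the first level as the baseline, as in the output above.

```r
# Attribute levels, in the same order as the columns of cs
levels.list <- list(fishermen = c(300, 600, 900),
                    vessels   = c(207, 600, 800),
                    cost      = c(0, 5, 10, 20, 35, 60))

# Translate one dummy-coded row into the attribute levels it represents
decode.row <- function(row, levels.list) {
  out <- numeric(length(levels.list))
  names(out) <- names(levels.list)
  pos <- 1
  for (a in seq_along(levels.list)) {
    k <- length(levels.list[[a]]) - 1            # dummies used by this attribute
    d <- row[pos:(pos + k - 1)]
    idx <- if (any(d == 1)) which(d == 1) + 1 else 1  # all zeros = baseline level
    out[a] <- levels.list[[a]][idx]
    pos <- pos + k
  }
  out
}

alt1 <- decode.row(c(0, 0, 0, 0, 0, 0, 0, 0, 0), levels.list)  # 300, 207, 0
alt5 <- decode.row(c(1, 0, 1, 0, 0, 0, 0, 0, 0), levels.list)  # 600, 600, 0
```

With the real candidate set, the same helper could be applied row by row, for example apply(cs, 1, decode.row, levels.list = levels.list).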

As a side note, one may be interested in deleting alternatives from the candidate set. For this, I suggest transforming the cs object into a data frame, deleting any rows that are not necessary (e.g. rows containing a combination that does not make sense), and then transforming the result back into a matrix. Here is an example, wherein I delete the first three rows:

cs1 <- data.frame(cs)
cs1 <- cs1[-(1:3), ]             # drop rows 1 to 3, keep the remaining 51
cs.reduced <- as.matrix(cs1)     # stored under a new name; the rest of this post uses the full cs
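Rows can also be removed by a logical rule rather than by position. The sketch below is only an illustration: cs.demo is a base-R stand-in for the candidate set (built so that the block runs without idefix), and the rule that 900 fishermen at zero cost is implausible is a hypothetical example, not a recommendation.

```r
# Stand-in for the candidate set; in practice, use the matrix returned by Profiles()
full <- expand.grid(a1 = factor(1:3), a2 = factor(1:3), a3 = factor(1:6))
cs.demo <- model.matrix(~ a1 + a2 + a3, data = full)[, -1]  # dummy coding, no intercept
colnames(cs.demo) <- c("Var12", "Var13", "Var22", "Var23",
                       "Var32", "Var33", "Var34", "Var35", "Var36")

# Hypothetical rule: the highest employment level (Var13 = 1) at zero cost
# (all cost dummies equal to 0) is treated as implausible
cost.cols <- c("Var32", "Var33", "Var34", "Var35", "Var36")
implausible <- cs.demo[, "Var13"] == 1 & rowSums(cs.demo[, cost.cols]) == 0

# Subsetting the matrix directly avoids the round trip through a data frame
cs.kept <- cs.demo[!implausible, , drop = FALSE]
nrow(cs.kept)  # 51
```

Writing the rule in terms of column names keeps the filter valid even if the row order of the candidate set changes.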

I now define the mean value for the parameters for each attribute and dummy. Because the cs object has 9 columns, I need to define nine parameters. The first one represents the increase in utility of going from 300 fishermen to 600; the second represents the increase in utility of going from 300 to 900. The next two correspond to the vessels attribute, and the last five correspond to the cost attribute. The means are stored in the object mu, and I also need to create the covariance matrix sigma, and draws from the multivariate prior distribution, which are stored in the matrix M.

mu <- c(0.5 * 300 * 0.01,   # fishermen: 300 -> 600
        0.5 * 600 * 0.01,   # fishermen: 300 -> 900
        3 * 300 * 0.01,     # vessels: second level
        3 * 500 * 0.01,     # vessels: third level
        -0.001 * 5,         # cost: 5
        -0.001 * 10,        # cost: 10
        -0.001 * 20,        # cost: 20
        -0.001 * 35,        # cost: 35
        -0.001 * 60)        # cost: 60
sigma <- diag(length(mu))                             # identity covariance matrix
M <- MASS::mvrnorm(n = 100, mu = mu, Sigma = sigma)   # 100 draws from the prior
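The count of nine parameters follows from the coding: with dummy coding, an attribute with k levels contributes k - 1 parameters. The sketch below checks that count and sets up the zero-mean prior mentioned earlier for a pilot study. The names n.par, mu.pilot, sigma.pilot and M.pilot are my own, and the identity covariance and the seed are assumptions made for illustration.

```r
at.lvls <- c(3, 3, 6)

# Dummy coding: each attribute contributes (number of levels - 1) parameters
n.par <- sum(at.lvls - 1)   # 9

# Pilot-study prior: no information on sign or magnitude, so all means are zero
mu.pilot <- rep(0, n.par)
sigma.pilot <- diag(n.par)  # identity covariance (assumption)

set.seed(123)               # arbitrary seed, only so the draws are reproducible
M.pilot <- MASS::mvrnorm(n = 100, mu = mu.pilot, Sigma = sigma.pilot)
dim(M.pilot)                # 100 rows (draws) by 9 columns (parameters)
```

Computing the length from at.lvls, rather than typing it, guards against a mismatch between the prior and the candidate set if the attribute levels change.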

Before I go on, I think it is important to point out that the values chosen at this step are very important. One can and should use values obtained from previous literature, but some combinations of these values might produce strange experimental designs.

For this reason, I would suggest taking a step back and checking whether one attribute seems to “dominate” the others. To do so, we can calculate the indirect utility of each alternative in the object cs, as follows:

cs1 <- data.frame(cs)
# Indirect utility of each alternative: the dummy-coded row times the prior means
cs1$utility <- as.vector(as.matrix(cs) %*% mu)
> head(cs1)
  Var12 Var13 Var22 Var23 Var32 Var33 Var34 Var35 Var36 utility
1     0     0     0     0     0     0     0     0     0     0.0
2     1     0     0     0     0     0     0     0     0     1.5
3     0     1     0     0     0     0     0     0     0     3.0
4     0     0     1     0     0     0     0     0     0     9.0
5     1     0     1     0     0     0     0     0     0    10.5
6     0     1     1     0     0     0     0     0     0    12.0

I can see from the output above that the indirect utility of the alternatives with a “1” for the higher number of vessels (600) is far higher than that of alternatives with the baseline number of vessels. These parameter values imply that alternatives with a higher number of vessels will likely dominate when they show up in a choice card. This is not necessarily my expectation from theory, so perhaps I can decrease the parameters associated with the second attribute (“vessels”) from “3*300 *0.01” to “0.3*300 *0.01” and, correspondingly, from “3*500 *0.01” to “0.3*500 *0.01”. This is the output in terms of utility (the first 10 lines):

> head(cs1, 10)
   Var12 Var13 Var22 Var23 Var32 Var33 Var34 Var35 Var36 utility
1      0     0     0     0     0     0     0     0     0   0.000
2      1     0     0     0     0     0     0     0     0   1.500
3      0     1     0     0     0     0     0     0     0   3.000
4      0     0     1     0     0     0     0     0     0   0.900
5      1     0     1     0     0     0     0     0     0   2.400
6      0     1     1     0     0     0     0     0     0   3.900
7      0     0     0     1     0     0     0     0     0   1.500
8      1     0     0     1     0     0     0     0     0   3.000
9      0     1     0     1     0     0     0     0     0   4.500
10     0     0     0     0     1     0     0     0     0  -0.005

This way, the utilities are much more balanced, and this set-up may yield more plausible choice sets. Balanced choice sets matter because a choice between alternatives of similar utility carries more statistical information, which translates into higher precision in the parameters of interest.
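One way to see what “dominate” means is to turn the utilities into choice probabilities. The sketch below assumes a multinomial logit model and a hypothetical two-alternative card: alternative A differs from the baseline only in having 600 vessels, and alternative B only in having 600 fishermen. The function name mnl.prob is my own.

```r
# Multinomial logit choice probabilities for the alternatives in one choice set
mnl.prob <- function(v) {
  e <- exp(v - max(v))   # subtracting the maximum avoids numerical overflow
  e / sum(e)
}

# Utilities of A (600 vessels) and B (600 fishermen) under both sets of means
v.original <- c(A = 3   * 300 * 0.01, B = 0.5 * 300 * 0.01)  # 9.0 vs 1.5
v.revised  <- c(A = 0.3 * 300 * 0.01, B = 0.5 * 300 * 0.01)  # 0.9 vs 1.5

p.original <- mnl.prob(v.original)
p.revised  <- mnl.prob(v.revised)

round(p.original, 3)  # A is chosen with a probability of about 0.999
round(p.revised, 3)   # A about 0.354, B about 0.646
```

Under the original means, almost every respondent would be predicted to pick A, so such a card would reveal very little; under the revised means, the predicted choice is a genuine trade-off.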

With more balanced alternatives, the next step is to take these alternatives and create choice sets. This will be pursued in future blog posts.

References:

Aizaki, H. (2012). Basic functions for supporting an implementation of choice experiments in R. Journal of Statistical Software, 50, 1-24.

Horne, J. (2018). choiceDes: Design Functions for Choice Studies. R package version 0.9-3. URL https://CRAN.R-project.org/package=choiceDes.

Traets, F., Sanchez, D. G., & Vandebroek, M. (2020). Generating optimal designs for discrete choice experiments in R: the idefix package. Journal of Statistical Software, 96, 1-41.

van Cranenburgh, S., & Collins, A. T. (2019). New software tools for creating stated choice experimental designs efficient for regret minimisation and utility maximisation decision rules. Journal of Choice Modelling, 31, 104-123.
