Combining RP & SP data: identifying the scale parameter

In this blog post I will be finishing my series of posts on the methods and challenges of combining revealed preference (RP) and stated preference (SP) data. The final aim is to replicate the findings of Swait and Louviere (1993). To fully understand the content of this post, I recommend reading Part 1 and Part 2.

In Part 2 of combining RP and SP data, I combined two datasets intended to represent an RP and an SP dataset. They are generated from the same underlying preferences (i.e. the parameters on the explanatory variables are the same). The only difference is that the scale of the errors in the RP dataset is lower than in the SP dataset. This is meant to represent, for example, the hypothetical bias of SP choices.

In this blog post, I will estimate the model on the two datasets jointly, but this time accounting for and identifying the scale parameter.

As in the previous post, I start by generating two samples that differ only in the scale of their Gumbel errors:

library(evd)  # provides rgumbel() for drawing Gumbel-distributed errors

# First dataset (the "RP" data): Gumbel errors with scale 1.
# No seed is set, so the exact numbers below will vary between runs.

error1.A <- rgumbel(500, loc = 0, scale = 1)
error1.B <- rgumbel(500, loc = 0, scale = 1)
error1.C <- rgumbel(500, loc = 0, scale = 1)

# Explanatory variables of alternatives A, B and C
# (reused below for the second dataset)
XA <- runif(500, min = 0, max = 100)
XB <- runif(500, min = 0, max = 100)
XC <- runif(500, min = 0, max = 100)

YA <- runif(500, min = 0, max = 1000)
YB <- runif(500, min = 0, max = 1000)
YC <- runif(500, min = 0, max = 1000)

# True parameters: two alternative-specific constants (gamma1.1, gamma1.2)
# and the coefficients on X and Y (gamma1.3, gamma1.4)
gamma1.1 <- 1.0
gamma1.2 <- 0.3
gamma1.3 <- -0.03
gamma1.4 <- 0.005

Utility_1.A <- gamma1.1 + gamma1.3*XA + gamma1.4*YA + error1.A
Utility_1.B <- gamma1.2 + gamma1.3*XB + gamma1.4*YB + error1.B
Utility_1.C <-            gamma1.3*XC + gamma1.4*YC + error1.C

# Each individual chooses the alternative with the highest utility
choice1 <- apply(cbind(Utility_1.A, Utility_1.B, Utility_1.C), 1, which.max)

# Second dataset (the "SP" data): same preferences, but Gumbel errors
# with scale 2, i.e. twice as noisy

error2.A <- rgumbel(500, loc = 0, scale = 2)
error2.B <- rgumbel(500, loc = 0, scale = 2)
error2.C <- rgumbel(500, loc = 0, scale = 2)

# The (gamma) parameters are identical to those of the first dataset
gamma2.1 <- 1.0
gamma2.2 <- 0.3
gamma2.3 <- -0.03
gamma2.4 <- 0.005

Utility_2.D <- gamma2.1 + gamma2.3*XA + gamma2.4*YA + error2.A
Utility_2.E <- gamma2.2 + gamma2.3*XB + gamma2.4*YB + error2.B
Utility_2.F <-            gamma2.3*XC + gamma2.4*YC + error2.C

choice2 <- apply(cbind(Utility_2.D, Utility_2.E, Utility_2.F), 1, which.max)
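
As a quick sanity check (an addition of mine, not part of the original exercise), it is worth tabulating the simulated choices. The noisier second dataset should show choice shares that are somewhat flatter across the three alternatives, while keeping roughly the same ranking:

# Choice frequencies in each dataset; the larger error scale in
# dataset 2 should pull its shares closer to one third each
table(choice1)
table(choice2)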

In order to combine the two choice vectors, I first shift the values of the second one, choice2:

choice2 <- choice2+3

Then, choice1 takes the values 1, 2 and 3, and choice2 takes the values 4, 5, and 6. To merge these two vectors, I simply write:

choice1.2 <- c(choice1, choice2)

There is one more variable I need to create: DataID, an indicator variable that takes the value 0 if the observation comes from the first (RP) dataset and 1 if it comes from the second (SP) dataset.

DataID <- c(rep(0, 500), rep(1, 500))

Similarly to Part 2, I wrote my own log likelihood function. It is a simple conditional logit model, almost identical to Part 2's.

I want to find the five parameters that maximize the likelihood: the four gammas (gamma1 through gamma4) and a scale parameter. However, the optimizer has difficulty with the scale parameter because, unlike the gammas, it does not enter the utility functions linearly: it multiplies every term of the SP utilities. So instead of estimating it jointly, I fix the scale outside the log likelihood function and search over its values.
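
Before defining the function, it helps to see why only a relative scale is identified (a short derivation I am adding here; the notation is mine). In a logit model with Gumbel errors of scale $\sigma$, the choice probabilities depend only on the systematic utilities divided by $\sigma$. Normalizing the RP error scale to one, the SP choice probabilities become

$$P_{SP}(i) = \frac{\exp(\mu V_i)}{\sum_j \exp(\mu V_j)}, \qquad \mu = \frac{\sigma_{RP}}{\sigma_{SP}},$$

so only the ratio $\mu$ of the two error scales can be recovered, and it is exactly this $\mu$ that the SCALE parameter below represents.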

The initial values of the parameters are set as:

parameters <- c(1.0, 0.3, -0.03, 0.005)

The data vectors used by the log likelihood function (the explanatory variables X and Y, and the dependent choice variable CHOICE) are:

X_vectorA <- XA
X_vectorB <- XB
X_vectorC <- XC
Y_vectorA <- YA
Y_vectorB <- YB
Y_vectorC <- YC
CHOICE <- choice1.2

And the log likelihood function, called MNL_loglik2, is defined as:

MNL_loglik2 <- function(parameters){

  # Systematic utilities of the first (RP) dataset
  Utility_A <- parameters[1] + parameters[3]*X_vectorA + parameters[4]*Y_vectorA
  Utility_B <- parameters[2] + parameters[3]*X_vectorB + parameters[4]*Y_vectorB
  Utility_C <-                 parameters[3]*X_vectorC + parameters[4]*Y_vectorC

  # Systematic utilities of the second (SP) dataset: the same parameters,
  # multiplied by the (fixed) SCALE parameter
  Utility_D <- SCALE*(parameters[1] + parameters[3]*X_vectorA + parameters[4]*Y_vectorA)
  Utility_E <- SCALE*(parameters[2] + parameters[3]*X_vectorB + parameters[4]*Y_vectorB)
  Utility_F <- SCALE*(parameters[3]*X_vectorC + parameters[4]*Y_vectorC)

  # Utility of the chosen alternative for the RP observations
  choice_rp <- Utility_A*ifelse(CHOICE == 1, 1, 0) +
               Utility_B*ifelse(CHOICE == 2, 1, 0) +
               Utility_C*ifelse(CHOICE == 3, 1, 0)

  # Logit choice probabilities, zeroed out for the SP observations
  Prob_rp <- (1 - DataID)*exp(choice_rp)/
             (exp(Utility_A) + exp(Utility_B) + exp(Utility_C))

  # Utility of the chosen alternative for the SP observations
  choice_sp <- Utility_D*ifelse(CHOICE == 4, 1, 0) +
               Utility_E*ifelse(CHOICE == 5, 1, 0) +
               Utility_F*ifelse(CHOICE == 6, 1, 0)

  # Logit choice probabilities, zeroed out for the RP observations
  Prob_sp <- DataID*exp(choice_sp)/
             (exp(Utility_D) + exp(Utility_E) + exp(Utility_F))

  # Joint log likelihood: the first 500 rows are RP, the last 500 are SP
  loglik <- sum(log(Prob_rp[1:500])) + sum(log(Prob_sp[501:1000]))

  return(loglik)
}

The first difference between the likelihood function above and a traditional conditional logit model is that Utility_D, Utility_E and Utility_F are multiplied by a SCALE parameter, which I have not yet defined. The second difference is that the DataID indicator variable tells the function whether an observation comes from the first or the second dataset.
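
In equation form (my notation: $\mu$ is SCALE, $V_{jn}$ is the systematic utility of alternative $j$ for observation $n$, and $i_n$ is the chosen alternative), the function computes

$$\log L(\gamma,\mu)=\sum_{n=1}^{500}\log\frac{e^{V_{i_n n}}}{\sum_{j}e^{V_{jn}}}\;+\;\sum_{n=501}^{1000}\log\frac{e^{\mu V_{i_n n}}}{\sum_{j}e^{\mu V_{jn}}}.$$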

To find the parameters that maximize the log likelihood, I need an optimizer, so I use maxLik:

library(maxLik)

Now, I just need to define a value for the SCALE parameter for the function to run. I start with SCALE equal to 1, which amounts to assuming the two datasets share the same error scale:

SCALE <- 1
MNL <- maxLik(MNL_loglik2, start = c(1.0, 0.3, -0.03, 0.005))

I can now see the estimated parameters and the log likelihood value:

> summary(MNL)
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 4 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -743.55 
4 free parameters
Estimates:
Estimate Std. error t value Pr(> t) 
[1,] 0.790555 0.097949 8.071 6.97e-16 ***
[2,] 0.277164 0.104013 2.665 0.00771 ** 
[3,] -0.021610 0.001824 -11.844 < 2e-16 ***
[4,] 0.003956 0.000209 18.932 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------
> MNL$maximum
[1] -743.55

Notice that these pooled estimates are attenuated relative to the true parameters, which is what one would expect when the noisier SP observations are treated as if they had the same scale as the RP ones. So this is not yet the end of the story: we want the highest possible maximized likelihood, and changing the SCALE parameter may increase the log likelihood value. If SCALE is 1.2, then:

> SCALE <- 1.2
> MNL <- maxLik(MNL_loglik2, start = c(1.0, 0.3, -0.03, 0.005))
> MNL$maximum
[1] -755.3256

And if the scale is 0.8, then:

> SCALE <- 0.8
> MNL <- maxLik(MNL_loglik2, start = c(1.0, 0.3, -0.03, 0.005))
> MNL$maximum
[1] -733.3768

Because I am lazy, I wrote a loop that tries 100 different values for the scale and re-runs the estimation for each of them:

LOGLIK <- 0
for (i in 1:100){
   SCALE <- 0.02*i     # scale values 0.02, 0.04, ..., 2
   MNL <- maxLik(MNL_loglik2, start = c(1.0, 0.3, -0.03, 0.005))
   LOGLIK[i] <- MNL$maximum     # store the maximized log likelihood
}

LOGLIK is now a vector with 100 maximized log likelihood values, one for each assumed SCALE parameter.
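
As an aside (my own refinement, not part of the original grid search): each fixed SCALE defines a concentrated log likelihood, so instead of a grid one could let base R's optimize() search over the scale directly. A minimal sketch, reusing MNL_loglik2 and maxLik from above:

# Concentrated log likelihood as a function of the scale alone
concentrated_loglik <- function(scale_value) {
  SCALE <<- scale_value  # MNL_loglik2 reads SCALE from the global environment
  maxLik(MNL_loglik2, start = c(1.0, 0.3, -0.03, 0.005))$maximum
}

# One-dimensional search for the likelihood-maximizing scale
optimize(concentrated_loglik, interval = c(0.1, 2), maximum = TRUE)

I will stick to the grid here so the whole likelihood profile can be plotted.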

I also want to find which scale generates the maximum log likelihood value. For that, I create a vector SCALE holding the same 100 scale values as in the loop above:

SCALE <- seq(0.02, 2, by = 0.02)

I can plot the log likelihood values against the scale parameters by typing:

plot(SCALE, LOGLIK, 
     xlab="Scale parameter", ylab="Log Likelihood Value", pch=19)

The command above generates the following graph:

[Plot: log likelihood value against the scale parameter, showing a single peak]

As expected, there is a single peak of the log likelihood values. The maximum log likelihood is:

> max(LOGLIK)
[1] -727.3552

This happens when the scale is 0.56, as shown here:

> which(LOGLIK==max(LOGLIK))
[1] 28
> SCALE[28]
[1] 0.56

It turns out that this scale parameter is roughly the inverse of the scale of the Gumbel errors in the second dataset:

> 1/0.56
[1] 1.785714

1.79 is reasonably close to 2, the true scale of the Gumbel errors in the second dataset, so we have recovered that scale up to simulation noise; equivalently, 0.56 is close to the theoretical relative scale of 1/2. For illustration, here are the estimated parameters when the scale parameter is set to 0.56:

> SCALE <- 0.56
> MNL <- maxLik(MNL_loglik2, start = c(1.0, 0.3, -0.03, 0.005))
> summary(MNL)
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 3 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -727.3552 
4 free parameters
Estimates:
Estimate Std. error t value Pr(> t) 
[1,] 1.1232568 0.1285006 8.741 < 2e-16 ***
[2,] 0.3518927 0.1359791 2.588 0.00966 ** 
[3,] -0.0295161 0.0024368 -12.113 < 2e-16 ***
[4,] 0.0053894 0.0002889 18.655 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
--------------------------------------------

The main message is that if the two datasets are generated from the same (gamma) parameters and we account for the difference in scale, they can be successfully combined: with SCALE fixed at 0.56, the estimates above are close to the true values of 1.0, 0.3, -0.03 and 0.005.

References: 

Swait, J., & Louviere, J. (1993). The role of the scale parameter in the estimation and comparison of multinomial logit models. Journal of Marketing Research, 30(3), 305-314.
