Apollo package (1): Conditional Logit model

I want to talk about one of the most useful R packages for environmental economists doing discrete choice modelling: the Apollo package.

Back in 2019, I was struggling to find a way to model the (discrete) choice data I collected during my PhD. I wanted to implement the models we consider standard in choice modelling, such as the conditional logit, mixed logit and latent class models.

There are packages available to estimate conditional logit, mixed logit or latent class models, such as gmnl, mlogit or mclogit, each with its own advantages and disadvantages. However, I wanted to modify the log-likelihood function to account for differences in scale across datasets, and I could not find a package that allowed for preference heterogeneity and scale differences at the same time. I also could not find a straightforward way in these packages to account for choice availability, that is, whether a certain alternative was available when the respondent made a choice. I would have had to write my model from scratch to make these adjustments to the log-likelihood function. Overall, I felt that no single package could estimate all of the discrete choice models we have in our toolbox.

I was fortunate that the Apollo package became publicly available in early 2019 (Hess and Palma, 2019), saving me a lot of time in the process. Apollo is an R package for choice modelling developed by Stephane Hess and David Palma at the Choice Modelling Centre. It follows a very logical structure, and I can tailor my code to my needs. The Apollo package helps us integrate into our models a multitude of aspects relevant to the individual’s decision-making process.

There is a lot of useful documentation on the official website. It has a superb manual which explains how to implement all the models the package is capable of estimating. The website also includes example R scripts, estimation results and data to help you write your own scripts (I am a big fan of learning by doing, so I usually start from existing code!). Another cool thing is that the package is under continuous development, and new features are added regularly.

As usual, I install the Apollo package (if it is not yet installed on my computer) and load it with the library function:

install.packages("apollo")
library(apollo)

To illustrate how the Apollo package works, I will estimate a conditional logit model using simulated data. This data was provided as part of an ICMC competition, wherein researchers estimate discrete choice models for three different datasets.

The simulated dataset contains choices between modes of transport. The dependent variable is the choice between travelling by car, airplane, rail or high-speed rail. Each mode has a cost, as well as a travel time. I expect that the higher the cost and travel time of a given mode, the less likely an individual is to choose it.
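The code in this post assumes the data is already loaded. As far as I know, apollo_validateInputs() looks for a data frame called database in the global environment, so that is the name to use. Here is a minimal sketch of the layout the model below expects. The numbers are made up purely for illustration (they are not the ICMC data), and the column names are simply the ones used in the utility functions further down.

```r
# Toy stand-in for the real dataset: 3 hypothetical respondents, 2 choice tasks each.
# All values are invented; only the column layout matters here.
set.seed(123)
n_id    = 3
n_tasks = 2
n_rows  = n_id * n_tasks

database = data.frame(
  ID          = rep(seq_len(n_id), each = n_tasks),   # respondent identifier (indivID)
  choice      = sample(1:4, n_rows, replace = TRUE),  # 1 = car, 2 = air, 3 = rail, 4 = hsr
  car_cost    = round(runif(n_rows, 30, 50)),
  car_time    = round(runif(n_rows, 250, 400)),
  air_cost    = round(runif(n_rows, 50, 110)),
  air_time    = round(runif(n_rows, 50, 90)),
  air_access  = round(runif(n_rows, 35, 55)),
  rail_cost   = round(runif(n_rows, 35, 75)),
  rail_time   = round(runif(n_rows, 120, 170)),
  rail_access = round(runif(n_rows, 5, 25)),
  hsr_cost    = round(runif(n_rows, 55, 95)),
  hsr_time    = round(runif(n_rows, 75, 105)),
  hsr_access  = round(runif(n_rows, 5, 25))
)

# One row per choice task, several rows per respondent (panel data)
str(database)
```

With the real data, this step would be a single call to read the file into database instead.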

The Apollo package works like a huge function, wherein each component is defined by the researcher. The first step is to initialise the code and give names to the models.

apollo_initialise()

apollo_control = list(
  modelName       = "MNL_sim",
  modelDescr      = "Simple MNL model",
  indivID         = "ID",
  outputDirectory = "output"
)

I defined the individual ID variable (“ID”); the rest of the code just sets the model name, its description and the output directory. The second step is to decide which parameters are to be estimated. In this case, I want to estimate alternative-specific constants (ASCs) for each mode of transport, plus parameters for cost, travel time and access time. The names I gave to the parameters are pretty self-explanatory:

apollo_beta = c(asc_car     = 0,
                asc_air     = 0,
                asc_rail    = 0,
                asc_hsr     = 0,
                beta_cost   = 0,
                beta_time   = 0,
                beta_access = 0)

apollo_fixed = c("asc_car")

They are all initially set to zero, which is each parameter’s starting value. Being able to set starting values in such a straightforward manner is already a big advantage of the Apollo package. Some models take hours to run, so to save time I can set the starting values close to where I expect the maximum of the log-likelihood function to be (for example, the estimates from an earlier run or from a simpler model), so that the optimiser reaches the solution much faster.
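As a minimal sketch of that idea: because apollo_beta is just a named numeric vector, starting values can be overwritten by name. The previous_estimates vector below is hypothetical, with illustrative numbers standing in for the results of an earlier run.

```r
# Default starting values: everything at zero
apollo_beta = c(asc_car     = 0,
                asc_air     = 0,
                asc_rail    = 0,
                asc_hsr     = 0,
                beta_cost   = 0,
                beta_time   = 0,
                beta_access = 0)

# Hypothetical estimates from an earlier, simpler model (illustrative numbers only)
previous_estimates = c(beta_cost = -0.03, beta_time = -0.005)

# Overwrite only the parameters that appear in both vectors, matching by name
shared = intersect(names(apollo_beta), names(previous_estimates))
apollo_beta[shared] = previous_estimates[shared]

print(apollo_beta)
```

Parameters that do not appear in previous_estimates keep their starting value of zero.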

In addition to the parameter names and starting values, we have the option of choosing some parameters to be fixed at a certain value. Since I have a full set of alternative specific constants (ASC), one of them should be fixed at zero, and the rest of the ASCs are relative to the one that is fixed (travelling by car in this case).
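To see why one ASC has to be fixed, here is a small sketch in base R (it does not use Apollo, and the utility values are made up). In a logit model only differences in utility matter, so adding the same constant to every alternative leaves the choice probabilities unchanged, which means a full set of ASCs cannot be identified.

```r
# Logit choice probabilities for a vector of utilities
# (subtracting the maximum first is only for numerical stability)
mnl_prob = function(V) {
  expV = exp(V - max(V))
  expV / sum(expV)
}

# Illustrative utilities, with car as the reference alternative (ASC fixed at zero)
V = c(car = 0, air = -1.2, rail = -2.0, hsr = -1.0)

p_original = mnl_prob(V)
p_shifted  = mnl_prob(V + 5)   # add the same constant to every alternative

# The two sets of probabilities are identical
print(all.equal(p_original, p_shifted))
```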

Finally, the Apollo package has this intermediate function to make sure everything written before is correct. I just run this function:

> apollo_inputs = apollo_validateInputs()
Several observations per individual detected based on the value of ID. Setting panelData in apollo_control set to TRUE.
All checks on apollo_control completed.
All checks on database completed.

This leads me to what I call the second part of the code. This second part is basically a single function, whose body starts at the opening “{” brace:

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
  
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  P = list()

I don’t change this code above, but the next piece of code is where I spend a lot of time tailoring it to my data:

  V = list()
  V[["car"]]  = asc_car  + car_cost  * beta_cost + car_time  * beta_time
  V[["air"]]  = asc_air  + air_cost  * beta_cost + air_time  * beta_time + air_access  * beta_time * beta_access
  V[["rail"]] = asc_rail + rail_cost * beta_cost + rail_time * beta_time + rail_access * beta_time * beta_access
  V[["hsr"]]  = asc_hsr  + hsr_cost  * beta_cost + hsr_time  * beta_time + hsr_access  * beta_time * beta_access

  mnl_settings = list(
    alternatives  = c(car=1, air=2, rail=3, hsr=4), 
    choiceVar     = choice,
    utilities     = V
  )

The “V” list defines the indirect utility function for each alternative. Each ASC captures the average utility of a mode of transport, relative to car, that is not explained by the attributes in the model. The cost of each mode is multiplied by the cost parameter, and travel time by the time parameter. The cost and time variables are individual- and alternative-specific. Access (the time needed to access the mode of transport) is multiplied by the product of the access parameter and the time parameter, so beta_access works as a multiplier on beta_time. Car has no access term. One should change this list of indirect utilities according to the problem at hand.

In mnl_settings, I define the choice variable (just “choice” in this case) and how the names used in the V list map onto the values of that variable (e.g. “car” corresponds to 1 in the choice variable).
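To make the link between the V list and the choice variable concrete, here is a hand-rolled sketch of what a multinomial logit does for a single choice task. It is plain R rather than Apollo's internals, and both the coefficients and the attribute levels are invented for illustration.

```r
# Illustrative coefficients (NOT the estimates reported in this post)
b = c(asc_car = 0, asc_air = -1, asc_rail = -2, asc_hsr = -1,
      beta_cost = -0.03, beta_time = -0.005, beta_access = 1.5)

# One hypothetical choice task, in which the respondent chose air (choice = 2)
x = list(car_cost  = 40, car_time  = 300,
         air_cost  = 80, air_time  = 70,  air_access  = 50,
         rail_cost = 50, rail_time = 150, rail_access = 20,
         hsr_cost  = 65, hsr_time  = 100, hsr_access  = 20,
         choice    = 2)

# Same utility specification as in the V list above
V = c(
  car  = b[["asc_car"]]  + x$car_cost  * b[["beta_cost"]] + x$car_time  * b[["beta_time"]],
  air  = b[["asc_air"]]  + x$air_cost  * b[["beta_cost"]] + x$air_time  * b[["beta_time"]] +
         x$air_access  * b[["beta_time"]] * b[["beta_access"]],
  rail = b[["asc_rail"]] + x$rail_cost * b[["beta_cost"]] + x$rail_time * b[["beta_time"]] +
         x$rail_access * b[["beta_time"]] * b[["beta_access"]],
  hsr  = b[["asc_hsr"]]  + x$hsr_cost  * b[["beta_cost"]] + x$hsr_time  * b[["beta_time"]] +
         x$hsr_access  * b[["beta_time"]] * b[["beta_access"]]
)

# Logit probabilities of all four alternatives
P = exp(V) / sum(exp(V))

# The alternatives mapping translates the value of the choice variable into a name in V
alternatives = c(car = 1, air = 2, rail = 3, hsr = 4)
chosen   = names(alternatives)[alternatives == x$choice]
P_chosen = P[[chosen]]   # this task's contribution to the likelihood

print(round(P, 3))
print(chosen)
```

The probability of the chosen alternative is what enters the likelihood for each row of the data.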

Finally, I estimate the model by running the last lines of my code:

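  # Probability of the chosen alternative in each choice task
  P[["model"]] = apollo_mnl(mnl_settings, functionality)

  # Multiply the probabilities across the choice tasks of the same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)

  # Put the probabilities in the format Apollo expects, then return them
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}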


model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)

I do not usually change the code above. Calling apollo_modelOutput() on the model object prints the estimation results, including the parameter estimates:

> apollo_modelOutput(model)
Model run by anafl using Apollo 0.2.7 on R 4.1.1 for Windows.
www.ApolloChoiceModelling.com

Model name                       : MNL_sim
Model description                : Simple MNL model
Model run at                     : 2022-04-05 11:10:26
Estimation method                : bfgs
Model diagnosis                  : successful convergence 
Number of individuals            : 400
Number of rows in database       : 4000
Number of modelled outcomes      : 4000

Number of cores used             :  1 
Model without mixing

LL(start)                        : -5545.18
LL(0)                            : -5545.18
LL(C)                            : -4225.05
LL(final)                        : -3758.19
Rho-square (0)                   :  0.3223 
Adj.Rho-square (0)               :  0.3212 
Rho-square (C)                   :  0.1105 
Adj.Rho-square (C)               :  0.1091 
AIC                              :  7528.39 
BIC                              :  7566.15 

Estimated parameters             :  6
Time taken (hh:mm:ss)            :  00:00:1.16 
     pre-estimation              :  00:00:0.4 
     estimation                  :  00:00:0.48 
     post-estimation             :  00:00:0.28 
Iterations                       :  25  
Min abs eigenvalue of Hessian    :  5.882557 

Unconstrained optimisation.

Estimates:
               Estimate        s.e.   t.rat.(0)    Rob.s.e. Rob.t.rat.(0)
asc_car        0.000000          NA          NA          NA            NA
asc_air       -1.351050    0.197495      -6.841    0.199502        -6.772
asc_rail      -2.148340    0.083604     -25.697    0.078042       -27.528
asc_hsr       -1.222939    0.173707      -7.040    0.182353        -6.706
beta_cost     -0.027547    0.001149     -23.974    0.001168       -23.587
beta_time     -0.005142  4.2422e-04     -12.122  4.3666e-04       -11.777
beta_access    1.409348    0.359438       3.921    0.363383         3.878

There is a lot of information about the estimation process (such as time needed for convergence) and whether the model converged to begin with. Fortunately, it did.
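The fit statistics in the output can be checked by hand from the log-likelihood values. The sketch below uses the textbook definitions, which I am assuming are the ones Apollo applies; they reproduce the printed values up to rounding. All inputs are copied from the output above.

```r
# Values copied from the estimation output
LL_0     = -5545.18   # log-likelihood with all parameters at zero
LL_final = -3758.19   # log-likelihood at convergence
k        = 6          # number of estimated parameters
N        = 4000       # number of modelled outcomes
J        = 4          # number of alternatives

# With all parameters at zero, each of the J alternatives has probability 1/J
LL_0_check = N * log(1 / J)

rho2     = 1 - LL_final / LL_0
adj_rho2 = 1 - (LL_final - k) / LL_0
aic      = -2 * LL_final + 2 * k
bic      = -2 * LL_final + k * log(N)

print(round(c(LL_0_check = LL_0_check, rho2 = rho2, adj_rho2 = adj_rho2,
              AIC = aic, BIC = bic), 4))
```

Small differences in the last decimal come from using the rounded log-likelihoods rather than the full-precision ones.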

We can see in the estimated parameters that, as expected, both cost and time decrease the probability of choosing a given mode of transport: both parameters are negative and statistically significant. Moreover, relative to travelling by car, the other modes of transport bring disutility once cost and time are accounted for, as their ASCs are negative and statistically significant. Finally, the implied coefficient on access time is beta_access × beta_time = 1.409348 × (-0.005142) ≈ -0.007247, which means that a unit of access time brings about 1.4 times the disutility of a unit of in-vehicle travel time.
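The access-time arithmetic is easy to reproduce from the table of estimates. In the sketch below the numbers are copied from the output above. The last step, a rough t-ratio of beta_access against 1 using the robust standard error, is my own addition rather than part of the Apollo output shown here; since beta_access is a multiplier, 1 (not 0) is the value at which access time and in-vehicle time are valued equally.

```r
# Estimates and robust standard error copied from the output above
beta_time          = -0.005142
beta_access        =  1.409348
beta_access_rob_se =  0.363383

# Implied coefficient on access time in the utility functions
access_time_coef = beta_access * beta_time

# Disutility of access time relative to in-vehicle time (equals beta_access by construction)
relative_disutility = access_time_coef / beta_time

# Rough t-ratio for the hypothesis beta_access = 1
t_ratio_vs_1 = (beta_access - 1) / beta_access_rob_se

print(c(access_time_coef    = access_time_coef,
        relative_disutility = relative_disutility,
        t_ratio_vs_1        = t_ratio_vs_1))
```

The point estimate says access time is more onerous than in-vehicle time, but the t-ratio against 1 is small, so I would be cautious about reading too much into that difference.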

I find that this way of estimating discrete choice models is flexible yet simple. It might be overwhelming for a newbie who just wants to estimate a conditional logit model in a single line. In that case, the mlogit or gmnl packages might be the way to go. For the intermediate to advanced choice modeller, I think it is worth getting familiar with the Apollo package.

Overall, the Apollo package is a good package to play around with. The manual is a great guide to setting up the model of your choice.

References:

Hess, S. & Palma, D. (2019). Apollo: a flexible, powerful and customisable freeware package for choice model estimation and application. Journal of Choice Modelling, 32, 100170.
