Getting around the “initial value in ‘vmmin’ is not finite” error

Like many other researchers, I have been writing my own log-likelihood function. My goal is eventually to modify a standard log-likelihood function to account for other aspects that are relevant to my research. Several websites show how to write a log-likelihood function in R: for example here or here for the Poisson distribution (both very good sources).

After writing the log-likelihood function, I try to find the maximum log-likelihood value using R's optim function (alternatives exist, such as the mle function, but that still calls optim under the hood). This seems straightforward: optim should find the parameter values that maximize the log-likelihood function. More often than not, however, optim yields the following error message:

Error in optim(start, f, method = method, hessian = TRUE, ...) :
  initial value in 'vmmin' is not finite

My first instinct was to google this error message to find ways to tackle it. I was surprised by the number of people posting on help forums struggling with the same problem. So I decided to try to “solve” it.

I say “solve” because this error message does not actually indicate a problem with the code or the program, and does not require solving. It is very likely that nothing is wrong with your log-likelihood function.

A good discussion of why this error arises can be found at stackoverflow.com. To sum it up: the error arises because the optimizer cannot evaluate the log-likelihood at the starting values we provide; the function returns Inf, -Inf, or NaN there. The log-likelihood function is very sensitive to the parameter starting values.
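In a Poisson log-likelihood, for instance, the usual culprit is numerical overflow (my illustration, not from the linked discussion): exp() of a large linear predictor returns Inf, and log(0) returns -Inf, so the log-likelihood itself is not finite at those starting values:

```r
# exp() overflows past roughly 709.78 in double precision, so even a
# moderately large linear predictor makes a Poisson log-likelihood -Inf.
exp(709)   # large but still finite (~8.2e307)
exp(710)   # Inf
log(0)     # -Inf: another common source, when a probability underflows to 0
```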

These are the arguments needed for optim to run (taken from the optim help documentation):

optim(par, fn, gr = NULL, ...,
      method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
                 "Brent"),
      lower = -Inf, upper = Inf,
      control = list(), hessian = FALSE)

The first argument (“par”) is the vector of initial values for the parameters, which is set by us (the researchers).

In theory, to solve this error all we need to do is to change the starting values.

Except it doesn’t always work. I tried changing the starting values several times and I could not find a combination that optim could work with. Trying to find a set of starting values that does work can end up being cumbersome, especially if you have many parameters to play around with (or very complex LL functions).

My “solution” is to let optim start from multiple sets of starting values. Instead of manually changing these values, I generate different combinations for the optim function to try. I am not changing anything in my log-likelihood function; all I am doing is asking optim to run more than once and report a solution for each combination of starting values. With some combinations optim will yield an error message, and with others it will find the maximum log-likelihood value. I can then dig in and pick the maximum LL value out of all the solutions found.

Now to the example. First I generate a sample of 400 individuals. This is an application of the travel cost method. The number of visits is my dependent variable which follows a Poisson distribution:

n <- 400

TC       <- runif(n, min = 0, max = 50)   # alternative: rnorm(n, mean = 25, sd = 12)
TC[TC<0] <- 0                             # only needed if TC is drawn from rnorm
wages    <- runif(n, min = 0, max = 500)
beta_0   <- 0.1
beta_1   <- -0.05
beta_2   <- 0.005
mu       <- exp(beta_0 + beta_1*TC + beta_2*wages)
visits   <- rpois(n=n, lambda = mu)     # Assuming trips are Poisson distributed

data <- data.frame(ID=c(1:n), visits, TC, wages)

The number of visits depends on wages and travel cost.

I type my own (standard) log-likelihood function:

poisson.lik <- function(df, par){
  log_likelihood <- sum( df$visits*(par[1] + par[2]*df$TC + par[3]*df$wages)
                         - exp(par[1] + par[2]*df$TC + par[3]*df$wages)
                         - lfactorial(df$visits) )
  return(log_likelihood)
}

Now it is time to use the optim function to maximize this LL function. If I choose 2 as the starting value for each parameter (c(2, 2, 2)), I get the error message.

> optim(par=c(2,2,2),poisson.lik,df=data,control=list(fnscale=-1), method="BFGS")
Error in optim(par = c(2, 2, 2), poisson.lik, df = data, control = list(fnscale = -1),  : 
  initial value in 'vmmin' is not finite
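Evaluating the log-likelihood directly at those starting values shows what goes wrong. The sketch below repeats the simulation and the LL function from above; a set.seed call is added for reproducibility (not in the original), so the exact draws differ from the post's:

```r
set.seed(1)  # added for reproducibility; not in the original post
n      <- 400
TC     <- runif(n, min = 0, max = 50)
wages  <- runif(n, min = 0, max = 500)
visits <- rpois(n, lambda = exp(0.1 - 0.05*TC + 0.005*wages))
data   <- data.frame(visits, TC, wages)

poisson.lik <- function(df, par){
  eta <- par[1] + par[2]*df$TC + par[3]*df$wages
  sum(df$visits*eta - exp(eta) - lfactorial(df$visits))
}

poisson.lik(data, c(2, 2, 2))  # -Inf: exp(eta) overflows for large eta
poisson.lik(data, c(0, 0, 0))  # finite, so optim could start here
```

With par = c(2, 2, 2), the linear predictor reaches values above 1000, exp() overflows to Inf, and the sum collapses to -Inf, which is exactly what optim complains about.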

Instead, I create a dataframe called “starting.values” with three columns, each holding 100 draws from a normal distribution with mean zero and standard deviation 1.

starting.values <- data.frame(rnorm(100,0,1),rnorm(100,0,1),rnorm(100,0,1))

I use this dataframe within a loop which runs optim for each row of starting values:


values <- 0
for(i in 1:nrow(starting.values)){

  assign(paste("result", i, sep = "_"),
         tryCatch(optim(par = starting.values[i,], poisson.lik, df = data,
                        control = list(fnscale = -1), method = "BFGS"),
                  error = function(err) NA))
  values[i] <- tryCatch(get(paste("result", i, sep = "_"))[["value"]],
                        error = function(err) NA)

}

The secret to getting this to run is the tryCatch function, which returns NA whenever optim throws the “vmmin” error. Because of this, even when optim fails, the loop continues.

Out of the 100 runs of optim (one per row of the starting.values dataframe), 90 found a solution; the remaining 10 combinations of starting values returned the error.

Once the loop above runs, all I need is to find the solution from optim which maximizes the log-likelihood value:

> which.max(values)
[1] 97
> get(paste("result", which.max(values), sep="_"))
$par
  rnorm.100..0..1. rnorm.100..0..1..1 rnorm.100..0..1..2 
      -0.187889779       -0.043748484        0.005410105 

$value
[1] -547.7552

$counts
function gradient 
     299       21 

$convergence
[1] 0

$message
NULL

According to my results, the maximum LL value found is -547.7552, with the parameters -0.19, -0.04 and 0.005. I know the true parameter values (because I defined them when generating the data). The second and third parameter estimates are almost identical to the true values (-0.05 and 0.005), but the intercept is not. Perhaps my sample of 400 is not large enough to correctly identify the intercept, but I am quite satisfied with the solution found.
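As a cross-check not in the original post, R's built-in glm with family = poisson maximizes the same log-likelihood, so its estimates should land close to whatever the multi-start loop finds. The simulation is repeated here so the block runs on its own, with a hypothetical seed added:

```r
set.seed(42)  # hypothetical seed, added for reproducibility
n      <- 400
TC     <- runif(n, min = 0, max = 50)
wages  <- runif(n, min = 0, max = 500)
visits <- rpois(n, lambda = exp(0.1 - 0.05*TC + 0.005*wages))
data   <- data.frame(visits, TC, wages)

# glm fits the same Poisson model by maximum likelihood.
fit <- glm(visits ~ TC + wages, family = poisson(link = "log"), data = data)
coef(fit)     # should be near the true values 0.1, -0.05, 0.005
logLik(fit)   # comparable to the best value found by the multi-start loop
```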

If you want to implement this “solution” in your own application, all you need to do is to copy the optim loop above and change the optim function to adapt it to your case. That is, change the name of the log-likelihood function (“poisson.lik”) and the name of your dataframe (“data”).
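As a design note, the same multi-start idea can be written without assign and get by collecting the results in a list, which keeps the workspace clean and makes the best fit easy to extract. This is a sketch of an alternative, not the post's original code; it repeats the simulation so it runs on its own:

```r
set.seed(2)  # hypothetical seed, added for reproducibility
n      <- 400
TC     <- runif(n, min = 0, max = 50)
wages  <- runif(n, min = 0, max = 500)
visits <- rpois(n, lambda = exp(0.1 - 0.05*TC + 0.005*wages))
data   <- data.frame(visits, TC, wages)

poisson.lik <- function(df, par){
  eta <- par[1] + par[2]*df$TC + par[3]*df$wages
  sum(df$visits*eta - exp(eta) - lfactorial(df$visits))
}

# One optim run per row of random starting values; NULL marks a failed run.
starting.values <- matrix(rnorm(100 * 3), ncol = 3)
results <- lapply(seq_len(nrow(starting.values)), function(i){
  tryCatch(optim(par = starting.values[i, ], poisson.lik, df = data,
                 control = list(fnscale = -1), method = "BFGS"),
           error = function(err) NULL)
})

values <- vapply(results,
                 function(r) if (is.null(r)) NA_real_ else r$value,
                 numeric(1))
best <- results[[which.max(values)]]  # which.max skips the NA entries
best$par
best$value
```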

I would be happy to know if this idea works for you, as well as discuss challenges you face.

Note: changing the optim method (“method=”) might help, but it can still yield the “vmmin” error (especially with large samples).
