Writing your own log-likelihood function from scratch

Sometimes it is handy to know how to write a log-likelihood function from scratch. I have read many papers that show how a particular refinement of an econometric model conforms to economic theory and yields unbiased parameter estimates. To write such a paper, you have to start from a “standard” log-likelihood function and make some type of adjustment to it.

For example, Englin and Shonkwiler (1995) want to correct for truncation and endogenous stratification in their data. To do so, they adjust the standard negative binomial model. They derive the density function that is most relevant for their data, which implies small changes in the corresponding log-likelihood function.

Other examples propose adjusting the log-likelihood to conform with theory (e.g. choice set considerations). The bottom line is: it is useful to know how to program these maximum likelihood routines in R, so that different models can be applied more readily. Many of these novel models are not programmed into any software, and the authors rarely share their code publicly (which might explain why these model developments are seldom implemented by other authors).

Which leads me to the question: how do you write a log-likelihood function in R?

It was useful to remind myself what maximum likelihood is all about. The maximum likelihood estimator is the value of the parameter vector \theta that maximizes the likelihood function (Wooldridge, 2010). The likelihood function is simply the product of the assumed density function f over all N observations in the sample:

\max\limits_{\theta} \prod\limits_{i=1}^{N} f(y_i | x_i, \theta)

So the researcher’s first step is to make an assumption about the conditional density function f of a variable y_i given x_i (Wooldridge, 2010). There are “standard” models with well-known density functions. For example, the probability of a success in the logit model is:

P(Y=1|X) =  \frac{e^{X\beta}}{1+ e^{X\beta}}
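
As an aside, this transformation is already built into R as the standard logistic distribution function plogis; a quick sanity check (x_beta is just an illustrative value, not part of the model above):

x_beta <- 0.5                      # some value of the linear index X*beta
plogis(x_beta)                     # identical to exp(x_beta)/(1 + exp(x_beta))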

The density function of observation y_i is:

f(y_i | x_i ,\theta) = P(Y=1|X)^{y_i} (1-P(Y=1|X)) ^{1-y_i}

If we replace this density function in the first equation, we get the likelihood function for the logit model:

max \prod\limits^{N}_{i=1}  P(Y=1|X)^{y_i}  (1-P(Y=1|X)) ^{1-y_i}

Maximizing the likelihood is equivalent to maximizing the log-likelihood, which is why we take the log of the equation above. After taking the log, instead of the product of N densities we deal with the sum of N log-densities, which is much easier to handle numerically.
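
For the logit model, for example, taking the log of the likelihood above gives the log-likelihood function:

\sum\limits_{i=1}^{N} \left[ y_i \ln P(Y=1|X) + (1-y_i) \ln \left( 1-P(Y=1|X) \right) \right]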

For illustration purposes, I will use the Poisson model. I want to write the Poisson’s log-likelihood function in R. I write a function that depends on my dataframe (called df) and on a vector of parameters (called par). The function depends on y (the dependent variable), x_1 and x_2, as well as three parameters (par[1], par[2] and par[3]), and it returns the log-likelihood value.
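
As a quick refresher before the code: the Poisson density with conditional mean \mu_i = e^{x_i \beta} is f(y_i | x_i, \beta) = e^{-\mu_i} \mu_i^{y_i} / y_i!, so the log-density of a single observation is

\ln f(y_i | x_i, \beta) = y_i (x_i \beta) - e^{x_i \beta} - \ln(y_i!)

and the log-likelihood is the sum of this expression over the N observations, which is exactly what the function below computes.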

# Poisson log-likelihood: for each observation,
# y_i*(x_i'beta) - exp(x_i'beta) - log(y_i!), summed over the sample
poisson.lik <- function(df, par){
  xb <- par[1] + par[2]*df$x_1 + par[3]*df$x_2   # linear index x_i'beta
  log_likelihood <- sum(df$y*xb - exp(xb) - lfactorial(df$y))
  return(log_likelihood)
}

Now I need to find the three parameter values that maximize this function. To find them, I use the optim function. optim requires a set of starting values (3 in our case) and the name of the function to be optimized, as well as any additional arguments that function needs (here, the name of the dataframe).

optim(par=c(1,1,1), poisson.lik, df=data, control=list(fnscale=-1))

Since we want to maximize (rather than minimize, which is the default in optim), I add control=list(fnscale=-1) to the list of options in the optim call.

I illustrate below how this function can estimate the parameters of interest. I first generate a sample of 100 observations that follow a Poisson distribution conditional on x_1 and x_2. The true parameters are 0.1, -0.05 and 0.5. Then I write the log-likelihood function and run the optim function. The code below can be copied and run without any trouble. Note: the precise output will differ from run to run, because the data are generated randomly each time (add a set.seed() call at the top if you want reproducible results).

# Simulate 100 observations from a Poisson model conditional on x_1 and x_2
x_1 <- runif(100, min = 0, max = 50)
x_2 <- runif(100, min = 0, max = 10)
beta_1 <- 0.1
beta_2 <- -0.05
beta_3 <- 0.5
mu   <- exp(beta_1 + beta_2*x_1 + beta_3*x_2)   # conditional mean
y    <- rpois(n = 100, lambda = mu)
data <- data.frame(ID = 1:100, y, x_1, x_2)

# Poisson log-likelihood (same function as above)
poisson.lik <- function(df, par){
  xb <- par[1] + par[2]*df$x_1 + par[3]*df$x_2   # linear index x_i'beta
  log_likelihood <- sum(df$y*xb - exp(xb) - lfactorial(df$y))
  return(log_likelihood)
}

# Maximize the log-likelihood
optim(par = c(1, -1, 1), poisson.lik, df = data, control = list(fnscale = -1))

I get the following output:

$par
[1]  0.26442288 -0.04681021  0.46706506
$value
[1] -207.9706
$counts
function gradient 
     154       NA 
$convergence
[1] 0
$message
NULL

My log-likelihood function yields 0.26442288, -0.04681021 and 0.46706506 as estimates of the parameters of interest. These values are close to the true parameters (which I set myself). The output also indicates that -207.9706 is the maximum value the log-likelihood function attains.
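
As a side note, optim can also return the Hessian of the objective function at the optimum (via hessian=TRUE), from which approximate standard errors can be derived. A minimal sketch, assuming the data and poisson.lik defined above (neg.lik is a small helper I introduce here so that the returned Hessian is unambiguously that of the minimized negative log-likelihood; it is not part of the original code):

# Minimize the negative log-likelihood; the inverse of the
# returned Hessian approximates the variance-covariance matrix
neg.lik <- function(df, par) -poisson.lik(df, par)
fit <- optim(par = c(1, -1, 1), neg.lik, df = data, hessian = TRUE)
sqrt(diag(solve(fit$hessian)))   # approximate standard errors

These can be set against the Std. Error column in the glm output below.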

When I use the glm function to estimate a Poisson model, I get the following output:

> summary(glm(y ~ x_1 + x_2, family="poisson", data=data))
Call:
glm(formula = y ~ x_1 + x_2, family = "poisson", data = data)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4597  -0.7073  -0.1575   0.7335   2.6595  
Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.124948   0.146268   0.854    0.393    
x_1         -0.057629   0.002568 -22.439   <2e-16 ***
x_2          0.504665   0.016912  29.841   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
    Null deviance: 2418.230  on 99  degrees of freedom
Residual deviance:   98.362  on 97  degrees of freedom
AIC: 427.98
Number of Fisher Scoring iterations: 4

The estimated parameters are also very close. The glm output reports an AIC of 427.98, which with three estimated parameters implies a log-likelihood of (2*3 - 427.98)/2 ≈ -210.99, slightly lower than the value found above; this is most likely because the data were regenerated between the two runs. Hence, writing the log-likelihood function from scratch yields almost the same results as using R’s built-in command.
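
To make the comparison on identical data, one can store the glm fit and extract its coefficients and log-likelihood directly; a quick sketch (fit is just an illustrative name):

fit <- glm(y ~ x_1 + x_2, family = "poisson", data = data)
coef(fit)     # ML estimates, comparable to optim's $par
logLik(fit)   # maximized log-likelihood, comparable to optim's $value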
