I have not been able to find R code that estimates a Negative Binomial model by typing out the log-likelihood function by hand. That is what this blog post aims to do (along with some explanation of my strategy for doing so).

A few weeks ago, I showed how to estimate a Poisson model by typing the log-likelihood function by hand. As I said then, the beauty of doing this is that you can easily change the log-likelihood function depending on the research question and/or what makes the most sense from a theoretical point of view. Pre-programmed functions (such as glm and glm.nb) do not offer this flexibility.

To start off, I had to find the log-likelihood function for the Negative Binomial model. Many authors derive it; the one I used is Zwilling (2013). On page 2 (equation 3), he derives the log-likelihood function, and on page 3 he shares the *Mathematica* code to estimate a Negative Binomial model (the coding language is very similar to R). Below is the Negative Binomial model’s log-likelihood function:

ln L = Σᵢ [ yᵢ·ln(α) + yᵢ·(β₁ + β₂x₁ᵢ + β₃x₂ᵢ) − (yᵢ + 1/α)·ln(1 + α·exp(β₁ + β₂x₁ᵢ + β₃x₂ᵢ)) + ln Γ(yᵢ + 1/α) − ln Γ(yᵢ + 1) − ln Γ(1/α) ]

Source: Zwilling (2013).

Let y be the vector containing the dependent variable, and let x_1 and x_2 be the independent variables (we assume two for simplicity, but you can easily adjust the code to include more or fewer).

In R, the code looks like this:

```
library(Rfast)  # provides the Lgamma function used below

nb.lik = function(df, par) {
  # par[1:3] are the regression coefficients; par[4] is alpha = 1/theta
  y   <- df$y
  x_1 <- df$x_1
  x_2 <- df$x_2
  log_likelihood = sum(y * log(par[4]) +
    y * (par[1] + par[2] * x_1 + par[3] * x_2) -
    (y + 1 / par[4]) * log(1 + par[4] * exp(par[1] + par[2] * x_1 + par[3] * x_2)) +
    Lgamma(y + 1 / par[4]) -
    Lgamma(y + 1) -
    Lgamma(1 / par[4]))
  return(log_likelihood)
}
```

After defining the log-likelihood function, we then use optim to find the parameter values that maximize the log-likelihood function:

`optim(par=c(0.001,-0.05,0.005,200),nb.lik,df=data,control=list(fnscale=-1))`

The option “control=list(fnscale=-1)” tells optim to maximize instead of minimize (minimization is optim’s default).
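To see the effect of fnscale in isolation, here is a minimal, hypothetical example (not from the original post): maximizing f(x) = -(x - 2)^2, whose maximum is at x = 2.

```r
# Toy example: f is maximized at par = 2, where f(2) = 0.
f <- function(par) -(par[1] - 2)^2

# fnscale = -1 flips optim's default minimization into maximization.
# BFGS is used here because Nelder-Mead warns on one-dimensional problems.
fit <- optim(par = 0, fn = f, method = "BFGS",
             control = list(fnscale = -1))

fit$par    # close to 2
fit$value  # close to 0 (the maximum of f)
```

Without fnscale = -1, optim would instead drive f toward minus infinity, which is why the option matters for likelihood maximization.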

I want to briefly explain why I used the Lgamma function inside the log-likelihood function when defining nb.lik.

The negative binomial log-likelihood depends on the gamma function, and there is a problem with using R’s built-in gamma function: for large arguments it overflows and returns Inf. My counts run into the hundreds, which means gamma returns Inf inside the log-likelihood expression; the logarithm of Inf is still Inf, and the resulting Inf − Inf differences produce NaNs in the log-likelihood. The true value of gamma at a large argument is not infinite — just astronomically large — and its logarithm is perfectly manageable, so the fix is to compute the log of the gamma function directly.

For example, if the argument of the gamma function is 200, the built-in gamma function in R gives:

```
> gamma(200)
[1] Inf
> log(gamma(200))
[1] Inf
```

Instead, I used the Lgamma function from the Rfast package to compute the natural logarithm of gamma(200) directly (base R’s lgamma() gives the same result):

```
> Lgamma(200)
[1] 857.9337
```

857.93 is a perfectly representable number that no longer yields NaNs in the log-likelihood function.

Finally, I want to illustrate that the function I defined (nb.lik) recovers the same result as the built-in *glm.nb* function.

I first generate some data (the same as in my last blog post):

```
x_1 <- runif(1000, min = 0, max = 50)
x_2 <- runif(1000, min = 0, max = 10)
beta_1 <- 0.1
beta_2 <- -0.05
beta_3 <- 0.5
mu <- exp(beta_1 + beta_2*x_1 + beta_3*x_2)
y <- rnbinom(n=1000, size=0.15, mu = mu)
data <- data.frame(ID=c(1:1000), y, x_1, x_2)
```

The true parameters are 0.1, -0.05, and 0.5, plus the dispersion parameter of the Negative Binomial model, 0.15 (theta, called size in the *rnbinom* function).

Note: some papers instead use the parameter alpha = 1/theta — alpha is also the parameter that appears in the log-likelihood function above, while theta is the parameter that *rnbinom* uses.

I now define the nb.lik function, i.e. the negative binomial log-likelihood function, and then estimate the parameters of interest:

```
> nb.lik = function(df, par) {
+   y   <- df$y
+   x_1 <- df$x_1
+   x_2 <- df$x_2
+   log_likelihood = sum(y * log(par[4]) +
+     y * (par[1] + par[2] * x_1 + par[3] * x_2) -
+     (y + 1 / par[4]) * log(1 + par[4] * exp(par[1] + par[2] * x_1 + par[3] * x_2)) +
+     Lgamma(y + 1 / par[4]) -
+     Lgamma(y + 1) -
+     Lgamma(1 / par[4]))
+   return(log_likelihood)
+ }
> optim(par=c(0.001,-0.05,0.005,200),nb.lik,df=data,control=list(fnscale=-1))
$par
[1] -0.1995365 -0.0414935 0.4837096 6.5105415
$value
[1] -1930.25
$counts
function gradient
373 NA
$convergence
[1] 0
$message
NULL
Warning messages:
1: In log(par[4]) : NaNs produced
2: In log(1 + par[4] * exp(par[1] + par[2] * x_1 + par[3] * x_2)) :
NaNs produced
3: In log(par[4]) : NaNs produced
4: In log(1 + par[4] * exp(par[1] + par[2] * x_1 + par[3] * x_2)) :
NaNs produced
```

The NaN warnings are harmless here: during the search, optim occasionally tries negative values of par[4], for which log() is undefined; since the convergence code is 0, the optimizer still found a maximum. The estimated parameters are:

```
$par
[1] -0.1995365 -0.0414935 0.4837096 6.5105415
```

With the exception of the first one, they are very close to the true parameters: 0.1, -0.05, 0.5, and the dispersion parameter 1/0.15 ≈ 6.67.

If I estimate the same model with the built-in glm.nb function (from the MASS package):

```
> data <- data.frame(y,x_1,x_2)
> model_nb <- glm.nb(y ~ x_1 + x_2, data=data)
> summary(model_nb)
Call:
glm.nb(formula = y ~ x_1 + x_2, data = data, init.theta = 0.1536655628,
link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3999 -0.9791 -0.7002 -0.2673 2.5316
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.195532 0.218233 -0.896 0.37
x_1 -0.041853 0.006005 -6.970 3.16e-12 ***
x_2 0.484276 0.027734 17.461 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.1537) family taken to be 1)
Null deviance: 1057.28 on 999 degrees of freedom
Residual deviance: 733.31 on 997 degrees of freedom
AIC: 3868.5
Number of Fisher Scoring iterations: 1
Theta: 0.15367
Std. Err.: 0.00980
2 x log-likelihood: -3860.49600
```

The estimated parameters are essentially identical to the ones estimated by hand, and glm.nb’s theta of 0.15367 corresponds to an alpha of 1/0.15367 ≈ 6.51, matching the fourth parameter from optim. The maximized log-likelihood by hand was -1930.25, the same as the one achieved with the glm.nb function:

```
> -3860.49600/2
[1] -1930.248
```

You now have code you can use to estimate your negative binomial model by hand. Adjusting it to your own data frame and number of explanatory variables is relatively straightforward — comment below if you need help with this. I have also shown that the estimated parameters and the overall solution are just as good as those from a built-in R function. The advantage of doing it by hand is the possibility of adjusting the log-likelihood function if needed.
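One natural extension (not covered in the post above) is that optim can also return the Hessian at the optimum, from which standard errors can be computed: when maximizing a log-likelihood, the negative inverse Hessian estimates the variance-covariance matrix of the estimates. Here is a hedged sketch using a simple Poisson log-likelihood so the example stays self-contained; the same hessian=TRUE trick applies to nb.lik.

```r
set.seed(42)
# Simulate Poisson data with true coefficients (0.5, 1.2).
x <- runif(500)
y <- rpois(500, lambda = exp(0.5 + 1.2 * x))

# Poisson log-likelihood; the constant -log(y!) term is dropped
# since it does not depend on the parameters.
pois.lik <- function(par) {
  sum(y * (par[1] + par[2] * x) - exp(par[1] + par[2] * x))
}

# hessian = TRUE asks optim for a numerical Hessian at the optimum.
fit <- optim(par = c(0, 0), fn = pois.lik,
             control = list(fnscale = -1), hessian = TRUE)

# Standard errors: square roots of the diagonal of -H^{-1}.
se <- sqrt(diag(solve(-fit$hessian)))
fit$par  # should be close to the true (0.5, 1.2)
se
```

Dividing the estimates by these standard errors gives the z-values that summary(glm.nb(...)) reports for the coefficient table.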

**References:**

Zwilling, M. L. (2013). Negative binomial regression. *The Mathematica Journal*, *15*, 1-18.
