Panel Data applications to tackle pollution

Many research questions in environmental economics require the use of panel data. For example, Jaffe and Palmer (1997) use panel data to investigate whether stimulating domestic innovation has a positive effect on domestic firms, rather than foreign firms. The dynamic at play is that increasing environmental regulations will foster innovation by firms, who then can export their environmentally friendly technologies and become major players in that field.

To provide some evidence of this hypothesis, Jaffe and Palmer (1997) collect data on pollution control expenditures, and measures of innovative activity and performance across firms and time. Innovation is measured by the total private expenditures in research and development and the number of patents. The authors ultimately find that R&D expenditures are positively affected by lagged environmental compliance expenditures.

The data in Jaffe and Palmer (1997) has two dimensions: industries and years. The same industries are observed for a series of years, resulting in a dataset which is two-dimensional. This is a common setup for panel data.

To use panel data, we generally need data spanning at least two dimensions. Time is a very common dimension to have in panel data (e.g. days, months, years, decades). The other dimension is more varied: it can be composed of individuals (e.g. Egan and Herriges, 2006), firms (e.g. Elsayed and Paton, 2005), provinces (e.g. Du et al., 2012) or countries (e.g. Welsch, 2006).
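To make the two dimensions concrete, here is a minimal sketch of a panel in "long" format, with one row per country-year pair. The country labels, years and emission values are made up for illustration; they are not taken from any of the papers cited.

```r
# Hypothetical long-format panel: 3 countries observed over 4 years.
panel <- expand.grid(year = 2000:2003,
                     country = c("A", "B", "C"),
                     stringsAsFactors = FALSE)
panel <- panel[, c("country", "year")]

set.seed(1)
panel$emissions <- round(runif(nrow(panel), 1, 10), 2)  # made-up values

# Each observation is identified by the (country, year) pair.
n_units   <- length(unique(panel$country))
n_periods <- length(unique(panel$year))
is_unique <- !any(duplicated(panel[, c("country", "year")]))

print(head(panel))
```

Most panel routines in R expect this layout, with one column identifying the unit and another identifying the time period.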

Perhaps the most important advantage of having panel data is the possibility to control for year and individual fixed effects. For example, one individual might be systematically affected by some unobserved variable which is fixed over time. Let’s say this is their month of birth. Because this is a fixed effect, i.e. there is no variation from year to year, the effect of this variable cannot be estimated separately, even though we observe the individual over time. However, it can be controlled for in a panel data analysis by, for example, including a dummy variable for that specific individual. The coefficient associated with this dummy captures all time-invariant individual-specific effects together (both observed and unobserved). So the effect of month of birth would be absorbed by the dummy variable, along with every other time-invariant characteristic of that individual: it cannot be singled out, but it is controlled for.
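The dummy-variable idea can be sketched with simulated data. Everything here is an assumption made for illustration: the variable names (id, birth_month, x, y), the sample sizes and the true coefficient of 2 on x. The point is that once individual dummies are included, a time-invariant variable such as month of birth is perfectly collinear with them, so R reports NA for its coefficient, while the coefficient on the time-varying regressor is still estimated.

```r
set.seed(42)
n_id <- 50   # individuals (hypothetical)
n_t  <- 6    # periods per individual (hypothetical)
id   <- rep(seq_len(n_id), each = n_t)

# Time-invariant characteristic: the same value in every period for a given id
birth_month <- sample(1:12, n_id, replace = TRUE)[id]

# Unobserved individual effect, partly driven by month of birth
alpha <- 0.1 * birth_month + rnorm(n_id)[id]

x <- 0.8 * alpha + rnorm(n_id * n_t)              # time-varying regressor
y <- 2 * x + alpha + rnorm(n_id * n_t, sd = 0.5)  # true effect of x is 2

# One dummy per individual, followed by the time-invariant variable
fit <- lm(y ~ x + factor(id) + birth_month)

b_x     <- unname(coef(fit)["x"])
b_month <- unname(coef(fit)["birth_month"])  # NA: absorbed by the dummies

print(c(x = b_x, birth_month = b_month))
```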

For this reason, panel data is more useful for testing causal relationships than a purely cross-sectional regression, so long as the variable of interest varies over both the time and the individual dimensions. One can argue that the estimate of the effect of the variable of interest on the dependent variable is less likely to be biased, since all time-invariant factors are controlled for. Unobserved factors that change over time can still bias the estimate.
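A small simulation can illustrate the argument. This is a sketch under stated assumptions, not a result from any real dataset: the individual effect alpha is unobserved and correlated with the regressor x, and the true effect of x is set to 2. Pooled OLS ignores alpha, while the fixed effects estimate is obtained here by subtracting each individual's mean from x and y (the "within" transformation).

```r
set.seed(123)
n_id <- 200  # individuals (hypothetical)
n_t  <- 5    # periods per individual (hypothetical)
id   <- rep(seq_len(n_id), each = n_t)

alpha <- rnorm(n_id)[id]                          # unobserved, fixed over time
x <- 0.8 * alpha + rnorm(n_id * n_t)              # correlated with alpha
y <- 2 * x + alpha + rnorm(n_id * n_t, sd = 0.5)  # true effect of x is 2

# Pooled OLS: alpha ends up in the error term and biases the slope
pooled <- lm(y ~ x)

# Within transformation: demean by individual, which removes alpha
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
fe   <- lm(y_dm ~ x_dm - 1)  # only the point estimate is used here

b_pooled <- unname(coef(pooled)["x"])
b_fe     <- unname(coef(fe)["x_dm"])

print(c(pooled = b_pooled, fixed_effects = b_fe))
```

In this setup the pooled slope should drift away from 2 while the demeaned regression stays close to it. The standard errors reported by lm() for the demeaned regression do not account for the degrees of freedom used up by the individual means, so only the coefficient is compared.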

I will illustrate how to use panel data in environmental economics. This time I don’t have a specific dataset to provide as an example. Instead, I will use an old dataset that I compiled many years ago, and practitioners are free to take the R code and apply it to their own data.

My research question at the time was whether ratification of the Kyoto Protocol had decreased a country’s carbon emissions. My variable of interest is therefore a dummy that takes the value of one once a country has ratified the Kyoto Protocol, and zero otherwise. I use several control variables: population, GDP and other dummy variables. Interestingly, many of the papers I have found that use panel data in environmental economics have research questions related to pollution (e.g. Jaffe and Palmer, 1997; Welsch, 2006; Du et al., 2012; Elsayed and Paton, 2005).
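For readers building a similar variable from their own data, here is one possible way to construct such a dummy. The countries A, B and C and their ratification years are hypothetical, as is the ratification_year column; only the names country1, year1 and Kyoto_Protocol follow the variables used in this post.

```r
# Hypothetical country-year grid
panel <- expand.grid(year1 = 1998:2007,
                     country1 = c("A", "B", "C"),
                     stringsAsFactors = FALSE)

# Hypothetical ratification years; NA means the country never ratified
ratified <- data.frame(country1 = c("A", "B", "C"),
                       ratification_year = c(2002, 2005, NA))

panel <- merge(panel, ratified, by = "country1")

# 1 from the ratification year onwards, 0 before and for non-ratifiers
panel$Kyoto_Protocol <- as.integer(!is.na(panel$ratification_year) &
                                   panel$year1 >= panel$ratification_year)

years_ratified <- tapply(panel$Kyoto_Protocol, panel$country1, sum)
print(years_ratified)
```

A country whose dummy never changes (such as C here) contributes no within-country variation in the variable of interest once country fixed effects are included.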

Before we use the panel structure, I will run a simple OLS regression with my data.

# Pooled OLS: ignores the country and year structure of the data
model <- lm(CO21 ~ lPopulation + Kyoto_Protocol + Commitment +
              GDP1 + Urban + Technology1 +
              Agriculture1 + Industry1 + Machinery1 +
              Women1 + Primary1 + Secondary1 + Tertiary1,
            data = Panel_Data)
summary(model)

Call:
lm(formula = CO21 ~ lPopulation + Kyoto_Protocol + Commitment + 
GDP1 + Urban + Technology1 + Agriculture1 + Industry1 + Machinery1 + 
Women1 + Primary1 + Secondary1 + Tertiary1, data = Panel_Data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.41890 -0.12002 -0.04190  0.08138  1.02431

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.748e-02  1.479e-01   0.659  0.51015    
lPopulation     9.754e-03  6.462e-03   1.510  0.13170    
Kyoto_Protocol  9.056e-03  1.869e-02   0.485  0.62818    
Commitment     -3.931e-03  1.717e-03  -2.290  0.02240 *  
GDP1           -5.224e-06  1.112e-06  -4.696  3.31e-06 ***
Urban          -7.554e-04  7.101e-04  -1.064  0.28786    
Technology1    -2.282e-03  7.612e-04  -2.998  0.00283 ** 
Agriculture1    2.490e-03  1.689e-03   1.475  0.14076    
Industry1       1.311e-02  1.259e-03  10.409  < 2e-16 ***
Machinery1     -3.296e-03  1.402e-03  -2.351  0.01905 *  
Women1         -4.172e-03  1.028e-03  -4.059  5.61e-05 ***
Primary1       -3.691e-03  9.247e-04  -3.991  7.41e-05 ***
Secondary1      4.600e-03  7.282e-04   6.317  5.32e-10 ***
Tertiary1       1.360e-05  6.373e-04   0.021  0.98299    
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1974 on 581 degrees of freedom
(12569 observations deleted due to missingness)
Multiple R-squared: 0.373, Adjusted R-squared: 0.359 
F-statistic: 26.59 on 13 and 581 DF, p-value: < 2.2e-16

From our OLS regression we find no statistically significant effect of ratification (Kyoto_Protocol) on a country’s carbon dioxide emissions. The p-value associated with the estimated coefficient is 0.63, so we fail to reject the null hypothesis that the true value of this parameter is zero.
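Rather than reading the p-value off the printed summary, it can be pulled out of the coefficient table directly. The sketch below uses simulated data with made-up variable names (treated, income, outcome), where treated has no true effect by construction.

```r
set.seed(2024)
n <- 200
d <- data.frame(treated = rbinom(n, 1, 0.5),  # dummy with no true effect
                income  = rnorm(n))
d$outcome <- 1 + 1.0 * d$income + rnorm(n)

fit <- lm(outcome ~ treated + income, data = d)

# Matrix with columns Estimate, Std. Error, t value and Pr(>|t|)
coef_table <- coef(summary(fit))

p_treated <- coef_table["treated", "Pr(>|t|)"]
p_income  <- coef_table["income", "Pr(>|t|)"]

# Reject the null of a zero coefficient at the 5% level?
reject_treated <- p_treated < 0.05
print(round(coef_table, 4))
```

The same indexing applied to the model above, coef(summary(model))["Kyoto_Protocol", "Pr(>|t|)"], returns the p-value discussed in the text.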

The R package I use to analyze panel data is plm. To account for country fixed effects, set model = "within" inside the plm function and identify the country and year columns through the index argument.

library(plm)

# Country fixed effects: "within" model with country1 as the individual index
panel_model <- plm(CO21 ~ lPopulation + Kyoto_Protocol + Commitment +
                     GDP1 + Urban + Technology1 + Agriculture1 +
                     Industry1 + Machinery1 + Women1 + Primary1 +
                     Secondary1 + Tertiary1,
                   data = Panel_Data,
                   index = c("country1", "year1"),
                   model = "within")
summary(panel_model)
Oneway (individual) effect Within Model

Call:
plm(formula = CO21 ~ lPopulation + Kyoto_Protocol + Commitment + 
GDP1 + Urban + Technology1 + Agriculture1 + Industry1 + Machinery1 + 
Women1 + Primary1 + Secondary1 + Tertiary1, data = Panel_Data, 
model = "within", index = c("country1", "year1"))

Unbalanced Panel: n = 84, T = 1-13, N = 595

Residuals:
       Min.     1st Qu.      Median     3rd Qu.        Max. 
-0.21535918 -0.01843024 -0.00026806  0.01981962  0.20634836

Coefficients:
                Estimate  Std. Error t-value  Pr(>|t|)    
lPopulation     2.3356e-01  6.0421e-02  3.8655 0.0001255 ***
Kyoto_Protocol -9.7904e-03  6.0673e-03 -1.6136 0.1072378    
Commitment      1.2848e-03  5.0589e-04  2.5396 0.0114006 *  
GDP1           -7.1970e-06  1.3890e-06 -5.1815 3.204e-07 ***
Urban           2.0946e-03  1.7962e-03  1.1661 0.2441247    
Technology1     6.5746e-04  4.9556e-04  1.3267 0.1852139    
Agriculture1    1.2312e-02  1.0782e-03 11.4185 < 2.2e-16 ***
Industry1      -5.5074e-04  7.8713e-04 -0.6997 0.4844519    
Machinery1      1.2645e-03  7.2196e-04  1.7515 0.0804701 .  
Women1         -1.3279e-03  6.1288e-04 -2.1667 0.0307319 *  
Primary1        7.0385e-04  3.8911e-04  1.8089 0.0710758 .  
Secondary1      2.8298e-04  3.0903e-04  0.9157 0.3602546    
Tertiary1      -1.2372e-03  3.6793e-04 -3.3625 0.0008318 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares: 1.9703
Residual Sum of Squares: 0.94267
R-Squared: 0.52156
Adj. R-Squared: 0.42933
F-statistic: 41.7607 on 13 and 498 DF, p-value: < 2.22e-16

As before, we find no statistically significant effect of ratification (Kyoto_Protocol) on carbon dioxide emissions (p-value = 0.107), this time after controlling for country fixed effects. The point estimate does change sign, from positive in the OLS regression to negative here.
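The model above controls for country fixed effects only. Year fixed effects can be added as well through plm's effect argument. The sketch below uses simulated data rather than my dataset: the variables x and y, the sample sizes and the true coefficient of 1.5 are all assumptions for illustration, while country1 and year1 mirror the index names used above.

```r
library(plm)

set.seed(7)
n_c <- 40  # countries (hypothetical)
n_y <- 8   # years (hypothetical)
sim <- expand.grid(year1 = 2000 + seq_len(n_y), country1 = seq_len(n_c))

country_fx <- rnorm(n_c)[sim$country1]     # time-invariant country effect
year_fx    <- rnorm(n_y)[sim$year1 - 2000] # shock common to all countries

sim$x <- 0.5 * country_fx + rnorm(nrow(sim))
sim$y <- 1.5 * sim$x + country_fx + year_fx + rnorm(nrow(sim), sd = 0.5)

# Country fixed effects only (the default effect = "individual")
fe_country <- plm(y ~ x, data = sim,
                  index = c("country1", "year1"), model = "within")

# Country and year fixed effects
fe_twoway <- plm(y ~ x, data = sim,
                 index = c("country1", "year1"),
                 model = "within", effect = "twoways")

b_country <- unname(coef(fe_country)["x"])
b_twoway  <- unname(coef(fe_twoway)["x"])

# Same slope as a regression with explicit country and year dummies
b_dummies <- unname(coef(lm(y ~ x + factor(country1) + factor(year1),
                            data = sim))["x"])

print(c(country = b_country, twoway = b_twoway, dummies = b_dummies))
```

With real data, the choice between one-way and two-way effects depends on whether shocks common to all countries in a given year are a plausible concern.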

Yet the output of the model tells me interesting things. At the beginning of the output, the model indicates that I am dealing with an unbalanced panel:

Unbalanced Panel: n = 84, T = 1-13, N = 595

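The panel consists of 84 countries (n), each observed for between 1 and 13 time periods (T), for a total of 595 observations (N). The panel is unbalanced, which means that some countries do not have data for all years, while others do. Moreover, the adjusted R-squared is 0.43, which is fairly good. Note that for a within model this refers to the variation within countries over time, not to the differences between countries.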
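The same n, T and N summary can be reproduced by hand by counting observations per country, which is also a quick way to check a dataset before estimation. The tiny data frame below is hypothetical; only the column names country1 and year1 follow the ones used in this post.

```r
# Hypothetical unbalanced panel: A has 3 years, B has 2, C has 1
obs <- data.frame(country1 = c("A", "A", "A", "B", "B", "C"),
                  year1    = c(2000, 2001, 2002, 2000, 2002, 2001))

per_country <- table(obs$country1)  # observations per country

n       <- length(per_country)                  # number of countries
T_range <- as.integer(range(per_country))       # min and max periods
N       <- nrow(obs)                            # total observations

# Balanced only if every country has the same number of periods
balanced <- length(unique(as.vector(per_country))) == 1

cat(sprintf("n = %d, T = %d-%d, N = %d\n", n, T_range[1], T_range[2], N))
```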

Other panel data models can be estimated depending on the nature of your panel data. The model option in the plm function can take the arguments "within", "random", "ht", "between", "pooling" and "fd". In future blog posts I can go through these additional models.
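As a preview, switching between estimators only requires changing the model argument. The sketch below fits three of them on simulated data; the variables x and y, the sample sizes and the true coefficient of 1.5 are assumptions for illustration, and the country effect is deliberately correlated with x.

```r
library(plm)

set.seed(99)
n_c <- 40  # countries (hypothetical)
n_y <- 8   # years (hypothetical)
sim <- expand.grid(year1 = 2000 + seq_len(n_y), country1 = seq_len(n_c))

country_fx <- rnorm(n_c)[sim$country1]
sim$x <- 0.5 * country_fx + rnorm(nrow(sim))
sim$y <- 1.5 * sim$x + country_fx + rnorm(nrow(sim), sd = 0.5)

idx <- c("country1", "year1")

m_pooling <- plm(y ~ x, data = sim, index = idx, model = "pooling")
m_within  <- plm(y ~ x, data = sim, index = idx, model = "within")
m_random  <- plm(y ~ x, data = sim, index = idx, model = "random")

# Slope on x under each estimator
b_x <- c(pooling = unname(coef(m_pooling)["x"]),
         within  = unname(coef(m_within)["x"]),
         random  = unname(coef(m_random)["x"]))
print(b_x)
```

Because the country effect is correlated with x in this simulation, the within estimate should sit closest to the true value, which is the situation where fixed effects are usually preferred.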



Jaffe, A. B., & Palmer, K. (1997). Environmental regulation and innovation: a panel data study. Review of Economics and Statistics, 79(4), 610-619.

Welsch, H. (2006). Environment and happiness: Valuation of air pollution using life satisfaction data. Ecological Economics, 58(4), 801-813.

Du, L., Wei, C., & Cai, S. (2012). Economic development and carbon dioxide emissions in China: Provincial panel data analysis. China Economic Review, 23(2), 371-384.

Egan, K., & Herriges, J. (2006). Multivariate count data regression models with individual panel data from an on-site sample. Journal of Environmental Economics and Management, 52(2), 567-581.

Elsayed, K., & Paton, D. (2005). The impact of environmental performance on firm performance: static and dynamic panel data evidence. Structural Change and Economic Dynamics, 16(3), 395-412.

