Given the pandemic that has occupied most of our thoughts lately, I thought I’d contribute by illustrating some of its impacts. In this blog post, I use the ggplot2 package in R to illustrate the impact of the pandemic graphically, and the lm function to build a simple counterfactual.
The COVID-19 pandemic has had several consequences for our livelihoods as well as for economic activity in general. Schools are closed, small businesses are forced to close their doors or adapt to delivering products and services where possible, and international mobility is reduced as countries close borders. As a consequence, many people have lost their jobs, and researchers predict an economic recession.
As for me, I am curious about the environmental impact of the pandemic. While I have heard reports of pollution decreasing due to reduced economic activity, here I explore the impact of COVID-19 on electricity markets.
The main motivation for studying the electricity market is data availability. Both price and electricity consumption are updated daily, so it’s easier to gauge the immediate effect of the pandemic and of the measures that have been implemented to limit the spread of the new virus. Data on price and electricity consumption for the Nordic and Baltic markets are available from NordPool.
NordPool manages the largest electricity market in Europe. It reports a variety of historical data related to electricity consumption and production across Europe. Here’s an overview of the NordPool electricity market and the electricity price (in EUR) as of 10th April 2020:
Source: NordPool website (https://www.nordpoolgroup.com/maps/#/nordic)
Moreover, electricity is an important input for economic activity, so a drop in its consumption might be a good indicator of the impact on the economy overall.
Due to data availability, I use data for seven countries: the Nordic countries (Denmark, Norway, Finland and Sweden) and the Baltic countries (Estonia, Latvia and Lithuania). I use data on both daily consumption and price. The database that I built can be downloaded here.
As usual, the first step is to import this dataset. The original file is an Excel document, so I use the readxl package. I will call my dataset “database_electricity”.
library(readxl)
database_electricity <- read_excel("DIRECTORY/database_electricity.xlsx")
The dataset contains a “Country” identifier, a time variable “Time” with the date, a “Month” indicator, and a Weekend dummy. I included data for 2019 and 2020 (up to 9th April 2020). The dataset also includes other variables that I will address in future blog posts.
The Consumption variable denotes the daily consumption in MWh at country level. Likewise, the Price variable represents the electricity price in euros per MWh for each country and day.
Let’s illustrate graphically how consumption and price change over time. My expectation is that price or electricity consumption (or both) decreased over time, especially once the pandemic became more serious in mid-March. I load ggplot2 with the library function and then use the ggplot function to create four graphs: Consumption and Price in 2019 and in 2020. Each line represents a different country.
library(ggplot2)

# Consumption, first 100 days of 2019
ggplot(database_electricity[database_electricity$Year == 2019 & database_electricity$Day_ID < 101, ],
       aes(Time, Consumption, colour = Country)) +
  geom_line() +
  ggtitle("2019")

# Consumption, first 100 days of 2020
ggplot(database_electricity[database_electricity$Year == 2020 & database_electricity$Day_ID < 101, ],
       aes(Time, Consumption, colour = Country)) +
  geom_line() +
  ggtitle("2020")

# Price, first 100 days of 2019
ggplot(database_electricity[database_electricity$Year == 2019 & database_electricity$Day_ID < 101, ],
       aes(Time, Price, colour = Country)) +
  geom_line() +
  ggtitle("2019")

# Price, first 100 days of 2020
ggplot(database_electricity[database_electricity$Year == 2020 & database_electricity$Day_ID < 101, ],
       aes(Time, Price, colour = Country)) +
  geom_line() +
  ggtitle("2020")
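One possible way to make the year-on-year comparison more direct is to put Day_ID on the horizontal axis and let facet_wrap split the panels by year, so that both years share the same axes. The sketch below uses a small synthetic data frame (toy_data and its values are invented for illustration) with the same column names as database_electricity.

```r
library(ggplot2)

# Synthetic stand-in for database_electricity (all values are made up)
set.seed(123)
toy_data <- expand.grid(Day_ID = 1:100,
                        Country = c("DK", "FI", "NO"),
                        Year = c(2019, 2020),
                        stringsAsFactors = FALSE)
toy_data$Consumption <- 100000 + 500 * sin(toy_data$Day_ID / 7) +
  rnorm(nrow(toy_data), sd = 1000)

# One panel per year, one line per country, shared axes across panels
p <- ggplot(toy_data, aes(Day_ID, Consumption, colour = Country)) +
  geom_line() +
  facet_wrap(~ Year) +
  ggtitle("Consumption, first 100 days of each year")
print(p)
```

With the real data, replacing toy_data with the rows of database_electricity where Day_ID is below 101 should give the same layout.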
On the left side there are the graphs illustrating 2019 data, and on the right side those of 2020. I would expect a similar pattern in 2019 and 2020, except after mid-March where the trend could differ. Around mid-March I was expecting a drop in consumption of electricity due to a slowdown in economic activity.
Instead, it seems that the years 2019 and 2020 are fundamentally different. For example, the price of electricity per MWh is significantly lower in 2020 and it exhibits higher volatility than 2019, even before the pandemic became more serious. Nonetheless, as we get more data, a more obvious pattern might be visible.
Instead of comparing 2019 and 2020, I would like to create a counterfactual for 2020. In other words, I would like to create a variable that represents what the consumption and price would have been in 2020 without the COVID-19 pandemic.
In order to create this counterfactual, I first transform the electricity consumption variable into its log form.
database_electricity$ln_Cons <- log(database_electricity$Consumption)
I want to use a linear regression to predict the natural logarithm of Consumption given weekend, month, country and year dummies, as well as a day indicator (Day_ID). I simply use the lm function to run this regression, using only the data from before 10th March 2020.
> reg <- lm(ln_Cons ~ Weekend + as.factor(Month) + Day_ID + as.factor(Country) + as.factor(Year), data=database_electricity[database_electricity$Time<"2020-03-10",])
> summary(reg)

Call:
lm(formula = ln_Cons ~ Weekend + as.factor(Month) + Day_ID + as.factor(Country) + as.factor(Year), data = database_electricity[database_electricity$Time < "2020-03-10", ])

Residuals:
     Min       1Q   Median       3Q      Max
-1.07576 -0.04554 -0.00110  0.05105  0.23682

Coefficients:
                       Estimate Std. Error  t value Pr(>|t|)
(Intercept)          11.6382805  0.0062256 1869.419  < 2e-16 ***
Weekend              -0.1060194  0.0030882  -34.331  < 2e-16 ***
as.factor(Month)2     0.0045842  0.0071242    0.643 0.519969
as.factor(Month)3    -0.0198451  0.0107900   -1.839 0.065981 .
as.factor(Month)4    -0.1359036  0.0156789   -8.668  < 2e-16 ***
as.factor(Month)5    -0.1621473  0.0201237   -8.058 1.11e-15 ***
as.factor(Month)6    -0.1850633  0.0247595   -7.474 1.01e-13 ***
as.factor(Month)7    -0.2015539  0.0294279   -6.849 8.97e-12 ***
as.factor(Month)8    -0.1366671  0.0342400   -3.991 6.72e-05 ***
as.factor(Month)9    -0.0632528  0.0390136   -1.621 0.105057
as.factor(Month)10    0.0556144  0.0437811    1.270 0.204082
as.factor(Month)11    0.1700185  0.0485851    3.499 0.000473 ***
as.factor(Month)12    0.1840807  0.0533786    3.449 0.000571 ***
Day_ID               -0.0007963  0.0001589   -5.011 5.73e-07 ***
as.factor(Country)EE -1.4107335  0.0052114 -270.703  < 2e-16 ***
as.factor(Country)FI  0.8998928  0.0052114  172.679  < 2e-16 ***
as.factor(Country)LT -1.0193735  0.0052114 -195.605  < 2e-16 ***
as.factor(Country)LV -1.5410006  0.0052114 -295.699  < 2e-16 ***
as.factor(Country)NO  1.3813594  0.0052114  265.066  < 2e-16 ***
as.factor(Country)SE  1.4020889  0.0052114  269.044  < 2e-16 ***
as.factor(Year)2020  -0.0433779  0.0047960   -9.045  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.07677 on 3017 degrees of freedom
Multiple R-squared: 0.996, Adjusted R-squared: 0.996
F-statistic: 3.75e+04 on 20 and 3017 DF, p-value: < 2.2e-16
The R-squared is very high (0.996), which gives me confidence that the model predicts electricity consumption well. As expected, electricity consumption decreases on weekends and during the summer months. Consumption also appears to be lower in 2020 than in 2019.
With the coefficients estimated from historical data, I can predict the energy consumption for each day in 2020. To do so, I will create a new dataframe called “counterfactual” with several of the variables that I used to run the regression above. I will multiply these variables by the estimated coefficients to calculate my prediction of Consumption.
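Because the dependent variable is in logs, a dummy coefficient b translates into a percentage change of 100 * (exp(b) - 1). As a quick worked example, the snippet below applies this to three coefficients copied from the output above; the shorter names in the vector are only labels chosen for this illustration.

```r
# Coefficients copied from the regression output (in log points)
b <- c(Weekend = -0.1060194, Month7 = -0.2015539, Year2020 = -0.0433779)

# Convert each log-point effect into a percentage change in consumption
pct <- 100 * (exp(b) - 1)
round(pct, 1)
```

This puts weekend consumption roughly 10% below weekdays, July about 18% below January (the baseline month), and 2020 about 4% below 2019, holding the other regressors fixed.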
# Day_ID continues the 2019 numbering (366 to 465). data.frame() recycles these
# 100 values to match the longer columns, so the rows are assumed to be sorted
# by country and then by date.
counterfactual <- data.frame(c(366:465),
                             database_electricity$Month[database_electricity$Year == 2020],
                             database_electricity$Weekend[database_electricity$Year == 2020],
                             database_electricity$Country[database_electricity$Year == 2020])
colnames(counterfactual) <- c("Day_ID", "Month", "Weekend", "Country")
I also need to create several dummy variables, which take the value 1 or 0. I will create these dummies as month and country indicators with the code below.
# Month dummies
month.f = factor(counterfactual$Month)
dummies = model.matrix(~month.f)
counterfactual <- data.frame(counterfactual, dummies)

# Country dummies
country.f = factor(counterfactual$Country)
dummies.c = model.matrix(~country.f)
counterfactual <- data.frame(counterfactual, dummies.c)
With the counterfactual dataframe, I can create a new variable that contains my predictions for the logarithm of electricity consumption.
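To see what model.matrix returns, here is a tiny illustration with a made-up month vector (not the real data). The first level becomes the baseline and gets no column of its own, which is where names such as month.f2 come from.

```r
# Hypothetical month values, only to show the shape of the output
month.f <- factor(c(1, 2, 3, 4, 2))
dummies <- model.matrix(~ month.f)

# An intercept column plus one dummy per non-baseline level
colnames(dummies)

# The row for month 1 (the baseline) has zeros in every month dummy
dummies[1, ]

# data.frame() sanitises column names, so "(Intercept)" gets renamed
toy_df <- data.frame(Month = c(1, 2, 3, 4, 2), dummies)
names(toy_df)
```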
# Coefficients are picked by name, in the same order as the regression output
b <- coefficients(reg)
counterfactual$ln_Cons <- b[["(Intercept)"]] +
  b[["Weekend"]] * counterfactual$Weekend +
  b[["as.factor(Month)2"]] * counterfactual$month.f2 +
  b[["as.factor(Month)3"]] * counterfactual$month.f3 +
  b[["as.factor(Month)4"]] * counterfactual$month.f4 +
  b[["Day_ID"]] * (counterfactual$Day_ID - 365) +
  b[["as.factor(Country)EE"]] * counterfactual$country.fEE +
  b[["as.factor(Country)FI"]] * counterfactual$country.fFI +
  b[["as.factor(Country)LT"]] * counterfactual$country.fLT +
  b[["as.factor(Country)LV"]] * counterfactual$country.fLV +
  b[["as.factor(Country)NO"]] * counterfactual$country.fNO +
  b[["as.factor(Country)SE"]] * counterfactual$country.fSE +
  b[["as.factor(Year)2020"]]  # year effect, since every row is in 2020
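The same kind of prediction can also be obtained with predict, which applies the fitted coefficients to new rows without building the dummies by hand. The sketch below checks this on synthetic data (toy, train and new_days are invented names and values, and the model is a simplified version of the one above, without the month and year dummies).

```r
# Synthetic data: 60 days for three countries (values are made up)
set.seed(1)
toy <- expand.grid(Day_ID = 1:60,
                   Country = c("DK", "EE", "FI"),
                   stringsAsFactors = FALSE)
toy$Weekend <- as.numeric(toy$Day_ID %% 7 %in% c(0, 6))
country_shift <- unname(c(DK = 0, EE = -1.4, FI = 0.9)[toy$Country])
toy$ln_Cons <- 11 - 0.1 * toy$Weekend - 0.001 * toy$Day_ID +
  country_shift + rnorm(nrow(toy), sd = 0.05)

# Fit on the first 40 days, predict the remaining 20
train <- toy[toy$Day_ID <= 40, ]
new_days <- toy[toy$Day_ID > 40, ]
fit <- lm(ln_Cons ~ Weekend + Day_ID + as.factor(Country), data = train)

# Prediction via predict()
pred_auto <- predict(fit, newdata = new_days)

# Prediction by multiplying coefficients and variables by hand
b <- coefficients(fit)
pred_manual <- b[["(Intercept)"]] +
  b[["Weekend"]] * new_days$Weekend +
  b[["Day_ID"]] * new_days$Day_ID +
  b[["as.factor(Country)EE"]] * (new_days$Country == "EE") +
  b[["as.factor(Country)FI"]] * (new_days$Country == "FI")

# Both routes should agree up to floating-point error
all.equal(unname(pred_auto), pred_manual)
```

With the real data, this would amount to calling predict with reg as the model and the 2020 rows of database_electricity as newdata, provided those rows carry the same variables used in the regression.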
I now create a new variable called res (for residuals), which is simply the difference between the predicted logarithm of consumption and the actual logarithm of consumption in 2020, and I plot it. If the predictions are good, the residuals should be close to zero. If the predictions differ substantially from the actual values, res will move away from zero.
counterfactual$res <- counterfactual$ln_Cons - database_electricity$ln_Cons[database_electricity$Year == 2020]

ggplot(counterfactual, aes(Day_ID, res, colour = Country)) +
  geom_smooth() +
  ggtitle("Residuals 2020") +
  # facet_wrap(~ Country, nrow = 2) +
  geom_vline(xintercept = 437)
As we can observe in the graph, the residuals take a value different from zero even before the pandemic became more serious. While the confidence interval of the residuals overlaps zero for many countries, I conclude that the counterfactual that I created is not very good at predicting the consumption of electricity.
However, the graph does show one interesting finding: the variance of electricity consumption appears to be larger after the pandemic began. All of the confidence intervals seem to widen, which means the residuals vary more during the pandemic.
I created the same graph as before but using the price variable. I used the same code, replacing ln_Cons with Price. This is the resulting graph:
The price counterfactual I have created seems to predict the actual price of electricity better. Before the pandemic became more serious (to the left of the black vertical line), the residuals are generally around zero (except for Norway).
However, here’s the interesting thing. It seems that after the onset of the pandemic, electricity prices have fallen in all seven countries. The residuals are all positive and trending upwards, meaning that the expected electricity price is higher than the actual electricity price. This seems to be the only impact of the pandemic on electricity markets so far: an overall decrease in the electricity price.
Of course, all of this analysis is preliminary and things might change with time. As we get more daily data, other trends might arise.
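A rough numerical check of these visual impressions is to compare the mean and standard deviation of the residuals before and after the vertical line at Day_ID 437. The sketch below does this with tapply on simulated residuals (toy_res and its values are invented; with the real data one would use the counterfactual dataframe instead).

```r
# Simulated residuals for three countries over Day_ID 366 to 465 (made up)
set.seed(42)
toy_res <- expand.grid(Day_ID = 366:465,
                       Country = c("DK", "NO", "SE"),
                       stringsAsFactors = FALSE)

# By construction, the spread doubles from Day_ID 437 onwards
after <- toy_res$Day_ID >= 437
toy_res$res <- rnorm(nrow(toy_res), mean = 0, sd = ifelse(after, 0.10, 0.05))
toy_res$Period <- ifelse(after, "after", "before")

# Mean and standard deviation of the residuals by country and period
mean_res <- tapply(toy_res$res, list(toy_res$Country, toy_res$Period), mean)
sd_res <- tapply(toy_res$res, list(toy_res$Country, toy_res$Period), sd)
round(mean_res, 3)
round(sd_res, 3)
```

A mean far from zero in the "before" column would point to a poorly fitting counterfactual, while a larger standard deviation in the "after" column would be in line with the wider bands in the plot.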