In this blog post I replicate the findings of Tang et al. (2018), a paper that applies the hedonic pricing method. The data are publicly available at Mendeley Data.

The goal of Tang et al. (2018) is to estimate the value of mercury reductions in order to quantify the benefits of public policies such as the Mercury and Air Toxics Standards (MATS) in the United States. Mercury reductions may bring benefits in several areas: the EPA credits MATS with preventing 11,000 premature deaths, 4,700 heart attacks, and 130,000 asthma attacks each year. One additional impact, and the focus of the paper, is the effect of mercury reductions on property prices.

Since mercury pollution shows up in fish tissue concentrations, the authors use fish consumption advisories (FCA) as their measure of mercury pollution. They defend this measure by arguing that mercury is the primary chemical of concern for FCA designation (a claim I was not fully convinced of after reading the paper). They analyze the impact of FCA designation on property prices, using data on 131 individual lakes in New York State and 83,000 property transactions occurring between 2004 and 2013. The authors find that an FCA designation decreases the value of properties within one mile of the lakes by 6 to 7 percent.

The original dataset includes 82,715 property transactions, and the dependent variable is the logarithm of the sale price. The regression they estimate is the following:

$$\ln(\text{Price}_i) = \beta_0 + \beta_1 \, FCA_i + X_i'\gamma + \varepsilon_i$$

where $i$ indexes property transactions. FCA is a dummy variable taking the value 1 if the lake was designated as polluted, and 0 otherwise. The vector X includes all other characteristics (including housing characteristics) that drive the property price.

I downloaded the data and prepared it beforehand in Stata. Because the prepared file is in Stata's .dta format, I use the *haven* package to read it into R:

```r
library(haven)

mercury <- read_dta("DIRECTORY/Prepared_data.dta")
```

The authors make some changes to the fish advisory variable. The "Advisory" variable is a yes/no text variable (with "Y" for yes), so we must convert it into a dummy by creating a new variable, which I called "FCA":

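```r
# Start with FCA = 0 for every transaction
mercury$FCA <- 0
# Set FCA = 1 if the lake carries a fish consumption advisory
mercury$FCA[mercury$Advisory == "Y"] <- 1
# Reset FCA to 0 if the sale took place before the advisory was issued
mercury$FCA[mercury$advisory_diff < 0] <- 0
```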

The authors also adjust the FCA variable using the condition `advisory_diff < 0`. The paper does not explain this correction, but in the original Stata code this variable is calculated as the difference between the year of the sale and the year of the advisory. Hence, FCA should be 0 if the lake received its FCA designation after the property was sold.
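To make the two-step recoding concrete, here is a toy example on a hand-made data frame. The column names mirror `Advisory` and `advisory_diff`, but the rows are invented for illustration, and `advisory_diff` is assumed to be sale year minus advisory year, as described above. The sketch also shows a one-line equivalent (`FCA_alt`, a hypothetical name) that treats a missing `advisory_diff` the same way the two-step version does.

```r
# Toy data: invented rows, NOT taken from the Tang et al. (2018) dataset.
toy <- data.frame(
  Advisory      = c("Y", "Y", "N", "Y", "Y"),
  advisory_diff = c(3, -2, NA, 0, NA)  # assumed: sale year minus advisory year
)

# Two-step recoding, as in the blog
toy$FCA <- 0
toy$FCA[toy$Advisory == "Y"] <- 1
toy$FCA[toy$advisory_diff < 0] <- 0  # NA indices are skipped for a length-1 value

# One-line equivalent: advisory present AND sale not known to predate it
toy$FCA_alt <- as.integer(
  toy$Advisory == "Y" & !(!is.na(toy$advisory_diff) & toy$advisory_diff < 0)
)

toy
```

The second row (advisory issued two years after the sale) ends up with FCA = 0, which is exactly what the correction is meant to achieve.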

The results reported in the paper use only a subset of all collected observations: only properties within one mile of a lake are considered. I created a new dataset called "mercury2" that only includes properties within one mile of a lake (the distance `Dis_Hglake` appears to be in meters, so 1,600 is roughly one mile):

```r
mercury2 <- mercury[mercury$Dis_Hglake <= 1600, ]
```

This new dataset includes 20,194 observations, rather than the original 82,715.
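As a sanity check on the cutoff: assuming `Dis_Hglake` is recorded in meters, one international mile is exactly 1,609.344 m, so `<= 1600` is a slightly conservative stand-in for one mile. The sketch below uses a hypothetical helper, `within_miles()`, on invented distances. It also illustrates a base R subtlety: if the distance column had missing values (I have not checked whether it does), bracket indexing would keep an all-NA row for each of them, while `subset()` drops them.

```r
meters_per_mile <- 1609.344  # international mile, exact by definition

# Hypothetical helper: keep rows within `miles` miles, dropping missing distances
within_miles <- function(df, dist_col, miles = 1) {
  keep <- !is.na(df[[dist_col]]) & df[[dist_col]] <= miles * meters_per_mile
  df[keep, , drop = FALSE]
}

# Invented rows, NOT from the mercury data
toy <- data.frame(id = 1:5, Dis_Hglake = c(250, 1599, 1605, 1700, NA))

exact  <- within_miles(toy, "Dis_Hglake")  # ids 1, 2, 3
cutoff <- subset(toy, Dis_Hglake <= 1600)  # ids 1, 2 (subset() drops NA rows)
raw    <- toy[toy$Dis_Hglake <= 1600, ]    # ids 1, 2 plus an all-NA row

list(exact = exact, cutoff = cutoff, raw = raw)
```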

The good thing about this paper is that all reported regressions are estimated by OLS, so base R's `lm()` function is all we need; no more complex estimation methods are required.
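`lm()` computes the ordinary least squares estimator `(X'X)^{-1} X'y`. A minimal sketch on simulated data (not the mercury data) confirms that solving the normal equations by hand gives the same coefficients as `lm()`:

```r
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.3)                     # a dummy regressor, like FCA
y  <- 1 + 0.5 * x1 - 0.2 * x2 + rnorm(n, sd = 0.3)

X <- cbind(1, x1, x2)                       # design matrix with an intercept column
beta_manual <- drop(solve(crossprod(X), crossprod(X, y)))  # (X'X)^{-1} X'y
fit <- lm(y ~ x1 + x2)                      # lm() uses a QR decomposition internally

cbind(manual = unname(beta_manual), lm = unname(coef(fit)))
```

`lm()` solves the problem through a QR decomposition, which is numerically more stable than inverting `X'X`, but both routes give the same estimates up to rounding error.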

We want to explain the log of the sale price with fish consumption advisories (FCA), the log of the distance to the lake (log_Dis_Hglake), whether the property is on the waterfront (waterfront), and a dummy for a small pond nearby (lake_tag_f).

The authors do not estimate this four-covariate regression, but I run it as a first check:

```r
model1 <- lm(log_std_sale_price ~ FCA + log_Dis_Hglake + waterfront + lake_tag_f, data = mercury2)
summary(model1)
```

```
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    12.685046   0.037172 341.251  < 2e-16 ***
FCA            -0.095861   0.012752  -7.517 5.82e-14 ***
log_Dis_Hglake -0.193345   0.007039 -27.466  < 2e-16 ***
waterfront      0.237702   0.022599  10.518  < 2e-16 ***
lake_tag_f      0.166831   0.014333  11.640  < 2e-16 ***
```

The coefficient associated with FCA is -0.0959, which means that when the nearby lake is FCA-designated, property sale prices are expected to be roughly 9.6% lower (the exact conversion for a log-linear model, exp(-0.0959) - 1, gives about -9.1%). This is similar to, albeit somewhat larger than, the 6 to 7 percent price reduction reported in the abstract of the paper.
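Reading a dummy coefficient in a log-price regression as "100 times b percent" is only an approximation that works for small b. The exact percentage change is `100 * (exp(b) - 1)`. A small sketch applying this to the three FCA estimates reported in this post (the helper name `pct_effect()` is my own):

```r
# Exact percentage change in price implied by a dummy coefficient b
# in a regression of log(price): 100 * (exp(b) - 1).
pct_effect <- function(b) 100 * (exp(b) - 1)

# FCA coefficients from model1, model2 and model3 in this post
fca_coefs <- c(model1 = -0.095861, model2 = -0.3241, model3 = -0.07362)

round(pct_effect(fca_coefs), 1)
```

The gap between the approximation and the exact figure grows with the size of the coefficient, so it matters most for the larger model2 estimate.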

Of course, one should account for additional covariates to avoid omitted variable bias. Tang et al. (2018) control for many house characteristics and distances to amenities. Model 1 in Table 2 of the paper adds year and month fixed effects, as well as house and neighborhood controls, to the regression above.
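To see how omitting a house characteristic can distort the FCA coefficient, here is a minimal simulation. The data-generating process is invented, not the paper's: a hypothetical `quality` variable raises prices and is correlated with FCA status, so leaving it out of the regression biases the FCA estimate.

```r
set.seed(1)
n <- 10000
beta_fca <- -0.07                           # "true" FCA effect in this simulation
quality  <- rnorm(n)                        # hypothetical omitted house characteristic
# FCA is made more likely where quality is low, so FCA and quality are correlated
fca <- rbinom(n, 1, plogis(-0.5 - quality))
log_price <- 12 + beta_fca * fca + 0.3 * quality + rnorm(n, sd = 0.5)

short <- unname(coef(lm(log_price ~ fca))["fca"])            # omits quality
long  <- unname(coef(lm(log_price ~ fca + quality))["fca"])  # controls for quality

round(c(true = beta_fca, short = short, long = long), 3)
```

The direction of the bias depends on the sign of the omitted variable's effect and of its correlation with FCA; in this simulation both push the short-regression estimate downward by construction.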

```r
model2 <- lm(log_std_sale_price ~
               # house characteristics
               nbr_kitchens + nbr_full_baths + nbr_bed + nbr_fireplaces +
               bsmnt_garage_capacity + nbr_half_baths + finished_bsmnt + blt_his +
               log_sqft_living_area + central_air +
               factor(grade_f) + factor(prop_class_var_f) +
               # distances to amenities
               log_Dis_Hosp + log_Dis_POP + log_Dis_Sch + log_Dis_Univ +
               # lake variables
               FCA + log_Dis_Hglake + waterfront + lake_tag_f +
               # year and month fixed effects
               factor(sale_year) + factor(sale_month),
             data = mercury2)

summary(model2)
```

```
Call:
lm(formula = log_std_sale_price ~ nbr_kitchens + nbr_full_baths +
    nbr_bed + nbr_fireplaces + bsmnt_garage_capacity + nbr_half_baths +
    finished_bsmnt + blt_his + log_sqft_living_area + log_Dis_Hosp +
    log_Dis_POP + log_Dis_Sch + log_Dis_Univ + central_air +
    factor(grade_f) + factor(prop_class_var_f) + FCA + log_Dis_Hglake +
    waterfront + lake_tag_f + factor(sale_year) + factor(sale_month),
    data = mercury2)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6395 -0.2805  0.0049  0.2914  3.2160 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            8.820e+00  2.302e-01  38.317  < 2e-16 ***
nbr_kitchens          -1.597e-01  2.755e-02  -5.798 6.79e-09 ***
nbr_full_baths         1.809e-01  8.383e-03  21.581  < 2e-16 ***
nbr_bed               -1.257e-02  5.134e-03  -2.449 0.014327 *  
nbr_fireplaces         1.737e-01  6.950e-03  24.997  < 2e-16 ***
bsmnt_garage_capacity  3.090e-02  1.021e-02   3.027 0.002476 ** 
nbr_half_baths         1.086e-01  8.494e-03  12.790  < 2e-16 ***
finished_bsmnt         2.511e-05  1.600e-05   1.570 0.116541    
blt_his               -2.717e-03  1.102e-04 -24.663  < 2e-16 ***
log_sqft_living_area   4.864e-01  1.492e-02  32.595  < 2e-16 ***
log_Dis_Hosp           4.167e-02  4.066e-03  10.250  < 2e-16 ***
log_Dis_POP            7.192e-02  3.939e-03  18.261  < 2e-16 ***
log_Dis_Sch            4.619e-02  4.500e-03  10.265  < 2e-16 ***
log_Dis_Univ          -6.566e-02  3.922e-03 -16.742  < 2e-16 ***
central_air            7.647e-02  1.203e-02   6.356 2.11e-10 ***
FCA                   -3.241e-01  1.563e-02 -20.742  < 2e-16 ***
log_Dis_Hglake        -1.341e-01  5.256e-03 -25.517  < 2e-16 ***
waterfront             3.017e-01  1.580e-02  19.094  < 2e-16 ***
lake_tag_f             8.756e-02  1.007e-02   8.695  < 2e-16 ***
```

In the output above, I omitted the many dummy coefficients that were also estimated (such as the year and month fixed effects). The FCA coefficient is -0.3241, the exact same estimate reported in the paper. Read as an approximate percentage, it implies that an FCA advisory lowers property prices by about 32%; the exact conversion, exp(-0.3241) - 1, gives a decrease of about 27.7%. Either way, this effect is much larger in magnitude than the FCA coefficients from the richer specifications that follow.

In general, the estimated coefficients look plausible. More full and half bathrooms and more fireplaces are associated with higher sale prices, and larger properties sell for more (the coefficient on log_sqft_living_area is 0.49). The coefficients on the number of kitchens and bedrooms are negative, however; since living area is held constant, extra rooms may simply mean smaller rooms. Moreover, properties farther away from a lake sell for less, and buyers pay a premium for waterfront properties, while an FCA on the nearby lake is associated with lower prices.

The paper also reports specifications that control for census-block-group-level characteristics through block group fixed effects (`factor(geoid_bg)`), as well as for additional covariates.

```r
model3 <- lm(log_std_sale_price ~
               # house characteristics
               nbr_kitchens + nbr_full_baths + nbr_bed + nbr_fireplaces +
               bsmnt_garage_capacity + nbr_half_baths + finished_bsmnt + blt_his +
               log_sqft_living_area + central_air +
               factor(grade_f) + factor(prop_class_var_f) +
               # distances to amenities
               log_Dis_Hosp + log_Dis_POP + log_Dis_Sch + log_Dis_Univ +
               # lake variables
               FCA + log_Dis_Hglake + waterfront + lake_tag_f +
               # additional covariates in this specification
               AreaSqKm + ADK_within_f + boatlaunch + fishing_access + AA_water +
               # year and month fixed effects, census block group fixed effects
               factor(sale_year) + factor(sale_month) + factor(geoid_bg),
             data = mercury2)

summary(model3)
```

```
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            9.229e+00  2.934e-01  31.456  < 2e-16 ***
nbr_kitchens          -1.830e-01  2.304e-02  -7.944 2.07e-15 ***
nbr_full_baths         1.151e-01  7.123e-03  16.163  < 2e-16 ***
nbr_bed                1.249e-04  4.357e-03   0.029 0.977131    
nbr_fireplaces         1.060e-01  5.944e-03  17.838  < 2e-16 ***
bsmnt_garage_capacity -6.594e-04  8.604e-03  -0.077 0.938914    
nbr_half_baths         6.454e-02  7.172e-03   8.998  < 2e-16 ***
finished_bsmnt         3.688e-06  1.353e-05   0.272 0.785256    
blt_his               -2.278e-03  1.020e-04 -22.329  < 2e-16 ***
log_sqft_living_area   4.857e-01  1.265e-02  38.388  < 2e-16 ***
log_Dis_Hosp          -2.676e-03  1.684e-02  -0.159 0.873733    
log_Dis_POP            9.567e-02  8.290e-03  11.541  < 2e-16 ***
log_Dis_Sch            4.743e-02  7.401e-03   6.409 1.50e-10 ***
log_Dis_Univ          -7.935e-03  1.349e-02  -0.588 0.556414    
central_air            4.506e-02  1.083e-02   4.162 3.17e-05 ***
FCA                   -7.362e-02  1.643e-02  -4.480 7.50e-06 ***
log_Dis_Hglake        -1.540e-01  5.029e-03 -30.619  < 2e-16 ***
waterfront             2.910e-01  1.399e-02  20.803  < 2e-16 ***
lake_tag_f             4.190e-02  1.014e-02   4.131 3.63e-05 ***
```

This is the same specification that is reported in Table 2, specification 3. The FCA coefficient is -0.0736, meaning that an FCA designation is associated with property prices that are roughly 7.4% lower (about 7.1% using the exact conversion exp(-0.0736) - 1). This is in line with the 6 to 7 percent reduction reported in the paper's abstract.
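Adding `factor(geoid_bg)` makes `lm()` estimate one dummy per census block group. By the Frisch-Waugh-Lovell theorem, the FCA coefficient from that dummy-variable regression equals the coefficient from regressing within-group-demeaned log prices on within-group-demeaned FCA. A minimal sketch on simulated data (hypothetical group IDs and an invented data-generating process, not the mercury data):

```r
set.seed(7)
n_groups  <- 50
per_group <- 40
geoid <- factor(rep(seq_len(n_groups), each = per_group))  # hypothetical block-group IDs
alpha <- rnorm(n_groups, sd = 0.5)[as.integer(geoid)]      # unobserved group-level price shifter
fca   <- rbinom(length(geoid), 1, plogis(alpha))           # FCA correlated with the shifter
log_price <- 12 - 0.07 * fca + alpha + rnorm(length(geoid), sd = 0.3)

# (1) One dummy per group, as factor(geoid_bg) does in model3
b_dummy <- unname(coef(lm(log_price ~ fca + geoid))["fca"])

# (2) Within transformation: subtract group means, then regress without an intercept
demean <- function(v, g) v - ave(v, g)
y_w <- demean(log_price, geoid)
x_w <- demean(as.numeric(fca), geoid)
b_within <- unname(coef(lm(y_w ~ x_w - 1))["x_w"])

c(dummy = b_dummy, within = b_within)
```

The point estimates coincide, although the naive standard errors from the demeaned regression do not apply the degrees-of-freedom correction for the absorbed group means. The within transformation is what makes it feasible to absorb many block-group effects without building every dummy explicitly.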

Overall, replicating this paper was fairly easy, but understanding the variables and the estimated specifications took more effort, partly because of the large number of covariates.

**References:**

Tang, C., Heintzelman, M. D., & Holsen, T. M. (2018). Mercury pollution, information, and property values. *Journal of Environmental Economics and Management*, *92*, 418-432.