The extent of the market in Hedonic Pricing Applications

This week I am focusing on the hedonic pricing method and its underlying assumptions.

Hedonic pricing is a valuation method based on the hypothesis that the price of a good depends on its characteristics. If the good is traded in a free, competitive market, its composite price reveals the implicit prices of its attributes (Perman et al., 2003). In the real estate market, for example, the price of a house should depend on its characteristics: number of bedrooms, location, amenities, and the environmental quality of its surroundings. Hedonic pricing becomes an environmental valuation method when the attributes whose implicit prices are estimated include environmental amenities. Many authors have found that air quality and tree cover affect housing prices in the expected direction: better air quality and more tree cover generally increase housing prices.
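A common way to write the hedonic price function, and the one matching the log-price regressions later in this post, is the semi-log form below. This is a generic textbook sketch with illustrative symbols, not the exact specification of Donovan et al. (2019). Here P_i is the sale price of house i, x_{ik} its k-th characteristic, and β_k the coefficient to be estimated:

```latex
\ln P_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i ,
\qquad
\frac{\partial P_i}{\partial x_{ik}} = \beta_k P_i .
```

The implicit price of characteristic k is therefore β_k times the price, and β_k itself is approximately the proportional change in price per unit of x_k. For example, a hypothetical β_k = 0.01 evaluated at the average sale price in Donovan et al. (2019), 282,524 US dollars, implies an implicit price of about 0.01 × 282,524 ≈ 2,825 dollars per unit. If, as the _z suffix suggests (an assumption), variables such as UTC_parcel_500ft_z are standardised, one unit is one standard deviation. For a 0/1 characteristic such as a pool, the exact proportional effect is e^{β_k} − 1.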

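The hedonic method relies on the assumption that the price data for the good under analysis are drawn from a single market (Perman et al., 2003). Markets can be distinct in geography or in time, for example. If the sample pools several markets in which attributes are valued differently, a single hedonic regression returns a blend of their implicit prices that may describe none of them.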
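To see why pooling matters, here is a small simulation with made-up numbers (not the Tampa data): two hypothetical markets value the same standardised attribute differently, and one regression over both returns a coefficient between the two true values. The market sizes, coefficients, and noise level are all assumptions chosen for illustration.

```r
# Simulated illustration: two hypothetical markets with different implicit prices
set.seed(42)
n <- 2000
market <- rep(c(0, 1), each = n / 2)        # market membership indicator
trees  <- rnorm(n)                          # standardised attribute (e.g. tree cover)
beta   <- ifelse(market == 0, 0.10, 0.02)   # assumed implicit prices, log points per SD
log_price <- 12.5 + beta * trees + rnorm(n, sd = 0.3)

# One regression over both markets versus one regression per market
pooled  <- lm(log_price ~ trees)
market0 <- lm(log_price ~ trees, subset = market == 0)
market1 <- lm(log_price ~ trees, subset = market == 1)

est <- c(pooled  = coef(pooled)[["trees"]],
         market0 = coef(market0)[["trees"]],
         market1 = coef(market1)[["trees"]])
print(round(est, 3))
```

The pooled estimate lands roughly halfway between the two market-specific estimates, so it would misstate the implicit price for buyers in either market.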

Donovan et al. (2019) apply hedonic pricing to the housing market in Tampa (Florida, United States). The unit of analysis is the single-family home. The study estimates the effect of trees, and of their proximity, on housing prices, and finds that tree cover within 152 m (500 feet) of a residence has a positive effect on that residence's price.

The authors have data on 4,848 houses sold in 2015 and 2016, with an average selling price of 282,524 US dollars. They control for a variety of house characteristics, such as the number of bedrooms and bathrooms, lot area, year built, and many architectural variables, as well as tree cover within 152 m (500 feet) of the house.

The authors estimate a variety of models that account for spatial correlation in the error term, and most coefficients have the expected sign. Among the tree variables, neither tree cover on the property itself nor tree cover within 30.5 m (100 feet) of the house was significantly associated with sale price. The association becomes positive and significant for tree cover within 152 m (500 feet) of the house.

In this blog post, I investigate whether the houses studied in Donovan et al. (2019) belong to different markets. Since the sales span a short period (2015 to 2016), it seems unlikely that the market changed over time. Yet there might be other reasons why the houses could belong to distinct markets. The data include a variable called architecture_style, which I use to split the dwellings into groups. Let's take a look at that variable.

After importing the data (which can be found here), I tabulate the architecture_style variable to see how many dwellings have each architectural style:

> table(data_ufug_csv$architecture_style)
Basic 1-Story
Basic Multi-Story
Contemporary 1-Story
Contemporary Multi-Story
Pre-1940 1-Story
Pre-1940 Multi-Story
Unique Design
Updated Basic 1-Story
Updated Basic Multi-Story
Updated Contemporary 1-Story
Updated Contemporary Multi-Story
Updated Pre-1940 1-Story
Updated Pre-1940 Multi-Story
Updated Unique Design

My hypothesis is that 1-story and multi-story dwellings might be in different housing markets. That is, individuals buying 1-story dwellings are not actively searching in the multi-story market, and vice versa. To test this hypothesis, I check whether the implicit prices of housing characteristics differ between 1-story and multi-story houses.

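First, I create a dummy variable, Dummy_1story, that takes the value 1 if the dwelling is a 1-story dwelling and 0 if it is a multi-story dwelling. The two Unique Design styles do not indicate the number of stories, so those dwellings are left as missing (NA) and drop out of the comparison: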

# Start from NA so that styles without story information (Unique Design) stay missing
data_ufug_csv$Dummy_1story <- NA_integer_

one_story <- c("Basic 1-Story", "Contemporary 1-Story", "Pre-1940 1-Story",
               "Updated Basic 1-Story", "Updated Contemporary 1-Story",
               "Updated Pre-1940 1-Story")
multi_story <- c("Basic Multi-Story", "Contemporary Multi-Story", "Pre-1940 Multi-Story",
                 "Updated Basic Multi-Story", "Updated Contemporary Multi-Story",
                 "Updated Pre-1940 Multi-Story")

# %in% returns FALSE (not NA) for missing styles, so these assignments never fail
data_ufug_csv$Dummy_1story[data_ufug_csv$architecture_style %in% one_story] <- 1L
data_ufug_csv$Dummy_1story[data_ufug_csv$architecture_style %in% multi_story] <- 0L
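Because every style label ends in either "1-Story" or "Multi-Story" (except the two Unique Design styles), the same dummy can also be derived from the label suffix. The helper story_dummy below is a hypothetical sketch, shown on a toy vector of labels taken from the table above:

```r
# Classify architecture styles by the suffix of their label:
# "... 1-Story" -> 1, "... Multi-Story" -> 0, anything else (e.g. "Unique Design") -> NA
story_dummy <- function(style) {
  ifelse(grepl("1-Story$", style), 1L,
         ifelse(grepl("Multi-Story$", style), 0L, NA_integer_))
}

# Toy input using labels from the table above
styles <- c("Basic 1-Story", "Updated Contemporary Multi-Story",
            "Unique Design", "Updated Pre-1940 1-Story")
print(story_dummy(styles))
```

With the post's data, story_dummy(data_ufug_csv$architecture_style) should reproduce Dummy_1story, and table(data_ufug_csv$architecture_style, data_ufug_csv$Dummy_1story, useNA = "ifany") is a quick way to cross-check the classification.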

I then estimate the same hedonic regression separately on the two sub-samples, model1 for 1-story dwellings and modelM for multi-story dwellings:

# Hedonic regression on 1-story dwellings only
# (which() drops rows whose Dummy_1story is NA, such as Unique Design dwellings)
model1 <- lm(sale_price_log ~ tBEDS_z + tBATHS_z + HEAT_AR_z + tSTORIES_z + 
             ACREAGE_z + ACT_z + Garage_01 + Carport_01 + 
             Porch_01 + pool + water_front + UTC_parcel_500ft_z, 
             data = data_ufug_csv[which(data_ufug_csv$Dummy_1story == 1), ])

# Same specification on multi-story dwellings only
modelM <- lm(sale_price_log ~ tBEDS_z + tBATHS_z + HEAT_AR_z + tSTORIES_z + 
             ACREAGE_z + ACT_z + Garage_01 + Carport_01 + 
             Porch_01 + pool + water_front + UTC_parcel_500ft_z, 
             data = data_ufug_csv[which(data_ufug_csv$Dummy_1story == 0), ])


Some of the estimated coefficients are quite different. For example, the coefficient on tree cover, the variable of interest in Donovan et al. (2019), is positive and significant for multi-story dwellings and statistically insignificant for 1-story dwellings:

> coef(summary(model1))["UTC_parcel_500ft_z", ]
> coef(summary(modelM))["UTC_parcel_500ft_z", ]

Note that a standard linear regression does not account for the spatial autocorrelation found by Donovan et al. (2019), so I cannot replicate their exact findings. Nonetheless, the estimated coefficients suggest that there might be differences between 1-story and multi-story dwellings.
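Whether the tree-cover coefficient really differs between the two sub-samples can be checked directly. A common approach, sketched here, is a z-test on the difference between one coefficient estimated on two independent samples, z = (b_a − b_b) / sqrt(se_a² + se_b²). The helper compare_coef is hypothetical, the demo uses simulated data, and, like the regressions above, it relies on conventional OLS standard errors, so it ignores spatial autocorrelation.

```r
# z-test for the difference between one coefficient estimated in two independent samples
compare_coef <- function(m_a, m_b, term) {
  a <- coef(summary(m_a))[term, ]   # Estimate, Std. Error, t value, Pr(>|t|)
  b <- coef(summary(m_b))[term, ]
  d  <- a[["Estimate"]] - b[["Estimate"]]
  se <- sqrt(a[["Std. Error"]]^2 + b[["Std. Error"]]^2)
  z  <- d / se
  c(difference = d, std_error = se, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Simulated demo: the attribute is priced in group 0 but not in group 1
set.seed(1)
sim <- data.frame(one_story = rep(c(1, 0), each = 1000), trees = rnorm(2000))
sim$log_price <- 12 + ifelse(sim$one_story == 1, 0, 0.08) * sim$trees +
  rnorm(2000, sd = 0.25)

m_one   <- lm(log_price ~ trees, data = sim, subset = one_story == 1)
m_multi <- lm(log_price ~ trees, data = sim, subset = one_story == 0)
res <- compare_coef(m_multi, m_one, "trees")
print(round(res, 4))
```

With the post's objects, the equivalent call would be compare_coef(modelM, model1, "UTC_parcel_500ft_z").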

To test formally whether the two sub-samples differ, I use a likelihood ratio test, implemented by the lrtest function in the lmtest package.
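For intuition about what lrtest reports for two nested lm fits, the statistic can be computed by hand from logLik(): LR = 2(ℓ_unrestricted − ℓ_restricted), compared with a χ² distribution whose degrees of freedom equal the difference in the number of estimated parameters. The helper lr_test below is a hypothetical base-R sketch, and the demo data are simulated.

```r
# Likelihood ratio test for two nested models fitted to the same observations
lr_test <- function(restricted, unrestricted) {
  ll_r <- logLik(restricted)
  ll_u <- logLik(unrestricted)
  stat <- 2 * (as.numeric(ll_u) - as.numeric(ll_r))
  k    <- attr(ll_u, "df") - attr(ll_r, "df")   # extra parameters in the larger model
  c(chisq = stat, df = k, p_value = pchisq(stat, df = k, lower.tail = FALSE))
}

# Simulated demo: the slope on x differs between groups d = 0 and d = 1
set.seed(3)
sim <- data.frame(d = rep(c(0, 1), each = 300), x = rnorm(600))
sim$y <- 1 + 0.5 * sim$x + 0.4 * sim$d * sim$x + rnorm(600)

restricted   <- lm(y ~ x, data = sim)       # same coefficients in both groups
unrestricted <- lm(y ~ x * d, data = sim)   # intercept and slope allowed to differ
res <- lr_test(restricted, unrestricted)
print(res)
```

If lmtest is installed, lmtest::lrtest(restricted, unrestricted) should report the same statistic. With the post's data, the restricted model is the main regression without interactions, and it must be fitted on the same rows as the fully interacted model (dwellings whose Dummy_1story is not missing), since the test compares log-likelihoods of the same observations.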


To implement the test, I estimate a fully interacted model, in which the dummy variable is interacted with every housing characteristic:

modelS <- lm(sale_price_log ~ tBEDS_z*Dummy_1story + tBATHS_z*Dummy_1story + 
                              HEAT_AR_z*Dummy_1story + tSTORIES_z*Dummy_1story + 
                              ACREAGE_z*Dummy_1story + ACT_z*Dummy_1story + 
                              Garage_01*Dummy_1story + Carport_01*Dummy_1story + 
                              Porch_01*Dummy_1story + pool*Dummy_1story + 
                              water_front*Dummy_1story + 
                              UTC_parcel_500ft_z*Dummy_1story + Dummy_1story, 
             data = data_ufug_csv[data_ufug_csv$Dummy_1story==1 | 
                                  data_ufug_csv$Dummy_1story==0 , ])
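A useful property of the fully interacted model is that its point estimates reproduce the two separate sub-sample regressions: the main effect of a characteristic is its implicit price for Dummy_1story = 0 (multi-story), and the main effect plus the interaction is the implicit price for 1-story dwellings. Each interaction coefficient therefore measures the difference in implicit prices between the two groups. A small simulated check (data and names are made up):

```r
# Simulated check that a fully interacted regression equals two separate regressions
set.seed(7)
sim <- data.frame(d = rep(c(1, 0), each = 200), x = rnorm(400))
sim$y <- 1 + 0.5 * sim$x + 0.3 * sim$d * sim$x + rnorm(400)

full <- lm(y ~ x * d, data = sim)                # coefficients: (Intercept), x, d, x:d
sep0 <- lm(y ~ x, data = sim, subset = d == 0)   # group d = 0 only
sep1 <- lm(y ~ x, data = sim, subset = d == 1)   # group d = 1 only

slope_d0 <- coef(full)[["x"]]                        # slope when d = 0
slope_d1 <- coef(full)[["x"]] + coef(full)[["x:d"]]  # slope when d = 1
print(all.equal(slope_d0, coef(sep0)[["x"]]))  # TRUE
print(all.equal(slope_d1, coef(sep1)[["x"]]))  # TRUE
```

Only the point estimates coincide: the interacted model assumes a common error variance across the two groups, so its standard errors generally differ from those of the separate sub-sample regressions.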

I then compare the main model, which imposes the same implicit prices on both types of dwelling, with the fully interacted model. In the fully interacted model, the interactions of the dummy with HEAT_AR_z, ACT_z, Carport_01 and tree cover (UTC_parcel_500ft_z) are statistically significant at the 10% level. This suggests that the implicit prices of these housing characteristics differ between 1-story and multi-story houses.


Calling lrtest on the main model and the fully interacted model gives the following output:

Likelihood ratio test

Model 1: sale_price_log ~ tBEDS_z + tBATHS_z + HEAT_AR_z + tSTORIES_z + 
ACREAGE_z + ACT_z + Garage_01 + Carport_01 + Porch_01 + pool + 
water_front + UTC_parcel_500ft_z
Model 2: sale_price_log ~ tBEDS_z * Dummy_1story + tBATHS_z * Dummy_1story + 
HEAT_AR_z * Dummy_1story + tSTORIES_z * Dummy_1story + ACREAGE_z * 
Dummy_1story + ACT_z * Dummy_1story + Garage_01 * Dummy_1story + 
Carport_01 * Dummy_1story + Porch_01 * Dummy_1story + pool * 
Dummy_1story + water_front * Dummy_1story + UTC_parcel_500ft_z * 
Dummy_1story + Dummy_1story
  #Df  LogLik Df  Chisq Pr(>Chisq)    
1  14 -1568.7                         
2  27 -1537.5 13 62.284  2.044e-08 ***
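The numbers in this output are internally consistent, which can be checked with a few lines of base R. The parameter counts are 12 slopes plus an intercept plus the error variance (14) for Model 1, and the dummy and its 12 interactions add 13 more (27) for Model 2. The statistic is twice the gain in log-likelihood, and the p-value is the upper tail of a χ² distribution with 13 degrees of freedom.

```r
# Values as printed in the lrtest output above
loglik_main       <- -1568.7  # Model 1: no interactions, 14 parameters
loglik_interacted <- -1537.5  # Model 2: fully interacted, 27 parameters

# LR statistic from the rounded log-likelihoods (lrtest uses unrounded values: 62.284)
lr_stat <- 2 * (loglik_interacted - loglik_main)
df_diff <- 27 - 14

# Upper-tail chi-squared probability of the reported statistic
p_value <- pchisq(62.284, df = df_diff, lower.tail = FALSE)

print(lr_stat)  # 62.4
print(df_diff)  # 13
print(signif(p_value, 4))
```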

The test rejects the null hypothesis that the main (pooled) model fits as well as the fully interacted model (p ≈ 2 × 10⁻⁸), so 1-story and multi-story dwellings might indeed belong to different markets. I do not think this is conclusive evidence that the markets for these two types of houses are distinct, since such conclusions are context-specific, and the test, like the regressions above, ignores spatial autocorrelation. Yet it is a useful sensitivity check in hedonic studies.



Donovan, G. H., Landry, S., & Winter, C. (2019). Urban trees, house price, and redevelopment pressure in Tampa, Florida. Urban Forestry & Urban Greening, 38, 330-336.

Perman, R., Ma, Y., McGilvray, J., & Common, M. (2003). Natural resource and environmental economics. Pearson Education.

