How our model determines an accurate price for your unit at ‘normal’ demand
As stated above, the Base Price Model is the foundation of each unit’s unique pricing recommendation. The goal of the Base Price Model is to discern an accurate median nightly price for each unit by analyzing the attributes of the property.
For example, we all know that a pool, a porch, parking and other attributes can impact the desirability of a given property.
And while the value of these individual attributes varies greatly over the course of a year (e.g. your pool is ‘worth’ more on July 4th than on January 4th), the Base Price Model is designed to determine the value of your unit for a day with average local demand.
To do this, the model analyzes:
Attributes (e.g. bedrooms, bathrooms, parking, sleeps, unit type, etc.)
Fees (e.g. cleaning fee, security deposit, extra guest fee, etc.)
Location
Booking performance (i.e. every booking, for your unit and nearby units)
To train our Base Price Model, we use a supervised machine learning model that incorporates all active units in each market, leveraging their unit details and the median nightly price over the last year. By using the median, we can reduce the signal from events or accidental outliers in the data to make sure we primarily capture pricing for ‘normal’ days.
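To see why the median is a more robust summary than the mean, here is a minimal sketch. The price list is hypothetical (a mostly stable rate, a handful of event nights, and one accidental typo) and is not taken from our data.

```python
from statistics import mean, median

# Hypothetical nightly prices for one unit over a year (illustrative only):
# 300 ordinary nights, 50 slightly busier nights, 14 event nights,
# and one accidental entry of $15 instead of $150.
nightly_prices = [150] * 300 + [160] * 50 + [450] * 14 + [15]

typical_mean = mean(nightly_prices)      # pulled upward by the event nights
typical_median = median(nightly_prices)  # stays at the 'normal' nightly price

print(f"mean:   {typical_mean:.2f}")
print(f"median: {typical_median}")
```

In this made-up example the median stays at the ordinary nightly rate of 150, while the mean drifts above it because of the event nights.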
Our model leverages gradient boosting to produce an ensemble of decision trees, which map a unit’s features to the median price. For our customers, this approach enables us to balance model complexity with interpretability — or the ability to show you how your unit’s attributes impact your Base Price recommendation.
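As an illustration of the general technique, the sketch below implements a toy gradient boosting regressor for squared error, using one-split decision trees ('stumps'). It is not our production model: the function names, the two features (bedrooms, sleeps), the prices, and the hyperparameters are all assumptions chosen for readability.

```python
from statistics import mean

def fit_stump(rows, residuals):
    """Find the single (feature, threshold) split that minimizes squared error."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [res for r, res in zip(rows, residuals) if r[f] <= t]
            right = [res for r, res in zip(rows, residuals) if r[f] > t]
            if not left or not right:
                continue  # a split must put units on both sides
            lm, rm = mean(left), mean(right)
            sse = sum((x - lm) ** 2 for x in left) + sum((x - rm) ** 2 for x in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def predict_stump(stump, row):
    f, t, left_value, right_value = stump
    return left_value if row[f] <= t else right_value

def fit_boosted(rows, targets, n_trees=100, learning_rate=0.1):
    """Each new stump is fitted to the residuals left by the ensemble so far."""
    base = mean(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, r)
                 for p, r in zip(preds, rows)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Hypothetical units: (bedrooms, sleeps) -> median nightly price.
rows = [(0, 2), (1, 2), (1, 4), (2, 4), (2, 6), (3, 6), (3, 8), (4, 8)]
targets = [120, 140, 160, 210, 240, 300, 330, 380]

model = fit_boosted(rows, targets)
baseline_mse = mean((y - mean(targets)) ** 2 for y in targets)
model_mse = mean((y - predict(model, r)) ** 2 for r, y in zip(rows, targets))
print(f"baseline MSE: {baseline_mse:.1f}, boosted MSE: {model_mse:.1f}")
```

Because every tree is a simple split on a single attribute, it is possible to trace which attributes the ensemble relies on most, which is the kind of interpretability described above.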
Let us explore an example to learn more.
The chart below details the five largest drivers of the Base Price in the San Francisco market. (Note: real data is used in all examples in this write-up.)
Perhaps unsurprisingly, bedrooms and sleeps (the number of people that can ‘sleep’ in a unit) have the largest impact on a given night’s price. In most markets, these two attributes are key drivers of the Base Price, as they are tightly correlated with the size (and hence the price) of a unit.
We can also see that the unit type (e.g. private room, entire apartment, full house, etc.) plays a significant role in determining an accurate Base Price in San Francisco, as do the extra fees (both cleaning and guest fees) associated with each stay.
While this chart gives us a high-level understanding of the value of each attribute, let us look more closely at the most impactful one: bedrooms.
To do this, we will examine a series of sample units that have:
a variable number of bedrooms (from 0BR, i.e. a studio, to 4BR)
exactly one bathroom and room to ‘sleep’ four guests
In this example, going from a studio (0BR) to a 1BR barely changes the recommended Base Price. Similarly, the impact of a 4th bedroom is negligible, at least in San Francisco for units with one bathroom.
However, you might have noticed that this example is constrained: all of these hypothetical units are restricted to 1BA and ‘sleeping’ four. In reality, most 4BR units have more than one bathroom and sleep more than four people.
This constrained example nevertheless illustrates a core aspect of our Base Price Model: an accurate Base Price must consider the value of attributes both individually and collectively.
Location Impact
How our model infers demand signals around each unit’s location
The location of a unit can dramatically impact the perceived and actual value of that unit.
To determine this impact, our model applies spatial kriging to the Base Price Model’s residuals. Here, the ‘residuals’ are the percentage differences between the predicted and actual median prices for each unit, as illustrated below.
In the visual below, we can see that our model has identified a set of clustered units (yellow dots) that mostly have a median nightly price above our ‘predicted’ median price (‘predicted’ based on the unit’s attributes).
This collective pricing signal indicates that we are likely observing a high-value neighborhood, which in turn informs our location adjustment. In this case, a unit at the location of the purple dot would get a 10% increase in our Base Price recommendation.
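The sketch below shows how percentage residuals from nearby units can be turned into a location adjustment. Please note that it uses inverse-distance weighting as a much simpler stand-in for kriging, which additionally fits a model of how residuals correlate with distance. The coordinates, prices, function names, and the sign convention (positive when the actual price is above the prediction) are all assumptions for illustration.

```python
import math

def pct_residual(actual, predicted):
    """Percentage difference between actual and attribute-predicted median price.
    Assumed sign convention: positive means the unit earns more than predicted."""
    return (actual - predicted) / predicted

def idw_adjustment(target_xy, neighbors, power=2.0):
    """Inverse-distance-weighted average of neighboring residuals.
    NOTE: a simplified stand-in for spatial kriging, not kriging itself."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), residual in neighbors:
        distance = math.hypot(x - target_xy[0], y - target_xy[1])
        if distance == 0:
            return residual  # a unit at the exact location dominates
        weight = 1.0 / distance ** power
        weighted_sum += weight * residual
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical nearby units: ((x, y), actual median price, predicted median price).
units = [
    ((0.0, 1.0), 220, 200),
    ((1.0, 0.0), 240, 200),
    ((-1.0, 0.0), 210, 200),
    ((0.0, -1.0), 230, 200),
]
neighbors = [(xy, pct_residual(actual, predicted)) for xy, actual, predicted in units]

adjustment = idw_adjustment((0.0, 0.0), neighbors)
attribute_price = 180.0  # hypothetical attribute-based price for the new unit
location_adjusted_price = attribute_price * (1 + adjustment)
print(f"adjustment: {adjustment:+.1%}, adjusted price: {location_adjusted_price:.2f}")
```

In this made-up neighborhood all four units are equally distant from the new location and all are priced above their predictions, so the adjustment is simply their average residual, about 12.5%.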
Occupancy Impact
How we leverage occupancy to verify our training data
One challenge in the short-term rental space is that many units are unintentionally (or intentionally!) ‘incorrectly’ priced.
Therefore, it is critical to leverage occupancy data (occupancy achieved over the prior year) to better understand how much our model should ‘weigh’ each unit’s pricing strategy. In combination with prices, the occupancy rate achieved by a unit can help us better understand whether a unit is overpriced or underpriced, relative to the market. Hence, we provide the occupancy as an input to our Base Price Model during training to tease out this relationship.
Interestingly, although units in the STR space are highly varied, the aggregate booking patterns across the millions of units we have analyzed reveal that the STR market is actually very efficient.
To explore this further, let us examine the chart below, which illustrates the relationship between a unit’s Base Price and its occupancy rate. As you can see, as achieved occupancy increases, the Base Price associated with these units decreases.
Accuracy Evaluation
How we analyze the accuracy of our recommendations
For our model to be effective, we need to avoid overfitting and ensure that the out-of-sample error is minimized. For this, our model uses cross validation during the training process.
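For readers unfamiliar with cross validation, here is a minimal k-fold sketch. Each unit is held out exactly once, the model is trained on the remaining units, and the error is measured only on the held-out units. The two toy models (a market-wide average and a straight line on bedrooms), the data, and the choice of five folds are assumptions for illustration and are not our actual training setup.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle unit indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(rows, targets, fit, predict, k=5):
    """Mean absolute percentage error, averaged over the k held-out folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train_rows = [r for i, r in enumerate(rows) if i not in held]
        train_y = [y for i, y in enumerate(targets) if i not in held]
        model = fit(train_rows, train_y)
        fold_errors.append(mean(
            abs(predict(model, rows[i]) - targets[i]) / targets[i] for i in held_out
        ))
    return mean(fold_errors)

# Toy model 1: always predict the average price of the training units.
def fit_average(rows, y):
    return mean(y)

def predict_average(model, row):
    return model

# Toy model 2: least-squares line on the first feature (bedrooms).
def fit_line(rows, y):
    xs = [r[0] for r in rows]
    mx, my = mean(xs), mean(y)
    slope = (sum((x - mx) * (yy - my) for x, yy in zip(xs, y))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predict_line(model, row):
    slope, intercept = model
    return slope * row[0] + intercept

# Hypothetical units: (bedrooms,) -> median nightly price.
rows = [(0,), (0,), (1,), (1,), (2,), (2,), (3,), (3,), (4,), (4,)]
targets = [118, 125, 142, 150, 205, 215, 295, 310, 370, 390]

average_error = cross_validate(rows, targets, fit_average, predict_average)
line_error = cross_validate(rows, targets, fit_line, predict_line)
print(f"average-only model: {average_error:.1%} out-of-sample error")
print(f"bedroom-line model: {line_error:.1%} out-of-sample error")
```

Comparing models on held-out error, rather than on the data they were trained on, is what guards against choosing a model that has merely memorized its training set.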
In our example of San Francisco, we train the Base Price Model using data from more than 5,000 units through the steps laid out above. A majority of units will have prices close to the right Base Price, while some will be priced too high or too low.
The histogram below shows each unit’s recommended price relative to its current median price.
As you can see, most recommendations are close to the current median price, i.e. have a value close to 1.0 (or 100%). However, a few units are currently underpriced and would get a recommendation as high as twice their current price, i.e. 2.0, while others are currently overpriced and would get a recommendation as low as 30% of their current price, i.e. 0.3.
The histogram shows that our Base Price Model
reflects the market accurately, i.e. most units are close to 1.0
is not biased, i.e. there is a similar number of outliers on both sides
and does not overfit to the training data, i.e. the spread of the distribution is not too narrow.
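The checks above can be sketched numerically. The snippet below computes the recommendation-to-current-price ratio for a handful of hypothetical units and summarizes how many fall near 1.0 and how many are outliers on each side. The prices and the 20% tolerance are assumptions for illustration only.

```python
from statistics import median

def recommendation_ratios(recommended, current):
    """Ratio of recommended price to current median price, per unit."""
    return [r / c for r, c in zip(recommended, current)]

def summarize(ratios, tolerance=0.2):
    """Count units near 1.0 and outliers on each side (tolerance is an assumption)."""
    near = sum(1 for x in ratios if abs(x - 1.0) <= tolerance)
    above = sum(1 for x in ratios if x > 1.0 + tolerance)
    below = sum(1 for x in ratios if x < 1.0 - tolerance)
    return {
        "median_ratio": median(ratios),
        "share_near_1": near / len(ratios),
        "outliers_above": above,  # currently underpriced units
        "outliers_below": below,  # currently overpriced units
    }

# Hypothetical current median prices and recommendations for eight units.
current = [100, 200, 150, 300, 120, 250, 180, 90]
recommended = [105, 190, 150, 90, 240, 260, 175, 95]

summary = summarize(recommendation_ratios(recommended, current))
print(summary)
```

In this made-up sample, six of the eight units land within the tolerance band, with one outlier on each side, which mirrors the roughly symmetric shape described above.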
The final output of this model is our “Attributes-Only” Base Price Model.
Now, let’s read about how bookings impact our Base Price Recommendation.