Garbage In = Busy Maids (Cleaning up the MLS mess)

As promised in my previous post, I’d like to get on my soap box and complain about the state of NWMLS data. As an application developer, I’d rather spend my time developing new & exciting ways of visualizing data instead of developing new & exciting ways of correcting inaccurate data. Unfortunately, in order to accomplish the former, a lot of effort is spent on the latter.

For example, of the 20,376 properties that were in the database when I started writing this blog entry, 32 have bogus zip codes. I’m not talking about hard to find errors like a Sammamish property with an Issaquah zip code. I’m talking about outright typos and easy to catch errors. Zip codes like 00000, WA, and other obvious errors, like zip codes smaller than 98001 (which is the smallest zip code in Washington state).
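To give a flavor of how cheap these checks are, here is a minimal sketch of a zip code sanity test. The function name is made up for illustration, and the upper bound of 99403 is my assumption for the largest Washington zip code; only the 98001 lower bound comes from the paragraph above.

```python
import re

WA_ZIP_MIN = 98001  # smallest Washington zip code, per the text above
WA_ZIP_MAX = 99403  # assumed upper bound for Washington zip codes

def is_plausible_wa_zip(raw):
    """Return True if raw looks like a 5-digit (or zip+4) Washington zip code."""
    text = str(raw).strip()
    # Reject anything that is not 5 digits, optionally followed by -NNNN.
    if not re.fullmatch(r"\d{5}(-\d{4})?", text):
        return False
    return WA_ZIP_MIN <= int(text[:5]) <= WA_ZIP_MAX

for candidate in ["00000", "WA", "97001", "98075"]:
    print(candidate, is_plausible_wa_zip(candidate))
```

A check like this only catches the outright typos. It does nothing for the harder case of a real zip code attached to the wrong city.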

Another bone of contention is that nearly 7% of the properties in the NWMLS database have a square footage of 0 square feet (1,389 properties). How hard is it to contact the county assessor’s office or the property owner and get the number? Can’t you just give an intelligent guess? Needless to say, this complicates compiling price per square foot statistics because computers have this thing about not wanting to divide a number by zero.
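One way to cope, sketched below, is to treat a zero or missing square footage as "unknown" and leave that listing out of the statistic instead of dividing by it. The function names and the sample listings are invented for illustration; they are not taken from the real database.

```python
from statistics import median

def price_per_sqft(price, sqft):
    """Return price per square foot, or None when the footage is unusable."""
    if not sqft or sqft <= 0:
        return None  # a 0 in the MLS means "nobody entered it", not a tiny house
    return price / sqft

def median_price_per_sqft(listings):
    """Median over the listings that have a usable square footage."""
    values = []
    for price, sqft in listings:
        value = price_per_sqft(price, sqft)
        if value is not None:
            values.append(value)
    return median(values) if values else None

# Hypothetical (price, square feet) pairs; the middle one has no footage.
sample = [(300000, 1500), (450000, 0), (500000, 2000)]
print(median_price_per_sqft(sample))
```

The catch is that the statistic now silently describes only the listings with data, so it is worth reporting how many were skipped alongside the number.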

Even more annoying, nearly 18% of the properties in the NWMLS database have a 0° north latitude & 0° west longitude (3,637 properties). Can’t you just go to a map web site and enter an intelligent guess? If you can afford to be a competitive realtor, you can afford a cheap GPS receiver to put accurate data into the MLS when you list a property. I’m sorry, but if you say your client’s property is located in the middle of the Atlantic, 350 miles off the coast of Accra, Ghana in Western Africa, why should I believe anything else in your listing?
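Spotting these is easy enough. The sketch below flags the 0°/0° placeholder and anything that falls outside a rough bounding box around Washington state. The box limits are my own approximation, not an official boundary, and the code uses signed decimal degrees (west longitude is negative).

```python
# Rough, assumed bounding box for Washington state in signed decimal degrees.
WA_LAT_RANGE = (45.5, 49.1)
WA_LON_RANGE = (-125.0, -116.5)

def has_usable_coordinates(lat, lon):
    """Return False for missing values, the 0/0 placeholder, or points outside the box."""
    if lat is None or lon is None:
        return False
    if lat == 0 and lon == 0:
        return False  # the "middle of the Atlantic" listings
    in_lat = WA_LAT_RANGE[0] <= lat <= WA_LAT_RANGE[1]
    in_lon = WA_LON_RANGE[0] <= lon <= WA_LON_RANGE[1]
    return in_lat and in_lon

print(has_usable_coordinates(0.0, 0.0))                # placeholder value
print(has_usable_coordinates(47.620367, -122.349005))  # Seattle
```

Anything that fails the check is a candidate for re-geocoding from the street address.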

Perhaps most disappointing is that over 50% of the properties in the database don’t have elementary school, junior high, or high school information associated with the listing (10,419 properties)! How is a client supposed to make an intelligent decision about the quality of schools if that information isn’t available? I can only imagine how frustrated professional realtors must feel about this, since their livelihood is dependent on the quality of this data!

Now, given the frequency of these errors, it astounds me that I have yet to find an instance in which a county, city or community name was misspelled. So obviously, it is possible to have high quality data in the database. But why is only some of it of consistently high quality? And why do we have so many errors of commission?

To paraphrase one of Murphy’s Laws: “If builders built buildings the way the local MLS (and local realtors) compile data, the first woodpecker would’ve destroyed civilization.” Why is the data so bad? Are some realtors too lazy to bother with listing a property with complete and accurate information? Does the MLS not care about this? Are the MLS data collection tools so bad that the fact we have any data (much less accurate data) is a feat worth celebrating? Perhaps most importantly, what can we do to improve this sad state of affairs? To quote General Beringer from the movie WarGames: “I’d piss on a spark plug if I thought it’d do any good!”

Robbie
Caffeinated Software

PS – Go Seahawks!

The Joys of Geocoding

In my last post, I was asked what the accuracy of the locations in our generated Google Earth files is. Before I divulge that information, I’d like to explain some of the challenges of getting accurately geocoded data. (I’ll get on my soapbox and complain about the state of NWMLS data in my next post.)

Now, in partial defense of realtors and the MLS, it is unrealistic to expect perfect data. For example, consumer-level GPS receivers aren’t always as accurate as one might think. This weekend I loaded up Microsoft Streets & Trips 2006 on my desktop computer, hooked up my GPS receiver, turned on GPS tracking, created a GPS trail, and walked away for an hour. An hour later, my map had a line drawing that resembled the type my 3 year old son likes to create. So even if a realtor were to use a GPS receiver to get a latitude & longitude reading, it’s entirely possible that the measurement would be off by a house or two (or four).

Another problem is that most digital maps are created with data sold by companies like TeleAtlas or NavTeq. These companies compile their data by driving around previously unknown streets & neighborhoods with computers & GPS receivers (kinda like how that annoying guy in the Verizon ads tests their network). I should note that in-vehicle navigation systems are more accurate than GPS receivers alone, because the vehicle’s navigation system can also use the vehicle’s steering wheel position and the speedometer to determine what your location is.

Unfortunately, by the time the Microsofts, Yahoos and Googles of the world get their hands on the data, it is at least 3-6 months out of date (and probably closer to 12-18 months out of date by the time it gets on the web or published on a CD). This is a problem because about 25% of the properties in the NWMLS are new construction (where new construction is defined as a property that was built in 2005 or later). Since new construction is often located near new roads, the giants of digital mapping may be unable to help and are always playing catch up.

Then, when the companies convert the raw data into digital maps, they end up using multiple sources of data and merging them into the one set of data they are going to use for a map. However, the data sources don’t always agree on where a point of interest is.

For example, Google Earth thinks the top of the Seattle Space Needle is at 47.620367° north latitude & 122.349005° west longitude. Meanwhile, Microsoft’s Virtual Earth seems to think it’s located at 47.620336° north latitude & 122.348515° west longitude. Now, a few ten-thousandths of a degree means the difference between the tip of the needle & one of the air conditioning units on the roof (a few dozen yards). But if they can’t agree on where the top of the Space Needle is, it’s likely they aren’t going to agree on where 742 Evergreen Terrace is either. However, a few dozen yards of error is better than a few miles of error (which is what can happen when I use raw NWMLS data).
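To put a number on that disagreement, here is a small sketch that feeds the two Space Needle readings above into the haversine great-circle formula. It assumes a perfectly spherical Earth with a mean radius of 6,371 km, so treat the result as an estimate. With these inputs the gap comes out at a few tens of meters, nearly all of it from the longitude difference.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in signed decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# West longitudes from the text are written as negative numbers here.
google_earth = (47.620367, -122.349005)
virtual_earth = (47.620336, -122.348515)

gap_m = haversine_m(*google_earth, *virtual_earth)
print(f"{gap_m:.1f} m, or {gap_m / 0.9144:.1f} yd")
```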

Because of this, I have to geocode every single property in the database because I don’t trust the NWMLS data. So I call the Yahoo! Maps Web Services Geocoding API to get a latitude & longitude for everything. Although Yahoo is far from perfect, at least it’s free and tries harder than the MLS. So without further delay, here is the current geocoding precision of the points on our generated maps.

Geocoding Precision    No. of properties    Percentage
address                16,341               80.20
street                  1,975                9.69
zip+4                      43                0.21
zip+2                     343                1.68
zip                     1,644                8.07
city                       25                0.12
state                       5                0.02
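For anyone who wants to check the arithmetic, the short sketch below recomputes the percentages from the counts in the table and adds up how many properties were geocoded to street level or better (roughly 90%). The dictionary and function name are just illustrative; the counts are the ones listed above.

```python
# Counts copied from the geocoding precision table above.
counts = {
    "address": 16341,
    "street": 1975,
    "zip+4": 43,
    "zip+2": 343,
    "zip": 1644,
    "city": 25,
    "state": 5,
}

total = sum(counts.values())

def share(levels):
    """Percentage of all properties geocoded at any of the given precision levels."""
    return 100 * sum(counts[level] for level in levels) / total

for level, count in counts.items():
    print(f"{level:<8} {count:>6} {100 * count / total:6.2f}")

street_or_better = share(["address", "street"])
print(f"street level or better: {street_or_better:.2f}%")
```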

In closing, I’d like to ask real estate professionals to be as complete and as accurate as possible when submitting listing data to their local MLS. I’d also like to state that even if the MLS were accurate, it’s unrealistic to expect perfect geocoding from imperfect data. If digital mapping companies and GPS technology can’t get it exactly right, a house or two off is probably as accurate as you can realistically hope for given the current state of the art.

Robbie
Caffeinated Software

The Future of MLS search is coming to Rain City Guide

Greetings fellow Rain City Readers! I’m a software engineer who has been working with Dustin to develop a better MLS search. Before I get into what I’m doing, I thought I’d discuss why I’m doing it…

My saga began when I had the opportunity to develop an NWMLS search web site for a local realtor. After spending several weeks cutting red tape, determining what forms I needed to fill out, figuring out whom at my realtor’s broker I needed to bother, signing my life away and finally getting access to an NWMLS database, I was at the point where I could get real work done. Anyway, after I had spent over 40 hours developing standard search features (search summary with thumbnails, property detail page) and a few interesting ones (like customized HTML e-mail with property photos, and customizable “photo not available” images on search results), I sent her my bill.

Then things went south. Despite the fact that my client was warned ahead of time that my time isn’t free, she apparently expected that I would be price competitive with “canned” solutions such as those offered by iHouse & Superlative. On the one hand, I can’t blame her. A consultant can’t compete with a commercial product, because a commercial product has a lot more customers to help finance its development than a lone consultant does. Just because those companies sell solutions for $50/month doesn’t mean it only costs them $50 to design, develop & test the software! It still costs those organizations thousands of dollars (or more) to bring these products to market! However, if you plan on distributing software to 1,000 customers, you can charge a lot less per customer, than if you are only distributing it to one.

Anyway, after this failed business opportunity, I decided to contact Dustin and regroup. I wanted to develop a unique MLS service that would’ve given my client a competitive advantage (but she was more interested in price than value), and after reading Rain City Guide it became obvious that Dustin would see the value in what I could do. Besides, I’d rather continue to improve the code I was working on than send it to the hard drive in the sky.

Dustin & I both share the belief that the real estate industry is in for some very interesting times as the reverberations of the internet revolution continue to change our society and business models. Dustin’s enthusiasm for the ideas I’m trying to implement is contagious, and we essentially worked out a deal in which I’ll continue to develop compelling MLS technology in my spare time, I’ll use him and his Rain City readers as a sounding board for ideas and as beta testers (both marketing & development feedback), and in a few months’ time, ideally, I will have developed a really unique service that technology savvy realtors would be willing to pay for.

One of the cooler things I’ve done is turn MLS search results into Google Earth files. Just download the Google Earth application, visit our BETA listings search page, click on the Google Earth icon, and see your search results on a 3D globe. Eventually, we’ll do similar stuff with AJAX-style mapping (although right now I’m focusing more on things that haven’t been done yet) and other applications.
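For the curious, a Google Earth file is just KML, an XML format with one Placemark per point. The sketch below is not the code behind the search page; it is a minimal illustration in Python, with a made-up function name and a made-up sample listing, of how search results could be turned into such a file. Note that KML wants coordinates in longitude,latitude order.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # OGC KML 2.2 namespace

def listings_to_kml(listings):
    """Build a KML document string with one Placemark per listing."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for item in listings:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = item["address"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude,latitude,altitude.
        coords = f"{item['lon']},{item['lat']},0"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")

# Hypothetical search result; west longitude is negative.
results = [{"address": "Space Needle", "lat": 47.620367, "lon": -122.349005}]
kml_text = listings_to_kml(results)
print(kml_text)
```

Save the output with a .kml extension and Google Earth should open it directly.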

Most realtors have “me too” & “same old thing” web sites. One of the things I want to do is give realtors the ability to exploit the MLS data in a way that is valuable and compelling to their clients and that strengthens & reinforces their name/brand to their prospects. Having customized RSS feeds of MLS data, having proximity searches to points of interest (how far is this house from a gas station?), and taking advantage of all the cool location/mapping technology that the 3 giants of the internet are developing (Microsoft, Google & Yahoo) are just some of the things that could be done, but aren’t really done yet.

One of the reasons for this state of affairs is that currently only software engineers with access to MLS data can do these things. Unfortunately, we live in a world in which most realtors don’t have the skills & knowledge that software engineers have, and most software engineers don’t have the free access to the raw MLS data that most realtors do, so things are moving slower than they otherwise might. Obviously, waiting for the HouseValues of the world to develop this technology is an option. However, their business model seems to be marginalizing the value of a realtor instead of enhancing it. I’d rather take the opposite tack, since I suspect that my future customers would prefer to use technology to improve their competitive advantage against all comers rather than having it used against them and risk turning themselves into a bunch of “me-too” commodity realtors paying somebody else for random sales leads (which is probably one of the reasons you blog!).

Right now, you can take a gander at the humble beginnings of our grand vision at http://listings.raincityguide.com/search.aspx. Granted, we still have a few bugs that need to be fixed, and many, many more features that need to be implemented. However, it’s my goal to turn this into something that would provide a compelling value for my future clients (realtors & their customers), and I welcome any comments that would help me help you.

Robbie
Caffeinated Software