Garbage In = Busy Maids (Cleaning up the MLS mess)

As promised in my previous post, I’d like to get on my soapbox and complain about the state of NWMLS data. As an application developer, I’d rather spend my time developing new & exciting ways of visualizing data than developing new & exciting ways of correcting inaccurate data. Unfortunately, in order to accomplish the former, a lot of effort is spent on the latter.

For example, of the 20,376 properties that were in the database when I started writing this blog entry, 32 have bogus zip codes. I’m not talking about hard-to-find errors like a Sammamish property with an Issaquah zip code. I’m talking about outright typos and easy-to-catch errors: zip codes like 00000 or “WA”, and zip codes smaller than 98001 (which is the smallest zip code in Washington State).
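Catching this class of error takes only a few lines of code. Here’s a minimal sketch in Python, assuming the zip code arrives as a string and using only the rules above (five digits, no smaller than 98001); the function name is my own, not anything the NWMLS provides.

```python
import re

# Assumption: per this post, 98001 is the smallest zip code in Washington,
# so anything below it is an obvious error. This only catches "easy"
# mistakes; a Sammamish listing with an Issaquah zip code still passes.
WA_MIN_ZIP = 98001

# Five digits, optionally followed by a ZIP+4 suffix such as "-1234".
_ZIP_PATTERN = re.compile(r"(\d{5})(?:-\d{4})?")


def is_obviously_bogus_zip(zip_code: str) -> bool:
    """Return True for zip codes that are clearly typos or placeholders."""
    cleaned = (zip_code or "").strip()
    match = _ZIP_PATTERN.fullmatch(cleaned)
    if match is None:
        return True  # e.g. "WA", "", "9807"
    return int(match.group(1)) < WA_MIN_ZIP  # also catches "00000"
```

Something like this could run when a listing is entered, rejecting the value before it ever reaches the database instead of leaving it for someone downstream to clean up.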

Another bone of contention is that nearly 7% of the properties in the NWMLS database have a square footage of 0 square feet (1,389 properties). How hard is it to contact the county assessor’s office or the property owner and get the number? Can’t you at least make an intelligent guess? Needless to say, this complicates compiling price-per-square-foot statistics, because computers have this thing about not wanting to divide a number by zero.
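The workaround is to treat a square footage of 0 as “unknown” and leave those listings out of the statistic entirely, rather than letting them crash the calculation or skew it. A minimal sketch, assuming a hypothetical `Listing` record with just a price and a square footage (the real NWMLS fields are named and structured differently):

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable, Optional


@dataclass
class Listing:
    # Hypothetical minimal record, for illustration only.
    price: float
    square_feet: int


def price_per_sqft(listing: Listing) -> Optional[float]:
    """Price per square foot, or None when the square footage is missing (0)."""
    if listing.square_feet <= 0:
        return None  # avoid ZeroDivisionError; this isn't real data anyway
    return listing.price / listing.square_feet


def median_price_per_sqft(listings: Iterable[Listing]) -> Optional[float]:
    """Median price per square foot over listings with usable square footage."""
    values = [v for v in (price_per_sqft(x) for x in listings) if v is not None]
    return median(values) if values else None
```

For example, given listings of $450,000 at 1,800 sq ft, $600,000 at 0 sq ft, and $300,000 at 1,500 sq ft, the middle one is skipped and the median comes out to 225.0. The catch, of course, is that every skipped listing shrinks the sample.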

Even more annoying, nearly 18% of the properties in the NWMLS database have a 0° north latitude & 0° west longitude (3,637 properties). Can’t you just go to a map web site and enter an intelligent guess? If you can afford to be a competitive realtor, you can afford a cheap GPS receiver to put accurate data into the MLS when you list a property. I’m sorry, but if you say your client’s property is located in the middle of the Atlantic, 350 miles off the coast of Accra, Ghana, in Western Africa, why should I believe anything else in your listing?
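A sanity check on coordinates is just as cheap. The sketch below flags the (0, 0) placeholder and anything outside a rough bounding box around Washington. The box values are my own approximation (an assumption, not an official boundary) and are only meant to catch listings that land in the ocean or on another continent.

```python
from typing import Optional

# Assumption: a rough, approximate bounding box around Washington State.
# Coarse on purpose; it only needs to catch wildly wrong coordinates.
WA_LAT_RANGE = (45.5, 49.1)      # degrees north
WA_LON_RANGE = (-124.9, -116.9)  # degrees east (west is negative)


def coordinate_problem(lat: Optional[float], lon: Optional[float]) -> Optional[str]:
    """Describe what's wrong with a listing's coordinates, or None if plausible."""
    if lat is None or lon is None:
        return "missing"
    if lat == 0 and lon == 0:
        return "placeholder (0, 0)"  # the Gulf of Guinea, not Puget Sound
    lo_lat, hi_lat = WA_LAT_RANGE
    lo_lon, hi_lon = WA_LON_RANGE
    if not (lo_lat <= lat <= hi_lat and lo_lon <= lon <= hi_lon):
        return "outside Washington"
    return None
```

The bounding-box test would also catch a longitude entered without its minus sign, which would otherwise put a Seattle listing somewhere in China.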

Perhaps most disappointing is that over 50% of the properties in the database don’t have elementary school, junior high, or high school information associated with the listing (10,419 properties)! How is a client supposed to make intelligent decisions about the quality of schools if that information isn’t available? I can only imagine how frustrated professional realtors must feel about this, since their livelihood depends on the quality of this data!
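Figures like these come from a simple completeness count. Here’s a sketch, assuming each listing is a dictionary and using hypothetical field names for the three schools; it counts a listing as missing school information only when all three fields are blank (switching `all` to `any` would count listings missing any one of them instead).

```python
from typing import Mapping, Sequence

# Hypothetical field names; the actual NWMLS schema may differ.
SCHOOL_FIELDS = ("elementary_school", "junior_high", "high_school")


def has_no_school_info(listing: Mapping[str, object]) -> bool:
    """True when none of the school fields holds a non-blank value."""
    return all(not str(listing.get(f) or "").strip() for f in SCHOOL_FIELDS)


def percent_missing_schools(listings: Sequence[Mapping[str, object]]) -> float:
    """Percentage of listings with no school information at all."""
    if not listings:
        return 0.0
    missing = sum(1 for x in listings if has_no_school_info(x))
    return 100.0 * missing / len(listings)
```

Running the same kind of report for each field (zip code, square footage, coordinates, schools) is an easy way to track whether the data is getting better or worse over time.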

Now, given the frequency of these errors, it astounds me that I have yet to find an instance in which a county, city or community name was misspelled. So obviously, it is possible to have high quality data in the database. But why is only some of it of consistently high quality? And why do we have so many errors of commission?

To paraphrase one of Murphy’s Laws, “If builders built buildings the way the local MLS (and local realtors) compile data, the first woodpecker would’ve destroyed civilization.” Why is the data so bad? Are some realtors too lazy to bother listing a property with complete and accurate information? Does the MLS not care about this? Are the MLS data collection tools so bad that the fact we have any data (much less accurate data) is a feat worth celebrating? Perhaps most importantly, what can we do to improve this sad state of affairs? To quote General Beringer from the movie WarGames, “I’d piss on a spark plug if I thought it’d do any good!”

Robbie
Caffeinated Software

PS – Go Seahawks!