npemap,
I have compiled a list of entries that lie unusually far from the average position of their outward code, measured against that outward code's standard deviation (worst entries at the top):
http://code.firefishy.com/files/bad-postcodes.csv
I am no statistician. Please correct me if I am off the mark.
/ Grant
------------------------ SQL:
SELECT `postcodes`.`outward`, `postcodes`.`inward`,
       `postcodes`.`lat`, `postcodes`.`long`,
       `outbound_stats`.*,
       abs(`postcodes`.`lat` - `outbound_stats`.`lat_avg`) AS `lat_offset`,
       abs(`postcodes`.`long` - `outbound_stats`.`long_avg`) AS `long_offset`
FROM `postcodes`
LEFT JOIN `outbound_stats`
       ON `postcodes`.`outward` = `outbound_stats`.`outward`
-- only consider outward codes with at least 10 entries
WHERE `outbound_stats`.`number` >= 10
  -- combined offset from the average exceeds 3 standard deviations (lat + long)
  AND ( (abs(`postcodes`.`lat` - `outbound_stats`.`lat_avg`)
       + abs(`postcodes`.`long` - `outbound_stats`.`long_avg`))
      > (`outbound_stats`.`lat_stddev`*3 + `outbound_stats`.`long_stddev`*3) )
-- worst entries first
ORDER BY (`lat_offset` + `long_offset`) DESC
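For anyone who wants to try the same check outside the database, here is a rough Python sketch of the filter in the query above. It is only an illustration: the function name `find_suspect_postcodes` and the parameters `min_count` and `k` are made up for this example, and since the query does not show how `outbound_stats` was built, the sketch assumes a plain mean and population standard deviation per outward code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_suspect_postcodes(rows, min_count=10, k=3.0):
    """Flag entries far from their outward code's average position.

    rows: iterable of (outward, inward, lat, long) tuples.
    min_count and k are hypothetical names mirroring the ">= 10" and "*3"
    constants in the SQL query.
    """
    # Group entries by outward code.
    by_outward = defaultdict(list)
    for outward, inward, lat, lon in rows:
        by_outward[outward].append((inward, lat, lon))

    suspects = []
    for outward, entries in by_outward.items():
        # Skip outward codes with too few entries for meaningful statistics.
        if len(entries) < min_count:
            continue
        lats = [lat for _, lat, _ in entries]
        lons = [lon for _, _, lon in entries]
        lat_avg, lon_avg = mean(lats), mean(lons)
        # Assumption: population standard deviation per outward code.
        threshold = k * (pstdev(lats) + pstdev(lons))
        for inward, lat, lon in entries:
            offset = abs(lat - lat_avg) + abs(lon - lon_avg)
            if offset > threshold:
                suspects.append((outward, inward, lat, lon, offset))

    # Worst entries first, ordered by combined lat + long offset.
    suspects.sort(key=lambda s: s[4], reverse=True)
    return suspects
```

One thing the sketch makes easy to see: the bad entry itself is included in the average and standard deviation, so a single bad entry in a small outward code inflates the threshold and may escape the filter.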
I have amended the list and fixed a sort error.
Updated: http://code.firefishy.com/files/bad-postcodes.csv
Basic SQL notes: http://code.firefishy.com/files/bad-postcodes-notes.txt
I believe these entries should be removed, or at least marked as suspect (likely bad or incorrect).
/ Grant
npemap-talk@lists.urchin.earth.li