A story that surfaced in the TalkTalk forums a while back, and more recently in the Phoenix Broadband Advisory Community (PBAC) and No DPI forums, has now come to the attention of The Register (TalkTalk turns StalkStalk to build malware blocker).  This one’s interesting – under the guise of harvesting URLs for future malware protection, TalkTalk have been following their clients around the web.  El Reg:

It’s less TalkTalk, more StalkStalk: the UK’s second largest ISP has quietly begun following its customers around the web and scanning what they look at for a new anti-malware system it is developing.

Without telling customers, the firm has switched on the compulsory first part of the system, which is harvesting lists of the URLs every one of them visits. It often then follows them to the sites to scan for threats.

[…]

The new system is provided by Chinese vendor Huawei, and customers can’t opt out of the data collection exercise. As they browse the web, URLs are recorded and checked against a blacklist of sites known to carry malware. They are also compared to a whitelist of sites that have been scanned for threats and approved in the last 24 hours.

If a URL appears on neither list, Huawei servers follow the user to the page and scan the code. According to measurements by webmasters, the TalkTalk stalker servers show up between about 30 seconds and two minutes after TalkTalk subscribers.
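The lookup El Reg describes can be sketched in a few lines of Python. This is only an illustration of the logic as reported, not Huawei's or TalkTalk's actual code: the function name, the data structures and the return labels are all made up for the example, and the only detail taken from the article is the 24-hour window on whitelist approvals.

```python
from datetime import datetime, timedelta

# From the article: whitelist entries count only if approved in the last 24 hours.
WHITELIST_TTL = timedelta(hours=24)


def classify_url(url, blacklist, whitelist, now):
    """Hypothetical helper: decide what happens to a URL a subscriber visited.

    blacklist -- set of URLs known to carry malware
    whitelist -- dict mapping URL -> datetime of its last approved scan
    Returns "blacklisted", "whitelisted" or "scan" (these labels are assumptions).
    """
    if url in blacklist:
        return "blacklisted"
    approved_at = whitelist.get(url)
    if approved_at is not None and now - approved_at <= WHITELIST_TTL:
        return "whitelisted"
    # On neither list (or the approval has expired): the page gets a follow-up visit.
    return "scan"


now = datetime(2010, 7, 26, 12, 0)
blacklist = {"http://bad.example/"}
whitelist = {
    "http://fresh.example/": now - timedelta(hours=1),
    "http://stale.example/": now - timedelta(hours=30),
}
```

Note what falls out of this: any page that isn't already on a list, including one whose approval is more than a day old, triggers a visit from the scanner.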

Isn’t this a clear copyright violation?  One guy in the PBAC forums has requested that TalkTalk cease visiting his sites; they have refused to stop, claiming they “reserve our rights to check your site for the protection of our users”.

It would seem that the URL harvesting takes quite a bit of information along with it.  TalkTalk claim that their crawler obeys robots.txt instructions, but the evidence posted in the PBAC forums suggests this isn’t actually true.  It would also seem that the process interferes with gamers’ online activities and prevents computers from accessing the iTunes store (see, for example, this thread).
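For webmasters who want to check the robots.txt claim against their own logs, here is a rough sketch using Python's standard urllib.robotparser. It only tells you what your robots.txt permits for a given user agent; you would still need to compare that with what actually turns up in your access logs. The user-agent string "ExampleScanner" is a placeholder, since I don't know what TalkTalk's crawler identifies itself as, and the helper name is my own.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt, user_agent, url):
    """Hypothetical helper: does this robots.txt text permit user_agent to fetch url?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example robots.txt that bans one named crawler entirely.
# "ExampleScanner" is a placeholder, not the real crawler's user agent.
ROBOTS_TXT = """\
User-agent: ExampleScanner
Disallow: /

User-agent: *
Disallow: /private/
"""

print(crawler_allowed(ROBOTS_TXT, "ExampleScanner", "http://example.com/page.html"))
print(crawler_allowed(ROBOTS_TXT, "OtherBot", "http://example.com/page.html"))
```

If a page comes back as disallowed for the crawler and requests for it from that crawler still appear in the logs, the crawler is not honouring robots.txt.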