Google's new penchant for webscraping

My understanding of webscraping is that it is a dodgy practice whereby one populates one's website with content simply retrieved from other sites (normally automatically, by software sometimes called "robots") instead of creating one's own. One interesting case in this regard was eBay v. Bidder's Edge, 100 F.Supp.2d 1058 (N.D. Cal. 2000), in which eBay obtained an injunction against a company that had basically copied the eBay auction content. The court relied on the legal doctrine of trespass to chattels (possibly not well known in South Africa). The recent Johannesburg High Court battle between News24 and Moneyweb touched on related issues.

Scraping is actually very similar to what we do when we "share" items on Facebook, but in that case, at least, it is clear why it happens and the source is acknowledged. What Google has started to do, I think, falls somewhere between these forms of sharing and scraping.

Google searches have been returning more content and more targeted content from the top-rated search result in a special "Here's your answer!" box.

In short, Google Search tries to provide sufficient information to obviate the need for the user to actually visit the website indexed by the search engine. In the long run, this is obviously not really a model for sustaining a rich online environment but I would agree that the legal framework around intellectual property and copyright needs to adapt to the digital and online world. Google is certainly not afraid of murky water, given the rampant unlicensed redistribution happening on its YouTube platform.

It would be interesting to see how long it is before someone sues. French publishers, perhaps?


Viber robo-calling has started

I got my first robo-call on Viber today – ouch! A disembodied yet recognisably Anglo-Boer male voice wanting to sell... insurance. Please stop! ;-) The best solution is of course to move to a system where users can charge for incoming calls with differential rates.


Tripadvisor's high-school-essay review problem

Many of us use Tripadvisor from time to time. Recently, reports have started to appear about the problem of fake reviews on Tripadvisor and, if one bears in mind that some 260 million people use it every month, one can understand why there is a business in fake reviews. Just think about it: who is going to post a review? Surely, in the first place, people who have had a bad experience. Almost nobody with a perfectly ordinary good experience is going to write on Tripadvisor, unless they suffer from advanced delusions of being a restaurant critic. Usually people do that sort of thing on Facebook!

I had a quick look at the Tripadvisor comments for a restaurant in Pretoria (in the top 10 there) that I know. The songs of praise from people who apparently have just one review on Tripadvisor are striking. They also read like high-school essays: "[t]hey cater for young and old and they have a play park for kids with supervision." The one on the right here looks somewhat more genuine and less generic, but there are surely many ways to arrange this... The main problem is that there is almost no incentive to write a normally positive review, unless one is extremely bored. I would almost go so far as to try to persuade myself to pay no attention to good reviews and to look only at the bad ones. Perhaps there is a suitable market mechanism to sort this out, but as long as people write or post reviews voluntarily (and hardly need to show that they were ever at a place), there are going to be problems.


The US Computer Fraud and Abuse Act applies to YOU

I am researching an article about data and/as crime and the US Computer Fraud and Abuse Act is an important part of this. This Act was passed in 1986 and fairly clearly (then) attempted to limit its application to computers used by the US government or US financial institutions or computers affecting interstate commerce or communication in the USA. At the time, this would have been a very small fraction of the world's computers but if you are reading this online, it now applies to YOU since any device (all of them are computers, really) that is connected to the Internet is assumed to be covered by the Act's definition:
‘the term “computer” means an electronic, magnetic, optical, electrochemical, or other high speed data processing device performing logical, arithmetic, or storage functions, and includes any data storage facility or communications facility directly related to or operating in conjunction with such device, but such term does not include an automated typewriter or typesetter, a portable hand held calculator, or other similar device;’
But it was clearly the intention of Congress to exclude devices in everyday use by the public, and I think it would be consistent with this wording to restrict the Act's application now to ‘high speed’ devices, by which one would mean a small subset of general-purpose computing devices.

To be clear: if you access someone's iPhone without their permission, you are committing a felony in the USA because that device is a ‘protected computer’ under this Act. It specifically applies to anyone who ‘intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains... information from any protected computer.’ The sad case of the late Aaron Swartz, who was arrested at MIT for downloading academic papers, involved this (now) draconian law.


Sipping a Shanghai surprise

This ad appeared several times on websites I visited (e.g. xe.com) over a period of several days earlier this month. It illustrates a phenomenon which I have also observed on AirBnB and perhaps elsewhere. The website obviously determines two different locations: one for the currency symbol ("R") and another for the actual amount (displaying "1199"), so that we see the price for (say) London but with the currency symbol for (say) Johannesburg. The result is a $79 intercontinental return flight.

It could be that this is a bug in a specific scripting language, or that the content management system somehow uses different scripting languages for the currency symbol and for the amount, and that these two return different location values. Anyway, I am a bit surprised that this has not been fixed yet... or (not so surprised) that Lufthansa have not updated their website software.
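
To make the guess concrete, here is a minimal Python sketch of how such a mismatch could arise. Everything in it is hypothetical: the lookup tables, the function names and the rand fare are invented for illustration (only the "R" and the "1199" come from the ad), and I have no idea how the real ad server is built. The point is simply that if the symbol and the amount are each derived from a separate location lookup, nothing stops them from disagreeing.

```python
# Hypothetical data: the same fare quoted in two currencies.
# 1199 is the number shown in the ad; 17999 is a made-up rand fare.
FARES = {"GBP": 1199, "ZAR": 17999}
SYMBOLS = {"GBP": "£", "ZAR": "R"}
CURRENCY_BY_COUNTRY = {"GB": "GBP", "ZA": "ZAR"}


def render_buggy(symbol_country, amount_country):
    """Symbol and amount come from two independent location lookups."""
    symbol = SYMBOLS[CURRENCY_BY_COUNTRY[symbol_country]]
    amount = FARES[CURRENCY_BY_COUNTRY[amount_country]]
    return f"{symbol}{amount}"


def render_fixed(country):
    """Resolve the currency once and derive both parts from it."""
    currency = CURRENCY_BY_COUNTRY[country]
    return f"{SYMBOLS[currency]}{FARES[currency]}"


# One lookup says Johannesburg, the other says London.
print(render_buggy("ZA", "GB"))  # R1199
print(render_fixed("ZA"))        # R17999
```

If something like this is what happens, the fix is to decide on the currency once and pass that single value to whatever draws both the symbol and the number.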