OCRed Kindle edition of Pinball Games

I just finished reading (on the beach) Pinball Games by George Eber, a book about the War and the subsequent Russian occupation of Hungary, recommended (I think) by a year-end edition of Forbes magazine. It was very informative and, although it covers a sombre period, very amusing and sweet in places. Nevertheless, there were some serious problems with the edition.

  1. Obvious OCR errors like "carne" instead of "came". This is not really acceptable in an e-book edition selling for more than $10 (US).
  2. Inconsistent and incorrect use of Hungarian diacritical signs, with some place names having all, some having only some, and some having none ("Mosonmagyarovar" for "Mosonmagyaróvár") of the correct accents. It is possible that this was the case in the printed edition as well.
  3. Inconsistent use of translations, especially of place and personal names and honorifics (sometimes "néni" without explanation, sometimes "aunt", and so on). These were probably inconsistent in the original manuscript and should have been fixed by the original editor.
The book was actually rather moving, and one has to admire the fortitude of the author and the other characters. It was also a beautiful insight into the lives of the bourgeoisie of pre-War Budapest and contained many interesting vignettes of economic life during the difficult years. Nevertheless, I think that the traditional publishing industry needs to produce a much higher-quality product if it wants to distinguish itself from new media, but perhaps it has already given up...


TV licence system collapses on "Black Friday"

On the highly artificial (recently introduced) shopping event known as "Black Friday", the otherwise not very reliable SABC reports that "the TV Licence validation server was overwhelmed by the abnormal increase in validation requests from retailers due to the Black Friday deals which resulted in the server timing out between 07:00 and 10:30". Of course, there were no reports of point-of-sale systems or mobile networks going down... I will try to explain here why this breakdown was completely unnecessary.

Retailers have to see an SA ID document (or, sometimes, a photocopy or photo of one) and verify that the person to whom the ID number belongs has a valid television licence. Now, this information does not change frequently, since a licence is valid for one year and, frankly, I don't think many new licences are being issued. It would therefore be sufficient to publish (and distribute) a list of ID numbers for which a valid licence exists, but this presents two problems:

  1. It might be a bit too easy for someone buying a television set to just pick a number from the list and use it (in cooperation, probably, with retail staff).
  2. There might be (mild) privacy concerns.

Both issues would be addressed by publishing a list of hashed ID numbers and allowing retailers to store a local copy. A (cryptographic) hash function is a one-way function f, and the scheme would work as follows. First, the SABC publishes a list of all f(x), where x runs over the ID numbers of persons with a valid licence. For technical reasons, we might prepend the digit 1 to the ID number. The essence of a one-way function is that, given f(x) (the hashed value), it is in practice impossible to compute x, although the forward calculation is quite easy.

The list of hashed ID numbers therefore cannot be used to extract any specific ID number, but the holder of a valid licence could present their ID number x to the retailer, who would quickly compute f(x) and compare it to the published list, a copy of which the retailer will have. If f(x) is on the list, then there is a valid television licence and the retailer can go ahead with the sale without consulting the SABC server.
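A minimal sketch of the scheme in Python, assuming SHA-256 as the one-way function f (the post does not name a specific function) and hex-encoded digests as the published list. The function names `f`, `publish_list` and `retailer_check`, and the ID numbers in the usage lines, are hypothetical and made up for illustration.

```python
import hashlib


def f(id_number: str) -> str:
    """One-way function: SHA-256 of the ID number with the digit 1 prepended.

    Assumption: SHA-256 is chosen only for illustration.
    """
    return hashlib.sha256(("1" + id_number).encode("ascii")).hexdigest()


def publish_list(licensed_ids):
    """SABC side: the set of f(x) for every ID number x with a valid licence."""
    return {f(x) for x in licensed_ids}


def retailer_check(id_number: str, local_copy: set) -> bool:
    """Retailer side: hash the presented ID locally; no call to the SABC server."""
    return f(id_number) in local_copy


# Usage with made-up (hypothetical) ID numbers.
published = publish_list(["8001015009087", "9202204720082"])
print(retailer_check("8001015009087", published))  # True
print(retailer_check("7503035800089", published))  # False
```

One caveat with this sketch: SA ID numbers are highly structured (a date of birth, a sequence number and a check digit), so the number of plausible IDs is small enough that a fast hash such as SHA-256 could be reversed by simply hashing every candidate. A deliberately slow function (for example Python's `hashlib.pbkdf2_hmac`) would make that enumeration considerably more expensive.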

It would be necessary to update the list at some (not very specific) frequency, but this could be done at any time and would not disrupt sales to customers. In a follow-up post, I shall describe an example of such a one-way function. The image of the sometime SABC CEO and high-school dropout above is used without permission, but under the assumption that it constitutes fair use under SA copyright law. In fairness, I should say that I once had good service from the SABC when I needed to cancel a licence, but my friends regard this as unusual and strange.



Hyperlinks and the case of Sanoma/Playboy v. GeenStijl

Earlier this month, the European Court of Justice ruled that the publication on the Dutch website GeenStijl of links (that is, not the content itself but merely a "link") to pirated copies of photos of a Playboy model amounts to copyright infringement by GeenStijl. In effect, the Court has ruled that the Internet addresses of the photos are in themselves the property of Sanoma, the publishers of Playboy. The judge determined that "there is infringement, because GeenStijl, as a commercial party, ought to have investigated whether the photos had been placed online with permission". This is a problematic decision for several reasons, most of which are obvious. It does, however, touch on the basic problem of copyright for digitally distributed media: every user, by definition, makes a perfect (and further copyable) copy of the material. Only people who are not already depressed about the decline of the West (and of spelling) are advised to visit GeenStijl.

'Uitspraak GeenStijl is een vervuiling van het auteursrechtelijke systeem' ("GeenStijl ruling is a pollution of the copyright system")


Encryption is just mathematics, except perhaps in La France

According to France's Minister of the Interior, Bernard Cazeneuve, encryption is a "central" problem in the fight against terrorism. Popular chat apps such as Telegram, which make private conversations possible, are apparently used by terrorists (just like trains, shoes and other ordinary services and objects) and make interception by the authorities impossible. Unfortunately for Mr Cazeneuve (pictured alongside), encryption is nothing other than a mathematical algorithm, and any two parties can in principle establish an encrypted connection, even without a specific intermediary such as Telegram. In fact, Telegram is nothing more than an algorithm that runs on both parties' devices (which happen to be cellphones) and uses the Internet to transfer data. He can therefore ban (i) the Internet; (ii) computer programs; or (iii) basic university-level mathematics. Then there will simply be no Gmail or Internet banking (or modern France) either. Apparently it will take at least a summit with his German counterpart to learn this lesson...
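To illustrate the point that setting up an encrypted connection is "just mathematics", here is a toy Diffie-Hellman key exchange using only Python's standard library. It is not Telegram's protocol, and the parameters are far too weak for real use (an assumption made purely to keep the example short); it merely shows two parties agreeing on a shared secret while exchanging only public values over the Internet, with no intermediary holding the key.

```python
import secrets

# Toy public parameters, for illustration only (assumption: NOT secure choices;
# real deployments use standardised, much larger groups or elliptic curves).
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair():
    """Pick a secret exponent a and compute the public value G^a mod P."""
    private = secrets.randbelow(P - 3) + 2  # secret, in the range [2, P - 2]
    public = pow(G, private, P)             # safe to send over the Internet
    return private, public


def shared_secret(my_private: int, their_public: int) -> int:
    """Both sides arrive at G^(ab) mod P without ever transmitting a or b."""
    return pow(their_public, my_private, P)


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()

# Only alice_public and bob_public cross the network.
alice_key = shared_secret(alice_private, bob_public)
bob_key = shared_secret(bob_private, alice_public)
print(alice_key == bob_key)  # True
```

In practice the shared number would then be turned into a key for a symmetric cipher; that step is omitted here.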

Source: Bernard Cazeneuve veut une action internationale contre le chiffrement ("Bernard Cazeneuve wants international action against encryption") http://www.macg.co/ailleurs/2016/08/bernard-cazeneuve-veut-une-action-internationale-contre-le-chiffrement-95199


Financial services – a huge network effect?

According to Wikipedia, when "a network effect is present, the value of a product or service is dependent on the number of others using it". More precisely (for a positive effect): each new user increases, if only slightly, the value of the service for all existing users. Since I have been having a minor spot of trouble with a Bitcoin wallet provider, I have unfortunately realized that this network effect exists rather dramatically for financial services, in the following obvious sense. If a remote, electronic financial service provider denies me access to my funds (which, admittedly, it does less frequently than my brick-and-mortar bank), then the immediate feeling is one of distinct discomfort that there will not be a substantial mob in my immediate vicinity to storm the (virtual) bank. Perhaps it matters little that the mob is distributed all over the planet, but in that case one is still faced with the problem of (a) finding the other customers; and (b) finding something to storm.


Microsoft Azure "Hotel California" newsletter

It has been days now and I remain unable to unsubscribe from the Microsoft Azure newsletter and basically their response is to tell me that I should reboot...


Google's new penchant for webscraping

My understanding of webscraping is that it is a dodgy practice whereby one populates one's website with content simply retrieved from other sites (normally automatically, by software sometimes called "robots") instead of creating one's own. One interesting case in this regard was eBay v. Bidder's Edge, 100 F.Supp.2d 1058 (N.D. Cal. 2000), in which eBay obtained an injunction against a company that had basically copied eBay's auction content. The legal doctrine of trespass to chattels (possibly not well known in South Africa) was applied. The recent Johannesburg High Court battle between News24 and Moneyweb touched on related issues.

This is actually very similar to what we do when we "share" items on Facebook but in that case, at least, it is clear why it happens and there is acknowledgement. What Google has started to do, I think, falls somewhere between these forms of sharing and scraping. 

Google searches have been returning more, and more targeted, content from the top-rated search result in a special "Here's your answer!" box.

In short, Google Search tries to provide sufficient information to obviate the need for the user to actually visit the website indexed by the search engine. In the long run, this is obviously not really a model for sustaining a rich online environment but I would agree that the legal framework around intellectual property and copyright needs to adapt to the digital and online world. Google is certainly not afraid of murky water, given the rampant unlicensed redistribution happening on its YouTube platform.

It would be interesting to see how long it is before someone sues. French publishers, perhaps?


Viber robo-calling has started

I got my first robo-call on Viber today – ouch! A disembodied yet recognisably Anglo-Boer male voice wanting to sell... insurance. Please stop! ;-) The best solution is of course to move to a system where users can charge for incoming calls with differential rates.


Tripadvisor's high-school-essay review problem

Many of us use Tripadvisor from time to time. Recently, reports have started to appear about the problem of fake reviews on Tripadvisor, and if one bears in mind that some 260 million people use it every month, one can understand why there is a business in fake reviews. Just think about it: who is going to post a review? Surely, in the first place, people who have had a bad experience. Almost nobody with a perfectly ordinary good experience is going to write on Tripadvisor, unless they suffer from advanced restaurant-critic delusions. Usually people do that sort of thing on Facebook!

I quickly took a look at the Tripadvisor comments for a restaurant in Pretoria (in the top 10 there) that I know. The songs of praise from people who apparently have only one review on Tripadvisor are striking. They also read like high-school essays: "[t]hey cater for young and old and they have a play park for kids with supervision." The one shown here on the right looks somewhat more genuine and less generic, but there are surely many ways to arrange this... The main problem is that there is almost no incentive to write a normal, positive review unless one is tremendously bored. I would almost go so far as to try to persuade myself to pay no attention to good reviews and to look only at the bad ones. Perhaps there is an appropriate market mechanism to sort this out, but as long as people write or post reviews voluntarily (and hardly have to show that they have actually been to a place), there will be problems.
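The informal heuristic above (be wary of glowing reviews from accounts with only one review, and give more weight to the bad ones) can be sketched in a few lines of Python. The `Review` fields, the two-review threshold, the "bad" cut-off of two stars and the sample ratings are all hypothetical; no actual Tripadvisor data model or data is assumed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    rating: int                 # 1 to 5 stars (hypothetical field)
    reviewer_review_count: int  # total reviews posted by this account (hypothetical)


def average(reviews):
    """Mean star rating, or None if there are no reviews."""
    return mean(r.rating for r in reviews) if reviews else None


def split_by_experience(reviews, min_reviews=2):
    """Separate reviews by accounts with a track record from one-off accounts."""
    seasoned = [r for r in reviews if r.reviewer_review_count >= min_reviews]
    one_off = [r for r in reviews if r.reviewer_review_count < min_reviews]
    return seasoned, one_off


def bad_reviews(reviews, max_rating=2):
    """The 'look only at the bad ones' filter."""
    return [r for r in reviews if r.rating <= max_rating]


# Made-up sample: three glowing one-off reviews and three from regular reviewers.
reviews = [Review(5, 1), Review(5, 1), Review(5, 1),
           Review(3, 12), Review(2, 40), Review(4, 7)]
seasoned, one_off = split_by_experience(reviews)
print(average(reviews), average(seasoned), average(one_off))
```

With this made-up sample, the overall average of 4 stars drops to 3 once the one-off accounts are set aside.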


The US Computer Fraud and Abuse Act applies to YOU

I am researching an article about data and/as crime, and the US Computer Fraud and Abuse Act is an important part of this. The Act was passed in 1986 and fairly clearly (at the time) attempted to limit its application to computers used by the US government or US financial institutions, or computers affecting interstate commerce or communication in the USA. Back then, this would have been a very small fraction of the world's computers, but if you are reading this online, it now applies to YOU, since any device connected to the Internet (all of them are computers, really) is assumed to fall within the scope of the Act's definition:
‘the term “computer” means an electronic, magnetic, optical, electrochemical, or other high speed data processing device performing logical, arithmetic, or storage functions, and includes any data storage facility or communications facility directly related to or operating in conjunction with such device, but such term does not include an automated typewriter or typesetter, a portable hand held calculator, or other similar device;’
But Congress clearly intended to exclude devices in everyday use by the public, and I think it would have been consistent with this wording to now restrict the Act's application to ‘high speed’ devices, by which one would mean a small subset of general-purpose computing devices.

To be clear: if you access someone's iPhone without their permission, you are committing a felony in the USA, because that device is a ‘protected computer’ under this Act. It specifically applies to anyone who ‘intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains... information from any protected computer.’ The sad case of the late Aaron Swartz, who was arrested at MIT for downloading academic papers, involved this (now) draconian law.