Why I'm not afraid of Google

Sure, Google may be beating up Yahoo and Microsoft and taking their lunch money, and it may be plagiarizing part of its Pinyin IME wordbank from Sogou, and it may be growing and spreading into every industry it can find — but I’m pretty sure that its machine translation systems aren’t going to be putting me out of a job any time soon, high BLEU scores or not. (Once the robot revolution comes, my parsing abilities will make me useful to our benevolent metallic overlords.)

Consider Google Translate’s rendition of the end of my latest Chinese-language blog post:

今天晚上在回家的路上,看到一位遛狗的老太婆边抖着缠在她腿上的狗链,边对身旁的小京巴儿骂骂咧咧:“你他妈好好儿走啊!你他妈不好好儿走我再也不带你出去!”

This evening on his way home, walk the dog to see a side Douzhao the old lady in her legs wrapped around the dog chain, Pin right side of the small Palestinian children Mamaleilei Beijing : “you damn thoroughly go ah you damn not properly take I do not take you away!”

Particularly precious to me: the rendering of 小京巴儿 (“small Pekingese [dog]”) as “small Palestinian children.” Also, the simply bizarre choices that the algorithm makes in its attempt to find matching passages: the interpretation of 边…边 (literally “side,” but a very common construction meaning “[doing something] while [doing something]”), for example, or the default to using pinyin for unfamiliar words.

Here’s what it should’ve said:

 On my way home this evening, I saw an old lady out walking her dog, which was winding its leash around her legs, and yelling at the little Peke at the same time: “Dammit, walk right! If you don’t walk right, I’m not going to take you out anymore!”

But to be honest, I prefer Google’s version, because a day without Dada is like fish ventricle capacitor.

18 thoughts on “Why I'm not afraid of Google”

  1. I was talking to someone about this recently. Translation is one of the more futureproof jobs, I think, but far from the most. I figure in 20 years or so they’ll have systems good enough that all that will be needed are good editors.

    Of course you write, as well. I think creative generation of content, rather than just translation, is a whole lot further down the road.

  2. What I worry about (as a reader) is that the time will come when machine translation is almost-but-not-quite-there, and some global media company will decide that it’s in its best interest to fire all of its translators and instead prepare a new style-guide that demands its writers tailor their compositions to the peculiarities of the software heuristics. Readers will be left with the feeling of something machined, even in the original language.

  3. Even if a computer is programmed for every possible grammatical, syntactical, and lexical possibility, it would never be able to handle creative usages of the language, such as a play on words, the intentional misuse of a word or phrase, or most slang.

    So, twenty years down the line, even if a computer did make a functional translation of, say, a novel, I imagine that it would take just as much time, if not more, for an editor to fix all of the problems as it would take for a skilled translator to just directly translate the material.

  4. “because a day without Dada is like fish ventricle capacitor.”

    LOL. I think I’m going to giggle for a week over that one. Great post.

    Although, I suspect we’ll see creative content from machines BEFORE we see translation. The former is less demanding: critical judgment is a more demanding function than production of something new. After all, new writing does not necessarily have to contain plays on words, partial references to common idioms, obscure cultural knowledge, etc. It may actually be easier for a machine to produce….

    Michael

  5. Credit where credit’s due — I stole that line from Jamie Zawinski.

    I remember seeing an article in Scientific American or somewhere years ago about computers generating content. There were a few sample texts included — short stories generated via Markov chains and other things I don’t understand. Don’t know if the technology’s advanced yet.

  6. I have tried many online translation services, most of which are free. Google is perhaps the one I’ve used most. It seems more accurate, but there are inevitably some mistakes, and I have to correct them myself (when I write papers in English).

  7. Tonight in goes home on the road, saw shakes to a dawdle dog’s old lady is entangling on her leg the dog chain, is foul-mouthed to nearby body small Beijing Pakistan: “Your his mother walks well! Your his mother walks I not to lead you not well to exit again!”

    Babelfish’s attempt. A little better here, a little worse there.

  8. My firm’s general mail box constantly receives e-mails in Chinese, Russian, Spanish, German, and Korean and I love translating them using an online translator. I can get the gist most of the time, but whenever I try to respond in the foreign language, even if it is just to say someone from my office will be contacting the sender tomorrow, those in my office who actually speak the language fluently tell me never to do it again. And I no longer do.

  9. Of course, now that you’ve put “Pekingese” and “小京巴儿” on the same webpage (and I’ve done it again in your comments), we’ve just pushed up the statistical likelihood that these are parallel translations.

    I am very, very impressed by the stuff that Franz Och and the MT labs at Google are doing. The problem is that they don’t know anything about grammar and are apparently uninterested in learning it, since it makes it more difficult to create generalizable systems. So we make do with second-best machine translation between two dominant global languages.

    Someone else will come along and do a better job, but we’ll probably have to hold our breath. There aren’t any resources out there with readily available, massive parallel texts, and the people who are being funded to develop them aren’t interested in sharing. Industrial policy, see?

  10. I’d assumed that the problem was that they were running the live translation service on a much, much smaller corpus — and judging by the reading of 巴 as “Palestinian,” it must be a corpus of mostly news articles. I did actually go back in and suggest translations for large chunks of that blog post – doing my bit in the machine revolution – but I’ve never seen any evidence that the suggested translations get applied to the live translation engine.

  11. I will happily eat a pile of steaming dog poop on the day that a machine is capable of independently translating a work of literature fluently.

  12. Hey Brendon,
    Do you still take on translation jobs? What’s your going rate? I sent you an email, but got no reply.

    If you don’t translate anymore, do you have any recommendations?

    Thanks.

  13. Google needs to do some marketing; I never even knew they offered text translation independent of their webpage search translation.

    Google’s Version now:
    Today at the home of the road and saw a dog-old female detainees in her legs wrapped around the dog chain, while right next to the small Palestinian children Detroit Beijing: “you damn properly schedule! You damn not properly take I do not take you out! ”
    http://translate.google.com/translate_t

    And my usual source at Worldlingo:
    Tonight in goes home on the road, saw shakes to a dawdle dog’s old lady is entangling on her leg the dog chain, is foul-mouthed to one’s side small Beijing Pakistan: “Your his mother walks well! Your his mother walks I not to lead you not well to exit again!”
    http://www2.worldlingo.com/en/products_services/worldlingo_translator.html

    Pakistan, Palestinian, Detroit, strange days.
