Why I'm not afraid of Google

Sure, Google may be beating up Yahoo and Microsoft and taking their lunch money, and it may be plagiarizing part of its Pinyin IME wordbank from Sogou, and it may be growing and spreading into every industry it can find — but I’m pretty sure that its machine translation systems aren’t going to be putting me out of a job any time soon, high BLEU scores or not. (Once the robot revolution comes, my parsing abilities will make me useful to our benevolent metallic overlords.)

Consider Google Translate’s rendition of the end of my latest Chinese-language blog post:


This evening on his way home, walk the dog to see a side Douzhao the old lady in her legs wrapped around the dog chain, Pin right side of the small Palestinian children Mamaleilei Beijing : “you damn thoroughly go ah you damn not properly take I do not take you away!”

Particularly precious to me: the rendering of 小京巴儿 (“small Pekingese [dog]”) as “small Palestinian children.” Also, the simply bizarre choices that the algorithm makes in its attempt to find matching passages — the interpretation of 边…边 (literally “side,” but a very common construction meaning “[doing something] while [doing something]”), for example, or the default to using pinyin for unfamiliar words.

Here’s what it should’ve said:

 On my way home this evening, I saw an old lady out walking her dog, which was winding its leash around her legs, and yelling at the little Peke at the same time: “Dammit, walk right! If you don’t walk right, I’m not going to take you out anymore!”

But to be honest, I prefer Google’s version, because a day without Dada is like fish ventricle capacitor.

Comments (17)

  1. John B wrote:

    I was talking to someone about this recently. Translation is one of the more futureproof jobs, I think, but by far not the most. I figure in 20 years or so they’ll have systems good enough that all that will be needed are good editors.

    Of course you write, as well. I think creative generation of content, rather than just translation, is a whole lot further down the road.

    Tuesday, May 15, 2007 at 12:58 pm
  2. zhwj wrote:

    What I worry about (as a reader) is that the time will come when machine translation is almost-but-not-quite-there, and some global media company will decide that it’s in its best interest to fire all of its translators and instead prepare a new style-guide that demands its writers tailor their compositions to the peculiarities of the software heuristics. Readers will be left with the feeling of something machined, even in the original language.

    Tuesday, May 15, 2007 at 2:20 pm
  3. kmm wrote:

    Even if a computer is programmed for every possible grammatical, syntactical, and lexical possibility, it would never be able to handle creative usages of the language, such as a play-on-words, the intentional misuse of a word or phrase, or most slang.

    So, twenty years down the line, even if a computer did make a functional translation of, say, a novel, I imagine that it would take just as much time–if not more–for an editor to fix all of the problems as it would take for a skilled translator to just translate the material directly.

    Tuesday, May 15, 2007 at 5:59 pm
  4. Guan Yang wrote:

    zhwj: A lot of technical writing (manuals for computers, machines, and such) is already done that way.

    Tuesday, May 15, 2007 at 6:44 pm
  5. Josh wrote:

    “because a day without Dada is like fish ventricle capacitor.”

    Thank you so much. That was my day.

    Tuesday, May 15, 2007 at 7:07 pm
  6. “because a day without Dada is like fish ventricle capacitor.”

    LOL. I think I’m going to giggle for a week over that one. Great post.

    Although, I suspect we’ll see creative content out of machines BEFORE we see translation. The former is less demanding: critical judgment is a more demanding function than production of something new. After all, new writing does not necessarily have to contain plays on words, partial references to common idioms, obscure cultural knowledge, etc. It may actually be easier for a machine to produce….


    Tuesday, May 15, 2007 at 11:06 pm
  7. Credit where credit’s due — I stole that line from Jamie Zawinski.

    I remember seeing an article in Scientific American or somewhere years ago about computers generating content. There were a few sample texts included — short stories generated via Markov chains and other things I don’t understand. Don’t know if the technology’s advanced yet.

    Wednesday, May 16, 2007 at 10:14 am
  8. AMY wrote:

    I have tried many online translation services, most of which are free; Google is perhaps the one I have used most. It seems more accurate, but there are inevitably some mistakes, which I have to correct myself (when I write papers in English).

    Wednesday, May 16, 2007 at 4:11 pm
  9. Altavista wrote:

    Tonight in goes home on the road, saw shakes to a dawdle dog’s old lady is entangling on her leg the dog chain, is foul-mouthed to nearby body small Beijing Pakistan: “Your his mother walks well! Your his mother walks I not to lead you not well to exit again!”

    Babelfish’s attempt. A little better here, a little worse there.

    Wednesday, May 16, 2007 at 6:35 pm
  10. My firm’s general mailbox constantly receives e-mails in Chinese, Russian, Spanish, German, and Korean, and I love translating them using an online translator. I can get the gist most of the time, but whenever I try to respond in the foreign language, even if it is just to say someone from my office will be contacting the sender tomorrow, those in my office who actually speak the language fluently tell me never to do it again. And I no longer do.

    Thursday, May 17, 2007 at 8:25 am
  11. trevelyan wrote:

    Of course, now that you’ve put “Pekingese” and “小京巴儿” on the same webpage (and I’ve done it again in your comments), we’ve just pushed up the statistical likelihood that these are parallel translations.

    I am very, very impressed by the stuff that Franz Och and the MT labs at Google are doing. The problem is that they don’t know anything about grammar and are apparently uninterested in learning it, since it makes it more difficult to create generalizable systems. So we make do with second-best machine translation between two dominant global languages.

    Someone else will come along and do a better job, but we’ll probably have to hold our breath. There aren’t any resources out there with readily available, massive parallel texts, and the people who are being funded to develop them aren’t interested in sharing. Industrial policy, see?

    Thursday, May 17, 2007 at 11:37 pm
  12. I’d assumed that the problem was that they were running the live translation service on a much, much smaller corpus — and judging by the reading of 巴 as “Palestinian,” it must be a corpus of mostly news articles. I did actually go back in and suggest translations for large chunks of that blog post – doing my bit in the machine revolution – but I’ve never seen any evidence that the suggested translations get applied to the live translation engine.

    Thursday, May 17, 2007 at 11:53 pm
  13. yz wrote:

    “because a day without Dada is like fish ventricle capacitor.” What does this mean, anyway?

    Friday, May 18, 2007 at 12:27 pm
  14. Matt wrote:

    I will happily eat a pile of steaming dog poop on the day that a machine is capable of independently translating a work of literature fluently.

    Friday, May 18, 2007 at 6:42 pm
  15. Mark wrote:

    That’s easy for you to say, Matt. You blog anonymously. Nobody will be able to hold you to it.

    I give it 20 years, max.

    Sunday, June 3, 2007 at 5:21 pm
  16. Chubby wrote:

    Hey Brendon,
    Do you still take on translation jobs? What’s your going rate? I sent you an email, but got no reply.

    If you don’t translate anymore, do you have any recommendations?


    Wednesday, June 6, 2007 at 2:13 pm
  17. Panda wrote:

    Google needs to do some marketing; I never even knew they offered text translation independent of their webpage search translation.

    Google’s version now:
    Today at the home of the road and saw a dog-old female detainees in her legs wrapped around the dog chain, while right next to the small Palestinian children Detroit Beijing: “you damn properly schedule! You damn not properly take I do not take you out! ”

    And my usual source at Worldlingo:
    Tonight in goes home on the road, saw shakes to a dawdle dog’s old lady is entangling on her leg the dog chain, is foul-mouthed to one’s side small Beijing Pakistan: “Your his mother walks well! Your his mother walks I not to lead you not well to exit again!”

    Pakistan, Palestinian, Detroit, strange days.

    Saturday, September 29, 2007 at 3:48 pm

Trackback/Pingback (1)

  1. Fucking Stationery at bokane.org on Monday, October 29, 2007 at 1:19 pm

    […] and (2) This is not a mistake that any human being would make, but it does seem to be the kind of weirdness that machine translation is so excellent at […]