From tony@math.sunysb.edu Fri Oct 27 14:39:55 2000
Date: Thu, 26 Oct 2000 20:15:14 -0400
From: Tony Phillips
To: ja-goldsmith@uchicago.edu, tony@math.sunysb.edu
Subject: information on information

Hello John

I came across your Royaumont article while compiling a list of web resources on information theory to put with my web column this month. I'm planning it to be "The Mathematics of Communication" (to appear in http://www.ams.org/new-in-math, which I edit and usually write).

I have been interested in natural languages all my life and thought they could come together with my love of mathematics during those crazy days in the 50s when I worked with Yngve & Co. on MT as an MIT undergrad. I was soon disabused (by Lees himself, who took me aside one day -during my summer job at IBM- and said, as I remember it, that there was no real hidden math and that MT was a chimera). But I did learn about information theory and that stuck with me. They had me cook up an optimal code for Russian. In those days IBM had a contract with the Air Force to provide a hard-wired Russian-English translating device.

What I'm hoping you can do for me, and soon if possible, is let me know if there have been any useful studies of relative information content, say of syllables, across languages. For example, since Mandarin has a relatively small set of possible syllables (even counting tones) compared to English, one might think that the information per syllable must be lower in Mandarin, and that Mandarin speakers could/would speak more quickly and still be understood. Or would have to speak more quickly to transmit the same information in the same amount of time.

My personal axiom is that all spoken languages are equally efficient, but this may be wrong. Does anyone know one way or the other? I mean efficient in general. Clearly some particular things are more pithily expressed in one language than in another.

Here's the kind of joke we used to tell.

"The most interesting thing about any language is the way it resembles Russian."

Tony Phillips

From ja-goldsmith@uchicago.edu Fri Oct 27 14:40:36 2000
Date: Thu, 26 Oct 2000 17:34:02 -0700
From: John Goldsmith
To: tony@math.sunysb.edu
Subject: RE: information on information

Well, I'm very pleased to make your acquaintance! I'd be very interested to hear more of your stories. Vic Yngve is a colleague of mine -- he retired a couple of years ago, I guess -- and all those MIT folks, like Chomsky and Halle, were teachers of mine (I was in grad school at MIT from '72 to '76).

There's a whole long saga of information theory and linguistics over the last few years. Do you want the short version or the long? You can get a sense of what I'm doing in this vein if you take a look at a serious paper of mine, also on my web page; the home-page url is humanities.uchicago.edu/faculty/goldsmith, or you can go right to the paper at humanities.uchicago.edu/faculty/goldsmith/Linguistica2000/Paper/paper.html

To get right to your question, the answer is, I don't know. I've been meaning to do something like what you ask about for some time, and haven't gotten around to it.

What is really interesting, though, from where I stand, is using concepts of information theory to drive automatic language learning -- the kind that we in linguistics became persuaded, back in the late 1950s, would be impossible. The work of Jorma Rissanen, a mathematician who worked for IBM, in developing the Minimum Description Length framework went a long way towards clarifying how information theory and linguistic analysis could speak to one another.

I'll see if I can give you some better answers about the phonological information content of words in short order. I've just gotten off the plane in Seattle from Chicago, and I may not be able to get at it as quickly as I'd like, but I'll try.

-- John Goldsmith

From tony@math.sunysb.edu Fri Oct 27 14:40:50 2000
Date: Thu, 26 Oct 2000 20:39:36 -0400 (EDT)
From: Tony Phillips
To: John Goldsmith
Cc: Tony Phillips
Subject: RE: information on information

Wow. Thanks for your speedy answer. I'll look up the paper you mention and I'll be grateful for more if you can send it.

Tony