Funny Natural Language

There's no doubt that the world has made significant progress in getting machines to speak, generate, process, and understand natural language. Modern artificial intelligence and deep learning can extract important words, concepts, and topics using Natural Language Processing (NLP).

They can also generate text or speech (though not always understandable by humans) using natural language generation (NLG), understand written or spoken language using natural language understanding (NLU), and translate between languages fast enough to provide natural language translation (NLT) while the words are still being spoken.

 

One of the last bastions of human natural language, however, is humor. As the mischievous AI in the popular video game Portal 2 once said: "Well, you know the old formula: Comedy equals tragedy plus time. And you have been asleep for a while. So I guess it's actually pretty funny when you do the math."

 

To that end, I've collected a few humorous examples over the years that I generally categorize as funny misinterpretations using NLP. While it felt like a tragedy long ago when I couldn't get my university assignment to work correctly, or an embarrassing mistake in front of a customer, some of the examples are pretty entertaining if you do the math.  

 

Literal Translation

Many early natural language and machine translation programs were not able to recognize local or language-specific idioms. Elaine Rich's example relayed in [Artificial Intelligence, New York, McGraw-Hill 1984] describes one very famous problem. As the now-legendary example goes, the first sentence below was translated from English into Russian and then back again, producing the second.

 

  1. The spirit is willing but the flesh is weak
  2. The vodka is good but the meat is rotten

 

While some doubts about the provenance of the exercise remain, it has lived on as a cautionary tale for those building natural language systems.

 

The Old Man and the Boats

My first experience with natural language parsing was in an undergraduate class on machine learning in the late 80s. The assignment was to write a natural language parser in Common Lisp. Our programs would then be tested on a series of well-formed, malformed, and trick questions.

 

"I keep thinking about my favorite Hemingway novel. The sea. The old man. The boats."

 

It's definitely a well-formed, valid sentence in English. However, if your program cannot detect implicit subjects, verbs, and antecedents, it will fail on the sentence. Distilled down even more, who knew five little words and combinations thereof could be so ambiguous?

 

"The old man the boats."

 

Try giving that one to a natural language understanding program. Who is manning the boats? The old. Most modern natural language parsers, and most humans, first assume the sentence is about "The old man" before re-evaluating the meaning. As I learned in my 7th grade English class, if you diagram the sentence, it's highly likely you'll have to use the eraser before you get to the final punctuation.

     

   N          V          N
__old____|___man___|___boats____
     \                    \
       The                  the

 

Another way to look at it is to parse the sentence into CoNLL format (named after the Conference on Computational Natural Language Learning) and an ASCII tree, using Penn Treebank part-of-speech tags.

 

Input: The old man the boats
Parse:
man NN ROOT
   +-- The DT det
   +-- old JJ amod
   +-- boats NNS dep
         +-- the DT det

 

If that's too difficult to read, it's basically saying the root of the sentence is the noun "man", which is incorrect. As mentioned above, "The old" are manning the boats and "man" in this context is a verb.
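To make the difference concrete, here is a minimal sketch in Python that renders two dependency analyses in the same ASCII-tree style. The first copies the parser output above. The second is my own hand-written guess at the intended reading (the `VBP`, `nsubj`, and `dobj` labels are assumptions, not output from any parser), and the `Token`, `render`, and `root_of` names are made up for illustration.

```python
from typing import NamedTuple

class Token(NamedTuple):
    index: int  # 1-based position in the sentence
    form: str   # the word itself
    tag: str    # Penn Treebank part-of-speech tag
    head: int   # index of the governing token, 0 for the root
    rel: str    # dependency relation to the head

# The analysis printed by the parser above ("man" is a noun and the root).
GARDEN_PATH = [
    Token(1, "The", "DT", 3, "det"),
    Token(2, "old", "JJ", 3, "amod"),
    Token(3, "man", "NN", 0, "ROOT"),
    Token(4, "the", "DT", 5, "det"),
    Token(5, "boats", "NNS", 3, "dep"),
]

# Hand-written intended reading (an assumption): "The old" is the subject
# and "man" is the verb.
INTENDED = [
    Token(1, "The", "DT", 2, "det"),
    Token(2, "old", "JJ", 3, "nsubj"),
    Token(3, "man", "VBP", 0, "ROOT"),
    Token(4, "the", "DT", 5, "det"),
    Token(5, "boats", "NNS", 3, "dobj"),
]

def render(tokens, head=0, depth=0):
    """Return the tree as a list of lines, children indented under their head."""
    lines = []
    for tok in tokens:
        if tok.head == head:
            prefix = "" if depth == 0 else "   " * (depth - 1) + " +-- "
            lines.append(f"{prefix}{tok.form} {tok.tag} {tok.rel}")
            lines.extend(render(tokens, tok.index, depth + 1))
    return lines

def root_of(tokens):
    """Return the single token whose head is 0."""
    return next(tok for tok in tokens if tok.head == 0)

for name, parse in (("parser output", GARDEN_PATH), ("intended reading", INTENDED)):
    print(f"{name}: root is tagged {root_of(parse).tag}")
    print("\n".join(render(parse)))
```

Both trees have "man" at the root; the only thing that changes is whether it is tagged as a noun or a verb, and that one tag decides who is doing what to the boats.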

 

Likewise, try diagramming the ones below.

"A sailing boat sails all boats."

"AI judges curb appeal."

 

The second one is funny as a friend of mine recently asked, "On seeing this title, did anyone else think it meant that a panel of judges declined to hear an appeal in litigation involving AI?" So which is it, "AI judges" curbing the appeal of a lawsuit? Or "AI" judging the curb appeal of a house? It's the second, but if you didn't know the context, I can see how it would be ambiguous.
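The two readings can be enumerated mechanically. What follows is only a toy sketch in Python, not a real parser: it assumes a four-word lexicon I made up, and a single sentence pattern (a run of nouns, then a verb, then another run of nouns). The `LEXICON`, `can_be`, and `readings` names are hypothetical.

```python
# Toy lexicon (an assumption for this sketch): the parts of speech each word may take.
LEXICON = {
    "ai": {"N"},
    "judges": {"N", "V"},
    "curb": {"N", "V"},
    "appeal": {"N"},
}

def can_be(word, pos):
    """True if the toy lexicon allows this word to act as the given part of speech."""
    return pos in LEXICON.get(word.lower(), set())

def readings(words):
    """Return every (subject, verb, object) split where the subject and object
    are runs of one or more nouns and the word between them can be a verb."""
    found = []
    for i in range(1, len(words) - 1):
        subject, verb, obj = words[:i], words[i], words[i + 1:]
        if (all(can_be(w, "N") for w in subject)
                and can_be(verb, "V")
                and all(can_be(w, "N") for w in obj)):
            found.append((" ".join(subject), verb, " ".join(obj)))
    return found

for subject, verb, obj in readings("AI judges curb appeal".split()):
    print(f"[{subject}] {verb} [{obj}]")
# prints:
# [AI] judges [curb appeal]
# [AI judges] curb [appeal]
```

Nothing in the words themselves favors one split over the other; only context does.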

 

Checking in with our off-the-shelf semantic parser, it again sees the verb as a noun. Here's the whole parse tree. (As an aside, you can generate your own parse trees on Algorithmia's Parsey McParseface demo.)

 

Input: AI judges curb appeal.
Parse:
appeal. NOUN++NN ROOT
 +-- AI ADP++IN case
 +-- judges NOUN++NNS compound
 +-- curb ADJ++JJ compound

{
  "output": {
    "sentences": [
      {
        "words": [
          {
            "dep_relation": "case",
            "extra_deps": [
              ""
            ],
            "features": {
              "fPOS": "ADP++IN"
            },
            "form": "AI",
            "head": 4,
            "index": 1,
            "language_pos": "IN",
            "lemma": "",
            "misc": "",
            "universal_pos": "ADP"
          },
          {
            "dep_relation": "compound",
            "extra_deps": [
              ""
            ],
            "features": {
              "Number": "Plur",
              "fPOS": "NOUN++NNS"
            },
            "form": "judges",
            "head": 4,
            "index": 2,
            "language_pos": "NNS",
            "lemma": "",
            "misc": "",
            "universal_pos": "NOUN"
          },
          {
            "dep_relation": "compound",
            "extra_deps": [
              ""
            ],
            "features": {
              "Degree": "Pos",
              "fPOS": "ADJ++JJ"
            },
            "form": "curb",
            "head": 4,
            "index": 3,
            "language_pos": "JJ",
            "lemma": "",
            "misc": "",
            "universal_pos": "ADJ"
          },
          {
            "dep_relation": "ROOT",
            "extra_deps": [
              ""
            ],
            "features": {
              "Number": "Sing",
              "fPOS": "NOUN++NN"
            },
            "form": "appeal.",
            "head": 0,
            "index": 4,
            "language_pos": "NN",
            "lemma": "",
            "misc": "",
            "universal_pos": "NOUN"
          }
        ]
      }
    ]
  }
}
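If you want to check a response like this programmatically, a minimal sketch in Python might look like the following. It assumes the response is available as a JSON string with the same output/sentences/words layout shown above; the `RAW` literal is a trimmed copy that keeps only the fields the sketch reads, and the `words_of` and `summarize` helpers are names I made up.

```python
import json

# Trimmed copy of the response above: only the fields this sketch reads.
RAW = """
{"output": {"sentences": [{"words": [
  {"index": 1, "form": "AI",      "head": 4, "dep_relation": "case",     "universal_pos": "ADP"},
  {"index": 2, "form": "judges",  "head": 4, "dep_relation": "compound", "universal_pos": "NOUN"},
  {"index": 3, "form": "curb",    "head": 4, "dep_relation": "compound", "universal_pos": "ADJ"},
  {"index": 4, "form": "appeal.", "head": 0, "dep_relation": "ROOT",     "universal_pos": "NOUN"}
]}]}}
"""

def words_of(payload):
    """Flatten the nested response into a single list of word records."""
    return [w for s in payload["output"]["sentences"] for w in s["words"]]

def summarize(words):
    """One line per word: form, universal tag, relation, and the form of its head."""
    by_index = {w["index"]: w["form"] for w in words}
    rows = []
    for w in words:
        head = by_index.get(w["head"], "(root)")  # head 0 has no entry, so it is the root
        rows.append(f'{w["form"]:<8} {w["universal_pos"]:<5} {w["dep_relation"]:<9} -> {head}')
    return rows

words = words_of(json.loads(RAW))
verbs = [w["form"] for w in words if w["universal_pos"] == "VERB"]

print("\n".join(summarize(words)))
print("verbs found:", verbs)
```

The list of verbs comes back empty: the parser has read the whole headline as one long noun phrase hanging off "appeal."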

 

Once again, ambiguity rules the day. As one of my other friends mentioned to me recently, “A language is a dialect with an army and a navy,” which is a famous quote from Max Weinreich, a specialist in Yiddish linguistics.

 

It’s my hope that advances in natural language processing will lead to advances in natural language understanding and vice versa. For now, they each have their own edge cases and tricky parts. One of those tricky parts involves punctuation. Consider the following sentences from an old riddle.

  1. There once was a horse that won great fame, what do you think was the horses name?
  2. There once was a horse that won great fame, what do you think was the horses name.
  3. There once was a horse that won great fame; what do you think was the horse’s name?
  4. There once was a horse, that won great fame, what do you think was the horse’s name?

Most NLP tools will try to strip out as much punctuation as possible while preserving the semantics of the sentence. “That won great fame”, “what do you think”, “There once”? It quickly devolves into a game of “Who’s on first” depending on how good your natural language parsing technology is at preserving semantics.
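To see how much is lost, here is a minimal sketch in Python of an aggressive normalization step: lowercase everything and drop every character that is not a letter, digit, underscore, or whitespace. It is not taken from any particular NLP toolkit, and the `strip_punctuation` helper is a name I made up for illustration.

```python
import re

RIDDLE_VARIANTS = [
    "There once was a horse that won great fame, what do you think was the horses name?",
    "There once was a horse that won great fame, what do you think was the horses name.",
    "There once was a horse that won great fame; what do you think was the horse’s name?",
    "There once was a horse, that won great fame, what do you think was the horse’s name?",
]

def strip_punctuation(sentence: str) -> str:
    """Lowercase the text, remove punctuation (including curly apostrophes),
    and collapse runs of whitespace into single spaces."""
    no_punct = re.sub(r"[^\w\s]", "", sentence.lower())
    return " ".join(no_punct.split())

normalized = {strip_punctuation(s) for s in RIDDLE_VARIANTS}
print(len(normalized))  # prints 1
for text in normalized:
    print(text)
```

All four variants collapse into the same string, so whatever the commas, the semicolon, the question mark, and the apostrophe were telling the reader is gone before the parser ever sees the sentence.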

 

The best resource for keeping up on all the techniques is a project by Elvis Saravia and Soujanya Poria, who borrowed part of the material from Young et al. (2017). You can find up-to-date links and promising materials here:

https://nlpoverview.com/ 

As technology advances to create better and better natural language processing computational techniques, the first thing I always do is throw the old “humorous” curveballs at it.  Idioms, riddles, tipping-point punctuation, informal language use, and dual-use noun-verbs-adjectives all make for interesting test cases. If you are using any type of computational natural language technique, it’s important to keep up on the latest developments as this is what we all face when dealing with natural language in the real world. 

 

 

Greg Bolcer, CDO Bitvore

Greg is a serial entrepreneur who has founded three angel and VC-funded companies. He's been involved at an early stage or as an advisor to at least half a dozen more. Greg has a PhD and BS in Information and Computer Sciences from UC Irvine and an MS from USC.