Grammar is like a Type System

In an analogy between natural language and programming language, grammar can be compared to a type system; this gives a great way to learn about one if you already know about the other1.

natural language ~ programming language
grammar ~ type system

For example, consider the following comparison of a sentence in English that has a grammar error with an expression in Haskell2 that has an analogous type error:

'This sentence has problematic' ~ [1,2,3] `contains` even

The sentence has a grammar error because the word ‘has’ expects to be followed by a noun phrase (like ‘problems’), but what actually followed was an adjective. Analogously, the Haskell expression has a type error because the function ‘contains’ expects to be followed by an object-like thing, in particular an integer (like ‘2’), but what actually followed was a predicate on integers (i.e. a function Int -> Bool). In summary, what both grammar and type systems are doing is restricting the ways you are allowed to combine the ‘words’ of the language: they put words into categories (i.e. give them types), and restrict how the categories may be combined.
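Following footnote 2, the type error can be made concrete with a short sketch, taking ‘contains’ to be ‘flip elem’ as suggested there:

```haskell
-- 'contains' as footnote 2 suggests: membership with the container first
contains :: Eq a => [a] -> a -> Bool
contains = flip elem

main :: IO ()
main = do
  print ([1,2,3] `contains` 2)       -- True: an Int is what 'contains' expects here
  -- print ([1,2,3] `contains` even) -- rejected by the type checker:
  -- 'even' is a predicate (a -> Bool), not an element of the list
```

Uncommenting the last call makes the type checker reject the program, just as the brain’s parser rejects the English sentence.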

A natural question now, to make this analogy more precise, is: what are the types in natural language? Are they just the word categories we learn about in school grammar, like nouns, adjectives, verbs and connectives? To a first approximation, yes – except that sometimes it is more natural to think of phrases, rather than just words, as having types. For example, anywhere we have a noun in a sentence we can always replace it by a phrase in which the noun is preceded by an adjective, and still have a grammatical sentence; so it makes sense for phrases like this to have the same type as nouns – this leads to the NounPhrase type3. Other than NounPhrases, most other types correspond to ‘function’ types, of which a few of the most important are given below:

type Adjective = NounPhrase -> NounPhrase
type Verb = NounPhrase -> VerbPhrase
type Adverb = VerbPhrase -> VerbPhrase
type VerbPhrase = NounPhrase -> Sentence
type Connective = Sentence -> Sentence -> Sentence

Let’s see an example of how this can be used to write sentences ‘as programs’ (in Haskell, or more generally in the lambda calculus)4. We start with the NounPhrase ‘cats‘. Since Adjective is a function NounPhrase -> NounPhrase, we can apply the adjective ‘cute‘ to get the NounPhrase ‘cute cats‘ (here function application is represented by a space). We can then apply the verb ‘like‘ to this to get the VerbPhrase ‘like (cute cats)‘. Finally we can apply the VerbPhrase to another NounPhrase like ‘students‘ to get a sentence (sort of): ‘like (cute cats) students‘. Except it doesn’t read like a sentence yet… but we can get round this by using the infix operator ‘&‘5, which switches the order of function application, to write the sentence equivalently in a way which mirrors how it is read: ‘students &like (cute cats)‘.
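To make this construction actually runnable, one option is to back each category with Strings – a minimal sketch, where the String representations (and the choice of Sentence = NounPhrase = String) are my own illustrative additions, not part of the analogy:

```haskell
import Data.Function ((&))  -- (&) :: a -> (a -> b) -> b, reversed application

-- Back the atomic categories with Strings, purely for illustration
type Sentence   = String
type NounPhrase = String
type VerbPhrase = NounPhrase -> Sentence
type Adjective  = NounPhrase -> NounPhrase
type Verb       = NounPhrase -> VerbPhrase

cats, students :: NounPhrase
cats     = "cats"
students = "students"

cute :: Adjective
cute np = "cute " ++ np

like :: Verb
like object subject = subject ++ " like " ++ object

main :: IO ()
main = putStrLn (students & like (cute cats))  -- "students like cute cats"
```

The type checker enforces the grammar: an expression like ‘cute like‘ fails to compile, just as the corresponding English fragment fails to parse.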

The idea is that this would work whatever NounPhrases, Adjectives, Verbs etc you pick – so from a finite supply of atomic words/phrases you can generate infinitely many grammatical sentences6. The following gives some more examples, and the atomic words/phrases used to generate them7:

cats, students, they, ideas, children, grades, problems :: NounPhrase
cute, studious, good, noisy, green, colourless, that_take_breaks, better, hard :: Adjective
have, get, like, put_up_with, wait_for, eat :: Verb
work, sleep, take_breaks :: VerbPhrase
hard, furiously, better :: Adverb
and, because, when, if, but :: Connective

Students &like (cute cats)
Studious students &get (good grades) &because (they &(work &hard))
Cats &put_up_with (noisy children)
Students &put_up_with (hard problems) &and (they &wait_for (good ideas))
Cats &have (good ideas) &when (they sleep)
Colourless (green ideas) &(sleep &furiously)8
Students &(work &better) &if (they &take_breaks)
Students &that_take_breaks &have (better ideas)
Ideas &eat (green children)
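The longer sentences work the same way once Adverbs and Connectives are given String-backed definitions. A sketch for the second example above, again with the String plumbing as my own illustrative choice:

```haskell
import Data.Function ((&))

type Sentence   = String
type NounPhrase = String
type VerbPhrase = NounPhrase -> Sentence
type Adjective  = NounPhrase -> NounPhrase
type Verb       = NounPhrase -> VerbPhrase
type Adverb     = VerbPhrase -> VerbPhrase
type Connective = Sentence -> Sentence -> Sentence

students, grades, they :: NounPhrase
students = "students"
grades   = "grades"
they     = "they"

studious, good :: Adjective
studious np = "studious " ++ np
good np     = "good " ++ np

get :: Verb
get object subject = subject ++ " get " ++ object

work :: VerbPhrase
work subject = subject ++ " work"

hard :: Adverb
hard vp subject = vp subject ++ " hard"  -- modifies the VerbPhrase it is applied to

because :: Connective
because s2 s1 = s1 ++ " because " ++ s2  -- (&) feeds the first sentence in last

main :: IO ()
main = putStrLn (studious students & get (good grades) & because (they & (work & hard)))
-- "studious students get good grades because they work hard"
```

Note that (&) associates to the left, so the chain of connectives parses as intended without extra brackets around the first clause.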

So through this analogy we can develop a precise definition of what it means for a sentence to be grammatical: a sentence is grammatical if it can be written as a program, in the above way (for some use of ‘()’, ‘&’ and ‘_’), that type checks. The analogy isn’t perfect – this isn’t the full picture of grammar, as there are sentences which are grammatical in the above sense but aren’t grammatical for other reasons (of which I have given some examples in this footnote9) – but I think it captures the core of what a grammar is.

One more thing this analogy could help us with is understanding the purpose/benefit of grammar in natural languages (or of types in programming languages). The benefit that grammar and types have in common is that, by restricting the valid combinations in the language, they increase the likelihood of a sentence making sense (or, in the case of programs, decrease the chance of bugs). But I think purpose is actually another place where the analogy breaks slightly, as the purpose of grammar in natural language is much more than just this.

The key difference is that since natural language is spoken, there is no way to explicitly group the words, whereas in a programming language, which is written, words can be grouped with brackets or sub-procedures. Grammar plays this role in natural language: words form a group when they match the type that the parser in the brain is expecting. It is this ability to group words which makes a language so powerful. Although it would still be possible for a language to generate infinitely many sentences without grouping10, there would be a limit on the composability, and hence the complexity, of sentences, as you could not construct complex phrases by combining together simpler phrases each with independent meaning. A type system is not needed in a programming language to achieve this (there are many examples of successful, expressive functional programming languages without type systems1112), but I think it would have been much harder for natural language to have been a success without grammar.

In conclusion, we have used this analogy to get a precise understanding of the essence of grammar in terms of a type system in a functional programming language, but we have also acknowledged where there are differences: grammar has more rules than just this, and, as an independent point, its purpose to natural language is significantly more than the benefit that types provide to a programming language.



  1. …and both are ultimately a good way in to Category Theory. 
  2. Technically this is pseudo Haskell – pretend the function ‘contains‘ is defined in the obvious way as ‘flip elem‘ – kind of annoyed this isn’t built into the Haskell prelude. (Edit: Actually ‘contains‘ would be the natural name of ‘elem‘ if infix order was reversed, as I suggest it should be in the post ‘Infix Order Reversal’). 
  3. I first learnt about this concept of a NounPhrase, and many other concepts from Chomsky’s linguistic theory, in Steven Pinker’s excellent book ‘The Language Instinct’
  4. Most of what is presented here is I think actually an alternative presentation of the Chomsky Hierarchy, in terms of the lambda calculus. 
  5. This operator ‘&‘ exists in Haskell in the Data.Function module. 
  6. This is really cool; I am always impressed by the way it is possible to say sentences that nobody has ever said before – we are not just regurgitating things we have heard before, we are endlessly creating new sentences all the time. This is the power of composition to generate the infinite from the finite. 
  7. In the examples, I am using _ to denote a space in a phrase I want to be considered as atomic with respect to the grammar. 
  8. Note that not all grammatical sentences make sense – this is Noam Chomsky’s famous example of a sentence which is grammatical and yet makes no sense. Conversely it is possible to infer the meaning of a sentence which is not grammatical, as in the case of ‘This sentence has problematic’. So being grammatical is neither necessary nor sufficient for meaning. 
  9. For example ‘red amazing coat’ is a grammatical noun phrase in the above sense, but it still doesn’t sound right and that’s because there are additional implicit rules about the order of adjectives preceding a noun. A different example is ‘The postman like cute cats’ – this is again grammatical in the above sense as ‘The postman’ is a noun phrase, but it doesn’t sound right because ‘like’ is not correctly inflected as ‘likes’ to match the ‘number’ of the noun phrase. (I purposely chose the set of words to all use the same case markers to avoid encountering this in the main text). So, a more accurate way to define a grammatical sentence, would be to say that a sentence which can be written as a program that type checks is grammatical up to ordering of adjectives, inflection, … + a bunch of other rules. 
  10. Such a language would be an associative discrete combinatorial system. 
  11. There are many examples of successful functional programming languages which do not use static types, for example Clojure or symbolic languages like Mathematica and Lisp; these languages correspond to the untyped lambda calculus. 
  12. While a type system is not needed in functional languages, understanding grammar could help us to write more readable programs by exploiting how our brain’s parsing system naturally works. (An example of where I have applied this is in the post ‘Infix Order Reversal’). 
This entry was posted in Category Theory, Functional Programming, Linguistics.
