Wednesday, 15 February 2023

5 - Generating Simple Sentences



In this example we'll practice using the power of unification, and see how easy it is for prolog to generate simple grammatically correct English sentences.


% Example 05 - Generating Sentences

% subjects, verbs and objects
subject(john). 
subject(jane).
verb(eats). 
verb(washes).
object(apples). 
object(spinach).

% sentence = subject + verb + object
sentence(X,Y,Z) :- subject(X), verb(Y), object(Z). 

% sentence as a list
sentence(S) :- S=[X, Y, Z], subject(X), verb(Y), object(Z).

Before we talk about the code, let's see how simple English sentences are constructed.


Simple English Sentences

English sentences can be as simple as “Emma drives buses” or “James eats plums”. The grammatical structure is subject, verb and object



The first part of the code creates simple facts, establishing that john and jane are subjects, eats and washes are verbs, and apples and spinach are objects. 


Testing Sentences Are Well-Formed

The rule sentence(X,Y,Z):-subject(X),verb(Y),object(Z) says that X, Y and Z form a sentence if X is a subject, Y is a verb, and Z is an object. We recognise this as a conjunctive rule where the head is only true of all the three parts of the body are true.

We can use this rule to test if a sentence is well-formed. For example, “john eats apples”.


?- sentence(john, eats, apples).

true

Prolog uses this rule to check if john is a subject, eats is a verb, and apples is a subject. Because they are, the sentence conforms to the grammar we've defined. 

Let's try the sentence “john apples eats”.


?- sentence(john, apples, eats).

false

Prolog tells us this sentence isn't well-formed. Although subject(john) is true, verb(apples) is not, so the rule defining sentence can't be satisfied.


Generating Simple Sentences

One of the benefits of logic programming is that our definitions are often strong enough not just to test candidate solutions, but also to generate them. 

We can ask prolog which X, Y and Z result in a well-formed sentence.


?- sentence(X, Y, Z).

X = john
Y = eats
Z = apples

Just like our previous examples, Prolog uses unification with the database of facts and rules to find which values of X, Y and Z satisfy sentence(X,Y,Z).

The first combination it finds is X=john, Y=eats, and Z=apples, corresponding to the English sentence “john eats apples”. As you would expect, there are more combinations that also satisfy sentence(X,Y,Z). We need to prompt prolog to give them to us. 

The following table lists all eight combinations prolog finds.


X
Y
Z
john
eats
apples
john
eats
spinach
john
washes
apples
john
washes
spinach
jane
eats
apples
jane
eats
spinach
jane
washes
apples
jane
washes
spinach


With only a single rule to define a simple grammar, and a little bit of vocabulary, prolog can generate well-formed sentences. That's quite impressive.


What vs How

Notice how our prolog program defined what a grammatically correct sentence is. It didn't describe how to build a sentence step-by-step. That is, we didn't write detailed code to fetch an object from a list of objects, then a verb from another list, then a subject, and finally join them together to form a sentence.

Prolog is a declarative programming language, encouraging us to describe a problem, and letting the language solve it. In contrast, many popular languages like C or python are imperative, requiring us to write in code step-by-step instructions on how to build an answer to a problem. 


Chronological Backtracking

Before we move on, it is worth looking again at these combinations and recognising the process of search and backtracking at work. 

The very first combination matches the order of facts found in the database. The remaining combinations are found by backtracking and finding new values for the variables X, Y and Z.

Backtracking works by undoing the binding of variables, with the most recently set variable unbound first. This strategy is called chronological backtracking

Here, this means it is the variable Z, most recently bound to apples, which is backtracked to being unbound, and then bound to spinach

There are no more options to try for Z, so the next backtracking is for the middle part of subject(X),verb(Y),object(Z), unbinding Y from eats and setting it to the next option washes. Having set Y to washes, Z can again be matched to apples, backtracked and matched to spinach. This gives us the first four combinations.

The last four combinations are from prolog backtracking all the way back to the first part of subject(X),verb(Y),object(Z), unsetting X from john and binding it to jane.


Asking Questions

There is even more we can do with this simple program. We can ask “what can Jane do with apples?”


?- sentence(jane, Y, apples).

Y = eats
Y = washes

Jane can eat or wash apples. Although this example is trivially small, we can start to see the power of asking questions of a sufficiently well described system.

Another example, “who can wash spinach?”


?- sentence(X, washes, spinach).

X = john
Y = jane

Both John and Jane can wash spinach.


Sentences as Lists

Trying to read or write sentences as X=john, Y=eats, and Z=apples as not very comfortable. A more comfortable way is to use lists.

In prolog, like many other languages, a list is just a collection of things. In code these things are separated by commas, and enclosed in square brackets. For example, [john, eats, apples] is a list of three things.

In a list, the order of things is important. The list [john, eats, apples] is not the same as the list [apples, eats, john]. Repetition also matters, so the list [john, eats, apples] is not the same as [john, eats, apples, apples].

The last line of code in our program appears to create a conflicting rule for sentence. Let's look again at the two rules.


sentence(X,Y,Z) :- subject(X), verb(Y), object(Z). 
sentence(S) :- S=[X, Y, Z], subject(X), verb(Y), object(Z).

The two rules don't clash because prolog sees them as different. Let's see why.

The first definition for sentence takes three parameters, X, Y and Z. We say it has an arity of 3. Prolog experts write sentence/3 to make this clear.

The second definition for sentence takes just one parameter, S. Writing it as sentence/1 makes clear it has an arity of 1.

Prolog sees sentence/3 and sentence/1 as entirely different because they have different arities. They may as well be elephant/3 and giraffe/1.

The bodies of the two definitions look similar. They both say that X must be a subject, Y must be a verb and Z must be an object. But in addition, sentence/1 says that S must be a list consisting of X, Y and Z - in that order, and with no duplicates.

Let's test a sentence in a list.


?- sentence([john, washes, spinach]).

true

Prolog retains its ability to find values for variables when they are in a list.


?- sentence([john, Y, spinach]).

Y = eats
Y = washes

Finally, let's ask prolog to find all the valid sentences, this time as lists.


?- sentence(S).

S = [john, eats, apples]
S = [john, eats, spinach]
S = [john, washes, apples]
S = [john, washes, spinach]
S = [jane, eats, apples]
S = [jane, eats, spinach]
S = [jane, washes, apples]
S = [jane, washes, spinach]

Much better! Sentences as lists are much more comfortable to read and write.


Key Points

  • Prolog tries to find multiple answers to conjunctive queries with chronological backtracking, from right to left.
  • Prolog lists are written as [a,b,c]. Order and repetition matters, so [a,b,c] is not the same as [c,b,a], nor [a,a,b,c].
  • Prolog sees rules with the same property name, but with different arity, as entirely different rules. Arity is the number of parameters a property takes. For example, sentence(X,Y,Z) is different from sentence(S).
  • It is traditional for properties to be written with the arity, for example sentence/3.