Thursday, 16 February 2023

6 - Query Order & Efficiency



Here we'll look at how two prolog queries can be logically equivalent, and yet have differences in how efficiently they work.

The following simple program establishes some facts about the hairiness and colour of cats and dogs.


% Example 06 - Query Order & Efficiency

% hairy facts
hairy(dog). 
hairy(cat).

% colour facts
colour(dog, brown). 
colour(dog, black).
colour(cat, grey). 
colour(cat, white). 

Cats and dogs are hairy. Dogs can be brown or black, and cats can be grey or white.


Logically Equivalent Queries

The following query asks which X is white and hairy.


?- colour(X, white), hairy(X).

X = cat

The next query asks which X is hairy and white.


?- hairy(X), colour(X, white).

X = cat

Even though the ordering of the two parts of the query is different, the two queries are logically equivalent. That is, they are not asking different questions. Unsurprisingly, X=cat is the answer to both.


Procedurally Different Queries

Like all computer code, these queries have to run on computer hardware which is not logical but procedural, ultimately carried out step by step, one after another.

Let's look at the first query. We already know that prolog works with conjunctive queries left to right, so it will first try to solve the goal colour(X,white). This is easily matched with the database fact colour(cat,white) leaving X=cat. There are no other rules that match this goal so there are no other options for X. Moving to the next part of the query hairy(X) means prolog has to test hairy(cat) because the first part set X=cat. This is straightforward because hairy(cat) is a simple fact in the database. So X=cat is the solution.

Now let's work through the second query. This time the left-most part of the query is hairy(X). As we've seen before, prolog works by trying to unify its current goal with facts in the database. The first one it finds is hairy(dog), leaving X=dog. Moving on to the second part of the query colour(X,white), which is now colour(dog,white), prolog finds there are no facts that confirm this. So X=dog is rejected as a solution. As we know, prolog backtracks to the point where X is unbound, and tries to unify hairy(X) again. This time prolog finds hairy(cat), leaving X=cat as a possible solution. It moves on to the second part of the query colour(X,white), which is now colour(cat,white). This second part is easily proven because it exists in the database. So X=cat is the solution to the full query.

We can see that, although both queries arrive at the same answer, the amount of work is different. The first query takes fewer steps. The second has more steps because it tries and then discards X=dog as a possible solution.


A Strategy For Efficient Queries

Our example, although simple, does show that logically equivalent queries can be procedurally different, with some more efficient than others. Our example also suggests a strategy for efficient queries. 

We should try to order the parts of our queries such that options for variables are closed down as soon as possible. 

That is, variables should be bound, or grounded, to the correct values as far left in a conjunctive query as possible. This reduces wasted effort testing, and backtracking from, later parts of a query with ultimately incorrect values.

Later we'll see queries where this difference makes a big impact on the efficiency, or even feasibility, of solving them.


Tracing

We can ask prolog to print out the steps it takes when trying to solve a query. This is called tracing. You might be asking why we didn't use tracing sooner as it seems like a very helpful thing to do. Reading and understanding traces needs some familiarity with how prolog solves queries, and we didn't have that before.

We can turn on tracing by adding trace/0 to the left of a query. Let's do this with the first version of our query. 


?- trace, colour(X, white), hairy(X).

 Call:colour(_3872,white)
 Exit:colour(cat,white)
 Call:hairy(cat)
 Exit:hairy(cat)
X = cat

Let's walk through what prolog is reporting. We can see the first goal prolog tries to solve is colour(_3872,white). The _3872 is prolog's name for a new variable temporarily taking the place of X. The actual number is random, and is different with every run of the query. That's because prolog wants to keep this variable for its own internal working, and is not something it normally shares with us. Once prolog is happy with the value this internal variable should have, it will report it back as the value of X.

The second line of the trace shows prolog immediately unified the goal with colour(cat,white). The next line shows prolog trying to solve the second part of the query, hairy(cat). Again, prolog succeeds quickly because hairy(cat) is in the database. We can think of this query taking four prolog steps.

Let's now do the same for the second version of the query.


?-  trace, hairy(X), colour(X, white).

 Call:hairy(_4086)
 Exit:hairy(dog)
 Call:colour(dog,white)
 Fail:colour(dog,white)
 Redo:hairy(_476)
 Exit:hairy(cat)
 Call:colour(cat,white)
 Exit:colour(cat,white)
X = cat

Compared to the previous trace with 4 lines, this one has 8 lines. Let's step through them.

The first goal is hairy(_4086), which is hairy(X) but with an internal variable for X. Prolog unifies this with hairy(dog) in the database, so the internal variable _4086 is set to dog. Prolog then moves on to the next part of the query, which means it tries to prove the goal colour(dog,white). The fourth line of the trace shows this goal fails because it doesn't match anything in the database. 

The fifth line of the trace makes clear prolog is backtracking to try other values for X in hairy(X). Notice how the internal variable for X is a new one _476. This is prolog's own way of managing how it keeps track of the different values it tries for X. The sixth line of the trace shows hairy(_476) immediately unifies with hairy(cat), leaving X=cat. After that, the trace shows prolog tests the second part of the query colour(cat,white), and succeeds immediately.

Tracing is a useful tool when trying to fix, or debug, prolog code that doesn't appear to be doing what we want it to do. Even if we're just curious, and not debugging, tracing lets us see how prolog actually works. 


Key Points

  • Two prolog queries can be logically equivalent, because they ask the same question, but procedurally different, because the steps prolog takes to solve them are different.
  • A good strategy for an efficient query is to order its parts so that variables are bound to the correct values as soon as possible, that is, as far left as possible.
  • Prolog has a built in tracing capability to show the steps it takes trying to solve a query.
  • When working with variables we've provided, prolog uses its own internal variables to keep track of its own search for possible solutions.
  • These internal variables have the form _123, an underscore followed by a random number.