Stay tuned ...

© Copyright Oriole Corporation, 2001

It's very surprising to see how normally logical people, often seem to leave their logic on the doorstep when they enter the world of SQL programming. There's no magic in tuning and while experience helps, you don't need decades of Oracle experience to write efficient queries - just a bit of logic and some thought about how you would do it by hand.

The library analogy

Let's take an example to illustrate our point and an example where we can have Oracle closely mimic hard reality - or the reverse, depending on your point of view.

Tables in a relational database management system can be compared to books in a public library. Datafiles are the place where everything is stored but where you cannot access information directly - just like the shelves. Memory is where everything happens - let's say that that's the table (the wooden type) in the library. To search for something in a book, you must ask the librarian, who, like the data dictionary in Oracle, knows where everything is located. The librarian or yourself arriving at the right shelf, looking for the book and bringing it back to the table is a kind of I/O.

If you are doing serious research, you will have to pore through many books. Quite obviously, if you have a big wooden table where you can spread everything out, you'll need fewer I/Os than if all you have is a stool in a corner and you have to hold the books on your knees - and Oracle will perform better if it's given more memory to hold the tables temporarily. But it is no less true that this is not the prime factor in performance. Most important is that the book has a good index, allowing you to check whether, in spite of an evocative title, what you want really is in the book. Some books even have several indexes, a general one and a secondary one for place names, for instance.

A simple practical example

Let's imagine that what we want to do is simply write down a list of all of Shakespeare's plays which are set in Italy. Unless you are Italian or a Shakespearean scholar already, you may be somewhat uncertain about geography, so we can say that what you need for your research is a book of Shakespeare's works as well as a European Atlas (we know that we are going to disappoint a number of Americans but the action of 'The Merchant of Venice' does not, actually, take place in California).

For the sake of demonstration, we are going to switch immediately to the Oracle equivalent, which we can describe, for Shakespeare's plays, as a table

PLAYS
   TITLE
   ACT
   SCENE
   LOCATION

(a bit crude as a representation of the Bard's works but sufficient for our purpose). Our symbolic Oracle European Atlas can similarly be represented by

EUROPEAN_CITIES
   TOWN
   COUNTRY

When asked the question ' which plays take place in Italy', many SQL programmers will write:

SELECT TITLE
FROM PLAYS
WHERE LOCATION IN (SELECT TOWN
                   FROM EUROPEAN_CITIES
                   WHERE COUNTRY = 'ITALY')

and will happily run it. It won't take ages, because Oracle works fast. Hmmm, not quite right, since as several scenes take place in Italy, you will get umpteen rows displaying Othello, The Taming of the Shrew or Romeo and Juliet, among others. Ugly, ugly.

So the programmers will very quickly 'improve' their query by writing:

SELECT DISTINCT TITLE
FROM PLAYS
WHERE LOCATION IN (SELECT TOWN
                   FROM EUROPEAN_CITIES
                   WHERE COUNTRY = 'ITALY')

and hey presto, we get the 'good' result. Well, that was an easy one. Next question?

Not so fast. Let's get back to our books and let's try to do the very same thing manually.

If we want to 'run' ('walk' would be more exact) the SQL query above by hand, the very first thing you'd do is to establish a list of all Italian towns, since this is exactly what the subquery does. Ready with your pen and paper? You'd better take a big stack of paper, actually, because there are a lot of them. Let's take this European Atlas. Doh! There's no entry for 'Italy' listing all the towns - the index is by town, and will say 'Padova (Padua), (I)', sandwiched between Padola, another Italian town, and Padragkut, a Hungarian one. You then have the choice between two methods, either locating the pages where the various maps of Italy are located, writing down every tiny spot, or search the index from beginning to end looking for every name followed by (I) and writing that down - probably a better solution since you will at least get an ordered list which will be easier to search. We hope that at this stage, at least a shadow of a doubt is beginning to cross your mind about the wording of the query.

Several hours later, when we have the full list of towns, we can take our Shakespeare volume and try to answer the question. Do we have an index of place names in the book? Nope! All we have is 'Contents' at the beginning, listing 'The Plays', starting with The Tempest followed by The Two Gentlemen of Verona down to Pericles. So all we can do is take all the plays one by one, look for location indication ('SCENE I - On a Ship at Sea' - we shall assume that we are not in Italian territorial waters), comparing it with our paper list of Italian towns, and making a note every time we find one. If we do it as stupidly as Oracle, we will note the same play as many times as a scene is located in Italy, then we'll do a final pass and remove every multiple entry.

Still feeling a bright SQL programmer? We can note in this case that if we are trying to do it on a stool, it won't be easy to handle the list of Italian towns. Asking for a very big table will certainly help. You could also buy an expensive fancy pen which writes faster (which is usually management's approach). Or feed the librarian with vitamins, so that he runs faster when bringing the books. But quite probably, doing it more cleverly would help much much more.

Let's get back to SQL. Well, we learnt something else during that SQL course, that good old relational join. What about:

SELECT DISTINCT TITLE
FROM PLAYS P,
     EUROPEAN_CITIES EC
WHERE EC.COUNTRY = 'ITALY'
  AND EC.TOWN = P.LOCATION

STOP! Instead of trying to remember SQL tricks and what the instructor said, how would YOU do it in practice, with books?

Since our Shakespeare book lacks an index, we shall have to scan it anyway (another solution would be to look for another edition which HAS an index). So, what we are going to do is to open the book at the first play, look for 'SCENE', check the name of the place and look for it in the Atlas. If it's an imaginary place we shall find no entry. If it's a real one, we'll look for the country. If the town is an Italian one, well, no need to go further with this play, we can jump to the next one.

This translates easily into SQL and uses a correlated subquery. What does this mean? We have both books open simultaneously and everytime we find a location specified, we check the atlas - our atlas search (a subquery) depends on, or is correlated with, every result from our play search. All we want to check is that for a given play there EXISTS a scene taking place in Italy:

SELECT TITLE
FROM PLAYS P
WHERE EXISTS (SELECT TOWN
              FROM EUROPEAN_CITIES EC
              WHERE EC.TOWN = P.LOCATION
                AND EC.COUNTRY = 'ITALY')

For a fuller explanation of correlated subqueries refer to Trevor Trotman's paper in the Library section of this site.

Believe us, this will go MUCH faster than the first solution, whether you do it by hand or with Oracle. The difference may not be striking on a small volume of data - if it takes 0.2 seconds instead of 2.5, it will feel better but not enormously. However, on large volumes, it is likely that 2 seconds instead of 20 will be noticeable (and quite often, the factor is closer to 100 than to 10 between a properly and a poorly written query. We have even seen one job reduced from 16 hours to 10 minutes). Also remember that the reason it works, is only because we have an index of names in the Atlas. Imagine, just a moment, that we have no Atlas, just a map of Italy. The solution above would be terrible, because for each scene we would have to look all over the map. It would certainly be better then, to establish our own list of Italian towns, to order it to make it more easily searchable and quite possibly to build a list of distinct scene locations as well, ordered too and then put the two lists side by side and look for matches.

Wrestling with the optimiser

The case above is a relatively simple one; nevertheless, thinking how you would do it from books may save the day. The fact is that Oracle proceeds almost no differently from yourself - only faster. Sometimes, however, its speed makes it use algorithms that are less than intuitive. And when you write a query which seems to you adequate and check under SQL*Plus (with SET AUTOTRACE ON) how it is executed, you may be surprised to discover that what Oracle actually did was definitely not what you had in mind. Quite often, it will be fast enough anyway. However, you will probably encounter a number of cases when the choice made by the optimiser is very bad indeed. To the credit of the optimiser, one must underline that it must find the 'best' execution plan very quickly - the time spent looking for it will be, in most cases (i.e. except for stored, precompiled procedures) part of the overall response time - and it's a devilishly complicated task (one consequence of this is that it is continually evolving and different Oracle releases sometimes exhibit different behaviours for the same query against the same data). Oracle implicitly recognizes the shortcomings of the optimiser by letting programmers provide 'hints' in the form of special comments. Opinions are divided about hints. The fact is that, since they are comments, if you mistype anything Oracle will not do what you want - and will not tell you, so you will have to check that it actually takes the expected path. More worryingly, many developers belong to the cut-and-paste school of development and, since hints are typically specified after the SELECT keyword, they unconsciously tend to associate them with the select list and add what they see as minor variations - a change in the FROM clause, a couple of new conditions in the WHERE clause - without even thinking about whether the hint still is appropriate. As a rule of thumb, it no longer is and this is all the more true if the hint is more specific and complex. Another snag is that when tuning time comes, the presence of a complex hint makes you think that the query has already been carefully tuned already.

If this is your first venture into tuning, there are three hints that you can safely use.

The first two are /*+ FIRST_ROWS */ and /*+ ALL_ROWS */. Those hints are very generic, and therefore even when cut-and-pasted should not cause any problem. You should read the first one as OLTP and the second one as BATCH. FIRST_ROWS tells Oracle to find a way to return some data as fast as possible - which is perfect for a user nervously hitting his/her keyboard. ALL_ROWS just tells it to return everything and be as quick as it can about it but nobody is waiting. Sometimes, a long preparatory step is required to do everything efficiently. Typically, the case mentioned above, when we have no Atlas but an index-less map, illustrates the point rather well: first establishing the ordered list of Italian towns may well be, overall, the faster solution but if you have somebody pretty agitated beside you demanding an answer, you'll probably look at scene locations, then search the map for each place. Even if it's slower overall, you are more likely to hit an answer faster and resist the temptation of crushing your colleague's skull with the book.

The third hint, which we like a lot, is ORDERED; it tells Oracle to proceed by searching the tables in order of appearance in the FROM clause. If you have thought carefully about how you would do it by hand and are confident that the best way is first to search table A, then table B, etc., just write tables in this order in the FROM clause and use /*+ ORDERED */.

Our example here needs to be a little more complicated. Let's say that we also have a table of movies

     MOVIES
          TITLE
          YEAR
           DIRECTOR
           LEADING_ACTOR
           LEADING_ACTRESS

(and, in the library, a history of cinema), and that what we ultimately want is a list of directors of movies of Shakespeare plays which are located in Italy. We are going to assume that the title of the movie is the same as the title of the play. Movie buffs are going to retort that this is a bold assumption, and that for instance Orson Welles' 'Chimes at midnight' was a medley of several Shakesperean plays. Shut up, movie buffs. So, the title is usable as a key, and is likely, in a book as in a table, to be indexed.

So let's teletransport to the library, and see how we would manually answer the question.

Quite obviously, the Shakespeare book is still the place to start from. Then, we have the choice: either we first check whether part of the action takes place in Italy, and if it does, then we look whether a film was made from this play. Or we check whether the play was turned into a film, and if it does we check for the location.

What is best depends on, chiefly, selectivity. In the pratical word, we would say that it depends on how handy the history of cinema is compared with the Atlas, and which is the smallest and most easily searchable index - the best strategy is to start with this one. Let's say that the one volume Atlas is easier to handle than the 12 volume 'History of Cinema since the Lunière brothers'. Our search path will therefore be :

  1. Plays
  2. European cities
  3. Movies

And if we want to get fast the names of Zeffirelli, Mankiewicz, Brannagh and al., the proper query is likely to be something like:

SELECT /*+ ORDERED */ M.DIRECTOR
FROM PLAYS P,
     MOVIES M
WHERE EXISTS (SELECT TOWN
              FROM EUROPEAN_CITIES EC
              WHERE EC.TOWN = P.LOCATION
                AND EC.COUNTRY = 'ITALY')
AND P.TITLE = M.TITLE

the hint (not really necessary here) ensures that whatever happens the plays will be searched first, even if we add another condition saying 'and the leading actor is James Mason' which could make looking first for movies starring James Mason more attractive.

Just think how you would do things by hand, add the suitable hints when you have doubts about the way the optimiser will behave, and you will build a reputation as a SQL guru very fast.

previous SQL Net Primer Back to Intro. next page