Sprezzatura :: Making Databases Happen

Indexing in OpenInsight Part 2 - How index transactions get created

By Sprezz | Friday, 18 December 2009 12:23

In the last article we provided a generic overview of how indexing works. In this article we look at how the index transactions get into the bang table, and in the soon-to-be-published follow-up article we look at how they are moved from being transaction rows to being meaningful data in the indexes.

Index transactions are created by a modified filing system called SI.MFS – the Secondary Indexing MFS. The following description assumes that this is the first time that an index has been added to a table.

When the developer adds an index to a column, SI.MFS is added to the table to ensure that the index transactions are created. The actual mechanism is that a flag is set in the dictionary row for the column, indicating what kind of index is being created along with its case sensitivity. The MFS responsible for looking after dictionaries, DICT.MFS, sees this flag and calls MAKE.INDEX to ensure that SI.MFS is added to the table. In addition, it creates the ! table and populates the initial control information describing the index structures.
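To make that chain of events concrete, here is a minimal Python sketch of what DICT.MFS and MAKE.INDEX are described as doing. Everything in it is hypothetical: the `Table` class, the `index_flag` and `case_sensitive` keys and the layout of the row 0 control information are illustrative stand-ins, not Revelation's actual structures or calling conventions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Table:
    name: str
    # Hypothetical: names of the MFSs sitting in front of the base filing system
    mfs_chain: List[str] = field(default_factory=list)
    # Hypothetical: the ! table, modelled as row id -> row contents
    bang: Optional[Dict[str, dict]] = None


def make_index(table: Table, column: str, index_type: str, case_sensitive: bool) -> None:
    """Hypothetical stand-in for MAKE.INDEX."""
    # Make sure SI.MFS is on the table so that index transactions get created
    if "SI.MFS" not in table.mfs_chain:
        table.mfs_chain.insert(0, "SI.MFS")
    # Create the ! table on first use, seeded with control information in row 0
    if table.bang is None:
        table.bang = {"0": {"next_txn": 1, "pending": "", "indexes": {}}}
    # Record the kind of index and its case sensitivity for this column
    table.bang["0"]["indexes"][column] = (index_type, case_sensitive)


def dict_mfs_write(table: Table, column: str, dict_row: dict) -> None:
    """Hypothetical stand-in for DICT.MFS noticing the index flag on a dictionary row."""
    index_type = dict_row.get("index_flag")
    if index_type:
        make_index(table, column, index_type, dict_row.get("case_sensitive", False))
```

Indexing a second column in this model leaves a single SI.MFS on the table and simply records the extra index in the control row, which fits the article's premise that SI.MFS only needs adding the first time.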

The workhorse in the creation of the index transactions is a program stored in the ! table under the row id !, generally referred to as the bang code in the bang file/table. This program is structured as a symbolic dictionary item, so its object code is held in attribute 8 of the row. The code looks at the row being written and compares it with the row on disk. If any indexed columns have changed, it generates index transactions and puts them into the ! table.
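The comparison at the heart of the bang code can be sketched in Python as below. It is only a model under stated assumptions: rows are dicts of column name to value, a missing old row means a new record, a missing new row means a delete, and each transaction is a hypothetical `(column, row_id, old_value, new_value)` tuple. The real transaction format inside the ! table is not described in this article.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical transaction shape: (column, row id, old value, new value)
Txn = Tuple[str, str, str, str]


def index_transactions(row_id: str,
                       old_row: Optional[dict],
                       new_row: Optional[dict],
                       indexed_columns: Iterable[str]) -> List[Txn]:
    """Compare the row being written with the row on disk.

    Only indexed columns whose value actually changed produce a transaction.
    A None old_row models a brand-new record; a None new_row models a delete.
    """
    old_row = old_row or {}
    new_row = new_row or {}
    txns: List[Txn] = []
    for column in indexed_columns:
        old_value = old_row.get(column, "")
        new_value = new_row.get(column, "")
        if old_value != new_value:
            txns.append((column, row_id, old_value, new_value))
    return txns
```

A change to a non-indexed column, or an indexed column rewritten with the same value, produces no transactions at all; only a real change to an indexed value costs any index work.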

We can find out a lot about how the ! code works by looking at its source code, and fortunately Revelation have made this very easy to do. Under normal circumstances the ! code is generated, compiled and written away without preserving the source code, BUT if we as developers create a blank row with an id of !! in the ! table, the system treats this as an instruction to preserve the source code in the !! row. So if you want to examine specific ! code, just take the following simple steps:

• Delete the ! code from the ! table
• Create a blank row with an id of !! in the ! table
• Reattach the ! table
• Edit the data table

These steps will leave the source code for the ! row in row !!.
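The !! convention can be modelled as a simple rule applied when the ! code has to be regenerated. This Python sketch is an assumption about the behaviour described above, not Revelation's code: `regenerate_bang_code` and `compile_fn` are hypothetical names, and we assume regeneration only happens when the ! row is missing, which would explain why the first step is to delete it.

```python
from typing import Callable, Dict


def regenerate_bang_code(bang: Dict[str, str], source: str,
                         compile_fn: Callable[[str], str]) -> None:
    """Hypothetical model of how the ! row is rebuilt in the ! table."""
    if "!" in bang:
        return  # assumed: existing ! code is reused, so nothing is regenerated
    bang["!"] = compile_fn(source)  # normally only the compiled form is kept
    if "!!" in bang:
        bang["!!"] = source  # a pre-existing !! row asks for the source to be kept too
```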

We generated some sample code ourselves and used it to create the following flowchart, which explains the process in detail:

So, explaining this flowchart in English:

Assuming the table is indexed, when the write occurs, SI.MFS is called with a WRITE opcode.

Normally an MFS just does what it needs to and then calls the next MFS in sequence. This carries on until the operation has been executed by the Base Filing System (BFS), and then the chain is unwound in reverse. SI.MFS doesn’t work this way. It doesn’t actually update the index transaction rows itself; rather, it calls the ! code we discussed earlier. Because that code is stored as a dictionary item, SI.MFS can’t call it directly; instead it invokes it using CALCULATEX, the system function for resolving a dictionary item.
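The difference between a conventional MFS and SI.MFS can be sketched in Python as a chain of handlers. The handler signature, the `AUDIT.MFS` name and the `calculatex` helper are all hypothetical; the helper only mimics the idea of resolving a dictionary item by name and does not reproduce the real CALCULATEX signature. Because the ! code compares against the row on disk, we assume SI.MFS runs it before passing the write on.

```python
from typing import Callable, Dict, List

# A layer in the filing system chain: (opcode, row id, row, log) -> None
Layer = Callable[[str, str, dict, List[str]], None]


def base_filing_system(disk: Dict[str, dict]) -> Layer:
    """The BFS at the end of the chain actually performs the write."""
    def handle(opcode: str, row_id: str, row: dict, log: List[str]) -> None:
        log.append(f"BFS {opcode} {row_id}")
        if opcode == "WRITE":
            disk[row_id] = row
    return handle


def ordinary_mfs(name: str, next_layer: Layer) -> Layer:
    """A conventional MFS: do its own work, then pass the call down the chain."""
    def handle(opcode: str, row_id: str, row: dict, log: List[str]) -> None:
        log.append(f"{name} {opcode} {row_id}")
        next_layer(opcode, row_id, row, log)
    return handle


def calculatex(dictionary: Dict[str, Callable], item: str, *args) -> object:
    """Hypothetical stand-in for CALCULATEX: look up a dictionary item by name and run it."""
    return dictionary[item](*args)


def si_mfs(dictionary: Dict[str, Callable], next_layer: Layer) -> Layer:
    """SI.MFS delegates index work to the ! item instead of doing it itself."""
    def handle(opcode: str, row_id: str, row: dict, log: List[str]) -> None:
        if opcode == "WRITE":
            calculatex(dictionary, "!", row_id, row, log)
        next_layer(opcode, row_id, row, log)
    return handle
```

Running a write through `si_mfs(..., ordinary_mfs("AUDIT.MFS", base_filing_system(disk)))` logs the ! item first, then the ordinary MFS, then the BFS, so the index work sits beside the normal pass down the chain rather than inside it.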

The ! code looks to see what it has been asked to do, and if it has been asked to do an update it compares the old values with the new values to see what has changed. Normally it’ll try to use a cached copy of the old record, but if that isn’t available it’ll reread the row. If it determines that index updates are required, the code reads row 0 in the ! table and appends the newly generated transactions to it.
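The "cached copy first, reread if necessary" behaviour amounts to the following. This is a hypothetical Python model: `cache` stands in for whatever copy of the old record the system holds, and `io_log` simply records reads so that the cost of a cache miss is visible.

```python
from typing import Dict, List, Optional


def old_row_for(row_id: str,
                cache: Dict[str, dict],
                disk: Dict[str, dict],
                io_log: List[str]) -> Optional[dict]:
    """Return the pre-update row, preferring the cached copy over a reread."""
    if row_id in cache:
        return cache[row_id]  # no disk I/O needed
    io_log.append(f"READ {row_id}")  # cache miss: the row has to be reread
    return disk.get(row_id)  # None means the row does not exist yet
```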

If this pushes the length of row 0 over 2000 bytes, it creates a row called XTRANS and writes it out to the ! table. It then takes the next free transaction row number from the row 0 header and writes the new transaction row out under that number. Finally, it updates row 0 with the new number, blanking out the transactions that were there, and deletes the XTRANS row.

If row 0 isn’t over 2000 bytes, the new transactions are simply written back to row 0.
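The row 0 handling described in the last two paragraphs can be sketched in Python as follows. This is a model, not the generated ! code: row 0 is assumed to be `{"next": int, "pending": str}`, only the pending transactions are measured against the limit, the contents of XTRANS are assumed to be a copy of the combined transactions, and `io_log` records each ! table I/O in order.

```python
from typing import Dict, List

ROW0_LIMIT = 2000  # threshold quoted in the article; bytes and characters coincide here


def post_transactions(bang: Dict[str, dict], new_txns: str, io_log: List[str]) -> None:
    """Append new index transactions to row 0, spilling to a numbered row on overflow."""
    io_log.append("READ 0")
    row0 = dict(bang["0"])
    pending = row0["pending"] + new_txns

    if len(pending) > ROW0_LIMIT:
        # Overflow path: XTRANS is written first (its contents are an assumption)
        bang["XTRANS"] = {"pending": pending}
        io_log.append("WRITE XTRANS")
        # Write the transactions out under the next free transaction row number
        txn_id = str(row0["next"])
        bang[txn_id] = {"pending": pending}
        io_log.append(f"WRITE {txn_id}")
        # Update row 0 with the new number and blank out its transactions
        bang["0"] = {"next": row0["next"] + 1, "pending": ""}
        io_log.append("WRITE 0")
        # Finally remove XTRANS again
        del bang["XTRANS"]
        io_log.append("DELETE XTRANS")
    else:
        # Normal path: write the extended transaction list back to row 0
        bang["0"] = {"next": row0["next"], "pending": pending}
        io_log.append("WRITE 0")
```

In this model a write that stays under the limit costs two I/Os on the ! table, while one that tips row 0 over the limit costs five, two of which are the XTRANS write and delete.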

Finally, if any relational indexes are present, the ! code attempts to update them by invoking RELATER.

Note that on the flowchart above I've marked a few elements in red. These are areas where it would seem that the index code could be optimised for performance.

When updating a row in which indexed values have changed, you would expect a read of row 0, a write of row 0 and possibly the creation of a new index transaction row. In fact, what seems to happen is that these two or three I/Os do occur, but in addition, if row 0 exceeds 2000 bytes in length, an XTRANS row is created, written and then deleted, thereby potentially doubling the disk I/O required for the update. This could have significant performance implications for mass updates.

In addition, even if the relational index updates correctly, the index transactions are still left in the ! file to be reprocessed at a later date. This seems like a waste of resources.
