Sprezzatura :: Making Databases Happen

By Sprezz | Wednesday 28 March 2018 12:09 | 0 Comments

We recently had the pleasure of working with one of our favourite sorts of client - the type who are enthusiastic about their speciality field and who have trained themselves to program - to allow themselves to write MOST of the system they need to service their enthusiasm. Every now and again, they call us in to do some of the heavy lifting for them.

This particular time the client had an issue with a multi page form. Under most circumstances the data displayed correctly but under - as yet undetermined circumstances - the data display would become corrupt. Data would appear in edit fields that belonged in other edit fields and existing data would be overwritten by seemingly unrelated data.

Approaching the issue logically, we installed break points to ensure that what was read from disk was correct and sure enough @Record and the ATRECORD and RECORD properties all contained what they should. It was only when a certain field was filled in that the data became corrupt. Through much testing we were able to determine which field had to be filled in to cause the corruption but it still made no sense. To illustrate using the AVERY_LABELS table (the client's data is commercially sensitive so we can't use screenshots from their system). In the example below the Cross Reference column references another row in the AVERY_LABELS table and extracts the product description.

So the above is what we expected to see. The below is more like what we were seeing - data that SHOULD have been there ("Removable Laser..." in Cross Ref Product Desc) wasn't - being replaced with the contents of the field above. Data was missing ("Label" in Product Type) and data was incorrect ("6464" in Template Family).

Painstakingly we played with entering data into the client screen - all 6 pages of it - until the error manifested itself when we entered data into a prompt that provided the key for a symbolic to xlate off. We turned on engine logging to see if some event was being triggered that would explain the corruption, but there was nothing other than standard engine flow. On a hunch we went to the system monitor and typed the equivalent of

LIST 100 AVERY_LABELS PRODUCT_DESC XREF_DESC SEQUENCE and were rewarded with something like

which was NOT what we would have expected to see. So we decided that the XREF_DESC formula must have been to blame in some way. Looking at it, it SEEMED to be straightforward enough :-

Until we looked REALLY closely... the column is called PRODUCT_DESC not PRODUCT_DESCS$. When we changed the spelling to the correct one - AVERY_LABELS$PRODUCT_DESC$ everything was right with the world again - the screen displayed correctly and the LIST statement produced

So - what went wrong? To understand this we need to be familiar with how the Window processor actually works with the data from @Record.

A major change in OI over AREV is that when the system WRITEs the current screen contents to disk, it does not simply write the contents of @Record onto disk as in AREV, rather it takes the current row from disk and updates ONLY those columns that feature in the current Window. THEN it writes the row back. This allows the user to have multiple windows open updating separate sections of the same current row.

To do this the Window processor has to maintain its own pseudo @Record, just containing the contents of the current screen. To illustrate using the pseudo code for a READ and populate operation

pseudoRecord = ""

for loopPtr = 1 to controlCount

  control = controlList< loopPtr >

  dictColumn = get_Property( control , "COLUMN")

  pseudoRecord := xlate( dataTable, @Id, dictColumn, "X") : @Fm

next

pseudoRecord[-1, 1] = ""

for loopPtr = 1 to controlCount

  control = controlList< loopPtr >

  call set_Property( control, "DEPFROP", pseudoRecord< loopPtr >)

next

In the original dictionary item the XLATE destination column was incorrectly spelt, so at compile time in the Table Builder the equate would NOT have been replaced with a column number, but with a blank. The Table Builder compiler before OI 10 does not include the error checking that the System Editor has. So the XREF_DESC formula was resolving to

@Ans = Xlate("AVERY_LABELS", {CROSSREF}, "", "X")

which returns the entire row, not just an individual column. So instead of pseudoRecord containing

1 : @FM : 8 : @FM : 3 : @FM : Xlated 8 : @FM : 2 : @FM : 17 : @FM : 10 : @FM : 16

it contained

1 : @FM : 8 : @FM : 3 : @FM : Xlated 1 - 43 : @FM : 2 : @FM : 17 : @FM : 10 : @FM : 16

thereby pushing the subsequent SET_PROPERTY DEFPROP out of kilter.

Obvious with hindsight but not the easiest bug in the world to track down on a system written by someone else!

The more things change, the more they don't stay the same

By APK | Wednesday 7 March 2018 16:13 | 1 Comments

As we all know, computers are actually pretty dumb. They can't count past 1. On the other hand, they're very loyal. They do exactly what they're told to do. When you're debugging code, that's great. Enter in the same set of data, step through the debugger, and it's guaranteed to always work the same way ever time. Except for when it doesn't. We've all experienced the frustration of finding something wrong during testing, but no matter what you do, when you step through the debugger, it doesn't happen.

Sure, we usually chalk this up to something with the message queue or interaction with another program that's being interfered with because the debugger now has focus. But, sometimes, we know we're just fooling ourselves, and we start to wonder if maybe this machine has a mind of it's own.

Recently, we here at Sprezz Towers found ourselves with a machine where the only obvious conclusion was that the system had become sentient. Before welcoming our new silicon overlords, we decided to give things a second look. And a third. Probably a fourth as well.

The problem was reasonably simple. While attempting to calculate a shipping rate, we were receiving an on-screen message that the shipping rate could not be found. We expected that because we were testing how the system responded to missing shipping rates. This had been working for years before the modification we were testing. And by years, I mean decades. We don't know when the code was originally written because there's no date in the header, but there's a comment dated June 1993. Yes, 25 years ago. The most recent comment regarding a change is from 2001. The code has been recompiled since in the move from ARev to OI, but it's effectively unchanged for 17 years.

The change itself was relatively simple. We added a set of date ranges to indicate when the shipping rate profile was active. For the overly technical readers, basically, we added three MV fields, HIST_START_DATE, HIST_END_DATE, HIST_ACTIVE. The system is full of this type of code. There's a routine that's been in use for 10 years:

validRange = checkDateRange( checkDate, startDates, endDates, activeFlags )

which just returns TRUE$ or FALSE$ indicating if the passed checkDate is inside a valid date range. This isn't rocket science nor brain surgery. And it was working fine. The system displayed the invalid rate message when the date was invalid, and didn't display the message when the date was valid. That wasn't the problem. With one specific test case, the system was displaying a failure message we weren't expecting. The error was related to the customer's configuration, and not the changed code. It was a valid error, and the correct error was displaying on screen. However, the wrong error text was being placed into the error log.

Setting a debug break point just before the shipping rate code, we stepped through the code and verified that all the error flags are being set correctly. And they are. Everything is working as expected. Including logging the correct error text.

Going back to our opening premise, computers are dumb. There are no UI threads or messages interfering here. The error message bubbles through the system and is returned at the end of the process. But we're clearly getting different results when stepping through the debugger as opposed to just running the code. It's time to start thinking critically and logically about this and we start looking over all the code carefully.

When working with systems like this (RevG systems converted to ARev, then to OI), there are times where you feel a bit like you're practicing a programming version of anthropology. You look at the code and how the style has changed and how variables are named and you can see the evolution of Rev coding styles and standards. Generally, you can pinpoint about when this code was originally written, just on the use of CASE v IF/THEN statements or how things are capitalized.

As mentioned earlier, this code is old. Even though the earliest comment date is 1993, we know the code is much older by the style and function. It's been around since the origin of the system, a RevG application from 1982. And as RevG code it uses RevG style syntax. And RevG style calling parameters. And RevG style error handling.

Ahh..error handling. I could write a whole new post on inconsistencies in error handling, and whether FALSE$ means success or failure as we move from system to system (and programming epochs). But in this case, it wasn't how the status was returned. It was how the status was checked. Most of us probably forget now, because OpenInsight (in it's modern form) is 25 years old, but in the ARev days, we didn't have get_status() and set_status(). We had status(). And we had...well, that was pretty much it. There was @FILE.ERROR, but that always seemed too embedded with LH and the filing system to piggy-back on for your own use.

This code used status(). And it used get_status() and set_status() as well. Needless to say, it was a complicated mess. But, status() was the key and the source of our error.

It turns out, that when stepping through the debugger, when you return from a subroutine or function call, the contents of status() is cleared. During normal execution, the system found a value in status(), and jumped into the error routine related to a failed shipping rate call. However, stepping through the debugger, status()was set to 0, and the system continued on. Here it checked get_status() for additional errors, and we received the warning about the mis-configured client.

The OI programming documentation says

Be aware, however, that between the time you set the value of Status() and next check it, some other procedure might have set its own value.

This is not related to the Get_Status and Set_Status functions.

While we expect this when calling multiple functions and routines, we didn't expect it to be cleared by the debugger when popping back on the return stack.

The lesson to be learned here is just because your old ARev code recompiles in OI, doesn't mean it's still working. ARev is ARev and OI is OI, and while they might be very similar and compatible, they're not the same thing. Something to keep in mind for those hoping to just ARev/32 their way into OpenInsight. It's not all just cut and paste.