Home page Home page Home page Home page
Pixel
Pixel Header R1 C1 Pixel
Pixel Header R2 C1 Pixel
Pixel Header R3 C1 Pixel
Pixel
By APK | Tuesday 31 January 2023 16:04 | 0 Comments

Recently at a client site, we encountered an error that was a little more baffling than normal.  Under very specific circumstances,    RLIST would bail with no results.  Eventually, we had a duplicatable test sentence which was basically

    SELECT TABLE BY CALC_COL_1 WITH REAL_FIELD_1 = ‘XXX’ AND WITH CALC_COL_2 = ‘YYY’

When we removed the BY clause (CALC_COL_1) our sentence ran to completion.   This put the problem squarely on CALC_COL_1, which is where we started debugging.  

We changed CALC_COL_1 to a simple “@ANS=1” and ran the sentence again.  This worked correctly.  To prove to ourselves that CALC_COL_1 was the problem, we put back the original code and changed our sentence to 

    SELECT TABLE BY CALC_COL_1 WITH REAL_FIELD_1 = ‘XXX’ AND WITH CALC_COL_3= ‘ZZZ’

This also worked.   

Now we’re starting to get a little confused.  Modifying CALC_COL_1 solved the problem.  Replacing CALC_COL_2 also solved the problem.  It has to be some sort of interaction with the calculated columns, because there’s no obvious consistency lining up the errors with the sentence structure.

It’s time to break out the profile log and see if something jumps out, and it did.  Something appeared off with EXTERNAL.SORT.  It seems like it executed a few times, and then everything stopped.  We dusted off our copy of PROCMON.EXE 1 and ran the process again.  PROCMON showed that the sort file was being opened, but that it was not being closed.  It also showed a lot of file access  We’re now convinced this has something to do with sorting in general, and not with the specific field being sorted.

After trying well over a dozen different variations on the RLIST sentences, we finally worked out that we needed two things to happen for the error to occur.  The first was that we had to have CALC_COL_1 as the sort column.  The second was that the WITH clauses needed to return enough keys to force the system into using EXTERNAL.SORT instead of a simple in-memory sort.

Normally, when OpenInsight starts behaving very oddly, the culprit is usually a function that’s not clearing set_status() before calling get_status.  When that happens, the get_status() call could be returned an error from 10 programs back.  We reworked the calculated columns to ensure we clear set_status, and feeling very proud of ourselves, sat back and waited to bask in the glory offered to us by our client.  Obviously, the report failed again.

Sitting back, we start wondering if maybe there’s a small part of the RLIST code that’s still thinks it’s Advanced Revelation and it’s actually checking the pseudo STATUS() variable.  So, we reworked the calculated columns to ensure we clear set_status and status(), and feeling very proud of ourselves, sat back and waited to bask in the glory offered to us by our client.  Obviously, the report failed yet again.
 
Surely there can’t be three ways to set an error condition in OpenInsight?  It turns out that there is.  We suddenly remembered @FILE.ERROR.  So, we rework the calculated columns to ensure we clear set_status and status() and @FILE.ERROR, and feeling very proud of ourselves, sat back and waited to bask in the glory offered to us by our client.  And this time there was much rejoicing and basking.

With the solution in hand, we set out to find out exactly why this happened and where the system was registering the failure.  What we found was the calculated field was doing a lot of file access and would attempt to access records that didn’t exist.  This was what was setting @FILE.ERROR.  But, we hit a two-fer, as they say.  Some of the randomness had to do with the amount of data returned.  If enough sort data was returned the system would call EXTERNAL.SORT to sort the data.  EXTERNAL.SORT writes to temporary OS files, and the OS file commands use @FILE.ERROR to check for errors.  EXTERNAL.SORT would execute an OS operation, check @FILE.ERROR, find the pre-existing error, and abort the process.  But, this would only happen if the very last row processed set @FILE.ERROR.  Otherwise RTP57 would clear any prior @FILE.ERROR settings, and @FILE.ERROR would be null during the EXTERNAL.SORT call. 

1.  PROCMON and other useful tools are available as part of the Sysinternals suite of development tools, which no programmer should be without.



 

 

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]



<< Home

Pixel
Pixel Footer R1 C1 Pixel
Pixel
Pixel
Pixel