By APK | Tuesday 26 April 2016 15:28 | 0 Comments
One of our largest customers decided to use the Good Friday holiday to handle some server maintenance.  While normally a 24/7 shop, this is one of the few days of the year when they run not just a skeleton crew but a skeleton of a skeleton crew, and they generally receive no calls on this day.

After performing and verifying the backups and upgrading the OS with all the relevant patches, we rebooted the server.  A quick check showed that the LH service was not running, which seemed a bit odd, mostly because it had never failed to start before.  Starting the service immediately displayed an error stating that the service did not start in a timely manner.  While none of us was completely sure what's considered "a timely manner", we were all in agreement that "a timely manner" was not "immediately".

At this point, we thought it was time to examine the Windows event log for some clues.  However, this proved to be a problem because the event viewer wouldn't load.  This did not seem like a good sign, so we rebooted the server, but nothing changed.  Looking deeper into the system, we found that the event viewer service wasn't starting either.  By now we thought there was something horribly wrong with the server and that something had gone wrong with the patch installation.  While some people checked what was installed and what side-effects there might have been, others looked into why the event viewer wouldn't work.

Eventually, we worked out that the event viewer wouldn't start because the subdirectory storing the log files was read-only.  Once we set those files to read-write, the event viewer service started.  Once the event viewer started, the LH Service also started.

As near as we can work out, the LH Service couldn't update the event log and immediately failed.  This started an interesting discussion with the client: how would we have coded for this particular error?  Revelation is correct in that the LH Service should halt when an OS feature as basic as event logging is unavailable.  This points to a fundamental error on the server which should be handled immediately.  There isn't a very good way to log the error.  Normally the error would go into the event log files, but the error was that it couldn't write to the event logs.  The LH service managed to avoid the perpetual loop of writing to the event log, generating an error, which writes to the event log, generating another error.
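
To make the discussion a little more concrete, here is a minimal Python sketch of the "halt, but don't loop" behaviour described above. It is not how the LH Service is implemented (we have no visibility of that); the names `write_event`, `start_service`, `LogUnavailable` and `ReadOnlySink` are all hypothetical and exist only for illustration. The idea is that a failure to write to the log is reported once on a separate fallback channel and then the start-up is abandoned, rather than being logged back to the sink that just failed.

```python
import io
import sys


class LogUnavailable(RuntimeError):
    """Raised when the primary log sink cannot be written to (hypothetical name)."""


class ReadOnlySink:
    """Stand-in for a log that cannot be written, e.g. a read-only directory."""

    def write(self, text):
        raise PermissionError("log directory is read-only")

    def flush(self):
        pass


def write_event(sink, message):
    """Write one line to the log sink, or raise LogUnavailable."""
    try:
        sink.write(message + "\n")
        sink.flush()
    except OSError as exc:
        # Deliberately do NOT log this failure to the same sink:
        # that is what would create the write/error/write loop.
        raise LogUnavailable(str(exc)) from exc


def start_service(sink, fallback=sys.stderr):
    """Return True if start-up may continue, False if the service should halt."""
    try:
        write_event(sink, "service starting")
    except LogUnavailable as exc:
        # Report once on a different channel, then stop.
        fallback.write(f"FATAL: event log unavailable: {exc}\n")
        return False
    return True


if __name__ == "__main__":
    print(start_service(io.StringIO()))    # healthy log
    print(start_service(ReadOnlySink()))   # broken log, message goes to stderr
```

The fallback channel here is just standard error; what a real service would use instead (a console message, an alert, an exit code) is a design decision this sketch leaves open.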

The most interesting thing in all this was that Windows never bothered to inform us of the error, and all (most?) Windows services loaded.  We really have no idea what started and what failed.  Had the LH service not acted correctly, a major production server could have run for weeks in a potentially inoperable state.

By Sprezz | 15:25 | 3 Comments
This is going to be one of those blog posts that will either have you saying "Well d'uh everybody knows that" or "Wow what a neat trick". I know that I fell into the latter camp which is why I'm blogging it.

Aaron and I are working together on a new tool for network administrators and it has been a fun ride. Recently we had cause to have Aaron TeamViewer into my machine to help debug a specific routine. So I watched as he took over and waited for him to fire up the System Editor, open the program, insert a DEBUG, compile and run. This is how I've traditionally done it and of course you have to be really careful to remember to remove the DEBUG before shipping the code to anybody (or run your release through a release checker that warns of the DEBUG opcode in object code). Instead he performed the neat trick documented below. When I expressed admiration he was incredulous. "I've been doing it this way since OI 2.0" quoth he.

So allow me to present an illustrated guide to debugging without DEBUG statements.

Firstly, open the engine window and click on the Debugger button. Now give focus back to the System Editor or any active OI screen in a way that causes it to trigger an event. The debugger will be invoked.

Now from within the debugger, from the File Menu choose "Open Stored Procedure" and select the routine you wish to debug

Scroll through the code and select your break point in the usual way by double clicking on a line of code, then click on the Run button or press F5 to continue.

The system will return to "Normal". Now invoke the program you wish to debug (for simplicity I will just illustrate by running from the System Monitor)

et voilà - the program breaks at the nominated break point.

This is so cool - it's slightly faster than inserting a DEBUG and you don't run the risk of forgetting to remove the statement!

By Sprezz | 15:23 | 0 Comments
Of late at Sprezz Towers we've been all about taking apart Linear Hash frame headers to diagnose some, frankly bizarre, issues being experienced by one of our clients. Our utilities were working fine on "old style" Linear Hash files where the maximum frame size was 64K and the maximum file size was 4GB but on the newer UD3+ files (or "type 3" as they're referred to) we just didn't seem to be getting the consistent results we were expecting.

As an example consider this section of code for examining the "Alpha Space Usage" - the total bytes of data stored in the Primary frames :-

  lo_alpha  = file[alphaPtr$, 4]
  hi_alpha  = file[alphaPtrHi$, 4]

  If hiBitSet Then lo_alpha := hi_alpha

  value = lo_alpha
  Gosub Convert
  lo_alpha = value

  hlo_Alpha = Oconv( lo_alpha, "MX")

So this snippet takes the 8 bytes that comprise the Alpha value and converts them into a very large integer.

When the Alpha had a value of 3,349,830,462, the MX conversion returned C7AA5B3E, which is indeed correct. The issue arose when the Alpha had increased to 7,644,806,759: the MX conversion then returned an erroneous hex value.

Those of you with knowledge of typed languages can probably see where this is going! As OpenInsight is a 32 bit app, the largest value that can be represented in an unsigned integer is 4,294,967,295. 3,349,830,462 is below this limit, but 7,644,806,759 is over it, so the MX conversion failed.
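
For anybody who wants to check the arithmetic, here is a small Python sketch (Python rather than Basic+ purely so it can be run anywhere; the helper name `fits_in_uint32` is made up for this illustration). It only shows which of the two Alpha values fits in an unsigned 32 bit integer; it makes no claim about what MX actually returned for the larger one.

```python
UINT32_MAX = 2**32 - 1  # 4,294,967,295


def fits_in_uint32(n: int) -> bool:
    """True if n can be held in an unsigned 32 bit integer."""
    return 0 <= n <= UINT32_MAX


small = 3_349_830_462
large = 7_644_806_759

print(fits_in_uint32(small), format(small, "X"))  # True C7AA5B3E
print(fits_in_uint32(large), format(large, "X"))  # False 1C7AA7E67
print(large.bit_length())                         # 33, one bit too many
```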

Fortunately for us the solution was simple (once we realised that something needed solving). The MX conversion works off an integer, but the HEX conversion works off an ASCII string. So all we had to do was modify our code to use HEX and convert the actual string value, not the number value, as below :-

  lo_alpha  = file[alphaPtr$, 4]
  hi_alpha  = file[alphaPtrHi$, 4]

  If hiBitSet Then lo_alpha := hi_alpha
  hlo_Alpha = Oconv( lo_alpha, "HEX")

  value = lo_alpha
  Gosub Convert
  lo_alpha = value

and all was once again right with the world with the HEX conversion returning 677EAAC701000000.
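
One thing worth spelling out is that the HEX output is in byte order, least significant byte first, so it reads "backwards" compared with the number written in hex (1C7AA7E67). A quick Python sketch of the same relationship follows, again as an illustration rather than anything OpenInsight does internally; the variable names are ours.

```python
# The eight Alpha bytes exactly as the HEX conversion reported them.
raw = bytes.fromhex("677EAAC701000000")

# Read them as one little-endian unsigned integer.
alpha = int.from_bytes(raw, byteorder="little")
print(f"{alpha:,}")        # 7,644,806,759
print(raw.hex().upper())   # 677EAAC701000000

# The same value built from the low and high 4-byte halves.
lo = int.from_bytes(raw[:4], "little")
hi = int.from_bytes(raw[4:], "little")
print(lo + (hi << 32) == alpha)  # True
```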

Finally we were able to report on the corrupted file accurately.

Field          Value            Bytes (hex)
Frame length   1,634,013,184    00 10 65 61
Modulo         2,342,159        0F BD 23 00
Alpha          7,644,806,759    67 7E AA C7 01 00 00 00
Threshold      80
Sizelock       0
Row Count      20,379,590       C6 F7 36 01
Alpha%         .00

Header (decimal) 32 0 0 0 0 0 0 0 0 15 189 35 0 0 16 103 126 170 199 204 0 0 198 247 54 1 101 97 1 0 0 0
Header (hex)     20 0 0 0 0 0 0 0 0 F BD 23 0 0 10 67 7E AA C7 CC 0 0 C6 F7 36 1 65 61 1 0 0 0
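
The two Header lines are the same 32 bytes, once in decimal and once in hex, and the reported values can be recovered from them. The Python sketch below does this as a cross-check. Please treat the byte offsets as an assumption: they were worked out purely by matching the byte groups in the report against this one dump, not taken from any documented header layout, and the helper name `le` is ours.

```python
DEC_DUMP = ("32 0 0 0 0 0 0 0 0 15 189 35 0 0 16 103 126 170 199 204 "
            "0 0 198 247 54 1 101 97 1 0 0 0")
HEX_DUMP = ("20 0 0 0 0 0 0 0 0 F BD 23 0 0 10 67 7E AA C7 CC "
            "0 0 C6 F7 36 1 65 61 1 0 0 0")

header = bytes(int(tok) for tok in DEC_DUMP.split())
# Both dumps should describe exactly the same bytes.
same_bytes = header == bytes(int(tok, 16) for tok in HEX_DUMP.split())


def le(*chunks: bytes) -> int:
    """Join the chunks and read them as one little-endian unsigned integer."""
    return int.from_bytes(b"".join(chunks), "little")


# Offsets below are inferred from this dump only (assumption, see above).
modulo = le(header[9:13])
frame_length = le(header[13:15], header[26:28])   # low half, then high half
alpha = le(header[15:19], header[28:32])          # low half, then high half
row_count = le(header[22:26])

print(same_bytes)
for name, value in [("Frame length", frame_length), ("Modulo", modulo),
                    ("Alpha", alpha), ("Row Count", row_count)]:
    print(f"{name:<13}{value:,}")
```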

It should be noted that the code snippet above is not UTF-8 safe. We knew that the data we were working on was purely ANSI and had customised one of our routines accordingly. The square bracket operators, CHAR and SEQ are generally not safe when processing binary data so unless you're 100% sure you're in ANSI mode avoid them. In this case we should have been using GetBinaryValue.