We've got a couple of clients with UDH installations and this weekend we upgraded one of them to 4.6 - the latest and greatest. There's so much to like about this release - not least the improved documentation and improved replay of journalling. But sometimes the unforeseen happens and you hit a "Well nobody would ever do that would they?" moment - the moment when just as you ask yourself the question you see several possible scenarios unfold where not only might they do that, they'd have damn good reasons for doing it.
So after the weekend was over we returned to base expecting a tranquil week. Until Monday morning, that is. We seemed to have two unrelated problems: one involving RTP32, and an Event Viewer on the UDH server that was filling up with errors.
Well, fixing the RTP32 problem was going to be easy. We just consulted our on-line REVMEDIA for what RTP32 actually does. OK, FIELDSTORE. So we'd just look for where that was going into an infinite loop. (For details of how we'd actually achieve this, why not come to our talk on using logs to troubleshoot Revelation problems at RevCon 2010?) Regretfully it wasn't as easy as that, as we weren't actually making much use of FIELDSTORE. So we'd come back to that issue and concentrate on the Event Viewer issue.

As we said, the Event Viewer on the (Windows 2008 R2) UDH Server was full of thousands of similar error messages, none of which made any sense whatsoever to us. To make matters worse, scattered amongst these errors were more meaningful ones. Thankfully, as ever, the tech team at RevSoft swung into action and decided that we were seeing two problems: one they thought might be related to messages (we'll come back to that) and one that might indicate that we did in fact have data errors in some of our tables.

Tracking down the data errors was made difficult by the fact that with nearly 10,000 errors to choose from it's hard to search for a specific message, in this case the text "REV" to find the actual DOS files affected. Initial attempts to export from the Event Viewer failed, as selecting tab-delimited text resulted in only a small subset of the log being exported. After several failed attempts we dropped back to CSV and lo and behold, a 70K file became a 12 MB file. With this we were able to open the export in Notepad and identify the files in question. One was FUBAR: a 10K LK and a 4 GB OV, on a dictionary file. That was soon rectified. The second was a null key, which caused the UDH to report a LHReadNextKeyIdGroup error with an unfeasibly large error code.

Now we had the physical errors sorted it was time to address the logical. Revelation suggested that what we might be seeing could be a problem with overly long messages.
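As an aside on the CSV export step above: we searched the 12 MB file by hand in Notepad, but the same search is easy to script. What follows is only a minimal sketch in Python, not what we ran on the day. It assumes the affected files show up in the event text as names starting with REV and ending in .LK or .OV; the function name and the pattern are ours for illustration, so adjust them to whatever your log actually contains.

```python
import csv
import io
import re

# Assumption: affected files appear in the event text as REV-prefixed
# names with an .LK or .OV extension. Adjust the pattern to your own log.
REV_FILE = re.compile(r"\bREV\w*\.(?:LK|OV)\b", re.IGNORECASE)


def affected_files(csv_text):
    """Return the distinct REV file names mentioned anywhere in a CSV export."""
    found = set()
    for row in csv.reader(io.StringIO(csv_text)):
        for cell in row:
            # An event message can mention more than one file.
            for match in REV_FILE.finditer(cell):
                found.add(match.group(0).upper())
    return sorted(found)
```

Feed it the text of the exported file and it returns each distinct file name once, which is rather quicker than paging through 10,000 rows.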
We were slightly at a loss as to why this might be until they pointed out that in AREV, when you call MSG, it first assumes that the first passed parameter is a row id and tries to read it from MESSAGES and then SYSMESSAGES before deciding that it is a literal and attempting to display it. The reason this becomes a problem is that in 4.6 maximum row id lengths are enforced, and exceeding them generates errors. We set about creating proof of this hypothesis and wrote a program to display an increasingly long message, and sure enough we dropped to the debugger in RTP32! Hurrah! And what was the magic number? 553 - one more than the maximum allowed key length of 552.

So now we had the problem, the solution became obvious (well, talking it over with colleagues helped - "by the power of Sprezz"....) - we'd just write a shell program to intercept message calls, and if the message was longer than the longest message key we'd pass over a message map that told MSG that what was being passed was a literal, so it didn't need to try reading it. So $MSG was copied to $MSG_RTI and our replacement was created. And as you can see it worked...

We implemented it and all seemed to be going swimmingly, but just when we thought it was safe to go back into the water we started getting a new batch of errors, which seemed to be coming from input validation rather than direct message calls. Fortunately, now that we had our own message shell it was easier to check the program stack and find the culprit...

It seems that when the user entered an invalid key there were a number of possible patterns the key could match. So IN.PATTERN was called to see which, if any, it matched. On determining that the pattern match had failed, PAT_CHECK was called to construct an "English type" failure message. This message was passed to MSG as a msgRec, with msg being left blank. So once again MSG_RTI looked at the script and tried to read it as a message - failing as before.
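To make the mechanics concrete, here is a small model of the behaviour described above, written in Python rather than R/BASIC so that it can be run anywhere. It is a sketch under stated assumptions, not the real MSG: the function names, the exception and the dictionary "tables" are all invented for illustration. Only the lookup order (MESSAGES, then SYSMESSAGES, then literal), the 552 character limit and the 35 character threshold come from the story.

```python
MAX_KEY_LEN = 552  # maximum row id length quoted in the post


class KeyTooLong(Exception):
    """Stands in for the hard error raised on an over-long row id."""


def read_row(table, key):
    """Model of a keyed read that enforces the row id limit."""
    if len(key) > MAX_KEY_LEN:
        raise KeyTooLong(len(key))
    return table.get(key)


def msg(text, literal=False, messages=None, sysmessages=None):
    """Model of the lookup order: MESSAGES, then SYSMESSAGES, then literal."""
    if not literal:
        for table in (messages or {}, sysmessages or {}):
            row = read_row(table, text)
            if row is not None:
                return row
    return text  # nothing found (or flagged literal): display as-is


def first_failing_length(limit=1000):
    """Grow the message one character at a time until the lookup fails."""
    for n in range(1, limit + 1):
        try:
            msg("X" * n)
        except KeyTooLong:
            return n
    return None


def msg_shell(text, max_id_len=35, **tables):
    """Model of the shell: long text is flagged literal so no read happens."""
    return msg(text, literal=len(text) >= max_id_len, **tables)
```

In this model the probe stops at the first length that cannot be used as a key, and the shell sidesteps the failure by never attempting the read for anything too long to be a real message key. The same length check applies equally to a script passed in the msgRec.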
So once again all we had to do was check any passed script and, if it was longer than our maximum row id length, set the literal flag. Once these modifications were made all that was left were legitimate day to day errors. We present below the code we used, which you are free to use at your own risk - Sprezzatura makes no warranties etc etc etc. If you want warranties we'll happily come and help you for our usual daily rate!

    Subroutine Msg( msg, msgRec, resp, args )
    /*
       Author   AMcA
       Date     14th April 2010
       Purpose  To provide a shell to normal system MSG as with the new UDH
                there are issues if it does not know the first parameter is
                a literal. Given this system will not be changing we have
                simply found the longest message key in SYSMESSAGES and
                MESSAGES (29 characters) and told MSG_RTI that if it is 35
                characters or longer it is a literal. If your system is
                still in active development then you might want to set the
                MAXIDLEN$ equate to something larger to avoid shooting
                yourself in the foot.
    */
       If Assigned( msg )    Else msg    = ""
       If Assigned( msgRec ) Else msgRec = ""
       If Assigned( resp )   Else resp   = ""
       If Assigned( args )   Else args   = ""

       Equ MAXIDLEN$ To 35

       saveMsgRec = msgRec
       If Len( msg ) GE MAXIDLEN$ Or Len( msgRec< 11 > ) GE MAXIDLEN$ Then
          msgRec< 13 > = 1  ; * Literal - do no reads
       End Else
          msgRec< 13 > = "" ; * Normal - check MESSAGES and SYSMESSAGES
       End
       Call Msg_RTI( msg, msgRec, resp, args )
       msgRec = saveMsgRec
    Return

Labels: UDH