Those of you who've been in the industry for a while will remember the smugness with which we treated the Y2K issues the rest of the world were experiencing. It didn't affect us, we stored our dates in internal numerical format. I'm sure we all have favourite anecdotes. Mine was being called into the Press Office of a major Government department to certificate their AREV tracking system. I sat down and gave the system a cursory glance. "Ermmm - it doesn't seem to use dates?". "No, it doesn't". "Well, here's your certificate"... turns out they couldn't get the budget for system tweaks but they could for Y2K compliance. I spent the rest of the day implementing their desired changes.
Anyway, this week wiped the smile off of our collective faces (if you'd ignored the advice in KB 42 thirty or so years ago - which to be honest you could be forgiven for) with the advent of our very own variant - we'll call it the 20100 bug, although it's not a bug, it's an unfortunate feature.
Users began reporting that date searches were failing for values after the 10th of January 2023. At first we couldn't see an obvious reason. We built a database containing date values going back to the previous century with five rows per date and indexed it. We wrote a test program and ran it
RUN ZZ_POP 10/01/2023
RUN ZZ_POP 11/01/2023
RUN ZZ_POP 20099
What we've fallen foul of is discussed in the KB reference earlier. Basically BTREE.EXTRACT takes what it is given and tries to ICONV it to use for the look up. If it can't ICONV it (20099 isn't a viable date) it uses the value passed. If it CAN ICONV it, is uses the ICONVed value. And guess what? 20100 ICONVs to the 2nd of January 2000 as you can see above.
Of course, this won't be the first time this has happened in the wild - for example looking for the 7th of December 1998 using 11299 would have returned the 1st of December 1999 and so on.
Note that the same issue will be experienced when using internal date formats with RLIST statements.
The solution? When calling BTREE.EXTRACT use EXTERNAL values as it will try and ICONV the data before using it.
Addendum
We've been asked to share the code we used to locate btree.extracts in client systems to enable a manual inspection to determine possible failure points. This is a rough and dirty hack which met our requirements. Feel free to tailor to your own requirements.
0002 /*
0003 Author AMcA
0004 Date Jan 2022
0005 Purpose Quick hack to help identify system usage of btree.extract
0006 Provided as is with no warranty
0007 */
0008
0009 Gosub initialise
0010 Gosub process
0011
0012 Return
0013
0014 initialise:
0015
0016 filesToCheck = "SYSPROCS,SYSREPOSEVENTS,SYSTABLES"
0017 columnsToCheck = ",,8"
0018
0019 loopCtr = Dcount(filesToCheck, ",")
0020 resultString = ""
0021
0022 Return
0023
0024 process:
0025
0026 For loopPtr = 1 To loopCtr
0027 file = Field( filesToCheck, ",", loopPtr )
0028 column = Field( columnsToCheck, ",", loopPtr )
0029
0030 resultString := file : \0D0A\
0031
0032 if file = "SYSTABLES" Then
0033 starting = "%"
0034 Gosub processSysTables
0035 End Else
0036 starting = "@"
0037 Gosub processRest
0038 End
0039
0040 resultString := \0D0A0D0A\
0041
0042 Next
0043
0044 Oswrite resultString On "ZZ_BE.TXT"
0045 Call Set_Property("CLIPBOARD", "TEXT", resultString )
0046
0047 Return
0048
0049 processRest:
0050 Open file To v Then
0051 Gosub processV
0052 End Else
0053 Call FSMsg()
0054 End
0055 Return
0056
0057 processV:
0058
0059 Select v
0060 eof = 0
0061 Loop
0062 Readnext id Else eof = 1
0063 Until eof Do
0064 If id[1, 1] = starting else
0065 Read row From v, id Then
0066 ptr = 1
0067 If column Then
0068 saveRow = row
0069 row = row< column >
0070 Convert @Vm To @Fm In row
0071 end
0072 lenRow = Len(row)
0073 lineNo = 0
0074 If lenRow > 0 then
0075 Loop
0076 nextline = row[ptr, @Fm]
0077 ptr = Col2() + 1
0078 lineno += 1
0079 there = IndexC( nextLine, "btree.extract(", 1)
0080 If there Then
0081 variable = Trim( nextLine[ there + 14, ","] )
0082 If variable[1, 1] = "'" Or variable[1, 1] = '"' Then
0083 // passing literals - check the line
0084 resultString := file : \09\ : id : \09\ : lineNo : \09\ : nextLine : \0D0A\
0085 End else
0086 // ok we're now going to work back through the code until we find = for our var
0087 tempLineNo = lineNo
0088 Loop
0089 tempLineNo -= 1
0090 line = row< tempLineNo >
0091 If Trimf( line )[1, 1] = "*" Then
0092 line = ""
0093 End
0094 Convert " " To "" In line
0095 line = " " : line
0096 Until IndexC( line, " " : variable : "=", 1)
0097 Until tempLineNo = 0
0098 Repeat
0099 If tempLineNo = 0 then
0100 resultString := file : \09\ : id : \09\ : lineNo : \09\ : "Not found " : row< LineNo > : \0D0A\
0101 End else
0102 resultString := file: \09\ : id : \09\ : lineNo : \09\ : row< tempLineNo > : \0D0A\
0103 end
0104 end
0105 End
0106 While ptr < lenRow
0107
0108 Repeat
0109 End
0110 End Else
0111 end
0112 Call send_info( file : " " : id )
0113 end
0114 repeat
0115 return
0116
0117 processSystables:
0118
0119 Open file To v Then
0120 Select v
0121 saveV = v
0122 eof = 0
0123 dictCtr = 0
0124
0125 Loop
0126 Readnext id Else eof = 1
0127 Until eof Do
0128 If ID[1, 5] = "DICT." Then
0129 Open id To v Then
0130
0131 file = id
0132
0133 Call push.Select( v1, v2, v3, v4 )
0134 Gosub processV
0135 Call pop.Select( v1, v2, v3, v4 )
0136 eof = 0
0137
0138 End Else
0139 * Call FSMsg()
0140 End
0141 v = saveV
0142 end
0143 Repeat
0144 End Else
0145 Call FSMsg()
0146 End
0147
0148 return
We ran into this problem yesterday. Could Revelation write a patch to BTREE.EXTRACT to account for 01/11/2023 issues?
ReplyDeleteVery helpful blog posting. Please note as well that the same issue applies to r/list statements - if you are used to using internal dates for search/selection criteria, you will have the same problem with incorrect results being returned. You MUST use externally formatted dates for date field comparisons in r/list as well - if you want to use internal date values instead, you should create a dictionary field that does not have a D conversion code and use that for your selection/search criteria...
ReplyDeleteIt's a tough call. A fix was made to AREV when the code base split to OI and AREV (as alluded to in the KB article) but you'd be altering 30 years of behaviour and who knows what that might break down the line? Rock and a hard place I'd say. I just wrote a quick program to scan our client's code, identify all btree.extract calls, parse out the variable and scan backwards to its assignation. Put all of that info into a spread sheet and quickly identified that I only had a few programs to manually check.
ReplyDeleteI'm of the very strong opinion that you should never use ICONV values in selects.
ReplyDeleteThere's no reason for it.
The conversion code is only executed once so any perceived speed hit is inconsequential.
Creating shadow dictionaries on indexed fields is just asking for trouble, or duplicating the index. There's just no reason for it, other simply being able to pass internal values inside your code.
I've posted about this many times in the past, and always ask this same question. Why should the system jump through hoops to allow you to pass an internal date, when no one would ever think of passing an internal value for an MD2 field or other conversion. It's only dates people want to pass internal. What's so special about dates?
It's just a bad idea all around, but I've love to hear the reasons for doing it this way.
Revelation will net be writing a patch for this. We do not have a solution that works 100% of the time.
ReplyDeleteThanks-
Mike Ruane, Revelation Software
Is this the same bug that grounded all aircraft and affected the NOTAM system this week perhaps?
ReplyDeleteI started out doing it as oconv. But then I had the thought that I was wasting my time oconving it all the time, when I could just send in the raw, internal date. So, I tried that and it worked, so since then I have kept on not oconving it. So my instinct was to oconv it, but then I thought it was a waste of time, and because iconv worked, I went with it.
ReplyDelete"I'm of the very strong opinion that you should never use ICONV values in selects.
ReplyDeleteThere's no reason for it."
It's an arbitrary decision. But what's important is that what is allowed is well understood and that the system enforces it.
"What's so special about dates?"
ReplyDeleteBecause keywords and dates are 99% of what people search for in rlist/btree extract. I've never searched on a monetary value except 0.
Probably what should have happened here is that rev should have added a patch that retained the current behaviour but created a log whenever an rlist failed to convert a date. Then people could have found out about it ages ago and fixed it ages ago. But where would rlist log the data to. OI doesn't really have a standardised place where it puts logs. It probably should though.
ReplyDeleteI'm going to assume these were all written by the same author.
ReplyDelete"Because keywords and dates are 99% of what people search for in rlist/btree extract. I've never searched on a monetary value except 0."
We must work in very different industries.
"It's an arbitrary decision. But what's important is that what is allowed is well understood and that the system enforces it. "
"Probably what should have happened here is that rev should have added a patch that retained the current behaviour but created a log whenever an rlist failed to convert a date. Then people could have found out about it ages ago and fixed it ages ago. But where would rlist log the data to. OI doesn't really have a standardised place where it puts logs. It probably should though."
I can't promise that it's well understood, but Revelation did release a technical bulletin about this in 1995, so around 27 years. This issue has been my bugbear for quite some time. There was a message thread on Rev's forum about this around a year ago, and I gave a pretty detailed explanation. Trust me, I've not been at all shy on talking about this. Not shy at all.
Basically, the system works like this:
The selection (or reduction, as it's called inside the system) criteria are parsed and broken down field by field. When the reduction values and dictionary fields are linked up, the system reads the dictionary and checks if there's an OCONV value. If there is, the system performs an ICONV using the OCONV setting and uses that value. If the ICONV value is null or failed, then the system uses the passed in value.
The system always performs the check. If I remember correctly, the system stores off the passed value, does the iconv on the original, and then resets if it's a failure. So, if anything, the system is actually slower because of the additional assignment.
I think people look at this from the wrong way around, as if it's all designed for code. It's not. It's really designed for RLIST. The code is just there to make the sentence processing easier. That's why it's RTP18.ENGLISH. It's meant to happen in English. It's all meant to be
SELECT SOMEFILE WITH DATE_FIELD = "12-25-2020"
and not
SELECT SOMEFILE WITH DATE_FIELD = 20093
or whatever the date is, because I'm not bothering to look it up because it doesn't mean anything to anything other than a simple way to track dates numerically to make it easier to do date math.
The point is, they have this down at a low level of the system, so they don't have to do anything to the data until it gets all the way down to the reduction engine. At that point, it does the conversion. Otherwise, there would need to be separate code for the three basic selection syntax engines in Rev; RLIST (or ENGLISH, the internal name), the REDUCE routine and BTREE.EXTRACT. Even then, all index selects go into the various BTREE routines, which are all called through BTREE.EXTRACT, where the conversion occurs. (It's actually a bit more complicated than that, but this is a comment, not a tech article, and it's not really my place to give out the recipe for the secret sauce). It should be obvious that it's an unintended feature of the system that it ever works at all. I thought it was anyway, and that's before I looked through the code when I was at Rev.
I don't know. You can code your systems as you want as you feel best. All I can tell you is how the system works and you can decide how efficient or inefficient you want your system to be.
"I think people look at this from the wrong way around, as if it's all designed for code. It's not. It's really designed for RLIST. The code is just there to make the sentence processing easier. That's why it's RTP18.ENGLISH. It's meant to happen in English."
ReplyDeleteI disagree. If you allow humans to enter text into an rlist, and you don't check that text in your code, then you become vulnerable to rlist injection (like sql injection). So, you should never allow a human to enter anything into an rlsit without checking it in your code. T he simplest way to check the date is safe is to convert it to internal format to see if the conversion fails.
I don't think anyone is advocating for passing un-verified criteria to Rlist from an untrusted source. The point here is that Rlist is a human-readable sentence parser, so it stands to reason that the data it's intended to work on is also human-readable. That fact that someone made it support internal formats as well as the human-readable ones is the problem, because now we have added a dose of ambiguity into the mix.
ReplyDeleteAllowing humans to enter "raw" Rlist statements is perfectly acceptable in the correct context - allowing users access to TCL for example is a valid feature of many systems, but those are from trusted sources such as a user's workstation and not a web-browser.