Sprezzatura :: Making Databases Happen: Y2K Redux - Indexing Issues With Dates OR 20100

Thursday, 12 January 2023

Y2K Redux - Indexing Issues With Dates OR 20100 - A Search Oddity

Those of you who've been in the industry for a while will remember the smugness with which we treated the Y2K issues the rest of the world were experiencing. It didn't affect us, we stored our dates in internal numerical format. I'm sure we all have favourite anecdotes. Mine was being called into the Press Office of a major Government department to certificate their AREV tracking system. I sat down and gave the system a cursory glance. "Ermmm - it doesn't seem to use dates?". "No, it doesn't". "Well, here's your certificate"... turns out they couldn't get the budget for system tweaks but they could for Y2K compliance. I spent the rest of the day implementing their desired changes.

Anyway, this week wiped the smile off of our collective faces (if you'd ignored the advice in KB 42 thirty or so years ago - which to be honest you could be forgiven for) with the advent of our very own variant - we'll call it the 20100 bug, although it's not a bug, it's an unfortunate feature.

Users began reporting that date searches were failing for values after the 10th of January 2023. At first we couldn't see an obvious reason. We built a database containing date values going back to the previous century with five rows per date and indexed it. We wrote a test program and ran it

RUN ZZ_POP 10/01/2023

RUN ZZ_POP 11/01/2023

RUN ZZ_POP 12/02/2023

But then a throwaway remark from Martyn about pivot dates being discussed in the MV community led to a realisation. What if we used internal dates?

RUN ZZ_POP 20099

RUN ZZ_POP 20100

BINGO!

What we've fallen foul of is discussed in the KB reference earlier. Basically BTREE.EXTRACT takes what it is given and tries to ICONV it to use for the look up. If it can't ICONV it (20099 isn't a viable date) it uses the value passed. If it CAN ICONV it, is uses the ICONVed value. And guess what? 20100 ICONVs to the 2nd of January 2000 as you can see above.

Of course, this won't be the first time this has happened in the wild - for example looking for the 7th of December 1998 using 11299 would have returned the 1st of December 1999 and so on.

Note that the same issue will be experienced when using internal date formats with RLIST statements.

The solution? When calling BTREE.EXTRACT use EXTERNAL values as it will try and ICONV the data before using it.

Addendum

We've been asked to share the code we used to locate btree.extracts in client systems to enable a manual inspection to determine possible failure points. This is a rough and dirty hack which met our requirements. Feel free to tailor to your own requirements.

0001  Subroutine zz_find_Btree_Extract( void )
0002  /*
0003     Author      AMcA
0004     Date        Jan 2022
0005     Purpose     Quick hack to help identify system usage of btree.extract
0006                 Provided as is with no warranty
0007  */
0008    
0009     Gosub initialise
0010     Gosub process
0011     
0012  Return
0013  
0014  initialise:
0015  
0016     filesToCheck   = "SYSPROCS,SYSREPOSEVENTS,SYSTABLES"
0017     columnsToCheck = ",,8"
0018  
0019     loopCtr        = Dcount(filesToCheck, ",")
0020     resultString   = ""
0021     
0022  Return
0023  
0024  process:
0025  
0026     For loopPtr    = 1 To loopCtr
0027        file        = Field( filesToCheck, ",", loopPtr )
0028        column      = Field( columnsToCheck, ",", loopPtr )
0029  
0030        resultString := file : \0D0A\
0031  
0032        if file = "SYSTABLES" Then
0033           starting = "%"
0034           Gosub processSysTables
0035        End Else
0036           starting = "@"
0037           Gosub processRest
0038        End
0039     
0040        resultString := \0D0A0D0A\
0041  
0042     Next
0043  
0044     Oswrite resultString On "ZZ_BE.TXT"
0045     Call Set_Property("CLIPBOARD", "TEXT", resultString )
0046     
0047  Return
0048  
0049  processRest:
0050     Open file To v Then
0051        Gosub processV
0052     End Else
0053        Call FSMsg()
0054     End
0055  Return
0056  
0057  processV:
0058  
0059     Select v
0060     eof = 0
0061     Loop
0062        Readnext id Else eof = 1
0063     Until eof Do
0064        If id[1, 1] = starting else
0065           Read row From v, id Then
0066              ptr = 1
0067              If column Then
0068                 saveRow = row
0069                 row = row< column >
0070                 Convert @Vm To @Fm In row
0071              end
0072              lenRow = Len(row)
0073              lineNo = 0
0074              If lenRow > 0 then
0075                 Loop
0076                    nextline = row[ptr, @Fm]
0077                    ptr = Col2() + 1
0078                    lineno += 1
0079                    there = IndexC( nextLine, "btree.extract(", 1)
0080                    If there Then
0081                       variable = Trim( nextLine[ there + 14, ","] )
0082                       If variable[1, 1] = "'" Or variable[1, 1] = '"' Then
0083                          // passing literals - check the line
0084                          resultString := file : \09\ : id : \09\ : lineNo : \09\ : nextLine : \0D0A\
0085                       End else
0086                          // ok we're now going to work back through the code until we find = for our var
0087                          tempLineNo = lineNo
0088                          Loop
0089                             tempLineNo -= 1
0090                             line = row< tempLineNo >
0091                             If Trimf( line )[1, 1] = "*" Then
0092                                line = ""
0093                             End
0094                             Convert " " To "" In line
0095                             line  = " " : line
0096                          Until IndexC( line, " " : variable : "=", 1)
0097                          Until tempLineNo = 0
0098                          Repeat
0099                          If tempLineNo = 0 then
0100                             resultString := file : \09\ : id : \09\ : lineNo : \09\ : "Not found " : row< LineNo > : \0D0A\
0101                          End else
0102                             resultString := file: \09\ : id : \09\ : lineNo : \09\ : row< tempLineNo > : \0D0A\
0103                          end
0104                       end
0105                    End
0106                 While ptr < lenRow
0107                   
0108                 Repeat
0109              End
0110           End Else
0111           end
0112           Call send_info( file : " " : id )
0113        end
0114     repeat
0115  return
0116  
0117  processSystables:
0118  
0119     Open file To v Then
0120        Select v
0121        saveV = v
0122        eof = 0
0123        dictCtr = 0 
0124        
0125        Loop
0126           Readnext id Else eof = 1
0127        Until eof Do
0128           If ID[1, 5] = "DICT." Then
0129              Open id To v Then
0130   
0131                 file = id
0132                 
0133                 Call push.Select( v1, v2, v3, v4 )
0134                 Gosub processV
0135                 Call pop.Select( v1, v2, v3, v4 )
0136                 eof = 0
0137  
0138              End Else
0139              *   Call FSMsg()
0140              End
0141              v = saveV
0142           end
0143        Repeat
0144     End Else
0145        Call FSMsg()
0146     End
0147  
0148  return

13 comments:

Anonymous12 January 2023 at 15:06
We ran into this problem yesterday. Could Revelation write a patch to BTREE.EXTRACT to account for 01/11/2023 issues?
ReplyDelete
Replies
Anonymous12 January 2023 at 15:27
Very helpful blog posting. Please note as well that the same issue applies to r/list statements - if you are used to using internal dates for search/selection criteria, you will have the same problem with incorrect results being returned. You MUST use externally formatted dates for date field comparisons in r/list as well - if you want to use internal date values instead, you should create a dictionary field that does not have a D conversion code and use that for your selection/search criteria...
ReplyDelete
Replies
Sprezz12 January 2023 at 15:29
It's a tough call. A fix was made to AREV when the code base split to OI and AREV (as alluded to in the KB article) but you'd be altering 30 years of behaviour and who knows what that might break down the line? Rock and a hard place I'd say. I just wrote a quick program to scan our client's code, identify all btree.extract calls, parse out the variable and scan backwards to its assignation. Put all of that info into a spread sheet and quickly identified that I only had a few programs to manually check.
ReplyDelete
Replies
Aaron K (Sprezz)12 January 2023 at 16:23
I'm of the very strong opinion that you should never use ICONV values in selects.
There's no reason for it.
The conversion code is only executed once so any perceived speed hit is inconsequential.
Creating shadow dictionaries on indexed fields is just asking for trouble, or duplicating the index. There's just no reason for it, other simply being able to pass internal values inside your code.

I've posted about this many times in the past, and always ask this same question. Why should the system jump through hoops to allow you to pass an internal date, when no one would ever think of passing an internal value for an MD2 field or other conversion. It's only dates people want to pass internal. What's so special about dates?

It's just a bad idea all around, but I've love to hear the reasons for doing it this way.
ReplyDelete
Replies
Mike Ruane12 January 2023 at 20:05
Revelation will net be writing a patch for this. We do not have a solution that works 100% of the time.
Thanks-
Mike Ruane, Revelation Software
ReplyDelete
Replies
Anonymous12 January 2023 at 21:29
Is this the same bug that grounded all aircraft and affected the NOTAM system this week perhaps?
ReplyDelete
Replies
josh12 January 2023 at 23:11
I started out doing it as oconv. But then I had the thought that I was wasting my time oconving it all the time, when I could just send in the raw, internal date. So, I tried that and it worked, so since then I have kept on not oconving it. So my instinct was to oconv it, but then I thought it was a waste of time, and because iconv worked, I went with it.
ReplyDelete
Replies
Anonymous16 January 2023 at 13:02
"I'm of the very strong opinion that you should never use ICONV values in selects.
There's no reason for it."

It's an arbitrary decision. But what's important is that what is allowed is well understood and that the system enforces it.

ReplyDelete
Replies
Anonymous16 January 2023 at 13:08
"What's so special about dates?"

Because keywords and dates are 99% of what people search for in rlist/btree extract. I've never searched on a monetary value except 0.
ReplyDelete
Replies
Anonymous16 January 2023 at 22:59
Probably what should have happened here is that rev should have added a patch that retained the current behaviour but created a log whenever an rlist failed to convert a date. Then people could have found out about it ages ago and fixed it ages ago. But where would rlist log the data to. OI doesn't really have a standardised place where it puts logs. It probably should though.
ReplyDelete
Replies
APK17 January 2023 at 15:34
I'm going to assume these were all written by the same author.

"Because keywords and dates are 99% of what people search for in rlist/btree extract. I've never searched on a monetary value except 0."

We must work in very different industries.

"It's an arbitrary decision. But what's important is that what is allowed is well understood and that the system enforces it. "
"Probably what should have happened here is that rev should have added a patch that retained the current behaviour but created a log whenever an rlist failed to convert a date. Then people could have found out about it ages ago and fixed it ages ago. But where would rlist log the data to. OI doesn't really have a standardised place where it puts logs. It probably should though."

I can't promise that it's well understood, but Revelation did release a technical bulletin about this in 1995, so around 27 years. This issue has been my bugbear for quite some time. There was a message thread on Rev's forum about this around a year ago, and I gave a pretty detailed explanation. Trust me, I've not been at all shy on talking about this. Not shy at all.

Basically, the system works like this:

The selection (or reduction, as it's called inside the system) criteria are parsed and broken down field by field. When the reduction values and dictionary fields are linked up, the system reads the dictionary and checks if there's an OCONV value. If there is, the system performs an ICONV using the OCONV setting and uses that value. If the ICONV value is null or failed, then the system uses the passed in value.

The system always performs the check. If I remember correctly, the system stores off the passed value, does the iconv on the original, and then resets if it's a failure. So, if anything, the system is actually slower because of the additional assignment.

I think people look at this from the wrong way around, as if it's all designed for code. It's not. It's really designed for RLIST. The code is just there to make the sentence processing easier. That's why it's RTP18.ENGLISH. It's meant to happen in English. It's all meant to be

SELECT SOMEFILE WITH DATE_FIELD = "12-25-2020"

and not

SELECT SOMEFILE WITH DATE_FIELD = 20093

or whatever the date is, because I'm not bothering to look it up because it doesn't mean anything to anything other than a simple way to track dates numerically to make it easier to do date math.

The point is, they have this down at a low level of the system, so they don't have to do anything to the data until it gets all the way down to the reduction engine. At that point, it does the conversion. Otherwise, there would need to be separate code for the three basic selection syntax engines in Rev; RLIST (or ENGLISH, the internal name), the REDUCE routine and BTREE.EXTRACT. Even then, all index selects go into the various BTREE routines, which are all called through BTREE.EXTRACT, where the conversion occurs. (It's actually a bit more complicated than that, but this is a comment, not a tech article, and it's not really my place to give out the recipe for the secret sauce). It should be obvious that it's an unintended feature of the system that it ever works at all. I thought it was anyway, and that's before I looked through the code when I was at Rev.

I don't know. You can code your systems as you want as you feel best. All I can tell you is how the system works and you can decide how efficient or inefficient you want your system to be.

ReplyDelete
Replies
Anonymous17 January 2023 at 22:41
"I think people look at this from the wrong way around, as if it's all designed for code. It's not. It's really designed for RLIST. The code is just there to make the sentence processing easier. That's why it's RTP18.ENGLISH. It's meant to happen in English."

I disagree. If you allow humans to enter text into an rlist, and you don't check that text in your code, then you become vulnerable to rlist injection (like sql injection). So, you should never allow a human to enter anything into an rlsit without checking it in your code. T he simplest way to check the date is safe is to convert it to internal format to see if the conversion fails.
ReplyDelete
Replies
Captain C18 January 2023 at 10:09
I don't think anyone is advocating for passing un-verified criteria to Rlist from an untrusted source. The point here is that Rlist is a human-readable sentence parser, so it stands to reason that the data it's intended to work on is also human-readable. That fact that someone made it support internal formats as well as the human-readable ones is the problem, because now we have added a dose of ambiguity into the mix.

Allowing humans to enter "raw" Rlist statements is perfectly acceptable in the correct context - allowing users access to TCL for example is a valid feature of many systems, but those are from trusted sources such as a user's workstation and not a web-browser.
ReplyDelete
Replies

Pages

Thursday, 12 January 2023

Y2K Redux - Indexing Issues With Dates OR 20100 - A Search Oddity

Addendum

13 comments: