Serious Security: The decade-ending “Y2K bug” that wasn’t

A curious Naked Security reader alerted us to what they thought might be a “Y2K-like bug” in Java’s date handling.

The cause of the alarm was a Twitter thread that started with a headline tweet saying, “PSA: TIS THE SEASON TO CHECK YOUR FORMATTERS, PEOPLE.”

As @NmVAson points out, the problem comes when you ask the Java DateTimeFormatter library to tell you the current YYYY, a commonly-used programmers’ abbreviation meaning “the year expressed as four digits”.

For example, when programmers abbreviate the world’s commonly used date formats, they often use a format string to denote the layout they want, something like this:

Layout                      Format string    Example
------------------------    -------------    ----------
US style (Dec 29, 2019)     MM/DD/YYYY       12/29/2019
Euro style (29 Dec 2019)    DD/MM/YYYY       29/12/2019
RFC 3339 (2019-12-29)       YYYY-MM-DD       2019-12-29

In fact, many programming languages provide code libraries that help you to print out dates using format strings like those above, so that you can automatically adapt the output of your software to suit each user’s personal preference.

The problem here is that there are many different date-handling functions, such as GetDateFormatEx() on Windows, strftime() on Unix and Linux systems, all the way way to Java’s all-singing, all-dancing DateTimeFormatter module.

The abovementioned Java library, amongst others, lets you conveniently format dates with the three strings shown above, giving you plausible results like this:

import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class CarefulWithThatDateEugene {
   private static void tryit(int Y, int M, int D, String pat) {
      DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pat);
      LocalDate         dat = LocalDate.of(Y,M,D);
      String            str = fmt.format(dat);
      System.out.printf("Y=%04d M=%02d D=%02d " +
                        "formatted with " +
                        "\"%s\" -> %s\n",Y,M,D,pat,str);
   }
   public static void main(String[] args){
      tryit(2020,01,20,"MM/DD/YYYY");
      tryit(2020,01,21,"DD/MM/YYYY");
      tryit(2020,01,22,"YYYY-MM-DD");
   }
}

//---------------

Y=2020 M=01 D=20 formatted with "MM/DD/YYYY" -> 01/20/2020
Y=2020 M=01 D=21 formatted with "DD/MM/YYYY" -> 21/01/2020
Y=2020 M=01 D=22 formatted with "YYYY-MM-DD" -> 2020-01-22

So far, so good!

But if you try this in the middle of the year, you get:

Y=2020 M=05 D=17 formatted with "MM/DD/YYYY" -> 05/138/2020
Y=2020 M=05 D=18 formatted with "DD/MM/YYYY" -> 139/05/2020
Y=2020 M=05 D=19 formatted with "YYYY-MM-DD" -> 2020-05-140

An easily spotted bug

What?!?

Note the weird day numbers that are way greater than 31, even though the longest months in the year only have 31 days.

This should get you scrambling back to the documentation, or at least to your favourite search engine, where a cursory glance will reveal that the abbreviation DD actually means day of the year rather than day of the month.

So DD and dd only produce the same answer in January, after which the day of the year goes to 32 while the day of the month resets to 01 for the first of February. (To be clear, on New Year’ Eve – the last day of the year, 31 December – the day of the year is 365, or 366 in a leap year, while the day of the month is 31.)

In other words, even cursory testing of dates outside January will show up this format-string bug, so not many people make it.

What you want is the format string dd, as follows:

Y=2020 M=05 D=17 formatted with "MM/dd/YYYY" -> 05/17/2020
Y=2020 M=05 D=18 formatted with "dd/MM/YYYY" -> 18/05/2020
Y=2020 M=05 D=19 formatted with "YYYY-MM-dd" -> 2020-05-19

A bug that’s harder to spot

Except that you’re still wrong, because YYYY doesn’t mean “the Christian-era year in four digits”.

That’s denoted, in the Java library (and other full-fat date libraries, too) as the lower-case text string yyyy.

In contrast, YYYY denotes what’s known as the week-based year, something that accountants rely on to avoid splitting weeks – and thus the company payroll – between two different years.

The week-based year number and the Christian-era year number are almost always the same, so it’s easy to look at a few outputs from Java’s DateTimeFormatter module and assume that they are always the same…

…but you’d be dangerously wrong.

Inconveniently for farmers, priests, astronomers and businesspeople alike, the solar year doesn’t divide exactly into days, and therefore doesn’t divide neatly into weeks or months. (The lunar month isn’t commensurate with the solar year, either, to make things even more complicated.)

As every bookkeeper knows, there aren’t exactly 52 weeks in a year, because there are always one or two days left over at the end.

That’s a consequence of the fact that there are 365 (or 366) days in a year (or a leap year); that there are 7 days in a week; and that 365/7 = 52 remainder 1 (or 366/7 = 52 remainder 2).

For accounting convenience, therefore, it’s conventional to treat some years as having 52 full weeks, and others as having 53 weeks, which keeps things like weekly revenue plans and weekly payroll evened out in the long run.

In other words, in some years, “payroll week 01” actually starts just before New Year’s Day; in other years, it doesn’t start until a few days into the first week of the New Year.

There’s a standard for that

There’s a standard for that, defined in the ISO-8601 calendar system, described in Java’s documentation as “the modern civil calendar system used today in most of the world.”

ISO-8601 makes some assumptions, including that:

  • The first day of every week is Monday.
  • If a week is split at the end of the year then it is assigned to the year in which more that half of the days of that week occur.

The second assumption seems reasonable, because it means that you always have more payroll days in the correct year than in the wrong one.

For 2015, for example, the there were four days left over after week 52, so the first three days of 2016 were “sucked back” into the payroll year 2015:

Sun 2015-12-27  -> Payroll week 52 of 2015

Mon 2015-12-28  -> Payroll week 53 of 2015 
Tue 2015-12-29  -> Payroll week 53 of 2015 
Wed 2015-12-30  -> Payroll week 53 of 2015
Thu 2015-12-31  -> Payroll week 53 of 2015 
-------------NEW YEAR---------------------
Fri 2016-01-01  -> Payroll week 53 of 2015
Sat 2016-01-02  -> Payroll week 53 of 2015
Sun 2016-01-03  -> Payroll week 53 of 2015

Mon 2016-01-04  -> Payroll week 01 of 2016

But in 2025, it’s the other way round, with just three days left over at the end of 2025 that get “shoved forwards” into the payroll year 2026:

Sun 2025-12-28  -> Payroll week 52 of 2025 
 
Mon 2025-12-29  -> Payroll week 01 of 2026 
Tue 2025-12-30  -> Payroll week 01 of 2026 
Wed 2025-12-31  -> Payroll week 01 of 2026 
-------------NEW YEAR---------------------
Thu 2026-01-01  -> Payroll week 01 of 2026 
Fri 2026-01-02  -> Payroll week 01 of 2026 
Sat 2026-01-03  -> Payroll week 01 of 2026 
Sun 2026-01-04  -> Payroll week 01 of 2026
 
Mon 2026-01-05  -> Payroll week 02 of 2026 

Big date bug ahead!

Can you see where this is going?

If you’ve got a date format string like MM/dd/YYYY or YYYY-MM-dd at any point in any software in which you are using ISO-8601 date formatting libraries…

…you will inevitably encounter bugs that print out dates with the wrong year, either at the end of one year or the start of the next, except in years when New Year’s Day is on a Monday.

(When 31 December is on a Sunday and 01 January is on a Monday, the ISO-8601 “week splitting” process works neatly, with 0 days at the end of the year left over.)

If you use YYYY where you should have written yyyy, your dates will regularly but infrequently be wrong, so your code will make mistakes, even though you might not easily notice them.

Here are the dud dates you’d see for 2018:

Y=2018 M=12 D=30 formatted with "YYYY-MM-dd" -> 2018-12-30  +correct+
Y=2018 M=12 D=31 formatted with "YYYY-MM-dd" -> 2019-12-31  *WRONG* (one year ahead)
-------------------------------NEW YEAR------------------------------
Y=2019 M=01 D=01 formatted with "YYYY-MM-dd" -> 2019-01-01  +correct+

For 2019:

Y=2019 M=12 D=28 formatted with "YYYY/MM/dd" -> 2019/12/28  +correct+
Y=2019 M=12 D=29 formatted with "YYYY-MM-dd" -> 2019-12-29  *WRONG* (one year ahead)
Y=2019 M=12 D=30 formatted with "YYYY-MM-dd" -> 2020-12-30  *WRONG* (one year ahead)
Y=2019 M=12 D=31 formatted with "YYYY-MM-dd" -> 2020-12-31  *WRONG* (one year ahead)
-------------------------------NEW YEAR------------------------------
Y=2020 M=01 D=01 formatted with "YYYY-MM-dd" -> 2020-01-01  +correct+

For 2020:

Y=2020 M=12 D=31 formatted with "YYYY-MM-dd" -> 2020-12-31  +correct+
-------------------------------NEW YEAR------------------------------
Y=2021 M=01 D=01 formatted with "YYYY-MM-dd" -> 2020-01-01  *WRONG* (one year behind)
Y=2021 M=01 D=02 formatted with "YYYY-MM-dd" -> 2020-01-02  *WRONG* (one year behind)
Y=2021 M=01 D=03 formatted with "YYYY-MM-dd" -> 2020-01-03  *WRONG* (one year behind)
Y=2021 M=01 D=04 formatted with "YYYY-MM-dd" -> 2021-01-04  +correct+

If the Twitter thread started by @NmVAson is anything to go by, quite a few programmers still seem to be making this sort of mistake, which implies that they’re not testing their code very well.

As we mentioned above, writing DD by mistake instead of dd seems to be an unusual bug, presumably because the error shows up for about 85% of the year, and stands out dramatically for more than 70% of the year thanks to the three-digit days.

Admittedly, writing YYYY by mistake instead of yyyy produces bugs on fewer than 1% of the dates in a year, and not at all in one year every seven, but even with a lesss-than-1% error rate, there’s not really any excuse for failing to spot that you have made this blunder.

You could probably excuse not catching a one-in-232 error as bad luck; you might even get away with “bad luck” to excuse an error rate of one-in-a-million…

… but at 1% (especially when those one-percenters are right at the proverbial year-end), you really ought not to be letting this sort of bug escape your notice.

What to do?

If you are a programmer or a project manager reponsible for code that handles dates – and anything that does any sort of logging almost certainly needs to do just that – then please make sure that you:

  • Don’t make assumptions. Just because upper-case YYYY denotes the calendar year in some places doesn’t mean it always does.
  • Read the full manual, or RTFM for short. Sadly, TFM for ISO-8601 is extremely complicated, but that should be your problem, not your users’ problem – with power comes responsibility.
  • Review your code properly. Remember, the reviewer needs to RTFM as well.
  • Test your code thoroughly. Anyone with ISO-8601 YYYY bugs really doesn’t have a good and varied test set, given that this bug shows up around the end of six years in every seven.

We take calendars for granted, but their design and use of is a mixture of art and science that has been going on for millennia, and still requires great care and attention.

Especially in software!