With the linux-5.6 merge window, a project ends that has kept me busy for nearly six years: preventing the “Epochalypse” by changing every single instance of a 32-bit time_t in the kernel to a type that does not roll over on 2038-01-19.
While both John Stultz and I had been thinking about and prototyping partial solutions even earlier, 2014 is the year we started discussing what needed to happen more openly within Linaro and the wider kernel community. In a team effort, John started rewriting the core timekeeping support of the kernel, working his way out, while I would work my way in from the outside, starting with file systems and then system calls and device drivers, with the goal of getting this done by the end of the year.
As chronicled on lwn.net, it turned out to take a bit longer. In order to address over 1000 files referencing time_t, timeval or timespec as of linux-3.15, we recruited help from a number of places.
The Outreachy program was a great resource for getting a lot of simple changes in drivers done, while internship candidates learned about contributing to the mainline kernel. Tina Ruchandani was my first intern and contributed 25 patches for the y2038 work in 2014/2015. For the 2015/2016 round, Deepa Dinamani joined as the second Outreachy intern and ended up implementing some of the most important bits all the way until the end with hundreds of patch submissions.
Within Linaro’s Kernel Working Group, I assigned simple driver conversions to new assignees from member companies. This got them started on contributing to the upstream kernel while getting the conversion done one driver at a time, before they moved on to more review-intensive work in the kernel. Baolin Wang worked on converting real-time clocks and the audio subsystem; Firoz Khan’s first contribution was to rewrite the system call tables across all CPU architectures; and many others contributed to device drivers.
Usually, getting y2038 fixes included was really easy, as maintainers are generally happy to take an obviously correct bugfix that they don’t have to implement themselves. However, some cases turned out to be much more time- and labor-intensive than we had imagined.
Converting the VFS code to use 64-bit inode timestamps took countless rewrites of the same patches, first from me and then from Deepa, who finally succeeded. We wanted to avoid having to do a “flag day” change, which is generally considered too invasive and risks introducing regressions, and we wanted to minimize the changes for existing 64-bit users and for existing 32-bit applications. Doing this step by step, however, turned out to add a lot of complexity as well. In the end, Deepa worked out a process of many non-invasive changes over multiple merge windows, followed by an automated conversion using coccinelle. The same series also fixed unrelated issues in the way some file systems generated their timestamps, which reviewers had complained about.
This is an effect that can be observed a lot in kernel development: when you work on a simple bugfix, there is a good chance that development or review finds a much larger issue that also wants to be addressed, at which point it becomes near impossible to get the simple change merged without also addressing the wider problem. Issues that we addressed along the way include:
With all the VFS and system call changes out of the way during early 2019, the kernel was basically working, but a number of smaller issues still remained. In the summer I set out to make a list of everything that was still missing and revisited patches I had done in the previous years. Instead of creating the list, I ended up writing the remaining ~100 patches: ALSA and V4L2 were still lacking ABI changes, the NFS implementation and a few other file systems still needed changes, and there were still users referencing the time_t type. The resulting branch was basically ready for linux-5.4, and with the usual bug fixes and testing this has now all but made it into the ongoing linux-5.6 merge window. The last patch in the series hides the traditional time_t definition from kernel space and removes all the now-unused helper functions that use it, so that no new references can get merged.
After the time64 system call ABI was finalized in linux-5.1, work on using this in the C libraries got a lot more serious. The release of musl-1.2 is now imminent and will provide time64 for all newly compiled code. Adelie Linux is already migrating to this version and has a list of known issues. I expect the bugs to also get fixed in upstream projects soon. The first preview release of a time64 Adelie Linux is available for testing now. Most other distributions based on musl are likely to do the same conversion over the next months, depending on their release cycles.
For glibc, work is still ongoing; the plan at the moment is to move over to 64-bit time_t as an option in glibc-2.32 later this year. However, the default is still a 32-bit time_t, and as glibc-based distributions tend to have a larger number of packages, there is a very significant effort in rebuilding everything in a coordinated way. Any library that exposes an interface based on time_t must be recompiled along with all applications and other libraries using this interface, so in the end the result is typically a completely incompatible distribution. The Debian “armhf” port for ARMv7 CPUs is an obvious candidate that will have to go through this transition, but I expect most of the other distributions on 32-bit CPUs to stay with 32-bit time_t and then stop support before this becomes a problem.
So far it is looking good for the distro port, as most of the y2038 problems have already been found by the various BSD Unixes that changed over years ago (thanks guys!), so a lot of the remaining problems are either Linux specific, or in applications that have never been ported to anything other than Linux. I expect that once we get into larger scale testing, we will find several sets of problems:
The biggest challenge will be to find and update all the devices that are already being deployed without the necessary bug fixes. The general move to 64-bit hardware even in deeply embedded systems helps ensure that most machines only run into the last set of problems, but 32-bit hardware will be deployed for many years to come, and will increasingly run old software as fewer developers are motivated to work on it.