COVID-19 has proven to be one of the greatest black swan events of our time. What does this mean for the future of AI, Machine Learning and Data Science?
Well, maybe not the actual orbit and spin of our planet, but rather the economy and way of life as most of us know it. The COVID-19 pandemic has changed how everyone, from individuals to governments in every corner of the world, operates, and at dizzying speed. It seems almost impossible to believe that in early February, coronavirus was a new buzzword which, for the US population, existed only in news reports from overseas. Now, obviously, it is much more than a headline: it is a threat, both direct and indirect, to every human being and business around the world.
Luckily, as human beings we have a unique ability to display resilience in times like this. Even though our outlooks and beliefs changed on a dime, we were still able to jump from the conference room to the Zoom video meeting without an extra moment of contemplation. This is, of course, how we will make it through this extreme challenge, which seems to be a direct opponent of humanity itself. It’s been amazing to see the world remain generally civilized throughout mass lockdowns, quarantines, and shortages. I believe this is because an enemy like COVID-19 allows us to see that every human is on our team, in contrast to the typical country or business disputes that pit humans against one another.
What may not fare as well as us humans are the prediction and time series models that have become a focus of Artificial Intelligence and Data Science applications over the last 10 years. I’m going to make a bold guess and say:
Every single projection or prediction model for 2020, be it Finance, Sales, Anomaly, Traffic, or even Climate, has failed miserably at this point.
I think anyone who claims theirs did not may be guilty of hindsight bias or data tampering. Hindsight is literally 2020 this time…
Anyone in the Data Science community can understand why models could not predict such a change in dynamic: modern statistical modeling techniques predict based on inferences from past trends. At the same time, most models remained shuttered in a remote cottage with only their knowledge of the current domain-specific data. This means that, unlike us, most models did not get word of a growing pneumonia outbreak in late December which turned out to be a novel virus that would quietly explode into a global pandemic. Unprecedented phenomena like COVID-19 may prove to be the kryptonite of modern Artificial Intelligence approaches that are aimed at predicting the real world.
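To make the point about past trends concrete, here is a minimal sketch of a model that only extrapolates history. Everything in it is an assumption for illustration: the numbers are invented, and a plain least-squares trend line stands in for whatever forecasting model a business might actually run.

```python
def fit_linear_trend(values):
    """Least-squares fit of values against their index; returns (slope, intercept)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    return slope, mean_y - slope * mean_x


def forecast(values, steps):
    """Extend the fitted trend line `steps` points past the end of `values`."""
    slope, intercept = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly figures (made up): two years of steady growth.
history = [100 + 2 * t for t in range(24)]

# Hypothetical post-shock observations (also made up).
actual_next = [60, 55, 58]

predicted_next = forecast(history, 3)
errors = [p - a for p, a in zip(predicted_next, actual_next)]
```

The model keeps drawing the same line it has always drawn. Nothing in `history` carries any hint of the shock, so the gap between `predicted_next` and `actual_next` is large from the very first step.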
Because previously successful models relied on months or years of granular data, they will take quite some time to adjust to the new world dynamic and become useful again. Because of this, many industries are going to be pulling humans back into the forecasting chair that the models had taken from them. It will be important during this time to search for ways to let models continue operating during unforeseen events, while at the same time dealing with this massive anomaly in every dataset before the next training.
Finance is an easy place to see these models fail. As seen in the chart above, stocks had been sitting near all-time highs and were suddenly struck down before even the smartest firms could predict the next move. As discussed above, nearly every earnings and price target prediction was taken over by humans who had been replaced or assisted by algorithms a short time ago. New predictions are now often prefaced with “COVID adjusted…”
Beyond finance, there are many more examples of never-before-seen anomalies. Let’s take a look at a few.
A major example of an anomaly that COVID has caused is the change in air quality. The global shutdown, starting with China, of factories, business travel, and commutes has inevitably caused an abrupt drop in NOx, CO2, and PM output. Without anyone knowing how long things will remain shut down, this will show up as a big dip in any time series dataset forever.
One of the hardest-hit industries is travel. Due to the efforts to restrict the global spread of the virus, most travel has been shut down. This is clearly an anomaly for the industry, given the near bankruptcy that airlines and cruise lines are facing at the moment. Most of these financial breakdowns come down to the fact that no algorithm or human could have predicted a complete shutdown of their cash flow.
As with travel, most governments have forced restaurants to close or at least restricted them to take-out only. This, again, sent a shock through everything from restaurants to reservation and review apps to delivery services.
On a lighter note, items of sudden interest and groupthink buying, as was seen with toilet paper, are funny and interesting changes that move in the other direction in times of massive change. Do you think Charmin’s market analysts had this factored into their models for 2020?
The list of these anomalies continues across nearly every industry that collects data, except maybe the weather. Soon we may see changes in internet activity like never before with the shift to virtual workplaces, along with massive shifts in electrical grid activity. The question is: did these pieces of critical infrastructure have plans in place for this? Because we know that those were yet another set of failed forecasts. We will make it through these wild times soon enough, though, and many industries will settle back to their normal. The next task then will be to find a use for the new time-based datasets, each of which holds its own massive dip or spike lasting for a length of time we still do not know.
One morning every data scientist, analyst, and business leader will wake up and make the journey to their place of work without a mask and without itching for hand sanitizer after each thing they touch. The overcoming of this pandemic is an inference from past pandemics that I know will hold true.
As business gets back to normal, there will be a time when models can be built again in the algorithmic fashion we had become accustomed to. The only problem now is that there will be a massive anomaly that could make up a portion of the dataset too large to simply consider an outlier. The biggest question for AI, ML, and Data Science models going forward is: “What should we do with this period of time where the world changed before bouncing back to normal?”
Two simple options come to mind, but both are littered with inconsistencies. The first and easiest is to cut this section of data out where applicable. This could obviously only work for datasets that track trends, rather than the impact of the underlying data. For climate data this could work (China has almost completely rebounded), but for models such as sales figures it fails. The reason for the failure is that the time which would be cut out holds an entire period of impact on both the consumer and the business, and that impact is meaningful for the next step. This is true even when behavior changed vastly. For example, as seen in toilet paper purchases, any consumer who stocked up will have supply for a longer period than normal, leaving sales below normal for the next timestep. This is a common trend among businesses that have benefitted from hoarding or virtualization.
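As a minimal sketch of the cut-it-out option, assuming the series is stored as (date, value) pairs and that the anomaly window is something we choose by hand. The function name `drop_period`, the dates, and the values are all hypothetical.

```python
from datetime import date


def drop_period(series, start, end):
    """Return only the observations that fall outside [start, end], inclusive."""
    return [(day, value) for day, value in series if not (start <= day <= end)]


# Hypothetical monthly index values, invented for illustration only.
series = [
    (date(2020, 1, 1), 100.0),
    (date(2020, 2, 1), 101.0),
    (date(2020, 3, 1), 62.0),
    (date(2020, 4, 1), 55.0),
    (date(2020, 5, 1), 97.0),
    (date(2020, 6, 1), 99.0),
]

# Assumed anomaly window; picking these bounds is itself a judgment call.
trimmed = drop_period(series, date(2020, 3, 1), date(2020, 4, 30))
```

Note what this does not fix: the series now has a gap that the downstream model must tolerate, and the points after the gap still carry whatever the removed period did to consumers and businesses.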
Another technique is to use adjustment methods similar to those of the finance industry. Because of the scramble so far, these calculations have been heavily reliant on human intuition to basically guess where numbers may have held if the world had not changed. More in-depth statistical efforts can be used, such as inference-based feature engineering. This method can compare pre-COVID projections with the true value at each COVID timestep to statistically project averages across the anomaly period. This could be thought of as smoothing in its own way. Of course, this comes with its own set of downfalls. In the climate example, it would introduce a period of time with more or less artificial data. Every model for the rest of time would be trained on a dataset that contained points that were no more than educated guesses. Modeling this way can throw too much human bias into the model’s supposed indifference.
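One way to picture this kind of adjustment, as a rough sketch rather than an established method: replace the anomaly period with a projection built only from pre-shock data, and keep a flag on every point that was filled in. Here a seasonal-naive projection (repeat the last full season) stands in for the pre-COVID projection; the function names, the quarterly season length, and the numbers are all assumptions.

```python
def seasonal_naive_projection(pre_values, steps, season):
    """Project forward by repeating the last full season of pre-shock data."""
    last_season = pre_values[-season:]
    return [last_season[k % season] for k in range(steps)]


def adjust_anomaly(values, start, end, season):
    """Replace values[start:end] with a projection built only from values[:start].

    Returns (adjusted_values, imputed_flags) so that later models can tell
    real observations apart from educated guesses.
    """
    if start < season:
        raise ValueError("need at least one full season before the anomaly")
    projection = seasonal_naive_projection(values[:start], end - start, season)
    adjusted = list(values)
    adjusted[start:end] = projection
    flags = [start <= i < end for i in range(len(values))]
    return adjusted, flags


# Hypothetical quarterly figures (made up); indices 8 and 9 are the shock.
values = [10, 12, 14, 11, 10, 12, 14, 11, 3, 2, 13, 11]
adjusted, imputed = adjust_anomaly(values, start=8, end=10, season=4)
```

Keeping the `imputed` flags next to the data is a design choice aimed at the downfall described above: the artificial points stay labeled as artificial instead of silently becoming part of the record.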
For a much more in-depth analysis of mathematical methods to work with missing data, check out this article.
I want to preface this final section by saying that most non-time-series “AI” applications should continue forward with no issue. This would include Computer Vision, NLP, and many other closed systems. On the other hand, the real time series models which businesses have become financially dependent on have provided a wake-up call for the Machine Learning and Data Science community. While we as humans snapped into our new dynamic relatively quickly, these applications are more or less useless at this point. I’ve run a few of my experimental stock prediction models, and they still continue to predict a sharp rebound to normal prices the next day.
I don’t blame these models or their creators for this almost funny inconsistency. The models only ever knew the normal of the world, and the creator (i.e., me) could never have guessed the world would stop as it did. The best thing we can do in this time for the statistical modeling world is to think harder about ways to build models as resilient as we are ourselves.
Maybe the best thing we can do with the data discussed above is pull the anomaly period out to create a unique model for stoppages like the one COVID has created. This could create some of the first high-tech data science models of troubled times. At the same time, we can provide systems for our normal models to digest global information the way we do, always ready to switch into crisis-model mode. This is much easier said than done, but for a society that plans to rely on algorithms more and more, it is necessary to see the value in the data we now have along with the hindsight.
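The switching idea could look something like the sketch below. It is only a toy under stated assumptions: both models are deliberately trivial stand-ins, the crisis detector is a simple deviation check against the recent baseline, and `external_alert` is a hypothetical hook for the kind of outside information (news, public health alerts) that we humans picked up on and the models did not.

```python
from statistics import mean, pstdev


def normal_model(history):
    """Stand-in for a model of normal times: mean of the last 12 observations."""
    return mean(history[-12:])


def crisis_model(history):
    """Stand-in for a crisis model: naive forecast that repeats the last value."""
    return history[-1]


def in_crisis(history, baseline_len=12, threshold=3.0):
    """Flag a crisis when the newest point sits far outside the recent baseline.

    Assumes len(history) > baseline_len.
    """
    baseline = history[-(baseline_len + 1):-1]
    spread = pstdev(baseline)
    if spread == 0:
        return history[-1] != baseline[0]
    return abs(history[-1] - mean(baseline)) / spread > threshold


def forecast_next(history, external_alert=False):
    """Route to the crisis model if the data or an outside signal says so."""
    use_crisis = external_alert or in_crisis(history)
    model = crisis_model if use_crisis else normal_model
    return model(history)


# Hypothetical readings (made up): a calm stretch, then the same with a shock.
calm = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100, 100, 101, 100]
shocked = calm[:-1] + [60]
```

With `calm`, the detector stays quiet and the normal model answers; with `shocked`, or with `external_alert=True`, the forecast comes from the crisis model instead. The hard part in practice is everything this sketch skips: building a crisis model worth switching to, and deciding when to switch back.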
I like to find the nuggets of positivity embedded in any struggle. I think that the shocks caused by COVID-19 will lead people, technologies, and businesses to plan for the future much better. It is unfortunate that progress always seems to come with chaos such as wars, infrastructure failures, and pandemics, but the reality is that they give many aspects of human life the nudge to become better. With AI, Machine Learning, and Data Science being the most promising technologies of our time, it is important to see the nudge this shock has given us. It has allowed us to see the human resilience we should work to embed in our systems, so that they can continue to assist us on an even quicker journey to the light at the end of a crisis.