Is Deep Reinforcement Learning the Next Phase of AI Adoption?

Over the last few years, you may have read about AI making significant progress in the world of games. Research divisions from various universities and companies have created AI that can compete with (and often beat) grandmaster-level players at Go, poker, and the video game Dota 2. For most, these victories in the arena of games might seem impressive, yet not of much practical value. It might seem challenging to take skill at video games and transfer it to success in business.

However, after a few years of fine-tuning, it seems that the time has come for video gaming AI to make its impression on the business world. This AI, powered by what’s known as Deep Reinforcement Learning, or Deep RL, can now simulate corporate business units under real-world conditions and plan strategies for a range of market scenarios. The question is whether these simulations are faithful enough to help businesses out-compete when markets turn difficult.


What is Deep Reinforcement Learning?

Reinforcement Learning is the process of learning through trial and error as applied to artificial intelligence. Deep Learning, meanwhile, is the practice of training multi-layered artificial neural networks, which can be used with all three major machine learning paradigms: reinforcement learning, supervised learning, and unsupervised learning.

Deep Reinforcement Learning therefore combines the two: a neural network learns via trial and error, adjusting its behavior according to the rewards it receives. The resulting algorithms are well suited to recognizing patterns in large, unstructured environments.

So, how does Deep RL work when it’s applied to board games and video games? If you’re trying to teach a model a game like chess, you create a digital chessboard with pieces that obey the rules of chess, and you give the algorithm a single goal: defeat the opposing player by trapping their king. That’s it. The model is given nothing else. Any winning chess strategies, such as blitzkrieg or the queen’s gambit, are ones the model must derive on its own from first principles. To encourage the development of these strategies, the algorithm is given a reward every time it wins a game and a greater reward every time it wins a difficult game.
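To make the reward loop concrete, here is a minimal sketch in Python. It is not how a chess or Go engine is built: it assumes a much smaller game (Nim, where players alternately take one to three stones and whoever takes the last stone wins), a randomly playing opponent, and a plain lookup table standing in for the neural network that a Deep RL system would use. The function names, reward values, and settings are all illustrative assumptions.

```python
import random

ACTIONS = (1, 2, 3)  # stones a player may remove on one turn


def legal_actions(pile):
    """Moves available when `pile` stones are left."""
    return [a for a in ACTIONS if a <= pile]


def train_nim_agent(episodes=20000, max_pile=10, epsilon=0.2, seed=0):
    """Tabular Q-learning against a random opponent (assumed toy setup)."""
    rng = random.Random(seed)
    q = {(p, a): 0.0 for p in range(1, max_pile + 1) for a in legal_actions(p)}
    visits = dict.fromkeys(q, 0)

    for _ in range(episodes):
        pile = rng.randint(1, max_pile)
        while pile > 0:
            moves = legal_actions(pile)
            # Trial and error: sometimes try a random move, otherwise
            # play the move with the best estimated value so far.
            if rng.random() < epsilon:
                action = rng.choice(moves)
            else:
                action = max(moves, key=lambda a: q[(pile, a)])

            remaining = pile - action
            if remaining == 0:
                reward = 1.0  # the agent took the last stone: a win
            else:
                remaining -= rng.choice(legal_actions(remaining))
                # If the opponent took the last stone, the agent lost.
                reward = -1.0 if remaining == 0 else 0.0

            # Value of the best follow-up move (0.0 once the game is over).
            future = max(
                (q[(remaining, a)] for a in legal_actions(remaining)),
                default=0.0,
            )
            key = (pile, action)
            visits[key] += 1
            # Step size 1/n keeps a running average of the observed targets.
            q[key] += (reward + future - q[key]) / visits[key]
            pile = remaining
    return q


def greedy_policy(q):
    """Best known move for every pile size in the table."""
    piles = sorted({p for p, _ in q})
    return {p: max(legal_actions(p), key=lambda a: q[(p, a)]) for p in piles}


if __name__ == "__main__":
    print(greedy_policy(train_nim_agent()))
```

Nobody tells the agent the classic Nim rule of leaving the opponent a multiple of four stones. With enough games, the table should come to favor exactly those moves, purely because they are the ones that keep getting rewarded.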

The advantage of starting from a blank slate is that the AI does not depend on any previous mode of thinking, allowing it to develop completely novel strategies. AlphaGo, DeepMind’s Go-playing program, came up with moves that surprised human experts, and its successor AlphaGo Zero, which learned purely by playing against itself, went on to surpass it.


Adapting a Game-Winning AI to the Business Environment

Using Deep RL, AI researchers can create algorithms that generate previously unseen successful strategies using familiar elements. While Deep RL was once used mainly to simulate and win games of skill, it is slowly being adapted to the business environment.

The first challenge has been simulating the multidimensional corporate environment. For all their apparent complexity, game boards have relatively simple rules to abstract into a simulation. Businesses are vastly more complex, so the initial attempts to apply Deep RL have focused on modeling simpler components, such as manufacturing processes. As the process has evolved, simulating business units has become a discipline in its own right, known as creating “digital twins.”

By creating digital twins of process-driven business units, Deep RL models can optimize them in the same way that they optimize game play, creating unexpected strategies that allow businesses to find new ways to compete. Not only can these models simulate and optimize business units, but they can also simulate ways in which the business could react to changing market conditions. For example, the algorithms could simulate ways to remain functional amidst a market crash, a natural disaster, or a renewed trade war. This allows companies to prepare contingencies for adverse and unexpected conditions.
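As a rough illustration of the idea, the sketch below builds a deliberately tiny "digital twin" of a business unit that orders perishable stock each day, then tests candidate strategies under two market scenarios. Everything in it is an assumption made for the example: the prices, the demand ranges, and the scenario names are hypothetical, and a simple search over a handful of order quantities stands in for the Deep RL model that would explore a real twin.

```python
import random

PRICE, UNIT_COST = 10.0, 4.0  # hypothetical selling price and cost per unit

# Hypothetical market scenarios: (lowest, highest) daily demand in units.
SCENARIOS = {"normal": (80, 120), "crash": (30, 70)}


def simulate_profit(order_qty, scenario, days=365, seed=0):
    """Toy twin: average daily profit when unsold stock is thrown away."""
    low, high = SCENARIOS[scenario]
    # The same seed for every candidate means each strategy faces the
    # same simulated demand, so the comparison between them is fair.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(days):
        demand = rng.randint(low, high)
        total += PRICE * min(order_qty, demand) - UNIT_COST * order_qty
    return total / days


def best_order_quantity(scenario, candidates=range(0, 151, 10)):
    """Run every candidate strategy through the twin; keep the best one."""
    return max(candidates, key=lambda qty: simulate_profit(qty, scenario))


if __name__ == "__main__":
    for name in SCENARIOS:
        print(name, best_order_quantity(name))
```

The point of the sketch is the workflow rather than the numbers: the same simulated unit is replayed under a normal market and a crash, and the strategy that works best in one is not the one that works best in the other.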


What are the Caveats of Deep RL?

As always, any seemingly miraculous new AI implementation should be taken with a large grain of salt. Here, there’s nothing wrong with the tried-and-tested Deep RL approach; the potential downfall comes from the digital twin. Digital twin simulations are new to the market, and if the simulation is flawed, the algorithm will make unsound predictions: garbage in, garbage out. Because the digital twin is the weak link in the chain, it should be thoroughly vetted before it’s used to train a model.
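One simple way to vet a twin, sketched below under stated assumptions, is a backtest: feed the simulation a period for which real results are already known and measure how far its output drifts from what actually happened. The function names and the 10% tolerance are hypothetical choices for illustration, not an industry standard, and a real validation would look at far more than a single error figure.

```python
def mean_absolute_percentage_error(observed, simulated):
    """Average relative gap between real results and the twin's output."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("need two non-empty sequences of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("observed values must be non-zero")
    gaps = (abs(o - s) / abs(o) for o, s in zip(observed, simulated))
    return sum(gaps) / len(observed)


def twin_passes_backtest(observed, simulated, max_error=0.10):
    """True if the twin tracks history within the assumed tolerance."""
    return mean_absolute_percentage_error(observed, simulated) <= max_error


if __name__ == "__main__":
    # Made-up weekly output figures, for illustration only.
    actual = [100, 200, 400]
    twin = [105, 190, 400]
    print(twin_passes_backtest(actual, twin))
```

A twin that fails a check like this should be fixed before any model is trained on it, since the model can only be as sound as the simulation it learns from.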