Stage 1 - Initial
In the first stage, the goal is to foster collaboration between Data Scientists.
Stage 0 is Data Scientists working in isolation on their own machines. Stage 1 is having a shared environment in which to collaborate.
What this shared environment looks like will depend on your needs and what you're creating.
For smaller teams, a JupyterLab server may be all you need to share experimentation. For teams with lots of models, you might want to leverage something like AWS SageMaker Studio for better organisation.
While deployments can happen at this stage, they are usually manual and experimental. The primary goal of this stage is to establish a culture of collaboration and communication that aligns the team around focused outputs.
Stage 2 - Repeatable
The next stage begins to speed up the development cycle by introducing automated workflows.
In Stage 1, a team may be sharing code, but they won't be orchestrating workflows to preprocess data and train models.
Stage 2 introduces tools such as Apache Airflow or SageMaker Pipelines to make preprocessing & training hands-off.
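As a minimal sketch of what a hands-off workflow might look like in Apache Airflow, consider the DAG below. The two task functions are hypothetical placeholders for your own preprocessing and training logic:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def preprocess_data():
    # Hypothetical placeholder: load raw data, clean it, and write
    # features to shared storage.
    ...


def train_model():
    # Hypothetical placeholder: read the prepared features, fit a
    # model and save the artifact.
    ...


with DAG(
    dag_id="train_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    preprocess = PythonOperator(task_id="preprocess", python_callable=preprocess_data)
    train = PythonOperator(task_id="train", python_callable=train_model)

    # Training runs only after preprocessing succeeds.
    preprocess >> train
```

Once a pipeline like this is in place, it runs the same way every time, which is exactly the repeatability this stage is after.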
As well as introducing workflows, this stage brings in separate development, staging & production environments. While this is a step towards more reliable deployments, promotion between environments is still a manual process.
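One pattern that supports this is keeping per-environment settings in configuration rather than scattered through code. A rough sketch, where every bucket and endpoint name is made up:

```python
# Hypothetical per-environment settings; all names and values here
# are illustrative placeholders.
ENVIRONMENTS = {
    "dev": {"data_bucket": "ml-data-dev", "endpoint": "churn-model-dev"},
    "staging": {"data_bucket": "ml-data-staging", "endpoint": "churn-model-staging"},
    "prod": {"data_bucket": "ml-data-prod", "endpoint": "churn-model-prod"},
}


def get_config(env: str) -> dict:
    # Fails loudly on a typo rather than silently using the wrong bucket.
    return ENVIRONMENTS[env]
```

With this in place, promoting a pipeline between environments becomes a one-word change rather than a hunt through the codebase.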
This move towards automation lays the groundwork for improved efficiency and repeatability. For business impact, this means models delivered more quickly and safely, increasing buy-in to integrate ML.
Stage 3 - Reliable
Stage 3 looks to improve the guard rails around deployments. Improving safety allows the team to deploy more often, providing more value to the business.
One of the key ways to improve reliability is automated testing: catching issues in preprocessing, training and deployment before they hurt production.
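As a sketch of what these checks can look like, here are two pytest tests for a preprocessing step. The preprocess function is a toy stand-in defined inline so the example is self-contained; in a real project you would import it from your pipeline code:

```python
import pandas as pd


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    # Toy stand-in for a real preprocessing step: drop incomplete rows.
    return raw.dropna()


def test_preprocess_drops_missing_values():
    raw = pd.DataFrame({"age": [25, None, 40], "income": [50_000, 60_000, None]})
    features = preprocess(raw)
    # No missing values should survive preprocessing.
    assert not features.isnull().values.any()


def test_preprocess_keeps_expected_columns():
    raw = pd.DataFrame({"age": [25], "income": [50_000]})
    # The feature schema downstream models rely on must not change.
    assert list(preprocess(raw).columns) == ["age", "income"]
```

Run in CI on every change, tests like these catch schema drift and data bugs long before a model trained on bad features reaches production.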
As well as automated testing, the automatic deployment of approved models between environments is introduced.
Automatic deployment allows the team to focus more on experimentation without fear of breaking production systems.
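To make the idea concrete, here's a rough sketch of promote-on-approval logic. The registry and deployment functions are hypothetical stand-ins for whatever model registry and serving stack you use (SageMaker Model Registry, MLflow, and so on):

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str
    version: int
    status: str  # e.g. "PendingApproval" or "Approved"


def get_latest_version(model_name: str) -> ModelVersion:
    # Hypothetical stand-in: in practice, query your model registry here.
    return ModelVersion(name=model_name, version=3, status="Approved")


def deploy_endpoint(model: ModelVersion, environment: str) -> None:
    # Hypothetical stand-in: in practice, update the serving endpoint here.
    print(f"Deploying {model.name} v{model.version} to {environment}")


def promote_if_approved(model_name: str) -> None:
    latest = get_latest_version(model_name)
    if latest.status != "Approved":
        return  # wait for a human (or an automated gate) to approve it
    # Staging first; promotion to prod follows the same pattern.
    deploy_endpoint(latest, environment="staging")


promote_if_approved("churn-model")
```

The key design choice is that a human approves the model, not the deployment: once a version is marked approved, the machinery takes it the rest of the way.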
This increase in reliability allows the team's capability to grow exponentially. As the risk of failure and its business impact are mitigated, confidence in deploying faster grows with it.
Stage 4 - Scalable
Finally, we reach the pinnacle of MLOps maturity: a fully scalable ML workflow.
At this stage, your workflow can support many teams and manage tens or even hundreds of models.
It's here that the true power of MLOps is realised: an exponential increase in productivity without sacrificing quality. This stage is usually only seen in organisations that have the manpower and infrastructure to support it.
The main goal of this stage is a templated solution that can be deployed instantly to enable an ML team. Whereas in Stage 1 deploying a JupyterLab server is the norm, Stage 4 sees a whole ML system deployed with the same amount of consideration.
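As one illustration, enabling a new team could be as simple as stamping a project out from a shared template. The cookiecutter library below is real, but the template URL and context values are made-up placeholders:

```python
from cookiecutter.main import cookiecutter

# Generate a complete, pre-wired ML project for a new team from the
# organisation's shared template (hypothetical URL and context values).
cookiecutter(
    "https://github.com/your-org/ml-platform-template",
    no_input=True,
    extra_context={
        "team_name": "pricing",
        "project_name": "demand-forecast",
    },
)
```

Everything the earlier stages built by hand — pipelines, environments, tests, deployment automation — arrives pre-wired in the generated project.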
Maturity models aren’t gospel
Looking at this maturity model you might see your team reflected in it.
You might see yourself in one stage or you might see yourself in many stages at once. That’s okay!
Maturity models are a generalisation of how systems evolve. Depending on your specific needs, you may have skipped Stage 1 and gone straight to Stage 2. What's important is that you recognise where your team and systems need to grow, and take action.
Moving from one stage to the next is a complex and gradual process.
Each stage is an iterative step towards achieving a sophisticated, efficient, and robust MLOps pipeline.
With careful planning, continuous learning, and rigorous implementation, you can reach an enviable state of MLOps maturity.