Maturity, Scale, and Speed Driving Leading ML/AI Vendors

Written by ETR Research | Jun 16, 2023 12:30:00 PM

ETR Observatory for Machine Learning / Artificial Intelligence Tools

The ML/AI sector today tracks vendors targeting organizations across the full spectrum of data science maturity. Organizations with more mature programs largely built data science teams that embraced a variety of tools, most notably open-source programming languages like R and Python and open-source tools like Jupyter Notebook and TensorFlow. But as programs evolve, reproducibility and reuse of ML models become important, as do the governance, data integration, and monitoring activities that are part of end-to-end MLOps frameworks. Organizations that have more recently begun their data science journeys may leapfrog these phases thanks to a variety of turnkey, business-user-friendly solutions from vendors focused on "democratizing" data science. No matter the stage of maturity, however, there is no denying the impact of generative AI's recent entry into the zeitgeist via OpenAI's popular ChatGPT tool. Most organizations now seem to be at least exploring possible ML/AI use cases, and vendors in a variety of sectors – from security to RPA to enterprise applications – are weaving AI capabilities into their core product offerings.

The ML/AI market reflects this diversity of maturity and use cases, with classes of vendors that speak to the needs of different organizations. Large public cloud platforms like Microsoft, AWS, and Google offer full MLOps capabilities for enterprise-scale data science programs, competing alongside other popular end-to-end offerings from Oracle, IBM Watson, and Domino, as well as open-source packages like TensorFlow and Anaconda. Another class of tools, including DataRobot, C3.ai, H2O.ai, ElectrifAi, and Hugging Face, aims to simplify data science work with business-user-friendly products, such as pre-trained ML models, that speed time to business value. Still others have broadened their appeal as ML/AI platforms by focusing earlier in the data and analytics pipeline, such as Databricks with its popular data lakehouse paradigm for data management or Dataiku with its finesse as a data preparation tool. A final class of ML/AI vendors occupies niche use cases and specializations, such as Anyscale's excellence in scalability, Labelbox's specialty in labeling multimedia data for ML model training, and Character.ai's inventive capabilities for creating personas for conversational AI and chatbots. Across these many vendors, spending intentions and utilization vary across enterprises, with the more robust MLOps offerings occupying the leading positions and niche applications trailing.

Positioning for the ETR Observatory on ML/AI was determined by ETR’s two core, syndicated surveys. Full methodology and graphic explanation are available on our About the ETR Observatory page.

This ETR Observatory report examines the vendors within a subsector grouping by triangulating data from ETR’s Technology Spending Intentions Survey (TSIS), Emerging Technology Survey (ETS), commentary from ETR Insights interviews with IT decision makers (ITDM) from the ETR Community, and proprietary industry analysis by our research staff.

TSIS data measures spending velocity on a vendor or product based on ETR’s proprietary measures of Net Score and Pervasion. ETR Insights interviews provide qualitative context and vendor evaluation to complement quantitative data. This specific report encompasses ML/AI vendors tracked within the TSIS, and smaller, private players that are tracked in our Emerging Technology Survey (ETS).

ML/AI Vendors Offer Mature Organizations Scale, Management, and Quality

As ML/AI matures and more enterprises invest in large-scale use cases for the technology, the focus for ML/AI has shifted toward quality, scale, management, and speed. Early proofs of concept become enterprise-wide ML/AI deployments baked into day-to-day operations only with investment in better ML workflows (MLOps), faster computing, and bigger and more reliable data sets. Vendors in the ML/AI market seem to be focused either on delivering excellence in a niche, such as ease of creating model training data; delivering convenience with easy off-the-shelf applications and pre-trained models that lead to quick business gains; or providing full, end-to-end MLOps platforms. ETR’s TSIS data show all sectors declining in spend year-over-year, but ML/AI remains relatively strong, with the second-highest sector Net Score at 42%, behind only the Container Orchestration sector at 45%.

The roots of ML/AI remain firmly in an open-source ethos, and many data scientists prefer open-source tools and free programming languages to do their work. The enterprise ML/AI market is a complicated one, then, with nearly every tool providing integration and support for languages such as Python and platforms such as Jupyter Notebook at minimum. Some of the vendors tracked in ETR’s ML/AI sector are indeed open-source tools themselves that have expanded to offer paid, premium-level services like managed infrastructure alongside their popular free products.

Compared to the fates of other sectors, the ML/AI sector does not appear to be quite as affected by the gravity of the big public cloud platforms and organizational pushes to consolidate onto a single vendor. In the video conferencing and productivity applications sectors, for instance, it is increasingly common to see organizations moving to align their choices to the public cloud platform in which they have made large-scale investment, such as an organization switching to Teams and Project when they commit to a broad Microsoft license. In the ML/AI sector, however, data science teams still seem to have a lot of latitude and independence to experiment and to choose the environment that works best for their data science needs. An organization that is largely a Microsoft shop, for example, may have robust ML/AI activity in another platform, such as Google or AWS. Time will tell whether consolidation pressures will extend to the ML/AI realm, but as data science programs mature and establish tighter integrations with an enterprise’s security, data and analytics, and infrastructure array, we will likely see more concerted effort to consolidate ML/AI and align to the organization’s dominant public cloud platform.

Figure 1. Microsoft is the stand-out leader in the ML/AI sector in both Net Score and Pervasion. The other two major public cloud players, AWS and Google, follow behind in Pervasion, with AWS holding the second-highest Net Score. Databricks is third in Net Score and fourth in Pervasion, and a cluster of other vendors hover behind. SparkCognition, OpenText Magellan, and Oracle have just single-digit Net Scores, and IBM Watson is the only vendor in the sector with a negative Net Score.

I. Generative AI’s Moment

In late 2022, OpenAI made headlines when it made ChatGPT widely available: an interface through which users could converse with a kind of chatbot to produce sophisticated AI-generated responses. Though the underlying technology had been in development for several years, ChatGPT caught the attention of the press, business, and popular culture in a way no previous generative AI or large language model (LLM) tool had. ChatGPT has been called a "Sputnik moment" for launching a new AI race, and pundits have been quick to polarize into the two camps we typically see whenever a new technology breaks onto the scene: those who see the technology as a revolutionary force for utopian transformation, and those who see it as a serious threat to the established order and call for regulation. In a recent ETR Insights interview, the IT manager for a midsize municipal government remarked that this enthusiasm will ultimately be good for AI in general: "One of the good things about OpenAI, ChatGPT, and things like that is the increase of public awareness. People that I wouldn't expect to be asking me about AI are now asking about AI […] which is great because it gives everyone a chance to have those types of discussions and potential use cases." Just a few months in, many companies are already weaving ChatGPT and similar products into their enterprise tech offerings, such as Microsoft with its $10 billion investment in OpenAI and its rollout of OpenAI-powered capabilities in its Microsoft 365 products under the Copilot brand.

As with any new technology, the hype around generative AI and LLMs will eventually give way to more evenhanded discourse and practical achievements at scale, tempered by rational controls. Until then, however, the hype for generative AI in general, and for OpenAI in particular, remains high. ETR added OpenAI to the February 2023 ETS, and it immediately posted the highest evaluation rate in the entire survey. It remained on top in the May 2023 survey with a whopping 46% evaluation rate, a metric that combines plans to evaluate with current evaluations. Despite these high evaluation rates, most organizations have not yet moved to actually using generative AI tools, suggesting that clear business use cases are not yet evident for many. In the April 2023 Macro Views survey, ETR asked respondents which business use cases they were considering for generative AI and LLM tools. More than half indicated they were not yet even evaluating these tools, but among those considering generative AI, customer support, text and data summarization, and code generation and documentation were the most common uses.

II. ML/AI at Enterprise Scale with Robust MLOps Capabilities

As an organization shifts from experimentation with its first data science projects to full-scale deployment of ML/AI, the focus becomes ML model management and reuse, monitoring, governance, and ultimately automation. Many of the vendors that provide these capabilities – collectively called MLOps, which draws on DevOps principles of continuous delivery – lead ETR's spending intentions data, demonstrating deepening investment in complete and scalable platforms. The big three cloud platforms offer these complete capabilities in the form of Microsoft's Azure Machine Learning, Google's Vertex AI, and Amazon SageMaker, and these platforms are striving toward greater ease of use and accessibility. The VP of Business Intelligence and Analytics for a midsize financial services enterprise remarked in a recent ETR Insights interview that he liked Azure ML because it was "self-service" and did not require "a DevOps competency" to navigate. He said that with "Azure, I was able to get my data scientists, plug them in, and just let them start working with the team, and we were able to go from training to deployment in a couple of weeks." According to TSIS data, Microsoft, AWS, and Google have the highest Pervasion in the ML/AI sector. Microsoft's Net Score looms large in the sector at 67%, followed by AWS at 55% and Google at 47%. Large legacy vendors like Oracle and IBM also offer full MLOps capabilities through Oracle Machine Learning and IBM Watson Machine Learning, respectively. Both have posted low Net Scores in recent TSIS cycles and have Pervasion under 20%. Oracle has slowly gained ground in Pervasion in recent surveys while holding relatively steady at a single-digit Net Score, whereas IBM Watson's Net Score has dropped considerably into negative territory and its Pervasion is contracting. Some smaller vendors, such as Domino, focus on enabling robust MLOps and collaboration without organizations having to worry about managing infrastructure. Domino's Net Sentiment has increased year-over-year in the ETS, but it remains low in absolute terms at just 8% as of the May 2023 survey.

Open-source platforms Anaconda and TensorFlow support ML model deployment and end-to-end management through both free licenses and paid tiers with added support and storage. Anaconda provides a way to manage versions of Python and its packages and to deploy projects, while TensorFlow offers a more complete MLOps tool set but is best known for training deep learning models. In a recent ETR Insights interview, the VP and Chief Data Architect for a large consumer goods company noted his organization uses platforms like Google and Databricks "for a certain class of problems," but will turn to "more generic tools like Python and TensorFlow to build more complicated regression models." In another interview, the CIO and CISO for a nonprofit medical research institute added that TensorFlow is "a solid name" with a "great community ecosystem," an important attribute for the success of any open-source project. May 2023 ETS data show TensorFlow and Anaconda with the third- and fourth-highest Net Sentiment in the ML/AI sector, trailing stand-out OpenAI as well as Databricks. TensorFlow's Net Sentiment declined four percentage points from May 2022 levels to 34%, while Anaconda's has held steady at 28% for multiple surveys.

III. Democratizing Data Science to Speed Time to Business Value

Several ML/AI companies have tried to simplify the complexity of data science and offer shortcuts for organizations to see business value in data science programs quickly. Often these vendors use terms like “democratizing data science” in their marketing materials, and this view of democratization might include low-code interfaces for building, training, and deploying models or ready-to-use pre-trained ML models fit for a variety of common business use cases.

H2O.ai is one such vendor, focusing on ML applications accessible to business users who lack expertise in model tuning, model deployment, or infrastructure. Similarly, DataRobot offers automated machine learning, or AutoML, an approach that automates steps such as feature engineering, model selection, and hyperparameter tuning so that models can be fit to clear business problems with little manual effort. C3.ai and Dataiku have similar offerings, with C3.ai focused on speed to business value and Dataiku known for its finesse as a data preparation tool and for easing collaboration between technical and business users. In a recent ETR Insights interview, the Head of Data Science and Analytics for a large healthcare/manufacturing enterprise described all four of these tools – H2O.ai, DataRobot, Dataiku, and C3.ai – as "kind of a fun thing" to explore in proofs of concept, primarily because of their accessibility to business users. However, he noted that "our purists – our data scientists in the company who I [call] purists – don't want to use these tools. 'I want to use my Python and R, and all my open-source capabilities to build the models.'" Even so, he said the organization's leadership, which wanted shareability and a common platform that everyone could understand and learn from, ultimately pushed for alignment on one of these easier-to-use platforms. In another interview, a data scientist for a large financial institution pointed out that the "positive things" about this class of tools are that "if the use cases are relatively straightforward, all of these tools are good," adding that they offer "opportunities to get quick models and results." However, he noted the "critical challenge in terms of the work product is that it is very difficult to customize because it's a black box." The trade-off with tools accessible to non-technical users is that it is often difficult to gain a technical understanding of what the system is producing, which limits how far the results can be customized.

A CIO shares his thoughts about names in the ML/AI sector. Listen to which vendors he calls "legacy names" and what he says about "next-movers".

Indeed, mature data management and preparation discipline is a key feature in enterprise-scale ML/AI programs, and this explains why Databricks continues to garner strong spending intentions and rave reviews from ETR's community of IT decision makers. Databricks' data lakehouse architecture, which blends the strengths of the data warehousing and data lake paradigms into a practical all-in-one data management solution, has brought the company prominent standing not just as an ML/AI platform but also as a data warehousing alternative for more run-of-the-mill business analytics use cases. In an ETR Insights interview, the VP and Principal Software Engineer for a large IT services enterprise said he expects Databricks "to stay at the number one spot for whatever level of ML/AI" an organization needs or "whatever data processing tasks," noting that "Databricks seems to be the way to go and the way to run your models for any of your analysis." He added that he "sees them getting bigger and bigger in the space" and taking on Snowflake and the big three cloud platforms in data warehousing too.

Finally in this class of ML/AI tools are vendors like ElectrifAi and Hugging Face. ElectrifAi’s value proposition focuses on pre-built ML and natural language processing (NLP) models ready for deployment in popular business use cases. Hugging Face, on the other hand, showcases the power of its community and the wealth of resources community members are willing to share with each other, from fully trained models to data sets. In an ETR Insights interview, a data scientist from a large financial services enterprise said “Hugging Face is a must in NLP. Their models, the pretrained models, are extremely helpful. An amazing product based on [a technique for NLP pre-training] BERT, for mostly NLP purposes in using neural networks.” With both ElectrifAi and Hugging Face, the aim is for businesses not to have to reinvent the wheel when it comes to ML/AI and realize quicker time to business value.

This Insights Interview features the VP of Business Intelligence and Analytics, who has centralized all data under Snowflake and leverages Azure for Infrastructure-as-a-Service and AI/ML. Watch to hear his feedback on tools such as Azure, Databricks, Dataiku, and ChatGPT.

IV. Tools with Unique Value

A cluster of other ML/AI vendors can be categorized by the unique value or specialized niche use case they offer to the broader ML/AI landscape. Anyscale, which emerged from a university lab, focuses on scalability, helping organizations make the leap from development to sustainable, enterprise-level ML/AI in production. Despite the similar name, it is not to be confused with Scale AI, a vendor focused on providing high-quality training data for ML models and AI applications. Further into a niche use case is Labelbox, a collaborative platform for labeling training data in a variety of formats, including text, images, and video. Lastly, Character.ai allows users to develop their own chatbot characters and personas and hold full conversations with these characters, powered by generative AI. Still in its early stages, the tool could have a number of uses, from customer service chatbots to digital assistants to therapy and counseling.

Conclusion: ML/AI Tools Suited to Different Organizational Needs

As with any enterprise IT tool market, ML/AI tools have evolved to fit the needs of a variety of organizations that are maturing their advanced analytics initiatives at different rates. What is clear, however, is that many organizations, and especially larger enterprises, have moved past early exploration in ML/AI to widespread enterprise-scale adoption, which requires robust end-to-end data architectures and sophisticated management and governance programs. Smaller organizations, or those just now dipping their toes into data science, are finding early value in free, open-source offerings or in tools with pre-built, pre-trained ML models ready to accelerate time to business value. And with the broad awareness of generative AI technology in just the last several months, vendors and organizations alike are scrambling to imagine new possibilities for leveraging the power of LLMs.

To respond to these organizational stages, the ML/AI vendor landscape is dotted with big players offering full MLOps capabilities for mature data science operations, with large public cloud players like Microsoft, AWS, and Google leading the way alongside open-source stalwarts TensorFlow and Anaconda and popular open tools and languages like Jupyter Notebook, Python, and R. Other vendors have built reputations for strength in particular areas, like Databricks and Dataiku, which offer sophisticated data management setups that feed data science use cases as well as more line-of-business-focused reporting and self-service business intelligence use cases. A separate set of vendors aims to democratize data science with off-the-shelf, turnkey ML solutions appealing to business executives and less technically savvy business users. Yet another group of vendors remains in a niche, specializing in areas like training data labeling, chatbot personas, or rapid scaling that specific industries may find more appealing at the moment.
