Maturity, Scale, and Speed Driving Leading ML/AI Vendors

Written by ETR Research | Apr 24, 2024 2:30:00 PM

About This Report

16-minute read • Based on data collected April 2024

This Observatory features the most comprehensive and current end-user and evaluator data and feedback about the ML/AI marketplace. In addition to relying upon ETR’s industry-leading evaluation and spending intentions data, this report also leverages ETR’s proprietary data set: the ETR Market Array. The ETR Market Array for ML/AI was designed specifically to capture usage and evaluation metrics across a wide swath of professionals representing the end user and evaluator buying demographic. The study offers data and analysis around spending trends, vendor usage, return on investment (ROI), churn, product feature rankings, Net Promoter Scores (NPS), and more for the ML/AI players encompassed in this Observatory. This report utilizes some of that market intelligence data; however, the full ML/AI Market Array study is available separately.

While structuring a grouping of disparate vendors with varying functionalities is subjective, the ETR Observatory for ML/AI categorizes vendors primarily by their placement in the Observatory Scope, which breaks down the data-driven plotting of each vendor into four vectors. It is important to remind our readers that all ETR Observatory reports are based solely on evaluator data and feedback, not vendor involvement. This report examines the selection of ML/AI vendors below by triangulating data from ETR’s Market Array for ML/AI Tools, Technology Spending Intentions Survey (TSIS), Emerging Technology Survey (ETS), commentary from ETR Insights Interviews with IT decision makers (ITDMs) from the ETR Community, and industry analysis by our research staff. TSIS data measures spending velocity on a vendor or product based on ETR’s proprietary Net Score and Pervasion measures. ETR Insights interviews provide qualitative context and vendor evaluation to complement the data.

Executive Summary

The ML/AI sector today tracks vendors targeting organizations in the full spectrum of maturity with regard to data science programs. Organizations with more mature programs largely built data science teams that embraced a variety of tools, most certainly open-source programming languages like R and Python and open-source tools like Jupyter Notebook and TensorFlow. But as programs evolve, reproducibility and reuse of ML models become important, as well as all of the governance and data integration activities and monitoring that are part of end-to-end MLOps frameworks. Organizations that have more recently begun their data science journeys may leap-frog these phases given a variety of turnkey, business user-friendly solutions from today’s vendors focused on “democratizing” data science. No matter the stage of maturity, however, there is no denying the impact of generative AI’s recent emergence in the zeitgeist. It seems most organizations are now at least exploring possible use cases for ML/AI, and vendors in a variety of sectors – from security to RPA to enterprise applications – are weaving AI capabilities into their core product offerings.

The ML/AI market reflects this diversity of maturity and use cases, with classes of vendors that speak to the needs of different organizations. Large public cloud platforms like Microsoft, AWS, and Google offer full MLOps capabilities for enterprise-scale data science programs, competing alongside other popular end-to-end offerings like Oracle and IBM, and open-source packages like TensorFlow and Anaconda. Generative AI-focused offerings like Meta Llama, Anthropic’s Claude, and OpenAI’s ChatGPT offer large language models (LLMs) to help accelerate organizations’ generative AI use cases. Another class of vendors, such as DataRobot, C3.ai, H2O.ai, and Hugging Face, aims to simplify the complexity of data science work by offering business-user-friendly products like pre-trained ML models to speed time to business value. Still others have broadened their appeal as ML/AI platforms by focusing earlier in the data and analytics pipeline, such as Databricks with its popular data lakehouse paradigm for data management or Dataiku with its finesse as a data preparation tool. Acknowledging Snowflake’s growing presence in this space, ETR intends to begin tracking the company in our ML/AI sector with the next TSIS period. Snowflake’s positioning in this market will be included in future reports. Across these many ML/AI vendors, we see varying levels of spending and utilization across enterprises, with the more robust MLOps offerings and the generative AI-focused products occupying leading positions.

The Observatory

The plotting of vendors across the Observatory Scope is supported wholly by ETR’s exclusive market intelligence and spending intentions data sets (see Figure 1 above). The Leading vector in this period is populated by OpenAI (ChatGPT), Microsoft Azure Machine Learning, Amazon SageMaker, Meta Llama, Google (Vertex AI), Databricks, Hugging Face, and TensorFlow. ChatGPT leads in overall Presence in the survey, while Microsoft Azure Machine Learning leads in Momentum. Anthropic (Claude) is the sole vendor in the Advancing vector, showing relatively high Momentum but lower Presence than the Leading vector.

IBM Watson, Oracle, and Anaconda sit in the Tracking vector, with high Presence but relatively lower Momentum. Finally, DataRobot, Cohere, H2O.ai, Jasper, Dataiku, and C3.ai occupy the Pursuing vector, with Presence and Momentum that are relatively lower than those of their sector peers. However, the tight range near the center of the vector axis demonstrates the favorable positioning of most players included in this report. This report will break down the four vectors and the vendors in more detail in the following sections.
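
The four vectors described above can be read as quadrants on the Presence and Momentum axes. The following Python sketch illustrates that reading only. The normalized 0-to-1 scores, the 0.5 cut points, and the names VendorPosition and classify_vector are all hypothetical; the report does not state how ETR actually computes or thresholds these measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorPosition:
    """Hypothetical container for one vendor's plotted position."""
    name: str
    presence: float  # assumed normalized score between 0 and 1
    momentum: float  # assumed normalized score between 0 and 1


def classify_vector(vendor, presence_cut=0.5, momentum_cut=0.5):
    """Map a position to one of the four vectors named in the report.

    The cut points are illustrative assumptions, not ETR's thresholds.
    """
    high_presence = vendor.presence >= presence_cut
    high_momentum = vendor.momentum >= momentum_cut
    if high_presence and high_momentum:
        return "Leading"      # high Presence, high Momentum
    if high_momentum:
        return "Advancing"    # high Momentum, lower Presence
    if high_presence:
        return "Tracking"     # high Presence, lower Momentum
    return "Pursuing"         # both relatively lower


if __name__ == "__main__":
    # Placeholder vendors with made-up scores, for illustration only.
    examples = [
        VendorPosition("Vendor A", presence=0.8, momentum=0.9),
        VendorPosition("Vendor B", presence=0.3, momentum=0.7),
        VendorPosition("Vendor C", presence=0.7, momentum=0.2),
        VendorPosition("Vendor D", presence=0.2, momentum=0.1),
    ]
    for vendor in examples:
        print(vendor.name, classify_vector(vendor))
```

Under this reading, a vendor such as Anthropic (Claude), with relatively high Momentum but lower Presence, would fall on the Advancing side of whatever cut points are chosen.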

Spending Intentions

Figure 2. ETR’s Market Array spending Net Scores for ML/AI vendors were derived from a survey of 300 ML/AI users and evaluators.

Figure 2 above shows Market Array Net Score for vendors within the ML/AI marketplace, tracking forward-looking spending trajectory for each vendor’s ML/AI-specific offerings. This differs from ETR’s TSIS, which tracks overall spending projections at the company- and sector-wide levels. The data visualized in this figure will be referenced throughout this Observatory report.

Microsoft Azure Machine Learning holds the highest Net Score in this subsector at 86.1%, driven by the highest portion of Increase spending indications in the survey and a healthy Adoption rate of 16%, with zero users reporting plans to Replace the vendor in the next year. After Microsoft, three generative AI-focused vendors have the next highest Net Scores: OpenAI (84.3%), Anthropic (75.0%), and Meta Llama (70.6%). Cohere, Amazon SageMaker, Google (Vertex AI), Databricks, and Hugging Face follow these top four in Net Score. Each of these vendors shows minimal churn indications combined with elevated positive spending plans.

Toward the middle of the pack are vendors with sizable flat spending indications that still show positive spending plans and minimal negative indications. The Net Scores for H2O.ai (53.3%), Dataiku (50.0%), Jasper (50.0%), and TensorFlow (45.6%) are robust and healthy.

More negative spending plans combined with sizable flat indications and fewer positive spending plans explain lower Net Scores for vendors like DataRobot (41.7%), IBM Watson (40.6%), C3.ai (31.3%), Oracle (20.4%), and Anaconda (18.8%). Though these vendors have the lowest Net Scores relative to their subsector peers in this survey, the Net Scores for DataRobot, IBM Watson, and C3.ai are elevated and in line with spending plans in many other sectors. This speaks to the general health of the ML/AI market at this moment, where many organizations are pursuing investment in these technologies.
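
This report does not spell out how Net Score is calculated. The sketch below assumes the commonly described form: the share of respondents adopting or increasing spend minus the share decreasing or replacing, with flat responses counting only toward the total. The function name net_score and the respondent counts in the example are hypothetical and are not taken from the survey.

```python
def net_score(adoption, increase, flat, decrease, replace):
    """Return a Net Score as a percentage from respondent counts.

    Assumed formula (not confirmed by this report):
        (adoption + increase - decrease - replace) / total * 100
    """
    total = adoption + increase + flat + decrease + replace
    if total <= 0:
        raise ValueError("at least one respondent is required")
    positive = adoption + increase   # new adopters plus spend increases
    negative = decrease + replace    # spend decreases plus replacements
    return 100.0 * (positive - negative) / total


if __name__ == "__main__":
    # Made-up counts for a hypothetical vendor with 100 respondents.
    score = net_score(adoption=16, increase=60, flat=20, decrease=3, replace=1)
    print(f"Net Score: {score:.1f}%")
```

Under this assumption, a large flat share pulls the score toward zero without making it negative, which is consistent with the mid-pack vendors described above, and a vendor with no Replace responses loses nothing on that term.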

I. Generative AI’s Moment

In late 2022, OpenAI made headlines when it made ChatGPT widely available, an interface where users could interact with a search chatbot to produce sophisticated AI-generated responses. Though the underlying technology had been in development for several years, the arrival of ChatGPT caught the attention of the press, business, and popular culture in a way no previous generative AI or LLM tool had. ChatGPT has been called a “Sputnik moment” in how it has launched a new AI race, and pundits have been quick to polarize into the two typical camps we see any time a new technology breaks onto the scene: those seeing the technology as a revolutionary force for utopian transformation, and those seeing it as a serious threat to the established order and calling for regulation. In one ETR Insights interview, the IT manager for a midsize municipal government remarked that this enthusiasm will ultimately be good for AI in general: “One of the good things about ChatGPT is the increase of public awareness. People that I wouldn’t expect to be asking me about AI are now asking about AI […] which is great because it gives everyone a chance to have those types of discussions and potential use cases.” Just a few months in and we already see many companies weaving ChatGPT and similar products into their enterprise tech offerings, such as Microsoft with its $10 billion investment in OpenAI and its rollout of ChatGPT in its Microsoft 365 products under the Copilot branding.

Some are already growing weary of the generative AI hype. One Director of Technology for a small industrials/manufacturing enterprise said that “when ChatGPT really took off, everything in the world was promised with AI. And that is potentially the future, but it’s not quite there. At this point, the amount of dot-AI companies that every decision maker is seeing or asked to invest in, it’s just overkill at this point.” As with any new technology, the hype around generative AI and LLMs will give way to more evenhanded discourse and practical achievements at scale, tempered by rational controls. Until then, however, the hype for generative AI in general and for OpenAI in particular remains high. In the April 2024 Generative AI Drill Down Survey, more than a quarter of respondents said they had generative AI in a full production environment across four major categories of business use cases: code generation and documentation, customer support, writing marketing and sales content, and text and data summarization. As a VP of Information Technology for a large tech enterprise put it, “by far, the number one priority for many organizations” at the moment is “applying ChatGPT-like solutions to all sorts of problems, driving better automation, saving time, and leading to better outcomes.” However, there is cautious optimism around generative AI, with more than a third of respondents in that same Drill Down Survey citing data security, privacy, legal, compliance, and regulatory concerns as top barriers to implementing generative AI.

In the Market Array for ML/AI, respondents cited an innovative technical roadmap as the top product strength for each of the three major generative AI-specific vendors – Meta Llama, Anthropic, and OpenAI. Ease of use was also noted as a top-three product strength for each of these vendors. Respondents were also given the opportunity to freely write in the name of the vendor they see as most innovative, and more than a quarter (27%) indicated OpenAI, which garnered the most write-in support. Microsoft was seen as second-most innovative with 24% of respondents writing the company’s name in (see Figure 3 below).

Most Innovative

Figure 3. ETR’s Market Array tracks the “Most Desired” and “Most Innovative” vendors. The above depicts a small section of the most innovative vendors. The full analysis is available via the Market Array data set.

Despite the love for generative AI products, respondents are still hesitant to commit to lengthy investments in these tools. Perhaps this reflects a new and rapidly evolving technology set against a backdrop of constrained IT budgets. Figure 4 shows the anticipated length of use of ML/AI products. OpenAI, Anthropic, and Meta Llama do not rank well in this chart. Fewer than half of respondents for OpenAI (47%) anticipated using the vendor for three or more years. About a third of respondents for Anthropic (34%) indicated three or more years of use, and only 30% of respondents for Meta Llama foresaw three or more years of commitment to the product.

Expected Churn

Figure 4. This chart visualizes the Market Array data for ETR’s Expected Churn metric, which measures how long customers expect to use a product. The full analysis is available via the Market Array data set.

II. ML/AI at Enterprise Scale with Robust MLOps Capabilities

As an organization shifts from experimentation with its first data science projects to full-scale deployment of ML/AI, the focus becomes ML model management and reuse, monitoring, governance, and, ultimately, automation. Many of the vendors that provide these capabilities – collectively called MLOps, which draws from DevOps principles of continuous delivery – are leaders in ETR’s spending intentions data, demonstrating deepening investment in complete and scalable platforms. The big three cloud platforms offer these complete capabilities in the form of Microsoft’s Azure Machine Learning, Google’s Vertex AI, and Amazon SageMaker, and these platforms are striving toward greater ease of use and accessibility. The VP of Business Intelligence and Analytics for a midsize financial services enterprise remarked in an ETR Insights interview that he liked Azure ML because it was “self-service” and did not require “a DevOps competency” to navigate. He said that with “Azure, I was able to get my data scientists, plug them in, and just let them start working with the team, and we were able to go from training to deployment in a couple of weeks.”

Open-source platforms Anaconda and TensorFlow offer support for ML model deployment and end-to-end management in both free licenses and paid tiers with added support and storage. Anaconda provides a way to manage versions of Python and packages and deploy projects, while TensorFlow offers a more complete MLOps tool set but is known for its specialty in training for deep learning use cases. In an ETR Insights interview, the VP and Chief Data Architect for a large consumer goods company noted that his organization uses platforms like Google and Databricks “for a certain class of problems,” but will turn to “more generic tools like Python and TensorFlow to build more complicated regression models.” In another interview, the CIO and CISO for a nonprofit medical research institute added that TensorFlow is “a solid name” with a “great community ecosystem,” an important attribute for the success of any open-source project.

III. Democratizing Data Science to Speed Time to Business Value

Several ML/AI companies have tried to simplify the complexity of data science and offer shortcuts for organizations to see business value in data science programs quickly. Often, these vendors use terms like “democratizing data science” in their marketing materials, and this view of democratization might include low-code interfaces for building, training, and deploying models or ready-to-use pre-trained ML models fit for a variety of common business use cases.

H2O.ai is one such vendor, focusing on ML applications accessible to business users who do not have expertise in tuning models, model deployment, or infrastructure. Similarly, DataRobot offers automated machine learning, or AutoML, an approach to ML that uses AI to parse the signal from the noise in a large data set to fit models to clear business problems. C3.ai and Dataiku have similar offerings, with C3.ai focused on speed to business value and Dataiku known for its finesse as a data preparation tool and ease of collaboration between technical and business users. In an ETR Insights interview with the Head of Data Science and Analytics for a large healthcare/manufacturing enterprise, all four of these tools – H2O.ai, DataRobot, Dataiku, and C3.ai – were appraised as being “a fun thing” to explore in proofs of concept, primarily because of their accessibility to business users. However, he noted that “our purists – the data scientists in the company who I call purists – don’t want to use these tools. They want to use Python and R, and all the open-source capabilities to build the models.” Still, he said the organization’s leadership, which wanted shareability and a common platform that everyone could understand and learn from, ultimately pushed for alignment on one of these easier-to-use platforms.

In another interview, a data scientist for a large financial institution pointed out that the “positive things” about this class of tools are that “if the use cases are relatively straightforward, all of these tools are good,” adding that they offer “opportunities to get quick models and results.” However, he noted the “critical challenge in terms of the work product is that it is very difficult to customize because it’s a black box.” The trade-off with tools accessible to non-technical users is that it is often difficult to reach a technical understanding of what the system is producing, which is limiting. Figure 5 shows how the actual use of products is changing, regardless of expected spending. Vendors focused on this easy-to-use democratization of data science tend to have some of the most stagnant rates of use compared to the rest of the ML/AI market. Nearly half of respondents (47%) said usage was flat or decreasing for H2O.ai. Fully 59% of respondents indicated flat or decreasing usage for C3.ai. And more than two-thirds of respondents indicated flat or decreasing usage for DataRobot (67%) and Dataiku (75%).

Usage Change

Figure 5. Regardless of spending intentions, ETR’s Usage Change analysis measures organizational changes in utilization levels of products. The full analysis is available via the Market Array data set.

Indeed, mature data management and preparation discipline is a key feature in enterprise-scale ML/AI programs, and this explains why Databricks continues to garner strong spending intentions and positive reviews from ETR’s community of IT decision makers. Databricks’ data lakehouse architecture, which blends the strengths of the data warehousing and data lake paradigms into a practical all-in-one data management solution, has brought the company not just into prominent standing as an ML/AI platform but also as a data warehousing alternative for more run-of-the-mill business analytics use cases. In an ETR Insights interview, the VP and Principal Software Engineer for a large IT services enterprise said he expects Databricks “to stay at the number one spot for whatever level of ML/AI” an organization needs or “whatever data processing tasks,” noting that “Databricks seems to be the way to go and the way to run your models for any of your analysis.” He added that he “sees them getting bigger and bigger in the space” and taking on Snowflake and the big three cloud platforms for data warehousing, too. Likewise, Snowflake is moving into the ML/AI space, and ETR will begin tracking Snowflake in this sector with the next TSIS cycle.

Finally, in this class of ML/AI tools are vendors like Hugging Face. Hugging Face showcases the power of its community and the wealth of resources community members are willing to share with each other, from fully trained models to data sets. In an ETR Insights interview, a data scientist from a large financial services enterprise said “Hugging Face is a must in NLP. Their models, the pre-trained models, are extremely helpful. An amazing product based on [a technique for NLP pre-training] BERT, for mostly NLP purposes in using neural networks.” With Hugging Face, the aim is for businesses not to have to reinvent the wheel when it comes to ML/AI and realize a quicker time to business value.

Conclusion: ML/AI Tools Suited to Different Organizational Needs

As with any enterprise IT tool market, ML/AI tools have evolved to fit the needs of a variety of organizations that are all maturing their advanced analytics initiatives at different rates. What is clear, however, is that the early stages of exploration in ML/AI have, for many organizations – and especially larger enterprises – reached a point of widespread enterprise-scale adoption, requiring robust end-to-end data architectures and sophisticated management and governance programs. Smaller organizations or those just now dipping their toes in the waters of data science are finding early value in free, open-source offerings or tools with pre-built, pre-trained ML models ready to accelerate time to business value. With the broad awareness of generative AI technology in the last year, vendors and organizations alike are scrambling to imagine new possibilities with AI to leverage the power of LLMs.

To respond to these organizational stages, the ML/AI vendor landscape is dotted with big players offering full MLOps capabilities for mature data science operations, with large public cloud players like Microsoft, AWS, and Google leading the way alongside open-source stalwarts TensorFlow and Anaconda and popular open libraries and languages like Jupyter Notebook, Python, and R. Other vendors have built reputations for strength in particular areas, like Databricks and Dataiku, which offer sophisticated data management setups that feed data science use cases as well as more line-of-business-focused reporting and self-service business intelligence use cases. A separate set of vendors aims to democratize data science with off-the-shelf, turnkey ML solutions that are alluring to business executives and less technically savvy business users.
