The Observatory
The plotting of vendors and products across the Observatory Scope is supported wholly by ETR’s exclusive market intelligence and spending intentions data sets (see Figure 1). Vendors were spread fairly evenly across the vectors. Occupying the Leading vector, which represents vendors with high Momentum and Presence, are public cloud vendors Microsoft and Amazon and their cloud data warehousing products Azure Synapse and Redshift, respectively. Databricks and Snowflake round out the Leading vector.
The Advancing vector represents vendors with relatively high Momentum, but lower Presence. In this vector, and just shy of crossing over into the Leading vector based on Presence, is the third major public cloud player, Google, with its data warehousing tool BigQuery. Also in this vector are Cloudera and SAP Data Warehouse Cloud.
The sole vendor in the Tracking vector, which has relatively high Presence but lower Momentum, is Oracle. And in the Pursuing vector are vendors with relatively lower Momentum and Presence. Pursuing vendors are OpenText, Teradata VantageCloud, and IBM.
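The four-vector logic described above can be sketched in a few lines of Python. This is only an illustration: the report does not publish where the boundaries between vectors fall, so the cutoff values and the function name `classify_vector` are hypothetical, and Momentum and Presence are assumed to be normalized to a 0-to-1 scale.

```python
# Hypothetical cutoffs: the report does not state the actual vector boundaries.
MOMENTUM_CUTOFF = 0.5
PRESENCE_CUTOFF = 0.5


def classify_vector(momentum: float, presence: float,
                    momentum_cutoff: float = MOMENTUM_CUTOFF,
                    presence_cutoff: float = PRESENCE_CUTOFF) -> str:
    """Map a (Momentum, Presence) pair to one of the four Observatory vectors."""
    high_momentum = momentum >= momentum_cutoff
    high_presence = presence >= presence_cutoff
    if high_momentum and high_presence:
        return "Leading"    # high Momentum, high Presence
    if high_momentum:
        return "Advancing"  # high Momentum, lower Presence
    if high_presence:
        return "Tracking"   # high Presence, lower Momentum
    return "Pursuing"       # lower Momentum, lower Presence


# Illustrative inputs only, not survey data.
print(classify_vector(0.8, 0.9))
print(classify_vector(0.8, 0.2))
```

Under these assumed cutoffs, a vendor with strong Momentum but Presence just under the boundary lands in Advancing, which is how the report describes Google BigQuery's position.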
It is critical to note again that ETR’s positioning is based wholly on survey responses from IT decision makers with direct utilization and knowledge of Cloud Data Warehouse vendors. The plotting does not reflect, nor does it intend to opine on, the efficacy of these tools and vendors. For more information, check out the full Observatory methodology.
This report first breaks down the overall spending intent Net Score for these Cloud Data Warehouse vendors and then analyzes each of the four Observatory Scope vectors and their vendors in more detail in the following sections. The best way to view this data, however, is through the full ETR Market Array, which is available on the ETR Research platform.
Marketplace Overview: Evaluating User-Level Metrics
ETR’s Market Array for Cloud Data Warehouses surveyed 324 IT decision makers. More than two-thirds (69%) represent Large enterprises (1,200 employees or more), and the remaining 31% of respondents are split roughly evenly between Midsize and Small organizations. The most represented industry verticals are Financials/Insurance, Services/Consulting, and IT/TelCo, collectively accounting for more than half of the sample. Geographically, 76% of respondents are located in North America.
Figure 2. ETR’s Market Array spending Net Score for Cloud Data Warehouses was derived from a survey of 324 IT decision makers with direct utilization and evaluation knowledge of the specific tools listed.
Figure 2 (above) shows spending intentions from the Market Array for cloud data warehouse vendors, with a black line denoting Net Score, ETR’s proprietary metric for tracking overall spending momentum. Teradata VantageCloud has the highest overall Net Score in this analysis at 63%, albeit on the lowest citation base in this vendor set (N=16). Despite its high Net Score, Teradata VantageCloud’s low Presence in the survey keeps the vendor firmly in the Pursuing vector of the Observatory Scope. Microsoft Azure Synapse is close behind with a 62% Net Score, followed by Databricks (59%) and Snowflake (58%). These high Net Scores, coupled with high Presence in the survey, position these vendors in the Leading vector.
Next in line is Google BigQuery with a 56% Net Score, though its relatively lower Presence compared to other Leading vector vendors fixes it just inside the margins of the Advancing vector. Amazon Redshift’s 48% Net Score coupled with a relatively high Presence in the survey rounds out the Leading vector vendors.
Interestingly, Teradata VantageCloud, OpenText, and Cloudera have the highest Adoption rates in the survey at 25%, 14%, and 13%, respectively. Teradata VantageCloud, Google BigQuery, OpenText, and IBM all have a Replacement rate of zero. Elevated rates of negative spending plans (Decrease and/or Replacement indications) offset the positive Momentum of IBM and Oracle, which have the lowest Net Scores in the survey at 12% and 11%, respectively.
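The interplay between positive and negative spending plans can be illustrated with a short sketch. The report does not spell out the Net Score formula, so the version below is an assumption: it takes the share of respondents with positive plans (Adoption plus an assumed "Increase" category) minus the share with negative plans (Decrease plus Replacement), with an assumed "Flat" category counting only toward the total. The function name `net_score` and all counts are hypothetical, not survey data.

```python
def net_score(adoption: int, increase: int, flat: int,
              decrease: int, replacement: int) -> float:
    """Assumed Net Score: % positive plans minus % negative plans.

    The "increase" and "flat" categories are assumptions; the report
    names only Adoption, Decrease, and Replacement explicitly.
    """
    total = adoption + increase + flat + decrease + replacement
    if total <= 0:
        raise ValueError("at least one response is required")
    positive = adoption + increase
    negative = decrease + replacement
    return 100 * (positive - negative) / total


# Hypothetical counts for 100 respondents (not taken from the survey).
print(net_score(adoption=10, increase=40, flat=35, decrease=10, replacement=5))
```

Under this assumed formula, a vendor with a healthy share of positive plans can still post a low Net Score if Decrease and Replacement indications are elevated, which matches the pattern the report describes for IBM and Oracle.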
Snowflake and Databricks on a Collision Course
Snowflake and Databricks have been prominent vendors in the enterprise tech space for several years now, both with some of the highest Net Scores across the entire universe of vendor coverage in ETR’s quarterly Technology Spending Intentions Survey. Increasingly over the past few years in particular, the two vendors appear to be jockeying for the same market share in both the data warehousing space and the ML/AI market. Snowflake first made a splash as a powerful cloud data warehouse, drawing a large roster of enterprise customers and certainly influencing the technical roadmaps and broader strategy of other data warehousing vendors in the market. As it has evolved, Snowflake has extended its technical reach and made strategic acquisitions to offer more sophisticated data management capabilities, analytics and applications offerings, and ultimately ML/AI tools. Databricks, on the other hand, started as an ML/AI platform and then developed a data lakehouse offering to support the data needs for those advanced analytics and data science projects. Ever since, Databricks has been extending its reach into the data warehousing market. Indeed, many organizations are committed to jointly using the two vendors across a mix of data warehousing and ML/AI use cases, while other organizations are beginning to figure out which one they will invest more in going forward.
In the ETR Market Array for Cloud Data Warehouses, the same percentage of respondents (66%) said they expected a return on investment (ROI) within the first three years for both vendors. In terms of expected length of use, again the two vendors are neck-and-neck. Nearly two-thirds (66%) of respondents said they plan to use Databricks for at least three more years, and 64% said the same for Snowflake. However, 29% of respondents anticipated at least a five-year commitment to Databricks, compared to 24% for Snowflake. A separate July 2024 ETR Drill Down Survey of joint Snowflake and Databricks customers found that Databricks had a slight edge when respondents were asked about their future plans for consolidating on one of the vendors over the other. While 44% had no plans to change how they use the two vendors, 24% said they planned to expand their Databricks use and reduce Snowflake, compared to 17% who said they planned to expand Snowflake and reduce Databricks. And 4% said they planned to completely shift to Databricks and phase out Snowflake, compared to just 2% who said the inverse. Furthermore, Snowflake and Databricks are the only two vendors with positive Net Promoter Scores, which reflect the percentage of “promoters” (a 9 or 10 on a 10-point scale to recommend the product to a colleague) minus the percentage of “detractors” (a score of 6 or lower). Databricks has a Net Promoter Score of 5% and Snowflake has a score of 1%.
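The Net Promoter Score arithmetic described above (share of promoters minus share of detractors) can be sketched as follows. The function name `net_promoter_score` and the sample ratings are hypothetical and are not drawn from the survey; the promoter and detractor thresholds follow the definition given in the text.

```python
from typing import Sequence


def net_promoter_score(ratings: Sequence[int]) -> float:
    """Percentage of promoters (9 or 10) minus percentage of detractors (6 or lower)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Ratings of 7 or 8 count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical ratings: 4 promoters, 2 detractors, 10 respondents.
sample = [10, 9, 9, 9, 8, 8, 7, 7, 6, 5]
print(net_promoter_score(sample))  # 20.0
```

Because detractors are subtracted, a score only slightly above zero, like those reported for Databricks and Snowflake, means promoters just barely outnumber detractors.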
Commentary from IT decision makers in ETR Insights interviews sheds some light on how some see the differences between these two vendors. A Senior Vice-President of Global Infrastructure for a large financials enterprise said one difference between the two vendors is that of user-friendliness versus power. “I think both tools have come at it from the same angle of, ‘Hey, the traditional SQL data warehouse, it’s kind of gone. It doesn’t work. We need the data lakes. We need all this stuff.’ But I think they came at it with two different approaches. Snowflake came at it as, ‘let’s make it more user-friendly.’ Databricks said, ‘We’re going to make it really powerful and do a lot of stuff,’ but it’s a little more complex.”
The CTO of a small tech enterprise echoed the ease-of-use sentiment for Snowflake: “We like the platform. It’s very easy to use. The analytic outputs are great. I think it’s a wonderful tool.” The Director of Data Science for a large business and tech services enterprise, however, agreed with its ease of use but pointed to Snowflake’s cost: “Snowflake can get really expensive, really fast. So we stayed away from Snowflake.”
When it comes to individual product strengths for the two vendors, Databricks had the highest level of agreement that it has an innovative technical roadmap (81%), tied at the top with Microsoft Azure Synapse, while Snowflake ranked fourth with 74% of respondents in agreement. To the statement “this product does everything I expect a cloud data warehouse to do,” Snowflake has the second-highest level of agreement (80%), tied with Microsoft Azure Synapse and trailing only Google BigQuery; Databricks is close behind in fourth at 79%.
Finally, respondents were asked to provide open-ended responses on which vendors they felt were most innovative and which they would most prioritize if given the opportunity to rebuild their cloud data stack. Snowflake was the most innovative vendor in this market overall, and Databricks was the fourth most innovative. For the vendor respondents would most prioritize, also known as most desired vendor, Snowflake came in third and Databricks in fourth, trailing Microsoft Azure Synapse and Amazon Redshift. Many organizations desire multiple data warehousing solutions, and combinations of both Snowflake and Databricks alongside offerings from cloud giants are common. As the Head of Data Science and Analytics for a large healthcare and manufacturing enterprise remarked, their embrace of Snowflake, Databricks, and AWS serves an array of purposes: they seek to “land a lot of things in Snowflake” as an anchor in their data and analytics pipeline, leverage AWS’s “good solutions which are custom for a manufacturing plant’s needs,” and capitalize on the “flexibility and robustness that Databricks offers.”
The Cloud Giants – Microsoft Azure Synapse, Amazon Redshift, and Google BigQuery
The three public cloud giants prominently feature cloud data warehousing tools as part of a broader suite of product offerings in data and analytics, including data lakes, databases, business intelligence and reporting tools, and governance and security tools. Several factors explain why organizations may choose one or several data warehousing tools in their tech stack, but with Google, AWS, and Microsoft the calculus is often whether the organization already has an existing footprint with one or more of these cloud players. Microsoft customers in particular tend to consolidate tooling decisions around the 365 license and their commitment to Azure, which leads many organizations to make some use of Synapse in their data and analytics programs, if not entirely consolidate around Synapse. Likewise, customers with sizable AWS investments will see Redshift as a natural choice, and the same for Google customers.
Respondents in the Market Array report very fast ROI with these cloud players’ data warehousing offerings. Google BigQuery, Amazon Redshift, and Microsoft Azure Synapse have the three highest rates of ROI, with 70%, 68%, and 68% of respondents expecting ROI within the first three years, respectively (see Figure 3 below). Notably, Starburst (68%), Snowflake (66%), and Databricks (66%) also have very high rates of ROI within the first three years.
Figure 3. ETR’s Market Array expected ROI for Cloud Data Warehouses was derived from a survey of 324 IT decision makers with direct utilization or evaluation knowledge of the specific tools listed. This is a partial rendering of the ROI chart from the Market Array. Contact the ETR team for access to the full data set.
Despite the great ROI expectations for these cloud giants’ data warehousing tools and the extent to which cloud services become sprawled and embedded within organizations’ tech stacks, the tools are curiously not all very sticky. Our Market Array finds the three vendors have very different levels of expected length of use for these cloud data warehouses. Microsoft Azure Synapse’s expected length of use is the second-best in the survey, with 69% expecting at least three more years of commitment to the tool. Google BigQuery, on the other hand, is fifth among vendors in the survey, with 62% indicating three or more years of use. Amazon Redshift is the second-worst in the rankings, with only 48% indicating three or more years of use. One Sr. Cloud Infrastructure Architect for a large manufacturing and tech enterprise noted “there’s a fear associated with moving your data warehouse into a specific platform. I think a lot of things follow suit that are tangent to some of that,” like being sucked into a cloud vendor’s ML/AI tools or security tools. Certainly, organizations worry about becoming locked into a certain tech ecosystem, but at least based on these data, fears may be overblown.
Perhaps unsurprisingly, though, the cloud giants’ data warehouses are seen as easy to integrate with an existing ecosystem. The majority (85%) of respondents said Microsoft Azure Synapse integrated easily into an existing ecosystem, and 78% said the same of Amazon Redshift, the vendors with the most and second-most agreement for that particular product strength attribute. Microsoft Azure Synapse and Amazon Redshift also hold the two highest levels of agreement with the statement “technical professionals with relevant expertise for this product are available,” suggestive of wide availability of IT skills and training in the market for both the Microsoft Azure and AWS ecosystems. Google BigQuery had the fourth-highest agreement on this strength, too.
Finally, respondents in this Market Array seem satisfied with how innovative the data warehousing offerings are from these three cloud giants. All three are in the top-five ranking of agreement with the statement that “this product has an innovative technical roadmap,” with each vendor receiving more than 70% agreement on this product strength. Microsoft Azure Synapse, Amazon Redshift, and Google BigQuery also are the second-, third-, and fifth-most innovative vendors in the market based on open-ended responses. But if given the opportunity to prioritize one vendor if rebuilding a tech stack, Microsoft and AWS are the top write-in choices, suggesting an affinity for each vendor’s broader suite of product offerings and how they integrate seamlessly with one another, combined with an appealing technical roadmap spelled out for their respective data warehousing products.
A Slew of Alternatives
Beyond the five most dominant vendors covered up to this point in the report, the remaining vendors in the Market Array for Cloud Data Warehouses position themselves in the market with differing technological perspectives, diverse pricing and customer acquisition approaches, and often smaller market shares.
Some of these products are from long-established megavendors like SAP, Oracle, and IBM, or long-time data-focused vendors like Teradata. These vendors have historical strength in on-premises data management and have fleshed out cloud versions of these tools over time to keep up with a rapidly evolving cloud-focused market and to retain their existing customers by meeting their cloud needs. The perception has been that these vendors were lagging behind the cloud giants, Databricks, and Snowflake but are now finally coming to parity with the capabilities available in these cloud competitors’ data warehousing offerings. These vendors are not seen by Market Array respondents as particularly innovative compared to cloud-native vendors. For example, SAP, Oracle, IBM, and Teradata have some of the lowest agreement with the statement “this product has an innovative technical roadmap,” with SAP Data Warehouse Cloud eighth-best out of 16 vendors and 62% of respondents agreeing with the sentiment. IBM, Oracle, and Teradata VantageCloud have notably lower levels of agreement that their product has an innovative technical roadmap, at 54%, 48%, and 43%, respectively.
“I think your traditional data vendors are going to have a hard time in the next couple of years,” said one CTO for a global tech conglomerate, calling out Teradata and Oracle in particular. “I think the pace of innovation that’s happening in that space is going to be difficult for them.”
These vendors also have some of the lowest expected ROI, lowest agreement with the statement that “this product offers good value for the money,” and lowest agreement that “this product is easy to implement.” But, perhaps key to these vendors’ strategy, respondents have some of the highest agreement that these products are difficult to replace, with SAP Data Warehouse Cloud (67%) and Oracle (56%) having the highest levels of agreement with the statement. Further to the point, a Director of IT for a large retail and consumer enterprise lamented that “it took us almost three years” to migrate away from a “huge installation” of Teradata, and they “still have small portions running on Teradata.”
Cloudera, Starburst, Panoply, Yellowbrick, and Dremio each offer slightly different perspectives on their data warehousing offerings. Cloudera is known for its flexibility in cloud, hybrid, and on-premises deployment and for its breadth of tools that it supports for a variety of workloads. Large organizations find Cloudera to be a stable cloud data warehouse product with strong governance and security capabilities. It lands in the middle of the pack on most measures in this Market Array, with 54% of respondents saying they expect at least three more years of use of the product and 59% saying they expect ROI within the first three years. One Vice-President and Manager of Data Analytics for a large financials enterprise also called out Cloudera for being “a fine cloud data warehouse solution, but not as easy to use,” which prompted his organization to move away from Cloudera and toward a major cloud platform’s product.
A relatively newer entrant to the market, Starburst is a data lakehouse product known for its ability to handle complex queries, with capacity-based pricing rather than consumption-based. Panoply is seen as easy to use with a lot of automation and features subscription pricing based on data processed. Yellowbrick is a massively parallel processing (MPP) SQL database known for its performance as well as cost control, and it ranks sixth among vendors in the share of respondents who see it as good value for the money. Dremio has its roots in data virtualization, and its positioning as a data warehouse today still pushes the idea of a unified semantic layer for analytics with easy connection to a wide array of data sources. Each of these vendors, however, receives some of the lowest levels of agreement with the statement “this product integrates easily with our existing ecosystem,” with agreement ranging from 35% to 56%. Furthermore, these vendors also have some of the lowest levels of agreement that “this product does everything I expect a cloud data warehouse to do,” indicating these tools may be the least robust in their capabilities, especially compared to market leaders like Snowflake, Databricks, and the cloud giants.