Hosts:
Dheeraj Pandey - Co-Founder & CEO of DevRev, former CEO of Nutanix
Amit Prakash - Co-Founder & CTO of ThoughtSpot, former engineer at Google and Microsoft
Summary
In this episode, hosts Dheeraj Pandey and Amit Prakash embark on a comprehensive exploration of the analytics landscape. With nearly two decades of experience bridging data and design, they delve into everything from analytics' evolution and real-time data challenges to the influence of large language models (LLMs) and the future of AI workloads. They examine today’s leading players (Snowflake, Databricks, Google, and Microsoft) and reflect on analytics, SaaS, and the future of data processing.
Key Takeaways
The State of Analytics: Analytics has evolved, yet data teams are “just getting by.” LLMs, though transformative, are not yet fully exploited in the analytics space.
Data is the Lifeblood of AI: In analytics and AI, data pipelines are critical to achieving insights, similar to how blood is essential for brain function.
Snowflake vs. Databricks: Snowflake leads in SQL-based data at rest; Databricks excels in data in motion and ML/ETL operations. The industry is trending toward open-source solutions like Iceberg to reduce vendor lock-in.
Impact of Open Source and Cost Efficiency: Open-source formats, notably Iceberg, allow enterprises to maintain flexibility and reduce costs, although this neutralizes some of Snowflake’s advantages.
The Shift to Cloud: Moving to cloud analytics impacted performance expectations and cost structures. The ability to decouple storage from compute was crucial for adapting analytics tools to the cloud.
SaaS 2.0 Transformation: In SaaS 2.0, platforms integrate analytics, search, and AI capabilities, abstracting complexity from users while enhancing productivity.
Visualization Matters: Companies like Microsoft, Google, and Salesforce are leading the way in data visualization, albeit with varying degrees of innovation and integration.
The Future of Analytics & AI: AI workloads are rapidly evolving, but the winning compute solution will come from whoever best adapts to real-time, unstructured, and conversational data needs.
In-Depth Insights
Data at the Core of AI
Dheeraj likens data to blood in the human body: essential for AI. Without a robust data flow, AI and analytics lack the resources to generate meaningful intelligence. "Data-driven" and "data-centric" AI underscore the importance of well-structured data pipelines for effective, scalable AI solutions.
Analytics Giants: Snowflake vs. Databricks
Strengths and Limitations: Snowflake, originally SQL-focused, excels with data at rest and a robust query engine. Databricks, with roots in ETL and ML prep, leads in data-in-motion processes.
Iceberg’s Open-Source Influence: Both companies are now adapting to open-source data formats, such as Iceberg, as client demand for interoperability and cost management grows.
Moving from Legacy to Cloud
Dheeraj and Amit discuss how the shift to cloud-based architectures affected analytics products like ThoughtSpot and Tableau. The cloud increased costs, especially for RAM, shifting the architectural focus from speed to cost-effective compute-storage balance.
Snowflake’s approach of separating storage from compute enabled cost efficiency and influenced ThoughtSpot's transition to the cloud, including the re-architecting of Falcon, its in-memory database.
Visualization as a Strategic Asset
Visualization is now a critical part of completing the analytics picture. Microsoft’s Power BI and Google’s Looker are positioned as strategic solutions for businesses. Microsoft's bundling and Google’s thin analytics layer challenge Snowflake and Databricks, which still lack integrated visualization.
Amit notes Tableau’s initial success due to innovative visualization and in-memory analytics but comments on its limited progress post-Salesforce acquisition.
Challenges in SaaS 2.0 Architecture
DevRev exemplifies the SaaS 2.0 model by incorporating analytics, search, and workflows directly into its platform, offering an integrated experience. SaaS 2.0 challenges traditional software by eliminating the need for customers to build separate search or analytics engines.
DevRev’s “micro-tenancy” design allows scalability from startups to enterprises. This “micro-tenant” model fosters flexibility for startups needing low-cost, resource-efficient access and larger enterprises requiring extensive functionality.
AI’s Role in the Analytics Landscape
The future of AI analytics remains uncertain, but real-time processing, conversational data, and unstructured search capabilities are predicted to define the next wave of innovation.
Current challenges include high compute costs and undefined ROI. The conversation shifts towards “control planes” over “data planes,” where the competition may hinge on who offers the best MLOps infrastructure and adaptability to conversational AI.
Host Biographies
Amit Prakash
Co-founder and CTO at ThoughtSpot, previously at Google and Microsoft. Amit has an extensive background in analytics and machine learning, holding a Ph.D. from UT Austin and a B.Tech from IIT Kanpur.
Dheeraj Pandey
Co-Founder and CEO of DevRev, and former CEO of Nutanix. Dheeraj has led multiple tech ventures and is passionate about AI, design, and the future of product-led growth.
Episode Breakdown
{00:00:00} Intro and Catch-Up: Dheeraj and Amit kick off the episode with reflections on recent travels and discuss episode topics.
{00:02:00} Why Analytics?: They explore the intersection of data and design, with Dheeraj likening data’s role in AI to blood in the body.
{00:03:10} Current State of Analytics: Amit shares thoughts on how analytics has evolved and which challenges remain.
{00:05:16} Snowflake vs. Databricks: They discuss the origins of both companies, strengths, and the SQL vs. data-in-motion contrasts.
{00:10:06} Iceberg and Open-Source: The discussion shifts to Iceberg’s open-source impact and the potential for decoupling storage from compute to reduce costs.
{00:12:20} Data Layer Optimization: Challenges Snowflake and Databricks face with the industry moving toward open formats like Iceberg.
{00:15:50} The Importance of Unstructured Data: How companies handle unstructured data, and its implications for compute and AI.
{00:17:03} AI Ecosystem: Dheeraj and Amit compare OpenAI, Anthropic, and Meta’s Llama in the context of centralized vs. decentralized AI models.
{00:20:00} AI Workloads: The evolving AI workload landscape and how companies are still finding their way with real-time analytics.
{00:23:10} Visualization Tools & Microsoft Power BI: They discuss the strengths and shortcomings of Power BI, Microsoft’s distribution, and its positioning against Tableau.
{00:27:00} Looker’s Success with Developers: The impact of Looker’s LookML programming model, which made it a hit with developers and unique in analytics.
{00:30:00} Google’s Integration Challenges: Looker’s adaptation struggles post-Google acquisition and the challenges of merging with Google’s tech stack.
{00:33:00} Snowflake and Databricks’ Visualization Gaps: Potential for ThoughtSpot to fill the gap in data visualization tools for these analytics giants.
{00:34:10} Tableau’s Early Innovations: Discussing Tableau’s origin story, in-memory analytics, and how its pace of innovation slowed post-Salesforce acquisition.
{00:36:55} SaaS 2.0 and Micro-Tenancy: DevRev’s approach to integrated analytics and multi-tenancy at scale, making analytics accessible for both startups and enterprises.
{00:42:30} ThoughtSpot’s Move from Falcon: Reflecting on the cost and complexity challenges of moving ThoughtSpot’s in-memory Falcon architecture to the cloud.
{00:45:13} Real-Time Analytics in SaaS 2.0: Dheeraj elaborates on how DevRev prioritizes an integrated analytics experience without relying on standalone data warehouses.
{00:49:00} Latency Trade-Offs in Cloud Analytics: Amit reflects on the balance between real-time performance and cloud cost efficiency.
{00:51:00} Future Vision of Analytics as a Search Problem: They discuss how analytics, search, and workflows may converge in a future of conversational AI.
{00:52:50} Outro and Next Episode Preview: Dheeraj previews an upcoming episode with Amit Ganesh, a VP at Google Cloud and an influential figure in database development.
Resources and References
Snowflake
A cloud data platform specializing in data warehousing, SQL query optimization, and scalability for enterprise analytics.
Learn more about Snowflake
Databricks
Unified analytics platform with origins in Spark, offering solutions for data engineering, machine learning, and AI workflows.
Learn more about Databricks
Iceberg (Apache)
An open-source table format for huge analytic datasets that helps remove vendor lock-in and improve query optimization.
Explore Apache Iceberg
Tableau
A leader in visualization-first BI tools, now part of Salesforce, known for its intuitive dashboards and in-memory analytics.
Learn more about Tableau
Power BI
Microsoft's business analytics tool, integrated with its Office 365 suite, designed for enterprise dashboarding and reporting.
Learn more about Power BI
Looker
Google Cloud’s enterprise BI platform, built on LookML for SQL generation and tailored for modern cloud warehouses.
Learn more about Looker
ThoughtSpot
Search-driven analytics platform enabling users to generate insights using natural language and AI.
Learn more about ThoughtSpot
BigQuery
Google Cloud’s data warehouse platform designed for big data analytics with fast SQL query execution.
Learn more about BigQuery
Redshift
Amazon Web Services’ cloud data warehouse solution optimized for massive parallel query execution.
Learn more about Redshift
Retrieval-Augmented Generation (RAG)
An emerging AI approach combining retrieval systems and large language models for contextual and accurate information generation.
Introduction to RAG on Hugging Face
Falcon Database
ThoughtSpot’s proprietary distributed in-memory database designed to power sub-second analytics on large datasets.
Discover ThoughtSpot’s Falcon Technology
Microsoft Fabric
A unified SaaS solution integrating Power BI, Azure Synapse, and other Microsoft services for data engineering and analytics.
Learn about Microsoft Fabric
This episode highlighted the complex, evolving world of data analytics, driven by increasing integration of AI and cloud-native capabilities. Dheeraj and Amit underline how the analytics landscape demands adaptability as companies face choices between proprietary and open-source tools, single-tenant and multi-tenant models, and the integration of real-time processing. Their reflections on SaaS 2.0 and AI-driven analytics hint at a future where data and workflows converge, enhancing speed, personalization, and accessibility in enterprise systems.