Top Takeaways from Databricks Data and AI Summit 2024

The Databricks Data + AI Summit is a conference for people who like to code: we saw many lines of code in the keynote presentations. Databricks is proud of its open source work, boasting 12m lines of open source code, and of its academic connections, highlighting speakers such as Yejin Choi (University of Washington) and Fei-Fei Li.

Attendance was very high: 16k people in person and 60k watching sessions online. The sessions, keynotes, and expo hall were invariably packed out. Generative AI took centre stage as the focus of much of the keynotes and many track sessions, including multiple live demos within the keynote, always a risky proposition with a non-deterministic technology.

Core announcements from the summit

Databricks AI/BI - Low-code, AI-driven dashboard creation, plus Genie, a conversational query interface. At its heart is gen AI that can query the Unity Catalog to understand what data is available and the context behind it.

Shutterstock Image AI - Image generation by models trained upon Shutterstock’s library of stock pictures. An unusual complement to tools in a data and analytics space, but as organizations increasingly use LLMs for creative text generation, the ability to generate images and logos becomes a natural complementary capability.

The heart of what Databricks was offering, however, came from the recent Mosaic acquisition, featuring a suite of capabilities to support data teams in the use of large language models, including:

  • AI Agent Framework: building tools to enable gen AI systems to perform tasks such as querying enterprise data, raising customer support tickets, filing JIRA issues or even executing code snippets. Each agent is allowed a single very tightly defined activity that the LLM process can invoke.
  • AI Model Training: a zero-code way to fine-tune models with an organisation’s own data.
  • AI Agent Evaluation: scoring AI outputs on quality, collecting feedback, and diagnosing model issues.
  • AI Gateway: to manage and control the availability of LLMs and allow switching between LLMs with minimal effort and risk.
  • AI Tool Catalog: to govern and share AI tools, tied in with the Unity Catalog.
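
To make the idea of "a single very tightly defined activity" concrete, here is a minimal sketch in plain Python. It is not the Databricks AI Agent Framework API: the `Tool` and `ToolRegistry` classes and the two example functions are hypothetical names invented for illustration, and the ticketing and order lookups are stand-ins that do no real work.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    """One narrowly scoped activity that the model is allowed to invoke."""
    name: str
    description: str
    run: Callable[[str], str]


class ToolRegistry:
    """Hypothetical registry: only tools registered here can be called."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def invoke(self, name: str, argument: str) -> str:
        # Refuse anything that was not explicitly registered.
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(argument)


def raise_ticket(summary: str) -> str:
    # Stand-in for a real ticketing call; it only formats a string.
    return f"ticket created: {summary}"


def lookup_order(order_id: str) -> str:
    # Stand-in for a governed query against enterprise data (made-up rows).
    orders = {"A100": "shipped", "A101": "pending"}
    return orders.get(order_id, "not found")


registry = ToolRegistry()
registry.register(Tool("raise_ticket", "Open a support ticket", raise_ticket))
registry.register(Tool("lookup_order", "Look up an order status", lookup_order))

# In a real system the (tool name, argument) pair would come from the LLM output.
print(registry.invoke("lookup_order", "A100"))
```

Keeping each tool to one small function means the set of things the model can do is exactly the set of registered names, which is easier to govern and audit than a free-form code executor.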

Fig 1: example of Chain of Thought Reasoning process with agents

Fig 2: end result of the combined process - Slack message with cookie image

The keynotes understandably covered these launches in detail, but they also made clear that Databricks sees Small Language Models as an important technology for the future. An alliance with NVIDIA (and hence a guest appearance from Jensen Huang, reveling in the role of most popular CEO in the world) is intended to provide the compute power customers need to train their own small models. Databricks was proud to boast that its customers had tuned over 200,000 custom AI models in the last year.

For me, the highlight of the keynote was the section from Yejin Choi (University of Washington). Professor Choi covered her research on how to build an effective small language model: starting with older, smaller models such as GPT-2 to generate training data sets, which are then used to teach a next-generation model within a specialised space.

The ultimate result was a small and performant model that produced a high success rate for a single, specialised task. At Matillion, we look forward to exploring how we can support customers in the use of fine-tuned small language models over the next few months.

The other major Gen AI takeaway from the keynote presentations was the result of Databricks’ customer survey, which revealed that 85% of AI projects are yet to make it into production. This may well reflect the fact that, while it is easy to stand up an interesting POC using a Python notebook, turning it into a robust production process is much harder. True democratisation of AI will require giving domain subject matter experts (SMEs) the ability to configure it and build it into production-level business processes, without those SMEs also needing to learn advanced Python skills.

Track sessions also reflected a strong focus on Gen AI, with technical deep dives on the growing ecosystem of skills and technologies needed in this space. These included the complexities of ingesting unstructured data, fine-tuning models to use alongside agents, retrieval augmented generation for marketing purposes, LLM Ops, feedback collection, and how Gen AI can accelerate innovation. There is a lot here to master for any company that wishes to take advantage of what is still a very new and rapidly growing field.

At our booth, traffic was relentless. We were asked for demo after demo of our recently released capabilities to call a large language model inside a data pipeline, whether through Databricks-native Gen AI functions or calls to models provided by Azure, OpenAI, or AWS Bedrock. It was fantastic to see so many folks realise the potential of using LLMs within a data pipeline to do classification, sentiment analysis, summarisation, and translation at scale without needing extensive coding skills.
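
For readers who do like to see the code, the pattern behind an LLM step in a pipeline can be sketched roughly as follows. This is an illustration only, not the Matillion or Databricks implementation: `add_llm_column` is a hypothetical helper, and `fake_sentiment_model` is a keyword-based placeholder standing in for a hosted model so that the example runs offline.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def fake_sentiment_model(prompt: str) -> str:
    """Placeholder for a hosted LLM call; a keyword rule keeps the sketch offline."""
    # Only inspect the row text that follows the instruction.
    text = prompt.split("\n\n", 1)[-1].lower()
    if "great" in text or "love" in text:
        return "positive"
    if "broken" in text or "refund" in text:
        return "negative"
    return "neutral"


def add_llm_column(
    rows: Iterable[Row],
    model: Callable[[str], str],
    source: str,
    target: str,
    instruction: str,
) -> List[Row]:
    """Return copies of the rows with one extra column filled in by the model."""
    enriched: List[Row] = []
    for row in rows:
        prompt = f"{instruction}\n\n{row[source]}"
        enriched.append({**row, target: model(prompt).strip()})
    return enriched


# Made-up sample rows for illustration.
reviews = [
    {"id": "1", "review": "I love this product"},
    {"id": "2", "review": "Arrived broken, I want a refund"},
    {"id": "3", "review": "It is a kettle"},
]

results = add_llm_column(
    reviews,
    model=fake_sentiment_model,
    source="review",
    target="sentiment",
    instruction="Classify the sentiment as positive, negative, or neutral.",
)
for row in results:
    print(row["id"], row["sentiment"])
```

Because the model is passed in as a plain callable, swapping one provider for another changes a single argument rather than the pipeline itself.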

Elsewhere at the summit, we saw more pieces of the ecosystem, such as Unstructured.io’s capabilities to handle ingestion of many and varied formats, and how Neo4j is pioneering the use of knowledge graphs within Retrieval Augmented Generation as an alternative to vector databases. We wait with interest to see if this method will improve context relevance and hence LLM accuracy.
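
As a rough illustration of the knowledge-graph idea, the sketch below retrieves facts by following edges from a named entity rather than by similarity search over embeddings. It uses a tiny in-memory structure, not Neo4j or any vendor API; `TinyGraph`, `build_prompt`, and the triples are all made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class TinyGraph:
    """Hypothetical in-memory graph of (subject, relation, object) triples."""

    def __init__(self, triples: List[Triple]) -> None:
        self._out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        for subject, relation, obj in triples:
            self._out[subject].append((relation, obj))

    def facts_about(self, entity: str, hops: int = 1) -> List[str]:
        """Collect facts reachable from the entity within `hops` edges."""
        facts: List[str] = []
        seen = {entity}
        frontier = [entity]
        for _ in range(hops):
            next_frontier: List[str] = []
            for node in frontier:
                for relation, obj in self._out.get(node, []):
                    facts.append(f"{node} {relation} {obj}")
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        return facts


def build_prompt(question: str, facts: List[str]) -> str:
    """Place the retrieved facts ahead of the question as context."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Made-up triples for illustration.
graph = TinyGraph([
    ("Acme", "supplies", "Widget"),
    ("Widget", "used_in", "Gadget"),
    ("Gadget", "sold_by", "Globex"),
])

facts = graph.facts_about("Acme", hops=2)
print(build_prompt("Which products depend on Acme?", facts))
```

The appeal of this approach is that the retrieved context is made of explicit relationships, so a two-hop question can pull in connected facts that a nearest-neighbour lookup might miss.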

Specialist providers such as Skyflow are addressing data privacy via API, while consultancy vendors such as Slalom, rearc, and many more are increasingly focusing on implementing AI projects for their clients.

Finally, the prize for most eye-catching promotional event goes to DuckDB, for taking their name and logo extremely literally.

The Databricks Data + AI Summit 2024 showcased a vibrant and dynamic data landscape, emphasizing the convergence of open-source innovation, academic excellence, and practical AI applications. The summit underscored the community's enthusiasm and commitment to pushing the boundaries of data and AI technologies, leaving attendees inspired to harness these technologies for their own use cases.

Article information

Author: Cheryll Lueilwitz
