Interest in large language models (LLMs) has surged by 1,310%, according to a new study by data and AI company Databricks.
The 2023 State of Data + AI report analyses anonymised usage data from over 9,000 global Databricks customers to comprehensively examine organisations’ data and AI initiatives.
Nick Eayrs, VP of field engineering at Databricks in Asia Pacific and Japan, said the historic surge of interest in LLMs since ChatGPT launched to the public late last year has made the topic inescapable.
LLMs are machine learning models that are very effective at language-related tasks such as translation, answering questions, chat and content summarisation, as well as content and code generation.
“Not only is the technology improving at an unparalleled cadence, but companies are also building their own models like never before,” said Eayrs. “Now, predictive models are underpinning mission-critical tasks, giving organisations significant competitive advantage and allowing them to provide highly differentiated products and services.”
The latest report by Databricks uncovers where enterprises find themselves in this transformation, and the platforms and tools they are using to take advantage of it.
The hype around LLMs is real: from the end of November 2022 to the beginning of May 2023, use of SaaS LLMs, which provide access to models such as OpenAI's, grew 1,310% among Lakehouse customers.
Transformer-related libraries like Hugging Face (an NLP toolkit and model hub), which are used to train homegrown LLMs and were in demand even before the launch of ChatGPT, grew 82% within the same time frame.
Data transformation and integration are more vital than ever. The fastest-growing tools on Databricks are dbt (206% YoY) and Fivetran (181% YoY).
Of the 10 most popular data and AI products, six are data integration tools, including Informatica and Qlik, making data integration the fastest-growing market on the Databricks Lakehouse.
Companies also consider open source. When looking at the most popular data and AI products, Microsoft Power BI and Plotly reign above the rest.
However, organisations are showing a strong pull towards open technologies, and four in five of the most popular data and AI products are based on open source software, including dbt, Hugging Face and GeoPandas.
Enterprises are doing more AI projects than ever before – and getting better at it. The number of models that are candidates for production (used in operations) grew 411% year-over-year, while the number of experimental projects grew 54%.
The company’s data also shows that, on average, one in three experimental models is a candidate for production, compared to one in five last year. This suggests that organisations are getting better at building and scaling these projects.
Further, the growth of AI should not discount traditional data analytics. Power BI was the most popular program running on top of the Lakehouse last year.
The Lakehouse is increasingly used for data warehousing, including serverless data warehousing with Databricks SQL, which grew 144% YoY.