The Pulse #99: Relational databases here to stay as good fits for AI?
Also: $415M not enough for founders to stay at startup; France targeting NVIDIA with antitrust; a standout dev tools success story at GitLab, and more.
The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.
Happy 4th of July to US readers; I hope you enjoy the long weekend. I’m taking a half-holiday here in Amsterdam, as my wife’s American. For that reason, today’s edition of The Pulse is shorter than usual. The full-length version returns next week!
In this issue, we cover:
Relational databases here to stay as good fits for AI?
$415M not enough for founders to stay at startup
France targets NVIDIA with antitrust regulation
Microsoft insiders don’t want to be “IT for OpenAI”
Figma to train on paying customers’ data by default
More job cuts at Microsoft
A standout dev tools success story: GitLab
Industry pulse
Relational databases here to stay as good fits for AI?
With the rise of large language models (LLMs), vector database solutions are more relevant than before, because embeddings are at the core of LLMs. An embedding is a vector: a multi-dimensional representation of a token (basically a piece of text, an image, or similar). Operations like retrieval-augmented generation (RAG) calculate the embedding of the input, then look up the most similar previously stored embeddings (representing chunks of text) in a vector database; this retrieval step is what makes vector databases so useful. We previously covered RAG in more detail.
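To make the retrieval step concrete, here is a minimal sketch of the core idea, using plain NumPy. The tiny 4-dimensional vectors and hardcoded chunks are made up for illustration; a real system would get embeddings from a model and store them in a vector database:

```python
import numpy as np

# Toy "stored embeddings": in a real system these come from an embedding
# model and live in a vector database; here they are hardcoded 4-D vectors.
chunks = ["Postgres supports vector search", "Paris is in France"]
stored = np.array([[0.9, 0.1, 0.0, 0.2],
                   [0.0, 0.8, 0.6, 0.1]], dtype=np.float32)

def cosine_similarity(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between the query vector and each row of matrix."""
    return (matrix @ query) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))

# Pretend this is the embedding of the user's question.
query = np.array([0.85, 0.2, 0.1, 0.15], dtype=np.float32)

# Retrieval: rank stored chunks by similarity to the query embedding.
scores = cosine_similarity(query, stored)
best = int(np.argmax(scores))
print(f"Most relevant chunk: {chunks[best]!r} (score={scores[best]:.3f})")
```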
Lots of venture capital has flowed into vector database startups, with Pinecone one of the best-known cases, along with Chroma, Weaviate, and others.
The paper “What goes around comes around… and around” was authored by Michael Stonebraker and Andrew Pavlo. Stonebraker is a computer scientist (currently a professor at MIT) with decades of experience in database systems: he cofounded Ingres, Vertica, and VoltDB, and received the 2014 Turing Award. Pavlo is an associate professor at Carnegie Mellon University and the cofounder of the AI-powered SQL optimization startup OtterTune. They analyzed the evolution of database management systems, and interestingly concluded that relational database management systems added vector support surprisingly rapidly, and that vector database systems must become more relational in order to stay competitive:
“After LLMs became “mainstream” with ChatGPT in late 2022, it took less than one year for several RDBMSs to add their own vector search extensions. In 2023, many of the major RDBMSs added vector indexes, including Oracle, SingleStore, Rockset, and Clickhouse.
There are two likely explanations for the quick proliferation of vector indexes. The first is that similarity search via embeddings is such a compelling use case that every DBMS vendor rushed out their version and announced it immediately. The second is that the engineering effort to introduce a new index data structure is small enough that it did not take that much work for the DBMS vendors to add vector search. Most of them did not write their vector index from scratch and instead integrated an open-source library (e.g., pgVector, DiskANN, FAISS).
We anticipate that vector DBMSs will undergo the same evolution as document DBMSs by adding features to become more relational-like (e.g., SQL, transactions, extensibility). Meanwhile, relational incumbents will have added vector indexes to their already long list of features and moved on to the next emerging trend.”
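To illustrate the “integrate an open-source library” route the paper describes, here is a small sketch of what using FAISS directly looks like. The dimensionality and random vectors are stand-ins for real embeddings:

```python
import faiss  # open-source similarity-search library (faiss-cpu on PyPI)
import numpy as np

dim = 64                                   # embedding dimensionality (made up)
rng = np.random.default_rng(42)

# Random vectors stand in for real embeddings; FAISS expects float32.
stored = rng.random((1_000, dim), dtype=np.float32)

# A flat (brute-force) L2 index; production systems often use approximate
# indexes such as IVF or HNSW to scale to much larger collections.
index = faiss.IndexFlatL2(dim)
index.add(stored)

query = rng.random((1, dim), dtype=np.float32)
distances, ids = index.search(query, 5)    # the 5 nearest stored vectors
print(ids[0], distances[0])
```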
The paper is worth reading, and it makes a compelling, data-backed case that relational databases are here to stay. Their takeaway (emphasis mine):
“We predict that what goes around with databases will continue to come around in the coming decades. Another wave of developers will claim that SQL and the relational model (RM) are insufficient for emerging application domains. People will then propose new query languages and data models to overcome these problems. There is tremendous value in exploring new ideas and concepts for DBMSs (it is where we get new features for SQL.) The database research community and marketplace are more robust because of it.
However, we do not expect these new data models to supplant the relational model.”
I agree that SQL databases seem like a safe bet for the majority of computing tasks, including working with embeddings and vectors. Obviously, with huge amounts of data or extremely specialized use cases, do some research and potentially use a different tool. Still, relational stores like PostgreSQL and MySQL have shown themselves to scale surprisingly well.
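As a concrete illustration, here is a hedged sketch of what vector search inside PostgreSQL looks like with the pgvector extension, via the psycopg driver. The connection string, table name, and tiny 3-dimensional vectors are made up for the example, and it assumes pgvector is installed on the server:

```python
import psycopg  # PostgreSQL driver; assumes the pgvector extension is available

with psycopg.connect("dbname=app user=app") as conn:  # hypothetical connection string
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS docs (
            id        serial PRIMARY KEY,
            body      text,
            embedding vector(3)   -- tiny dimension, purely illustrative
        )
    """)
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        ("relational databases are here to stay", "[0.9, 0.1, 0.2]"),
    )
    # '<->' is pgvector's L2-distance operator; ORDER BY ... LIMIT turns
    # nearest-neighbor search into plain SQL.
    rows = conn.execute(
        "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 5",
        ("[0.8, 0.2, 0.1]",),
    ).fetchall()
    print(rows)
```

Nearest-neighbor search here is just an `ORDER BY` clause, which is exactly the paper’s point: adding vector support to a relational database is a feature, not a new category of system.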
Thank you to Jaromir Hamala, an engineer at QuestDB, whose post on X surfaced this observation.
Thank you to Dan Goldin for pointing out that important context on Michael Stonebraker was missing from the original version.