Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

Sorry, you do not have permission to ask a question, You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please type your username.

Please type your E-Mail.

Please choose an appropriate title for the post.

Please choose the appropriate section so your post can be easily searched.

Please choose suitable Keywords Ex: post, video.

Browse

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise Logo Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise Logo

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise Navigation

  • Home
  • About Us
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • About Us
  • Contact Us
Home/ Questions/Q 5538

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise Latest Questions

Author
  • 60k
Author
Asked: November 27, 20242024-11-27T10:12:08+00:00 2024-11-27T10:12:08+00:00

How Modern SQL Databases Are Changing Web Development – #4 Into the AI Era

  • 60k

SQL databases have been powering the web since their inception. The combination of strict schema, ACID transactions, and strong integrity makes it still the best storage choice for many apps even today. Database systems, albeit sounding unfathomable and dull, have been a highly vibrant domain in the past few years. Lots of talents are devoted to it, open source projects are created one after another, companies have been striving to build profitable businesses, and VCs have been making bets on those who look like the next unicorns.

Under the flourishing surface, the new generation of modern SQL databases is renovating this 50-year-old technology and gradually reshaping how we build web applications – what we do with it, how we manage it, and the way we program against it. This series of articles tries to explore this topic from multiple fronts. Now we've come to the last part of it. Let's get hold of the hottest trend of the year – AI, and see how modern databases enable easier development of AI-powered applications.

LLM? Not So Fast!

Before ChatGPT showed the world how powerful a large language model can be, AI, or more precisely, machine learning, has been there for over 60 years. It has had its ups and downs and has made excellent progress in many domains, including computer vision, natural language processing, speech recognition, etc.

Unlike GPT, traditional machine learning methods often require training custom models for solving specific problems. While ML practitioners like talking about learning algorithms, the truth is that most of their time is spent on data wrangling, and this gets SQL databases into the picture. Preparing data for model training requires heavy preprocessing of raw data: eliminating noise, filling missing values, merging multiple pieces of data, normalizing values, extracting features, and so on. Filtering, joining, and transforming data are the tasks that databases are most adept at.

Some excellent tools were created to represent these tasks “naturally” in SQL and even let most of the computation happen inside the database. PostgresML is a great example. It's built above PostgreSQL and provides a set of functions that allow you to train and use machine learning models with SQL. Here's how you can train a classification model for the classic handwritten digit recognition problem:

  1. Loading a dataset as a table

    SELECT * FROM pgml.load_dataset('digits'); 
  2. Training a classifier using “xgboost” algorithm

    SELECT * FROM pgml.train(     'Handwritten Digit Image Classifier',     algorithm => 'xgboost',     'classification',     'pgml.digits',     'target' ); 
  3. Making predictions using the trained model

    SELECT     target,     pgml.predict('Handwritten Digit Image Classifier', image) AS prediction FROM pgml.digits  LIMIT 10; 

Although such a model can also be easily trained with other programming languages, PostgresML makes it possible to get the job done without the data ever leaving your database. Plus, if you're a SQL lover and proficient in preparing the training data using it, you can have your entire preprocessing-training-predicting pipeline written in SQL.

LLM & RAG: A Bit of Background

Fast forward to the post-GPT era, and many people equate AI to LLM today. To understand the database's role in this new context, let's first have a quick review of the landscape of LLM.

LLMs are pre-trained machine learning models specialized for generating text output based on text input. They are called “large” because the models have a very large number of internal parameters and are trained with a very large amount of data to ensure adequate performance. Because the models are pre-trained, developers can use them directly without worrying about the training process. Instead of fighting with feature engineering and training algorithms, they now focus on prompt engineering.

However, being pre-trained also means that the models are not tailored to any specific task. One “deep customization” technique is called Retrieval Augmented Generation (RAG). Suppose you have a knowledge base of documents for a domain; adopting RAG involves the following steps:

  • Calculate embeddings for each document (using an existing model or APIs like OpenAI).
  • Store the embeddings in a database that allows fast similarity-based search.
  • Given a user query, calculate its embedding and find the most similar documents in the database.
  • Use the matching document to augment the input query as additional context and feed it to the LLM.

Embeddings are vector representations of text data.

Retrieval Augmented Generation

As you can see, databases still play an essential role in this architecture, but in a very different way.

Vector Search

A RAG implementation's quality and performance highly depend on the similarity-based search of embeddings. The challenge arises from the fact that embeddings are usually high-dimensional vectors, and the knowledge base may have many documents. It's not surprising that the popularity of LLM catalyzed the development of specialized vector databases like Pinecone and Weaviate. However, SQL databases are also evolving to meet the new challenge.

PostgreSQL's pgvector extension is probably the most widely used SQL solution for storing and searching vector data today. The extension introduces a “vector” type specialized for storing high-dimensional vector data. It allows you to create vector indices (in “IVFFlat” or “HNSW” format for different indexing/searching performance tradeoffs) and leverage them to do various types of similarity searches.

-- Create a table with a vector column CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));  -- Insert vectors INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');  -- Create an "HNSW" index for L2 distance search CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);  -- Get the nearest neighbors by L2 distance SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5; 
Enter fullscreen mode Exit fullscreen mode

SQL databases are not explicitly optimized for vector searches; benchmarks show it's usually slower than vector databases. However, there are still good reasons to use them:

  • You can filter with both vector and non-vector data simultaneously.
  • You can combine full-text search and vector-based search.
  • You can have the matching document's content directly returned in the search result without making a separate fetch request.
  • You can leverage the database's security mechanisms (roles, row-level-security, etc.) to segregate data across tenants.

SQL as the Facade of LLM Workflows

Besides storing and searching vector data, there are also other remarkable innovations that let you describe your entire LLM workflows using SQL – conceptually similar to how end-to-end machine learning tasks are done with PostgresML.

Mindsdb is a good example. It abstracts everything related to an AI workflow as “virtual tables”. For example, you can import OpenAI API as a “virtual table”:

CREATE ML_ENGINE openai_engine FROM openai USING     api_key = 'your-openai-api-key';  CREATE MODEL my_openai_model PREDICT answer USING     engine = 'openai',     question_column = 'question',     model_name = 'openai_model_name'     api_key = 'YOUR_OPENAI_API_KEY; 
Enter fullscreen mode Exit fullscreen mode

Then query it to get answers to questions:

SELECT question, answer FROM my_openai_model WHERE question = 'Where is Stockholm located?'; 
Enter fullscreen mode Exit fullscreen mode

And get back result like:

question answer
Where is Stockholm located? Stockholm is located in Sweden.

It's essentially a platform for integrating data, APIs, and LLMs and using SQL to declaratively orchestrate them to create high-level workflows like RAG, chatbot, text summarization, etc.

Wrap Up

AI is everywhere. It's encouraging us to rethink every aspect of software development, and database is no exception. We should expect to see SQL databases – one of the most time-tested supporting pillars of software engineering, continue exploring this new frontier and being reshaped along the way.


ZenStack is our open-source TypeScript toolkit for building high-quality, scalable apps faster, smarter, and happier. It centralizes the data model, access policies, and validation rules in a single declarative schema on top of Prisma, well-suited for AI-enhanced development. Start integrating ZenStack with your existing stack now!

aidatabasesqlwebdev
  • 0 0 Answers
  • 4 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

Sidebar

Ask A Question

Stats

  • Questions 4k
  • Answers 0
  • Best Answers 0
  • Users 2k
  • Popular
  • Answers
  • Author

    Insights into Forms in Flask

    • 0 Answers
  • Author

    Kick Start Your Next Project With Holo Theme

    • 0 Answers
  • Author

    Refactoring for Efficiency: Tackling Performance Issues in Data-Heavy Pages

    • 0 Answers

Top Members

Samantha Carter

Samantha Carter

  • 0 Questions
  • 20 Points
Begginer
Ella Lewis

Ella Lewis

  • 0 Questions
  • 20 Points
Begginer
Isaac Anderson

Isaac Anderson

  • 0 Questions
  • 20 Points
Begginer

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help

Footer

Querify Question Shop: Explore Expert Solutions and Unique Q&A Merchandise

Querify Question Shop: Explore, ask, and connect. Join our vibrant Q&A community today!

About Us

  • About Us
  • Contact Us
  • All Users

Legal Stuff

  • Terms of Use
  • Privacy Policy
  • Cookie Policy

Help

  • Knowledge Base
  • Support

Follow

© 2022 Querify Question. All Rights Reserved

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.