Recommendation algorithm with vector database

In this article, we’ll explore how to build a simple movie recommendation system using a vector database and Node.js. We’ll use pgvector, a PostgreSQL extension that adds a vector data type and similarity search, to store movie embeddings and run nearest neighbor searches to generate recommendations.

First, let’s discuss why vector databases matter and what they offer beyond a traditional relational setup.

Why Vector Databases?

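Traditional relational databases are great for storing structured data and answering exact-match queries, but out of the box they have no efficient way to answer questions like “which items are most similar to this one?” over unstructured data such as text or images. This is where vector databases, and vector extensions such as pgvector, come in.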

Vector databases are designed to store and search high-dimensional vectors, such as the embeddings produced by machine learning models. They offer several benefits over a plain relational setup, including:

Scalability

Vector databases are optimized for large collections of vectors, and many dedicated systems scale horizontally by adding more nodes to the cluster. With pgvector, scaling follows the usual PostgreSQL options, such as read replicas and partitioning.

Speed

Vector databases are designed for fast similarity search. They achieve this mainly through approximate nearest neighbor indexes, such as HNSW and IVFFlat (both supported by pgvector), which avoid comparing the query against every stored vector. Some systems also use GPU acceleration.

Flexibility

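Any data that can be turned into an embedding, including text, images, and numerical features, can be stored and searched the same way. Most vector databases ship client libraries for languages such as Python, JavaScript, and Rust, and pgvector can be queried with plain SQL.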
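To make the idea of a nearest neighbor search concrete, here is a minimal sketch in plain JavaScript. It ranks a few movies by Euclidean distance to a query vector, which is what a vector database does at a much larger scale and with indexes. The titles, the three-dimensional vectors, and the function names are all made up for illustration.

```javascript
// Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
// The titles and numbers are invented for this example.
const movies = [
  { title: 'Space Battle', vector: [0.9, 0.1, 0.0] },
  { title: 'Galaxy Wars', vector: [0.8, 0.2, 0.1] },
  { title: 'Quiet Romance', vector: [0.0, 0.1, 0.9] },
];

// Euclidean (L2) distance between two equal-length numeric arrays
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Return the k items closest to the query vector, nearest first
function nearestNeighbors(query, items, k) {
  return items
    .map((item) => ({
      title: item.title,
      distance: euclideanDistance(query, item.vector),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

const results = nearestNeighbors([1.0, 0.0, 0.0], movies, 2);
console.log(results.map((r) => r.title)); // [ 'Space Battle', 'Galaxy Wars' ]
```

This brute-force version compares the query against every item, which is fine for a handful of movies but is exactly the work that a vector index is meant to avoid.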

A Simple Movie Recommender

First, let’s start by setting up the project and installing the required dependencies:

npm init -y
npm install express pg sequelize pgvector openai

Now, let’s create a new Express.js server and connect to our PostgreSQL database using Sequelize:

const express = require('express');
const Sequelize = require('sequelize');
const { Op } = Sequelize;
const app = express();

const sequelize = new Sequelize('movies', 'your_username', 'your_password', {
  host: 'localhost',
  dialect: 'postgres'
});

app.use(express.json());

const Movie = sequelize.define('Movie', {
  id: {
    type: Sequelize.INTEGER,
    primaryKey: true,
    autoIncrement: true
  },
  title: {
    type: Sequelize.STRING
  },
  description: {
    type: Sequelize.TEXT
  },
  genre: {
    type: Sequelize.STRING
  },
  director: {
    type: Sequelize.STRING
  },
  actors: {
    type: Sequelize.ARRAY(Sequelize.STRING)
  },
  releaseYear: {
    type: Sequelize.INTEGER
  }
}, {
  hooks: {}
});

app.get('/movies', async (req, res) => {
  const movies = await Movie.findAll({
    where: {
      genre: { [Op.like]: '%action%' }
    }
  });
  res.json(movies);
});

app.listen(3000, () => {
  console.log('Server listening on port 3000');
});

Next, we’ll enable the pgvector extension in our PostgreSQL database. Note that the extension is registered under the name vector:

CREATE EXTENSION IF NOT EXISTS vector;

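Now, we can add a vector column using the pgvector Node.js package, which provides a VECTOR data type for Sequelize:

const pgvector = require('pgvector/sequelize');

// Register the VECTOR data type before defining any model that uses it
pgvector.registerType(Sequelize);

// Store one embedding per movie in a separate movie_vectors table
const MovieVector = sequelize.define('MovieVector', {
  movieId: { type: Sequelize.INTEGER, primaryKey: true },
  embedding: { type: Sequelize.DataTypes.VECTOR(1536) }
}, {
  tableName: 'movie_vectors'
});

Here, we register the VECTOR type and define a MovieVector model, stored in the movie_vectors table, that holds one embedding per movie. Each embedding is computed from the text we want to compare (in this case, title, description, and genre). The dimension must match the embedding model: 1536 is the default output size of OpenAI’s text-embedding-3-small model.

To generate vectorized embeddings for our movies, we’ll use the OpenAI Embeddings API: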

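// Official OpenAI Node.js library: npm install openai
const { OpenAI } = require('openai');

// Read the API key from the environment instead of hard-coding it
const api = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const movieTitles = ['The Shawshank Redemption', 'The Godfather', 'The Dark Knight'];

async function fetchEmbeddings() {
  // Request one embedding per title in a single call
  const response = await api.embeddings.create({
    model: 'text-embedding-3-small',
    input: movieTitles,
  });

  // Each returned item carries the index of the input it belongs to
  const embeddings = response.data.map((item) => ({
    title: movieTitles[item.index],
    embedding: item.embedding,
  }));

  console.log(embeddings);
  return embeddings;
}

fetchEmbeddings().catch(console.error);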
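Once the embeddings are stored, the nearest neighbor search itself can run inside PostgreSQL. The sketch below only builds the parameterized query. It assumes a movie_vectors table with a "movieId" column and an embedding column of type vector; those names and the buildNearestMoviesQuery helper are illustrative, not part of pgvector.

```javascript
// Build a parameterized pgvector query for the movies nearest to an embedding.
// Table and column names are assumptions for this sketch.
function buildNearestMoviesQuery(embedding, limit = 5) {
  if (!Array.isArray(embedding) || embedding.length === 0) {
    throw new Error('embedding must be a non-empty array of numbers');
  }
  return {
    // <=> is pgvector's cosine distance operator: smaller means more similar
    text:
      'SELECT "movieId", embedding <=> $1::vector AS distance ' +
      'FROM movie_vectors ' +
      'ORDER BY embedding <=> $1::vector ' +
      'LIMIT $2',
    // pgvector accepts a vector as text in the form '[0.1,0.2,0.3]'
    values: [JSON.stringify(embedding), limit],
  };
}

const query = buildNearestMoviesQuery([0.1, 0.2, 0.3], 2);
console.log(query.values); // [ '[0.1,0.2,0.3]', 2 ]
```

With the pg package installed earlier, an object of this shape can be passed to pool.query(query), where pool is a pg Pool connected to the movies database. Swapping <=> for <-> would rank by Euclidean distance instead.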

Here’s another, more direct approach we can use, which computes the similarity in application code:

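// Set up the OpenAI client (official "openai" npm package)
const { OpenAI: OpenAIClient } = require('openai');
const openaiClient = new OpenAIClient({ apiKey: process.env.OPENAI_API_KEY });

// Assumption: movie embeddings were computed beforehand and loaded into memory
// as an array of { title, embedding } objects
const movieEmbeddings = [];

// Define a function to get recommendations
async function getRecommendations(input) {
  // Get the embedding of the input text from OpenAI
  const response = await openaiClient.embeddings.create({
    model: 'text-embedding-3-small',
    input,
  });
  const inputEmbedding = response.data[0].embedding;

  // Calculate similarity between the input and every known movie
  const similarities = movieEmbeddings.map((movie) => ({
    title: movie.title,
    score: calculateSimilarity(inputEmbedding, movie.embedding),
  }));

  // Return the top 5 movies, most similar first
  return similarities.sort((a, b) => b.score - a.score).slice(0, 5);
}

// Define a function to calculate cosine similarity between two vectors
function calculateSimilarity(vec1, vec2) {
  let dot = 0;
  let norm1 = 0;
  let norm2 = 0;
  for (let i = 0; i < vec1.length; i++) {
    dot += vec1[i] * vec2[i];
    norm1 += vec1[i] * vec1[i];
    norm2 += vec2[i] * vec2[i];
  }

  // Dividing by both vector lengths gives a score between -1 and 1
  return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));
}

// Endpoint to get recommendations
app.post('/recommendations', async (req, res) => {
  try {
    const recommendations = await getRecommendations(req.body.input);
    res.json(recommendations);
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});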
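The snippets above embed only short strings, but the Movie model also has description and genre fields that carry useful signal. One possible way to use them is to combine the fields into a single string before requesting an embedding. The helper below is a sketch: the function name and the label format are assumptions, not something the Embeddings API requires.

```javascript
// Combine a movie's text fields into one string to embed.
// Missing or empty fields are skipped so they do not add empty labels.
function movieToEmbeddingText(movie) {
  const parts = [
    ['Title', movie.title],
    ['Genre', movie.genre],
    ['Description', movie.description],
  ];
  return parts
    .filter(([, value]) => typeof value === 'string' && value.trim() !== '')
    .map(([label, value]) => `${label}: ${value.trim()}`)
    .join('\n');
}

const text = movieToEmbeddingText({
  title: 'The Godfather',
  genre: 'Crime',
  description: '',
});
console.log(text);
// Title: The Godfather
// Genre: Crime
```

The resulting string can be passed as the input of an embeddings request in place of the bare title.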

In conclusion, implementing a recommendation system using a vector database and the OpenAI Embeddings API can be a powerful way to provide relevant suggestions to users. Embeddings turn text into vectors whose distances reflect similarity in meaning, so the system can compare movies by what their descriptions say rather than by exact keywords.

The benefits of this approach are numerous.

Firstly, it avoids some of the usual constraints of older methods. Unlike collaborative filtering, it needs no user interaction history, and unlike keyword-based content filtering, it can match movies whose descriptions are worded differently but cover similar themes. This allows for more accurate and diverse recommendations.

Secondly, the use of pre-trained models, such as those behind the OpenAI Embeddings API, means you do not have to train a model on your own data, making it easier and faster to develop and deploy these systems.

Finally, this approach can also improve the scalability and efficiency of recommendation systems, as vector indexes let the database find similar items without comparing the query against every row.

However, there are also some challenges and limitations to consider when implementing this approach. One of the main challenges is the quality of the embeddings themselves, which can vary depending on the specific model used and the quality of the training data.

Additionally, there may be issues with dimensionality: higher-dimensional embeddings need more storage and make searches slower, and distances tend to become less discriminative as the number of dimensions grows. Careful evaluation of the embedding model and index settings can help mitigate these risks.

Despite these challenges, the potential benefits of using vector databases and the OpenAI Embeddings API for recommendation systems make it an exciting area of research and development. As the capabilities of natural language processing and machine learning continue to evolve, we can expect to see even more innovative applications of this technology in the future. Whether you’re a developer looking to build a cutting-edge recommendation system or a business looking to improve customer engagement, this approach is definitely worth considering.
