Pgvector
11 Topics

Scalable Vector Search with DiskANN - Available to all Azure Database for PostgreSQL
We’re thrilled to announce that the public preview of DiskANN on Azure Database for PostgreSQL is now open! No sign-up is needed: it's available to all Azure Database for PostgreSQL customers right now. Based on your valuable feedback from our initial release in October, we've supercharged DiskANN with parallel index build for improved performance, numerous bug fixes, and enhanced stability. DiskANN enables developers to perform highly accurate and efficient vector searches on large vector datasets, making it an ideal solution for scaling Generative AI applications. Try DiskANN today and elevate your AI projects to the next level!

What is DiskANN?

Developed by Microsoft Research and used extensively at Microsoft in global services such as Bing and Microsoft 365, DiskANN is an approximate nearest neighbor search algorithm designed for efficient vector search at scale. It provides the high recall, high throughput, and low query latency essential for modern AI and RAG applications.

Why use Azure Database for PostgreSQL with DiskANN Vector Index?

Scalability: DiskANN is optimized for large datasets, making it ideal for handling millions of vectors.
Accuracy: DiskANN uses iterative post filtering to enhance the accuracy of filtered vector search results without compromising on speed or precision.
Low Latency: The DiskANN graph index is constructed to make search very efficient, minimizing the number of SSD reads to achieve high throughput and low latency.
Integration: DiskANN integrates seamlessly with Azure Database for PostgreSQL, leveraging the power and flexibility of PostgreSQL.

Learn more about DiskANN from Microsoft.

Benefits of using a vector index in your AI application

Using a vector index in PostgreSQL, such as the one provided by pg_diskann, dramatically improves query performance and reduces latency for high-dimensional data applications like search engines, recommendation systems, and e-commerce websites.
Unlike brute-force search, vector indexes optimize similarity searches by organizing data for efficient nearest neighbor queries using metrics like cosine similarity, Euclidean distance, or inner product. They leverage approximate algorithms, such as DiskANN, to reduce the search space, enabling sub-linear query times even for datasets with millions of vectors. On average, with a vector index you can achieve sub-10-millisecond query times on a 1-million-row dataset, while brute-force search could take ~200 milliseconds or more, which makes a vector index well suited to real-time applications. For example, an Airbnb-style platform could use vector search to match a user's query with similar properties in the database. The index lets the system quickly surface the most relevant listings, turning what could be seconds-long processing into millisecond responses and ensuring a fast and personalized search experience.

Using DiskANN on Azure Database for PostgreSQL

Using DiskANN on Azure Database for PostgreSQL is easy.

Enable the pgvector & diskann Extensions: Allowlist the pgvector and diskann extensions within your server configuration.

Create Extension in Postgres: Create the pg_diskann extension on your database along with any dependencies.

    CREATE EXTENSION IF NOT EXISTS pg_diskann CASCADE;

Create a Vector Column: Define a table to store your vector data, including a column of type vector for the vector embeddings.

    CREATE TABLE demo (
        id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        embedding public.vector(3)
    );

    INSERT INTO demo (embedding) VALUES
        ('[1.0, 2.0, 3.0]'),
        ('[4.0, 5.0, 6.0]'),
        ('[7.0, 8.0, 9.0]');

Index the Vector Column: Create an index on the vector column to optimize search performance. The pg_diskann PostgreSQL extension is compatible with pgvector: it uses the same types, distance functions, and syntactic style.
    CREATE INDEX demo_embedding_diskann_idx ON demo USING diskann (embedding vector_cosine_ops);

Perform Vector Searches: Use SQL queries to search for similar vectors based on various distance metrics (cosine distance, via the <=> operator, in the example below).

    SELECT id, embedding FROM demo ORDER BY embedding <=> '[2.0, 3.0, 4.0]' LIMIT 5;

Ready to Dive In?

Use the DiskANN preview today and explore the future of AI-driven applications with the power of Azure Database for PostgreSQL!
Run our end-to-end sample RAG app with DiskANN

Learn More

Integrating DiskANN with Azure Database for PostgreSQL enables scalable, efficient AI applications. By leveraging advanced vector search capabilities, you can enhance the performance of your AI applications and deliver more accurate results faster than ever before.

Learn more about DiskANN in Azure Database for PostgreSQL
Azure Database for PostgreSQL in Semantic Kernel
Azure Database for PostgreSQL | 🦜️🔗 LangChain
DiskANN – Microsoft Research

Introducing DiskANN Vector Index in Azure Database for PostgreSQL
We're thrilled to announce the preview of DiskANN, a leading vector indexing algorithm, on Azure Database for PostgreSQL - Flexible Server! Developed by Microsoft Research and used extensively at Microsoft in global services such as Bing and Microsoft 365, DiskANN enables developers to build highly accurate, performant, and scalable Generative AI applications, surpassing pgvector’s HNSW and IVFFlat in both latency and accuracy. DiskANN also overcomes a long-standing limitation of pgvector in filtered vector search, where pgvector occasionally returns incorrect results.

LangChain integration with Azure Database for PostgreSQL (Part 1)
Use LangChain to split documents into smaller chunks, generate embeddings for each chunk using Azure OpenAI, and store them in a PostgreSQL database via the pgvector extension. Then, we’ll perform a vector similarity search on the embedded documents.
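
The split, embed, and search flow described here can be sketched in plain Python without any external services. This is only an illustration under stated assumptions: `split_text`, `toy_embed`, `cosine_distance`, and `search` are hypothetical helpers written for this sketch, not LangChain, Azure OpenAI, or pgvector APIs. The letter-frequency "embedding" is a stand-in for a real embedding model, and the in-memory list stands in for a PostgreSQL table. The distance function computes 1 minus cosine similarity, which is the quantity pgvector's `<=>` operator orders by.

```python
import math
from collections import Counter


def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts of letters a-z."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention chosen for this sketch only: zero vectors are "far away".
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def search(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Brute-force search: rank every chunk by cosine distance to the query."""
    query_vec = toy_embed(query)
    scored = [(cosine_distance(toy_embed(chunk), query_vec), chunk) for chunk in chunks]
    return sorted(scored)[:k]


if __name__ == "__main__":
    document = (
        "DiskANN is an approximate nearest neighbor search algorithm. "
        "pgvector stores embeddings in a PostgreSQL vector column."
    )
    for distance, chunk in search("vector column", split_text(document), k=2):
        print(f"{distance:.3f}  {chunk!r}")
```

The `search` function scans every chunk, which is the brute-force approach; a vector index such as DiskANN exists to avoid that full scan once the number of stored embeddings grows large.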