Azure Data Explorer Blog

Optimizing Vector Similarity Searches at Scale

Anshul_Sharma (Microsoft)
Jul 24, 2023

This post is co-authored by @adieldar (Principal Data Scientist, Microsoft)

 

In a previous blog, Azure Data Explorer for Vector Similarity Search, we focused on how Azure Data Explorer (Kusto) is perfectly suited for storing and searching vector embeddings.

In this blog, we will focus on performance tuning and optimizations for running vector similarity searches at scale.  

 

We will continue with the Wikipedia scenario, where we generate the embeddings of wiki pages using OpenAI and store them in Kusto. We then use the series_cosine_similarity_fl Kusto function to perform similarity searches.
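
To make the search step concrete, here is a minimal sketch of such a query. It assumes the series_cosine_similarity_fl function from the Kusto Functions Library is defined in the database, that the embeddings are stored in a dynamic column named embedding alongside a title column, and that searched_vector is a hypothetical placeholder standing in for a real embedding returned by the OpenAI API:

```
// Minimal sketch: find the 10 wiki pages most similar to a query vector.
// series_cosine_similarity_fl is assumed to be defined in the database;
// searched_vector is a hypothetical placeholder, not a real embedding.
let searched_vector = dynamic([0.0029, -0.0158, 0.0102]);
WikipediaEmbeddingsTitleD
| extend similarity = series_cosine_similarity_fl(searched_vector, embedding)
| top 10 by similarity desc
| project title, similarity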

Optimizing for scale

 

To optimize the cosine similarity search, we need to split the vectors table into many extents that are evenly distributed among all cluster nodes. This can be done by setting a partitioning policy on the embeddings table using the .alter-merge policy partitioning command:

.alter-merge table WikipediaEmbeddingsTitleD policy partitioning
```
{
  "PartitionKeys": [
    {
      "ColumnName": "vector_id_str",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 2048,      // set to the max value to create smaller partitions, and thus a more balanced spread among all cluster nodes
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    }
  ],
  "EffectiveDateTime": "2000-01-01"     // set to an old date in order to apply the partitioning on existing data
}
```

In the example above, we modified the partitioning policy of WikipediaEmbeddingsTitleD. This table was created from WikipediaEmbeddings by projecting the documents' titles and embeddings.

 

Notes:

  • The partitioning process requires a string key with high cardinality, so we also projected the unique vector_id and converted it to a string.
  • The best practice is to create an empty table, modify its partitioning policy, and only then ingest the data; in that case there is no need to set the old EffectiveDateTime as above. A sketch of this flow follows the notes.
  • It takes some time after data ingestion until the policy is applied.
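
The following is a minimal sketch of that recommended flow. Each command is run separately; WikipediaEmbeddings and vector_id come from the example above, while the exact column types shown here are illustrative:

```
// 1. Create an empty table (illustrative schema).
.create table WikipediaEmbeddingsTitleD (title:string, embedding:dynamic, vector_id_str:string)

// 2. Apply the partitioning policy shown above to the still-empty table
//    (no old EffectiveDateTime is needed in this case).

// 3. Ingest the data, projecting the titles, the embeddings, and the
//    unique vector_id converted to a string partition key.
.set-or-append WikipediaEmbeddingsTitleD <|
    WikipediaEmbeddings
    | project title, embedding, vector_id_str = tostring(vector_id)

// 4. Verify that the policy is set.
.show table WikipediaEmbeddingsTitleD policy partitioning
```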

To test the effect of partitioning, we created multiple tables in a similar manner, containing up to 1M embedding vectors, and tested the cosine similarity search performance on clusters with 1, 2, 4, 8 and 20 nodes (SKU Standard_E4d_v5).

The following table compares the search duration (in seconds) before and after partitioning:

| # of vectors | 1 node* (no partitioning) | 2 nodes | 4 nodes | 8 nodes | 20 nodes |
|---|---|---|---|---|---|
| 25,000 | 3.4 | 0.95 | 0.67 | 0.57 | 0.51 |
| 50,000 | 6.2 | 1.5 | 0.92 | 0.65 | 0.55 |
| 100,000 | 12.4 | 2.6 | 1.55 | 1 | 0.57 |
| 200,000 | 24.2 | 5.2 | 2.8 | 1.65 | 0.63 |
| 400,000 | 48.5 | 10.3 | 5.4 | 2.95 | 0.87 |
| 800,000 | 96.5 | 20.5 | 10.5 | 6 | 1.2 |
| 1,000,000 | 102 | 26 | 13.3 | 7.2 | 1.4 |

* Note that the cluster has 2 nodes, but the tables are stored on a single node; this is our baseline before applying the partitioning policy.

You can see that even on the smallest 2-node cluster, the search speed is improved by more than a 4x factor, and in general the speed is inversely proportional to the number of nodes. The number of embedding vectors needed for common LLM scenarios (e.g. Retrieval Augmented Generation) rarely exceeds 100K, so with 8 nodes searching can be done in about 1 second.

 

How can you get started? 

If you would like to try this demo, head to the azure_kusto_vector GitHub repository and follow the instructions.

The notebook in the repo will allow you to:

  1. Download precomputed embeddings created by the OpenAI API.
  2. Store the embeddings in ADX.
  3. Convert a raw text query to an embedding with the OpenAI API.
  4. Use ADX to perform cosine similarity search over the stored embeddings.

We look forward to your feedback and all the exciting things you build with Kusto & vectors!

 

  • vsrekha213:

    Hi,

    The provided example demonstrates using pre-computed vector data stored in a CSV file. While this works, I'm interested in a more permanent solution. Can we directly store vector data within a Fabric Warehouse table, in a dedicated column, instead of relying on a notebook to load it each time? My goal is to persistently store the vector data in a database, making it readily available for future use without requiring manual loading through notebooks.

  • AleksandraJ:

    Hi!

    Nice article. However, I have doubts that I can follow the above example and GitHub repo with a free ADX cluster. I tried it and I don't see how to do this. Moreover, it is not mentioned in the article or anywhere in the GitHub repo, but you need a Spark session up and running, and you cannot run it locally (or it is quite tricky and not a quick option even if you already have PySpark installed locally). Is there any other option for writing data to a free Kusto table (free cluster) without using a Spark DataFrame and authentication?