I used FAISS, so you don’t have to

Kacper Łukawski
6 min read · Jul 7, 2022

The introduction of Approximate Nearest Neighbour (ANN) methods started a renaissance of k-NN-like approaches. We are all in desperate need of scalable yet easily interpretable methods, and similarity-based ones were great candidates, but they offered poor performance back in the day. When Facebook presented FAISS and Spotify open-sourced their own tool, Annoy, similarity search, this time combined with neural embeddings, became a hot topic again. Both libraries allowed many companies to run their neural search more efficiently, which has to be emphasized, but they also gave those companies another headache.

Typical workflow of computing embeddings. A model converts input into a high-dimensional vector.

What makes FAISS different

FAISS, which stands for Facebook AI Similarity Search, is a library written in C++ with a Python interface that provides data structures and methods for efficient vector search. Vector search differs substantially from traditional search: it can no longer be based on inverted indexes, and it has to consider the distance between data points, not just the values of single dimensions. The vectors are typically generated by deep learning models, and plenty of pretrained ones are available off the shelf.
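
To make that concrete, here is a minimal sketch of the Python interface, with random vectors standing in for the embeddings a real model would produce:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                               # embedding dimensionality
xb = np.random.random((10_000, d)).astype("float32")  # stand-ins for real embeddings
xq = np.random.random((5, d)).astype("float32")       # query vectors

index = faiss.IndexFlatL2(d)           # exact (brute-force) L2 index
index.add(xb)                          # everything lives in this process's RAM
distances, ids = index.search(xq, 10)  # top-10 nearest neighbours per query
```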

As a C++ library, FAISS is a great tool for performing your experiments quickly. If you have a dataset, you can generate embeddings out of your data points with a pretrained model, or fine-tune an existing one, then create a FAISS index and finally search for similar examples. And that will be blazing fast compared to brute-force k-NN. Unfortunately, the index is available only in the process that created it, so integrating with external services requires additional effort and a custom interface. All the vectors are also kept in the RAM of that single machine, so with larger datasets and/or high-dimensional embeddings you will need an enormous amount of memory, or you will have to accept using virtual memory and slowing everything down. There are, of course, some optimizations available, like product quantization (PQ), but at the end of the day, FAISS is just another heavy data structure you need to maintain somehow.
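
Product quantization deserves a quick look, since it is the usual answer to the memory problem. A rough sketch with FAISS's IndexIVFPQ, where the parameters are illustrative rather than tuned:

```python
import numpy as np
import faiss

d = 128
xb = np.random.random((10_000, d)).astype("float32")  # stand-ins for real embeddings

quantizer = faiss.IndexFlatL2(d)                    # coarse quantizer for the inverted lists
index = faiss.IndexIVFPQ(quantizer, d, 100, 16, 8)  # 100 lists, 16 sub-vectors, 8 bits each
index.train(xb)                                     # PQ requires a training pass
index.add(xb)
index.nprobe = 8                                    # inverted lists scanned per query

# Each vector is now stored in 16 bytes instead of 128 * 4 = 512 bytes,
# at the cost of approximate distances and some lost recall.
```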

What makes FAISS, in turn, similar

Do not get me wrong — I love quick prototyping tools, and I use them heavily. But I have never even thought about running a Jupyter notebook with some chained pandas operations and calling it a production system. All those libraries share some traits with FAISS. Each of them was created to support experiments. They are quick and easy to set up, do not require DevOps experience to launch, and need no maintenance. But none of them scales well. And that is a matter of design choice, not a drawback. When I get new data and just want to see what is in there, I do not want to design a normalised database structure, load my dataset into it, and then write lots of JavaScript code to visualize it, only to realize it is garbage. Well, I could, but that would be a waste of time, and I would probably conclude it is garbage a few months sooner if I just loaded it with pandas, in my lovely Jupyter Notebook instance, and visualized it with the great help of matplotlib or seaborn.

There are many great tools designed for experiments. FAISS is one of them.

I’m pretty sure that if I had used an SQL database from the very beginning and created some B-tree or hash indexes on the columns I knew I would be querying often, I could have an optimized Data Science workflow, designed for a specific case. But that would be premature optimization. If we claim, or even just pretend, to do the “Science” part of the work, that involves many failures along the way and a lot of fumbling in the dark. Fast prototyping is crucial, even if our code executes far slower than it could. Going into production is not the responsibility of the Data Scientist, or at least it shouldn’t be. There are other people who can turn the experiments into a running system, and none of them will ever use pandas working on huge CSVs as their preferred data manipulation tool. And that’s also true for FAISS. “Make it work. Make it right. Make it fast.” — this is a famous statement, but DS experiments rarely even reach the second phase.

There are many stories of using FAISS in production, but they always emphasize the importance of managing the servers, containers, or pods. There is a lot of management going on, and that might be overkill that distracts you from delivering business value. You also need to create an abstraction layer so that other services can communicate with your vector search engine. In a world dominated by cloud and serverless hype, systems like that seem a bit outdated.

Is vector search all we need?

NLP and Computer Vision tasks can sometimes depend solely on single pieces of text or images. But in many cases, we have metadata or other, equally important attributes of our data points, which cannot be encoded in the embeddings directly. Just imagine a visual search system looking for a particular item by the photo you provide, preferably one available where you live, or at least nearby. I know there might be an excellent sushi takeaway in Tokyo, but chances are I won’t order it for lunch if I live in Europe.

Embeddings cannot capture some of the important context data.

We cannot rely on vector search alone if we have other criteria to fulfil, and anyone working with ANN can already tell you that. The good old search world we wanted to avoid has cruelly caught up with us. Unfortunately, FAISS on its own does not support that mode of search. What we can do instead is use a secondary search system, like Elasticsearch, to check which of the vector search results match the conditions. Sometimes even a good old relational friend may work well, but still — we need two systems to support the search, and probably another service to combine their results.
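
To illustrate the pain, here is a naive post-filtering sketch; `metadata` and `predicate` are hypothetical stand-ins for whatever secondary store and condition you actually have:

```python
def filtered_search(index, metadata, query, k, predicate, overfetch=10):
    """Over-fetch from FAISS, then drop hits that fail the metadata check.
    Nothing guarantees that k matching results survive the filter."""
    distances, ids = index.search(query, k * overfetch)
    hits = [
        (dist, idx)
        for dist, idx in zip(distances[0], ids[0])
        if idx != -1 and predicate(metadata[idx])  # -1 marks missing results
    ]
    return hits[:k]

# e.g. filtered_search(index, cities, xq[:1], 10, lambda city: city == "Berlin")
```

Pick the over-fetch factor too low and you return fewer than k results; too high and you pay for it in latency. That balancing act is exactly what a filter-aware engine should take off your hands.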

Do we have any alternative?

If we take things seriously, we need to start thinking about neural search as a first-class citizen in our systems. We have well-established tools for keyword-based matching and traditional attribute-based filtering, and there has been a lot going on in the area of vector similarity recently. The fast, experimental path that FAISS provides is great for POCs, but if we want to scale things up, we need a system that acts as a vector database, capable of running on a whole bunch of machines, with redundancy and fault tolerance built in.

There are some differences between the existing vector search engines. Some of them, like Pinecone or Google’s Vertex AI, are available only as SaaS, but the majority are open source, so anyone can start simply, with no hidden costs. If at some point you decide you need to scale, then moving to the cloud or an on-premise cluster should also be possible, without any need to build your own system around it. And last, but not least — the best solutions will allow you to store not only the vectors but also the other attributes you want to filter by, or simply retrieve along with your neighbours.

Open Source projects really stand out in the world of ANN. The well-known Lucene-based tools, like Solr or Elasticsearch, have also started implementing indexes for dense vectors, but rather as an addition to their core functionality, which is keyword- and inverted-index-based. That could work for you if neural search is just the icing on the cake, not a core feature. However, if you take semantic search seriously and think of it as a major functionality, you have to consider using a tool designed specifically for that. And there are plenty of options, with tools like Qdrant that will let you run neural search easily, with support for additional filtering mechanisms done in a scalable way, while keeping a user-friendly interface.
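
As a taste of what that looks like, here is a sketch using Qdrant’s Python client; the collection name, the payload field, and the query vector are made-up examples:

```python
from qdrant_client import QdrantClient
from qdrant_client.http import models

client = QdrantClient(host="localhost", port=6333)

photo_embedding = [0.1] * 128  # stand-in for a real image embedding

# Vector search and attribute filtering happen in a single request:
hits = client.search(
    collection_name="products",
    query_vector=photo_embedding,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="city",
                match=models.MatchValue(value="Berlin"),
            )
        ]
    ),
    limit=10,
)
```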

References

CVPR 2020 Tutorial: A Large-Scale Visual Search System in the C2C Marketplace App Mercari — https://speakerdeck.com/kumon/cvpr-2020-tutorial-a-large-scale-visual-search-system-in-the-c2c-marketplace-app-mercari?slide=21
