About similarity search
Traditional databases are made up of structured tables containing symbolic information. For example, an image collection would be represented as a table with one row per indexed photo. Each row contains information such as an image identifier and descriptive text. Rows can be linked to entries from other tables as well, such as an image with people in it being linked to a table of names.
AI tools, like text embedding (word2vec) or convolutional neural net (CNN) descriptors trained with deep learning, generate high-dimensional vectors. These representations are much more powerful and flexible than a fixed symbolic representation, as we’ll explain in this post. Yet traditional databases that can be queried with SQL are not adapted to these new representations. First, the huge inflow of new multimedia items creates billions of vectors. Second, and more importantly, finding similar entries means finding similar high-dimensional vectors, which is inefficient if not impossible with standard query languages.*
How can a vector representation be used?
Let’s say you have an image of a building — for example, the city hall of some midsize city whose name you forgot — and you’d like to find all other images of this building in the image collection. A key/value query that is typically used in SQL doesn’t help, because you’ve forgotten the name of the city.
This is where similarity search kicks in. The vector representation for images is designed to produce similar vectors for similar images, where similar vectors are defined as those that are nearby in Euclidean space.*
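The idea of "nearby in Euclidean space" can be made concrete with an exhaustive (brute-force) nearest-neighbour search. This is a minimal sketch with made-up 3-dimensional vectors, not how Mycelia implements its search; the page does not describe its internals. Large-scale libraries such as Faiss also offer approximate indexes that avoid comparing the query against every stored vector. The functions `euclidean` and `nearest` are illustrative names.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: Sequence[float], vectors: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Exhaustive k-NN: measure the distance to every stored vector and keep the k closest."""
    scored = [(i, euclidean(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Made-up 3-D descriptors; real image or text embeddings have hundreds or thousands of dimensions.
collection = [
    [0.90, 0.10, 0.00],  # 0: city hall, front view
    [0.80, 0.20, 0.10],  # 1: city hall, side view
    [0.00, 0.90, 0.40],  # 2: a park
    [0.10, 0.10, 0.90],  # 3: a river
]
query = [0.88, 0.12, 0.02]  # a new photo of the city hall
print([i for i, _ in nearest(query, collection, k=2)])  # [0, 1]
```

The two city-hall vectors come back first even though nothing about names or keys was queried; only geometric closeness was used.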
*Text excerpted from an excellent blog post by Facebook AI.
Mycelia API - Table Data
Insert Table Data
Example JSON Format:
[
{
"index":0,
"name":"Product Name 0",
"category":"Category"
},
{
"index":1,
"name":"Product Name 1",
"category":"Category"
}
]
POST https://mycelia.azure-api.net/clone/table/setup/data/YOUR_DB_NAME
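A minimal Python sketch of the insert call, using only the standard library. It builds the request without sending it. Assumptions: the body is the JSON array shown above sent with a `Content-Type: application/json` header, and authentication (not described on this page) is left out. `build_table_insert`, `BASE_URL` and the database name `my_products` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the endpoints listed on this page.
BASE_URL = "https://mycelia.azure-api.net/clone"


def build_table_insert(db_name: str, rows: list) -> urllib.request.Request:
    """Build (but do not send) the POST request that uploads table rows as a JSON array.

    Hypothetical helper; any authentication header the service needs is not covered.
    """
    for row in rows:
        # Assumption: every row carries an "index", as in the example payload.
        if "index" not in row:
            raise ValueError("each row needs an 'index' field")
    url = f"{BASE_URL}/table/setup/data/{urllib.parse.quote(db_name, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


rows = [
    {"index": 0, "name": "Product Name 0", "category": "Category"},
    {"index": 1, "name": "Product Name 1", "category": "Category"},
]
req = build_table_insert("my_products", rows)  # "my_products" is a placeholder name
# Sending it would be: urllib.request.urlopen(req)
```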
Train/Apply a Model
POST https://mycelia.azure-api.net/clone/table/setup/unsupervised/YOUR_DB_NAME
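A matching sketch for the training call. The page lists no request body for this endpoint, so none is sent; if the service expects parameters, they would have to be added. Judging only by the order of the steps here, this call presumably runs after the rows are inserted and before any similarity query. `build_train_request` and `my_products` are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://mycelia.azure-api.net/clone"


def build_train_request(db_name: str) -> urllib.request.Request:
    """Build (not send) the POST request that trains/applies the unsupervised model on a table DB.

    Hypothetical helper; no body is attached because none is documented for this endpoint.
    """
    url = f"{BASE_URL}/table/setup/unsupervised/{urllib.parse.quote(db_name, safe='')}"
    return urllib.request.Request(url, method="POST")


req = build_train_request("my_products")
# Sending it would be: urllib.request.urlopen(req)
```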
Perform Ultra-Fast Similarity Search
GET https://mycelia.azure-api.net/clone/similar/id/YOUR_DB_NAME?index=INDEX_TO_BE_QUERIED
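A sketch of building the similarity query, with the `index` value URL-encoded as a query parameter. The table, image and text sections all list the same `/similar/id/` path, so the same builder should serve all three database types. The response format is not documented on this page, so the sketch stops at building the request. `build_similarity_query` and `my_products` are hypothetical.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://mycelia.azure-api.net/clone"


def build_similarity_query(db_name: str, index: int) -> urllib.request.Request:
    """Build (not send) the GET request for items similar to the one stored under `index`."""
    db = urllib.parse.quote(db_name, safe="")
    query = urllib.parse.urlencode({"index": index})
    return urllib.request.Request(f"{BASE_URL}/similar/id/{db}?{query}", method="GET")


req = build_similarity_query("my_products", 0)
print(req.full_url)  # https://mycelia.azure-api.net/clone/similar/id/my_products?index=0
```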
Mycelia API - Image Data
Insert Image Data
Example JSON Format:
[
{
"index":0,
"image_base64":"/9j/4ROFRXhpZgAASUkqAAgAAA ..."
},
{
"index":1,
"image_base64":"/9j/4ROFRXhpZgAASUkqAAgAAA ..."
}
]
POST https://mycelia.azure-api.net/clone/image/YOUR_IMG_DB_NAME
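A sketch of preparing the image payload: each file's raw bytes are base64-encoded into the `image_base64` field. The `/9j/` prefix in the example values is what base64 encoding of a JPEG's leading bytes (`FF D8 FF`) produces. Assumptions: each image's position in the list becomes its `index`, and the body is sent as `application/json`. `build_image_insert`, `image_record`, `my_images` and the file names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "https://mycelia.azure-api.net/clone"


def image_record(index: int, image_bytes: bytes) -> dict:
    """One upload record: the raw file bytes as a base64 ASCII string, keyed as in the example."""
    return {"index": index, "image_base64": base64.b64encode(image_bytes).decode("ascii")}


def build_image_insert(db_name: str, images: list) -> urllib.request.Request:
    """Build (not send) the POST request uploading images; list position becomes the index."""
    records = [image_record(i, data) for i, data in enumerate(images)]
    return urllib.request.Request(
        f"{BASE_URL}/image/{urllib.parse.quote(db_name, safe='')}",
        data=json.dumps(records).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


# Usage with hypothetical files:
#   from pathlib import Path
#   files = ["city_hall_front.jpg", "city_hall_side.jpg"]
#   req = build_image_insert("my_images", [Path(p).read_bytes() for p in files])
#   urllib.request.urlopen(req)
```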
Perform Ultra-Fast Similarity Search
GET https://mycelia.azure-api.net/clone/similar/id/YOUR_IMG_DB_NAME?index=IMG_INDEX_TO_BE_QUERIED
Mycelia API - Text Data
Insert Text Data
Example JSON Format:
[
{
"index":0,
"text":"awesome product review text..."
},
{
"index":1,
"text":"not so awesome but nevertheless still a review..."
}
]
POST https://mycelia.azure-api.net/clone/text/pretrained/YOUR_TEXT_DB_NAME
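A sketch of the text upload. The `pretrained` segment in the path suggests that a pretrained text model is applied on upload, which would explain why no separate training step is listed for text; that reading is an inference, not documented. As in the other sketches, list position becomes the `index` and the JSON content type is assumed. `build_text_insert` and `my_reviews` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://mycelia.azure-api.net/clone"


def build_text_insert(db_name: str, texts: list) -> urllib.request.Request:
    """Build (not send) the POST request uploading documents; list position becomes the index."""
    records = [{"index": i, "text": t} for i, t in enumerate(texts)]
    return urllib.request.Request(
        f"{BASE_URL}/text/pretrained/{urllib.parse.quote(db_name, safe='')}",
        # ensure_ascii=False keeps non-English text as-is; the body is encoded as UTF-8.
        data=json.dumps(records, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


reviews = [
    "awesome product review text...",
    "not so awesome but nevertheless still a review...",
]
req = build_text_insert("my_reviews", reviews)
# Sending it would be: urllib.request.urlopen(req)
```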
Perform Ultra-Fast Similarity Search
GET https://mycelia.azure-api.net/clone/similar/id/YOUR_TEXT_DB_NAME?index=TEXT_INDEX_TO_BE_QUERIED