Image embeddings capture visual features (shapes, colors, objects, textures).
Image embedding models are typically CNN- or vision-transformer-based.
Image Embedding Generation:
- Instead of embedding text, we embed images using a pre-trained vision model
- Popular models include Vision Transformers (ViT), ResNet, EfficientNet, and CLIP (a CLIP-based sketch follows this list)
- The model extracts features from images and converts them to dense vector representations
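Since CLIP appears in the list above as an alternative to ViT, here is a minimal sketch of the same embedding step using CLIP's image encoder (the checkpoint name openai/clip-vit-base-patch32 is an assumption; any CLIP checkpoint from the Hugging Face Hub works the same way):

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed checkpoint; swap in any other CLIP model if preferred
clip_name = "openai/clip-vit-base-patch32"
clip_processor = CLIPProcessor.from_pretrained(clip_name)
clip_model = CLIPModel.from_pretrained(clip_name)

def get_clip_image_embedding(image_path):
    # Returns CLIP's pooled image embedding as a 1-D numpy array (512 dims for this checkpoint)
    image = Image.open(image_path).convert("RGB")
    inputs = clip_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        features = clip_model.get_image_features(**inputs)
    return features.squeeze(0).numpy()

Because CLIP embeds images and text into the same vector space, the same embeddings could later also be compared against text queries.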
Core Similarities with Text Search:
- Both methods convert unstructured data (text/images) into vector representations
- Both use similarity metrics (typically cosine similarity; a quick sketch follows this list) to find the closest matches
- Both can be stored in vector databases like AstraDB for efficient retrieval
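To make the shared similarity metric concrete, cosine similarity is simply the dot product of two vectors divided by the product of their norms; a minimal numpy sketch with made-up toy vectors:

import numpy as np

def cosine_sim(a, b):
    # cos(a, b) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for two embeddings (text or image)
v1 = np.array([0.2, 0.7, 0.1])
v2 = np.array([0.1, 0.8, 0.05])
print(cosine_sim(v1, v2))  # close to 1.0 means very similar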
Applications:
- Product recommendations (visually similar products)
- Reverse image search
- Finding duplicate or near-duplicate images (a thresholding sketch follows this list)
- Content moderation (finding similar inappropriate content)
- Medical image analysis (finding similar cases)
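As an illustration of the duplicate-detection use case, near-duplicate checking can be reduced to thresholding the similarity score; a minimal sketch, assuming embeddings come from a function like get_image_embedding defined below and using a made-up threshold of 0.95:

from sklearn.metrics.pairwise import cosine_similarity

def is_near_duplicate(emb_a, emb_b, threshold=0.95):
    # threshold is a hypothetical value; tune it on a validation set of known duplicates
    score = cosine_similarity(emb_a.reshape(1, -1), emb_b.reshape(1, -1))[0, 0]
    return score >= threshold

The complete end-to-end example with a ViT encoder follows.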
import os
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModel
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Initialize image embedding model - using ViT (Vision Transformer)
model_name = "google/vit-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).to(device)
# Function to get image embedding
def get_image_embedding(image_path):
    image = Image.open(image_path).convert("RGB")  # convert to RGB so grayscale/RGBA files also work
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    # Use the CLS token as the image embedding
    embedding = outputs.last_hidden_state[:, 0, :].cpu().numpy().flatten()
    return embedding
# List of image paths
image_paths = [
    "Indigo Nation Men Plain Orange Shirts.jpg",
    "ADIDAS Men Navy Blue Shirts.jpg",
    "Indigo Nation Men Price catch Blue Shirts .jpg",
    "Puma Men's Foundation Grey Polo T-shirt.jpg",
    "Indigo Nation Men Bling Pink Shirts.jpg"
]
# Query image
query_image = "Indigo Nation Men Plain Orange Shirts.jpg"
# Get embedding for the query image
query_embedding = get_image_embedding(query_image)
# Store all embeddings and their corresponding images
all_embeddings = []
for image_path in image_paths:
    embedding = get_image_embedding(image_path)
    all_embeddings.append(embedding)
# Convert to numpy arrays for similarity calculation
query_embedding_np = np.array(query_embedding).reshape(1, -1)
all_embeddings_np = np.array(all_embeddings)
# Calculate cosine similarity between query and all images
similarities = cosine_similarity(query_embedding_np, all_embeddings_np).flatten()
# Create a DataFrame to display results
results = pd.DataFrame({
    'Image': image_paths,
    'Similarity Score': similarities
})
# Sort by similarity score in descending order
results = results.sort_values('Similarity Score', ascending=False)
print(f"Query image: {query_image}")
print("\nSimilarity Search Results:")
print(results)
# Find the most similar image (note: the query itself is in image_paths, so it will rank first with a score of ~1.0)
most_similar_idx = np.argmax(similarities)
print(f"\nMost similar image: \"{image_paths[most_similar_idx]}\" with similarity score: {similarities[most_similar_idx]:.4f}")