Semantic Search vs Keyword Search for Ecommerce | Scouty
When to use semantic search, when keyword search wins, and how hybrid retrieval helps catalog-heavy stores match shopper intent.
A lot of ecommerce teams hear “semantic search” and assume keyword search is obsolete. It is not. The two solve different problems, and the most useful retrieval setup for ecommerce is almost always a hybrid that combines them.
This post is a practical comparison: when each one wins, when they fail, and how to think about hybrid retrieval for a real product catalog.
What keyword search is good at
Keyword search (also called lexical search, and usually ranked with BM25) matches terms in the query against terms in the index. It is fast, predictable, and excellent at:
- Exact identifiers: SKUs, model numbers, ISBNs, part numbers.
- Brand and category names: “Levi’s 501,” “GORE-TEX,” “stainless steel.”
- Attribute filters: size, color, material, price.
It is also operationally simple. Synonym rules, typo tolerance, and merchandising controls behave in ways merchandisers can predict.
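To make the ranking side concrete, here is a minimal BM25 scorer in Python. It is a sketch, not how any particular engine (or Scouty) implements it. The simple lowercase tokenizer, the `bm25_scores` name, and the common defaults `k1=1.2` and `b=0.75` are assumptions. Production engines add stemming, typo tolerance, and field weighting on top.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase word tokens; hyphenated terms like "gore-tex" stay whole.
    return re.findall(r"[\w-]+", text.lower())

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms (e.g. a SKU) get a high inverse document frequency.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

catalog = ["GORE-TEX hiking boot", "moisture-wicking sock", "ergonomic mesh task chair"]
print(bm25_scores("gore-tex boot", catalog))
```

With this three-product catalog, only the boot scores above zero for "gore-tex boot", while "something for sweaty feet" scores every product at zero because no query term appears in any title.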
Where keyword search struggles
Keyword search struggles when shoppers describe what they want without using your catalog’s vocabulary:
- “Something for sweaty feet” → does not match “moisture-wicking sock.”
- “A boot for wet trails” → does not match “GORE-TEX hiking boot.”
- “Office chair that won’t kill my back” → does not match “ergonomic mesh task chair.”
You can build synonym rules forever, and you’ll still miss most of the long tail.
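A toy sketch of why synonym rules do not scale: the hypothetical `SYNONYMS` map below rescues the exact phrase someone wrote a rule for, but a slightly different wording of the same need gets nothing. The rule table, the function names, and the token-overlap matching are illustrative assumptions, not a real engine's behavior.

```python
import re

# Hypothetical hand-written rules: shopper phrase -> catalog vocabulary.
SYNONYMS = {
    "sweaty feet": "moisture-wicking sock",
    "wet trails": "gore-tex hiking boot",
}

def tokens(text):
    # Lowercase word tokens; hyphenated terms like "gore-tex" stay whole.
    return set(re.findall(r"[\w-]+", text.lower()))

def keyword_match(query, title, rules=SYNONYMS):
    """True if the query, or any rule it triggers, shares a token with the title."""
    q = query.lower()
    expansions = [target for phrase, target in rules.items() if phrase in q]
    title_tokens = tokens(title)
    return any(tokens(text) & title_tokens for text in [q, *expansions])

print(keyword_match("something for sweaty feet", "Moisture-wicking sock"))  # True
print(keyword_match("socks for feet that sweat", "Moisture-wicking sock"))  # False
```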
What semantic search adds
Semantic search uses embeddings (vector representations of text) to retrieve products by meaning rather than surface form. Two phrases that mean the same thing sit near each other in vector space, even if they share no words.
Semantic search is good at:
- Use-case queries: “for cooking on a boat,” “for sensitive skin.”
- Natural-language phrases: “running shoes that don’t hurt my heels.”
- Long-tail synonyms you would never write rules for.
It is less good at exact identifier lookup, where surface-form precision matters. A semantic model might rank a similar SKU above the exact one a shopper typed.
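Under the hood, semantic retrieval ranks products by vector similarity, commonly cosine similarity. The sketch below uses made-up three-dimensional vectors in place of real embeddings: real models produce hundreds of dimensions, and the query vector would come from the same model as the product vectors. The numbers and the `semantic_search` helper are hypothetical.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up embeddings for illustration only; not output from any real model.
PRODUCT_VECTORS = {
    "moisture-wicking sock": [0.90, 0.10, 0.00],
    "GORE-TEX hiking boot": [0.20, 0.90, 0.10],
    "ergonomic mesh task chair": [0.00, 0.10, 0.95],
}

def semantic_search(query_vec, index, k=2):
    """Return the k product titles whose vectors are most similar to query_vec."""
    ranked = sorted(index, key=lambda title: cosine(query_vec, index[title]), reverse=True)
    return ranked[:k]

# Pretend embedding of "something for sweaty feet".
query_vec = [0.85, 0.20, 0.05]
print(semantic_search(query_vec, PRODUCT_VECTORS, k=1))  # ['moisture-wicking sock']
```

Production systems replace this linear scan with an approximate nearest-neighbor index, which is where the vector storage costs come from.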
When semantic search wins outright
A few patterns where semantic retrieval clearly outperforms keyword:
- Catalog-heavy stores with rich product copy where shoppers ask in everyday language.
- Beauty and skincare (“hydrating,” “for redness,” “fragrance-free”).
- Home, fashion, and lifestyle (“cottagecore vibe,” “boho-style rug”).
- B2B catalogs with technical jargon, where buyers describe a use case but rely on you to map it to specs.
When keyword search wins outright
- SKU, ISBN, model number, or part number lookups.
- Auto parts where the buyer types “BMW E92 brake pad” and means a precise compatibility filter.
- Strict attribute searches like “size 10” or “blue.”
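One lightweight way to act on this is to detect identifier-looking queries and favor keyword results for them. The regex heuristic below (a token that mixes letters and digits, such as "E92", or a long digit-and-hyphen run, such as an ISBN) and the `route` function are hypothetical; a real catalog would tune the pattern to its own SKU formats.

```python
import re

# Hypothetical heuristic for identifier-like tokens:
#   - mixes letters and digits, 3+ chars (e.g. "E92", "TX-4410"), or
#   - a long run of digits and hyphens (e.g. an ISBN like "978-0-13-468599-1").
ID_PATTERN = re.compile(
    r"^(?=.*\d)(?=.*[a-z])[a-z0-9-]{3,}$|^\d[\d-]{5,}$",
    re.IGNORECASE,
)

def looks_like_identifier(token):
    return ID_PATTERN.match(token) is not None

def route(query):
    """Return "keyword" for identifier lookups, otherwise "hybrid"."""
    if any(looks_like_identifier(t) for t in query.split()):
        return "keyword"
    return "hybrid"

print(route("BMW E92 brake pad"))                       # keyword
print(route("running shoes that don't hurt my heels"))  # hybrid
```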
Hybrid retrieval is the real answer
In production, the best ecommerce search experiences run keyword and semantic retrieval side by side and merge the results. A simple recipe:
- Run a fast keyword search for the query.
- Run a semantic search for the same query in parallel.
- Merge the two result lists with a fusion step that respects exact identifier matches and product attributes, while pulling in semantically relevant items that keyword search would have missed.
- Apply merchandising rules on top.
This gives you the precision of keyword search and the recall of semantic search.
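One common, model-free way to do the merge step is reciprocal rank fusion (RRF), which scores each product by summing `1 / (k + rank)` across the result lists; `k = 60` is a widely used default. The sketch below adds a hypothetical `exact_ids` parameter that pins exact identifier matches to the top. The product IDs are made up, and attribute filters and merchandising rules are left out.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Sum 1 / (k + rank) for each product across ranked result lists."""
    scores = {}
    for results in result_lists:
        for rank, product_id in enumerate(results, start=1):
            scores[product_id] = scores.get(product_id, 0.0) + 1.0 / (k + rank)
    return scores

def hybrid_merge(keyword_ids, semantic_ids, exact_ids=frozenset(), k=60):
    """Fuse keyword and semantic results; exact identifier matches go first."""
    scores = reciprocal_rank_fusion([keyword_ids, semantic_ids], k=k)
    # Sort key: exact matches first (False sorts before True), then fused score.
    return sorted(scores, key=lambda pid: (pid not in exact_ids, -scores[pid]))

keyword = ["sku-123", "boot-a", "boot-b"]
semantic = ["boot-c", "boot-a", "sku-124"]
print(hybrid_merge(keyword, semantic))
# ['boot-a', 'sku-123', 'boot-c', 'boot-b', 'sku-124']
print(hybrid_merge(keyword, semantic, exact_ids={"sku-123"}))
# ['sku-123', 'boot-a', 'boot-c', 'boot-b', 'sku-124']
```

Because RRF uses only ranks, it sidesteps the fact that BM25 scores and cosine similarities live on different scales. "boot-a" rises to the top without an exact match because both retrievers found it.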
What about reranking?
Rerankers are a separate model layer that takes the merged candidates and re-sorts them by query-document relevance. They are useful when:
- Your catalog has many near-duplicates (variants, colors).
- You want to enforce business rules (in-stock priority, margin priority) on a high-recall set.
Rerankers add cost and latency, so most stores should start without one and only add reranking when they see clear cases where merged results are noisy.
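The business-rule half of a reranking layer can be sketched without any model. Below, `relevance` is assumed to come from an upstream relevance model, and the `Candidate` fields, the 0.5 out-of-stock penalty, and the one-variant-per-product rule are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    product_id: str   # parent product; color/size variants share it
    variant_id: str
    relevance: float  # assumed output of an upstream relevance model
    in_stock: bool

def rerank(candidates, out_of_stock_penalty=0.5):
    """Apply an in-stock rule, then keep the best variant per parent product."""
    def score(c):
        return c.relevance * (1.0 if c.in_stock else out_of_stock_penalty)

    best = {}
    for c in candidates:
        current = best.get(c.product_id)
        if current is None or score(c) > score(current):
            best[c.product_id] = c
    return sorted(best.values(), key=score, reverse=True)

results = rerank([
    Candidate("boot", "boot-black", 0.90, in_stock=False),
    Candidate("boot", "boot-brown", 0.85, in_stock=True),
    Candidate("sock", "sock-grey", 0.60, in_stock=True),
])
print([c.variant_id for c in results])  # ['boot-brown', 'sock-grey']
```

Applying the stock penalty before collapsing variants means an in-stock color can stand in for a slightly more relevant but unavailable one.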
Cost and infrastructure
Semantic search costs more than keyword search, in two places:
- Embedding generation. Every product, document, and query has to be embedded.
- Vector storage and retrieval. Vector indexes have memory and CPU costs that scale with dimensionality and corpus size.
This is why Scouty meters semantic search by semantic queries and vector objects. Those are the actual cost drivers. Storage-heavy customers should plan for higher tiers.
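A back-of-the-envelope estimate shows how the storage side scales. Raw vector storage is roughly vectors × dimensions × bytes per value. The sketch assumes float32 values (4 bytes) and ignores index overhead such as graph links, which adds more on top; the 768-dimension figure is an example, not a Scouty number.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw embedding storage in bytes, before any index overhead.

    bytes_per_value=4 assumes float32 embeddings.
    """
    return num_vectors * dimensions * bytes_per_value

# 100,000 products with one 768-dimension float32 vector each:
print(raw_vector_bytes(100_000, 768))  # 307200000 (about 307 MB)
```

Doubling either the number of vector objects or the embedding dimensionality doubles that figure.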
How to roll this out
A pragmatic rollout:
- Start with great keyword search. Typo tolerance, synonyms, attribute indexing, merchandising rules.
- Add semantic search for natural-language and use-case queries, and watch the zero-result rate drop.
- Use hybrid retrieval in production once you have good baselines.
- Add a reranker only when you see clear ranking issues hybrid retrieval doesn’t solve.
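Zero-result rate is simple to track from a search log: it is the share of queries that returned nothing. The sketch assumes a hypothetical log already reduced to one result count per query, and the example numbers are made up.

```python
def zero_result_rate(result_counts):
    """Fraction of logged queries that returned zero results."""
    if not result_counts:
        return 0.0
    zero_hits = sum(1 for n in result_counts if n == 0)
    return zero_hits / len(result_counts)

# Made-up result counts for the same four queries, before and after adding semantic search.
before = [0, 12, 0, 5]
after = [3, 12, 0, 5]
print(zero_result_rate(before), zero_result_rate(after))  # 0.5 0.25
```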
How Scouty handles this
Scouty includes a semantic search starter allowance in the Growth plan and scales semantic queries and vector objects through a separate add-on. Most stores get meaningful zero-result recovery from baseline product search plus semantic search, without needing a full rerank pipeline.
If you’d like an expert review of which retrieval setup fits your catalog, request a free expert-led Search Audit.