When I started building Queryra, the idea was simple: replace WooCommerce's broken LIKE query with something that actually understands what customers are searching for.
The reality was more complex. Building a semantic search engine that's fast enough for real-time e-commerce, accurate enough to beat keyword search, and affordable enough to offer at $9.99/month required solving a chain of engineering problems.
This article is the technical story of building Queryra — the architecture decisions, the trade-offs, and the lessons learned going from a Python prototype to a production service handling search for WooCommerce stores.
Why Not Just Use ChatGPT?
The obvious first approach: send each search query to the OpenAI API, include the product catalog as context, and let GPT find relevant matches.
I built this in a weekend. It worked. And it was completely impractical.
Speed: GPT-3.5 took 2-4 seconds per query. GPT-4 took 5-8 seconds. E-commerce search needs sub-500ms responses. Customers don't wait.
Cost: Every search is an API call. With product context included, each query cost $0.01-0.05. A store with 500 searches/day would pay $5-25 per day, or $150-750/month, in OpenAI fees alone, which makes a $9.99/month product impossible.
Reliability: Rate limits, API outages, and usage caps meant search could fail during peak traffic — exactly when it matters most.
Privacy: Sending every store's product catalog to OpenAI on every search raised GDPR concerns and customer trust issues.
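To make the cost point concrete, here is a minimal sketch of the arithmetic behind the monthly estimate. The function name and the 30-day month are assumptions made for illustration; the per-query prices and search volume are the figures quoted above.

```python
def monthly_api_cost(searches_per_day, cost_per_query, days_per_month=30):
    """Estimate monthly LLM API spend for one store (assumes a 30-day month)."""
    return searches_per_day * cost_per_query * days_per_month


PLAN_PRICE = 9.99  # the monthly subscription price mentioned in the article

# Figures from the article: 500 searches/day at $0.01-0.05 per query.
low = monthly_api_cost(500, 0.01)
high = monthly_api_cost(500, 0.05)

print(f"Estimated API cost: ${low:.0f}-${high:.0f} per month")
print(f"That is {low / PLAN_PRICE:.0f}x to {high / PLAN_PRICE:.0f}x the plan price")
```

Even at the low end of the range, a single moderately busy store would cost many times more to serve than it pays.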
The alternative: build a custom search pipeline using embeddings and vector similarity. More engineering work upfront, but faster, cheaper, and under our control.
Understanding Embeddings
The core of semantic search is vector embeddings — mathematical representations of text that capture meaning.
An embedding model reads text and outputs a vector (an array of numbers, typically 384-1536 dimensions). Texts with similar meanings produce vectors that are close together in this high-dimensional space.
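As a rough illustration of what "close together" means, the sketch below compares toy vectors using cosine similarity, one common closeness measure. The 4-dimensional vectors are made-up values, not output from a real embedding model, and the choice of cosine similarity is an assumption for this example rather than a statement of what Queryra uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: real models output hundreds of dimensions,
# and these numbers were chosen by hand purely for illustration.
toy_embeddings = {
    "warm winter jacket": [0.9, 0.8, 0.1, 0.0],
    "insulated coat for cold weather": [0.8, 0.9, 0.2, 0.1],
    "stainless steel frying pan": [0.1, 0.0, 0.9, 0.8],
}

query = toy_embeddings["warm winter jacket"]
sim_coat = cosine_similarity(query, toy_embeddings["insulated coat for cold weather"])
sim_pan = cosine_similarity(query, toy_embeddings["stainless steel frying pan"])

print(f"jacket vs coat: {sim_coat:.2f}")
print(f"jacket vs pan:  {sim_pan:.2f}")
```

The jacket and coat descriptions share no keywords, yet their vectors point in nearly the same direction, while the frying pan scores much lower. A LIKE query cannot see that kind of relationship.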
