# Custom Taxonomies in AI Search for WordPress: Book Genres, Materials, Property Types

A bookstore wants visitors searching *"1920s American novel"* to find The Great Gatsby. A real estate site wants *"two-bedroom condo near downtown"* to surface the right listings. A recipe site wants *"vegan Italian dinner under 30 minutes"* to land on the right pages. All three need search that understands data living outside the post body — in custom taxonomies like book_genre, book_era, property_type, location, cuisine, dietary, course.

WordPress's default search doesn't read taxonomies. Most third-party search plugins don't either. The current Queryra plugin does — automatically, with no setup. This post explains what changed, what it means for stores with rich data models, and how it works under the hood.

## What are custom taxonomies, and why default WordPress search ignores them

WordPress lets you register any number of custom taxonomies: categories of structured data attached to posts, products, or any custom post type. Familiar examples are category (built into WordPress for posts) and product_cat (registered by WooCommerce). Custom examples that sites add for their own data models:

- A bookstore registers book_genre, book_author, book_publisher, book_series, book_era
- A real estate plugin registers property_type, location, amenities, building_year
- A recipe plugin registers cuisine, dietary, course, difficulty, cooking_method
- A music streaming site registers artist, album, genre, mood, instrument
- An events plugin registers venue, event_category, organizer, event_type

These taxonomies are real data in the WordPress database (wp_terms, wp_term_taxonomy, and wp_term_relationships), but default WordPress search ignores them. It splits the query into words and runs a SQL LIKE '%word%' match for each one against post_title, post_excerpt, and post_content only. A book tagged book_genre: fiction won't appear when a visitor searches *"fiction novel"* unless the words "fiction" and "novel" both literally appear in the book's title, excerpt, or description.
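
To make the blind spot concrete, here is a simplified model of that matching in plain PHP. It is a sketch, not WordPress core code: the helper name default_search_matches, the array shape, and the sample description are all made up for illustration, and it assumes every search word must appear as a substring of the title, excerpt, or content.

```php
<?php
// Simplified model of default WordPress search (hypothetical helper, not core code).
// Every search word must appear as a substring of the title, excerpt, or content;
// taxonomy terms are never consulted.
function default_search_matches(array $post, string $query): bool
{
    $haystack = strtolower(
        $post['post_title'] . ' ' . $post['post_excerpt'] . ' ' . $post['post_content']
    );
    $words = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (strpos($haystack, $word) === false) {
            return false; // one missing word is enough to drop the post
        }
    }
    return true;
}

$gatsby = [
    'post_title'   => 'The Great Gatsby',
    'post_excerpt' => '',
    'post_content' => 'Jay Gatsby throws lavish parties on Long Island.', // sample text
    // Assigned terms live in separate tables and are invisible to the matcher.
    'terms'        => ['book_genre' => ['fiction']],
];

var_dump(default_search_matches($gatsby, 'gatsby'));        // bool(true)
var_dump(default_search_matches($gatsby, 'fiction novel')); // bool(false)
```

The second call fails even though the book is tagged fiction, because the tag never reaches the text being searched.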

This is the same architectural blind spot that affects WordPress's relationship with page builders — important data lives in the database, but the search engine doesn't know how to read it.

## What Queryra does with custom taxonomies

The current Queryra plugin auto-detects every public taxonomy registered on your site and sends terms to the AI search index alongside the standard fields. Specifically:

- Auto-detection. The plugin queries WordPress for taxonomies registered with public => true, on every sync. No whitelist to maintain, no settings to toggle. Add a new taxonomy from your theme or another plugin and it's picked up on the next sync.
- Smart exclusions. Taxonomies that are already covered by dedicated fields are skipped automatically: category, post_tag, product_cat, product_tag, and the brand taxonomies product_brand, yith_product_brand, and pwb-brand. These are already searchable via the standard categories/tags/brand plumbing. The structural taxonomies post_format, nav_menu, and link_category are skipped as well.
- Structured payload. Remaining taxonomies are sent to the Queryra API as a map of slugs to comma-separated term names, e.g.:

```json
{
  "book_genre": "fiction, classic, american-literature",
  "book_author": "F. Scott Fitzgerald",
  "book_era": "1920s",
  "book_setting": "new york"
}
```

On the backend, each taxonomy contributes two things simultaneously: it's appended to the embedding text (so AI semantic search picks up *"twentieth century novel"* matching a book tagged book_era: 1920s), and it's stored as filterable metadata (so future intent-aware queries like *"vegan Italian dinner"* can pre-filter by cuisine: italian + dietary: vegan before the semantic step).

## Real-world examples by industry

The most useful way to think about this is by use case. A few common patterns:

### Bookstore / library

Taxonomies typical of a Books custom post type or a WooCommerce books category:

- book_genre: fiction, non-fiction, fantasy, romance, mystery
- book_author: F. Scott Fitzgerald, Ursula K. Le Guin
- book_publisher: Penguin Classics, Vintage
- book_series: Earthsea, Wheel of Time
- book_era: 1920s, Victorian, Contemporary
- book_setting: New York, Middle Earth, dystopian future

A search like *"twentieth century American novel about ambition"* now lands on Gatsby — the semantic match catches "twentieth century" against book_era: 1920s (close enough), "American" against book_genre: american-literature, and "ambition" semantically resolves through the book's description plus author context.

### Real estate listings

- property_type: condo, single-family, multi-family, commercial
- location: Park Slope, Williamsburg, downtown Boston
- amenities: pool, parking, gym, doorman
- building_year: pre-war, mid-century, new construction

*"Pre-war doorman building in Park Slope"* touches three taxonomies in one query (building_year, amenities, location), and each part is matched against the terms indexed with the listings.

### Recipes

- cuisine: Italian, Mexican, Japanese
- dietary: vegan, gluten-free, keto
- course: appetizer, main, dessert
- difficulty: easy, intermediate, advanced
- cooking_method: baked, grilled, no-cook

*"Easy gluten-free dessert"* picks up difficulty: easy + dietary: gluten-free + course: dessert simultaneously.

### Music / streaming

- artist: David Bowie, Joni Mitchell
- album: Hunky Dory, Blue
- genre: folk-rock, glam-rock, ambient
- mood: melancholic, energetic, contemplative
- instrument: piano-driven, guitar-driven

*"Melancholic piano-driven folk"* — three custom taxonomies, all part of the same semantic vector.

### Events

- venue: Madison Square Garden, Roundhouse London
- event_category: concert, conference, workshop
- organizer: official festival, independent promoter

### WooCommerce stores

WooCommerce stores using global product attributes (defined in Products → Attributes) get those indexed automatically via WooCommerce's pa_* taxonomies (pa_color, pa_size, pa_material, pa_brand). Any custom taxonomy you register beyond those, such as recommended_age, season, room, or style, joins the same pipeline.

The pattern repeats across nearly every niche store. Once your data is modelled as taxonomies (which is how WordPress encourages you to model structured attributes), AI semantic search can use it without any per-site configuration.

## How it works under the hood

The flow is straightforward and mirrors how categories and brand have always worked — just expanded to N custom taxonomies:

WordPress plugin side. On sync, the plugin calls get_taxonomies(['public' => true]) to discover taxonomies, then wp_get_object_terms($post_id, $taxonomy) to fetch terms for each post. Terms are joined with commas and packaged into a taxonomies field in the API record payload.
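
The assembly step could look roughly like the sketch below. It is not Queryra's actual code: the function and constant names are hypothetical, the exclusion list is copied from this post, and skipping taxonomies with no assigned terms is an assumption. The function is kept free of WordPress calls so it runs standalone; inside WordPress its input would come from get_taxonomies() and wp_get_object_terms().

```php
<?php
// Taxonomies this post says are skipped; the constant name is illustrative.
const EXCLUDED_TAXONOMIES = [
    'category', 'post_tag', 'product_cat', 'product_tag', 'product_brand',
    'yith_product_brand', 'pwb-brand', 'post_format', 'nav_menu', 'link_category',
];

/**
 * Build the "taxonomies" payload map: slug => comma-separated term names.
 *
 * $termsByTaxonomy is what a plugin could collect inside WordPress with
 * get_taxonomies(['public' => true]) and
 * wp_get_object_terms($post_id, $taxonomy, ['fields' => 'names']).
 */
function build_taxonomy_payload(array $termsByTaxonomy): array
{
    $payload = [];
    foreach ($termsByTaxonomy as $slug => $names) {
        if (in_array($slug, EXCLUDED_TAXONOMIES, true) || count($names) === 0) {
            continue; // covered by a dedicated field, or nothing assigned
        }
        $payload[$slug] = implode(', ', $names);
    }
    return $payload;
}

$payload = build_taxonomy_payload([
    'category'    => ['Books'],
    'book_genre'  => ['fiction', 'classic', 'american-literature'],
    'book_author' => ['F. Scott Fitzgerald'],
    'book_series' => [],
]);

echo json_encode($payload, JSON_PRETTY_PRINT), "\n";
```

Here category is dropped by the exclusion list and book_series is dropped because it has no terms, leaving only book_genre and book_author in the payload.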

Backend storage. Queryra's records database has a dedicated taxonomies column (JSONB on Postgres). The map is stored as-is.

Embedding text. During the sync-to-search-index step, each taxonomy is appended as a "Label: terms" line to the document text that goes into the AI embedding — same pattern as the existing "Brand: nike" and "Categories: Books, Fiction" lines. For a Gatsby record, the embedding text gets:

```text
The Great Gatsby. Brand: penguin classics. Categories: Books, Fiction.
Book Genre: classic, american-literature.
Book Author: F. Scott Fitzgerald.
Book Era: 1920s.
Book Setting: new york.
[full description text follows]
```

Result: the AI embedding for this record now "knows" about the era, the author, the setting — even though none of those words might appear in the post body.
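
A minimal sketch of how such "Label: terms" lines could be produced from the taxonomy map. The helper names are hypothetical, and deriving the label by replacing underscores and hyphens with spaces and capitalising each word is an assumption based on the book_era → Book Era pattern shown above.

```php
<?php
// Turn a taxonomy slug into a human-readable label: "book_era" -> "Book Era".
function taxonomy_label(string $slug): string
{
    return ucwords(str_replace(['_', '-'], ' ', $slug));
}

// Render each taxonomy as a "Label: terms." line for the embedding text.
function taxonomy_embedding_lines(array $taxonomies): array
{
    $lines = [];
    foreach ($taxonomies as $slug => $terms) {
        $lines[] = taxonomy_label($slug) . ': ' . $terms . '.';
    }
    return $lines;
}

$lines = taxonomy_embedding_lines([
    'book_author'  => 'F. Scott Fitzgerald',
    'book_era'     => '1920s',
    'book_setting' => 'new york',
]);

echo implode("\n", $lines), "\n";
// Book Author: F. Scott Fitzgerald.
// Book Era: 1920s.
// Book Setting: new york.
```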

Filterable metadata. Each taxonomy is also flattened into ChromaDB metadata with a tax_ prefix: tax_book_era: "1920s", tax_book_setting: "new york", lowercased for case-insensitive matching. This enables future intent-parser-driven filters: *"books set in New York"* can pre-filter to records where tax_book_setting contains "new york" before the semantic search step.
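
The flattening and the "contains" pre-filter can be sketched in plain PHP as follows. This is an in-memory stand-in, not ChromaDB's API or Queryra's backend code: both function names are hypothetical, and strtolower() is used for brevity even though it only lowercases ASCII reliably.

```php
<?php
// Flatten the taxonomy map into tax_-prefixed keys with lowercased values.
function flatten_taxonomy_metadata(array $taxonomies): array
{
    $metadata = [];
    foreach ($taxonomies as $slug => $terms) {
        $metadata['tax_' . $slug] = strtolower($terms);
    }
    return $metadata;
}

// Hypothetical pre-filter: keep records whose metadata field contains the needle.
function prefilter(array $records, string $field, string $needle): array
{
    $needle = strtolower($needle);
    return array_values(array_filter(
        $records,
        function (array $record) use ($field, $needle): bool {
            return isset($record[$field]) && strpos($record[$field], $needle) !== false;
        }
    ));
}

$records = [
    ['title' => 'The Great Gatsby']
        + flatten_taxonomy_metadata(['book_era' => '1920s', 'book_setting' => 'New York']),
    ['title' => 'A Wizard of Earthsea']
        + flatten_taxonomy_metadata(['book_series' => 'Earthsea']),
];

// "books set in New York": narrow the candidates before any semantic ranking.
$matches = prefilter($records, 'tax_book_setting', 'New York');
echo $matches[0]['title'], "\n"; // The Great Gatsby
```

Lowercasing both the stored value and the query term is what makes "New York" and "new york" match.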

Hash-based change detection. When you edit a term assignment (add book_era: 1920s to a post that previously had book_era: contemporary), the plugin's content hash changes — triggering a re-sync of that specific record. No full re-import needed.
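
A sketch of the idea, under stated assumptions: the function name, the choice of SHA-256, and the exact set of hashed fields are illustrative, since the post does not say how the plugin computes its hash.

```php
<?php
// Hash the fields that should trigger a re-sync. Sorting the taxonomy keys first
// keeps the hash stable when only the order of the map changes.
function record_content_hash(string $title, string $content, array $taxonomies): string
{
    ksort($taxonomies);
    return hash('sha256', json_encode([$title, $content, $taxonomies]));
}

$before = record_content_hash('The Great Gatsby', 'Description text', ['book_era' => 'contemporary']);
$after  = record_content_hash('The Great Gatsby', 'Description text', ['book_era' => '1920s']);

var_dump($before === $after); // bool(false): the term change alters the hash
```

Comparing the new hash against the one stored at the last sync is enough to decide whether a single record needs to be sent again.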

## Tuning what gets indexed

Auto-detection works for most sites without setup. For sites that need precise control (internal taxonomies that are registered as public but shouldn't go to search, slug renames for cleaner labels, or restricting to a specific list), Queryra exposes the queryra_indexable_taxonomies filter:

```php
add_filter('queryra_indexable_taxonomies', function ($taxonomies_map, $post) {
    // Only index specific taxonomies on the 'book' post type
    if ($post->post_type === 'book') {
        return array_intersect_key(
            $taxonomies_map,
            array_flip(['book_genre', 'book_author', 'book_era'])
        );
    }
    return $taxonomies_map;
}, 10, 2);
```

The filter receives the auto-detected map and returns whatever you want sent. Common patterns: whitelist a small subset, exclude internal taxonomies, or rename slugs for readability before they hit the embedding (the book_genre → Book Genre conversion happens automatically, but you can override it).
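
The exclusion pattern might look like the sketch below. The taxonomy slug internal_review_status and the myprefix_ function name are made up for illustration; only the filter name and its two arguments come from this post. The callback is a named function so it can be exercised outside WordPress, and it is registered only when add_filter() is available.

```php
<?php
// Callback for the queryra_indexable_taxonomies filter.
// 'internal_review_status' is a made-up slug standing in for a taxonomy
// you want kept out of the search index.
function myprefix_exclude_internal_taxonomies(array $taxonomies_map, $post): array
{
    unset($taxonomies_map['internal_review_status']);
    return $taxonomies_map;
}

// Register only when running inside WordPress, so the file also loads standalone.
if (function_exists('add_filter')) {
    add_filter('queryra_indexable_taxonomies', 'myprefix_exclude_internal_taxonomies', 10, 2);
}
```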

Full documentation of this filter and its companion, queryra_indexable_meta_content, will live in our developer filters guide (coming soon).

## What you don't need to do

Worth being explicit about, because the absence of setup is the feature:

- No whitelist to maintain. Every public custom taxonomy on your site gets indexed automatically. Add a new taxonomy next month and it's picked up on the next sync.
- No editor changes. Authors keep using the standard WordPress UI to assign terms (the metabox in the post editor sidebar). No special editing experience required.
- No theme modifications. Queryra hooks into WordPress at the SQL search layer, so your single-book.php template, your archive pages, and your shop layout are all unaffected.
- No re-import. Once you upgrade Queryra, the next sync (manual or automatic on post save) picks up taxonomies. Existing records get refreshed naturally on edit.
- No data migration. Taxonomies stay in WordPress's standard taxonomy tables (wp_terms, wp_term_taxonomy, wp_term_relationships). Queryra reads them via the WordPress API.

## TL;DR

- Default WordPress search ignores custom taxonomies (book_genre, material, property_type, cuisine, venue, etc.). They don't live in post_content, so SQL LIKE queries miss them.
- Queryra now auto-detects every public custom taxonomy on your site and sends its terms to the AI semantic search index.
- Taxonomies that already have dedicated fields (category, post_tag, product_cat/tag, brand variants) are skipped.
- Each taxonomy contributes to both the embedding (semantic match) and filterable metadata (precise filter), the same pattern categories and brand have always followed.
- Use cases by industry: bookstore (book_genre, book_era), real estate (property_type, amenities), recipes (cuisine, dietary), music (mood, instrument), events (venue), and WooCommerce stores with custom attribute taxonomies.
- Zero setup. Auto-detection on every sync. For precise control, the queryra_indexable_taxonomies developer filter lets you override the auto-detected list.

If your site has rich custom data models, this is the difference between *"the search engine misses half my product attributes"* and *"every meaningful attribute on every record is searchable"* — without anyone touching the editor experience.

Try Queryra free