How to Automatically Categorize E-commerce Products with AI

Manually categorizing a product catalog doesn't scale. Learn how to use AI classification to assign every new product to the right taxonomy node instantly - whether you're running a marketplace, a catalog tool, or a data pipeline.

The catalog problem

Every e-commerce platform eventually faces the same challenge: product data comes in from multiple sources - seller uploads, CSV imports, API feeds, manual entry - and none of it is consistently categorized. A seller lists "Wireless Noise Cancelling Headphones" and puts it under "Electronics." Another seller lists the same product type and drops it under "Audio" or "Accessories" or nothing at all.

The result is a catalog that's inconsistent, hard to browse, and difficult to search. Customers can't find what they're looking for. Merchandising teams can't run reliable category-level reporting. And the problem compounds every time a new seller joins or a new product feed is ingested.

Manual categorization is the typical solution - and it doesn't scale. Hire a team to review products, and that team is still the bottleneck every time volume spikes. AI classification removes most of that bottleneck, leaving people to review only the genuinely ambiguous cases.

What automated categorization looks like

Instead of a person reading a product title and picking a category from a dropdown, every new product listing is sent to a classification API the moment it's submitted. The API returns the best-fit category from your taxonomy, along with a confidence score. Your application saves the category, and the product appears in the right place in your catalog - instantly.

For the vast majority of products, the whole thing is invisible. For edge cases where confidence is low, you route to a human review queue rather than auto-assigning. You get consistent categorization at scale with a small, well-targeted manual review step where it actually matters.

Defining your taxonomy

Start flat. Even if your real taxonomy has hundreds of nodes arranged in a tree, begin by classifying into your top-level categories. A typical top-level set for a general marketplace might look like:

  • Electronics - devices, cables, audio, computing, cameras
  • Clothing & Apparel - tops, bottoms, footwear, accessories
  • Home & Garden - furniture, tools, decor, kitchen
  • Sports & Outdoors - fitness equipment, camping, team sports
  • Beauty & Personal Care - skincare, haircare, fragrance
  • Toys & Games - board games, action figures, puzzles, outdoor play
  • Food & Grocery - pantry, snacks, beverages, fresh produce
  • Books & Media - books, music, movies, software
  • Automotive - parts, accessories, tools, car care
  • Health & Wellness - supplements, medical supplies, fitness

Save this as a named schema in classifaily and reference the schema ID on every call. You define it once; your entire pipeline uses it. The examples below pass the category list inline instead, so that each request is self-contained.

The API call

Given a product like this from a seller's feed:

Title: Foam Camping Sleeping Pad - Ultralight, 72" x 20"
Description: Lightweight closed-cell foam sleeping pad for backpacking
             and camping. Folds compactly for easy packing. R-value 2.0.
             Great for three-season use.

You send the title and description to classifaily:

curl -X POST https://api.classifaily.com/v1/classify \
  -H "Authorization: Bearer cai_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Foam Camping Sleeping Pad - Ultralight, 72\" x 20\"\n\nLightweight closed-cell foam sleeping pad for backpacking and camping. Folds compactly for easy packing. R-value 2.0. Great for three-season use.",
    "categories": ["Electronics", "Clothing & Apparel", "Home & Garden", "Sports & Outdoors", "Beauty & Personal Care", "Toys & Games", "Food & Grocery", "Books & Media", "Automotive", "Health & Wellness"],
    "explain": true
  }'

Response:

{
  "label": "Sports & Outdoors",
  "confidence": 0.97,
  "reasoning": "Product is a camping sleeping pad described for backpacking use. Camping and outdoor gear maps clearly to Sports & Outdoors.",
  "request_id": "req_03kz..."
}

Write Sports & Outdoors to the product record and continue. No human involved.
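
The same request can be made from application code. Below is a minimal Python sketch using only the standard library. The endpoint, headers, and body fields are taken from the curl example above; the function names, the timeout, and the choice to join title and description with a blank line are assumptions for illustration, and error handling (for example, non-2xx responses raising urllib.error.HTTPError) is left out.

```python
import json
import urllib.request

API_URL = "https://api.classifaily.com/v1/classify"  # endpoint from the curl example

TOP_LEVEL_CATEGORIES = [
    "Electronics", "Clothing & Apparel", "Home & Garden", "Sports & Outdoors",
    "Beauty & Personal Care", "Toys & Games", "Food & Grocery", "Books & Media",
    "Automotive", "Health & Wellness",
]


def build_payload(title, description, categories=TOP_LEVEL_CATEGORIES, explain=True):
    """Build the request body: title and description joined by a blank line."""
    return {
        "input": f"{title}\n\n{description}",
        "categories": list(categories),
        "explain": explain,
    }


def classify(title, description, api_key, timeout=10):
    """POST one product to the classify endpoint and return the parsed JSON body."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(title, description)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping payload construction separate from the network call makes the request body easy to unit-test without hitting the API.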

Batch processing existing catalogs

If you have an existing catalog that's uncategorized or inconsistently categorized, the /batch endpoint lets you classify multiple products in a single request rather than making one call per product:

curl -X POST https://api.classifaily.com/v1/batch \
  -H "Authorization: Bearer cai_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      "Foam Camping Sleeping Pad - Ultralight, 72\" x 20\"",
      "Women'\''s High-Waist Yoga Leggings - 4-Way Stretch",
      "Stainless Steel French Press Coffee Maker, 34oz",
      "LEGO Star Wars Millennium Falcon Building Set",
      "Vitamin D3 5000 IU Softgels, 365 Count"
    ],
    "categories": ["Electronics", "Clothing & Apparel", "Home & Garden", "Sports & Outdoors", "Beauty & Personal Care", "Toys & Games", "Food & Grocery", "Books & Media", "Automotive", "Health & Wellness"]
  }'

The response returns a label and confidence for each input in order. You can run your entire uncategorized backlog through in a fraction of the time it would take a team to review manually.
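
For a large backlog you will probably want to split the catalog into several batch requests and pair each result back to its product by position. The Python sketch below shows one way to do that. The batch size of 50 is an arbitrary assumption (the article does not state a limit), and classify_batch is a hypothetical callable that wraps the /batch request and returns one result per input, in order.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def classify_backlog(products, classify_batch, batch_size=50):
    """Return {product_id: result}, pairing results to inputs by position.

    `products` is a list of (product_id, text) pairs. `classify_batch` is a
    hypothetical callable that sends a list of texts to /batch and returns
    one result per text, in the same order.
    """
    assigned = {}
    for chunk in chunked(products, batch_size):
        results = classify_batch([text for _, text in chunk])
        # Positional pairing is only safe if the counts match.
        if len(results) != len(chunk):
            raise ValueError("batch returned a different number of results than inputs")
        for (product_id, _), result in zip(chunk, results):
            assigned[product_id] = result
    return assigned
```

The length check matters because the pairing relies entirely on order: a short or padded response would otherwise silently attach labels to the wrong products.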

Drilling into subcategories

Once you have a top-level label, run a second classification call with the subcategory list for that branch. For a product classified as Electronics, your second call might target:

{
  "input": "...",
  "categories": ["Audio", "Cameras & Photography", "Computers & Laptops", "Mobile Phones", "Smart Home", "TV & Home Theater", "Wearables", "Cables & Accessories"]
}

Two calls, two levels of taxonomy, and the product is fully categorized. You only run the subcategory call for the branch that matched - so you're not making ten extra API calls per product, just one more targeted one.
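
The two-step lookup can be sketched in a few lines of Python. Here classify(text, categories) is a hypothetical wrapper around the classify endpoint that returns a dict with at least a "label" key, and SUBCATEGORIES is an illustrative mapping that you would fill in for each branch of your own taxonomy; only the Electronics list comes from the example above.

```python
SUBCATEGORIES = {
    "Electronics": [
        "Audio", "Cameras & Photography", "Computers & Laptops", "Mobile Phones",
        "Smart Home", "TV & Home Theater", "Wearables", "Cables & Accessories",
    ],
    # Add one entry per top-level category as you define each branch.
}


def classify_two_levels(text, top_level, classify):
    """Return (top_label, sub_label); sub_label is None if the branch has no list."""
    top_label = classify(text, top_level)["label"]
    branch = SUBCATEGORIES.get(top_label)
    if not branch:
        # No subcategories defined for this branch yet: stop after one call.
        return top_label, None
    return top_label, classify(text, branch)["label"]
```

Because the second call only ever sees the subcategories of the matched branch, each call keeps a short category list no matter how large the full tree grows.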

Handling low-confidence responses

Some products are genuinely ambiguous. A "resistance band" could plausibly live under Sports & Outdoors or Health & Wellness. A "USB-C hub" could be Electronics or Automotive depending on context. When the API returns a confidence below your threshold (0.75 is a reasonable starting point), don't auto-assign. Instead, flag the product for human review:

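// Products scoring below this confidence go to human review.
$threshold = 0.75;

$result = classify($product);
$product_id = $product['id']; // assumes the product record carries its ID

if ($result['confidence'] >= $threshold) {
    assign_category($product_id, $result['label']);
} else {
    // Pass the best guess along so reviewers confirm or correct it.
    flag_for_review($product_id, $result['label'], $result['confidence']);
}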

Products in the review queue show the AI's best guess alongside the confidence score, so reviewers aren't starting from scratch - they're confirming or correcting a suggestion. In practice, fewer than 8% of products fall below this threshold and land in the queue. You're automating the other 92%.
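
A threshold is easier to choose when you can see how much work it sends to reviewers. The Python sketch below computes the share of products that would be routed to review at a few candidate thresholds. It assumes you have logged confidence scores for a sample of your own products; the function names are hypothetical and the sample values in the usage note are made up for illustration, not measurements.

```python
def review_rate(confidences, threshold=0.75):
    """Fraction of products whose confidence falls below the threshold."""
    if not confidences:
        return 0.0
    below = sum(1 for c in confidences if c < threshold)
    return below / len(confidences)


def rates_by_threshold(confidences, thresholds=(0.6, 0.75, 0.9)):
    """Map each candidate threshold to the share of products sent to review."""
    return {t: review_rate(confidences, t) for t in thresholds}
```

For example, rates_by_threshold([0.97, 0.5, 0.8, 0.74]) gives 0.25 at a threshold of 0.6, 0.5 at 0.75, and 0.75 at 0.9. Raising the threshold trades more review work for fewer wrong auto-assignments.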

Enriching with seller context

Sellers often include useful metadata alongside the product listing - the category they self-selected, their store type, or previous listings. You can include this as context in the input to improve accuracy:

{
  "input": "Store type: outdoor gear retailer\nSeller-selected category: equipment\n\nProduct: Foam Camping Sleeping Pad...",
  "categories": [...]
}

The classifier treats everything in the input field as context. Feeding seller signals - even noisy ones - can meaningfully lift confidence scores on edge cases, because it gives the model more signal to work with beyond just the product title.
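
Since seller metadata is often missing, it helps to build the input string in one place and skip any field that is empty. A small Python sketch follows; the field labels mirror the JSON example above, while the function name and parameters are hypothetical.

```python
def build_input(product_text, store_type=None, seller_category=None):
    """Prefix optional seller signals to the product text, skipping missing ones."""
    context = []
    if store_type:
        context.append(f"Store type: {store_type}")
    if seller_category:
        context.append(f"Seller-selected category: {seller_category}")
    body = f"Product: {product_text}"
    if not context:
        return body
    # Context lines first, then a blank line, then the product itself.
    return "\n".join(context) + "\n\n" + body
```

Omitting empty fields avoids sending lines such as "Store type: None", which would add noise rather than signal.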

Going further

Product categorization is the foundation, but the same classification layer can drive additional catalog enrichment. Run a second call to classify condition (new, refurbished, used). Run a third to detect listing policy violations before a product goes live. Run a fourth to identify whether the listing is a duplicate or variant of an existing product. Each classification call is independent, fast, and composable - you're building a pipeline, not a single lookup.
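
One way to keep these calls composable is to describe each task as a name plus a category list and run them through the same function. The Python sketch below assumes a hypothetical classify(text, categories) wrapper that returns a dict with a "label" key. The condition labels come from the paragraph above; the policy labels are placeholders you would replace with your own rules.

```python
TASKS = {
    "condition": ["new", "refurbished", "used"],
    # Hypothetical label set: define policy labels to match your own rules.
    "policy": ["allowed", "policy violation"],
}


def enrich(text, tasks, classify):
    """Run each named task as its own classification call and collect the labels."""
    return {
        name: classify(text, categories)["label"]
        for name, categories in tasks.items()
    }
```

Adding a new enrichment step then means adding one entry to TASKS rather than writing new request code.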

For marketplaces and catalog tools with high ingest volume, the ROI is immediate: a categorization team that was handling 500 products per day gets replaced by an API call that handles 500 products per minute, with better consistency than any manual process.

Start classifying your product catalog today.

Free plan. No credit card. 100 requests per month to get started.
