Tackling E-commerce Catalog Complexity with Generative AI – AWS Nova

Messy product catalogs frustrate customers, hurt search performance, and quietly erode revenue. Discover how Factspan uses AWS-native Generative AI to enrich attributes, align taxonomy, and generate brand-safe content—turning catalog chaos into a scalable, discovery-first asset.

Why this blog?

Behind every “no results found” or irrelevant product suggestion lies a deeper catalog problem. This article reveals how an AI-first, governance-driven approach can solve structural data issues before they reach the customer. You’ll learn why fixing catalog integrity is just as critical to revenue as marketing, merchandising, or pricing strategy.

There’s a quiet kind of chaos at the heart of digital commerce. It doesn’t scream, doesn’t crash, doesn’t throw errors. It lingers in missed searches, misclassified products, clunky filters, and disappointing discovery. This chaos lives in the product catalog.

Most retailers don’t talk about it. Because to admit the catalog’s a mess is to open the door to a long, expensive, exhausting problem. Product titles are inconsistent. Descriptions lack detail or sound like copy-paste jobs. Attributes like color, size, material, and use case are often wrong, missing, or jumbled. Search results surface the wrong products. And the same item might show up under three different names depending on the page or the platform.

The pace keeps accelerating: more products, more channels, more campaigns. But the time to deliver? Shrinking fast. The content team can’t keep up. They’re doing the best they can with spreadsheets, macros, outdated PIM systems, and a library of old product copy that no longer reflects how customers search today.

Here’s what that looks like on the inside:

  • Product data arrives from suppliers in different formats, none of which match internal schema
  • Categories evolve organically, without centralized governance
  • Attributes get lost, misformatted, or overwritten, especially when products are batch-uploaded
  • Marketplace templates and taxonomy don’t align with brand guidelines, leading to duplicate content
  • Search engines, filters, and recommendations degrade because the metadata isn’t consistent
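
To make the failure modes above concrete, here is a minimal sketch of a catalog audit that renames supplier fields to internal names and flags missing or unexpected attribute values. The schema, the alias table, and the sample rows are all hypothetical and stand in for whatever a real catalog defines.

```python
# Hypothetical internal schema: required attributes and, where relevant, allowed values.
INTERNAL_SCHEMA = {
    "title": None,
    "color": {"black", "navy", "white"},
    "size": None,
    "material": None,
}

# Hypothetical mapping from supplier field names to internal attribute names.
FIELD_ALIASES = {
    "colour": "color",
    "product_name": "title",
    "name": "title",
    "fabric": "material",
}


def normalize_record(raw):
    """Rename supplier fields to internal names and trim string values."""
    record = {}
    for key, value in raw.items():
        field = key.strip().lower()
        field = FIELD_ALIASES.get(field, field)
        if isinstance(value, str):
            value = value.strip()
        record[field] = value
    return record


def audit_record(record):
    """Return a list of human-readable issues for one normalized record."""
    issues = []
    for field, allowed in INTERNAL_SCHEMA.items():
        value = record.get(field)
        if value in (None, ""):
            issues.append(f"missing {field}")
        elif allowed is not None and str(value).lower() not in allowed:
            issues.append(f"unexpected {field}: {value!r}")
    return issues


# Two supplier feeds describing the same product in different shapes.
supplier_rows = [
    {"Product_Name": "Trail Runner 2", "Colour": "Black ", "Size": "10", "Fabric": "mesh"},
    {"name": "Trail Runner 2", "color": "midnight", "size": ""},
]

for row in supplier_rows:
    print(audit_record(normalize_record(row)))
```

The first row passes once its field names are mapped. The second is flagged for an off-schema color and for missing size and material, which is the kind of gap that later surfaces as a broken filter.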

For the customer, all of this shows up as friction. They search for “organic baby lotion” and get results for adult sunscreen. They filter for “black running shoes” and see navy. They click on an image only to find the description is clearly for a different variant.

It’s not just a bad UX. It’s lost revenue, plain and simple.

Where does Generative AI actually fit?

Most people think of GenAI as a content writer. And yes, it can generate product descriptions, but that’s barely scratching the surface. The real value comes when GenAI is used as a catalog intelligence engine. One that doesn’t just write, but understands structure. One that isn’t just about creativity, but about alignment.

Instead of treating the catalog as static copy, GenAI helps interpret it as a dynamic data layer. It recognizes attribute gaps, rewrites titles for clarity and SEO, aligns language with tone-of-voice, and suggests taxonomy clean-ups, all while working within brand-safe, audit-ready constraints.

The challenge isn’t generation. It’s governed generation: making sure every output respects the underlying schema, product hierarchy, and commercial logic.
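
One way to picture governed generation is as a gate that every model response must pass before it reaches the catalog. The sketch below is illustrative only: the required fields, the title length limit, and the two taxonomy paths are assumptions, not Factspan's actual rules.

```python
import json

# Hypothetical schema and taxonomy, for illustration only.
ALLOWED_CATEGORIES = {"Footwear > Running Shoes", "Baby > Skin Care > Lotion"}
REQUIRED_FIELDS = {"title": str, "category": str, "attributes": dict}
MAX_TITLE_LENGTH = 80


def validate_generated(raw_output):
    """Check one model response against the schema; return a list of violations."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    errors = []
    # Every required field must be present with the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"wrong type for {field}")
    # Fields outside the schema are rejected rather than silently stored.
    extra = sorted(set(data) - set(REQUIRED_FIELDS))
    if extra:
        errors.append(f"unexpected fields: {', '.join(extra)}")
    if isinstance(data.get("title"), str) and len(data["title"]) > MAX_TITLE_LENGTH:
        errors.append("title too long")
    # The category must be an existing node in the product hierarchy.
    if isinstance(data.get("category"), str) and data["category"] not in ALLOWED_CATEGORIES:
        errors.append(f"category not in taxonomy: {data['category']}")
    return errors


good = ('{"title": "Organic Baby Lotion 200ml", '
        '"category": "Baby > Skin Care > Lotion", "attributes": {"size": "200ml"}}')
bad = '{"title": "Lotion", "category": "Sunscreen", "price": 9.99}'

print(validate_generated(good))  # []
print(validate_generated(bad))
```

Returning a list of violations, rather than a plain pass or fail, keeps each rejection explainable to the content and merchandising teams who review it.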

Building something teams can trust

At Factspan, we knew the technical solution needed to be intelligent but not opinionated, something that integrates into existing pipelines, not replaces them.

We built a lightweight GenAI engine designed to plug into real e-commerce catalog workflows. The engine ingests messy product data, applies prompt scaffolding and schema-aware logic, and returns output that content and merchandising teams can trust. Everything from category classification to attribute enrichment and description generation is governed by a feedback loop.

Designed on AWS-native infrastructure, the solution embeds LLMs and schema-aware prompts directly into catalog pipelines, enhancing enrichment and classification without altering existing systems.
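
As a rough sketch of how prompt scaffolding and a feedback loop can fit together, the example below builds a prompt from a schema, checks the response, and retries with the specific problems fed back into the next prompt. The function names, the prompt wording, and the `call_model` parameter are hypothetical; `call_model` stands in for whichever hosted model the pipeline uses, and a stub replaces it here so the example runs offline.

```python
import json


def build_prompt(product, schema, feedback=None):
    """Assemble a schema-aware prompt; the wording is illustrative."""
    lines = [
        "Fill in the missing product attributes.",
        "Return a JSON object using only these fields and allowed values:",
    ]
    for field, allowed in schema.items():
        options = ", ".join(sorted(allowed)) if allowed else "free text"
        lines.append(f"- {field}: {options}")
    lines.append(f"Product data: {json.dumps(product, sort_keys=True)}")
    if feedback:
        lines.append("Fix these problems from the previous attempt:")
        lines.extend(f"- {item}" for item in feedback)
    return "\n".join(lines)


def check(output, schema):
    """Return schema violations for a raw model response."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return ["response was not valid JSON"]
    if not isinstance(data, dict):
        return ["response was not a JSON object"]
    problems = []
    for field, allowed in schema.items():
        if field not in data:
            problems.append(f"missing {field}")
        elif allowed and (not isinstance(data[field], str) or data[field] not in allowed):
            problems.append(f"{field} must be one of: {', '.join(sorted(allowed))}")
    return problems


def enrich(product, schema, call_model, max_attempts=3):
    """Prompt, validate, and retry with feedback until the output passes."""
    feedback = None
    for _ in range(max_attempts):
        output = call_model(build_prompt(product, schema, feedback))
        feedback = check(output, schema)
        if not feedback:
            return json.loads(output)
    return None  # repeated failures are left for human review


def fake_model(prompt):
    # Stand-in for a real LLM call: wrong on the first try, compliant after feedback.
    if "Fix these problems" in prompt:
        return '{"color": "black", "material": "mesh"}'
    return '{"color": "midnight", "material": "mesh"}'


# An empty set means the field accepts free text.
schema = {"color": {"black", "navy"}, "material": set()}
result = enrich({"title": "Trail Runner 2"}, schema, fake_model)
print(result)
```

Passing the model in as a callable keeps the scaffolding and validation independent of any particular provider, which matches the goal of plugging into existing pipelines rather than replacing them.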

The outcome? Auto-generated content that’s not just fluent, but accurate, context-aware, and agent-ready, powering SEO, search, recommendations, and ads. Early adopters are already seeing gains in product visibility and discoverability.

A nod from the ecosystem – recognized at AWS Nova Demo

This project was recently recognized by AWS as a Semi-Finalist in the 2025 Nova Demo Competition, a showcase of GenAI solutions solving real enterprise problems.

And while the award is a proud moment, what matters more is what it signals: that enterprise GenAI isn’t about novelty anymore. It’s about fit, integration, governance, and value. It’s about solving old problems, like messy catalogs, in smarter ways.

Behind the Scenes

The solution is a result of deep collaboration across Factspan teams, combining expertise in retail content, data architecture, and generative AI.

The team designed an architecture that ensures every GenAI output aligns with real-world catalog constraints. From schema mapping and taxonomy consistency to training for brand tone and integrating with existing QA environments, every piece was built with scale, safety, and merchandising usability in mind.

This isn’t just an AI prototype. It’s a practical, production-ready solution that retail teams can trust, built by Factspan, with a clear focus on what actually works in the field.

Designing for Discovery, Not Just Delivery

Fixing a product catalog isn’t flashy, but it’s what makes everything else work: search, personalization, recommendations, ads, and ultimately conversions.
With GenAI, the opportunity isn’t just faster copywriting. It’s structured, scalable, discovery-first content that’s aligned with taxonomy, tone, and the full commerce stack.

At Factspan, we’re helping retailers move from content rewrites to content strategies, building GenAI solutions that are structure-aware, context-rich, and ready for scale. Recognized by the AWS Nova program, our e-commerce cataloging copilot delivers catalog-level optimization and real-time insights designed for retail teams.

Curious how your catalog can drive better discovery?
Register for a Demo and see how Nova + FLUX copilots can transform your retail workflows.
