# How vector space modeling solves the cold-start problem in ecommerce

> A product vector space with more than 700 dimensions solves the cold-start problem by enabling accurate ecommerce personalization from a shopper's first click, without any prior identity data.

**Author:** Sarah Miguel Cournane
**Published:** June 1, 2026
**Tags:** personalization, product-recommendations, ai, vector-search

---

Sarah Miguel Cournane, Data Scientist in Hello Retail's R&D department, describes how the platform maps every product as a coordinate in a vector space with more than 700 dimensions. This approach reframes the cold-start problem: rather than needing to know who a new visitor is, the system identifies what products they interact with and draws on those products' existing behavioral relationships to personalize from the very first click.

## The cold-start problem, explained simply

Every recommendation engine eventually hits the same wall: a new customer arrives and there is nothing to go on. No purchase history. No browsing patterns. No profile data. The system has to guess, and guessing produces noise rather than relevance.

Sarah compares it to the early days of dating someone. You know almost nothing about the other person: whether they are sporty, whether they have siblings. You have to figure it out in real time, with limited signals, and every mismatch costs you something. In ecommerce, that cost usually lands as a missed conversion.

The cold-start problem is one of the most studied challenges in recommendation system design. Traditional collaborative filtering approaches rely on accumulated user behavior: the more interactions a platform records, the better its recommendations get. That works well for a customer six months in. It does almost nothing for a visitor in their first session.

<a href="https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/the-value-of-getting-personalization-right-or-wrong-is-multiplying" rel="nofollow">McKinsey research (2021)</a> found that 71% of consumers expect personalized interactions, and 76% say they become frustrated when those expectations go unmet. For new visitors - exactly the cohort the cold-start problem affects most - that frustration often means a quick exit.

## Products as coordinates: How the vector space works

Sarah's explanation of the machinery behind the recommendations starts with a fundamental reframe: products, not shoppers, are the primary unit of analysis.

In a traditional product catalog, a black men's running t-shirt might carry product ID 10. That number tells the system nothing about what the product actually is. In the vector space Sarah describes, the same product is represented as a point in more than 700 dimensions. Its position encodes everything behavioral data reveals about it: what other products customers view alongside it, what they buy after browsing it, which categories cluster around it, how its purchasing patterns diverge from visually similar items.

Two products with overlapping characteristics - say, product ID 10 and a newly added product ID 20, both black men's running t-shirts - end up occupying almost the same location in that space. They sit in the same cluster. Because they share a location, they can share data.
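
One common way to express "almost the same location" is cosine similarity between two product vectors. The sketch below is illustrative only: the 4-dimensional vectors, the product ID 35, and the choice of cosine similarity are assumptions made for the example, not details of Hello Retail's model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional stand-ins for the 700+ dimensional vectors.
product_vectors = {
    10: [0.90, 0.10, 0.80, 0.05],  # black men's running t-shirt
    20: [0.88, 0.12, 0.79, 0.07],  # newly added black men's running t-shirt
    35: [0.05, 0.95, 0.10, 0.90],  # an unrelated product (made up for contrast)
}

sim_same_cluster = cosine_similarity(product_vectors[10], product_vectors[20])
sim_unrelated = cosine_similarity(product_vectors[10], product_vectors[35])

print(f"10 vs 20: {sim_same_cluster:.3f}")
print(f"10 vs 35: {sim_unrelated:.3f}")
```

With these toy values, products 10 and 20 score close to 1, while the unrelated product scores far lower.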

This is the structural solution to the cold-start problem for products. When a new item enters the catalog, it does not arrive as a blank slate. It arrives with neighbors - established products in the same cluster - and it inherits their behavioral context immediately. Sarah points out that the system does not need to wait for the new product to accumulate its own interaction history; the cluster does that work on its behalf.
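
A minimal sketch of that inheritance, assuming a nearest-neighbor lookup and a similarity-weighted average. The function name `inherit_from_neighbors`, the toy vectors, and the per-product statistic (a made-up conversion rate) are all hypothetical; the source does not specify how the platform transfers context.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def inherit_from_neighbors(new_vector, catalog, k=2):
    """Estimate a statistic for a new product from its k nearest established products.

    `catalog` maps product ID -> (vector, observed_statistic).
    Returns the neighbor IDs (closest first) and the similarity-weighted
    mean of their statistics. Assumes similarities are positive, which
    holds for the non-negative toy vectors used here.
    """
    scored = sorted(
        ((cosine_similarity(new_vector, vec), pid, stat)
         for pid, (vec, stat) in catalog.items()),
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _, _ in scored)
    estimate = sum(sim * stat for sim, _, stat in scored) / total_weight
    return [pid for _, pid, _ in scored], estimate

# Hypothetical established products: (vector, observed conversion rate).
catalog = {
    10: ([0.90, 0.10, 0.80, 0.05], 0.040),
    11: ([0.85, 0.15, 0.75, 0.10], 0.050),
    35: ([0.05, 0.95, 0.10, 0.90], 0.010),
}

# Product 20 is new and has no interaction history of its own yet.
neighbors, estimate = inherit_from_neighbors([0.88, 0.12, 0.79, 0.07], catalog)
print(neighbors, round(estimate, 4))
```

The new product borrows from products 10 and 11, its closest neighbors, and ignores the distant product 35.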

## How new customers become knowable

The same logic extends to new shoppers, and this is where Sarah's thinking connects product-level modeling to the broader personalization challenge.

When a first-time visitor lands on a store and clicks on three products, those three products already exist somewhere in the vector space. They have relationships - with each other, with adjacent clusters, with purchase sequences built up across thousands of stores. The system does not need to know who the visitor is. It needs to know what those products know, and those products know quite a lot.

Sarah notes that a few interactions are enough to triangulate the visitor's position within the preference landscape and surface relevant recommendations, because the products carry the behavioral context the system needs. The shopper does not have to identify themselves; they simply have to shop.
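
One simple way to picture this is to average the vectors of the clicked products into an anonymous session vector and rank the rest of the catalog against it. This is a sketch under that assumption. The function `recommend_for_session`, the averaging step, and the toy vectors are illustrative and not a description of the production system.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recommend_for_session(clicked_ids, product_vectors, top_n=2):
    """Rank unclicked products by similarity to the mean of the clicked vectors."""
    clicked = [product_vectors[pid] for pid in clicked_ids]
    # The element-wise mean acts as a session profile; no user identity is involved.
    session_vector = [sum(dim) / len(clicked) for dim in zip(*clicked)]
    candidates = [
        (cosine_similarity(session_vector, vec), pid)
        for pid, vec in product_vectors.items()
        if pid not in clicked_ids
    ]
    return [pid for _, pid in sorted(candidates, reverse=True)[:top_n]]

# Hypothetical catalog: three running t-shirts, two related items, one unrelated.
product_vectors = {
    10: [0.90, 0.10, 0.80, 0.05],
    11: [0.85, 0.15, 0.75, 0.10],
    20: [0.88, 0.12, 0.79, 0.07],
    12: [0.80, 0.20, 0.85, 0.10],  # e.g. running shorts
    13: [0.70, 0.30, 0.70, 0.20],  # e.g. running cap
    35: [0.05, 0.95, 0.10, 0.90],  # unrelated product
}

recommendations = recommend_for_session([10, 11, 20], product_vectors)
print(recommendations)
```

Three clicks on running t-shirts are enough for the related items (12 and 13) to outrank the unrelated one, without any profile for the visitor.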

<a href="https://www.epsilon.com/us/about-us/pressroom/new-epsilon-research-indicates-80-of-consumers-are-more-likely-to-make-a-purchase-when-brands-offer-personalized-experiences" rel="nofollow">Epsilon research (2017)</a> puts the commercial value of this capability in sharp relief: 80% of consumers are more likely to make a purchase when brands offer personalized experiences. Getting personalization right for new visitors - the hardest population to serve well - is where the gap between generic and relevant shows up most clearly in conversion data.

## The economics behind the catalog problem

Sarah also surfaces two data points from Hello Retail's merchant network that reframe how urgently this matters for day-to-day operations.

Across the stores Hello Retail works with, roughly 30% of products account for 70% of revenue. That concentration holds across very different verticals - Sarah notes the merchant base spans everything from stores selling boats to stores selling clothes - which suggests it reflects something structural about how catalogs and purchasing behavior interact rather than anything category-specific.
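
For readers who want to check this concentration in their own catalog, the calculation is short. The revenue figures below are invented for illustration, and `share_of_products_for_revenue` is a hypothetical helper name.

```python
def share_of_products_for_revenue(revenues, revenue_target=0.70):
    """Smallest fraction of products (top sellers first) that reaches the revenue target."""
    ranked = sorted(revenues, reverse=True)
    threshold = revenue_target * sum(ranked)
    running = 0.0
    for count, revenue in enumerate(ranked, start=1):
        running += revenue
        if running >= threshold:
            return count / len(ranked)
    return 1.0

# Made-up revenue per product for a ten-product store, in no particular order.
revenue_per_product = [50, 400, 30, 100, 40, 210, 60, 30, 50, 40]

share = share_of_products_for_revenue(revenue_per_product)
print(f"{share:.0%} of products generate 70% of revenue")
```

In this toy store, the top three of ten products cross the 70% mark, matching the roughly 30/70 split described above.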

The second number is more disruptive: around 50% of products active in a store's catalog today will no longer be active six months from now. Inventory turns over. Seasonal lines disappear. New SKUs replace discontinued ones. The behavioral data accumulated about an outgoing product risks sitting idle unless the system can identify its successor and transfer what it learned.

Vector space clustering makes that transfer possible. When a new product enters the catalog close to a departing one, the cluster relationship is explicit and the knowledge moves automatically. Sarah's point is that product churn of this scale makes the vector space approach a practical necessity, not an engineering luxury.
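
The handoff from a departing product to its successor can be sketched as a nearest-neighbor search with a similarity floor, so that history is not passed to an unrelated item. The threshold of 0.95, the function `find_successor`, and the interaction counts are assumptions chosen for the example.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_successor(departing_vector, active_products, min_similarity=0.95):
    """Return the ID of the closest active product, or None if nothing is close enough."""
    best_id, best_sim = None, min_similarity
    for pid, vec in active_products.items():
        sim = cosine_similarity(departing_vector, vec)
        if sim >= best_sim:
            best_id, best_sim = pid, sim
    return best_id

# Hypothetical active catalog after product 10 has been discontinued.
active_products = {
    20: [0.88, 0.12, 0.79, 0.07],
    35: [0.05, 0.95, 0.10, 0.90],
}
interaction_history = {10: {"views": 1200, "purchases": 48}}  # made-up counts

successor = find_successor([0.90, 0.10, 0.80, 0.05], active_products)
if successor is not None:
    # Copy the departing product's history so the successor starts with context.
    interaction_history[successor] = dict(interaction_history[10])

# A departing product with no close neighbor hands off nothing.
no_match = find_successor([0.50, 0.50, 0.00, 0.00], active_products)
print(successor, no_match)
```

The floor is a design choice: a lower value transfers more history but risks mixing signals from products that only loosely resemble each other.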

<a href="https://www.bcg.com/publications/2019/next-level-personalization-retail" rel="nofollow">BCG's research on personalization in retail</a> highlights that the stores extracting the most value from behavioral data are those that treat it as a durable asset rather than a session-level signal. The vector space approach is what makes durability possible: product relationships persist even as individual SKUs cycle in and out of the catalog.

## Beyond data privacy: Why product-centricity is a design choice

Sarah closes with a point that might seem counterintuitive at first. The recommendation logic does not need to know who a shopper is, but the main reason has little to do with data privacy, even though the privacy benefit is real.

It is an architectural choice. Products are more stable than users. A product's vector position, built up from thousands of interactions across a diverse merchant base, is a richer and more durable signal than a user profile that starts from zero whenever a brand-new customer begins their first session. Privacy compliance is a welcome consequence. The primary reason is that products are the more reliable foundation for recommendations that work from click one.

Watch the full Conversations episode with Sarah Miguel Cournane: [Product Agents](/en/conversations/product-agents/).

---

*This content is from the Hello Retail blog. For the full experience with images and formatting, visit [helloretail.com/en/blog/vector-space-modeling-cold-start-problem](https://helloretail.com/en/blog/vector-space-modeling-cold-start-problem)*
