Contents
  1. What Association Mining Finds
  2. Key Measures
  3. The Apriori Property
  4. Strong Rules
  5. Product Recommendation Example

Association Rule Mining: Support, Confidence, and Apriori

Association rule mining finds co-occurrence patterns in transaction data. Support and confidence measure rule strength. The Apriori algorithm efficiently prunes the search space using the downward closure property.

What Association Mining Finds

Given a set of transactions (e.g. customer purchases, web sessions, medical records), association rule mining finds rules of the form:

$$A \Rightarrow B$$

meaning: if a transaction contains item set $A$, it is likely to also contain item set $B$.

Applications beyond retail:

  • Objects, phrases, and entities co-occurring in images, video, and social media.
  • Medical symptoms and diagnoses.
  • Web page navigation sequences.
  • Social network connections.

Key Measures

Support: the proportion of transactions in the database $T$ that contain a given item set $X$:

$$\text{sup}(X) = \frac{|\{t \in T : X \subseteq t\}|}{|T|}$$

A low-support item set is rare. A high-support item set is common.
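The support definition translates almost directly into code. The following is a minimal sketch, assuming each transaction is represented as a Python set of item names; the function name `support` and the toy `transactions` list are illustrative and not part of any library.

```python
from typing import Iterable, List, Set


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("transactions must be non-empty")
    items = frozenset(itemset)
    # `items <= t` is the subset test X ⊆ t from the definition.
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


# Hypothetical toy data, invented for illustration only.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "jam"},
    {"bread", "butter", "jam", "milk"},
]

print(support({"bread"}, transactions))
print(support({"bread", "butter", "jam"}, transactions))
```

On this toy data, bread appears in four of five transactions, while the three-item set appears in two of five.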

Confidence: the conditional probability that a transaction contains $Y$ given it contains $X$:

$$\text{conf}(X \Rightarrow Y) = \frac{\text{sup}(X \cup Y)}{\text{sup}(X)}$$

High confidence means the rule fires reliably when the antecedent is present.

Lift: adjusts confidence for the base rate of $Y$:

$$\text{lift}(X \Rightarrow Y) = \frac{\text{conf}(X \Rightarrow Y)}{\text{sup}(Y)}$$

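Lift $> 1$ indicates a positive association beyond chance. A lift of exactly 1 means $X$ and $Y$ occur together as often as they would if independent, and lift $< 1$ indicates a negative association.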
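Confidence and lift are both ratios of supports, so they can be sketched on top of a single support helper. The code below is one possible illustration, assuming set-based transactions; the names `support`, `confidence`, `lift`, and the toy data are hypothetical.

```python
from typing import Iterable, List, Set


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Set[str]]) -> float:
    """conf(X => Y) = sup(X ∪ Y) / sup(X)."""
    sup_x = support(x, transactions)
    if sup_x == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    return support(set(x) | set(y), transactions) / sup_x


def lift(x: Iterable[str], y: Iterable[str],
         transactions: List[Set[str]]) -> float:
    """lift(X => Y) = conf(X => Y) / sup(Y)."""
    sup_y = support(y, transactions)
    if sup_y == 0:
        raise ValueError("consequent never occurs; lift is undefined")
    return confidence(x, y, transactions) / sup_y


# Hypothetical toy data, invented for illustration only.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "jam"},
    {"bread", "butter", "jam", "milk"},
]

x, y = {"bread", "butter"}, {"jam"}
print(confidence(x, y, transactions))  # 2 of the 3 bread-and-butter baskets
print(lift(x, y, transactions))        # above 1 on this toy data
```

Here two of the three baskets with bread and butter also contain jam, and jam appears in three of five baskets overall, so the lift comes out slightly above 1.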

The Apriori Property

The key insight that makes mining tractable: if an item set is infrequent (below minimum support), all its supersets are also infrequent. This is the downward closure (or antimonotonicity) property.

Apriori uses this to prune the search space:

  1. Find all frequent 1-item sets (items above minimum support).
  2. Use frequent $k$-item sets to generate candidate $(k+1)$-item sets.
  3. Prune any candidate whose subsets are not all frequent.
  4. Count support of remaining candidates against the transaction database.
  5. Repeat until no new frequent item sets are found.

This avoids counting supersets of infrequent item sets, which can eliminate a large portion of the candidate space.
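The five steps above can be sketched in a few dozen lines. This is a simplified illustration rather than an optimized implementation: it assumes transactions fit in memory as Python sets, treats an item set as frequent when its support is at least `min_support`, and generates candidates by taking unions of pairs of frequent $k$-item sets. The function name `apriori` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every frequent item set mapped to its support."""
    n = len(transactions)

    def sup(candidate: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if candidate <= t) / n

    # Step 1: frequent 1-item sets.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        c = frozenset([item])
        s = sup(c)
        if s >= min_support:
            current[c] = s

    frequent = dict(current)
    k = 1
    while current:
        # Step 2: join frequent k-item sets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k + 1}
        # Step 3: prune candidates that have an infrequent k-item subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k))}
        # Step 4: count support only for the surviving candidates.
        current = {}
        for c in candidates:
            s = sup(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1  # Step 5: repeat until no new frequent item sets appear.
    return frequent


# Hypothetical toy data, invented for illustration only.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "jam"},
    {"bread", "butter", "jam", "milk"},
]

result = apriori(transactions, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this data, {butter, milk} falls below the threshold, so the three-item candidates containing both butter and milk are pruned in step 3 without ever being counted against the transactions.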

Strong Rules

A rule $A \Rightarrow B$ is considered strong if it exceeds both a minimum support threshold and a minimum confidence threshold. Finding all strong rules requires:

  1. Finding all frequent item sets (Apriori or FP-Growth).
  2. Generating rules from each frequent item set.
  3. Filtering by minimum confidence.
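Steps 2 and 3 can be sketched as follows, assuming step 1 has already produced a dictionary mapping each frequent item set to its support. Every non-empty proper subset of a frequent item set is tried as an antecedent; its support is guaranteed to be in the dictionary by downward closure. The function name `generate_rules` and the `frequent` values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def generate_rules(frequent: Dict[FrozenSet[str], float],
                   min_confidence: float) -> List[Rule]:
    """Return (antecedent, consequent, support, confidence) for strong rules."""
    rules = []
    for itemset, sup_xy in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                y = itemset - x
                conf = sup_xy / frequent[x]  # sup(X ∪ Y) / sup(X)
                if conf >= min_confidence:
                    rules.append((x, y, sup_xy, conf))
    return rules


def fs(*items: str) -> FrozenSet[str]:
    return frozenset(items)


# Hypothetical frequent item sets with supports, invented for illustration.
frequent = {
    fs("bread"): 0.8, fs("butter"): 0.6, fs("jam"): 0.6, fs("milk"): 0.6,
    fs("bread", "butter"): 0.6, fs("bread", "jam"): 0.4,
    fs("bread", "milk"): 0.4, fs("butter", "jam"): 0.4,
    fs("jam", "milk"): 0.4, fs("bread", "butter", "jam"): 0.4,
}

rules = generate_rules(frequent, min_confidence=0.7)
for x, y, s, c in rules:
    print(sorted(x), "=>", sorted(y), f"support={s:.2f}", f"confidence={c:.2f}")
```

Because every rule is derived from an item set that already meets the minimum support, only the confidence filter remains to be applied at this stage.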

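The challenge: with $n$ distinct items there are $2^n - 1$ possible non-empty item sets, so the candidate space grows exponentially. Closed and maximal frequent item sets give a more compact representation. Closed item sets lose no information: the support of every frequent item set can be recovered from them. Maximal item sets are more compact still, but they record only which item sets are frequent, not their supports.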
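To make the two compact representations concrete, here is a rough sketch that filters a dictionary of frequent item sets using the standard definitions: an item set is closed if no proper superset has the same support, and maximal if no proper superset is frequent at all. The function name `closed_and_maximal` and the `frequent` values are hypothetical, and the quadratic scan is for clarity rather than speed.

```python
from typing import Dict, FrozenSet, Set, Tuple


def closed_and_maximal(
    frequent: Dict[FrozenSet[str], float],
) -> Tuple[Set[FrozenSet[str]], Set[FrozenSet[str]]]:
    """Split frequent item sets into the closed ones and the maximal ones."""
    closed, maximal = set(), set()
    for x, sup_x in frequent.items():
        # `x < y` is the proper-subset test on frozensets.
        supersets = [y for y in frequent if x < y]
        if not supersets:
            maximal.add(x)
        # A superset can never have higher support, so "strictly lower
        # for all supersets" is the same as "none has equal support".
        if all(frequent[y] < sup_x for y in supersets):
            closed.add(x)
    return closed, maximal


def fs(*items: str) -> FrozenSet[str]:
    return frozenset(items)


# Hypothetical frequent item sets with supports, invented for illustration.
# In practice, comparing integer counts is safer than comparing floats.
frequent = {
    fs("bread"): 0.8, fs("butter"): 0.6, fs("jam"): 0.6, fs("milk"): 0.6,
    fs("bread", "butter"): 0.6, fs("bread", "jam"): 0.4,
    fs("bread", "milk"): 0.4, fs("butter", "jam"): 0.4,
    fs("jam", "milk"): 0.4, fs("bread", "butter", "jam"): 0.4,
}

closed, maximal = closed_and_maximal(frequent)
print(len(frequent), len(closed), len(maximal))
```

Every maximal item set is also closed, so the maximal collection is never larger than the closed one.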

Product Recommendation Example

Given customer purchase history as transactions, a rule such as:

$$\{\text{bread}, \text{butter}\} \Rightarrow \{\text{jam}\}$$

with support 0.15 and confidence 0.72 means: 15% of all transactions contain all three items, and of the transactions that contain bread and butter, 72% also contain jam.
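As a quick consistency check on these numbers, the confidence definition can be rearranged to recover the support of the antecedent alone. This is a derived figure, not one stated in the example:

$$\text{sup}(\{\text{bread}, \text{butter}\}) = \frac{\text{sup}(\{\text{bread}, \text{butter}, \text{jam}\})}{\text{conf}(\{\text{bread}, \text{butter}\} \Rightarrow \{\text{jam}\})} = \frac{0.15}{0.72} \approx 0.208$$

So roughly 21% of all transactions contain bread and butter. Computing the lift of the rule would additionally require the support of jam on its own, which the example does not give.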

This rule is actionable: place jam near bread and butter, or recommend jam to customers who have bread and butter in their cart.

Split the transactions into segments, for example by product category or shop, and mine each segment separately to discover segment-specific patterns. Different segments may yield very different association rules even on the same product catalogue.
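One way to see segment-specific differences is to compute the same measure separately per segment. The sketch below assumes each record is a `(segment, basket)` pair; the function name `support_by_segment`, the segment labels, and the data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def support_by_segment(records: List[Tuple[str, Set[str]]],
                       itemset: Iterable[str]) -> Dict[str, float]:
    """Support of `itemset` computed separately within each segment."""
    items = frozenset(itemset)
    groups = defaultdict(list)
    for segment, basket in records:
        groups[segment].append(basket)
    return {
        segment: sum(1 for b in baskets if items <= b) / len(baskets)
        for segment, baskets in groups.items()
    }


# Hypothetical records tagged with a shop identifier, for illustration only.
records = [
    ("shop_a", {"bread", "butter", "jam"}),
    ("shop_a", {"bread", "butter"}),
    ("shop_b", {"bread", "milk"}),
    ("shop_b", {"milk", "jam"}),
    ("shop_b", {"bread", "butter", "jam", "milk"}),
]

per_shop = support_by_segment(records, {"bread", "butter"})
print(per_shop)
```

In this toy data the pair appears in every basket from one shop but only a third of the baskets from the other, a difference that a single pooled support value would hide.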
