Twitch Extensions: Part 1 – An Introduction

This is the first part of a I don’t know how many parts series on Twitch Extensions. We’ll cover how to develop them, how to EBS them, some notes on Designing and what to avoid, some practices for after release care, and touch on some marketing ideas/things for your Extension Website.

So since this is the first part, we’ll cover the basics first. What, Where, Who, How, and Why of an Extension

When is Gamora
Why is Gamora

What is a Twitch Extension

A Twitch Extension is basically an iFrame that allows a developer to create anything they want, as long as it fits within the Guidelines set out by Twitch, and of course it’s Terms of Service and Developer Agreements

This can range from MiniGames, to QnA/Polling extensions, to Community information extensions, to game information extensions, or play with the Streamer extensions

Some examples include

  • Sound Alerts – Lets users pick a sound to play on Stream, can be free or utilize bits/channel points
  • Crowd Control – Provides plugins for a variety of games, to allow people to interact with the game, make it easier/harder for the Broadcaster
  • Cardboard.live – Lets viewers see what cards are in your current deck, and check the state of the game board, without having to spam/ask in chat
  • Borderlands 3 ECHOcast – Lets viewers check out your Borderlands 3 character, and let viewers win extra loot for their own character when the Broadcaster opens Red Chests in game
  • Detroit: Community Play – Ask the community to pick/vote on the option when a multiple choice question appears during the game play of Detroit: Become Human.
  • The Cohhilition – A community interaction extension, that provides access to various Community things (in this case for a single channel), without having the viewer leave the comfort of the Twitch page

Some of these descriptions are super simple to cover the salient points and many will do more things than my summary covers

Check out more extensions at Twitch’s own Extension Discovery

Where is a Twitch Extension

Twitch being a live streaming site, provides to the Broadcaster, a page, and that page will consist of a number of elements.

Depending on if the streamer is live or not the elements on the page will vary slightly. If the Broadcaster is live, you’ll land on the video/chat page, if the Broadcaster is not live, you’ll land on a “Home”/index style page

Now the part that we care about is the “Chat”/live view page, and on that page you will find a number of sections

An example of a Twitch Channel Page
An example of a Twitch Channel Page
  • The Video Player
  • The live Chat
  • The Stream information section – The Title and Category
  • A small about the Broadcaster section
  • The Panels section

A Twitch Extension can be added to a couple of these sections, and has 3 main (and two auxiliary) integration points.

The Main Integration points

  • Video Overlay – The Extension can cover/utilize the whole of the video player
  • Panel – The Extension appears in the panels section below the stream, and has width of 318px and a maximum height of 500px
  • Video Component – Basically a panel that appears over the video player but is locked to the right hand side of the player, it can utilize a varied amount of the player space

The “main” integration points are mutually exclusive, an Extension can only occupy any one of those slots at once

The Auxiliary Integration points

An example of a mobile extension on iOS
An example of a mobile extension on iOS
  • Mobile Panel – The Extension is available on mobile for mobile users to interact, it will replace the chat, and dimensions wise basically similar to a panel on PC (in terms of ratio), but you would have to consider landscape views on tablets as well.
  • Panel popout – Panel Extensions can be opened in a new window and can be resized by the user at will

An extension can be in one of the “main” integration points, and the mobile point.

Who is a Twitch Extension

As part of being on Twitch, extensions are able to use a number of Extension Features, as well as doing more “regular” Twitch stuff. You could run a regular chat bot that runs with your extension, the Twitch Extension Timeout with bits does this in order to run the actual timeout commands on users

So aside from the “regular” stuff like chat bots, Twitch Extensions have access to some additional features

  • Bits Support – Allow Viewers, to exchange bits (a digital good) for various “digital goods” inside Extensions, this could range from an extra vote in polling extensions, or picking a victim in “Timeout With Bits”, or a cool cloak for your character in a game. Revenue generate here is split 80/20 between the Broadcaster/Developer.
  • Subscription Support – Allows the Extension to check the subscription status of a viewer on the channel the extension is installed to, avoiding the need for the Extension Developer to get and maintain oAuth access tokens from the Broadcaster “separately” to the install process of the Extension
  • Identitiy Link – Allows viewers to “login’ to your extension, we’ll cover this more in a later post in the series
  • Chat Capabilities – Allow the Extension to send chat messages (via a HTTP POST request), usually used as a notification system to prompt viewers to perform an action in the extension, like a new poll has started, go vote, for example
  • The Configuration Service – We’ll cover this in a later post as well, but it’s a way to store data on Twitch’s server that you can use in the extension, this might be something like, the name of the Broadcasters Character in a game that you would use in an API request to get information about the character
  • Streamer Allowlist – allows the Extension Developer to restrict whom can install the Extension to their channel

We’ll cover each Capability/feature in future blog posts in the series

How is a Twitch Extension

We’ll cover this more, in depth in later posts, a Twitch Extension is a bundle of files uploaded to the Extension CDN (Content Delivery Network). This needs to include your HTML, JS, CSS, and any static images you want to store on the CDN (pretty handy for background images for panel extensions). Twitch has some restrictions on what an extension can load from external sources, but essentially images are fine, CSS/JS is not, CSS/JS must be local/included.

Those files are uploaded to a sub domain of Twitch, into a particular sub folder tree on that sub domain, which we will cover more in depth in a later post, when we talk about building extensions and a suitable way to test them and some related gotchas.

All Twitch Extensions have their bundles uploaded to the Twitch and before they are released (or updated) to the masses, the Twitch Extension Review team will review the Extension, to ensure it works as intended, there is no major bugs effecting activation, the Extension compiles with the Guidelines and Terms of Services, and most importantly contains nothing malicious to interfere with the Twitch website or the viewer using the extension Computer/device

A Twitch Extension is allowed to communicate offsite, the resource just has to be secured over SSL, this is commonly referred to as an EBS or Extension Backend Service, we’ll cover this more later as well!

Why is a Twitch Extensions?

But Why is a Twitch Extension (any use) I hear you cry?

A Twitch Extension provides ways for the Streamers Community to perform rich interactions, without leaving the Twitch Broadcasters page, which means you keep the Viewer watching the Stream or interacting with Chat, with relatively easy access to Twitch API’s, without long additional steps for Viewer Authentication

Summary

That is it for Part 1 in this series on Twitch Extensions, I’ve cover the basic What, Where, Who, How of Twitch Extensions

Parts will either be weekly or bi-weekly, we will see how we go!

BUT MOTHER I CRAVE VIOLENCE

Well, until I write the next part if you want to read more about the Developer Side of Extensions, you can pop a visit over the to the Documentation or take a look at Twitch’s Introductory Page and you can always join us on the “TwitchDev Discord Server”, visit the Developer Support Page for the current invite link!

Were the sub headings supposed to make sense? No not really.

Why you think you are good enough to even write blog posts on Extensions? I made a one or two of them Extensions of various types.

Twitch API Examples

I spend a lot of time on the Twitch Developer forums and Discord helping out other third party developers. That among other things led to me being asked to become a Twitch Ambassador, which is probably a story for another post.

As part of spending a lot of time helping of Forums/Discord, it become useful to write up some examples in various languages for people to refer to, since some people prefer code examples over documentation, and it’s easier to demonstrate how to tie multiple calls/endpoints together for the desired result.

To that end my GitHub Repo at barrycarlyon/twitch_misc now exists and holds examples from Authentication flows (from Implicit to server access and regular user in-between), extension config/pubsub, and examples for Webhooks and the new Eventsub (which is worth a look!). So if you are looking for some examples do checkout the Repository. Some of the examples can even be tested on GitHub itself via GitHub pages, the examples available are listed in the readme and at the Github Pages site.

Twitch also recently made the requirement that all calls to helix (aka the New API) need to be Authenticated using a Bearer, which made it difficult for Extensions to get the viewers details. So to that end I created a basic example of how to do that in an Extension with a “User Profile Extension” example. Which is at BarryCarlyon/twitch_profile_extension. So this covers a good way to handle that flow.

Right now most of the examples are nodeJS, or PHP, but there are some in Python kicking about!

I’ll be looking at adding more examples and other examples in other languages as we go!

I’m usually really bad at commenting my code as I prefer reading the code, but I made a conscious effort to add useful code comments on these repos!

Google oAuth and offline access

Been doing a lot of various stuff and things for CohhCarnage and some of that stuff has involved building an achievements tracking system for the website.

One of those achievements, is for YouTube Subscription. Where the achievement is awarded to the logged in user, if the user has subscribed to a given YouTube channel, in this case Cohh’s YouTube.

In order to make sure that people can’t “cheat” the system, we ask them to link their Google/YouTube account with the website and use the relevant API to look up their Subscription status.

Initially this worked fine, but I ran into some issues where the oAuth token stored has expired and thus I can’t do a status check, for cases where the user links their YouTube to their Cohhilition Account then doesn’t subscribe on YouTube until after 24 hours later (or some caching issue with Google).

So, the simple fix using Googles PHP Library for oAuth’ing is to just do a

<?php

$client->setAccessType('offline')

Now, this works fine for the most part, you happily get a refresh token, and can thus renew your token.

Then comes a hiccup, if for whatever reason you have offline access type on, and the user has previously authorised the application and it’s offline permission, you DON’T get a refresh token in some cases. Some user cases include:

  • you’ve lost their token,
  • or got a bad one
  • or the user managed to find the authentication loop (again) when they shouldn’t, and thus a new code/token combo is generated

Normally you are using something like:

<?php

$client = new Google_Client();
$client->setClientId($client_id);
$client->setClientSecret($client_secret);
$client->setRedirectUri($redirect_uri);
$client->addScope("email");
$client->addScope("profile"); 
$client->setAccessType('offline');// last forever/give me a refresh

But in order to make sure that you get a refresh_token EVERY time someone goes through the authentication loop, you have to adjust as follows:

<?php

$client = new Google_Client();
$client->setClientId($client_id);
$client->setClientSecret($client_secret);
$client->setRedirectUri($redirect_uri);
$client->addScope("email");
$client->addScope("profile"); 
$client->setAccessType('offline');// last forever/give me a refresh
$client->setApprovalPrompt('force');// force a refresh token return everytime

Apparently, using

'offline'

is supposed to imply

'force'

according to some Stack Overflows posts, but this doesn’t seem the case.

In the end my full Google_Client setup looks like:

<?php

        $client = new Google_Client();
        $client->setAuthConfig($consumer);
        $client->addScope('profile');
        $client->addScope('email');
        $client->addScope('https://www.googleapis.com/auth/youtube.readonly');
        $client->setAccessType('offline');// asks for a refresh token
        $client->setApprovalPrompt('force');// forces the refresh token being returned
        $client->setIncludeGrantedScopes(true);
        $client->setRedirectUri($callback);

Just an odd thing I came across recently that I thought I would write up. Most of the notes here are from Stack Overflow post on the subject

PHP5.6+ cURL and file uploads

Came across something odd today, and thought I’d condense down my Tweets on the subject into a blog post.

Basically, I use cURL and some wacky wacky stuff to upload files to a site over HTTP POST. And since I’d just grabbed PHP 7 from HomeBrew, it had overridden my PHP 5.5 install that comes as standard on OSX 10.10 and thus I cross checked the script with my MacPorts PHP 5.6 install and found the same. (Yes THREE different PHP versions for science…)

“Traditionally” the method for this would be something along the lines of:

<?php

    $ch = curl_init('SOMEURL');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36');

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $data = array(
        'some_file' => '@' . $some_path
    );
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);

    $r = curl_exec($ch);
    curl_close($ch);

On PHP 5.5 and previous that works file, using a @ at the start of a POST entry would instruct PHP/cURL to treat the data/string as a File Path to Upload.

This behaviour is controlled by the PHP cURL constant of CURLOPT_SAFE_UPLOAD. In PHP 5.6 this constant changed from default FALSE to default TRUE, setting to TRUE means that a string starting @ is treated as a String and not a File Path to upload. The changes are documented on the PHP.net website, but the primary trip up is that most of us just use the defaults and we get tripped up when things change.

So, after trying to set this to FALSE under PHP 5.6 it still wasn’t working, and under PHP 7 you are thrown an error to indicate that you are not allowed to change this constant any more for security reasons, which is fine.

The solution is to use the CURLFile class, which is pretty straightforward:

<?php

    $ch = curl_init('SOMEURL');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36');

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $data = array(
        'some_file' => new CURLFile($some_path)
    );
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);

    $r = curl_exec($ch);
    curl_close($ch);

This is the truly lazy edition, just chuck a new CURLFile($path) at it, instead of the @. I’m sure CURLFile does more useful stuff, but this was enough to get me back up and running!

Thus endeth this blog post!

Magento – Products showing in the wrong category

I came across something odd a while back, and until recently not had time to investigate the issue.

Products were showing up in categories that they were not explicitly in.

In summary, when you reindex catalog_category_product Magento goes and looks at all the products and the categories that they are in. For the categories that the product are in, it gets the category tree of that category.

So consider this category hierarchy:

  1. Category A
    1. Sub Category A 1
      1. Bottom Category 1
      2. Bottom Category 2
    2. Sub Category A 2
      1. Bottom Category 3
      2. Bottom Category 4
    3. Sub Category A 3
      1. Bottom Category 5
      2. Bottom Category 6
  2. Category B
    1. Sub Category B 1
      1. Bottom Category 7
      2. Bottom Category 8
    2. Sub Category B 2
      1. Bottom Category 9
      2. Bottom Category 10
    3. Sub Category B 3
      1. Bottom Category 11
      2. Bottom Category 12

Now you create a product and put it in Bottom Category 9 and Bottom Category 11. When you reindex it’s put into the list to show up in the category and the parents on that tree, so Category B and Category B 2, and so on for Bottom Category 11.

Now over time we add more products and more categories, and several months later we decide to rearrange our category tree, as we’ve gotten too wide and our needs have changed, or some other reason. Say to something like this, a strikethru indicates a disabled category (as in is_active is 0 (it’s a boolean integer))

  1. Category A
    1. Sub Category A 1
      1. Bottom Category 1
      2. Bottom Category 2
      3. Bottom Category 3
      4. Bottom Category 4
    2. Sub Category A 2
    3. Sub Category A 3
      1. Bottom Category 5
      2. Bottom Category 6
  2. Category B
    1. Sub Category B 1
      1. Bottom Category 7
      2. Bottom Category 8
    2. Sub Category B 2
      1. Bottom Category 9
      2. Bottom Category 10
    3. Sub Category B 3
      1. Bottom Category 11
      2. Bottom Category 12

Now of course we reindex, and the problems start.

The Problem

We have a problem, the Product correctly shows in Bottom Category 11 and it’s parents, but it is still showing up in Sub Category A 1 and it’s parents in error.

It seems that the reindexer takes categories that are disabled (as in is_active is false), and their tree, into account when building the list of category pages that a product can be displayed on.

So in order to fix this I went deep into Magento core to go and see which query was at fault. Turns out that it’s not exactly broken, it’s just Magento doesn’t check for it a give category is active or not when it builds the list.

The query at fault is:

INSERT INTO `catalog_category_anc_products_index_idx` (`category_id`, `product_id`, `position`) SELECT STRAIGHT_JOIN DISTINCT `ca`.`category_id`, `cp`.`product_id`, MIN(IF(ca.category_id = ce.entity_id, `cp`.`position`, (`ce`.`position` + 1) * (`ce`.`level` + 1 * 10000) + `cp`.`position`)) AS `position` FROM `catalog_category_anc_categs_index_idx` AS `ca`
 INNER JOIN `catalog_category_entity` AS `ce` ON `ce`.`path` LIKE `ca`.`path` OR ce.entity_id = ca.category_id
 INNER JOIN `catalog_category_product` AS `cp` ON cp.category_id = ce.entity_id
 INNER JOIN `catalog_category_product_index_enbl_idx` AS `pv` ON pv.product_id = cp.product_id GROUP BY `ca`.`category_id`,
	`cp`.`product_id`

As you can see it just collects from catalog_category_entity, which has a table structure of:

mysql> describe catalog_category_entity;
+------------------+----------------------+------+-----+---------+----------------+
| Field            | Type                 | Null | Key | Default | Extra          |
+------------------+----------------------+------+-----+---------+----------------+
| entity_id        | int(10) unsigned     | NO   | PRI | NULL    | auto_increment |
| entity_type_id   | smallint(5) unsigned | NO   |     | 0       |                |
| attribute_set_id | smallint(5) unsigned | NO   |     | 0       |                |
| parent_id        | int(10) unsigned     | NO   |     | 0       |                |
| created_at       | timestamp            | YES  |     | NULL    |                |
| updated_at       | timestamp            | YES  |     | NULL    |                |
| path             | varchar(255)         | NO   | MUL | NULL    |                |
| position         | int(11)              | NO   |     | 0       |                |
| level            | int(11)              | NO   | MUL | 0       |                |
| children_count   | int(11)              | NO   |     | 0       |                |
+------------------+----------------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)

As you can see that table structure is somewhat limited. Since this is the bare bones basics of a category. And thus the query to collect categories of a product is in, considers inactive categories as if doesn’t have this information.

The Solution

The fix here is simple. Find where the query is defined and take into account the is_active category attribute, and we’ll do it the Magento way. Rather than hardcoding entity_id’s.

A quick note about how I found the query. I opened up public_html/lib/Varien/Db/Adapter/Pdo/Mysql.php and changed the variable $_debug from false to true, it’s around about line 103. This will output all the SQL queries to public_html/var/debug/pdo_mysql.log.

After that I just grep’ed the source catalog_category_anc_products_index_idx which told me where the table was defined in a xml config, and then grep’ed again for the internal reference name, (there was much much grep’ing for bit’s and pieces).

The query runs under the reindexAll function of Mage_Catalog_Model_Resource_Category_Indexer_Product. So basically take the ENTIRE contents of this function and setup a Model Resource override, and fix the query.

Module Basics

Creating our Module in the directory local/Company/Catalog

Normally I wouldn’t cover this, but we are overriding a resource which is a little different from just overriding a model.

etc/config.xml
<?xml version="1.0"?>
<config>
    <modules>
        <Company_Catalog>
            <version>0.0.1</version>
        </Company_Catalog>
    </modules>

    <global>
        <models>
            <companycatalog>
                <class>Company_Catalog_Model</class>
            </companycatalog>

            <catalog_resource_eav_mysql4>
                <rewrite>
                    <category_indexer_product>Company_Catalog_Model_Resource_Category_Indexer_Product</category_indexer_product>
                </rewrite>
            </catalog_resource_eav_mysql4>
        </models>
    </global>
</config>
Model/Resource/Category/Indexer/Product.php

Literally just a copy of the contents of reindexAll with some modifications. Starting at // Query Correction and ending at // end Query Correction.

<?php

class Company_Catalog_Model_Resource_Category_Indexer_Product extends Mage_Catalog_Model_Resource_Category_Indexer_Product
{
    /**
     * Rebuild all index data
     *
     * @return Mage_Catalog_Model_Resource_Category_Indexer_Product
     */
    public function reindexAll()
    {
        $this->useIdxTable(true);
        $this->beginTransaction();
        try {
            $this->clearTemporaryIndexTable();
            $idxTable = $this->getIdxTable();
            $idxAdapter = $this->_getIndexAdapter();
            $stores = $this->_getStoresInfo();
            /**
             * Build index for each store
             */
            foreach ($stores as $storeData) {
                $storeId    = $storeData['store_id'];
                $websiteId  = $storeData['website_id'];
                $rootPath   = $storeData['root_path'];
                $rootId     = $storeData['root_id'];
                /**
                 * Prepare visibility for all enabled store products
                 */
                $enabledTable = $this->_prepareEnabledProductsVisibility($websiteId, $storeId);
                /**
                 * Select information about anchor categories
                 */
                $anchorTable = $this->_prepareAnchorCategories($storeId, $rootPath);
                /**
                 * Add relations between not anchor categories and products
                 */
                $select = $idxAdapter->select();
                /** @var $select Varien_Db_Select */
                $select->from(
                    array('cp' => $this->_categoryProductTable),
                    array('category_id', 'product_id', 'position', 'is_parent' => new Zend_Db_Expr('1'),
                        'store_id' => new Zend_Db_Expr($storeId))
                )
                ->joinInner(array('pv' => $enabledTable), 'pv.product_id=cp.product_id', array('visibility'))
                ->joinLeft(array('ac' => $anchorTable), 'ac.category_id=cp.category_id', array())
                ->where('ac.category_id IS NULL');

                $query = $select->insertFromSelect(
                    $idxTable,
                    array('category_id', 'product_id', 'position', 'is_parent', 'store_id', 'visibility'),
                    false
                );
                $idxAdapter->query($query);

                /**
                 * Assign products not associated to any category to root category in index
                 */

                $select = $idxAdapter->select();
                $select->from(
                    array('pv' => $enabledTable),
                    array(new Zend_Db_Expr($rootId), 'product_id', new Zend_Db_Expr('0'), new Zend_Db_Expr('1'),
                        new Zend_Db_Expr($storeId), 'visibility')
                )
                ->joinLeft(array('cp' => $this->_categoryProductTable), 'pv.product_id=cp.product_id', array())
                ->where('cp.product_id IS NULL');

                $query = $select->insertFromSelect(
                    $idxTable,
                    array('category_id', 'product_id', 'position', 'is_parent', 'store_id', 'visibility'),
                    false
                );
                $idxAdapter->query($query);

                /**
                 * Prepare anchor categories products
                 */
                $anchorProductsTable = $this->_getAnchorCategoriesProductsTemporaryTable();
                $idxAdapter->delete($anchorProductsTable);

                $position = 'MIN('.
                    $idxAdapter->getCheckSql(
                        'ca.category_id = ce.entity_id',
                        $idxAdapter->quoteIdentifier('cp.position'),
                        '('.$idxAdapter->quoteIdentifier('ce.position').' + 1) * '
                        .'('.$idxAdapter->quoteIdentifier('ce.level').' + 1 * 10000)'
                        .' + '.$idxAdapter->quoteIdentifier('cp.position')
                    )
                .')';

                // Query Correction

                // load the is_active category attribute
                $attributeModel = Mage::getModel('eav/entity_attribute')->loadByCode(Mage_Catalog_Model_Category::ENTITY, 'is_active');
                $attributeTable = 'catalog_category_entity_' . $attributeModel->getBackendType();

                $select = $idxAdapter->select()
                ->useStraightJoin(true)
                ->distinct(true)
                ->from(array('ca' => $anchorTable), array('category_id'))
                ->joinInner(
                    array('ce' => $this->_categoryTable),
                    $idxAdapter->quoteIdentifier('ce.path') . ' LIKE ' .
                    $idxAdapter->quoteIdentifier('ca.path') . ' OR ce.entity_id = ca.category_id',
                    array()
                )

                ->joinInner(
                    array('cp' => $this->_categoryProductTable),
                    'cp.category_id = ce.entity_id',
                    array('product_id')
                )

                // left join the attribute
                ->joinInner(
                    array('ccei' => $attributeTable),
                    'ccei.entity_id = cp.category_id',
                    array()//select nothing
                )

                ->joinInner(
                    array('pv' => $enabledTable),
                    'pv.product_id = cp.product_id',
                    array('position' => $position)
                )

                ->where('ccei.attribute_id=' . $attributeModel->getAttributeId())//is_active
                ->where('ccei.value=1')

                ->group(array('ca.category_id', 'cp.product_id'));
                $query = $select->insertFromSelect($anchorProductsTable,
                    array('category_id', 'product_id', 'position'), false);
                $idxAdapter->query($query);

                // end Query Correction

                /**
                 * Add anchor categories products to index
                 */
                $select = $idxAdapter->select()
                ->from(
                    array('ap' => $anchorProductsTable),
                    array('category_id', 'product_id',
                        'position', // => new Zend_Db_Expr('MIN('. $idxAdapter->quoteIdentifier('ap.position').')'),
                        'is_parent' => $idxAdapter->getCheckSql('cp.product_id > 0', 1, 0),
                        'store_id' => new Zend_Db_Expr($storeId))
                )
                ->joinLeft(
                    array('cp' => $this->_categoryProductTable),
                    'cp.category_id=ap.category_id AND cp.product_id=ap.product_id',
                    array()
                )
                ->joinInner(array('pv' => $enabledTable), 'pv.product_id = ap.product_id', array('visibility'));

                $query = $select->insertFromSelect(
                    $idxTable,
                    array('category_id', 'product_id', 'position', 'is_parent', 'store_id', 'visibility'),
                    false
                );
                $idxAdapter->query($query);

                $select = $idxAdapter->select()
                    ->from(array('e' => $this->getTable('catalog/product')), null)
                    ->join(
                        array('ei' => $enabledTable),
                        'ei.product_id = e.entity_id',
                        array())
                    ->joinLeft(
                        array('i' => $idxTable),
                        'i.product_id = e.entity_id AND i.category_id = :category_id AND i.store_id = :store_id',
                        array())
                    ->where('i.product_id IS NULL')
                    ->columns(array(
                        'category_id'   => new Zend_Db_Expr($rootId),
                        'product_id'    => 'e.entity_id',
                        'position'      => new Zend_Db_Expr('0'),
                        'is_parent'     => new Zend_Db_Expr('1'),
                        'store_id'      => new Zend_Db_Expr($storeId),
                        'visibility'    => 'ei.visibility'
                    ));

                $query = $select->insertFromSelect(
                    $idxTable,
                    array('category_id', 'product_id', 'position', 'is_parent', 'store_id', 'visibility'),
                    false
                );

                $idxAdapter->query($query, array('store_id' => $storeId, 'category_id' => $rootId));
            }

            $this->syncData();

            /**
             * Clean up temporary tables
             */
            $this->clearTemporaryIndexTable();
            $idxAdapter->delete($enabledTable);
            $idxAdapter->delete($anchorTable);
            $idxAdapter->delete($anchorProductsTable);
            $this->commit();
        } catch (Exception $e) {
            $this->rollBack();
            throw $e;
        }
        return $this;
    }
}

The fix in depth

The first thing done here is to Magento Safe load the Category Attribute details for is_active.

                // load the is_active category attribute
                $attributeModel = Mage::getModel('eav/entity_attribute')->loadByCode(Mage_Catalog_Model_Category::ENTITY, 'is_active');
                $attributeTable = 'catalog_category_entity_' . $attributeModel->getBackendType();

After this join,

                ->joinInner(
                    array('cp' => $this->_categoryProductTable),
                    'cp.category_id = ce.entity_id',
                    array('product_id')
                )

we need to add another join, in this case we are joining to the table that contains the is_active information, it’s usually catalog_category_entity_int but we Magento Safe loaded it above.

                // left join the attribute
                ->joinInner(
                    array('ccei' => $attributeTable),
                    'ccei.entity_id = cp.category_id',
                    array()//select nothing
                )

Finally we need to adjust to add a where to clean up the join and select active categories only, after the last join and before the group function call:

                ->where('ccei.attribute_id=' . $attributeModel->getAttributeId())//is_active
                ->where('ccei.value=1')

This changes the query to:

INSERT INTO `catalog_category_anc_products_index_idx` (`category_id`, `product_id`, `position`) SELECT STRAIGHT_JOIN DISTINCT `ca`.`category_id`, `cp`.`product_id`, MIN(IF(ca.category_id = ce.entity_id, `cp`.`position`, (`ce`.`position` + 1) * (`ce`.`level` + 1 * 10000) + `cp`.`position`)) AS `position` FROM `catalog_category_anc_categs_index_idx` AS `ca`
 INNER JOIN `catalog_category_entity` AS `ce` ON `ce`.`path` LIKE `ca`.`path` OR ce.entity_id = ca.category_id
 INNER JOIN `catalog_category_product` AS `cp` ON cp.category_id = ce.entity_id
 INNER JOIN `catalog_category_entity_int` AS `ccei` ON ccei.entity_id = cp.category_id
 INNER JOIN `catalog_category_product_index_enbl_idx` AS `pv` ON pv.product_id = cp.product_id WHERE (ccei.attribute_id=119) AND (ccei.value=1) GROUP BY `ca`.`category_id`,
	`cp`.`product_id`

And Finally

Now with this module in place (remember to activate it in etc/modules) clear your cache and reindex. And hey presto, the product is no longer showing up in a category that is a parent of a category that the product is in that is now disabled. (Bit of a mouthful that)

In Summary

Much nicer, and not too much of a performance hit really.

Overall it means your products only show in the right active categories.

Generally it is only a issue when you disable a category and don’t delete it and then rearrange things leaving remnants of the old category tree floating about, so basically anything a common user would do.

I don’t tend to delete stuff myself, in case things get reactivated, and I’ll reuse the records.

Hopefully this is of help, rather than deleting records!