The End of Twitch PubSub

Finally Twitch EventSub is at feature parity with third party supported/documented PubSub, that the end of PubSub has been announced.

In a forum post over on Twitch Developers it has been announced the schedule for the shutdown of Twitch PubSub for access by third party developers.

The final death date is April 14th 2025.

Which I guess means it’s time for me to complete my own migration from PubSub to EventSub. Which at least is JUST ABOUT planned….

At least for 90% of what I need it’s ALL USERNOTICE as an EventSub topic, which is this case is channel.chat.notification and then whatever I choose to go with for consuming cheering.

So then for anything else I’ll need that doesn’t exist, either from it not existing in the first place OR a undocumented topic I might be using, off to uservoice I’ll go!

Off hand I think there is only one thing but we’ll see when I work on and complete my migration! The only other barrier is gonna be authentication but I have authentication or can get it for everything I need.

It is worth noting that Extension PubSub is unaffected, as that for the most part is a completely different system that just shares the same name.

Topic Migration notes

Edit: I guess I should add some notes about migration

PubSubEventSub
channel-bits-events-v1.<channel_id>channel.cheer or channel.chat.message
channel-bits-events-v2.<channel_id>channel.cheer or channel.chat.message
channel-bits-badge-unlocks.<channel_id>No Route, or inferred from channel.cheer
channel-points-channel-v1.<channel_id>channel.channel_points_custom_reward.add
channel-subscribe-events-v1.<channel_id>channel.chat.notification is what I will use for my use case of overlay/on stream alerts. Other type segregated topics exist
automod-queue.<moderator_id>.<channel_id>automod.message.hold
chat_moderator_actions.<user_id>.<channel_id>channel.moderate
low-trust-users.<channel_id>.<suspicious_user_id>automod.message.hold and automod.message.update
user-moderation-notifications.<current_user_id>.<channel_id>channel.chat.user_message_hold
whispers.<user_id>user.whisper.message

Twitch Extensions: Part 3 – The Architecture of an Extension

In Part 2 we spoke a fair bit about “the path and file name of the HTML file you wish to load, it is a relative path to the Testing Base URI/final upload URL”, and this week we will cover why I mentioned that every time!

So, this week The Architecture of a Twitch Extension!

An example of Architecture from Wikipedia, The Pyramids at Gizah - © CC BY-SA 2.0
An example of Architecture from Wikipedia, The Pyramids at Gizah – © CC BY-SA 2.0

Basic Architecture

We have covered before that essentially a Twitch Extension is a “Website” that is iFrame’ed onto the Twitch Page, into an integration slot, but what does that actually look like?

The Twitch Architecture Overview
The Twitch Architecture Overview – According to Twitch

This is the “stack” that represents how an Extension is loaded, to illustrate this, we’ll use FlightSimTrack installed to the FlightSimTrack Twitch channel

  • First we have the Twitch Channel page – the “Browser” above
  • Then the Twitch Extension “supervisor”, this basically handles any handshaking between the outer Twitch and the inner Extension, we can’t cover more about it as it’s not documented anywhere, and that is my theory on what it does! It is mentioned in passing in the documentation
  • Then the Extension itself – the “iFrame” above

Relative path? WHY?

So why “relative path”? Well here is what the URL to the Panel looks like for FlightSimTrack

https://q6gmlap07mpxekhpspevz2sq5xjth7.ext-twitch.tv/q6gmlap07mpxekhpspevz2sq5xjth7/ 0.0.2/78753d6eeea69840398d8e46ff018e3b/panel/index.html?anchor=panel&language=en&locale=en-US&mode=viewer&state=released&platform=web

The first thing we’ll notice is that the index.html is in a sub folder of the domain. And NOW you know why we said “the path and file name of the HTML file you wish to load, it is a relative path to the Testing Base URI/final upload URL” every, single, time. And it’s a common hiccup that Extension Developers run into, a number of people come into the Developer Discord needing extension help, and they either made a mistake packaging the Zip for upload (we’ll cover Upload procedure in a future post), or the developer has used an “absolute” link to a JS/CSS file (starting /style.css or https://example.com/style.css for example, instead of just style.css).

But lets break down this URL, into it’s parts

URL breakdown

https://extClientID.ext-twitch.tv/extClientID/Version/md5OfZip/path_to_file_for_integration_point.html?querystring

We’ll cover the Query string parameters more in a future post, but you can read about them in the Extension Documentation Reference

The most important one to be aware of right now is anchor which tells you which integration slot this is. In this example it’s a panel, and FlightSimTrack’s panel is configured to load panel/index.html for this integration slot.

  • extClientID is obvious, it’s the ClientID of your Extension
  • Extensions live on the domain extClientID.ext-twitch.tv, why a different domain for each extension and separate to Twitch? This prevents any issues with Cookies from the “main” Twitch Website, so there won’t be any session hijacking or other crazy things, you can read a little more on that in the documentation
  • Version – the version of the extension that these files are for
  • md5 of the zip file that was uploaded, during testing you might go to Hosted test/local test (two Stages of the Extension Lifecycle that we’ll cover next week) a few times, so the md5 will change, this is a easy “cache defeat” when you are Hosted testing
  • Final part is the path and html file you specified in the console to load for this integration slot

Asset Hosting for FlightSimTrack
Asset Hosting for FlightSimTrack

The URL structure is the same for every integration point, and all files are considered “public”, which is something to consider when building in things for Channel Moderators to use.

That about covers everything for the frontend

Well, what about the Backend?

Oh you want the backend to do you?

The Twitch Architecture Overview
The Twitch Architecture Overview – According to Twitch again!

First of all it’s important to note than a Backend, also referred to as EBS or Extension Backend Service, is entirely optional, depending on your Extension, what/how it functions or what it does, it might not even need an EBS, either from calling Public API’s or from it being completely self contained.

The SSL Requirement

But for those Extensions that need an EBS, there is essentially one rule, it must serves it’s contents over SSL, this isn’t just good for Security, but is required as the Twitch Website is over SSL and browsers will reject Mixed Content (loading non SSL content from SSL).

So

  • Loading from the EBS? Needs to be https://urltoyourebs/
  • Loading from a Public API? Needs to be https://someapi/
  • Loading from a Websocket? Needs to be wss://somesocket/
  • Loading an image from a CDN? Needs to be https://somecdn/image.png

To illustrate some examples:

  • FlightSimTrack viewer, nothing at all (images are on the Twitch Extensions CDN and map tiles come from map tile vendors over SSL)
  • FlightSimTrack config, EBS lives at https://twitch.extensions.barrycarlyon.co.uk/
  • CohhCarnage Panel Extension, uses an API at https://extensions.cohhilition.com/ and a socket at wss://extensions.cohhilition.com/
  • Dropped Frames video Extension, no API or socket at all, but images from a CDN, usually Twitter/Twitch avatars directly from Twitter/Twitch over HTTPs, it receives from Twitch PubSub only.

You can see that they all all work over endpoints/routes protected by SSL.

We’ll cover how to build a “custom” EBS, and verifying identity in a future post, this week it’s just “Architecture”, the long and short of it, is you are basically just building a Custom API to interact between your front end and backend. A common way I describe Extensions, is as follows

Extensions are a website, where the front end is on a different server to the backend, and you can’t do server side rendering

– Barry Carlyon on “Well what is an Extension anyway!”

Languages

Whilst a Twitch extension is “limited” to HTML and JavaScript, your EBS/Backend API can be in any language you want. But when it comes to examples, samples and boilerplates you’ll generally find them in JavaScript (via NodeJS) or Go, (since Twitch is a “Go House”). We’ll be using NodeJS in this series, merely because it’s potentially the easiest for people to understand and convert knowledge between different languages.

If you want to jump ahead, you can check out the Developer Rig, which will provide you access to a number of Extension Examples, which are also available on the TwitchDev GitHub, or my own Twitch Profile Extension (which demonstrates how to call the Twitch API via an “EBS” proxy), these examples are designed to highlight a specific function of Twitch Extensions, rather than a “practical” example.

Real Certificates only

Naturally this needs to use “real” SSL Certificates, as apposed to self signed, which is “bearable” for testing with. Generally you’ll find Extension Developers will whip out LetsEncrypt as it’s free and easy to setup, and straightforward to setup a testing system.

Summary

So that covers the Architecture of an Extension

Not really sure what else to add, here is a photo of my cat, Sprite.

Sprite has invaded this Blog Post
Sprite has invaded this Blog Post

What about next week?

Next week we will be looking at the Developer Rig and how/when to use it and setting up a Dev environment suitable for developing a Twitch Extension!

BUT MOTHER I CRAVE VIOLENCE

Well, until I write the next part if you want to read more about the Developer Side of Extensions, you can pop a visit over the to the Documentation or take a look at Twitch’s Introductory Page and you can always join us on the “TwitchDev Discord Server”, visit the Developer Support Page for the current invite link!

Why you think you are good enough to even write blog posts on Extensions? I made a one or two of them Extensions of various types.

Twitch Extensions: Part 2 – Integration Points

Welcome to part two of the Twitch Extension Series of Posts.

This week, we’ll be talking a little on Integration Points!

What integration points does a Twitch Extension have?

We touched on this in Part 1

Twitch provides developers with three main, two auxiliary, and two Broadcaster only integration points, and of these integration points Twitch lets us pick the HTML (and thus Javascript or CSS) file(s) we wish to load for all of these. (Well except Panel Popout, screw that guy, it owes me a tenner).

Hang on just a minute

What? OH! You want to know how to create an Extension first or where to set the HTML to be loaded for each view? Yes, that would make a little more sense, wouldn’t it!

Extensions can be created, configured, and release via the Twitch Developer Console.

The Developer console can be found at https://dev.twitch.tv/console

After logging in you’ll generally land on the Summary page that will list you current Extensions, Applications and any Games/Categories you may “own” on Twitch from being part of an organization

An example of the Twitch Developer Dashboard Overview
An example of the Twitch Developer Dashboard Overview

We’ll ignore everything else since we are only interested in Extensions.

Twitch Developer Console for Extensions
Twitch Developer Console for Extensions

If you click Extensions then “Create Extension” it’ll take you through a short “Setup Wizard” before presenting you with your ClientID for the Extension. We’ll do that now so we have an Extension to play with during this series!

PRO TIP: After starting the Wizard, please finish the wizard, even if you intend to change everything later.

  • Asks for your Extension Name, then Click Continue
  • Asks you to pick the views/integration points you want, which you can change later
  • Provide a Version number, 0.0.1 will suffice for now
  • Add additional details such as the Extension description and contact details for you
  • Now hit “Create Extension Version”

Twitch will send you an email to verify the provided contact details, so click the link in those email(s).

We’ll cover most of the other fields in a future post, but today we are interested in integration points.

Twitch will now have dropped us on the status page for our Extension

It’s important to note that you cannot use “Twitch” in the name of you Extension, and the name needs to be unique across Extensions and Applications across all of Twitch

So Integration Points?

On the Extension Status page, hit “Asset Hosting”, this will take us to the page to configure our “Extension Views” and what html to load. Twitch will prefill with something sensible, but you can use anything you want

"Asset Hosting" section of a Twitch Extension Console
“Asset Hosting” section of a Twitch Extension Console

At the top is the Testing Base URI, we are going to ignore this for now, but we’ll be covering it next week, when we might actually start building.

Below that comes the section that lets you pick via Checkbox, which views (for the viewer) you wish to enable. And the settings for each view.

Panel

A Panel is rendered below the stream, in the panels section, Broadcasters have “some” control over where a panel extension will appear in relation to the other panels, Twitch liks to jumble things round sometimes, but generally Panel Extensions are pretty sticky and reflect the broadcasters choice.

A panel has the following settings available

  • Panel Viewer Path – the path and file name of the HTML file you wish to load, it is a relative path to the Testing Base URI/final upload URL
  • Panel Height – a panel extension is fixed Width (318px), but the developer can choose what height to use, ranging from 100px to 500px

A Panel extension, can be popped out (one of the Auxiliary Integration points we mentioned), which will start at 318px wide and the specified height, but the Viewer can then resize this window. It’s something to be aware of when building your extension, and offers the ability to use Responsive design

Video – Fullscreen

A Video Fullscreen extension will cover the whole stream, so the developer and designer can utilize the whole stream. Usually it’s safe to assume that the size is 1920px x 1080px, and you can scale as needed, we’ll cover some ways to handle this in a future post

A video Fullscreen only has one setting, the HTML file you wish to load, it is a relative path to the Testing Base URI/final upload URL

Mobile

YES, THE CHECKBOXES GO: Panel, Video – Full, Video – Comp, Mobile but the page goes in a different order

The Mobile view is presented to Viewers using a Mobile device in the Twitch App for that device, such as an Apple iPhone, Apple Table, Android phone and so on.

Like Video Fullscreen there is only one option here, the HTML file to load, it is a relative path to the Testing Base URI/final upload URL

Mobile you’ll need to practice Responsive design as dimensions will vary by device and orientation of the device

Video – Component Viewer Path

A video Component roughly works similar to a Panel, but it defaults to being “closed” and can be manually opened by the Viewer by clicking the relevant icon in the “taskbar”, the Taskbar is presented to the viewer on the right of a Live Stream.

The Extensions Taskbar, left is shown a Video Component Extension in the closed state and right is the open state

The following options are available

  • Video Component Viewer Path – the path and file name of the HTML file you wish to load, it is a relative path to the Testing Base URI/final upload URL
  • Video Component Sizing Parameters – there are a number of options here we’ll cover this more in depth at a future post as it can get fun!

Non Viewer integration Points

That covers all the Viewer intergration points and their options in summary

Next we have the “Broadcaster” Integration points, there are two of them, both have the same available settings, the path and file name of the HTML file you wish to load, it is a relative path to the Testing Base URI/final upload URL

Configuration view

This is the main/one off configuration view, when a broadcaster first installs your Extension they will be prompted to ask if they want to visit the Configuration page or not. This page is used for one off or infrequent configuration

The Extension Config prompt during install
The Extension Config prompt during install

In some cases, developers can also block an extension being activated if the configuration has not been done, usually this would be used for things such as providing an API key to access another API, or asking the broadcaster what their Destiny 2/other game Character name is (for example)

The Twitch Extension configuration page for FlightSimTrack
The Twitch Extension configuration page for FlightSimTrack, this example provides additional setup instructions and the API Access Key for the Companion Desktop App

It can be reached from the install flow for an extension and from the “Cog” Icon on an Extension in the Extension Manager section of the Dashboard

In terms of Dimensions it’s usually Landscape but a broadcaster can resize it pretty narrow, so you’ll need to be responsive or provide a minimum width

Live Configuration View

The Live configration view is to be used by the broadcaster for common tasks or things that need to be done during a Stream, examples of this would be to start a Poll for a poll extension, or a “I’m starting a round” in a mini games extension.

The Live configuration view is reached by the Broadcaster from the “Quick Actions” Section of their “Stream Manager”

A Twitch Dashboard Quick Action Button
An example of a Quick Action Extension Button

When clicked, a new window will open, which will contain the Quick Action/Live Configuration View, it’s resizable by the broadcaster so you’ll need to practice responsive design again here

An Opened Quick Action
EliteTrack’s Quick Action view provides easy access to a summary of the data sent, and prompts the user about Bug Reporting

Summary

And that covers the various integration points in a bit more depth compared to last weeks post!

Next week, we’ll be covering “Getting Started Building” I think and why every time I mention a HTML file did I follow it with: “the path and file name of the HTML file you wish to load, it is a relative path to the Testing Base URI/final upload URL”

BUT MOTHER I CRAVE VIOLENCE

Well, until I write the next part if you want to read more about the Developer Side of Extensions, you can pop a visit over the to the Documentation or take a look at Twitch’s Introductory Page and you can always join us on the “TwitchDev Discord Server”, visit the Developer Support Page for the current invite link!

Why you think you are good enough to even write blog posts on Extensions? I made a one or two of them Extensions of various types.

The Future and all things change?!

Long time no write.

Hopefully gonna start writing again, (not the first time I’ve said that, so we’ll see how we go!)

Anywho, I do currently find myself unemployed after quitting my previous Job on Monday, nothing on my part but the actions of others leading to the actions of others forcing my hand.

I am currently deciding what to do next!

If you know whom I used to work for you can go look it up yourself, I don’t feel the need to talk about it myself.

I am continuing to do various bits and pieces for the Streamers I work with on Twitch, so nothing has changed there, I’m just out of full time work!

Watch this space is the TLDR!

In the mean time, go checkout my Twitch Extensions

Magento – Products showing in the wrong category

I came across something odd a while back, and until recently not had time to investigate the issue.

Products were showing up in categories that they were not explicitly in.

In summary, when you reindex catalog_category_product Magento goes and looks at all the products and the categories that they are in. For the categories that the product are in, it gets the category tree of that category.

So consider this category hierarchy:

  1. Category A
    1. Sub Category A 1
      1. Bottom Category 1
      2. Bottom Category 2
    2. Sub Category A 2
      1. Bottom Category 3
      2. Bottom Category 4
    3. Sub Category A 3
      1. Bottom Category 5
      2. Bottom Category 6
  2. Category B
    1. Sub Category B 1
      1. Bottom Category 7
      2. Bottom Category 8
    2. Sub Category B 2
      1. Bottom Category 9
      2. Bottom Category 10
    3. Sub Category B 3
      1. Bottom Category 11
      2. Bottom Category 12

Now you create a product and put it in Bottom Category 9 and Bottom Category 11. When you reindex it’s put into the list to show up in the category and the parents on that tree, so Category B and Category B 2, and so on for Bottom Category 11.

Now over time we add more products and more categories, and several months later we decide to rearrange our category tree, as we’ve gotten too wide and our needs have changed, or some other reason. Say to something like this, a strikethru indicates a disabled category (as in is_active is 0 (it’s a boolean integer))

  1. Category A
    1. Sub Category A 1
      1. Bottom Category 1
      2. Bottom Category 2
      3. Bottom Category 3
      4. Bottom Category 4
    2. Sub Category A 2
    3. Sub Category A 3
      1. Bottom Category 5
      2. Bottom Category 6
  2. Category B
    1. Sub Category B 1
      1. Bottom Category 7
      2. Bottom Category 8
    2. Sub Category B 2
      1. Bottom Category 9
      2. Bottom Category 10
    3. Sub Category B 3
      1. Bottom Category 11
      2. Bottom Category 12

Now of course we reindex, and the problems start.

The Problem

We have a problem, the Product correctly shows in Bottom Category 11 and it’s parents, but it is still showing up in Sub Category A 1 and it’s parents in error.

It seems that the reindexer takes categories that are disabled (as in is_active is false), and their tree, into account when building the list of category pages that a product can be displayed on.

So in order to fix this I went deep into Magento core to go and see which query was at fault. Turns out that it’s not exactly broken, it’s just Magento doesn’t check for it a give category is active or not when it builds the list.

The query at fault is:

INSERT INTO `catalog_category_anc_products_index_idx` (`category_id`, `product_id`, `position`) SELECT STRAIGHT_JOIN DISTINCT `ca`.`category_id`, `cp`.`product_id`, MIN(IF(ca.category_id = ce.entity_id, `cp`.`position`, (`ce`.`position` + 1) * (`ce`.`level` + 1 * 10000) + `cp`.`position`)) AS `position` FROM `catalog_category_anc_categs_index_idx` AS `ca`
 INNER JOIN `catalog_category_entity` AS `ce` ON `ce`.`path` LIKE `ca`.`path` OR ce.entity_id = ca.category_id
 INNER JOIN `catalog_category_product` AS `cp` ON cp.category_id = ce.entity_id
 INNER JOIN `catalog_category_product_index_enbl_idx` AS `pv` ON pv.product_id = cp.product_id GROUP BY `ca`.`category_id`,
	`cp`.`product_id`

As you can see it just collects from catalog_category_entity, which has a table structure of:

mysql> describe catalog_category_entity;
+------------------+----------------------+------+-----+---------+----------------+
| Field            | Type                 | Null | Key | Default | Extra          |
+------------------+----------------------+------+-----+---------+----------------+
| entity_id        | int(10) unsigned     | NO   | PRI | NULL    | auto_increment |
| entity_type_id   | smallint(5) unsigned | NO   |     | 0       |                |
| attribute_set_id | smallint(5) unsigned | NO   |     | 0       |                |
| parent_id        | int(10) unsigned     | NO   |     | 0       |                |
| created_at       | timestamp            | YES  |     | NULL    |                |
| updated_at       | timestamp            | YES  |     | NULL    |                |
| path             | varchar(255)         | NO   | MUL | NULL    |                |
| position         | int(11)              | NO   |     | 0       |                |
| level            | int(11)              | NO   | MUL | 0       |                |
| children_count   | int(11)              | NO   |     | 0       |                |
+------------------+----------------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)

As you can see that table structure is somewhat limited. Since this is the bare bones basics of a category. And thus the query to collect categories of a product is in, considers inactive categories as if doesn’t have this information.

The Solution

The fix here is simple. Find where the query is defined and take into account the is_active category attribute, and we’ll do it the Magento way. Rather than hardcoding entity_id’s.

A quick note about how I found the query. I opened up public_html/lib/Varien/Db/Adapter/Pdo/Mysql.php and changed the variable $_debug from false to true, it’s around about line 103. This will output all the SQL queries to public_html/var/debug/pdo_mysql.log.

After that I just grep’ed the source catalog_category_anc_products_index_idx which told me where the table was defined in a xml config, and then grep’ed again for the internal reference name, (there was much much grep’ing for bit’s and pieces).

The query runs under the reindexAll function of Mage_Catalog_Model_Resource_Category_Indexer_Product. So basically take the ENTIRE contents of this function and setup a Model Resource override, and fix the query.

Module Basics

Creating our Module in the directory local/Company/Catalog

Normally I wouldn’t cover this, but we are overriding a resource which is a little different from just overriding a model.

etc/config.xml
<?xml version="1.0"?>
<config>
    <modules>
        <Company_Catalog>
            <version>0.0.1</version>
        </Company_Catalog>
    </modules>

    <global>
        <models>
            <companycatalog>
                <class>Company_Catalog_Model</class>
            </companycatalog>

            <catalog_resource_eav_mysql4>
                <rewrite>
                    <category_indexer_product>Company_Catalog_Model_Resource_Category_Indexer_Product</category_indexer_product>
                </rewrite>
            </catalog_resource_eav_mysql4>
        </models>
    </global>
</config>
Model/Resource/Category/Indexer/Product.php

Literally just a copy of the contents of reindexAll with some modifications. Starting at // Query Correction and ending at // end Query Correction.

<?php

class Company_Catalog_Model_Resource_Category_Indexer_Product extends Mage_Catalog_Model_Resource_Category_Indexer_Product
{
    /**
     * Rebuild all index data
     *
     * @return Mage_Catalog_Model_Resource_Category_Indexer_Product
     */
    public function reindexAll()
    {
        $this->useIdxTable(true);
        $this->beginTransaction();
        try {
            $this->clearTemporaryIndexTable();
            $idxTable = $this->getIdxTable();
            $idxAdapter = $this->_getIndexAdapter();
            $stores = $this->_getStoresInfo();
            /**
             * Build index for each store
             */
            foreach ($stores as $storeData) {
                $storeId    = $storeData['store_id'];
                $websiteId  = $storeData['website_id'];
                $rootPath   = $storeData['root_path'];
                $rootId     = $storeData['root_id'];
                /**
                 * Prepare visibility for all enabled store products
                 */
                $enabledTable = $this->_prepareEnabledProductsVisibility($websiteId, $storeId);
                /**
                 * Select information about anchor categories
                 */
                $anchorTable = $this->_prepareAnchorCategories($storeId, $rootPath);
                /**
                 * Add relations between not anchor categories and products
                 */
                $select = $idxAdapter->select();
                /** @var $select Varien_Db_Select */
                $select->from(
                    array('cp' => $this->_categoryProductTable),
                    array('category_id', 'product_id', 'position', 'is_parent' => new Zend_Db_Expr('1'),
                        'store_id' => new Zend_Db_Expr($storeId))
                )
                ->joinInner(array('pv' => $enabledTable), 'pv.product_id=cp.product_id', array('visibility'))
                ->joinLeft(array('ac' => $anchorTable), 'ac.category_id=cp.category_id', array())
                ->where('ac.category_id IS NULL');

                $query = $select->insertFromSelect(
                    $idxTable,
                    array('category_id', 'product_id', 'position', 'is_parent', 'store_id', 'visibility'),
                    false
                );
                $idxAdapter->query($query);

                /**
                 * Assign products not associated to any category to root category in index
                 */

                $select = $idxAdapter->select();
                $select->from(
                    array('pv' => $enabledTable),
                    array(new Zend_Db_Expr($rootId), 'product_id', new Zend_Db_Expr('0'), new Zend_Db_Expr('1'),
                        new Zend_Db_Expr($storeId), 'visibility')
                )
                ->joinLeft(array('cp' => $this->_categoryProductTable), 'pv.product_id=cp.product_id', array())
                ->where('cp.product_id IS NULL');

                $query = $select->insertFromSelect(
                    $idxTable,
                    array('category_id', 'product_id', 'position', 'is_parent', 'store_id', 'visibility'),
                    false
                );
                $idxAdapter->query($query);

                /**
                 * Prepare anchor categories products
                 */
                $anchorProductsTable = $this->_getAnchorCategoriesProductsTemporaryTable();
                $idxAdapter->delete($anchorProductsTable);

                $position = 'MIN('.
                    $idxAdapter->getCheckSql(
                        'ca.category_id = ce.entity_id',
                        $idxAdapter->quoteIdentifier('cp.position'),
                        '('.$idxAdapter->quoteIdentifier('ce.position').' + 1) * '
                        .'('.$idxAdapter->quoteIdentifier('ce.level').' + 1 * 10000)'
                        .' + '.$idxAdapter->quoteIdentifier('cp.position')
                    )
                .')';

                // Query Correction

                // load the is_active category attribute
                $attributeModel = Mage::getModel('eav/entity_attribute')->loadByCode(Mage_Catalog_Model_Category::ENTITY, 'is_active');
                $attributeTable = 'catalog_category_entity_' . $attributeModel->getBackendType();

                $select = $idxAdapter->select()
                ->useStraightJoin(true)
                ->distinct(true)
                ->from(array('ca' => $anchorTable), array('category_id'))
                ->joinInner(
                    array('ce' => $this->_categoryTable),
                    $idxAdapter->quoteIdentifier('ce.path') . ' LIKE ' .
                    $idxAdapter->quoteIdentifier('ca.path') . ' OR ce.entity_id = ca.category_id',
                    array()
                )

                ->joinInner(
                    array('cp' => $this->_categoryProductTable),
                    'cp.category_id = ce.entity_id',
                    array('product_id')
                )

                // left join the attribute
                ->joinInner(
                    array('ccei' => $attributeTable),
                    'ccei.entity_id = cp.category_id',
                    array()//select nothing
                )

                ->joinInner(
                    array('pv' => $enabledTable),
                    'pv.product_id = cp.product_id',
                    array('position' => $position)
                )

                ->where('ccei.attribute_id=' . $attributeModel->getAttributeId())//is_active
                ->where('ccei.value=1')

                ->group(array('ca.category_id', 'cp.product_id'));
                $query = $select->insertFromSelect($anchorProductsTable,
                    array('category_id', 'product_id', 'position'), false);
                $idxAdapter->query($query);

                // end Query Correction

                /**
                 * Add anchor categories products to index
                 */
                $select = $idxAdapter->select()
                ->from(
                    array('ap' => $anchorProductsTable),
                    array('category_id', 'product_id',
                        'position', // => new Zend_Db_Expr('MIN('. $idxAdapter->quoteIdentifier('ap.position').')'),
                        'is_parent' => $idxAdapter->getCheckSql('cp.product_id > 0', 1, 0),
                        'store_id' => new Zend_Db_Expr($storeId))
                )
                ->joinLeft(
                    array('cp' => $this->_categoryProductTable),
                    'cp.category_id=ap.category_id AND cp.product_id=ap.product_id',
                    array()
                )
                ->joinInner(array('pv' => $enabledTable), 'pv.product_id = ap.product_id', array('visibility'));

                $query = $select->insertFromSelect(
                    $idxTable,
                    array('category_id', 'product_id', 'position', 'is_parent', 'store_id', 'visibility'),
                    false
                );
                $idxAdapter->query($query);

                $select = $idxAdapter->select()
                    ->from(array('e' => $this->getTable('catalog/product')), null)
                    ->join(
                        array('ei' => $enabledTable),
                        'ei.product_id = e.entity_id',
                        array())
                    ->joinLeft(
                        array('i' => $idxTable),
                        'i.product_id = e.entity_id AND i.category_id = :category_id AND i.store_id = :store_id',
                        array())
                    ->where('i.product_id IS NULL')
                    ->columns(array(
                        'category_id'   => new Zend_Db_Expr($rootId),
                        'product_id'    => 'e.entity_id',
                        'position'      => new Zend_Db_Expr('0'),
                        'is_parent'     => new Zend_Db_Expr('1'),
                        'store_id'      => new Zend_Db_Expr($storeId),
                        'visibility'    => 'ei.visibility'
                    ));

                $query = $select->insertFromSelect(
                    $idxTable,
                    array('category_id', 'product_id', 'position', 'is_parent', 'store_id', 'visibility'),
                    false
                );

                $idxAdapter->query($query, array('store_id' => $storeId, 'category_id' => $rootId));
            }

            $this->syncData();

            /**
             * Clean up temporary tables
             */
            $this->clearTemporaryIndexTable();
            $idxAdapter->delete($enabledTable);
            $idxAdapter->delete($anchorTable);
            $idxAdapter->delete($anchorProductsTable);
            $this->commit();
        } catch (Exception $e) {
            $this->rollBack();
            throw $e;
        }
        return $this;
    }
}

The fix in depth

The first thing done here is to Magento Safe load the Category Attribute details for is_active.

                // load the is_active category attribute
                $attributeModel = Mage::getModel('eav/entity_attribute')->loadByCode(Mage_Catalog_Model_Category::ENTITY, 'is_active');
                $attributeTable = 'catalog_category_entity_' . $attributeModel->getBackendType();

After this join,

                ->joinInner(
                    array('cp' => $this->_categoryProductTable),
                    'cp.category_id = ce.entity_id',
                    array('product_id')
                )

we need to add another join, in this case we are joining to the table that contains the is_active information, it’s usually catalog_category_entity_int but we Magento Safe loaded it above.

                // left join the attribute
                ->joinInner(
                    array('ccei' => $attributeTable),
                    'ccei.entity_id = cp.category_id',
                    array()//select nothing
                )

Finally we need to adjust to add a where to clean up the join and select active categories only, after the last join and before the group function call:

                ->where('ccei.attribute_id=' . $attributeModel->getAttributeId())//is_active
                ->where('ccei.value=1')

This changes the query to:

INSERT INTO `catalog_category_anc_products_index_idx` (`category_id`, `product_id`, `position`) SELECT STRAIGHT_JOIN DISTINCT `ca`.`category_id`, `cp`.`product_id`, MIN(IF(ca.category_id = ce.entity_id, `cp`.`position`, (`ce`.`position` + 1) * (`ce`.`level` + 1 * 10000) + `cp`.`position`)) AS `position` FROM `catalog_category_anc_categs_index_idx` AS `ca`
 INNER JOIN `catalog_category_entity` AS `ce` ON `ce`.`path` LIKE `ca`.`path` OR ce.entity_id = ca.category_id
 INNER JOIN `catalog_category_product` AS `cp` ON cp.category_id = ce.entity_id
 INNER JOIN `catalog_category_entity_int` AS `ccei` ON ccei.entity_id = cp.category_id
 INNER JOIN `catalog_category_product_index_enbl_idx` AS `pv` ON pv.product_id = cp.product_id WHERE (ccei.attribute_id=119) AND (ccei.value=1) GROUP BY `ca`.`category_id`,
	`cp`.`product_id`

And Finally

Now with this module in place (remember to activate it in etc/modules) clear your cache and reindex. And hey presto, the product is no longer showing up in a category that is a parent of a category that the product is in that is now disabled. (Bit of a mouthful that)

In Summary

Much nicer, and not too much of a performance hit really.

Overall it means your products only show in the right active categories.

Generally it is only a issue when you disable a category and don’t delete it and then rearrange things leaving remnants of the old category tree floating about, so basically anything a common user would do.

I don’t tend to delete stuff myself, in case things get reactivated, and I’ll reuse the records.

Hopefully this is of help, rather than deleting records!