I came across something odd a while back, and until recently not had time to investigate the issue.
Products were showing up in categories that they were not explicitly in.
In summary, when you reindex catalog_category_product Magento goes and looks at all the products and the categories that they are in. For the categories that the product are in, it gets the category tree of that category.
So consider this category hierarchy:
- Category A
- Sub Category A 1
- Bottom Category 1
- Bottom Category 2
- Sub Category A 2
- Bottom Category 3
- Bottom Category 4
- Sub Category A 3
- Bottom Category 5
- Bottom Category 6
- Sub Category A 1
- Category B
- Sub Category B 1
- Bottom Category 7
- Bottom Category 8
- Sub Category B 2
- Bottom Category 9
- Bottom Category 10
- Sub Category B 3
- Bottom Category 11
- Bottom Category 12
- Sub Category B 1
Now you create a product and put it in Bottom Category 9 and Bottom Category 11. When you reindex it’s put into the list to show up in the category and the parents on that tree, so Category B and Category B 2, and so on for Bottom Category 11.
Now over time we add more products and more categories, and several months later we decide to rearrange our category tree, as we’ve gotten too wide and our needs have changed, or some other reason. Say to something like this, a strikethru indicates a disabled category (as in is_active is 0 (it’s a boolean integer))
- Category A
- Sub Category A 1
- Bottom Category 1
Bottom Category 2- Bottom Category 3
- Bottom Category 4
Sub Category A 2- Sub Category A 3
- Bottom Category 5
- Bottom Category 6
- Sub Category A 1
- Category B
- Sub Category B 1
- Bottom Category 7
- Bottom Category 8
- Sub Category B 2
Bottom Category 9- Bottom Category 10
- Sub Category B 3
- Bottom Category 11
- Bottom Category 12
- Sub Category B 1
Now of course we reindex, and the problems start.
The Problem
We have a problem, the Product correctly shows in Bottom Category 11 and it’s parents, but it is still showing up in Sub Category A 1 and it’s parents in error.
It seems that the reindexer takes categories that are disabled (as in is_active is false), and their tree, into account when building the list of category pages that a product can be displayed on.
So in order to fix this I went deep into Magento core to go and see which query was at fault. Turns out that it’s not exactly broken, it’s just Magento doesn’t check for it a give category is active or not when it builds the list.
The query at fault is:
1 2 3 4 5 | INSERT INTO `catalog_category_anc_products_index_idx` (`category_id`, `product_id`, `position`) SELECT STRAIGHT_JOIN DISTINCT `ca`.`category_id`, `cp`.`product_id`, MIN (IF(ca.category_id = ce.entity_id, `cp`.`position`, (`ce`.`position` + 1) * (`ce`.` level ` + 1 * 10000) + `cp`.`position`)) AS `position` FROM `catalog_category_anc_categs_index_idx` AS `ca` INNER JOIN `catalog_category_entity` AS `ce` ON `ce`.`path` LIKE `ca`.`path` OR ce.entity_id = ca.category_id INNER JOIN `catalog_category_product` AS `cp` ON cp.category_id = ce.entity_id INNER JOIN `catalog_category_product_index_enbl_idx` AS `pv` ON pv.product_id = cp.product_id GROUP BY `ca`.`category_id`, `cp`.`product_id` |
As you can see it just collects from catalog_category_entity, which has a table structure of:
mysql> describe catalog_category_entity; +------------------+----------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------------+----------------------+------+-----+---------+----------------+ | entity_id | int(10) unsigned | NO | PRI | NULL | auto_increment | | entity_type_id | smallint(5) unsigned | NO | | 0 | | | attribute_set_id | smallint(5) unsigned | NO | | 0 | | | parent_id | int(10) unsigned | NO | | 0 | | | created_at | timestamp | YES | | NULL | | | updated_at | timestamp | YES | | NULL | | | path | varchar(255) | NO | MUL | NULL | | | position | int(11) | NO | | 0 | | | level | int(11) | NO | MUL | 0 | | | children_count | int(11) | NO | | 0 | | +------------------+----------------------+------+-----+---------+----------------+ 10 rows in set (0.00 sec)
As you can see that table structure is somewhat limited. Since this is the bare bones basics of a category. And thus the query to collect categories of a product is in, considers inactive categories as if doesn’t have this information.
The Solution
The fix here is simple. Find where the query is defined and take into account the is_active category attribute, and we’ll do it the Magento way. Rather than hardcoding entity_id’s.
A quick note about how I found the query. I opened up public_html/lib/Varien/Db/Adapter/Pdo/Mysql.php
and changed the variable $_debug
from false
to true
, it’s around about line 103. This will output all the SQL queries to public_html/var/debug/pdo_mysql.log
.
After that I just grep’ed the source catalog_category_anc_products_index_idx
which told me where the table was defined in a xml config, and then grep’ed again for the internal reference name, (there was much much grep’ing for bit’s and pieces).
The query runs under the reindexAll
function of Mage_Catalog_Model_Resource_Category_Indexer_Product
. So basically take the ENTIRE contents of this function and setup a Model Resource override, and fix the query.
Module Basics
Creating our Module in the directory local/Company/Catalog
Normally I wouldn’t cover this, but we are overriding a resource which is a little different from just overriding a model.
etc/config.xml
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | <? xml version = "1.0" ?> < config > < modules > < Company_Catalog > < version >0.0.1</ version > </ Company_Catalog > </ modules > < global > < models > < companycatalog > < class >Company_Catalog_Model</ class > </ companycatalog > < catalog_resource_eav_mysql4 > < rewrite > < category_indexer_product >Company_Catalog_Model_Resource_Category_Indexer_Product</ category_indexer_product > </ rewrite > </ catalog_resource_eav_mysql4 > </ models > </ global > </ config > |
Model/Resource/Category/Indexer/Product.php
Literally just a copy of the contents of reindexAll with some modifications. Starting at // Query Correction
and ending at // end Query Correction
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 | <?php class Company_Catalog_Model_Resource_Category_Indexer_Product extends Mage_Catalog_Model_Resource_Category_Indexer_Product { /** * Rebuild all index data * * @return Mage_Catalog_Model_Resource_Category_Indexer_Product */ public function reindexAll() { $this ->useIdxTable(true); $this ->beginTransaction(); try { $this ->clearTemporaryIndexTable(); $idxTable = $this ->getIdxTable(); $idxAdapter = $this ->_getIndexAdapter(); $stores = $this ->_getStoresInfo(); /** * Build index for each store */ foreach ( $stores as $storeData ) { $storeId = $storeData [ 'store_id' ]; $websiteId = $storeData [ 'website_id' ]; $rootPath = $storeData [ 'root_path' ]; $rootId = $storeData [ 'root_id' ]; /** * Prepare visibility for all enabled store products */ $enabledTable = $this ->_prepareEnabledProductsVisibility( $websiteId , $storeId ); /** * Select information about anchor categories */ $anchorTable = $this ->_prepareAnchorCategories( $storeId , $rootPath ); /** * Add relations between not anchor categories and products */ $select = $idxAdapter ->select(); /** @var $select Varien_Db_Select */ $select ->from( array ( 'cp' => $this ->_categoryProductTable), array ( 'category_id' , 'product_id' , 'position' , 'is_parent' => new Zend_Db_Expr( '1' ), 'store_id' => new Zend_Db_Expr( $storeId )) ) ->joinInner( array ( 'pv' => $enabledTable ), 'pv.product_id=cp.product_id' , array ( 'visibility' )) ->joinLeft( array ( 'ac' => $anchorTable ), 'ac.category_id=cp.category_id' , array ()) ->where( 'ac.category_id IS NULL' ); $query = $select ->insertFromSelect( $idxTable , array ( 'category_id' , 'product_id' , 'position' , 'is_parent' , 'store_id' , 'visibility' ), false ); $idxAdapter ->query( $query ); /** * Assign products not associated to any category to root category in index */ $select = $idxAdapter ->select(); $select ->from( array ( 'pv' => $enabledTable ), array ( new Zend_Db_Expr( $rootId ), 'product_id' , new Zend_Db_Expr( '0' ), new Zend_Db_Expr( '1' ), new Zend_Db_Expr( $storeId ), 'visibility' ) ) ->joinLeft( array ( 'cp' => $this ->_categoryProductTable), 'pv.product_id=cp.product_id' , array ()) ->where( 'cp.product_id IS NULL' ); $query = $select ->insertFromSelect( $idxTable , array ( 'category_id' , 'product_id' , 'position' , 'is_parent' , 'store_id' , 'visibility' ), false ); $idxAdapter ->query( $query ); /** * Prepare anchor categories products */ $anchorProductsTable = $this ->_getAnchorCategoriesProductsTemporaryTable(); $idxAdapter -> delete ( $anchorProductsTable ); $position = 'MIN(' . $idxAdapter ->getCheckSql( 'ca.category_id = ce.entity_id' , $idxAdapter ->quoteIdentifier( 'cp.position' ), '(' . $idxAdapter ->quoteIdentifier( 'ce.position' ). ' + 1) * ' . '(' . $idxAdapter ->quoteIdentifier( 'ce.level' ). ' + 1 * 10000)' . ' + ' . $idxAdapter ->quoteIdentifier( 'cp.position' ) ) . ')' ; // Query Correction // load the is_active category attribute $attributeModel = Mage::getModel( 'eav/entity_attribute' )->loadByCode(Mage_Catalog_Model_Category::ENTITY, 'is_active' ); $attributeTable = 'catalog_category_entity_' . $attributeModel ->getBackendType(); $select = $idxAdapter ->select() ->useStraightJoin(true) ->distinct(true) ->from( array ( 'ca' => $anchorTable ), array ( 'category_id' )) ->joinInner( array ( 'ce' => $this ->_categoryTable), $idxAdapter ->quoteIdentifier( 'ce.path' ) . ' LIKE ' . $idxAdapter ->quoteIdentifier( 'ca.path' ) . ' OR ce.entity_id = ca.category_id' , array () ) ->joinInner( array ( 'cp' => $this ->_categoryProductTable), 'cp.category_id = ce.entity_id' , array ( 'product_id' ) ) // left join the attribute ->joinInner( array ( 'ccei' => $attributeTable ), 'ccei.entity_id = cp.category_id' , array () //select nothing ) ->joinInner( array ( 'pv' => $enabledTable ), 'pv.product_id = cp.product_id' , array ( 'position' => $position ) ) ->where( 'ccei.attribute_id=' . $attributeModel ->getAttributeId()) //is_active ->where( 'ccei.value=1' ) ->group( array ( 'ca.category_id' , 'cp.product_id' )); $query = $select ->insertFromSelect( $anchorProductsTable , array ( 'category_id' , 'product_id' , 'position' ), false); $idxAdapter ->query( $query ); // end Query Correction /** * Add anchor categories products to index */ $select = $idxAdapter ->select() ->from( array ( 'ap' => $anchorProductsTable ), array ( 'category_id' , 'product_id' , 'position' , // => new Zend_Db_Expr('MIN('. $idxAdapter->quoteIdentifier('ap.position').')'), 'is_parent' => $idxAdapter ->getCheckSql( 'cp.product_id > 0' , 1, 0), 'store_id' => new Zend_Db_Expr( $storeId )) ) ->joinLeft( array ( 'cp' => $this ->_categoryProductTable), 'cp.category_id=ap.category_id AND cp.product_id=ap.product_id' , array () ) ->joinInner( array ( 'pv' => $enabledTable ), 'pv.product_id = ap.product_id' , array ( 'visibility' )); $query = $select ->insertFromSelect( $idxTable , array ( 'category_id' , 'product_id' , 'position' , 'is_parent' , 'store_id' , 'visibility' ), false ); $idxAdapter ->query( $query ); $select = $idxAdapter ->select() ->from( array ( 'e' => $this ->getTable( 'catalog/product' )), null) ->join( array ( 'ei' => $enabledTable ), 'ei.product_id = e.entity_id' , array ()) ->joinLeft( array ( 'i' => $idxTable ), 'i.product_id = e.entity_id AND i.category_id = :category_id AND i.store_id = :store_id' , array ()) ->where( 'i.product_id IS NULL' ) ->columns( array ( 'category_id' => new Zend_Db_Expr( $rootId ), 'product_id' => 'e.entity_id' , 'position' => new Zend_Db_Expr( '0' ), 'is_parent' => new Zend_Db_Expr( '1' ), 'store_id' => new Zend_Db_Expr( $storeId ), 'visibility' => 'ei.visibility' )); $query = $select ->insertFromSelect( $idxTable , array ( 'category_id' , 'product_id' , 'position' , 'is_parent' , 'store_id' , 'visibility' ), false ); $idxAdapter ->query( $query , array ( 'store_id' => $storeId , 'category_id' => $rootId )); } $this ->syncData(); /** * Clean up temporary tables */ $this ->clearTemporaryIndexTable(); $idxAdapter -> delete ( $enabledTable ); $idxAdapter -> delete ( $anchorTable ); $idxAdapter -> delete ( $anchorProductsTable ); $this ->commit(); } catch (Exception $e ) { $this ->rollBack(); throw $e ; } return $this ; } } |
The fix in depth
The first thing done here is to Magento Safe load the Category Attribute details for is_active.
1 2 3 | // load the is_active category attribute $attributeModel = Mage::getModel( 'eav/entity_attribute' )->loadByCode(Mage_Catalog_Model_Category::ENTITY, 'is_active' ); $attributeTable = 'catalog_category_entity_' . $attributeModel ->getBackendType(); |
After this join,
1 2 3 4 5 | ->joinInner( array ( 'cp' => $this ->_categoryProductTable), 'cp.category_id = ce.entity_id' , array ( 'product_id' ) ) |
we need to add another join, in this case we are joining to the table that contains the is_active information, it’s usually catalog_category_entity_int
but we Magento Safe loaded it above.
1 2 3 4 5 6 | // left join the attribute ->joinInner( array ( 'ccei' => $attributeTable ), 'ccei.entity_id = cp.category_id' , array () //select nothing ) |
Finally we need to adjust to add a where
to clean up the join and select active categories only, after the last join and before the group function call:
1 2 | ->where( 'ccei.attribute_id=' . $attributeModel ->getAttributeId()) //is_active ->where( 'ccei.value=1' ) |
This changes the query to:
1 2 3 4 5 6 | INSERT INTO `catalog_category_anc_products_index_idx` (`category_id`, `product_id`, `position`) SELECT STRAIGHT_JOIN DISTINCT `ca`.`category_id`, `cp`.`product_id`, MIN (IF(ca.category_id = ce.entity_id, `cp`.`position`, (`ce`.`position` + 1) * (`ce`.` level ` + 1 * 10000) + `cp`.`position`)) AS `position` FROM `catalog_category_anc_categs_index_idx` AS `ca` INNER JOIN `catalog_category_entity` AS `ce` ON `ce`.`path` LIKE `ca`.`path` OR ce.entity_id = ca.category_id INNER JOIN `catalog_category_product` AS `cp` ON cp.category_id = ce.entity_id INNER JOIN `catalog_category_entity_int` AS `ccei` ON ccei.entity_id = cp.category_id INNER JOIN `catalog_category_product_index_enbl_idx` AS `pv` ON pv.product_id = cp.product_id WHERE (ccei.attribute_id=119) AND (ccei.value=1) GROUP BY `ca`.`category_id`, `cp`.`product_id` |
And Finally
Now with this module in place (remember to activate it in etc/modules) clear your cache and reindex. And hey presto, the product is no longer showing up in a category that is a parent of a category that the product is in that is now disabled. (Bit of a mouthful that)
In Summary
Much nicer, and not too much of a performance hit really.
Overall it means your products only show in the right active categories.
Generally it is only a issue when you disable a category and don’t delete it and then rearrange things leaving remnants of the old category tree floating about, so basically anything a common user would do.
I don’t tend to delete stuff myself, in case things get reactivated, and I’ll reuse the records.
Hopefully this is of help, rather than deleting records!