Machine Learning Evidence on the Determinants and Dynamics of Green Innovation Efficiency
Main article
Abstract
The transition from quantity-oriented to quality-oriented innovation is central to China's dual-circulation strategy, yet the determinants of green innovation efficiency (GIE), defined as the capacity of firms to convert R&D inputs into environmentally oriented innovative outputs, remain under-characterised at scale. This study assembles a panel of 43,812 firm-year observations from 4,287 listed firms across 20 two-digit CSRC sectors covering 2006–2023 and applies a suite of eleven machine learning algorithms, spanning regularised linear models (Ridge, Lasso, ElasticNet), tree-based ensembles (Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost), kernel Support Vector Regression, and a four-layer deep neural network, together with a traditional linear regression baseline. We construct three complementary GIE measures based on total green patents, quality-weighted green patents (with higher weights assigned to invention patents), and the Y02 IPC-filtered climate-relevant subset. The same validation protocol is applied to every algorithm: a 70/30 stratified train–test split, repeated 10-fold cross-validation, RobustScaler preprocessing, conservative hyper-parameter grids, and SHAP-based interpretation. Gradient Boosting attains the strongest out-of-sample performance (R² = 0.981, RMSE = 0.009, training-to-CV gap below 0.002), with LightGBM and CatBoost within one standard deviation. R&D intensity dominates the feature hierarchy (mean |SHAP| = 0.342), followed by green-patent stock and firm age. Cross-sector differences are economically modest, although six of 45 pairwise comparisons remain statistically significant after Bonferroni correction; these are concentrated in information services, utilities, and transportation. We also document a structural break in 2015 that coincides with the Made in China 2025 programme, a further acceleration after the 2021 dual-carbon pledge, and a pronounced east-coast advantage.
The findings support firm-level, capability-centred innovation policy, with narrowly targeted sectoral instruments reserved for traditionally low-efficiency regional sub-populations.
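The validation protocol summarised above can be illustrated with a minimal sketch, assuming scikit-learn and a small synthetic dataset in place of the firm-year panel. The feature names, the quintile binning used to stratify a continuous target, the number of CV repeats, and the hyper-parameter values are all illustrative assumptions; the abstract does not specify them, and SHAP attribution is omitted here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic stand-in for the panel (hypothetical data, not the study's).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))  # columns imagined as R&D intensity, patent stock, firm age
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.05, size=n)

# Assumption: stratify the continuous target by its quintile bin.
bins = np.digitize(y, np.quantile(y, [0.2, 0.4, 0.6, 0.8]))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=bins, random_state=0
)

# Keeping the scaler inside the pipeline means each CV fold fits it
# only on that fold's training portion, which avoids leakage.
model = make_pipeline(
    RobustScaler(),
    GradientBoostingRegressor(
        n_estimators=100, max_depth=2, learning_rate=0.1, random_state=0
    ),
)

# Repeated 10-fold cross-validation on the training split (3 repeats assumed).
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=0)
cv_r2 = cross_val_score(model, X_train, y_train, cv=cv, scoring="r2")

# Refit on the full training split and score on the held-out 30%.
model.fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
gap = train_r2 - cv_r2.mean()  # training-to-CV gap as an overfitting check

print(f"CV R2 = {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}, gap = {gap:.3f}")
```

The training-to-CV gap computed at the end mirrors the overfitting diagnostic reported in the abstract; the numbers printed by this sketch describe the synthetic data only and are not comparable to the study's results.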
