Main article

Yuxuan Zhao
School of Economics and Management, Xi'an University of Technology, Xi'an 710054, Shaanxi, China
Wenqing Sun
School of Economics and Management, Xi'an University of Technology, Xi'an 710054, Shaanxi, China
Jianliang Guo
School of Management, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
Yingzhe Fan
School of International Business, Shaanxi Normal University, Xi'an 710119, Shaanxi, China
Lina Meng*
School of Economics and Management, Xi'an University of Technology, Xi'an 710054, Shaanxi, China
linameng@xaut.edu.cn

Abstract

The transition from quantity-oriented to quality-oriented innovation is central to China's dual-circulation strategy, yet the determinants of green innovation efficiency (GIE), the capacity of firms to convert R&D inputs into environmentally oriented innovative outputs, remain under-characterised at scale. This study assembles a panel of 43,812 firm-year observations from 4,287 listed firms across 20 two-digit CSRC sectors covering 2006–2023 and applies a suite of eleven machine learning algorithms: regularised linear models (Ridge, Lasso, ElasticNet), tree-based ensembles (Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost), kernel Support Vector Regression, and a four-layer deep neural network, together with a traditional linear regression baseline. We construct three complementary GIE measures based on total green patents, quality-weighted green patents (giving higher weights to invention patents), and the Y02 IPC-filtered climate-relevant subset. The same validation protocol is applied uniformly across algorithms: a 70/30 stratified train–test split, repeated 10-fold cross-validation, RobustScaler preprocessing, conservative hyper-parameter grids, and SHAP-based interpretation. Gradient Boosting attains the strongest out-of-sample performance (R² = 0.981, RMSE = 0.009, training-to-CV gap below 0.002), with LightGBM and CatBoost within one standard deviation. R&D intensity dominates the feature hierarchy (mean |SHAP| = 0.342), followed by green-patent stock and firm age; cross-sector differences are economically modest yet statistically significant after Bonferroni correction in six of 45 pairs, concentrated in information services, utilities and transportation. We also document a structural break in 2015 aligned with the Made in China 2025 programme, a further acceleration after the 2021 dual-carbon pledge, and a pronounced east-coast advantage. The findings support firm-level, capability-centred innovation policy, with narrowly targeted sectoral instruments reserved for traditionally low-efficiency regional sub-populations.
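
The splitting and scaling steps of the validation protocol named in the abstract can be illustrated with a minimal standard-library Python sketch. It is not the authors' code. Several choices are assumptions made only for illustration: stratifying on a sector label (the abstract does not say which variable is used), three repetitions of the 10-fold scheme (the number of repeats is not stated), the random seeds, and the toy `sectors` and `rd_intensity` data. Model fitting and SHAP interpretation are omitted. Libraries such as scikit-learn offer equivalents (for example `RobustScaler` and `RepeatedKFold`) that would normally be used instead.

```python
import random
import statistics


def stratified_split(labels, test_share=0.30, seed=0):
    """Split row indices so each label contributes about test_share to the test set."""
    rng = random.Random(seed)
    by_label = {}
    for idx, label in enumerate(labels):
        by_label.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_share)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def fit_robust_scaler(values):
    """Return the median and interquartile range (IQR) of the given values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 if the IQR is zero
    return median, iqr


def robust_scale(values, median, iqr):
    """Centre on the median and divide by the IQR, which limits outlier influence."""
    return [(v - median) / iqr for v in values]


def repeated_kfold(indices, n_splits=10, n_repeats=3, seed=0):
    """Yield (training, validation) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)  # a fresh shuffle for every repetition
        folds = [shuffled[i::n_splits] for i in range(n_splits)]
        for k, validation in enumerate(folds):
            training = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield training, validation


# Toy data (hypothetical): 5 sectors with 20 firm-year rows each.
rng = random.Random(42)
sectors = [f"S{k}" for k in range(5) for _ in range(20)]
rd_intensity = [rng.lognormvariate(0.0, 1.0) for _ in sectors]

# 70/30 split stratified on the assumed sector label.
train_idx, test_idx = stratified_split(sectors)

# Fit the scaler on training rows only, then apply it to both partitions.
median, iqr = fit_robust_scaler([rd_intensity[i] for i in train_idx])
scaled_train = robust_scale([rd_intensity[i] for i in train_idx], median, iqr)
scaled_test = robust_scale([rd_intensity[i] for i in test_idx], median, iqr)

# Repeated 10-fold cross-validation inside the training partition.
splits = list(repeated_kfold(train_idx))
print(len(train_idx), len(test_idx), len(splits))  # 70 30 30
```

Fitting the scaler on the training partition alone keeps information from the held-out 30% out of the preprocessing step, so the test score is not flattered by leakage. A reported training-to-CV gap would then be the difference between the training score and the mean score over the validation folds.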

Article details

How to Cite

Zhao, Y., Sun, W., Guo, J., Fan, Y., & Meng, L. (2023). Machine Learning Evidence on the Determinants and Dynamics of Green Innovation Efficiency. Journal of AI Analytics and Applications, 1(2), 1-25. https://doi.org/10.63646/jaiaa.2023.010201