We explored the phenotypic characteristics of high-quality germplasm resources of wild Gentiana rigescens Franch. ex Hemsl. and established a classification strategy for them. In total, 887 samples of G. rigescens
collected from different regions were used as research materials. Principal component analysis (PCA), hierarchical clustering analysis (HCA), and membership function analysis were used to analyze and evaluate 17 active ingredient yield traits of the roots, stems, and leaves. Gentiopicroside yield in the roots had the highest Shannon-Wiener index value (I = 1.64), while loganic acid and sweroside acid yields in the leaves had the lowest (I = 0.73). Based on D value scoring and membership function analysis, we identified 214 high-quality, high-yield seed sources, accounting for 24.40% of the total sample size, distributed in Yunnan, Sichuan, and Guizhou. Variable importance in projection (VIP) analysis showed that the high-quality germplasms in Yunnan and Sichuan shared similar phenotypic characteristics, marked by high sweroside, loganic acid, and 6'-O-β-D-glucopyranosylgentiopicroside yields, whereas the high-quality germplasms in Guizhou were characterized by high swertiamarin yield. Of the three machine learning algorithms compared, the discrimination model built with the Random Forest (RF) algorithm had the highest prediction accuracy and stability and could effectively identify different provenances.
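
For readers unfamiliar with the scoring quantities named above, the following is a minimal sketch of how a Shannon-Wiener index and a membership-function-based D value are commonly computed. It is not the authors' pipeline. The abstract does not state how trait values were binned into classes or how components were weighted, so the class counts, the min-max form of the membership function, and the weights are assumptions, and all function names are hypothetical.

```python
import math


def shannon_wiener(class_counts):
    """Shannon-Wiener index: -sum(p_i * ln(p_i)) over non-empty classes.

    class_counts: number of samples falling in each trait class
    (the binning scheme itself is an assumption, not taken from the paper).
    """
    total = sum(class_counts)
    return -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)


def membership(values):
    """Min-max membership function u(x) = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no ranking information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def d_values(component_scores, weights):
    """Composite D value per sample: weighted sum of membership values.

    component_scores: one list of per-sample scores for each component
    (for example, principal component scores).
    weights: one weight per component (for example, variance contribution);
    they are normalized here so that they sum to 1.
    """
    w_total = sum(weights)
    columns = [membership(col) for col in component_scores]
    n_samples = len(component_scores[0])
    return [
        sum((w / w_total) * col[i] for w, col in zip(weights, columns))
        for i in range(n_samples)
    ]
```

Under these assumptions, samples would be ranked by their D value, and those above a chosen cutoff would be treated as high-quality candidates. The cutoff itself is not specified in the abstract.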