AN AGENT-BASED MODEL FOR COMPARABLE EXPLAINER SELECTION FOR MACHINE LEARNING MODELS

UDC 004.891
DOI: 10.36871/2618-9976.2026.03.005

Authors

Yuri Vladislavovich Trofimov,
Assistant, Junior Researcher, Dubna State University, Dubna, Russia
Alexey Nikolaevich Averkin,
Candidate of Physical and Mathematical Sciences, Associate Professor, Dubna State University, Dubna, Russia
Stanislav Olegovich Kosarev,
Student, Dubna State University, Dubna, Russia
Mikhail Dmitrievich Lebedev,
Laboratory Assistant, ITO Teaching Laboratory, Dubna State University, Dubna, Russia
Maxim Alekseevich Lopatin,
Student, Federal State Autonomous Educational Institution of Higher Education "National University of Science and Technology MISIS", Moscow, Russia

Abstract

This paper addresses the problem of comparable selection of explainability methods for machine learning models, where the final conclusion depends on the chosen explainer, its settings, and the evaluation protocol. The methods are treated as a space of candidates from which a pluggable agent selects an explanation for a specific model, data type, and set of resource constraints. To ground the selection rules, a scoping review following the PRISMA-ScR standard was carried out for the period 2016–2026, focusing on CAM, SHAP, and LIME and also covering perturbation methods, counterfactual explanations, TCAV, and gradient attributions. Based on the review results, a unified, reproducible experimental validation protocol was constructed for images, tabular data, text, and time series, evaluating fidelity, stability, and computational cost. On this basis, an explainer selection agent is proposed that first filters the admissible classes of methods and then refines the choice through a quick calibration against metrics and constraints, which improves the traceability of conclusions and reduces the arbitrariness of decisions.
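The two-stage selection described in the abstract (filter admissible methods, then refine by a quick calibration) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `Candidate`, `Task`, `admissible`, and `select_explainer`, the weighted-sum score, and all numeric values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional


@dataclass
class Candidate:
    """A candidate explainer (all fields are illustrative assumptions)."""
    name: str
    data_types: FrozenSet[str]   # modalities the method is assumed to support
    needs_gradients: bool        # whether gradient access to the model is required
    relative_cost: float         # rough cost estimate in arbitrary units


@dataclass
class Task:
    """The setting for which an explainer has to be chosen."""
    data_type: str
    has_gradients: bool
    cost_budget: float


def admissible(candidates: List[Candidate], task: Task) -> List[Candidate]:
    """Stage 1: keep only candidates compatible with the task and the budget."""
    return [
        c for c in candidates
        if task.data_type in c.data_types
        and (task.has_gradients or not c.needs_gradients)
        and c.relative_cost <= task.cost_budget
    ]


def select_explainer(
    candidates: List[Candidate],
    task: Task,
    calibrate: Callable[[Candidate], Dict[str, float]],
    weights: Dict[str, float],
) -> Optional[Candidate]:
    """Stage 2: score admissible candidates on a quick calibration run.

    `calibrate` is assumed to return fidelity, stability, and cost measured
    on a small sample; the weighted sum is one possible aggregation rule.
    """
    pool = admissible(candidates, task)
    if not pool:
        return None

    def score(c: Candidate) -> float:
        m = calibrate(c)
        return (weights["fidelity"] * m["fidelity"]
                + weights["stability"] * m["stability"]
                - weights["cost"] * m["cost"])

    return max(pool, key=score)


# Placeholder candidates and metrics, made up for the sketch (not measured results).
CANDIDATES = [
    Candidate("Grad-CAM", frozenset({"image"}), True, 1.0),
    Candidate("LIME", frozenset({"image", "tabular", "text"}), False, 10.0),
    Candidate("SHAP", frozenset({"image", "tabular", "text", "time_series"}), False, 20.0),
]
FAKE_METRICS = {
    "Grad-CAM": {"fidelity": 0.6, "stability": 0.8, "cost": 0.1},
    "LIME": {"fidelity": 0.7, "stability": 0.6, "cost": 0.2},
    "SHAP": {"fidelity": 0.8, "stability": 0.9, "cost": 0.5},
}


def fake_calibrate(c: Candidate) -> Dict[str, float]:
    """Stand-in for a real calibration run on a small data sample."""
    return FAKE_METRICS[c.name]


WEIGHTS = {"fidelity": 1.0, "stability": 1.0, "cost": 1.0}
```

Calling `select_explainer(CANDIDATES, Task("tabular", False, 25.0), fake_calibrate, WEIGHTS)` runs both stages on the placeholder data. Keeping the admissibility filter separate from the scoring step means each rejection can be traced to a specific constraint, which matches the traceability goal stated in the abstract.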

Keywords

model explainability
interpretability
explanations
CAM
Grad-CAM
SHAP
LIME
perturbation methods
counterfactual explanations
TCAV
explanation stability
explanation fidelity
evaluation metrics
insertion and deletion
benchmark
PRISMA-ScR
explainer selection agent

