Research Papers

Latest healthcare AI research from PubMed and arXiv. Discover breakthroughs in medical imaging, drug discovery, clinical decision support, and more.

Featured Research

Breakthrough papers handpicked by our editors

Featured | Research Paper

General-purpose large language models outperform specialized clinical AI tools on medical benchmarks

Specialized clinical artificial intelligence (AI) tools are entering medical practice despite scarce independent evaluation. We quantitatively evaluate two clinical AI tools, OpenEvidence and UpToDate Expert AI, built on large language models (LLMs) against three frontier LLMs: GPT-5.2, Gemini 3.1 Pro and Claude Opus 4.6. Our evaluation has three stages: (1) 500 MedQA questions testing medical knowledge, (2) 500 HealthBench items measuring alignment with clinicians and (3) the real clinical queries (RCQ) benchmark, built from 100 de-identified queries from physicians to a general-purpose language model in a live clinical environment. For the RCQ benchmark, 12 US clinicians performed randomized, blinded review of model outputs, producing 1,800 model–question annotations. Frontier LLMs outperformed clinical AI tools in all three evaluations. Clinical AI tools performed comparably to auto-enabled Google Search AI Overview on the RCQ. These findings highlight the need for independent, real-world evaluation of AI tools before they enter clinical settings.

Krithik Vishwanath, Anton Alyakin, et al.

Jun 12, 2026

Featured | Research Paper | medRxiv

BEGA-UNet: Boundary-Explicit Guided Attention U-Net with Multi-Scale Feature Aggregation for Colonoscopic Polyp Segmentation

Accurate polyp segmentation from colonoscopy images is critical for colorectal cancer prevention, yet the generalization of deep learning models under domain shift remains insufficiently explored. We propose Boundary-Explicit Guided Attention U-Net (BEGA-UNet), a boundary-aware segmentation architecture that introduces explicit edge modeling as a structural inductive bias to enhance both segmentation accuracy and cross-domain robustness. The framework integrates three components: an Edge-Guided Module (EGM) with learnable Sobel-initialized operators to capture boundary cues, a Dual-Path Attention (DPA) module that processes channel and spatial attention in parallel, and a Multi-Scale Feature Aggregation (MSFA) module to encode contextual information across multiple receptive fields. Evaluated on the combined Kvasir-SEG and CVC-ClinicDB benchmarks, BEGA-UNet achieves 88.53% Dice and 82.51% IoU, outperforming representative convolutional and transformer-based baselines. More importantly, cross-dataset evaluation demonstrates strong robustness under domain shift, with BEGA-UNet retaining 83.2% of its in-distribution performance, substantially higher than U-Net (64.5%), Attention U-Net (47.5%), and TransUNet (53.1%). In a zero-shot setting on an entirely unseen dataset, the model further maintains 72.6% performance retention. Comprehensive ablation studies indicate that explicit boundary modeling plays a central role in improving generalization, while multi-scale context aggregation further stabilizes performance across domains. Feature distribution analyses support this observation by showing that edge-oriented representations exhibit markedly reduced cross-domain variability compared to appearance-driven features. Overall, BEGA-UNet provides an effective and interpretable solution for robust polyp segmentation, demonstrating that explicit boundary modeling serves as a critical inductive bias for ensuring reliability under clinical domain shifts.

Tong, T., Zhang, W., Zu, W.

Mar 6, 2026
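
For readers unfamiliar with the metrics quoted in the BEGA-UNet abstract above, the following is a minimal sketch of how Dice, IoU, and a performance-retention percentage are commonly computed for binary segmentation masks. It is not the authors' evaluation code. The function names are hypothetical, and the retention formula (cross-domain score divided by in-distribution score) is an assumption, since the abstract does not define it.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    # Flatten both masks into lists of booleans.
    p = [bool(v) for row in pred for v in row]
    t = [bool(v) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(p)
    target_area = sum(t)
    union = pred_area + target_area - intersection

    # Convention (an assumption): two empty masks count as a perfect match.
    if union == 0:
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


def performance_retention(cross_domain_score, in_distribution_score):
    """Assumed definition: cross-domain score as a percentage of the
    in-distribution score. The abstract does not state its exact formula."""
    return 100.0 * cross_domain_score / in_distribution_score


# Toy masks, not data from the paper.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]

dice, iou = dice_and_iou(pred_mask, true_mask)
print(round(dice, 3), round(iou, 3))  # 0.667 0.5
```

On any single pair of masks, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU), so the two always rank a pair of predictions the same way. Averages reported over a dataset, such as those in the abstract, need not satisfy this identity exactly.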

Research Paper

Machine learning algorithms in jointly developing prediction models for subsequent school bullying victimization in Chinese children and adolescents

Existing evidence indicates that children and adolescents experiencing bullying victimization (BV) exhibit mental health deterioration, and such effects can persist into adulthood. As current decision-making tools are scarce, we aim to develop a tool to predict subsequent BV risk among Chinese youth. Data were retrieved from a three-wave prospective study that was incorporated into the Mental Health Survey for Children and Adolescents in Yunnan (MHSCAY). Six common machine learning (ML) algorithms were used. We internally validated the models using a bootstrap approach with 500 resamples to assess discrimination, calibration, and utility. A total of 5345 participants aged 10–17 years completed the survey. The internal validation showed the logistic regression (LR) model slightly outperformed other ML algorithms and exhibited more evenly distributed individual-level prediction uncertainty. It was therefore selected as the final model, achieving an AUROC of 0.800 (95% CI: 0.785, 0.815), AUPRC of 0.519 (95% CI: 0.483, 0.553), calibration intercept of -0.001 (95% CI: -0.076, 0.069), calibration slope of 0.990 (95% CI: 0.930, 1.059), and Brier score of 0.122 (95% CI: 0.117, 0.128). Furthermore, the calibration plot indicated excellent precision, and positive net benefits were observed across broad threshold ranges. Fairness analysis revealed no predictive bias in key subpopulations. This novel predictive tool utilizes seven baseline predictors that are readily accessible to generate accurate, individualized predictions of subsequent BV risk in children and adolescents. Upon further validation, the model may facilitate risk stratification, thereby guiding resource allocation and informing targeted interventions for potential BV crises among Chinese children and adolescents.
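
As a rough illustration of two quantities reported in the abstract above, the sketch below computes a Brier score and a percentile bootstrap confidence interval over 500 resamples. It is a simplified stand-in, not the study's procedure: the abstract does not say which bootstrap variant was used (for example, optimism correction), and the helper names and toy data here are hypothetical.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(y_true, y_prob, metric, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` (a simplification; the
    paper's exact bootstrap scheme is not specified in the abstract)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample participants with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy outcomes and predicted risks, not data from the study.
outcomes = [0, 1, 1, 0]
risks = [0.1, 0.8, 0.6, 0.3]

score = brier_score(outcomes, risks)
low, high = bootstrap_ci(outcomes, risks, brier_score)
print(round(score, 3))  # 0.075
print(low <= high)      # True
```

A lower Brier score is better. A model that always predicts the outcome prevalence q scores q(1 - q), which gives a baseline for judging a value such as the 0.122 reported above.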
