Specialized clinical artificial intelligence (AI) tools are entering medical practice despite scarce independent evaluation. We quantitatively evaluate two clinical AI tools, OpenEvidence and UpToDate Expert AI, built on large language models (LLMs) against three frontier LLMs: GPT-5.2, Gemini 3.1 Pro and Claude Opus 4.6. Our evaluation has three stages: (1) 500 MedQA questions testing medical knowledge, (2) 500 HealthBench items measuring alignment with clinicians and (3) the real clinical queries (RCQ) benchmark, built from 100 de-identified queries from physicians to a general-purpose language model in a live clinical environment. For the RCQ benchmark, 12 US clinicians performed randomized, blinded review of model outputs, producing 1,800 model–question annotations. Frontier LLMs outperformed clinical AI tools in all three evaluations. Clinical AI tools performed comparably to auto-enabled Google Search AI Overview on the RCQ. These findings highlight the need for independent, real-world evaluation of AI tools before they enter clinical settings.
Accurate polyp segmentation from colonoscopy images is critical for colorectal cancer prevention, yet the generalization of deep learning models under domain shift remains insufficiently explored. We propose Boundary-Explicit Guided Attention U-Net (BEGA-UNet), a boundary-aware segmentation architecture that introduces explicit edge modeling as a structural inductive bias to enhance both segmentation accuracy and cross-domain robustness. The framework integrates three components: an Edge-Guided Module (EGM) with learnable Sobel-initialized operators to capture boundary cues, a Dual-Path Attention (DPA) module that processes channel and spatial attention in parallel, and a Multi-Scale Feature Aggregation (MSFA) module to encode contextual information across multiple receptive fields. Evaluated on the combined Kvasir-SEG and CVC-ClinicDB benchmarks, BEGA-UNet achieves 88.53% Dice and 82.51% IoU, outperforming representative convolutional and transformer-based baselines. More importantly, cross-dataset evaluation demonstrates strong robustness under domain shift, with BEGA-UNet retaining 83.2% of its in-distribution performance--substantially higher than U-Net (64.5%), Attention U-Net (47.5%), and TransUNet (53.1%). In a zero-shot setting on an entirely unseen dataset, the model further maintains 72.6% performance retention. Comprehensive ablation studies indicate that explicit boundary modeling plays a central role in improving generalization, while multi-scale context aggregation further stabilizes performance across domains. Feature distribution analyses support this observation by showing that edge-oriented representations exhibit markedly reduced cross-domain variability compared to appearance-driven features. Overall, BEGA-UNet provides an effective and interpretable solution for robust polyp segmentation, demonstrating that explicit boundary modeling serves as a critical inductive bias for ensuring reliability under clinical domain shifts.
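The overlap metrics and the "performance retention" figure quoted above can be illustrated with a minimal sketch. The abstract does not define retention, so the ratio used here (cross-domain score divided by in-distribution score) is an assumption, and the function names and toy masks are hypothetical.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two equally sized binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    if union == 0:
        # Both masks empty: treat as perfect agreement (a common convention).
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)
    iou = inter / union
    return dice, iou


def performance_retention(cross_domain_score, in_distribution_score):
    """Assumed definition: cross-domain score as a percentage of the in-distribution score."""
    return 100.0 * cross_domain_score / in_distribution_score


# Toy 4-pixel masks, purely illustrative.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
retention = performance_retention(0.5, 1.0)
```

Under this reading, a model whose Dice halves on an unseen dataset retains 50% of its in-distribution performance.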
Existing evidence indicates that children and adolescents experiencing bullying victimization (BV) exhibit mental health deterioration, and such effects can persist into adulthood. As current decision-making tools are scarce, we aimed to develop a tool to predict subsequent BV risk among Chinese youth. Data were retrieved from a three-wave prospective study that was incorporated into the Mental Health Survey for Children and Adolescents in Yunnan (MHSCAY). Six common machine learning (ML) algorithms were used. We internally validated the models using a bootstrap approach with 500 resamples to assess discrimination, calibration, and utility. A total of 5345 participants aged 10–17 years completed the survey. Internal validation showed that the logistic regression (LR) model slightly outperformed the other ML algorithms and exhibited more evenly distributed individual-level prediction uncertainty. It was therefore selected as the final model, achieving an AUROC of 0.800 (95% CI: 0.785, 0.815), AUPRC of 0.519 (95% CI: 0.483, 0.553), calibration intercept of -0.001 (95% CI: -0.076, 0.069), calibration slope of 0.990 (95% CI: 0.930, 1.059), and Brier score of 0.122 (95% CI: 0.117, 0.128). Furthermore, the calibration plot indicated excellent agreement between predicted and observed risk, and positive net benefits were observed across broad threshold ranges. Fairness analysis revealed no predictive bias in key subpopulations. This novel predictive tool uses seven readily accessible baseline predictors to generate accurate, individualized predictions of subsequent BV risk in children and adolescents. Upon further validation, the model may facilitate risk stratification, thereby guiding resource allocation and informing targeted interventions for potential BV crises among Chinese children and adolescents.
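Two of the reported metrics, the Brier score and the AUROC, are simple enough to sketch directly. The code below is a minimal illustration on hypothetical toy data, not the study's validation pipeline; the function names are assumptions.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted risks.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(brier_score(outcomes, risks), auroc(outcomes, risks))
```

A lower Brier score indicates better overall accuracy of the predicted probabilities, while the AUROC measures ranking ability only and is insensitive to miscalibration.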
Immune checkpoint blockade therapy has revolutionized cancer treatment and demonstrated significant clinical efficacy. However, conventional monoclonal antibody therapeutics still face numerous limitations. Peptide inhibitors, with their low molecular weight, ease of synthesis, cost-effectiveness, and minimal immunogenicity, offer a promising alternative by combining the high specificity of antibodies with the favorable tissue penetration of small molecules. As such, they represent a key direction for overcoming existing therapeutic bottlenecks and developing next-generation immunotherapies. Despite facing key challenges in clinical translation, particularly regarding metabolic stability and oral bioavailability, peptide-based inhibitors hold considerable potential to bridge the gap between antibodies and small-molecule drugs, positioning them as an important component of next-generation cancer immunotherapy. Currently, research in this field is increasingly shifting from traditional empirical screening to intelligent precision design, employing strategies such as rational design based on hotspot amino acids, AI-assisted drug discovery, and advanced delivery systems to optimize the activity, stability, and targeting properties of peptides. This review systematically outlines recent advances in immune checkpoint peptide-based inhibitors, aiming to provide a theoretical foundation for the rational design and clinical translation of this emerging class of therapeutics.
Spinal cord injury (SCI) is a highly disabling central nervous system disease with complex pathology, and targeted neuroprotective drugs remain clinically lacking. Traditional molecular target screening and drug prediction methods are inefficient, costly, and poorly targeted, and fail to meet the needs of clinical precision treatment. To address this, we used machine learning to construct a multi-dimensional data integration framework. First, we established complete transection mouse models covering normal, acute-phase and subacute-phase SCI; RNA-seq combined with single-cell sequencing suggested that extensive neuronal PANoptosis may occur in the acute phase. Using WGCNA and MCC algorithms, we screened out 25 candidate genes for extensive neuronal PANoptosis in the acute phase. We then applied machine learning algorithms, including Elastic Net-GLM, Random Forest, Support Vector Machine, and LASSO, to predict and prioritize potential molecular targets, identifying 13 possible core genes for extensive neuronal PANoptosis, including Tacc3, Aurka, Mcm6, Mcm5 and Ripk1. Using the Connectivity Map, we performed drug prediction on these 13 genes and screened out 8 candidate drugs with neuroprotective effects. Through protein domain screening, it was verified via proof-by-contradiction assays that the drug Xaliproden can establish robust interactions with the 7XMK, 7FCZ and 7FD0 domains of Ripk1, a core molecule of the PANoptosome, via a network of multiple hydrogen bonds. This finding provides a novel screening strategy for neuroprotective drugs for spinal cord injury and may help establish a precision treatment system for the acute phase of injury.
Non-pedunculated colonic neoplasia (NPCN) is increasingly encountered due to expanded bowel cancer screening and improvements in high-definition endoscopy. Flat and sessile lesions carry higher risks of incomplete resection, recurrence and submucosal invasion than pedunculated polyps, making accurate optical diagnosis and appropriate technique selection essential for high-quality care. This review synthesises evidence from 2016 to 2026, providing a contemporary, practice-focused update for clinicians delivering endoscopic resection services.
Advances in optical characterisation, including the Narrow-band Imaging International Colorectal Endoscopic classification, the Japan NBI Expert Team classification and the Kudo pit pattern classification, have improved real-time prediction of histology and invasion depth, supporting decision-making between cold resection, endoscopic mucosal resection (EMR), endoscopic submucosal dissection (ESD) and surgical referral. Cold snare polypectomy and cold EMR have become preferred techniques for small and intermediate lesions due to excellent safety profiles and high complete resection rates. For larger lesions (≥20 mm), piecemeal EMR with systematic margin ablation using snare-tip soft coagulation now represents the standard of care, reducing recurrence to below 10%.
Emerging techniques such as underwater EMR, cap-assisted EMR and endoscopic full-thickness resection expand therapeutic options for fibrotic or non-lifting lesions. ESD remains crucial for en bloc resection when superficial submucosal invasion is suspected, though its use varies across the UK and international practice due to differences in training pathways and service configuration.
Aging populations face growing multimorbidity, while episodic clinical assessments fail to capture gradual physiological changes unfolding during daily life. Although wearable technologies enable continuous monitoring, single-modality systems provide incomplete and context-limited insight. This Perspective focuses on hybrid wearable sensors that integrate physical and chemical sensing for geriatric healthcare. Hybrid wearable sensing provides a pathway toward continuous, predictive, and personalized geriatric health management. By continuously monitoring multiple health parameters, such multimodal systems offer distinct advantages for real-time monitoring, including early risk detection and more personalized health assessment through the integration of complementary physical and biochemical signals. We discuss recent advances in wearable physical sensors alongside emerging wearable chemical sensors, then argue that chem-phys hybrid integration enables more interpretable and clinically actionable assessment of aging trajectories than single-modality wearable systems. Finally, we discuss translational requirements and future prospects, including robust real-world operation, AI-driven inference, and integration with telemedicine and home-based care.
Objective Labral tears are common in young adults participating in high-impact physical activity. No longitudinal studies have investigated whether labral tears expedite cartilage loss. This longitudinal study investigated the association between labral tears and cartilage loss in young adults participating in high-impact physical activity.
Methods Study participants were high-impact athletes (soccer or Australian football) with and without hip and/or groin pain from the femoroacetabular impingement and hip osteoarthritis cohort study who underwent unenhanced 3T MRI at baseline and after 2–3 years. Labral tears and cartilage defects were scored semi-quantitatively. At baseline, labral tear presence (grade ≥2), subregion, number of subregions affected by labral tear, maximal labral score, labral sum score and presence of paralabral cysts were determined. A cartilage sum change score (difference between baseline and follow-up) was calculated for each hip (0–20). Negative binomial regression models were used to estimate if baseline labral tears were associated with cartilage loss, adjusting for sex, age, body mass index, alpha angle and baseline cartilage sum score.
Results 173 participants (343 hips; 82% with hip and/or groin pain; 22% women) with a median age of 26 years were included. The median International Hip Outcome Tool-33 total score was 69 at baseline. Follow-up MRIs were collected at a median of 2.1 years after baseline. 150 (87%) participants were still participating in high-impact physical activity at follow-up. Weak-to-moderate strength associations were identified between cartilage sum change score and baseline anterior labral tears (adjusted incidence rate ratio (aIRR) 1.46; 95% CI 1.01 to 2.09), paralabral cysts (aIRR 1.38; 95% CI 1.02 to 1.86) and labral sum score (aIRR 1.05; 95% CI 1.00 to 1.09).
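For readers less familiar with count models, an incidence rate ratio is the exponentiated regression coefficient, and its confidence interval is obtained by exponentiating the interval on the log scale. The sketch below shows that arithmetic only; it assumes a Wald-type interval with z = 1.96, which the abstract does not state, and the function names are hypothetical.

```python
import math


def irr_with_ci(beta, se, z=1.96):
    """Incidence rate ratio and Wald-type CI from a log-scale coefficient and standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_scale_from_ci(irr, lower, upper, z=1.96):
    """Back out the approximate coefficient and standard error from a reported IRR and CI."""
    beta = math.log(irr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported anterior labral tear estimate: aIRR 1.46 (95% CI 1.01 to 2.09).
beta, se = log_scale_from_ci(1.46, 1.01, 2.09)
irr, lower, upper = irr_with_ci(beta, se)
```

Because the published bounds are rounded, the reconstructed interval matches the reported one only approximately.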
Artificial intelligence-driven pain monitoring is emerging as a transformative approach in sports medicine by enabling objective, continuous, and individualized assessment of pain-related dysfunction, injury risk, rehabilitation progress, and return-to-play readiness. Conventional pain assessment relies primarily on self-report, clinician observation, and periodic functional testing, which may be limited by athlete underreporting, subjective variability, and poor capture of dynamic changes during training or competition. This review examines the integration of wearable sensors, computer vision, facial expression recognition, workload analytics, and predictive modeling for injury management. Wearable technologies provide real-time biomechanical and physiological data, including movement asymmetry, joint loading, gait deviation, fatigue, and recovery status. Computer vision and pose estimation support non-invasive analysis of posture, motion quality, compensatory movement, and exercise performance. Predictive analytics and machine learning models may identify injury risk patterns, guide rehabilitation progression, and assist return-to-play decision-making. Multimodal sensor fusion further strengthens pain monitoring by combining clinical, behavioral, biomechanical, physiological, and athlete-reported data into more comprehensive decision-support systems. Despite this potential, challenges remain in data quality, model validation, algorithmic transparency, device accuracy, privacy, and clinical implementation. Future AI systems should be explainable, sport-specific, ethically governed, and integrated with clinician oversight to improve personalized injury prevention and recovery management.
Background and objectives: Pediatric large-volume brain arteriovenous malformations (AVMs) carry a substantial lifelong hemorrhage risk, neurological symptoms, and treatment morbidity. Single-session stereotactic radiosurgery (SRS) is often unsuitable due to constraints on dose-volume toxicity. Volume-staged SRS (VS-SRS) enables sequential dosing of large nidus volumes, potentially enhancing safety while maintaining efficacy. Evidence in children remains limited. We aimed to evaluate outcomes of VS-SRS for large AVMs in pediatric patients.
Methods: A multicenter retrospective cohort was assembled from 21 centers, including patients aged younger than 21 years treated with VS-SRS for AVMs >10 cm3. Clinical and radiological end points included obliteration, hemorrhage, and permanent symptomatic radiation-induced changes (RIC).
Results: A total of 103 patients were included (median age 14 years; IQR, 12-17). The median nidus volume at first stage was 18.2 cm3 (IQR, 12.3-25.6). Median prescription dose per stage was 17 Gy (IQR, 16-18). The median clinical follow-up from the first stage was 57.5 months (IQR, 25-138). Obliteration occurred in 42 of 103 patients (40.8%), with actuarial rates of 6.9% (95% CI: 2.8-14) at 3 years and 29% (95% CI: 20-39) at 5 years. Hemorrhage occurred in 17 of 103 patients (16.5%) during follow-up, and permanent RIC was observed in 9 of 103 patients (8.7%).
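Actuarial rates such as those above account for patients whose follow-up ends before the event occurs. One common way to compute them is the Kaplan-Meier estimator; whether the study used this or a life-table method is not stated, so the sketch below is an assumption, run on hypothetical follow-up data, and it ignores competing events.

```python
def km_cumulative_incidence(times, events, horizon):
    """1 minus the Kaplan-Meier survival estimate at `horizon`.

    times:  follow-up time for each patient (e.g. months)
    events: 1 if the event (e.g. obliteration) occurred at that time, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    survival = 1.0
    for et in event_times:
        at_risk = sum(1 for t in times if t >= et)
        n_events = sum(1 for t, e in zip(times, events) if e and t == et)
        survival *= 1 - n_events / at_risk
    return 1 - survival


# Hypothetical cohort of five patients: follow-up in months and event indicator.
months = [12, 24, 36, 48, 60]
obliterated = [0, 1, 0, 1, 0]
rate_3y = km_cumulative_incidence(months, obliterated, 36)
rate_5y = km_cumulative_incidence(months, obliterated, 60)
```

In this toy cohort the crude proportion is 2 of 5 (40%), whereas the estimate at 60 months is higher because censored patients leave the risk set early. A crude proportion and an actuarial rate can therefore differ, as they do in the results above.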
Alzheimer's disease (AD) plasma and cerebrospinal fluid (CSF) proteomics can distinguish AD from cognitively normal controls, but the generalizability of machine learning performance and the recurrence of biological signals across datasets require cautious interpretation. We developed an explainable artificial intelligence framework spanning two fluids and four ADNI proteomic datasets, covering 2082 modality-specific samples, all analysed internally within ADNI. Phase 1 analysed plasma using a 119-analyte NULISA and targeted UPENN panel (n = 727; 216 CE, 511 controls). Phase 2 extended the analysis to CSF using SOMAscan7k, TMT-MS and targeted SET2, with Elecsys Aβ42, Aβ40, total tau and p-tau181 as anchor biomarkers. Only SOMAscan was subject-independent relative to Phase 1 plasma; TMT-MS and SET2 overlapped with Phase 1 for 96.0% and 97.7% of subjects, respectively, and therefore are not independent replication cohorts. Under subject-level splits with fold-internal preprocessing, we compared Elastic Net, Explainable Boosting Machines and gradient-boosted trees with SHAP-based explanations.
Among the candidate pipelines, we selected the pipeline with the highest held-out test ROC AUC for each platform; the selected values were 0.927 in plasma and 0.954–0.973 across the three CSF datasets. Because the same held-out test performance was used for pipeline selection and headline reporting, these are optimistically selected single-holdout estimates, not unbiased estimates of generalizable or clinical performance. Explanations identified five recurring biological axes within ADNI: cholinergic (ACHE), tau/14-3-3 (YWHAG, YWHAZ, YWHAB, YWHAE), neuro-axonal (NEFL, NEFH), microglial/complement (CHIT1, SMOC1, CHI3L1, C7, CFH) and synaptic (NPTXR, NPTX2, DLG4, SYT5, VSNL1, ELAVL2). CSF analyses showed synaptic vesicle-cycle enrichment (q = 2 × 10−6), and CSF YWHAG correlated strongly with total tau (ρ = 0.87). Cross-fluid directional concordance was modest overall (54–57%) but increased to 73–80% among mapped analyte/protein rows reaching q < 0.05 in CSF. These findings provide hypothesis-generating, internally supported evidence within ADNI. Independent external cohorts with locked pipelines are required to evaluate generalizable performance and biological reproducibility; the overlapping TMT-MS and SET2 analyses should not be interpreted as independent replication.
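The two leakage safeguards named above, subject-level splits and fold-internal preprocessing, can be sketched as follows. This is a minimal standard-library illustration on hypothetical subject IDs and values, not the study's pipeline; the function names and the 20% test fraction are assumptions.

```python
import random
import statistics


def subject_level_split(subject_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no subject contributes to both train and test."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def fit_standardizer(train_values):
    """Estimate scaling parameters from training samples only."""
    mean = statistics.fmean(train_values)
    sd = statistics.pstdev(train_values) or 1.0  # guard against zero variance
    return mean, sd


def apply_standardizer(values, mean, sd):
    """Apply training-derived parameters to any split, including held-out data."""
    return [(v - mean) / sd for v in values]


# Hypothetical data: several samples per subject and one analyte value per sample.
ids = ["s1", "s1", "s2", "s3", "s3", "s4", "s5"]
analyte = [1.2, 1.4, 0.9, 2.1, 2.0, 1.7, 1.1]
train_idx, test_idx = subject_level_split(ids)
mean, sd = fit_standardizer([analyte[i] for i in train_idx])
test_scaled = apply_standardizer([analyte[i] for i in test_idx], mean, sd)
```

Fitting the scaler on the training portion alone keeps test-set information out of preprocessing. It does not remove the selection optimism described above, which arises from choosing the pipeline on the same held-out set used for reporting.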
Background: Interest in integrating artificial intelligence (AI) into healthcare in Saudi Arabia is growing, and many physicians in pediatric hematology-oncology clinical practice recognize its benefits for data analysis, pattern recognition, and early detection of disease. However, concerns remain about the limitations of AI, ethical issues, and potential job displacement. This study aimed to explore the attitudes and perceptions of physicians in pediatric hematology-oncology clinical practice in Saudi Arabia towards the use of AI. Methods: A cross-sectional study was carried out in Tabuk and included 132 medical doctors, who completed an online questionnaire consisting of structured questions. Results: Overall, 74.2% of physicians had a positive view of integrating AI into their clinical work. Professional title (p=0.011) and years of work experience (p=0.019) were significantly associated with their views. Efficiency, accuracy and cost-effectiveness were cited as the main advantages of AI by 74.2% of participating physicians. Other advantages and disadvantages were mentioned by 12.1% and 71.9% of participants, respectively. The main challenges cited were privacy, interpretability and ethical concerns. In addition, 56.8% of participating physicians held that human physicians should remain the primary decision makers for patients' treatment when AI is used. Most participating physicians saw potential for AI in the diagnosis and treatment of patients. Conclusions: Attitudes towards AI among pediatric hematology-oncology physicians are largely positive, as they recognize AI's potential to be efficient and accurate. Most of these physicians consider AI a complement to human capabilities and emphasize the need for physicians to oversee decisions made by AI.
The human gut microbiome-host system represents a recently unleashed chemo-biological realm of crucial importance in human biology and health, to the extent that new therapeutic approaches targeting it are emerging to prevent and treat a broad range of conditions, including metabolic and cardiovascular diseases, infectious disorders, cancer, and neurodegeneration. From a drug discovery standpoint, this paradigm offers several distinctive advantages: it introduces novel therapeutic modalities (such as fecal microbiota transplantation, probiotics, prebiotics, and postbiotics), expands the biological search space to include the gut metagenome, unlocks new chemical space through microbial metabolites, and enables gut-localized pharmacokinetics with the potential to reduce systemic exposure and off-target effects. However, realizing this therapeutic potential critically depends on establishing causal links between specific microbiome features, microbial metabolites, and disease phenotypes. Establishing such causality requires the integration of diverse experimental and computational approaches across multiple scales, including epidemiological and clinical studies, metagenomics and longitudinal multi-omic profiling, gnotobiotic animal models, strain isolation and cultivation, biochemical and molecular analyses, and synthetic biology, all supported by machine learning, bioinformatics, and cheminformatics. In this Perspective, we provide a concise overview of this rapidly evolving field. We review the gut microbiome–host system and the principal tools used to interrogate it, with an emphasis on approaches that enable causal inference. We further evaluate current strategies for therapeutic intervention and conclude with an assessment of key achievements to date, as well as the major challenges and opportunities that will shape the future of microbiome-based drug discovery.
Pasteurella multocida is a major zoonotic pathogen causing fowl cholera and other animal-derived infectious diseases, posing significant threats to the poultry industry and public health. Rapidly identifying its host preference is critical for disease control and for understanding cross-species transmission. This study collected genomic data of P. multocida from four key hosts (chicken, cat, pig, cattle) in the NCBI Pathogen Database, yielding 814 qualified strains. Phylogenetic analysis showed host-correlated branching: cat strains formed a distinct clade, while cattle, pig, and chicken strains clustered into host-specific sub-clades. Multi-locus sequence typing (MLST) revealed strong host-ST associations (cattle with ST1, pig with ST11/10/3) and weaker links (cat with ST30, chicken with ST124). A single nucleotide polymorphism (SNP) matrix, together with the host origin data, served as the feature set for training and evaluating five machine learning models: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGBoost). The models performed well on test sets (receiver operating characteristic area under the curve, ROC-AUC: 0.962–0.973; accuracy: 0.8528–0.865), with lower precision for chicken strains. SHAP (SHapley Additive exPlanations) analysis identified high-impact SNPs: XGBoost relied on a small number of key SNPs, while RF showed distributed contributions. Genes carrying these SNPs were enriched in metabolic processes, quorum sensing, and beta-lactam resistance, indicating common and host-specific adaptation strategies. Misclassifications (cat-chicken, cattle-pig) suggested cross-host transmission risks. This study integrates phylogenetics, machine learning, and functional genomics to decipher the genetic basis of P. multocida host adaptation, providing insights for fowl cholera control and early warning of cross-species transmission, with implications for poultry health management.
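The per-host precision and the misclassification pairs mentioned above can be read off a multi-class prediction table. The following sketch shows the bookkeeping on hypothetical labels; it is not the study's evaluation code, and the function names and example predictions are assumptions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of strains whose predicted host matches the recorded host."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def per_class_precision(y_true, y_pred):
    """For each predicted host, the share of those predictions that were correct."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / predicted[label] for label in predicted}


def confusion_pairs(y_true, y_pred):
    """Count each (true host, predicted host) pair among the misclassified strains."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)


# Hypothetical host labels for six strains.
true_hosts = ["chicken", "cat", "cat", "pig", "cattle", "cattle"]
pred_hosts = ["chicken", "chicken", "cat", "pig", "pig", "cattle"]
print(accuracy(true_hosts, pred_hosts))
print(per_class_precision(true_hosts, pred_hosts))
print(confusion_pairs(true_hosts, pred_hosts))
```

In this toy example a cat strain predicted as chicken lowers chicken precision to 0.5, illustrating how one class can show lower precision while overall accuracy stays moderate.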