366+ open-access research outputs.
Comparative analyses of phylogenetic trees typically require identical taxon sets; in practice, however, trees often include distinct but overlapping taxa. Pruning non-shared leaves discards phylogene…
We present KS-PRET-5M, the largest publicly available pretraining dataset for the Kashmiri language, comprising 5,090,244 (5.09M) words, 27,692,959 (27.6M) characters, and a vocabulary of 295,433 (295…
The aim of survey statistics is to produce estimates with minimal bias and a correspondingly acceptable variance given a specific budget, preferably with a minor response burden for the participants. …
Tashiro and Tachibana proved that there exist no totally umbilical hypersurfaces in complex space forms with nonzero constant holomorphic sectional curvature, and it is also known that the shape opera…
Rapid identification of outbreaks in hospitals is essential for controlling pathogens with epidemic potential. Although whole genome sequencing (WGS) remains the gold standard in outbreak investigatio…
Kashmiri is spoken by around 7 million people but remains critically underserved in speech technology, despite its official status and rich linguistic heritage. The lack of robust Text-to-Speech (TTS)…
Let $k\ge1$ be an integer, and $(M,g)$ be a smooth, closed Riemannian manifold of dimension $2k+1\le n\le 2k+3$, or $(M,g)$ be locally conformally flat of dimension $n\ge 2k+1$. Applying the Bahri-C…
Accurate and timely identification of hospital outbreak clusters is crucial for preventing the spread of infections that have epidemic potential. While assessing pathogen similarity through whole geno…
AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that fail to model adaptive, multi-turn adversaries. We…
Large Language Models (LLMs) excel at language understanding but remain limited in knowledge-intensive domains due to hallucinations, outdated information, and limited explainability. Text-based retri…
We explore machine translation for five Turkic language pairs: Russian-Bashkir, Russian-Kazakh, Russian-Kyrgyz, English-Tatar, English-Chuvash. Fine-tuning nllb-200-distilled-600M with LoRA on synthet…
Advances in multi-modal large language models (MLLMs) have inspired time series understanding and reasoning tasks that enable natural language querying over time series, producing textual analyses of…
Recent simulations have identified long-lived ``prompt cusps'' -- compact remnants of early density peaks with inner profiles $\rho\propto r^{-3/2}$. They can survive hierarchical assembly and potent…
Computational reproducibility, the possibility for independent researchers to exactly reproduce published empirical results, is fundamental to science. Despite its importance, the proportion of resear…
Optical Character Recognition (OCR) for low-resource languages remains a significant challenge due to the scarcity of large-scale annotated training datasets. Languages such as Kashmiri, with approxim…
In parts of Himachal Pradesh (Kullu and Mandi) and the Western Himalaya, village deities (\emph{devt\=a}) are carried through the landscape on shoulder-borne palanquins or ``raths.'' Participants ofte…
Large Language Models (LLMs) demonstrate remarkable fluency across high-resource languages yet consistently fail to generate coherent text in Kashmiri, a language spoken by approximately seven million…
This technical report presents the 600K-KS-OCR Dataset, a large-scale synthetic corpus comprising approximately 602,000 word-level segmented images designed for training and evaluating optical charact…
Quantitative characterization of cellular spatial organization is critical for understanding tumor progression and immune response. Recent advances in artificial intelligence (AI) enable large-scale s…
Matrix Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) is a cornerstone in biomolecular analysis, offering precise identification of pathogens through unique mass spectral signatures…