Arrow Research · Search

Author name cluster

Xia Ning

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

5 papers
2 author rows

Possible papers (5)

NeurIPS 2025 · Conference Paper

Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models

  • Jiachen Jiang
  • Jinxin Zhou
  • Bo Peng
  • Xia Ning
  • Zhihui Zhu

Achieving better alignment between vision embeddings and Large Language Models (LLMs) is crucial for enhancing the abilities of Multimodal LLMs (MLLMs), particularly for recent models that rely on powerful pretrained vision encoders and LLMs. A common approach to connect the pretrained vision encoder and LLM is through a projector applied after the vision encoder. However, the projector is often trained to enable the LLM to generate captions, and hence the mechanism by which LLMs understand each vision token remains unclear. In this work, we first investigate the role of the projector in compressing vision embeddings and aligning them with word embeddings. We show that the projector significantly compresses visual information, removing redundant details while preserving essential elements necessary for the LLM to understand visual content. We then examine patch-level alignment, that is, the alignment between each vision patch and its corresponding semantic words, and propose a multi-semantic alignment hypothesis. Our analysis indicates that the projector trained by caption loss improves patch-level alignment but only to a limited extent, resulting in weak and coarse alignment. To address this issue, we propose patch-aligned training to efficiently enhance patch-level alignment. Our experiments show that patch-aligned training (1) achieves stronger compression capability and improved patch-level alignment, enabling the MLLM to generate higher-quality captions, and (2) improves the MLLM's performance by 16% on referring expression grounding tasks, 4% on question-answering tasks, and 3% on modern instruction-following benchmarks when using the same supervised fine-tuning (SFT) setting. The proposed method can be easily extended to other multimodal models.
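
As a minimal, hypothetical probe of the patch-level alignment the abstract describes, one can retrieve, for each projected vision patch, its most cosine-similar word embeddings; the array shapes, sizes, and function name below are illustrative assumptions, not the paper's code.

```python
# Hypothetical probe of patch-level alignment: for each projected vision
# patch, retrieve the most cosine-similar word embeddings. Shapes, sizes,
# and names are illustrative assumptions, not the paper's implementation.
import numpy as np

def patch_word_alignment(patch_emb: np.ndarray, word_emb: np.ndarray, top_k: int = 5) -> np.ndarray:
    """patch_emb: (num_patches, d) projector outputs; word_emb: (vocab, d) LLM
    word embeddings. Returns the top-k most similar word indices per patch."""
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    w = word_emb / np.linalg.norm(word_emb, axis=1, keepdims=True)
    sim = p @ w.T                                # (num_patches, vocab) cosine similarity
    return np.argsort(-sim, axis=1)[:, :top_k]   # indices of top-k words per patch

# Under the multi-semantic alignment hypothesis, a well-aligned patch should
# retrieve several semantically related words rather than one dominant token.
rng = np.random.default_rng(0)
top_words = patch_word_alignment(rng.normal(size=(576, 64)), rng.normal(size=(1000, 64)))
print(top_words.shape)  # (576, 5)
```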

ICLR 2025 · Conference Paper

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

  • Ziru Chen
  • Shijie Chen
  • Yuting Ning
  • Qianheng Zhang
  • Boshi Wang
  • Botao Yu
  • Yifei Li 0005
  • Zeyi Liao

The advancements of large language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, which has sparked both excitement and skepticism about the true capabilities of such agents. In this work, we argue that for an agent to fully automate scientific discovery, it must be able to complete all essential tasks in the workflow. Thus, we call for rigorous assessment of agents on individual tasks in a scientific workflow before making bold claims on end-to-end automation. To this end, we present ScienceAgentBench, a new benchmark for evaluating language agents for data-driven scientific discovery. To ensure the scientific authenticity and real-world relevance of our benchmark, we extract 102 tasks from 44 peer-reviewed publications in four disciplines and engage nine subject matter experts to validate them. We unify the target output for every task to a self-contained Python program file and employ an array of evaluation metrics to examine the generated programs, execution results, and costs. Each task goes through multiple rounds of manual validation by annotators and subject matter experts to ensure its annotation quality and scientific plausibility. We also propose two effective strategies to mitigate data contamination concerns. Using our benchmark, we evaluate five open-weight and proprietary LLMs, each with three frameworks: direct prompting, OpenHands, and self-debug. Given three attempts for each task, the best-performing agent can only solve 32.4% of the tasks independently and 34.3% with expert-provided knowledge. These results underscore the limited capacities of current language agents in generating code for data-driven discovery, let alone end-to-end automation for scientific research.
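
As a toy illustration of the "three attempts per task" scoring the abstract describes (each task's target output is a self-contained Python program), here is a hypothetical harness; the real benchmark also scores program outputs with task-specific metrics, which this sketch reduces to an exit-code check.

```python
# Toy harness for three-attempt task scoring: each candidate is a
# self-contained Python file, and a task counts as solved if any attempt
# succeeds. The file layout and the success test are assumptions.
import subprocess
import sys

def attempt_passes(program_path: str, timeout_s: int = 300) -> bool:
    """Run one generated program; here 'success' is reduced to exit code 0."""
    try:
        result = subprocess.run([sys.executable, program_path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def solve_rate(tasks: dict[str, list[str]]) -> float:
    """tasks maps a task id to up to three attempt file paths."""
    solved = sum(any(attempt_passes(path) for path in attempts[:3])
                 for attempts in tasks.values())
    return solved / len(tasks)
```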

ICML 2024 · Conference Paper

eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data

  • Bo Peng 0009
  • Xinyi Ling
  • Ziru Chen
  • Huan Sun 0001
  • Xia Ning

Despite tremendous efforts on developing effective e-commerce models, conventional e-commerce models show limited success in generalist e-commerce modeling and suffer from unsatisfactory performance on new users and new products – a typical out-of-domain generalization challenge. Meanwhile, large language models (LLMs) demonstrate outstanding performance in generalist modeling and out-of-domain generalizability in many fields. Toward fully unleashing their power for e-commerce, in this paper, we construct ECInstruct, the first open-sourced, large-scale, and high-quality benchmark instruction dataset for e-commerce. Leveraging ECInstruct, we develop eCeLLM, a series of e-commerce LLMs, by instruction-tuning general-purpose LLMs. Our comprehensive experiments and evaluation demonstrate that eCeLLM models substantially outperform baseline models, including the most advanced GPT-4, and the state-of-the-art task-specific models in in-domain evaluation. Moreover, eCeLLM exhibits excellent generalizability to out-of-domain settings, including unseen products and unseen instructions, highlighting its superiority as a generalist e-commerce model. Both the ECInstruct dataset and the eCeLLM models show great potential in empowering versatile and effective LLMs for e-commerce. ECInstruct and eCeLLM models are publicly accessible through this link.
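
To make the instruction-tuning setup concrete, here is a minimal, hypothetical sketch of turning one e-commerce example into a supervised fine-tuning pair; the prompt template and field names are assumptions, not the released ECInstruct schema.

```python
# Hypothetical sketch of preparing one instruction-tuning example in the style
# the abstract describes. The prompt template and field names are assumptions,
# not the released ECInstruct schema.
def format_example(instruction: str, input_text: str, output_text: str) -> dict:
    """Render one e-commerce example as a supervised prompt/completion pair."""
    prompt = (f"### Instruction:\n{instruction}\n\n"
              f"### Input:\n{input_text}\n\n"
              f"### Response:\n")
    return {"prompt": prompt, "completion": output_text}

example = format_example(
    instruction="Classify the sentiment of this product review.",
    input_text="The charger stopped working after two days.",
    output_text="negative",
)
print(example["prompt"] + example["completion"])
```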

JBHI 2019 · Journal Article

Mining Directional Drug Interaction Effects on Myopathy Using the FAERS Database

  • Danai Chasioti
  • Xiaohui Yao
  • Pengyue Zhang
  • Samuel Lerner
  • Sara K. Quinney
  • Xia Ning
  • Lang Li
  • Li Shen

Mining high-order drug-drug interaction (DDI) induced adverse drug effects from electronic health record databases is an emerging area, and very few studies have explored the relationships between high-order drug combinations. We investigate a novel pharmacovigilance problem for mining directional DDI effects on myopathy using the FDA Adverse Event Reporting System (FAERS) database. Our paper provides information on the risk of myopathy associated with adding new drugs to an already prescribed medication, and visualizes the identified directional DDI patterns as a user-friendly graphical representation. We utilize the Apriori algorithm to extract frequent drug combinations from the FAERS database. We use the odds ratio to estimate the risk of myopathy associated with directional DDI. We create a tree-structured graph to visualize the findings for easy interpretation. Our method confirmed the myopathy association with previously reported HMG-CoA reductase inhibitors such as rosuvastatin, fluvastatin, simvastatin, and atorvastatin. New, previously unidentified but mechanistically plausible associations with myopathy were also observed, such as the DDI between pamidronate and levofloxacin. Additional top findings include gadolinium-based imaging agents, which, however, are often used in myopathy diagnosis. Other DDIs with no obvious mechanism are also reported, such as that of sulfamethoxazole with trimethoprim and potassium chloride. This study shows the feasibility of estimating high-order directional DDIs in a fast and accurate manner. The results of the analysis could become a useful tool in the specialists’ hands through an easy-to-understand graphical visualization.
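
As a minimal sketch of the directional odds-ratio idea from the abstract: restrict to reports that already contain drug A, then compare the odds of myopathy when drug B is added versus when it is not. The counts below are invented for illustration; the paper estimates them from FAERS reports mined with the Apriori algorithm.

```python
# Minimal sketch of a directional odds ratio: among reports that already
# contain drug A, compare the odds of myopathy when drug B is added versus
# when it is not. The counts below are invented for illustration.
def directional_odds_ratio(myo_ab: int, nomyo_ab: int, myo_a: int, nomyo_a: int) -> float:
    """2x2 table restricted to reports containing drug A:
    rows = myopathy yes/no, columns = drug B added / not added."""
    return (myo_ab / nomyo_ab) / (myo_a / nomyo_a)

# Hypothetical counts: adding B on top of A triples the odds of myopathy.
print(directional_odds_ratio(myo_ab=30, nomyo_ab=100, myo_a=50, nomyo_a=500))  # 3.0
```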

IJCAI 2015 · Conference Paper

Multi-Task Multi-Dimensional Hawkes Processes for Modeling Event Sequences

  • Dixin Luo
  • Hongteng Xu
  • Yi Zhen
  • Xia Ning
  • Hongyuan Zha
  • Xiaokang Yang
  • Wenjun Zhang

We propose a Multi-task Multi-dimensional Hawkes Process (MMHP) for modeling event sequences in which multiple triggering patterns exist within sequences and shared structure exists across sequences. MMHP is able to model the dynamics of multiple sequences jointly by imposing structural constraints, and thus systematically uncovers the clustering structure among sequences. We propose an effective and robust optimization algorithm to learn MMHP models, which takes advantage of the alternating direction method of multipliers (ADMM), majorization-minimization, and Euler-Lagrange equations. Our experimental results demonstrate that MMHP performs well on both synthetic and real data.
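
For reference, the standard multi-dimensional Hawkes intensity that MMHP builds on, written in textbook notation rather than necessarily the paper's exact formulation: the event rate for each dimension combines a baseline rate with excitation from past events.

```latex
% Textbook multi-dimensional Hawkes intensity; MMHP's multi-task variant ties
% such parameters across sequences (tasks) through structural constraints.
\lambda_u(t) = \mu_u + \sum_{i \,:\, t_i < t} a_{u u_i}\, \kappa(t - t_i)
% \mu_u: baseline rate of dimension u;   a_{u u'}: infectivity of u' on u;
% \kappa: a decay kernel, e.g. \kappa(t) = e^{-\beta t}.
```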