Abhijay Sai Paladugu

About Me

I am a Master's student in Computational Data Science at Carnegie Mellon University, with expertise in machine learning, data science, and artificial intelligence. My research interests include Retrieval-Augmented Generation (RAG), LLMs, Natural Language Processing, Recommendation Systems, and link prediction algorithms.

With experience in developing ML models for real-world applications and conducting research in various ML domains, I am passionate about creating solutions that make a significant impact.

Education

Carnegie Mellon University

Master of Computational Data Science

GPA: 4.0/4.0 | Pittsburgh, PA | 12/2025

Vellore Institute of Technology

B.Tech in Computer Science and Engineering with specialization in Data Science

GPA: 9.2/10 | Vellore, India | 05/2024

Professional Experience

08/2025 – Present

Data Science and Research Fellow

McKinsey & Company | Pittsburgh, PA

Creating data-driven solutions by building interview chatbots, automating data preparation workflows, & developing analytics dashboard websites.
Built an app with an agentic system that maps taxonomies from various badly formatted sources to a standardized taxonomy. The app also includes a chatbot that answers questions, performs calculations on the data, and gives well-formatted answers specific to client use cases.
Built a chatbot compatible with Power BI dashboards that uses an agent to route between three actions: a RAG system for knowledge-based questions, an analytics SQL agent for data-related questions, and a conversational agent for general conversation.

04/2025 – 08/2025

Machine Learning Research Intern

LTI, SCS CMU | Pittsburgh, PA

Built Critique-based RL agents (GRPO variant on VERL) for multi-hop, citation-grounded outputs; reduced GPU memory usage by >50% and improved training efficiency via sequence parallelism and GPU offloading.
Developed healthcare-specific DeepResearch agents for long-form reasoning on complex medical queries using structured prompting.

February 2025 - May 2025

Part-time / Research Assistant

Carnegie Mellon University | Pittsburgh, PA

Worked on AI patent classification: enhanced classifiers to identify AI-related patents in a dataset of 1M+ US patents, with the eventual goal of quantifying the effect of AI on the economies of various countries.

May 2023 - January 2024

Data Science Intern

Now Analytics, LLC | Davie, FL

Developed a machine-learning model to predict daily customer call volumes and staffing needs, replacing a manual weekly process and saving 6-7 hours of manual work per day, resulting in significant cost savings.
Implemented time-series models such as ARIMA and SARIMAX, reducing the error percentage (MAPE) from 18.2% to 3.9%.
Designed a hiring app for call center applicants that integrates OpenAI's Whisper for audio handling (above 95% accuracy on our audio file transcripts) and uses TF-IDF with cosine similarity to score applicants' audio quality; built with a ReactJS and Django tech stack and a SQLite database.

May 2022 - September 2022

Machine Learning Intern

Curvelogics Advanced Technology Solutions | Trivandrum, India

Extracted vital information about companies and job applicants using spaCy Transformers.
Visualized data to build a report using Python (WordCloud, bar/pie charts, etc.) and identified trends and key skills sought by hiring teams.
Evaluated various ML classifiers like SVC, MLPClassifier, DecisionTreeClassifier, AdaBoostClassifier, RandomForest, Keras (Neural Networks) for sentiment analysis on tweets and Amazon reviews, achieving an AUC score of 0.73.

Research Work

Behavior Priming for RL and SFT trained Agentic Search

Submitting to ACL 2026

CMU, Pittsburgh | 06/2025 – 08/2025

https://arxiv.org/abs/2510.06534

Introduced Behavior Priming, a Supervised Fine-Tuning strategy with 4 essential behaviors, enabling more effective RL training for search agents.
Improved Qwen3-1.7B's post-RL performance from 13.9 to 22.3 across the GAIA, WebWalker, and HLE benchmarks; sustained high policy entropy prevented premature policy collapse, unlocking greater self-improvement.

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research

Under review at "ICLR 2026"

CMU, Pittsburgh | 02/2025 – 05/2025

https://arxiv.org/abs/2505.19253

Designed a standardized evaluation framework for DeepResearch agents (GPTResearcher, WebThinker, etc.) with reproducible pipelines and custom-built retrieval/search APIs.
Established benchmarking protocols enabling fair comparison of long-form reasoning agents and faster research in agentic search and evaluation.

Link Prediction using Single-Valued Neutrosophic Sets (SVNS)

Under review by "Expert Systems with Applications" Elsevier Journal

VIT, Vellore | May 2023 - August 2023

Introduced a unique framework for link prediction, applicable to movie recommendation systems, that leverages SVNS, cosine similarity, and regression models; tested it on 3 datasets: MovieLens, MovieTweetings, and Anime Recommendation Dataset.
Assessed 20 ML regression models, achieving R² values of 0.91, 0.94, and 0.75, nearly doubling the baseline performance.
Models included Extra Trees, LightGBM, and XGBoost, and significant decreases in MAE, MSE, RMSE, RMSLE, MAPE were observed.

Detecting Artificially Generated Images using SWIN with Explainable AI

Under review by "Pattern Recognition" Elsevier Journal

VIT, Vellore | January 2024 - May 2024

Introduced a new architecture using SWIN Transformers and Grad-CAM to detect whether an image is AI-generated or real, and compared the performance of 4 variants of the model against various state-of-the-art (SOTA) detection models.
Trained the model on images from 3 SOTA image generation models, namely Stable Diffusion XL, Latent Diffusion, and StyleGAN3. The model was tested on 14 SOTA image generation models, achieving the highest accuracy of 99.80%.

Integrating ML Algorithms in Graph Database for Link Prediction

Accepted by "International Conference on Informatics (ICI) 2023" IEEE Conference

VIT, Vellore | September 2022 - January 2023

Authored a paper highlighting the limitations of ML models and basic NLP techniques for link prediction and the importance of advanced NLP techniques for more accurate insights; achieved a highest accuracy of 96.76%.
Examined the performance of various ML and NLP approaches, including XGBoost, RandomForest, ANN, and Word2vec embeddings.

Relevant Projects

Implementation of Sparse Attention Optimization

CMU Course Project

Implemented sparse attention mechanisms within the Needle framework, focusing on the "fixed" attention pattern that combines local windowed attention with strided global attention.
Developed both Python-level attention mechanisms with additive masking and custom CUDA kernels for accelerated sparse computation.
Implemented and compared multiple positional embedding mechanisms (Learned, Sinusoidal, and RoPE) to enable models to handle longer sequences beyond their training length.

Agentic LongBench: Evaluating Long-Context Capabilities in Agentic Search

CMU Research Project

Developed a comprehensive evaluation framework for assessing long-context capabilities in agentic search systems.
Designed evaluation protocols across four agentic domains (Search, Code, Reasoning, Tool-use) and integrated three synthetic long-context benchmarks into the framework.
Implemented a Consistency-Accuracy Index (CAI) metric to analyze task correlations and model consistency across different settings and tasks.

RAG Evaluation and Benchmarking

February 2025 - May 2025

Advisor: Dr. Chenyan Xiong

Collaborated with Amazon to design and implement an end-to-end multimodal Retrieval-Augmented Generation (mRAG) pipeline, integrating LLMs with specific image and text retrieval engines for diverse visual question answering datasets, thereby enhancing context accuracy and mitigating hallucinations.

Question Answering (Q&A) System for Pittsburgh and CMU

September 2024 - October 2024

Developed a retrieval-augmented generation (RAG) system using LangChain, FAISS, and few-shot prompting to enhance the performance of the Mistral-7B model for factual Q&A about Pittsburgh and CMU, achieving a 61% F1-score with no fine-tuning.

ChunkedCodeRAG: Enhancing RAG for Code

October 2024 - December 2024

Improved on CodeRAG-Bench by using fixed-size and semantic pre-retrieval chunking in the retrieval pipeline, obtaining a ~31% increase over the baseline values on the ODEX (open-domain) coding tasks dataset.

Movie Recommendation System in Production

March 2025 - May 2025

CMU SEAI Group Project

Built a scalable movie recommendation system using collaborative filtering with matrix factorization (SVD) and user segmentation.
Containerized the inference service with Docker and orchestrated it using `docker-compose`; automated model retraining via cron jobs.
Used Prometheus & Grafana for monitoring model accuracy and system health; implemented A/B testing for model evaluation.
Tracked model lineage and versions with Weights & Biases (W&B) to ensure reproducibility and provenance of predictions.

Scalable Twitter Analytics Web Service

Jan 2025 – May 2025

CMU Cloud Computing Team Project

Engineered a cloud-native microservice system to process & serve insights over 1TB of Twitter data using EKS, RDS MySQL, and Spark on Azure.
Achieved 34k RPS under budget constraints via schema optimization, Java Vert.x web-tier, and AWS ARM-based instances with auto-scaled deployment.
Used Helm, Terraform, and GitHub Actions for end-to-end CI/CD; monitored performance via Prometheus, Grafana, and AWS CloudWatch.
Placed 5th in live benchmark test (77.5/80) by fine-tuning query pipelines, deploying sidecar authentication, and optimizing container orchestration.

Agentic Melody Generator with LlamaIndex

April 2025 - May 2025

Independent Research

Built a generative music system using LlamaIndex and a multi-agent framework consisting of Melody, Harmony, Rhythm, Arrangement, and Critique agents.
Each agent independently handled its musical aspect using natural language reasoning, contributing to the progressive refinement of musical pieces.
Implemented inter-agent communication using LLM-based agent orchestration and context-passing via dynamic memory modules in LlamaIndex.
Critique Agent evaluated the musical output and guided iterative improvements based on stylistic coherence and harmonic balance.
Exported generated melodies into MIDI format for playback and further processing with DAWs like Ableton and LMMS.

Older Projects

2023

Farm Laws Impact Analysis Using FCMs

January 2023 - March 2023

Analyzed the impact of Farm Laws in India (2020) on farmers' sentiments using Fuzzy Cognitive Maps (FCMs).
Analyzed crop yield, farmers' profits, and other Agri-sector influences.
Employed Python modules including NumPy, NetworkX, Pandas, and Matplotlib.

Poetry Generation using GPT-2

September 2023 - December 2023

Fine-tuned GPT-2 to generate poems from a first line supplied by the user.
Built a simple interface where users can start a poem, and the model finishes it.

2022

Student Grade Prediction Analysis

January 2022 - March 2022

Performed in-depth data analysis on a Student Grade Prediction Dataset to visualize relationships between attributes.
Derived insights for predicting future/existing data trends from extensive visualization and analysis on a dataset of 30 attributes.
Utilized Python libraries including Matplotlib, NumPy, Plotly, Seaborn, and WordCloud.

Fake Job Posting Prediction System

July 2022 - October 2022

Developed a system to determine the legitimacy of job postings employing ML and NLP algorithms, achieving high accuracy and F1-score.
Implemented ML models: Logistic Regression, RandomForest, SVC, XGBoost, LSTM, and ANN.
Employed NLP methods: GloVe, BERT, and CountVectorizer.

Data Scientist & Machine Learning Engineer