Open to Opportunities

Data Scientist & ML Engineer
Building Intelligent Solutions

I build data systems that work in production. Over 4 years, I've shipped ML models that drive real business decisions, from churn prediction that saved millions to recommendation engines that boosted engagement by 20%. My sweet spot? Taking messy data and turning it into clean, scalable pipelines that teams can actually use.

4+
Years Experience
10+
Technical Projects
95%
System Reliability
Prudhvi Marpina

About Me

I build data systems that drive business decisions. 4+ years of experience delivering end-to-end solutions across Python, React, cloud platforms, and ML frameworks. I turn complex problems into scalable systems that serve thousands of users.

Full-stack technical leader: from data pipelines processing millions of records to ML models in production, from React dashboards to cloud architecture on AWS/GCP/Azure. I speak both technical and business languages fluently.

Proven results: 25% faster reporting, 30% fewer pipeline failures, 12% CTR improvement, 95% pipeline reliability. I deliver measurable impact across data science, software engineering, and product roles.

Professional Journey

Software Engineer (Data Focus)

Aug 2025-Present

One Community | California, USA

Social Media Analytics
Time Series Forecasting
A/B Testing
25%
Time Reduction
5,000+
MongoDB Entries
12%
CTR Improvement
1,000+
Users Served

Developed a React frontend with Python API integration using OpenAI GPT to automate reporting summaries, reducing manual reporting time by 25% for 1,000+ users.

Created interactive Python dashboards using Plotly/Dash to visualize 5,000+ MongoDB time entries, providing managers with insights into keyword frequency and volunteer participation trends. Implemented A/B testing on automated post scheduling pipelines, achieving a 12% CTR improvement and successfully guiding the rollout of cron-based posting strategies.

React Python OpenAI GPT Plotly/Dash MongoDB A/B Testing
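
A minimal sketch of how a CTR lift from an A/B test like the one above can be checked for significance, using a pooled two-proportion z-test. The function name and the click and view counts are hypothetical and chosen only for illustration; they are not the project's data or its actual analysis code.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (relative_lift, z, two_sided_p) for the CTR of variant B versus A."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both variants share one rate.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    relative_lift = (p_b - p_a) / p_a
    return relative_lift, z, p_value


# Hypothetical counts: control (A) versus scheduled posting (B).
lift, z, p = two_proportion_z_test(500, 10_000, 560, 10_000)
print(f"lift={lift:.1%}  z={z:.2f}  p={p:.3f}")
```

The point of the sketch is that a relative lift and its statistical significance are separate questions: the same percentage lift can be convincing or inconclusive depending on sample size.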

Data Scientist

Jun 2021-Jul 2023

Tech Friday | Hyderabad, India

ML Model Development
Data Engineering
Cloud Architecture
95%
Pipeline Reliability
25%
Faster Processing
20%
Engagement Boost
30%
Fewer Failures

Architected and implemented serverless data ingestion pipelines using Python, AWS Lambda, and S3 to capture transaction data from multiple APIs and databases, achieving over 95% reliability. Developed PySpark workflows to process, clean, and aggregate millions of transactions into feature tables in BigQuery, reducing data preparation time by 25%.

Deployed machine learning models including logistic regression for churn prediction and k-means clustering for customer segmentation as REST APIs, enabling real-time predictions. Established automated model retraining pipelines using Apache Airflow with comprehensive DAGs for feature refresh and model updates, implementing robust job monitoring that reduced pipeline failures by 30%.

Python AWS Lambda PySpark BigQuery Scikit-learn Apache Airflow REST APIs

Data Analyst

Jun 2019-May 2021

Ahlada Engineers Ltd | Hyderabad, India

Data Analysis
Dashboard Development
Process Optimization
15%
Faster Collections
60%
Less Reporting Time
15%
Better Fulfillment
100%
Data-Driven Culture

Optimized ERP data extraction processes by developing Python scripts using Pandas and SQL workflows integrated with Tally ERP, reducing monthly report preparation time by 30%. Implemented automated reporting solutions for receivables and delivery tracking using scheduled Python jobs, decreasing manual processing effort by 40%.

Designed and developed comprehensive Tableau dashboards connected to SQL databases to monitor plant output, material usage, and order fulfillment metrics, reducing reporting cycles from 5 days to 2 days. Collaborated with supply chain and finance teams to validate ERP data integrity, ensuring consistency and accuracy across 10+ key performance indicators used in monthly business reviews.

Python Pandas SQL Tableau Tally ERP Data Validation

Technical Expertise

Programming & Software Engineering

Languages
Python, TypeScript, React, Node.js, Java, C++, SQL, R
Frameworks & APIs
FastAPI, REST APIs, GitHub, GitHub Actions, CI/CD, Plotly/Dash
Core Concepts
Data Structures, Algorithms, OOP, System Design, Microservices
Data Libraries
Pandas, NumPy, SciPy, Matplotlib, Seaborn, Scikit-learn

Cloud & Distributed Systems

Cloud Platforms
AWS (Lambda, S3, DynamoDB, ECS, Redshift, SageMaker), GCP (BigQuery), Azure
Big Data & Processing
Apache Spark, PySpark, Kafka, Airflow, Databricks, Snowflake, Hadoop, Hive
Databases & Storage
MongoDB, BigQuery, Redis, ChromaDB, PostgreSQL, ETL/ELT
DevOps & Infrastructure
Docker, Kubernetes, Terraform, CI/CD, MLflow, Git, GitHub

Machine Learning & AI

ML Frameworks
PyTorch, TensorFlow, Scikit-learn, XGBoost, Hyperparameter Tuning, MLflow
NLP & LLMs
spaCy, Hugging Face, Large Language Models, RAG, Generative AI
Model Deployment
Model Deployment, Recommendation & Ranking Systems, Model Registry
Data Science
Statistics, Statistical Modeling, EDA, Feature Engineering, A/B Testing

Data Science & Analytics

Analytics & Visualization
Statistics, Statistical Modeling, EDA, Feature Engineering, A/B Testing, Causal Inference
Tools & Libraries
Pandas, NumPy, Matplotlib, Tableau, Power BI, Time Series Forecasting
Business Skills
Metrics Design, Experimentation, Stakeholder Management, ROI Analysis
Domain Expertise
Customer Analytics, Fraud Detection, Supply Chain, Time Series, Recommendation Systems

Technical Projects

Agentic BI: Conversational Data Analysis

Featured
Agentic BI Architecture - LangChain Agents with Mistral AI

Developed a conversational business intelligence system using LangChain agents and Mistral AI. The system automatically converts natural language queries to SQL, generates interactive Plotly visualizations, and provides AI-powered insights. Built with Streamlit for seamless user experience.

100%
Automated
3 Agents
Specialized
LangChain Mistral AI Streamlit Plotly SQLite
View on GitHub
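
A rough sketch of one step in a natural-language-to-SQL flow: running model-generated SQL against SQLite only when it is a single SELECT statement. The `run_readonly_query` helper, the `sales` table, and the generated query string are assumptions made for illustration; the LangChain and Mistral AI parts of the project are not reproduced here.

```python
import sqlite3


def run_readonly_query(conn, sql):
    """Execute a generated SQL string only if it is a single SELECT statement."""
    statement = sql.strip().rstrip(";").strip()
    # Conservative check: any remaining semicolon is treated as a second statement.
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(statement)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Stand-in for the SQL an agent might produce from "total sales by region".
generated_sql = (
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region;"
)
rows = run_readonly_query(conn, generated_sql)
print(rows)
```

Returning rows as dictionaries keeps the result easy to hand to a charting layer such as Plotly. The semicolon check is deliberately strict and would also reject a semicolon inside a string literal.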

UTDIGI - University Chatbot

UTDIGI University Chatbot - RAG System with LangChain

Built a RAG-powered chatbot using LangChain and Sentence Transformers. Implemented web scraping with Scrapy and Playwright to extract university content. Created FastAPI backend with Redis caching and Streamlit frontend for real-time conversations.

3,000+
Documents
25%
Relevance Boost
LangChain FastAPI ChromaDB Streamlit Scrapy Redis
View on GitHub
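
A small sketch of the retrieval step in a RAG system like this one: ranking stored document embeddings by cosine similarity to a query embedding and keeping the top k. The three-dimensional vectors and document names are toy values invented for illustration; in the project, embeddings come from Sentence Transformers and are stored in ChromaDB.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings (hypothetical); real ones have hundreds of dimensions.
docs = {
    "admissions": [0.9, 0.1, 0.0],
    "housing": [0.1, 0.9, 0.1],
    "parking": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]
result = top_k(query, docs, k=2)
print(result)
```

The retrieved documents would then be passed to the language model as context for the answer.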

Market Optimization for Gardein

Market Optimization for Gardein - ML-Driven Business Analytics

Conducted comprehensive market analysis using Scikit-learn clustering and regression algorithms. Analyzed 3 years of sales data to identify $10M+ revenue opportunities. Created interactive visualizations with Pandas, Matplotlib, and Seaborn for regional market segmentation.

$10M+
Revenue Impact
25%
Sales Potential
Scikit-learn Pandas Matplotlib Seaborn Clustering Regression

TrustGuard ML: Blockchain Security

TrustGuard ML - Blockchain Fraud Detection System

Developed a fraud detection system for blockchain transactions using Scikit-learn algorithms. Processed 78,600+ metaverse transactions with Pandas data preprocessing. Implemented GridSearchCV for hyperparameter tuning and achieved 96% accuracy in fraud detection.

96%
Accuracy
20%
Security Boost
Scikit-learn GridSearchCV Pandas NumPy Regression Clustering
View on GitHub

EduBot: AI Knowledge Companion

EduBot AI Knowledge Companion - BART and spaCy NLP System

Built an NLP system using BART and spaCy for knowledge extraction. Integrated structured SPARQL queries with Wikipedia API data. Implemented text summarization and entity recognition, achieving 95% accuracy with 50% faster response times.

95%
Accuracy
50%
Faster Responses
BART spaCy NLTK Streamlit SPARQL Wikipedia API
View on GitHub

Sentiment Analysis: Israel-Palestine Conflict

Sentiment Analysis on Reddit Data - VADER and NLP Analysis

Conducted comprehensive sentiment analysis on Reddit discussions about the Israel-Palestine conflict using VADER sentiment analysis. Collected data using PRAW API, performed text preprocessing, and created visualizations to track sentiment trends over time.

VADER
Sentiment Analysis
Reddit API
Data Collection
Python PRAW VADER Pandas Matplotlib NLP Text Mining
View on GitHub
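
A minimal sketch of how sentiment trends over time can be derived from VADER compound scores: each score is mapped to a label and the labels are counted per month. The ±0.05 cutoffs are the commonly used VADER convention and are assumed here rather than taken from the project; the dates and scores are made up, and the scoring itself (done by VADER in the project) is not shown.

```python
from collections import Counter, defaultdict


def label_from_compound(compound):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def monthly_label_counts(scored_posts):
    """Count sentiment labels per month from (iso_date, compound) pairs."""
    counts = defaultdict(Counter)
    for iso_date, compound in scored_posts:
        month = iso_date[:7]  # "YYYY-MM"
        counts[month][label_from_compound(compound)] += 1
    return {month: dict(c) for month, c in sorted(counts.items())}


# Hypothetical (date, compound score) pairs for already-scored posts.
posts = [
    ("2024-01-05", -0.62),
    ("2024-01-18", 0.01),
    ("2024-02-02", 0.40),
    ("2024-02-20", -0.30),
]
trend = monthly_label_counts(posts)
print(trend)
```

The resulting per-month counts can be loaded into a Pandas DataFrame and plotted with Matplotlib.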

Education

Master's in Data Science and AI

The University of Texas at Dallas

GPA: 3.97 / 4.0

Bachelor's in Electrical and Electronics Engineering

Birla Institute of Technology and Science, Pilani

Awards & Recognition

Lars Magnus Ericsson Fellowship in Management

Prestigious scholarship awarded by The University of Texas at Dallas recognizing excellence in management and leadership within the technology sector. This fellowship highlights exceptional potential for driving innovation and organizational success in technology companies.

Certifications

Applied LLM and AI Agent Engineering

Gained expertise in advanced AI, specializing in LLMs, RAG, and Agentic AI. Built solutions using Google Gemini API and LangChain for robust RAG with embeddings and vector stores. Skilled in creating AI agents with function calling, LangGraph, prompt engineering, fine-tuning, and guardrails for reliable results.

LLMs RAG LangChain Agentic AI Prompt Engineering

AI Data Engineering Specialization

Gained hands-on experience in the data engineering lifecycle including ingestion, storage, transformation, and serving using Apache Spark, Airflow, Kafka, dbt, and AWS to build scalable data systems.

Apache Spark Airflow Kafka dbt AWS

SnowPro Associate: Platform Certification

Demonstrated expertise in Snowflake Data Cloud architecture, data loading, performance optimization, security, and governance across the platform.

Snowflake Data Cloud Performance Optimization Data Governance

Clubs & Organizations

Envision

Data Visualization Club, UT Dallas

Participated in data hackathons, visualization challenges, and technical events focused on Power BI, Tableau, and storytelling with data.

AWS Club

UT Dallas

Participated in cloud hackathons, workshops, and certification events focused on AWS technologies and career development.

Nirmaan

BITS Pilani Alumni NGO

Contributed to education and community development initiatives as part of a BITS Pilani alumni-led NGO focused on inclusive and sustainable social impact.

Get in Touch

I'm always interested in discussing new opportunities in data science and machine learning. Let's connect!