Ishita 

Driven to make AI safe, reliable, and accessible to all.

Overview

Founding Engineer and AI Lead @Storefox.ai

AI Researcher @Hashkraft

Computer Vision Research Intern @Coriolis

Research Intern @DESY

Hyderabad, India

Social Links

About

I'm an AI engineer with 3+ years of industry experience building robust, scalable, and intelligent systems at the intersection of machine learning, software engineering, and AI safety. My work spans the development of production-ready ML pipelines, large language model (LLM) applications, and scalable cloud infrastructure.

I currently work at Storefox.ai, where I lead the development of audio intelligence systems that convert raw retail conversations into actionable business insights through custom-built pipelines, robust LLM evaluations, and strong engineering practices. My contributions have not only driven measurable improvements in accuracy and efficiency, but also shaped product direction through client collaboration and team leadership.

Previously, I built LLM-driven hiring platforms and chatbots for the real estate sector at Hashkraft, developed face recognition systems at scale with Coriolis Technologies, and contributed to scientific ML research at institutes like CERN, DESY, and IISER Pune. I’ve worked across diverse domains—computer vision, NLP, anomaly detection, and particle physics—always focusing on applying AI in meaningful, interpretable, and responsible ways.

I'm also passionate about AI safety and alignment, having participated in the Whitebox Research AI Safety Fellowship, where I explored concepts in interpretability and worked on evaluating the unfaithfulness of chain-of-thought reasoning in LLMs.

My tech stack includes tools like Python, PyTorch, FastAPI, Docker, Kubernetes, and AWS, along with modern AI-focused platforms such as Portkey, Instructor, Claude, and ChatGPT. Whether it’s deploying deep learning models at scale, building RAG pipelines with vector DBs, or improving system reliability with tools like Ansible and Kafka, I enjoy working across the full stack to turn research into impact.

Let's connect and collaborate!

Stack

Experience

Storefox.ai

Current Employer
  • Led the end-to-end design and implementation of a scalable audio processing pipeline for analyzing customer-representative interactions in retail environments, delivering actionable insights to drive business performance in physical stores.
  • Developed a robust system where raw audio is captured and cleaned using Silero Voice Activity Detection (VAD) to remove silent segments, followed by transcription via Gemini LLM models optimized for noisy retail settings.
  • Designed a two-tiered processing architecture: internal clips were filtered and categorized using LLMs, while external clips were further analyzed to generate structured insights tailored to client needs.
  • Instrumental in the development, testing, and validation of each pipeline component—built custom evaluation datasets and benchmarks to compare model performance, prompts, and filtering strategies.
  • Achieved a consistent 80% accuracy across all insight-generation tasks, earning positive feedback from over 15 active clients.
  • Reduced audio processing time by 50% (from 3 minutes to 1.5 minutes) and cut cost per hour of audio from $100 to $20, enabling the introduction of multiple pricing tiers for scalability.
  • Acted as the technical lead and manager for 4 interns, guiding their workstreams and significantly improving team productivity through structured task delegation and mentoring.
  • Worked closely with clients to understand domain-specific needs and customized the pipeline accordingly, resulting in high client satisfaction and successful adoption across diverse use cases.
  • Collaborated cross-functionally with product, frontend, and backend engineering teams to align development with business goals, effectively multitasking across responsibilities and delivering high-impact results.
  • Azure Cloud
  • GCP
  • Speech-to-Text
  • AWS
  • AWS SageMaker
  • Prompt Engineering
  • LLMs
  • GPT/Gemini models
  • LLM evaluations
  • Instructor
  • MongoDB
  • Jinja2
  • Docker
  • Redis
  • Celery
  • Python
  • AI/ML
  • API design
  • Portkey
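The silence-removal step described above uses Silero VAD, a neural voice-activity model. As a rough illustration of the general idea only (not Storefox's actual code, which is not shown here), a minimal energy-based VAD can be sketched in pure Python; frame size, threshold, and the sample data are all hypothetical:

```python
# Illustrative energy-based voice-activity detection. The production
# pipeline uses Silero VAD (a neural model); this sketch only shows
# the underlying idea of dropping low-energy (silent) frames.

def frame_energies(samples, frame_size):
    """Mean squared amplitude per fixed-size frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def keep_voiced(samples, frame_size=4, threshold=0.01):
    """Keep only frames whose energy clears the threshold."""
    voiced = []
    for idx, energy in enumerate(frame_energies(samples, frame_size)):
        if energy >= threshold:
            voiced.extend(samples[idx * frame_size:(idx + 1) * frame_size])
    return voiced

# Near-silence followed by "speech" (larger amplitudes).
audio = [0.0, 0.001, -0.001, 0.0, 0.5, -0.4, 0.6, -0.5]
cleaned = keep_voiced(audio)
print(len(audio), "->", len(cleaned))  # 8 -> 4
```

A real VAD classifies frames with a trained model rather than a fixed threshold, which is what makes it robust in noisy retail settings.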

Hashkraft

  • Developed a scalable AI-driven hiring platform that streamlined candidate evaluation, matching, and communication, with an estimated 50% reduction in hiring time through automation and intelligent retrieval.
  • Leveraged cutting-edge LLMs (GPT-4) alongside Retrieval-Augmented Generation (RAG) pipelines using LangChain, integrating with Chroma vector database to enable real-time semantic candidate-job matching.
  • Conducted detailed benchmarking and evaluations of different LLM configurations, embeddings, and RAG architectures to optimize cost, latency, and accuracy.
  • Led the design and deployment of an AI-based real estate chatbot, allowing users to search, query, and retrieve property listings using natural language.
  • Utilized GPT-4 for conversational context handling and Elasticsearch for fast and accurate property data retrieval across diverse queries.
  • Designed the entire system architecture, ensuring efficient state management, query parsing, and document retrieval, resulting in highly responsive and accurate user interactions.
  • Deployed the chatbot to production using AWS services (EC2, S3, Lambda), ensuring scalability, security, and uptime.
  • Managed and mentored a team of two interns, overseeing their project pipelines, providing code reviews, and ensuring timely deliverables aligned with business goals.
  • Set up project tracking, documentation workflows, and weekly syncs, improving team collaboration and technical output.
  • Drove cross-functional collaboration between backend, frontend, and product teams to ensure seamless integration and user experience.
  • LLMs
  • GPT/Gemini models
  • RAG
  • Langchain
  • Pinecone
  • Chroma
  • NLP
  • AWS
  • Elasticsearch
  • Docker
  • Scrapy
  • Jira
  • GitHub
  • WhatsApp API
  • API design
  • Prompt Engineering
  • Python
  • Machine Learning
  • AI/ML
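The semantic candidate-job matching above ran on embeddings stored in a Chroma vector database. As a hedged stand-in for a real embedding model, the core retrieval idea can be sketched with bag-of-words vectors and cosine similarity; the function names and sample texts are hypothetical:

```python
# Sketch of vector-similarity matching. The production system embedded
# candidates and jobs with an LLM embedding model and queried Chroma;
# here a simple token-count vector stands in for real embeddings.
import math
from collections import Counter

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(job_description, candidates):
    jv = vectorize(job_description)
    return max(candidates, key=lambda c: cosine(jv, vectorize(c)))

candidates = [
    "python engineer with ML pipeline experience",
    "frontend developer react typescript",
]
print(best_match("machine learning engineer python", candidates))
```

Swapping the token-count vectors for dense embeddings and the `max` scan for an indexed nearest-neighbour query is what makes this scale to real candidate pools.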

Coriolis Technologies

  • Developed AI-based surveillance software with a team of 5-6 people.
  • Deployed face recognition algorithms at scale on Spark, achieving 75% accuracy.
  • Optimised object detection on Spark, increasing processing speed from 16 to 30 frames/s.
  • Improved UI loading time, reducing it from 1-2 minutes to 15 seconds.
  • Developed a scalable and fault-tolerant system that can process 60 million images per day by deploying it on Kubernetes.
  • Automated the entire process of setting up our Kubernetes cluster using Ansible, reducing the deployment time from 3 hours to 15 minutes.
  • Managed a team of 7 full time interns working on 4 different projects, overseeing their onboarding and mentoring, increasing the productivity of the entire team by 13%.
  • Computer Vision
  • Face Recognition
  • Object Detection
  • Apache Spark
  • PySpark
  • Docker
  • Ansible
  • Apache Kafka
  • Elasticsearch
  • OpenSearch
  • OpenCV
  • PyTorch
  • GitHub
  • Kubernetes
  • DevOps
  • Team Management
  • Python
  • Machine Learning
  • AI/ML
  • System Optimization
  • UI Development

Conseil européen pour la Recherche Nucléaire (CERN)

  • Contributed to a joint project between CERN and IISER Pune, focused on synthetic data generation for the Higgs boson decay process to enable more robust statistical analysis in particle physics experiments.
  • Implemented the RealNVP normalizing flow model to generate synthetic data points from latent space, matching the original distribution of Higgs decay data from particle physics experiments.
  • Statistical analysis of the synthetically generated data closely matched experimental values, confirming the viability of RealNVP models for data augmentation in scientific research.
  • Particle Physics
  • Synthetic Data Generation
  • RealNVP
  • Normalizing Flows
  • Generative Modeling
  • PyTorch
  • MADGraph
  • Statistical Analysis
  • Deep Learning
  • Python
  • Scientific Research
  • Data Augmentation
  • Machine Learning
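RealNVP builds an invertible map out of affine coupling layers: half the input passes through unchanged and conditions a scale and shift applied to the other half, so the transform can be inverted exactly. A minimal sketch of one coupling layer, with toy fixed functions standing in for the learned scale/shift networks:

```python
# Sketch of a RealNVP affine coupling layer, the invertible building
# block used to map latent samples onto the Higgs-decay distribution.
# s() and t() are toy stand-ins for the learned neural networks.
import math

def s(x1):  # toy scale network
    return [0.5 * v for v in x1]

def t(x1):  # toy shift network
    return [v + 1.0 for v in x1]

def coupling_forward(x):
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    # y1 = x1 passes through; y2 is scaled and shifted conditioned on x1.
    y2 = [b * math.exp(a) + c for b, a, c in zip(x2, s(x1), t(x1))]
    return x1 + y2

def coupling_inverse(y):
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    # Because y1 == x1, the exact scale and shift can be recomputed.
    x2 = [(b - c) * math.exp(-a) for b, a, c in zip(y2, s(y1), t(y1))]
    return y1 + x2

x = [0.2, -0.4, 1.0, 0.5]
y = coupling_forward(x)
recovered = coupling_inverse(y)
assert all(abs(a - b) < 1e-12 for a, b in zip(x, recovered))
```

Exact invertibility (plus a cheap log-determinant, the sum of the scale outputs) is what lets normalizing flows be trained by maximum likelihood and then sampled for data augmentation.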

Deutsches Elektronen-Synchrotron (DESY)

  • Developed a predictive anomaly detection algorithm to forecast system failures during data transfers between global university experiments and DESY's data unit.
  • Established and managed a distributed computing environment using Apache Spark to efficiently process terabytes of data, enabling scalable analysis for anomaly detection.
  • Applied logistic regression on high-dimensional data, achieving an 85% accuracy in predicting system downtimes.
  • Anomaly Detection
  • Predictive Analytics
  • Apache Spark
  • Distributed Computing
  • Logistic Regression
  • Big Data
  • Machine Learning
  • Python
  • Data Science
  • High-Dimensional Data
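The downtime predictor above is logistic regression over high-dimensional transfer data on Spark. The core model fits in a few lines of pure Python; this sketch uses a single hypothetical 1-D feature (transfer-error rate) and plain SGD rather than Spark's distributed fitting:

```python
# Minimal logistic regression for binary downtime prediction,
# trained with stochastic gradient descent on a toy 1-D feature.
# The real system ran on terabytes of data via Apache Spark.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, epochs=2000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x   # gradient of log-loss w.r.t. w
            b -= lr * (p - y)       # gradient of log-loss w.r.t. b
    return w, b

# Toy data: higher transfer-error rate (x) -> downtime (y = 1).
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
w, b = train(xs, ys)
predict = lambda x: sigmoid(w * x + b) > 0.5
print([predict(x) for x in xs])
```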

Indian Institute of Science Education and Research (IISER), Pune

  • Explored and evaluated various clustering algorithms to identify the most effective method for analyzing epigenetic data.
  • Developed and tested implementation pipelines for algorithms including K-Means, DBSCAN, Hierarchical Clustering, and others using Python and scikit-learn.
  • Designed sample test cases and applied clustering techniques to biological datasets, comparing results across algorithms for accuracy and reliability.
  • Assessed clustering quality using metrics such as the elbow method, silhouette score, and intra-cluster distance to determine optimal performance.
  • Gained experience in unsupervised machine learning, data preprocessing, and applying statistical metrics to real-world scientific datasets.
  • Clustering Algorithms
  • K-Means
  • DBSCAN
  • Hierarchical Clustering
  • Unsupervised Learning
  • Data Preprocessing
  • Statistical Analysis
  • Deep Learning
  • Python
  • Machine Learning
  • AI/ML
  • Data Science
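Of the quality metrics mentioned above, the silhouette score is the most self-contained: for each point it compares the mean distance to its own cluster (a) against the mean distance to the nearest other cluster (b), scoring (b - a) / max(a, b). A pure-Python sketch on a tiny 1-D dataset (scikit-learn's `silhouette_score` does this for real data):

```python
# Pure-Python silhouette score on a toy 1-D clustering, illustrating
# one of the metrics used to compare the clustering algorithms above.

def silhouette(points, labels):
    def dist(a, b):
        return abs(a - b)
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to other members of the same cluster.
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        a = sum(same) / len(same) if same else 0.0
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / sum(1 for l in labels if l == other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Two well-separated clusters -> score close to 1 (about 0.983 here).
points = [1.0, 1.1, 1.2, 9.0, 9.1, 9.2]
labels = [0, 0, 0, 1, 1, 1]
print(round(silhouette(points, labels), 3))
```

Scores near 1 indicate tight, well-separated clusters; scores near 0 or below suggest overlapping clusters or a wrong choice of k.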

Genmark.ai

  • Contributed to improving automation of client servicing tasks by streamlining API-based operations via the chatbot interface.
  • Designed and developed an agentic chatbot capable of understanding complex user queries and executing appropriate actions autonomously.
  • Integrated the chatbot with Genmark.ai's internal APIs, enabling real-time interaction and dynamic response generation based on API specifications.
  • Agentic AI
  • LLMs
  • Python
  • Flask
  • Firebase
  • GCP
  • API design
  • Prompt Engineering

University of Glasgow

  • Worked on a research-oriented project exploring how neural networks can be used to probe Effective Field Theory (EFT) couplings, inspired by the ATLAS experiment and the paper "Parameterized Machine Learning for High-Energy Physics".
  • Processed simulated event-level datasets using NumPy and Pandas, followed by exploratory data analysis (EDA) to understand feature distributions and parameter dependencies.
  • Developed and trained a parameterized neural network that incorporates both event features and physics parameters as inputs, enabling smooth interpolation across different EFT coupling values.
  • This project strengthened my knowledge of machine learning in high-energy physics, EFT parameterization, and designing neural networks that generalize over a range of theoretical parameters.
  • Machine Learning
  • Deep Learning
  • Python
  • Data Science
  • High-Energy Physics
  • Effective Field Theory
  • Parameterized Machine Learning
  • Neural Networks
  • AI/ML
  • Data Preprocessing

AI Safety (2)

Selected as one of 20 fellows from a pool of over 1000 applicants for the prestigious AI Safety Fellowship organized by Whitebox Research.

  • Participated in a rigorous 3-month program focused on foundational and advanced concepts in machine learning and AI alignment, with a strong emphasis on technical understanding and interpretability.
  • Completed structured coursework and discussions on topics such as Model evaluations, SAEs, AI Interpretability, RLHF and more.
  • Engaged in weekly mentor-led sessions and peer-group discussions, reinforcing technical concepts through collaborative learning and problem-solving.
  • Developed a capstone research project in the final phase, investigating unfaithfulness in Chain-of-Thought (CoT) reasoning—analyzing where and how model-generated reasoning deviates from true causal chains.
  • Gained hands-on experience with theoretical alignment techniques and interpretability tools, sharpening my understanding of how to evaluate and constrain model behavior.
  • Machine Learning
  • AI Safety
  • AI Interpretability
  • Chain-of-Thought
  • RLHF
  • Model Evaluation
  • Research
  • Technical Writing

Capstone research project as part of the AI Safety Fellowship (Whitebox Research) focused on evaluating the faithfulness of Chain-of-Thought (CoT) reasoning in large multimodal models.

  • Studied how multimodal LLMs (Claude 3.7 Sonnet and Gemini 2.0 Flash Experimental) reason over semantically equivalent math problems presented in both text and image modalities using a custom-curated subset of the PutnamBench dataset.
  • Built a 5-stage end-to-end evaluation pipeline.
  • Designed a normalized unfaithfulness metric to compare reasoning across problems with varying lengths and complexities.
  • Found that both models showed comparable reasoning patterns and accuracy across modalities, with very low incidence of fully unfaithful shortcuts.
  • Identified limitations including compute constraints, token limits, and reliance on LLM-based auto-raters for CoT step evaluation, and proposed future directions, including expanding the benchmark to other domains and developing an Unfaithful Shortcuts Benchmark for more comprehensive faithfulness testing.
  • AI Safety Evaluations
  • Chain-of-Thought
  • Python
  • Research
  • LLMs
  • Prompt Engineering
  • Model Evaluation
  • Model Safety
  • Technical Writing
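The project's exact metric is not specified here, but the normalization idea it describes, making short and long reasoning chains comparable, can be sketched as the fraction of CoT steps an auto-rater flags as unfaithful; the step labels below are hypothetical auto-rater outputs:

```python
# Hedged sketch of a length-normalized unfaithfulness score: the
# fraction of chain-of-thought steps flagged as not supporting the
# final answer. The capstone's actual metric may differ.

def unfaithfulness(step_flags):
    """step_flags: per-step booleans, True = step judged unfaithful.
    Dividing by chain length makes chains of different lengths
    directly comparable."""
    if not step_flags:
        return 0.0
    return sum(step_flags) / len(step_flags)

short_chain = [False, True]                      # 1 of 2 steps flagged
long_chain = [False, True, False, False, True]   # 2 of 5 steps flagged
print(unfaithfulness(short_chain), unfaithfulness(long_chain))  # 0.5 0.4
```

Without normalization, a raw count would penalize long chains; with it, a 10-step proof with one bad step scores lower than a 2-step answer with one.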

Projects (7)

Built an art analysis application using GPT-4 Vision API to generate comprehensive critiques of artwork across various media types.

  • Designed a structured analysis pipeline using OpenAI’s GPT-4 Vision, integrated through the OpenAI API for generating image-based art critiques.
  • Implemented prompt engineering with Jinja2 templating to craft dynamic, customizable prompts guiding GPT output toward detailed and coherent evaluations.
  • Utilised Pydantic models and the Instructor library to enforce strict type-safe, structured outputs from GPT responses—ensuring reliability and consistency.
  • Developed a clean, responsive Streamlit UI allowing users to upload artwork, enter their API key, and receive AI-powered evaluations with scoring and interpretation.
  • Provided detailed output including formal analysis, artistic interpretation, cultural context, constructive feedback, and granular scoring across multiple criteria.
  • Enhanced my skills in multimodal AI integration, prompt design, structured data handling, and deploying AI services in real-time interactive applications.
  • GPT-4 Vision
  • Prompt Engineering
  • Jinja2
  • Pydantic
  • Instructor
  • Streamlit
  • Python
  • LLMs

A movie and TV series recommendation website that suggests content based on users' previously watched titles, saving time and effort when searching for content on Netflix.

  • Utilizes a content-based recommendation system algorithm
  • Developed using the Netflix Movies and TV Shows dataset from Kaggle (3.4 MB of data)
  • Provides personalized recommendations based on viewing history
  • Saves users 10-15 minutes daily through intelligent content suggestions
  • Implements efficient algorithms for content matching and user preference analysis
  • Machine Learning
  • Content-Based Filtering
  • Data Analysis
  • Python
  • Kaggle Dataset
  • Algorithm Development
  • Netflix API
  • Recommendation Engine
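The content-based approach above can be sketched in a few lines: build a genre profile from the watch history, then recommend the unseen title with the highest set overlap (Jaccard similarity here, as one simple choice). Titles and genres below are made-up examples, not the Kaggle dataset the project used:

```python
# Minimal content-based filtering sketch: recommend the unseen title
# whose genre set best matches the user's watch history.
# Catalog entries are hypothetical illustrations.

catalog = {
    "Dark":            {"thriller", "sci-fi", "mystery"},
    "Stranger Things": {"sci-fi", "horror", "mystery"},
    "The Crown":       {"drama", "history"},
    "Black Mirror":    {"sci-fi", "thriller"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(watched):
    # Genre profile: union of genres across everything already watched.
    profile = set().union(*(catalog[t] for t in watched))
    unseen = [t for t in catalog if t not in watched]
    return max(unseen, key=lambda t: jaccard(profile, catalog[t]))

print(recommend(["Dark"]))  # "Black Mirror": shares thriller and sci-fi
```

A production system would use weighted feature vectors (genres, cast, descriptions) rather than flat sets, but the matching principle is the same.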

Personal project demonstrating my understanding of attention mechanisms and transformer architecture by recreating the Transformer model ("Attention Is All You Need") from scratch using PyTorch.

  • Built a modular encoder-decoder architecture with multi-head attention, positional encoding, and feed-forward layers, adhering closely to the original paper.
  • Designed each component (attention, layer norm, embeddings, etc.) as separate modules for clarity and flexibility.
  • Implemented a full training pipeline for English-to-French translation using the OPUS Books dataset, with masking, label smoothing, and BLEU score evaluation.
  • Integrated advanced training techniques such as learning rate scheduling, dropout regularization, and Adam optimizer with weight decay.
  • Used HuggingFace tokenizers for text preprocessing and sequence handling, ensuring vocabulary management and consistent batching.
  • Demonstrated ability to work with neural network layers, data loaders, optimizers, and attention mechanism directly in PyTorch.
  • Python
  • PyTorch
  • Transformers
  • Attention Mechanisms
  • NLP
  • Machine Translation
  • Deep Learning
  • Model Training
  • Model Evaluation
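The heart of the architecture recreated above is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A pure-Python sketch over small nested lists (the project itself used PyTorch tensors and batched multi-head attention):

```python
# Scaled dot-product attention over tiny nested lists, for clarity.
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)                       # subtract max for stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)            # Q K^T
    weights = [softmax([v / math.sqrt(d_k) for v in row]) for row in scores]
    return matmul(weights, V)          # weighted sum of value rows

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)
# The query aligns with the first key, so the output leans toward
# the first value row.
print(out)
```

Multi-head attention simply runs several of these in parallel over learned projections of Q, K, and V and concatenates the results.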

Worked on a real-world data analysis project for a competition organised by the Telangana Government in collaboration with Codebasics, extracting actionable insights from public datasets.

  • Performed in-depth analysis across multiple domains including document registration (e-Stamps), transportation, industrial investments (TS-iPASS), and government schemes.
  • Analysed district-level revenue growth from stamp registration, identifying key contributing districts like Rangareddy and Medchal-Malkajgiri based on proximity to Hyderabad and industrial zone development.
  • Evaluated transport sales trends across Telangana districts from 2019–2023, identifying growth in electric and petrol vehicle sales and highlighting regional variations.
  • Assessed sector-wise investments through TS-iPASS data, correlating them with employment, infrastructure, and proximity to special economic zones (SEZs).
  • Identified the top 5 districts for commercial property investment based on multi-factor data analysis, including revenue trends, document registration, and industrial activity.
  • Proposed actionable recommendations for improving e-Stamp adoption, targeting infrastructure investments, and boosting agricultural and industrial growth.
  • This project enhanced my skills in data analysis, storytelling with data, geospatial and sectoral insights generation, and converting public data into development-focused recommendations.
  • Python
  • Data Analysis
  • Exploratory Data Analysis
  • Data Storytelling
  • Geospatial Insights
  • Public Datasets
  • Statistical Analysis

Honors & Awards (4)

Certifications (5)

AI Trailblazer

Issued by
Verix
Issued on

Neural Networks and Deep Learning

Issued by
Coursera
Issued on

Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning

Issued by
Coursera
Issued on

Machine Learning

Issued by
Coursera
Issued on

Programming for Everybody (Getting Started with Python)

Issued by
Coursera
Issued on