Course: Data Mining (CSCI 57300)
Mentor: Prof. Mohammad Hasan
Engineered the sequential keystroke-log data from 11 to 146 features to capture its hidden sequential structure. Reduced the dimensionality with PCA, then optimized SVM and LightGBM models with Bayesian hyperparameter optimization to predict essay scores.
Designed a 3-layer multilayer perceptron in PyTorch and tuned its hyperparameters using RandomizedSearchCV.
Achieved a 35% improvement over the baseline RMSE through sequential data retention and transfer learning with transformers.
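The 146 engineered features are not listed here. As a minimal sketch of what "sequential" keystroke features can look like, assuming each log row is a dict with the hypothetical keys `essay_id`, `down_time` (key-down timestamp in ms) and `activity` (with a hypothetical `"Remove/Cut"` label for deletions), the code below restores event order per essay and aggregates inter-keystroke pauses, typing bursts and deletions. The 2-second pause threshold is also an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical threshold: a pause longer than this ends a typing "burst".
PAUSE_MS = 2000

def sequential_features(events):
    """Aggregate per-essay sequential features from keystroke events.

    `events` is an iterable of dicts with the hypothetical keys
    'essay_id', 'down_time' (ms) and 'activity' (e.g. 'Input', 'Remove/Cut').
    """
    by_essay = defaultdict(list)
    for e in events:
        by_essay[e["essay_id"]].append(e)

    features = {}
    for essay_id, rows in by_essay.items():
        rows.sort(key=lambda r: r["down_time"])  # restore temporal order
        times = [r["down_time"] for r in rows]
        gaps = [b - a for a, b in zip(times, times[1:])]  # inter-keystroke pauses
        long_pauses = [g for g in gaps if g > PAUSE_MS]
        features[essay_id] = {
            "n_events": len(rows),
            "mean_gap_ms": mean(gaps) if gaps else 0.0,
            "max_gap_ms": max(gaps, default=0),
            "n_long_pauses": len(long_pauses),
            # each long pause starts a new burst, so bursts = long pauses + 1
            "n_bursts": len(long_pauses) + 1,
            "n_removals": sum(r["activity"] == "Remove/Cut" for r in rows),
        }
    return features
```

Per-essay dictionaries like these can then be stacked into a feature matrix for dimensionality reduction and the downstream regressors.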
Course: Statistical Machine Learning (CSCI 57800)
Mentor: Prof. Hyeju Jang
Classified whether each of the following words is used metaphorically or literally, training on a skewed dataset:
Road
Light
Boat
Candle
Spice
Train
Ride
Gained hands-on experience with NLTK, spaCy, and other word tokenization techniques.
Applied several machine learning models covered in the course, along with ensemble techniques such as bagging and boosting.
Finally, we used LSTM and BERT models to find the highest performance achievable with deep learning techniques.
Access full project report here.
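How the class skew was handled is not stated here. One common option, shown only as an illustrative sketch and not necessarily what this project did, is to weight each class inversely to its frequency using the formula behind scikit-learn's `class_weight="balanced"`: n_samples / (n_classes * count_of_class). The label counts below are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Uses the n_samples / (n_classes * class_count) formula that
    scikit-learn applies for class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}

# Hypothetical skewed sample: 8 literal vs 2 metaphorical uses of a word.
labels = ["literal"] * 8 + ["metaphor"] * 2
weights = balanced_class_weights(labels)
print(weights)  # {'literal': 0.625, 'metaphor': 2.5}
```

With these weights each class contributes the same total weight to the loss (8 * 0.625 = 2 * 2.5 = 5), so the minority "metaphor" class is not drowned out during training.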
Course: Data Visualization (CSCI 55200)
Mentor: Prof. Shiaofen Fang
Visualized various datasets in distinct forms, as presented in the slides and on the website. The goal was to represent complex data effectively, in the most interpretable form, using the classic visualization library D3.js.
Dashboard: Tableau_DataViz_ClimateChange
[ Project Slides | Project Report | Project Website | Github Link ]
Created different visualizations of the Iris dataset to visually distinguish the three iris species: Setosa, Virginica, and Versicolor.
Input Dataset → iris.csv
Output Visualizations
Multibar Grouped Chart
Scatter Plot Matrix
Correlation Matrix Visualization
Histograms
Boxplots
Interactive Scatterplot
[ Project Slides | Github Link ]
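The correlation matrix visualization rests on pairwise Pearson correlations between the numeric Iris columns. A minimal standard-library sketch of that computation, assuming the rows have already been parsed from iris.csv into a dict mapping each (hypothetical) column name to a list of floats:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return {(a, b): r} for every ordered pair of named numeric columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}
```

Each (a, b) entry maps directly to one cell of the matrix heatmap, with the diagonal always equal to 1.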
Created the visualizations below from a given adjacency matrix:
Node-link graph drawn manually from the adjacency matrix
Force Directed Layout Visualization using D3.js
Arc Diagram Visualization using D3.js
Radial Network Diagram Visualization using D3.js
[ Project Slides | Github Link ]
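D3's force-directed, arc and radial layouts are usually fed a node list plus a link list rather than a raw matrix. A small sketch of that conversion, assuming a symmetric (undirected) weighted adjacency matrix and hypothetical node ids `n0`, `n1`, and so on; the `{nodes, links}` shape with `source`/`target` fields is what `d3.forceLink(links).id(d => d.id)` can consume.

```python
import json

def matrix_to_graph(matrix, names=None):
    """Convert a symmetric adjacency matrix into a D3-style {nodes, links} dict.

    Non-zero entries become links; only the upper triangle is scanned so each
    undirected edge appears once. `names` defaults to hypothetical ids n0, n1, and so on.
    """
    n = len(matrix)
    names = names or [f"n{i}" for i in range(n)]
    nodes = [{"id": name} for name in names]
    links = [
        {"source": names[i], "target": names[j], "value": matrix[i][j]}
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] != 0
    ]
    return {"nodes": nodes, "links": links}

# Hypothetical 3-node matrix: n0-n1 (weight 2) and n1-n2 (weight 1).
adjacency = [
    [0, 2, 0],
    [2, 0, 1],
    [0, 1, 0],
]
print(json.dumps(matrix_to_graph(adjacency)))
```

Writing this JSON to a file lets the same data drive all three D3 layouts, which differ only in how they position the nodes.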
Visualized three datasets of different formats using VTK:
The CT head dataset in Volume16 format.
An iron protein dataset in VTK format.
A 100^3 sampling of the quadratic function.
[ Project Slides | Github Link ]
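The quadratic function behind the third dataset is not given here. As an illustrative sketch, assuming the hypothetical quadric f(x, y, z) = x^2 + y^2 + z^2 sampled on the cube [-1, 1]^3, a scalar volume can be written in VTK's legacy ASCII `STRUCTURED_POINTS` format, which VTK's legacy reader and ParaView can load for isosurfacing or volume rendering.

```python
def quadric(x, y, z):
    # Hypothetical quadratic function; the assignment's actual function is not given.
    return x * x + y * y + z * z

def sample_volume(n=100, lo=-1.0, hi=1.0, f=quadric):
    """Sample f on an n x n x n grid with x varying fastest, as the VTK format expects.

    Returns (values, spacing).
    """
    step = (hi - lo) / (n - 1)
    coords = [lo + i * step for i in range(n)]
    values = [f(x, y, z) for z in coords for y in coords for x in coords]
    return values, step

def to_legacy_vtk(values, n, origin, spacing, title="quadric sample"):
    """Serialize a scalar volume as a legacy-format ASCII VTK STRUCTURED_POINTS file."""
    header = [
        "# vtk DataFile Version 3.0",
        title,
        "ASCII",
        "DATASET STRUCTURED_POINTS",
        f"DIMENSIONS {n} {n} {n}",
        f"ORIGIN {origin} {origin} {origin}",
        f"SPACING {spacing} {spacing} {spacing}",
        f"POINT_DATA {n ** 3}",
        "SCALARS scalars float 1",
        "LOOKUP_TABLE default",
    ]
    body = [f"{v:.6f}" for v in values]
    return "\n".join(header + body) + "\n"
```

For the full 100^3 volume, call `sample_volume(100)` and write `to_legacy_vtk(values, 100, -1.0, spacing)` to a `.vtk` file (about one million scalar lines).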
Course: Deep Learning (CS 59000)
Mentor: Prof. Mohammad Al Hasan
Detecting crop rows is important for agricultural applications because it supports automated field monitoring, precision farming, and yield estimation.
Developed deep learning models based on the UNet and LinkNet architectures that accurately classify whether an image contains crop rows and locate where those rows are. [ Project Report | Github Link | Kaggle Competition Link ]
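UNet and LinkNet produce per-pixel masks, so their output is commonly scored with overlap metrics. The metric used for this project is not named here; as a sketch, Dice and IoU on binary masks (nested lists of 0/1, a simplifying stand-in for image tensors) can be computed as follows.

```python
def dice_and_iou(pred, truth, eps=1e-7):
    """Return (Dice, IoU) for two same-shaped binary masks given as nested 0/1 lists.

    `eps` keeps both scores defined (and equal to 1) when both masks are empty.
    """
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    inter = sum(1 for a, b in zip(p, t) if a and b)  # pixels marked in both masks
    p_sum, t_sum = sum(p), sum(t)
    union = p_sum + t_sum - inter
    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou

pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(dice_and_iou(pred, truth))  # roughly (0.667, 0.5)
```

The two metrics are monotonically related (Dice = 2 * IoU / (1 + IoU)), so they rank models identically; Dice is often also used as a training loss for segmentation networks.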
Course: Applied Bayesian Statistics and Decision Theory (STAT 52900)
Mentor: Prof. Ben Boukai
Implemented several state-of-the-art models, along with a baseline logistic regression classifier, on a car insurance claims dataset containing claim decisions and driver and vehicle details, and compared them with Bayesian classification models.
Built a variety of Bayesian models with informative and non-informative priors that outperformed the baseline and state-of-the-art models in accuracy and offered more flexible interpretation.
Skills Gained: Bayesian Data Analysis, Hierarchical Modelling, MCMC Diagnostics, Classification Models (Logistic, Bernoulli Logistic)
[ Project Slides | Project Report | Github Link | Medium Link ]
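To show how the choice of prior enters a Bernoulli-logistic model, here is a deliberately small random-walk Metropolis sampler written with the standard library only. It is a teaching sketch under stated assumptions: one predictor plus an intercept, and independent Normal(0, prior_sd) priors on both coefficients. It is not the project's implementation, which would more likely rely on a dedicated MCMC library. A small `prior_sd` acts as an informative prior; a large one is close to non-informative.

```python
import math
import random

def log_posterior(beta, xs, ys, prior_sd):
    """Bernoulli-logistic log-likelihood plus independent Normal(0, prior_sd) log-priors."""
    b0, b1 = beta
    ll = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # log p(y | eta) = y*eta - log(1 + exp(eta)), in a numerically stable form
        ll += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    log_prior = -(b0 ** 2 + b1 ** 2) / (2 * prior_sd ** 2)  # up to a constant
    return ll + log_prior

def metropolis(xs, ys, prior_sd=10.0, n_iter=5000, step=0.3, burn_in=1000, seed=0):
    """Random-walk Metropolis; returns (post-burn-in draws of (b0, b1), acceptance rate)."""
    rng = random.Random(seed)
    beta = (0.0, 0.0)
    current = log_posterior(beta, xs, ys, prior_sd)
    draws, accepted = [], 0
    for i in range(n_iter):
        proposal = (beta[0] + rng.gauss(0, step), beta[1] + rng.gauss(0, step))
        cand = log_posterior(proposal, xs, ys, prior_sd)
        # accept with probability min(1, exp(cand - current)); 1 - random() avoids log(0)
        if math.log(1.0 - rng.random()) < cand - current:
            beta, current = proposal, cand
            accepted += 1
        if i >= burn_in:
            draws.append(beta)
    return draws, accepted / n_iter
```

Running the sampler on the same data with `prior_sd=0.1` and `prior_sd=10.0` should show the informative prior pulling the posterior mean of the slope toward zero, while the vague prior leaves it close to the maximum-likelihood estimate. In practice, trace plots and effective sample sizes would be checked before trusting any of these draws.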