Projects
Small Business Loans Default Risk
Used a sample of almost 900,000 observations with 27 variables to predict the likelihood that a potential loan recipient would default on the loan.
Implemented logistic regression, a stochastic gradient descent classifier, and extremely randomized trees (Extra Trees) to find the best classification model.
Used cross-validation and L1 regularization (LASSO) to prevent overfitting and increase confidence in the model's predictive ability.
Achieved an accuracy of 91.87% with the Extra Trees model.
Below are the results of my logit model for a 5-observation sample.
The middle column shows the model's assigned probability of a small business not defaulting.
Conversely, the right column shows the model's predicted probability that the small business will default on its loan.
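The LASSO idea in the bullets above can be sketched as an L1-penalized logistic regression. This is a minimal, hypothetical sketch in plain Python, not the project's code: it uses full-batch proximal gradient descent (a gradient step followed by soft-thresholding), leaves the intercept unpenalized, and omits cross-validation. The names `fit_l1_logit` and `predict_proba` and the parameters `lam`, `lr`, and `epochs` are illustrative. `predict_proba` returns the same two columns described above: P(no default) and P(default).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w, t):
    """Proximal operator of t*|w|: shrink w toward 0, and set it to 0 if |w| <= t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logit(X, y, lam=0.01, lr=0.1, epochs=500):
    """Fit an L1-penalized logistic regression by proximal gradient descent.

    X: list of feature rows; y: list of 0/1 labels (1 = default).
    Hypothetical helper for illustration; lam is the LASSO penalty strength.
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, label in zip(X, y):
            # Gradient of the average log-loss: (predicted - actual) * feature
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err / n
            for j in range(k):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        # Gradient step on the weights, then soft-threshold (the L1/LASSO step)
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(X, w, b):
    """Return [P(no default), P(default)] for each row."""
    probs = []
    for row in X:
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))
        probs.append([1.0 - p, p])
    return probs
```

In practice, `lam` would be chosen by cross-validation, for example by comparing held-out accuracy across a grid of candidate values; larger values drive more coefficients exactly to zero.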
COVID-19’s Impact on Residential Real Estate
Researched the effect that the COVID-19 pandemic had on home prices in 2020.
Collected and processed data on several variables that were used in my regression analysis and data visualizations.
Wrote a formal research paper and gave a 15-minute presentation about my findings.
Below are two visuals that I created for my presentation.
The left plot shows the relationship between how well a state handled the pandemic (state rank) and the state's population density. I used this plot, along with several statistics from a simple regression on this data, to argue that a state's population density is largely independent of its rank.
On the right, the five "best" and five "worst" states are shown with their respective death coefficients.
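The simple-regression check described for the left plot can be sketched as follows. This is an illustrative sketch, not the analysis from the paper: `simple_regression` is a hypothetical helper that regresses y (for example, population density) on x (for example, state rank) by ordinary least squares and returns the slope, intercept, R², and the slope's t-statistic. A slope near zero, a small R², and a small |t| are consistent with the two variables being largely unrelated, although they do not prove independence.

```python
import math


def simple_regression(x, y):
    """OLS fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared, t_stat), where t_stat tests slope = 0.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    # Residual sum of squares; clamp tiny negative values caused by rounding
    rss = max(syy - slope * sxy, 0.0)
    if rss == 0.0:
        # Perfect fit: the standard error is zero, so |t| is unbounded
        t_stat = math.copysign(math.inf, slope)
    else:
        se_slope = math.sqrt(rss / (n - 2) / sxx)
        t_stat = slope / se_slope
    return slope, intercept, r_squared, t_stat
```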
I concluded that home prices increased more in states where the pandemic was less severe compared to states where the pandemic was harsher.
From a theoretical perspective, I believe this phenomenon resulted from people migrating to safer states, which increased housing demand there. At the same time, as people fled states that were hit hard by the pandemic, the number of homes for sale in those states rose, depressing residential real estate prices.
Optimal Production Allocation
Determined the optimal allocation of production for a hypothetical firm.
Created a linear program to find the maximum revenue given a set of constraints.
Transformed the primal model into its dual and confirmed that both models give the same optimal revenue.
Below are two plots demonstrating the constraints and indicating the optimal solution.
The left graph is the primal version of the problem, while the right graph is the dual version.
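For a two-variable problem like the one plotted, the primal and its dual can be checked by brute force: when a linear program is feasible and has a finite optimum, that optimum is attained at a vertex of the feasible region, so it suffices to intersect every pair of constraint boundaries, keep the feasible points, and pick the best one. The sketch below uses made-up coefficients (revenue of 3 and 4 per unit, two resource limits), not the firm's actual data; `solve_2d_lp` is a hypothetical helper, and vertex enumeration is only practical in very low dimensions (a real model would use a simplex or interior-point solver).

```python
from itertools import combinations


def solve_2d_lp(c, A, b, maximize=True, tol=1e-9):
    """Optimize c . x over {x in R^2 : A x <= b} by vertex enumeration.

    Assumes the problem is feasible with a finite optimum.
    Nonnegativity must be included as rows of A (e.g. -x1 <= 0).
    Returns (optimal value, optimal point).
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < tol:
            continue  # parallel boundaries: no unique intersection
        # Cramer's rule for the intersection of the two boundary lines
        x = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(r[0] * x[0] + r[1] * x[1] <= bi + tol for r, bi in zip(A, b)):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or (value > best[0] if maximize else value < best[0]):
                best = (value, x)
    return best


# Primal: max 3x1 + 4x2  s.t.  x1 + x2 <= 4,  x1 + 3x2 <= 6,  x1, x2 >= 0
primal = solve_2d_lp((3, 4), [(1, 1), (1, 3), (-1, 0), (0, -1)], [4, 6, 0, 0])
# Dual: min 4u1 + 6u2  s.t.  u1 + u2 >= 3,  u1 + 3u2 >= 4,  u1, u2 >= 0
# (">=" rows are negated to fit the A u <= b form)
dual = solve_2d_lp((4, 6), [(-1, -1), (-1, -3), (-1, 0), (0, -1)], [-3, -4, 0, 0],
                   maximize=False)
print(primal)  # (13.0, (3.0, 1.0))
print(dual)    # (13.0, (2.5, 0.5))
```

Both optimal values equal 13, as strong duality requires, and the dual solution (2.5, 0.5) can be read as the shadow prices of the two resource constraints.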
Balance of Payments Analysis
Gathered data from the IMF to explore the relationship between Lebanon’s trade balance and exchange rate, and their role in the country’s current banking and sovereign debt crises.
Job Quits Forecast
Created a 1-year forecast of the number of people who quit their job each month using data from the Job Openings and Labor Turnover Survey.
Incorporated additional regressors and autoregressive terms to capture the trend, seasonal, and cyclical components of the time series.
Used the Akaike Information Criterion (AIC) to select the best forecasting model.
Below is a plot of the number of people who quit their jobs in a given month (blue), and the fitted values of the optimal model (red).
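The model-selection step can be sketched as follows. This is a minimal illustration, not the project's model: it assumes monthly data and a hypothetical specification with an intercept, a linear time trend, 11 month dummies, and p autoregressive lags, fits each candidate by ordinary least squares, and picks the lag order with the lowest AIC, computed here as n·ln(RSS/n) + 2k (the Gaussian log-likelihood form, up to an additive constant). Every candidate is fit on the same sample so that the AIC values are comparable. The function names and the synthetic series are illustrative, not JOLTS data.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols_rss(X, y):
    """Fit y ~ X by OLS (normal equations) and return the residual sum of squares."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve_linear(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def design(series, p, start):
    """Regressors for t >= start: intercept, trend, 11 month dummies, lags 1..p."""
    X, y = [], []
    for t in range(start, len(series)):
        month_dummies = [1.0 if t % 12 == m else 0.0 for m in range(1, 12)]
        lags = [series[t - lag] for lag in range(1, p + 1)]
        X.append([1.0, float(t)] + month_dummies + lags)
        y.append(series[t])
    return X, y


def select_ar_order(series, max_p=4):
    """Return (best_p, {p: AIC}) using AIC = n*ln(RSS/n) + 2k."""
    aics = {}
    for p in range(max_p + 1):
        X, y = design(series, p, start=max_p)  # same estimation sample for every p
        n, k = len(y), len(X[0])
        aics[p] = n * math.log(ols_rss(X, y) / n) + 2 * k
    return min(aics, key=aics.get), aics


# Synthetic monthly series: trend + a seasonal bump + AR(1) noise (illustration only)
rng = random.Random(0)
noise, series = 0.0, []
for t in range(120):
    noise = 0.7 * noise + rng.gauss(0, 1)
    series.append(50 + 0.3 * t + (5 if t % 12 == 6 else 0) + noise)
best_p, aics = select_ar_order(series)
print(best_p)
```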
Signal Smoothing
Created a program that, given a time-continuous, discrete-amplitude series, outputs a smoothed signal based on a user-selected regularization factor (lambda).
Implemented a visualization tool that plots the original series alongside several smoothed versions.
Below is an example of the program's output.
The blue line is the original signal.
The other curves are smoothed versions of the signal; the larger lambda is, the smoother the curve.
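One common way to smooth a sampled signal with a single regularization parameter lambda is penalized least squares: choose smoothed values x that minimize sum((y_i - x_i)^2) + lambda * sum((x_{i+1} - x_i)^2). Setting the gradient to zero gives the linear system (I + lambda * D^T D) x = y, where D is the first-difference matrix; that system is tridiagonal and can be solved in O(n) time with the Thomas algorithm. This is an assumed formulation for illustration (the project may have used a different penalty, such as second differences), and `smooth` is a hypothetical name. With lambda = 0 the input comes back unchanged, and larger lambda values give smoother output.

```python
def smooth(y, lam):
    """Solve (I + lam * D^T D) x = y, where D takes first differences.

    The matrix is tridiagonal: 1 + lam on the diagonal at both ends, 1 + 2*lam
    on the interior diagonal, and -lam on the sub- and super-diagonals.
    Solved with the Thomas algorithm (tridiagonal Gaussian elimination).
    """
    n = len(y)
    if n < 2:
        return [float(v) for v in y]
    diag = [1.0 + 2.0 * lam] * n
    diag[0] = diag[-1] = 1.0 + lam
    off = -lam  # value on both off-diagonals
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom if i < n - 1 else 0.0
        d[i] = (y[i] - off * d[i - 1]) / denom
    # Back substitution
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


signal = [0, 0, 1, 0, 0, 5, 5, 4, 5, 5]
for lam in (0, 1, 10):
    print(lam, [round(v, 2) for v in smooth(signal, lam)])
```

Because D maps constant vectors to zero, this smoother preserves the mean of the signal for every lambda, and the system matrix is diagonally dominant, so the Thomas algorithm is stable without pivoting.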
Multi-Threaded Web Server
Modified a basic web server to allow multiple consumer threads capable of handling HTTPS requests.
Used a mutex and condition variables to ensure mutual exclusion and a correct producer-consumer relationship between the threads.
Created a companion application that displays various statistics about the consumer threads at a fixed time interval.
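The producer-consumer pattern described above can be sketched with a bounded buffer guarded by one mutex and two condition variables. This is an illustrative sketch using Python's standard `threading` module, not the server's actual code: `BoundedBuffer` and `run_workers` are hypothetical names, integers stand in for incoming requests, and each worker's request count stands in for the statistics the companion application displays. The `while` loops re-check the condition after every wake-up, which guards against spurious wake-ups.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO shared by one producer and several consumers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        lock = threading.Lock()  # the single mutex protecting `items`
        self.not_empty = threading.Condition(lock)  # consumers wait here
        self.not_full = threading.Condition(lock)   # the producer waits here

    def put(self, item):
        with self.not_full:  # acquires the shared mutex
            while len(self.items) >= self.capacity:
                self.not_full.wait()  # releases the mutex while waiting
            self.items.append(item)
            self.not_empty.notify()  # wake one waiting consumer

    def get(self):
        with self.not_empty:
            while not self.items:
                self.not_empty.wait()
            item = self.items.popleft()
            self.not_full.notify()  # wake the producer if it is waiting
            return item


def run_workers(requests, num_workers=4, capacity=8):
    """The main thread produces requests; worker threads consume them.

    Returns how many requests each worker handled.
    """
    buffer = BoundedBuffer(capacity)
    handled = [0] * num_workers  # each worker updates only its own slot

    def worker(worker_id):
        while True:
            request = buffer.get()
            if request is None:  # sentinel: no more work
                return
            handled[worker_id] += 1  # placeholder for actually serving the request

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for request in requests:
        buffer.put(request)
    for _ in threads:
        buffer.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return handled


counts = run_workers(range(100))
print(sum(counts))  # 100
```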
COVID-19 State Clusters
Used COVID-19 deaths per capita in each US state to calculate 5 parameters for each state.
Used these parameters to group the states into a given number of clusters with hierarchical clustering (single and complete linkage) and k-means clustering.
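The clustering step can be sketched in plain Python. This is a minimal illustration, not the project's code: each state is assumed to be a numeric feature vector (for example, its 5 parameters) and distances are Euclidean. `agglomerative` merges the two closest clusters until k remain, where single linkage uses the minimum pairwise distance between two clusters and complete linkage uses the maximum; `kmeans` runs Lloyd's algorithm from a simple deterministic start. The function names and toy points are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def agglomerative(points, k, linkage="single"):
    """Bottom-up hierarchical clustering; returns k sorted lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    # Single linkage: closest pair of members; complete linkage: farthest pair
    agg = min if linkage == "single" else max
    while len(clusters) > k:
        best = None  # (distance, i, j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = agg(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns a cluster label for each point.

    Initializes centers at the first k points (a simplification; k-means++
    or several random restarts are more robust).
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerative(points, 2, "single"))    # [[0, 1, 2], [3, 4, 5]]
print(agglomerative(points, 2, "complete"))  # [[0, 1, 2], [3, 4, 5]]
print(kmeans(points, 2))                     # [0, 0, 0, 1, 1, 1]
```

With real state parameters on different scales, standardizing each feature first would usually keep one parameter from dominating the distances.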
Breast Cancer Decision Tree
Used data on breast tumor characteristics to train a binary decision tree that predicts whether a tumor is malignant or benign.
Pruned the tree to a maximum depth of 6 and achieved an accuracy of 93%.
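A depth-limited binary decision tree of this kind can be sketched as follows. This is an illustrative CART-style sketch, not the project's code: it chooses each split by minimizing weighted Gini impurity over all features and thresholds, and it stops growing at `max_depth` (a pre-pruning rule). `build_tree`, `predict`, and the toy tumor measurements are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(X, y, max_depth, depth=0):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, thr = split
    left = [(r, lab) for r, lab in zip(X, y) if r[f] <= thr]
    right = [(r, lab) for r, lab in zip(X, y) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in left], [lab for _, lab in left], max_depth, depth + 1),
            build_tree([r for r, _ in right], [lab for _, lab in right], max_depth, depth + 1))


def predict(tree, row):
    """Follow splits from the root until reaching a leaf label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


X = [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0], [6.0, 1.0], [7.0, 2.0], [8.0, 1.5]]
y = ["benign"] * 3 + ["malignant"] * 3
tree = build_tree(X, y, max_depth=6)
print(tree)  # (0, 3.0, 'benign', 'malignant')
```

On real data, accuracy should be measured on a held-out test set rather than on the training rows.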
Carbon Footprint Calculator
Developed a Java application that estimates how much carbon a user emits on a given drive based on the make, model, and year of their car, using data extracted and cleaned from the Bureau of Motor Vehicles.
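The core calculation can be sketched as fuel burned multiplied by emissions per unit of fuel. This Python sketch is only an illustration (the application itself was written in Java). It assumes a lookup table from (make, model, year) to combined fuel economy in miles per gallon and uses roughly 8,887 grams of CO2 per gallon of gasoline, the figure commonly cited by the U.S. EPA. The table entry, vehicle names, and `co2_grams` function are hypothetical placeholders, not the project's data.

```python
# Hypothetical fuel-economy table: (make, model, year) -> combined miles per gallon.
# Real values would come from the cleaned vehicle dataset.
MPG_TABLE = {
    ("ExampleMake", "ExampleModel", 2015): 25.0,
}

GRAMS_CO2_PER_GALLON = 8887  # approximate EPA figure for gasoline (assumption)


def co2_grams(make, model, year, miles):
    """Estimate CO2 emitted on a drive: gallons burned * grams of CO2 per gallon."""
    key = (make, model, year)
    if key not in MPG_TABLE:
        raise KeyError(f"no fuel-economy data for {key}")
    gallons = miles / MPG_TABLE[key]
    return gallons * GRAMS_CO2_PER_GALLON


print(co2_grams("ExampleMake", "ExampleModel", 2015, 100))  # 35548.0
```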