Coding Projects

A collection of code for various purposes in various languages.
Code for projects without GitHub links can be provided upon request.

Spike Protein Predictor

Ongoing

COVID Evolution Modeler

Predicting COVID evolution based on previous mutation rates, with consideration for flanking residues and previous mutations.

PhD Student Finances

Ongoing

Analysis of Berkeley PhD Students

Investigating and visualizing financial and residential factors that affect retention and yield rates among Berkeley College of Engineering PhD students. Image credit: UC Berkeley

Intermediate Mutations

Ongoing

Time-dependent mutation modeler

A Markovian approach to reconstructing intermediate mutations between two given sequences, with time as a factor.

Lab Resource Usage

Analysis of lab pages in CS course

Research on how students in an introductory CS course attend section and use resources over the course of a semester, with an analysis of performance. Image credit: CS10

Codon Transcript Designer

Translation of proteins to codons

This algorithm predicts the best codons given an input sequence of amino acids. It combines a Monte Carlo and sliding window approach with consideration for CAI score, secondary structure, and free energy values.
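
The description names the CAI score as one of the quantities considered. As a rough illustration only, the sketch below computes CAI as the geometric mean of per-codon relative adaptiveness weights. The function name `cai` and the values in `TOY_WEIGHTS` are made-up placeholders, not taken from the project, and the Monte Carlo search, sliding window, secondary structure, and free energy terms are not shown.

```python
import math

# Hypothetical relative adaptiveness weights (0 < w <= 1) for two alanine codons.
# Placeholder values for illustration, not measured codon usage data.
TOY_WEIGHTS = {"GCU": 1.0, "GCC": 0.5}


def cai(codons, weights):
    """Return the geometric mean of the weights of the given codons."""
    if not codons:
        raise ValueError("codon list must not be empty")
    # Averaging logs and exponentiating is equivalent to a geometric mean.
    log_sum = sum(math.log(weights[codon]) for codon in codons)
    return math.exp(log_sum / len(codons))
```

A candidate transcript that uses more high-weight codons scores closer to 1, which is what a search over codon choices could try to increase.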

Finances and Education

Analysis of Berkeley K-8 schools

Analysis of how factors such as income affect the quality of education in K-8 schools in Berkeley. PCA is used to highlight areas of focus and necessary changes in education. Uses Tableau and Jupyter. Image credit: Berkeley Unified School District

Spam Email Classifier

Logistic regression-based grouping

Uses logistic regression and scikit-learn to classify emails based on keywords. Cross-validation was used during model selection, and the classifier reached 91% accuracy on a test set of 1,000 emails.

Metabolome Expansion

E. coli metabolite map generation

Incorporation of reaction and chemical data to generate a full map of reachable native chemicals in E. coli. Utilized ChemAxon for molecule standardization.
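
One way to picture "reachable chemicals" is as a fixed-point expansion: start from a seed set and keep applying any reaction whose substrates are all available. The sketch below assumes reactions are given as (substrates, products) pairs of sets. The function name and the data layout are hypothetical, and the project's actual representation is not described here.

```python
def reachable_chemicals(seed, reactions):
    """Expand a seed set of chemicals until no reaction adds anything new.

    reactions is an iterable of (substrates, products) pairs, each a set of
    chemical identifiers. This layout is an assumption for illustration.
    """
    reachable = set(seed)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # A reaction can fire only if every substrate is already reachable.
            if set(substrates) <= reachable and not set(products) <= reachable:
                reachable |= set(products)
                changed = True
    return reachable
```

For example, with placeholder chemicals `A` and `C` as the seed and reactions `A -> B` and `B + C -> D`, the expansion reaches `A`, `B`, `C`, and `D`.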

Protein Alignment Statistics

Amino acid parser

An algorithm that parses and analyzes amino acid alignments from SABmark. Generated statistics on substitutions and indels used for protein ancestral reconstruction.
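
As a minimal sketch of the kind of statistics described, the function below walks the columns of one pairwise alignment and counts matches, substitutions, and indel run lengths. It assumes two equal-length aligned strings with `-` as the gap character. Reading SABmark files is not shown, and the function name is hypothetical.

```python
from collections import Counter


def alignment_stats(seq_a, seq_b, gap="-"):
    """Count matches, substitution pairs, and indel lengths in an alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    substitutions = Counter()   # (residue_a, residue_b) -> count
    indel_lengths = Counter()   # gap run length -> count
    run, run_side = 0, None     # current gap run and which sequence holds it
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue  # column gapped in both sequences carries no information
        side = "a" if a == gap else "b" if b == gap else None
        if side is not None:
            if side == run_side:
                run += 1
            else:
                if run:
                    indel_lengths[run] += 1
                run, run_side = 1, side
            continue
        if run:  # a residue pair closes any open gap run
            indel_lengths[run] += 1
            run, run_side = 0, None
        if a == b:
            matches += 1
        else:
            substitutions[(a, b)] += 1
    if run:
        indel_lengths[run] += 1
    return matches, substitutions, indel_lengths
```

Aggregating these counters over many alignments would give substitution and indel frequencies of the sort an ancestral reconstruction model could use.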

Personal Website

A personal portfolio project

I taught myself HTML/CSS, JavaScript, and the Bootstrap framework in order to create my first website!

Gitlet

A mini git-based version control system

This project mimics some of the features of Git. The following commands were implemented: init, add, commit, rm, log, global-log, find, status, checkout, branch, rm-branch, reset, merge. 

CS10 Website

Course website for CS10

Worked on a new student-oriented website for CS10, aka the Beauty and Joy of Computing, an undergraduate computer science course at UC Berkeley. This website was deployed for Summer 2020. Check it out at the link below!

Tablut

Game-playing AI and custom GUI

Tablut is a recreation of an ancient Nordic strategy board game. The algorithm is capable of playing a fully automated game in 2 minutes. I developed a heuristic that assists the AI in finding a win within 4 moves. The implementation also allows gameplay through either terminal or custom GUI.

Scheme Interpreter

Python-based interpreter for Scheme

An interpreter for a subset of the Scheme language. Terminal-based inputs of Scheme statements were translated into tokens, then run through a Read-Eval-Print loop for accurate evaluation.
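
The tokenize, read, and evaluate stages can be sketched in a few dozen lines. This is a simplified illustration rather than the project's code: it handles only numbers, symbols, `define`, `if`, `lambda`, and a handful of binary operators, and it uses Python truthiness for `if`, which differs from Scheme. All function names are hypothetical.

```python
import operator


def tokenize(source):
    """Split a Scheme expression into parenthesis and atom tokens."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Read one expression from the token list into nested Python lists."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens and tokens[0] != ")":
            expr.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing closing parenthesis")
        tokens.pop(0)  # discard ")"
        return expr
    if token == ")":
        raise SyntaxError("unexpected closing parenthesis")
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token  # anything else is a symbol


def make_global_env():
    """Build a fresh environment with a few binary primitives."""
    return {"+": operator.add, "-": operator.sub, "*": operator.mul,
            "/": operator.truediv, "<": operator.lt, "=": operator.eq}


def evaluate(expr, env):
    """Evaluate a parsed expression in the given environment."""
    if isinstance(expr, str):
        return env[expr]          # symbol lookup
    if not isinstance(expr, list):
        return expr               # number literal
    head = expr[0]
    if head == "define":
        _, name, value = expr
        env[name] = evaluate(value, env)
        return None
    if head == "if":
        _, test, consequent, alternative = expr
        return evaluate(consequent if evaluate(test, env) else alternative, env)
    if head == "lambda":
        _, params, body = expr
        # The call-time copy of env lets recursive definitions see themselves.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    func = evaluate(head, env)
    return func(*[evaluate(arg, env) for arg in expr[1:]])


def run(source, env):
    """One read-eval step; a REPL would call this in a loop and print."""
    return evaluate(parse(tokenize(source)), env)
```

For instance, after `run("(define square (lambda (x) (* x x)))", env)`, evaluating `(square 7)` in the same environment returns 49.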

Battleship

Two-player board game and AI

A terminal-based implementation of the board game Battleship, where players take turns guessing the positions of the opponent's ships. I also wrote an AI that is capable of making its next guess based on the status of its previous guesses.
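
A common way to base the next guess on earlier results is a "hunt and target" strategy: probe cells next to known hits first, and otherwise pick any untried cell. The sketch below is an assumption about how such an AI could work, not the project's implementation. It ignores sunk-ship tracking, and the function name and the set-based board representation are hypothetical.

```python
import random


def next_guess(size, hits, misses, rng=random):
    """Pick the next (row, col) to fire at on a size x size board.

    hits and misses are sets of (row, col) cells already guessed.
    """
    tried = hits | misses
    # Target mode: try the orthogonal neighbors of each known hit.
    for row, col in sorted(hits):
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            cell = (row + d_row, col + d_col)
            in_bounds = 0 <= cell[0] < size and 0 <= cell[1] < size
            if in_bounds and cell not in tried:
                return cell
    # Hunt mode: no useful neighbor, so choose any untried cell at random.
    untried = [(r, c) for r in range(size) for c in range(size)
               if (r, c) not in tried]
    if not untried:
        raise ValueError("no cells left to guess")
    return rng.choice(untried)
```

With a hit at (4, 4) and a miss at (3, 4) on a 10 x 10 board, the function returns (5, 4), the first untried neighbor of the hit.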

Maps

Rating-based visualizer for restaurants

A visualization of Berkeley restaurant ratings using machine learning. Users can view restaurants tailored to previous preferences (ratings, location). It utilizes a Voronoi diagram.

Movie Genre Classifier

K-nearest neighbor classifier for movies

A kernel estimation algorithm that classifies movies by genre. A secondary classification algorithm predicts the genre based on the frequency of 20 different words in the script.
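
A k-nearest neighbor classifier over word-frequency vectors can be sketched as follows. The function name, the choice of Euclidean distance, and the default `k` are assumptions for illustration. The project's actual features and parameters are not specified beyond the 20 word frequencies.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors.

    train is a list of (feature_vector, label) pairs; each feature vector
    holds word frequencies in the same order as the query vector.
    """
    # Sort training examples by Euclidean distance to the query.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice each vector would hold one entry per tracked word, and `k` would be chosen by checking accuracy on held-out movies.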

Cardiovascular Disease Classifier

Case study on cardiovascular disease

This analysis focused on determining the likelihood of poor diet causing heart disease via hypothesis testing. The data analyzed spanned multiple decades.

Fibonacci Analysis

Analysis of Fibonacci implementations

An analysis of the accuracy and runtime of three different implementations of the Fibonacci sequence (recursive, iterative, Binet's formula). Originally written in a Jupyter notebook.
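
For reference, the three approaches named above can be sketched as below. These are minimal versions written for illustration, not the notebook's code, and the function names are hypothetical.

```python
import math


def fib_recursive(n):
    """Naive recursion: exact, but the number of calls grows exponentially."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n):
    """Iteration: exact, with a number of additions linear in n."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_binet(n):
    """Binet's closed form, evaluated in floating point and rounded."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    return round(phi ** n / sqrt5)
```

All three return 55 for n = 10. The accuracy question arises because `fib_binet` relies on 64-bit floats, so its rounded result eventually drifts from the exact integer values as n grows, while the other two stay exact and differ only in runtime.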