Data Scientist / Bayesian / Traveller
I am a new grad from Duke and a problem solver in data science. I live in Los Angeles and specialize in drawing insights from data. Related projects can be found on my GitHub.
Summary of a course on Coursera and some useful tips.
Review ensemble methods (stacking) and several papers, introduce my process of using stacking in the Zillow Price Competition, and summarize some tips for doing stacking.
Collect and track some interesting blogs, resources, and projects in this post.
The K-means algorithm has a critical problem: depending on the randomly chosen initial centers, it can get stuck in a local optimum. Many related algorithms, such as K-means++, have been developed to address this. This project reproduced the Scalable K-means++ algorithm from the Bahmani et al. paper, optimized its performance with Cython, multiprocessing, and PySpark, and compared misclassification rate, clustering cost, and runtime on four large datasets. >> More Details
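To make the seeding idea concrete, here is a minimal sketch of plain K-means++ initialization in pure Python. It is not the Scalable K-means++ variant from the project, which oversamples candidates over several rounds. The function names (`kmeans_pp_init`, `clustering_cost`) and the toy data are hypothetical and chosen only for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with K-means++ seeding (illustrative sketch).

    The first center is drawn uniformly at random. Each later center is drawn
    with probability proportional to its squared distance to the nearest
    center chosen so far, which spreads the centers out.
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center: fall back to uniform.
            centers.append(rng.choice(points))
        else:
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        # Update nearest-center distances using only the newest center.
        d2 = [min(old, squared_distance(p, centers[-1]))
              for old, p in zip(d2, points)]
    return centers


def clustering_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.9, 0.1)]
    init = kmeans_pp_init(toy, k=3, seed=42)
    print(init, clustering_cost(toy, init))
```

The centers returned here would normally be handed to the usual Lloyd iterations. The clustering cost is the same quantity the comparison above refers to.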
In this project, we use a Gaussian Hidden Markov Model to capture stock price patterns and forecast the stock prices of seven high-tech companies, and compare it with traditional time series models. Instead of assuming independent observations, we consider a Hidden Markov Model with correlated observations by adding the closing price difference. The four latent states in our HMM roughly capture the increasing and decreasing patterns, and under a simple investment strategy, the HMM yields more stable and better payoffs than traditional time series models.
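As a rough illustration of how a Gaussian HMM scores a price series, the sketch below computes the log-likelihood of closing-price differences with the forward algorithm in log space. It is a simplified, assumption-laden example rather than the project's model. It uses scalar observations, requires strictly positive start and transition probabilities, and the function names and parameters are all hypothetical.

```python
import math


def price_differences(closes):
    """Day-over-day differences of a list of closing prices."""
    return [b - a for a, b in zip(closes, closes[1:])]


def log_gauss(x, mean, var):
    """Log density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of finite log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(obs, start, trans, means, variances):
    """Log-likelihood of obs under a Gaussian HMM (forward algorithm).

    start[i]     : probability of starting in state i (assumed > 0)
    trans[i][j]  : probability of moving from state i to state j (assumed > 0)
    means[i], variances[i] : Gaussian emission parameters of state i
    """
    n = len(start)
    # alpha[j] = log P(observations so far, current state = j)
    alpha = [math.log(start[j]) + log_gauss(obs[0], means[j], variances[j])
             for j in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + log_gauss(x, means[j], variances[j])
            for j in range(n)
        ]
    return logsumexp(alpha)


if __name__ == "__main__":
    # Made-up closing prices and a made-up two-state model ("down", "up").
    closes = [100.0, 101.5, 101.0, 102.2, 101.1]
    diffs = price_differences(closes)
    ll = forward_loglik(diffs, start=[0.5, 0.5],
                        trans=[[0.8, 0.2], [0.3, 0.7]],
                        means=[-1.0, 1.0], variances=[0.5, 0.5])
    print(ll)
```

In practice the parameters would be fitted (for example with EM) rather than written by hand, and a model like the one described above would use four states instead of two.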