About Myself

I am a recent graduate of Duke and a problem solver in data science. I live in Los Angeles and specialize in drawing insights from data. Related projects can be found on my GitHub.

Statistics
Python
Machine Learning
R
Predictive Modeling
SQL
Data Mining
Hadoop
Projects

The K-means algorithm has a critical weakness: depending on the randomly chosen initial centers, it can get stuck in a local optimum. Many related algorithms, such as K-means++, have been developed to address this problem. This project reproduced the Scalable K-means++ algorithm from the Bahmani et al. paper, optimized its performance with Cython, multiprocessing, and PySpark, and compared misclassification rate, clustering cost, and runtime on four large datasets.
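
For readers unfamiliar with the idea, here is a minimal sketch of the seeding step in plain K-means++ (D² sampling), which is the sequential baseline that Scalable K-means++ builds on. It is not the project's code and not the parallel variant from the paper; the function names `squared_distance` and `kmeans_pp_init` are made up for this illustration, and it uses only the Python standard library.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=None):
    """Pick k initial centers by D^2 sampling (illustrative helper)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)

    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center so far.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Sample a point with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            new_center = points[idx]
        centers.append(new_center)
        # Update nearest-center distances with the newly added center.
        d2 = [min(d, squared_distance(p, new_center)) for d, p in zip(d2, points)]

    return centers


# Toy data (hypothetical): two well-separated groups of 2-D points.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (9.9, 10.0), (10.1, 9.8)]
initial_centers = kmeans_pp_init(data, k=2, seed=0)
```

Because far-away points are more likely to be picked, the initial centers tend to be spread across the data, which makes a poor local optimum less likely than with uniform random initialization.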

In this project, we use a Gaussian Hidden Markov Model (HMM) to capture stock price patterns and forecast the stock prices of seven high-tech companies, and we compare it to traditional time series models. Instead of assuming independent observations, we consider an HMM with correlated observations by adding the closing price difference. The four latent states in our HMM can broadly capture the increasing and decreasing patterns, and under a simple investment strategy, the HMM gives better and more stable payoffs than traditional time series models.
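
As a rough illustration of how a Gaussian HMM scores a price series, the sketch below computes the log-likelihood of a sequence with the scaled forward algorithm. It is a simplified stand-in, not the project's model: it assumes univariate emissions and independent observations given the state, uses two states instead of four to keep it short, and every parameter value and function name (`gaussian_pdf`, `forward_log_likelihood`) is made up for the example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_log_likelihood(obs, start, trans, means, stds):
    """Return log P(obs | model) using the scaled forward algorithm."""
    if not obs:
        raise ValueError("obs must contain at least one observation")
    n = len(start)
    log_lik = 0.0
    alpha = None
    for t, x in enumerate(obs):
        emit = [gaussian_pdf(x, means[j], stds[j]) for j in range(n)]
        if t == 0:
            # Initialization: start probability times emission density.
            alpha = [start[j] * emit[j] for j in range(n)]
        else:
            # Recursion: propagate through the transition matrix, then emit.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            raise ValueError("observation has zero likelihood under the model")
        # Rescale to avoid underflow and accumulate the log of the scale factors.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state model: state 0 = "falling", state 1 = "rising".
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
means = [-1.0, 1.0]   # mean daily price change in each state (made up)
stds = [0.5, 0.5]
price_changes = [0.8, 1.1, 0.9, -1.2, -0.7]
score = forward_log_likelihood(price_changes, start, trans, means, stds)
```

In practice the parameters would be fitted to data (for example with the Baum-Welch algorithm) rather than set by hand, and a library implementation would usually be preferable to hand-rolled code.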


Contact

juliaxu0304@gmail.com
Shoot me an email!