You may click on a project's title to see its GitHub repository.
Deciding on a used car is always a huge hassle. There are many factors to consider, such as owner reviews, common technical issues, and the local price of a car. Moreover, for a better-informed decision, potential buyers may end up comparing a car with other model years or conditions of the same model, as well as with its price in neighboring states.
This project aims to guide people in their search for a used car. It offers the following to its users:
Data:
Model:
Among many estimators, a GroupbyEstimator that fits a separate Random Forest Regressor for each manufacturer group is adopted. It yields an R² of 98.75% (MAE: 453) on the training data and 92% (MAE: 941) on the test set. Three different predictors (Linear Regression, Ridge Regression, and Random Forest Regressor) are used in three different settings (a GroupbyEstimator grouped by 'state', a GroupbyEstimator grouped by 'manufacturer', and a single model for the entire dataset), and the best-performing of the nine resulting models is adopted.
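For illustration, a per-group estimator of this kind can be sketched in a few lines of scikit-learn-style code; the class and parameter names below are illustrative and may differ from the repository's actual implementation.

```python
# A minimal sketch of a GroupbyEstimator, assuming pandas input with numeric
# features; illustrative only, the repository's implementation may differ.
import pandas as pd
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import RandomForestRegressor

class GroupbyEstimator(BaseEstimator, RegressorMixin):
    """Fits one sub-estimator (e.g., a Random Forest) per group of a column."""

    def __init__(self, group_col, make_estimator=RandomForestRegressor):
        self.group_col = group_col
        self.make_estimator = make_estimator

    def fit(self, X, y):
        self.estimators_ = {}
        for key, idx in X.groupby(self.group_col).groups.items():
            features = X.loc[idx].drop(columns=self.group_col)
            self.estimators_[key] = self.make_estimator().fit(features, y.loc[idx])
        return self

    def predict(self, X):
        # assumes every group seen at predict time also appeared at fit time
        preds = pd.Series(index=X.index, dtype=float)
        for key, idx in X.groupby(self.group_col).groups.items():
            features = X.loc[idx].drop(columns=self.group_col)
            preds.loc[idx] = self.estimators_[key].predict(features)
        return preds.to_numpy()
```

A `GroupbyEstimator('manufacturer')` then behaves like any other scikit-learn regressor while fitting one forest per manufacturer, which is what lets per-manufacturer price structure be learned separately.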
The candlestick chart is the most common way to visualize the price history of a financial asset. Unlike a line chart, a candlestick shows an asset's opening and closing prices, its high and low, and its overall range for a specific time frame. Despite the Efficient Market Hypothesis, this project uses the history of candlestick formations in a stock's daily prices to predict whether it is a good time for a day trade.
It would be unfair to expect striking results from such a simple analysis; however, given that the data is freely available, this is a nice project for practicing reading large datasets from a directory while training a deep network. Moreover, it is possible to incorporate GDELT data into this model for a more sophisticated version.
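As a rough illustration of how such training images can be produced, daily OHLC windows can be rendered as candlestick images with a plotting package such as mplfinance; the toy data, window length, and styling below are assumptions, not the project's exact pipeline.

```python
# Illustrative only: render a 30-day OHLC window as a candlestick image
# that a CNN can consume; the real project's data pipeline may differ.
import pandas as pd
import mplfinance as mpf

# toy OHLC data; in practice this would come from real daily price history
idx = pd.date_range("2023-01-02", periods=30, freq="B")
close = pd.Series(range(100, 130), index=idx, dtype=float)
df = pd.DataFrame({"Open": close - 0.5, "High": close + 1.0,
                   "Low": close - 1.0, "Close": close})

# axisoff removes ticks and labels so only the candle pattern remains
mpf.plot(df, type="candle", axisoff=True, savefig="window_0001.png")
```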
Data:
Model:
Utilized Keras to train a Convolutional Neural Network (CNN) on approximately 4 GB of image data. Since it is not possible to load all the data into memory, the model is trained by reading from the directory in batches. As expected, a CNN is uninformative when fed candlestick images, since they do not contain systematic local patterns that can be exploited for prediction.
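A minimal sketch of this directory-streaming setup with tf.keras is shown below; the directory layout, image size, and architecture are assumptions (the project may equally well use the older ImageDataGenerator.flow_from_directory API).

```python
# A sketch of batch training from a directory with Keras, assuming images
# are organized as data/train/<class>/... ; all names are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

train_ds = keras.utils.image_dataset_from_directory(
    "data/train", image_size=(128, 128), batch_size=32)
val_ds = keras.utils.image_dataset_from_directory(
    "data/val", image_size=(128, 128), batch_size=32)

model = keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # good day-trade vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# batches stream from disk, so the full 4 GB never sits in memory at once
model.fit(train_ds, validation_data=val_ds, epochs=5)
```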
This particular challenge is perfect for data scientists looking to get started with Natural Language Processing. This project builds multiple machine learning models that predict which Tweets are about real disasters and which ones aren't, using different cleaning processes, vectorization methods, and predictors. The competition description is summarized below; you may visit Kaggle for more details.
Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they're observing in real time. Because of this, more agencies (e.g., disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter. But it's not always clear whether a person's words are actually announcing a disaster.
Data:
Model:
There are 12 different models, combining stemming or lemmatizing in the data-cleaning process; Count, N-gram, or TF-IDF vectorization; and finally a Gradient Boosting or Random Forest Classifier as the predictor. Tuning the hyper-parameters and comparing the models, I find that the best one uses Porter stemming in data cleaning, Count vectorization, and a Random Forest Classifier as the predictor, which yields 79.3% accuracy, 85.8% precision, and 62% recall.
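A hedged sketch of that winning configuration is below; the column names follow the Kaggle data (text, target), while the tokenization and hyper-parameters are illustrative rather than the project's exact choices.

```python
# Illustrative sketch: Porter stemming + Count vectorization + Random Forest.
import pandas as pd
from nltk.stem import PorterStemmer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

stemmer = PorterStemmer()

def stem_text(text):
    # lowercase and stem each token before vectorization
    return " ".join(stemmer.stem(tok) for tok in text.lower().split())

df = pd.read_csv("train.csv")            # Kaggle's training file
X = df["text"].apply(stem_text)
y = df["target"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

clf = Pipeline([
    ("vec", CountVectorizer()),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
]).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))             # accuracy on the held-out split
```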
As part of the in-class exercises of Udacity's AWS ML Foundations Course, I created a Python package that implements General, Gaussian, and Binomial distribution classes using object-oriented programming.
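A minimal sketch of the object-oriented design is shown below, assuming a Gaussian class that inherits from a general Distribution base; the exact class and method names in the package may differ.

```python
# Illustrative sketch of the package's OOP structure; names are assumptions.
import math

class Distribution:
    """General base class holding a distribution's mean and standard deviation."""
    def __init__(self, mu=0, sigma=1):
        self.mean = mu
        self.stdev = sigma

class Gaussian(Distribution):
    def calculate_mean(self, data):
        self.mean = sum(data) / len(data)
        return self.mean

    def pdf(self, x):
        # Gaussian probability density function at x
        coeff = 1.0 / (self.stdev * math.sqrt(2 * math.pi))
        return coeff * math.exp(-0.5 * ((x - self.mean) / self.stdev) ** 2)

    def __add__(self, other):
        # sum of independent Gaussians: means add, variances add
        result = Gaussian()
        result.mean = self.mean + other.mean
        result.stdev = math.sqrt(self.stdev ** 2 + other.stdev ** 2)
        return result
```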
Bring back the childhood memories!.. This project implements our childhood game, Snake, in Python using the PyGame package.
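For a flavor of what that involves, here is a bare-bones PyGame game loop for Snake; the grid size, colors, and speed are illustrative choices, not the project's exact settings.

```python
# A bare-bones Snake loop in PyGame; illustrative, not the project's code.
import random
import pygame

CELL, GRID = 20, 20                      # 20x20 grid of 20-pixel cells
pygame.init()
screen = pygame.display.set_mode((CELL * GRID, CELL * GRID))
clock = pygame.time.Clock()

snake = [(5, 5)]                         # list of (col, row) cells, head first
direction = (1, 0)
food = (10, 10)

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type == pygame.KEYDOWN:
            keys = {pygame.K_UP: (0, -1), pygame.K_DOWN: (0, 1),
                    pygame.K_LEFT: (-1, 0), pygame.K_RIGHT: (1, 0)}
            direction = keys.get(event.key, direction)

    head = (snake[0][0] + direction[0], snake[0][1] + direction[1])
    if head in snake or not (0 <= head[0] < GRID and 0 <= head[1] < GRID):
        break                            # game over on self- or wall-collision
    snake.insert(0, head)
    if head == food:                     # eat: grow and respawn the food
        food = (random.randrange(GRID), random.randrange(GRID))
    else:
        snake.pop()                      # otherwise just move forward

    screen.fill((0, 0, 0))
    for x, y in snake:
        pygame.draw.rect(screen, (0, 200, 0), (x * CELL, y * CELL, CELL, CELL))
    pygame.draw.rect(screen, (200, 0, 0),
                     (food[0] * CELL, food[1] * CELL, CELL, CELL))
    pygame.display.flip()
    clock.tick(8)                        # game speed in frames per second

pygame.quit()
```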
We model the competition between a proprietary firm and an open source rival by incorporating the nature of the GPL, investment opportunities for the proprietary firm, user-developers who can invest in open source development, and a ladder-type technology. We use a two-period dynamic mixed-duopoly model in which a profit-maximizing proprietary firm competes with a rival, the open source firm, which prices the product at zero, with the quality levels determining their relative positions over time. We analyze how the existence of the open source firm affects the investment and pricing behavior of the proprietary firm. We also study the welfare implications of the existence of the open source rival. We find that, under some conditions, the existence of an open source rival may decrease total welfare.
The design of TIQ Flash ADCs requires an extensive amount of data. This work introduces a new tool that decreases the amount of data that needs to be created. The proposed tool utilizes supervised learning to approximate the required data and returns a set of comparators for the ADC design. The approximation error is observed to be a few hundred μV for the resulting set. Moreover, the tool allows the designer to predict the power, speed, and precision of a potential design.
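The paper's tool is tied to circuit-simulation data, but the core idea, fitting a supervised model to approximate expensive-to-simulate comparator characteristics, can be sketched as follows; the synthetic features, target, and choice of regressor are assumptions for illustration, not the paper's method.

```python
# Illustrative only: approximate a comparator characteristic (e.g., a
# switching voltage) from design parameters instead of simulating every design.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# hypothetical design parameters, e.g., two transistor widths per comparator
X_sim = rng.uniform(0.1, 2.0, size=(500, 2))
# hypothetical simulated switching voltages for those 500 designs (in volts)
y_sim = 0.4 + 0.2 * X_sim[:, 0] - 0.1 * X_sim[:, 1] + rng.normal(0, 1e-4, 500)

model = RandomForestRegressor(random_state=0).fit(X_sim, y_sim)
# predict the characteristic for unseen designs rather than simulating them
X_new = rng.uniform(0.1, 2.0, size=(5, 2))
print(model.predict(X_new))
```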
In this paper, I analyze the best information disclosure policy that an auctioneer can adopt according to different performance measures, namely players' payoffs, prize allocation efficiency, and aggregate effort. The distinctive feature of the analysis is that players have the ability to choose the distribution from which their own types are drawn. Using a two-player, two-type all-pay auction setting, I show that the optimal disclosure policy depends on the ratio of the value of winning for a low type to the value of winning for a high type.
This paper analyzes a two-player, two-stage asymmetric all-pay auction in which the players choose a distribution over a common valuation set in the first stage, which then determines their valuation of winning in the auction stage. After observing their opponent's choice of distribution and their realized valuations, the players play an all-pay auction in the second stage of the game. We characterize the equilibrium outcome of the game and show that, in this outcome, one player assigns probability 1 to the highest type, whereas the other player splits his probability equally between the highest and the lowest types in the support.
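In notation (a direct transcription of the stated result; the symbols below are ours, not necessarily the paper's), the characterized first-stage choices are:

```latex
% Transcription of the stated equilibrium; notation assumed, not the paper's.
% p_i(v) is the probability player i assigns to valuation v in the first stage;
% \bar{v} and \underline{v} are the highest and lowest valuations in the support.
\[
  p_1(\bar{v}) = 1,
  \qquad
  p_2(\bar{v}) = p_2(\underline{v}) = \tfrac{1}{2}.
\]
```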