Data Mining: Student Projects

Project 3

  1. Liu: Classify whether a product is top-selling and predict its demand.

  2. Cairo: Income prediction based on census data (above 50K or not).

  3. Zoe: Beer recommendation

  4. Juhan Choi: Box Office

  5. Amin: Spotify Music Popularity Regressor

  6. Fuxian Li: Movie Genre Prediction from a plot description

  7. Xuan Wang: Image Recognition

  8. Dan Grennel: County primary election vote prediction for a Democratic candidate (Clinton); code on GitHub.

  9. Bohyun: CKD: Chronic Kidney Disease Prediction ***

  10. Vishal: Fruit image clustering app

  11. Moti: Irony Detection from tweets ***

  12. Steve Riesenberg: Fentanyl Early Alert ***

  13. Steven Hanna: Predict political party from tweets

  14. Paul Thomson:

  15. HongTu Yan: Stock overreaction prediction (no Python code implemented?)
    stock overreaction info

  16. Enis: Train audioset model
    VGG inference: 2qRdw7Nh8qd2DZT_yTkaUz3liNK1
    Audioset model inference: drive/1ILwZH5NqtM-wVNyBDBSwpxag-6-AKabp

  17. Jason Fushian Li: Movie Genre Prediction models
    Comments: needs performance information (accuracy for each model)

Project 2

  1. Hongtu Yan
    Web App: Credit default prediction

  2. Bohyun
    Web App: bohyun/dm2019

  3. Moti: (Please make sure your web app works)
    Web App: Predicting purchase

  4. Wang Xuan:
    Web App: Predict the sentiment of Yelp reviews

  5. Vishal:
    Web App: Image object classifier

  6. Linmei Lin
    Predicting the cryptocurrency price.

  7. Steve Hanna
    Web App: Rain Prediction
    Model building: Predicting whether it will rain tomorrow in Australia.

  8. Lu Liu
    Predict whether a construction job requires self-professional certification.

  9. Amin:
    Dataset: Spotify music data
    Web App: Predict the popularity of music.

  10. Mark Perlman:
    Web App: Titanic survival classification

  11. Zoe
    Dataset: Kaggle’s Students’ Academic Performance Dataset
    Web App: Predict and classify a student's academic performance as high, medium, or low.
    Features are divided into three categories: demographic, academic background, and behavioral.

  12. Cairo
    Dataset: UCLA application information from at least 2009 (400 x 9 data points)
    Web App: Predict whether a student will be accepted into a graduate program

  13. Steve Riesenberg
    Dataset: Kaggle "What's Cooking?" dataset
    Web App: Classify a recipe into one of 20 cuisines based on its ingredients. The winner of the competition had an accuracy of about 82%, so my goal is to see whether I can reach 80%.

  14. Fushian Li
    Web app: Movie Genre prediction

  15. Juhan Choi
    Predict whether a customer will make a transaction.

  16. Enis:
    Web App: Building a news classifier for incendiary content, along with its website.

  17. Daniel Grenell
    Web App: Predicting snow days.
    A probabilistic classifier for snow days in New York City
    Snow day data: Example of using wunderground API: Weather data:

Project 1

  1. Enis Berk: Short speech commands classification
    Kaggle competition dataset of 1-second speech command files (.wav).
    CNN model using TensorFlow to classify audio into short speech commands.

  2. Xuan Wang: NYC Government Employee Payroll data analysis
    Citywide payroll data set
    Payroll analyses (high average pay, income by department)

  3. Fuxian Li: Crime Distribution in Boston
    crime data of Boston

  4. Juhan Choi: Housing price prediction
    Dataset: Kaggle housing price dataset

  5. Daniel Amin: Indie Music analysis
    UCI FMA: A Dataset for Music Analysis
    Predict the success of a song

  6. Daniel Grenell: 311 Complaints data analysis in NYC
    311 Dataset of NYC in 2019?
    How have the response times to 311 complaints changed over time for different parts of the city?
    complaint types?

  7. Lu Liu , pdf
    NYC OpenData - DOB Job Application Filings dataset
    Web App: Building Construction Job Trends and Prediction for Real Estate Investment. Recommends new investments in properties that allow construction projects.

  8. Steve Riesenberg: SHSAT testing schools
    Specialized High Schools Admissions Test (SHSAT) data
    Identifying which schools have the most opportunity to increase the number of low-income students taking the SHSAT

  9. Vishal Bharti: NY Collision Analyses
    NYPD Collisions dataset.
    Discovering trends in motor accidents based on location, vehicle types, etc.

  10. Bohyun Lee: HPM data analysis
    Human Proteome Map data: intensity profiles for hundreds of proteins (from mass spectrometry); data from humanproteomemap.org, paper in PubMed
    Identify protein expression patterns for each body site and cell type.

  11. Linmei Huang: Cryptocurrency Price Development
    Historical daily transaction data for cryptocurrencies, collected from several websites
    The price development of cryptocurrencies and volatility

  12. Mark Perelman: Shoes data analysis
    Women's shoe data
    Trends of prices

  13. Paul Thompson : Clinical Data analysis
    Clinical Research Trends

  14. Cairo Thompson: Loan application analysis
    Analysis of loan applications, followed by predictions

  15. Zoe Markovits: Twitter Sentiment analysis
    Twitter data on US airlines classified by sentiment.
    Applying sentiment analysis and clustering algorithms to see whether a natural pattern appears

  16. Hongtu Yan: Credit card analysis
    A very large credit card dataset
    Credit card default prediction.

  17. Angel Vizzuraga: Crime in NYC
    NYC crime data

  18. Moti Mounesan
    Customer purchase prediction
    Kaggle competition dataset

  19. Steven Hanna
    Dataset: Cervical cancer dataset
    Create an accurate predictor of cervical cancer.
    Exercise: Titanic survival prediction