ABOUT

Seoul Apartment Price

src="nounproject"

Seoul's apartment price now averages $1 million.

Soaring house prices in Korea is a fatal problem. Aiming to visualize the data, predict the price rate at a certain time, we hoped to provide a clue to the clueless yong people who are saving money.

Price Prediction

There exists a vast amount of data provided from the ministry of Korea to analyze apartment prices of Korea. Whilst the prices are fluctuating but definitely skyrocketing, it is important to collect the valid data, and enact prediction for us to get the littlest grasp of the future prices. We aimed to add data visualization for a general explication on the trend, and future predictions.

Website Deployment

Numerous machine learning projects end as a project. We devised the algorithm and back-end developed it to a website, deploying the framework Django.



The 'provided data' from Dacon Tournament (from 2008 to 2017.11) was collected. Then, we had to further collect the data those from 2017.12 ~ 2012.12 from the Korean National Price Open System, through crawling. The columns used are ‘dong’, ‘jibun’, ‘Apt_name’, ‘use_area(m2)’, ‘transaction_year’, ‘transaction_month’, ‘date(1~10)’, ‘date(11~20)’, ‘date(21~)’, ‘floor’, ‘year_found’.

Raw Data Aparment Price data from Dacon 2008~2017, remaining four years from Korea Traffic Price Open System)
Data Analysis
Data Visualization
Prediction Modeling
Website Framework Construction
Back-end development
Front-end development

Data Processing

We then processed the data according to the date (early, middle, late) through one-hot encoding. We replaced the outlier and anomaly to dummy data. Converting the data to Object type, we label encoded the apartment name and dong (town name) and finally checked the irregularity in correlation. Following was the sorted data:

Machine Learning

Finding the right model was as follows. Rather than having random extraction as the train data, we manually selected data from fairly even distribution on the year, month and day data, and used the LignGBM model for the algorithmic modeling. Compared to XGBoost, Light GBM was deemed more efficient in terms of memory consumption as well.

LighGBM XGBoost
Histogram-based decision algorithm Sorted-based decision algorithm
=> Efficient in memory consumption
=> Faster training speed

Modeling

Setting the train data at 90% and test data at 10% with total amount of data (985,094), validation scoring was conducted. We analyzed that score metrics of RMSE was 0.09. Based on the python generated models and processed data, we aimed to deploy the Django Framework as the core framework to the prediction website. The website construction plan is as follows:

INPUT DATA
Price prediction launched
Visualization activated
main page loaded
prediction modeling activated
Website responds.

Deploying Django

Devised each 'town' to be opened (gu) and correlating street address to match (dong) in views.py. Formulated the data in views.py, with the parameters of the created model. Finally the website input and output service code is requested by post and returned in each command.
The predicted price, according to the input 'town', 'street', and 'apartment' will be visualized along with the final price, and the nearby parks and facilities on a map.

Data Visualization

Alongside the prediction, general data representation on seoul's fluctuating prices according to the selected variables are showcased in the website. Responsive visualization is conducted upon the users' requests.



Demonstration

The live-functioning website is deployed with pythonanywhere.com considering our model heavily relies on the python framework.
Nonetheless, since the embedded data is enormous and the LightGBM model takes time to produce the result, the website may load for a while, so if in hurry, here is the video of the final output.



⇽Previous Project
Sentiment Anomaly