Data Cleaning

No missing values (NA) were found.
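A minimal sketch of how such a check could be done with pandas. The small data frame is a made-up stand-in, not the actual housing data, and the column values are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the housing data; all values are invented for illustration.
df = pd.DataFrame({
    "price": [4200000, 3850000, 5100000, 2950000],
    "area": [3500, 3000, 6000, 2400],
    "basement": ["yes", "no", "yes", "no"],
})

# Count missing entries per column; all zeros means nothing to impute or drop.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Single flag summarising the whole frame.
has_missing = bool(df.isna().any().any())
print("Any missing values:", has_missing)
```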

Distribution of houses with and without basic amenities (hotwater/airconditioning/basement)

Plot

The distribution of the variable seems to be approximately normal. I will be using it as the dependent variable in my regression analysis.

Distribution of housing prices given their number of parking slots


Data Processing

Dummy variables
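One possible way to build the dummy variables, sketched with pandas. I am assuming here that the amenity columns hold "yes"/"no" strings; the rows below are invented. With `drop_first=True` the "no" category is dropped, so each amenity becomes a single 0/1 column.

```python
import pandas as pd

# Invented rows; the "yes"/"no" coding is an assumption about the raw data.
df = pd.DataFrame({
    "hotwater": ["yes", "no", "no", "yes"],
    "airconditioning": ["no", "no", "yes", "yes"],
    "basement": ["yes", "no", "yes", "no"],
})

amenities = ["hotwater", "airconditioning", "basement"]

# drop_first=True removes the first category ("no" sorts before "yes"),
# leaving one indicator column per amenity and avoiding the dummy trap.
encoded = pd.get_dummies(df, columns=amenities, drop_first=True, dtype=int)
print(encoded)
```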

Correlation Heatmap


Data Modelling OLS

I have excluded area because I believe it is highly correlated with the other amenities. Adding it to the regression model might cause multicollinearity.
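The belief about area could be checked with variance inflation factors (VIF). The sketch below computes them with plain NumPy on simulated data, where parking is deliberately tied to area and bedrooms is not. The helper `vif` and all the numbers are hypothetical and are not taken from the analysis above.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid.var() / target.var()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Simulated predictors (assumption): parking depends on area, bedrooms does not.
rng = np.random.default_rng(0)
area = rng.normal(5000, 1500, size=200)
parking = 0.0004 * area + rng.normal(0, 0.2, size=200)
bedrooms = rng.normal(3, 1, size=200)

vifs = vif(np.column_stack([area, parking, bedrooms]))
for name, value in zip(["area", "parking", "bedrooms"], vifs):
    print(f"{name}: VIF = {value:.2f}")
```

A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity, so a check like this could support or weaken the case for dropping area.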

The results suggest that all my variables except the intercept are significant at the 5% level (95% confidence). The R^2 suggests that the model explains about 50% of the variation in the dependent variable.
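To make the quantities in this summary concrete, here is a sketch that computes the coefficients, t-statistics and R^2 by hand with NumPy. It is not the fitting code used above; the data are simulated and the "true" coefficients are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Simulated predictors and prices, for illustration only.
airconditioning = rng.integers(0, 2, size=n)   # 0/1 dummy
parking = rng.integers(0, 4, size=n)           # 0 to 3 slots
noise = rng.normal(0, 100_000, size=n)
price = 50_000 + 300_000 * airconditioning + 200_000 * parking + noise

# Design matrix with an intercept column, then the least-squares fit.
X = np.column_stack([np.ones(n), airconditioning, parking])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

# Standard errors from the residual variance and (X'X)^-1.
resid = price - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / std_err

# R^2: share of the variation in price explained by the model.
ss_res = resid @ resid
ss_tot = np.sum((price - price.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print("coefficients:", beta)
print("t-statistics:", t_stats)
print("R^2:", r_squared)
```

In large samples, an absolute t-statistic above about 1.96 corresponds to significance at the 5% level.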

Variables used:

The residuals seem to be scattered randomly as the fitted values increase, with no signs of heteroskedasticity.
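The visual impression could be backed up with a formal test. The sketch below implements the n times R^2 form of the Breusch-Pagan test with NumPy: squared OLS residuals are regressed on the predictor, and the statistic is compared with the chi-square critical value for one degree of freedom (3.841 at the 5% level). The function name and the simulated data are assumptions, and this test was not part of the analysis above.

```python
import numpy as np

def breusch_pagan_lm(x, y):
    """LM statistic: n * R^2 from regressing squared OLS residuals on x."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_sq = (y - X @ beta) ** 2
    # Auxiliary regression of the squared residuals on the same design matrix.
    gamma, *_ = np.linalg.lstsq(X, resid_sq, rcond=None)
    aux_resid = resid_sq - X @ gamma
    r2 = 1.0 - aux_resid.var() / resid_sq.var()
    return n * r2

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(1, 10, size=n)

# Constant error variance versus error variance that grows with x.
y_homo = 2 + 3 * x + rng.normal(0, 1, size=n)
y_hetero = 2 + 3 * x + rng.normal(0, 1, size=n) * x

CHI2_CRIT_DF1 = 3.841  # 5% critical value, chi-square with 1 degree of freedom

lm_homo = breusch_pagan_lm(x, y_homo)
lm_hetero = breusch_pagan_lm(x, y_hetero)
print("constant variance LM:", lm_homo)
print("growing variance LM:", lm_hetero)
```

A statistic above the critical value would point to heteroskedasticity; a statistic below it is consistent with the random scatter seen in the residual plot.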

Prediction

Parameters

Alternative: simple linear regression and plots

Price ~ Area (OLS)
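A minimal sketch of a single-predictor fit of price on area, using `numpy.polyfit` with degree 1 as a stand-in for whichever OLS routine is used. The four data points are invented.

```python
import numpy as np

# Invented observations: area and price of four houses.
area = np.array([1000.0, 2000.0, 3000.0, 4000.0])
price = np.array([1_500_000.0, 2_100_000.0, 2_400_000.0, 3_000_000.0])

# A degree-1 polynomial fit is ordinary least squares with one predictor.
# polyfit returns the highest-order coefficient first: slope, then intercept.
slope, intercept = np.polyfit(area, price, deg=1)

# Predict the price of a hypothetical house with an area of 3500.
predicted = intercept + slope * 3500.0

print("slope:", slope)
print("intercept:", intercept)
print("predicted price for area 3500:", predicted)
```

The slope is read as the expected change in price for one extra unit of area, with nothing else held fixed since area is the only regressor.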