Data Analysis Project: House Sales in King County, USA

For this assignment, I used Python to analyze and predict housing prices from attributes such as square footage, number of bedrooms, and number of floors.

Here is the dataset: House Sales in King County, USA

This report details my attempt to predict house prices using various regression methods. I use house price data from King County, Washington State, USA, for sales in 2014 and 2015.

My main predictive question is: “Is it possible to predict the sale price of a house from information about that house, such as its size, number of bedrooms, and condition?”

The Dataset

The dataset contains house sale prices for King County, which includes Seattle. It covers homes sold between May 2014 and May 2015.

id: A notation for a house

date: Date the house was sold

price: Sale price (the prediction target)

bedrooms: Number of bedrooms per house

bathrooms: Number of bathrooms/bedrooms

sqft_living: Square footage of the home

sqft_lot: Square footage of the lot

floors: Total floors (levels) in the house

waterfront: Whether the house has a view of a waterfront

view: Has been viewed

condition: How good the overall condition is

grade: Overall grade given to the housing unit, based on the King County grading system

sqft_above: Square footage of the house apart from the basement

sqft_basement: Square footage of the basement

yr_built: Year built

yr_renovated: Year the house was renovated

zipcode: Zip code

lat: Latitude coordinate

long: Longitude coordinate

sqft_living15: Living room area in 2015 (implies some renovations). This might or might not have affected the lot size area

sqft_lot15: Lot size area in 2015 (implies some renovations)

The Questions

Question 1: Display the data type of each column using the attribute dtypes, then take a screenshot that includes your code and submit it.

image
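Since the screenshot does not carry the code as text, here is a minimal sketch of what Question 1 asks for. It assumes a tiny made-up DataFrame in place of the real data; with the actual file, something like `pd.read_csv("kc_house_data.csv")` would be used instead (that file name is an assumption, not taken from this report).

```python
import pandas as pd

# Made-up stand-in rows, NOT the real King County data.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "date": ["20140601T000000", "20141101T000000", "20150301T000000"],
    "price": [300000.0, 450000.0, 600000.0],
    "bedrooms": [2, 3, 4],
    "floors": [1.0, 2.0, 1.5],
})

# dtypes is an attribute (no parentheses); it returns one dtype per column.
column_types = df.dtypes
print(column_types)
```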

Question 2: Drop the columns "id" and "Unnamed: 0" from axis 1 using the method drop(), then use the method describe() to obtain a statistical summary of the data.

image
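A short sketch of the drop-and-describe step in Question 2 might look like the following. The rows are invented for illustration; only the column names "id" and "Unnamed: 0" come from the question.

```python
import pandas as pd

# Made-up stand-in rows, NOT the real King County data.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "id": [101, 102, 103],
    "price": [300000.0, 450000.0, 600000.0],
    "sqft_living": [1000, 1500, 2000],
})

# axis=1 tells drop() to remove columns rather than rows.
df = df.drop(["id", "Unnamed: 0"], axis=1)

# describe() reports count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()
print(summary)
```

Passing `inplace=True` to `drop()` would modify the DataFrame directly instead of returning a new one; reassigning, as above, is the more common style.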

Question 3: Use the method value_counts() to count the number of houses with each unique floor value, then use the method to_frame() to convert the result to a DataFrame.

image
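As a rough illustration of Question 3, the sketch below counts floor values on a handful of invented rows. The name of the resulting count column differs between pandas versions, so the example reads the result by position rather than by column name.

```python
import pandas as pd

# Made-up floor values, NOT the real King County data.
df = pd.DataFrame({"floors": [1.0, 2.0, 1.0, 1.5, 2.0, 1.0]})

# value_counts() returns a Series indexed by the unique values,
# sorted from most to least frequent; to_frame() turns it into a DataFrame.
floor_counts = df["floors"].value_counts().to_frame()
print(floor_counts)
```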

Question 4: Use the function boxplot in the seaborn library to produce a plot that can be used to determine whether houses with or without a waterfront view have more price outliers.

image
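The boxplot in Question 4 could be produced along these lines. The data is invented, and the non-interactive "Agg" backend is selected only so the sketch runs without a display; in a notebook that line would not be needed.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook

import pandas as pd
import seaborn as sns

# Made-up stand-in rows, NOT the real King County data.
df = pd.DataFrame({
    "waterfront": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "price": [300000, 320000, 350000, 400000, 900000,
              800000, 1200000, 1500000, 1600000, 4000000],
})

# One box per waterfront value; points beyond the whiskers are drawn as outliers.
ax = sns.boxplot(x="waterfront", y="price", data=df)
```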

Question 5: Use the function regplot in the seaborn library to determine whether the feature sqft_above is negatively or positively correlated with price.

image
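For Question 5, a sketch like the one below draws the scatter plot with a fitted line and also prints the Pearson correlation as a numeric check on the sign. The six rows are invented, and `ci=None` is my own choice here to skip the bootstrap confidence band, which means little on so few points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook

import pandas as pd
import seaborn as sns

# Made-up stand-in rows, NOT the real King County data.
df = pd.DataFrame({
    "sqft_above": [800, 1000, 1200, 1500, 2000, 2500],
    "price": [250000, 300000, 340000, 420000, 500000, 640000],
})

# An upward-sloping fitted line indicates a positive correlation.
ax = sns.regplot(x="sqft_above", y="price", data=df, ci=None)

# Pearson correlation: positive means price tends to rise with sqft_above.
r = df["sqft_above"].corr(df["price"])
print(r)
```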

Question 6: Fit a linear regression model to predict the price using the feature 'sqft_living', then calculate the R^2. Take a screenshot of your code and the value of the R^2.

image
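A minimal sketch of the single-feature fit in Question 6 follows. The data is randomly generated with a built-in linear trend, so the printed R^2 says nothing about the real dataset; only the scikit-learn calls are the point.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data, NOT the real King County data.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, 200)
df = pd.DataFrame({
    "sqft_living": sqft,
    "price": 50000 + 250 * sqft + rng.normal(0, 50000, 200),
})

X = df[["sqft_living"]]  # double brackets keep X two-dimensional
y = df["price"]

lm = LinearRegression()
lm.fit(X, y)

# score() returns the coefficient of determination R^2 on the data passed in.
r2 = lm.score(X, y)
print(r2)
```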

Question 7: Fit a linear regression model to predict the 'price' using the list of features:
• "floors"
• "waterfront"
• "lat"
• "bedrooms"
• "sqft_basement"
• "view"
• "bathrooms"
• "sqft_living15"
• "sqft_above"
• "grade"
• "sqft_living"

image
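The multi-feature fit in Question 7 could be sketched as below. The feature names come from the list above, but every value is a random number between 0 and 1, and the price formula is invented so that there is something to fit.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data, NOT the real King County data.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.uniform(0, 1, size=(n, len(features))), columns=features)
df["price"] = (100000 + 400000 * df["sqft_living"] + 200000 * df["grade"]
               + rng.normal(0, 20000, n))

X = df[features]
y = df["price"]

lm = LinearRegression().fit(X, y)

# R^2 on the same data the model was fitted to.
r2 = lm.score(X, y)
print(r2)
```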

Question 8: Create a pipeline object that scales the data, performs a polynomial transform, and fits a linear regression model. Fit the object using the features listed in Question 7, then calculate the R^2. Take a screenshot of your code and the R^2.

image
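One way to build the pipeline described in Question 8 is sketched here, again on synthetic data. The step names ("scale", "polynomial", "model") and `include_bias=False` are my own choices, not requirements of the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data, NOT the real King County data.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.uniform(0, 1, size=(n, len(features))), columns=features)
df["price"] = (100000 + 400000 * df["sqft_living"] + 200000 * df["grade"]
               + rng.normal(0, 20000, n))

# Each step is a (name, estimator) pair; steps run in order on fit and predict.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("polynomial", PolynomialFeatures(include_bias=False)),  # default degree is 2
    ("model", LinearRegression()),
])

pipe.fit(df[features], df["price"])
r2 = pipe.score(df[features], df["price"])
print(r2)
```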

Question 9: Create and fit a Ridge regression object using the training data, setting the regularization parameter to 0.1, and calculate the R^2 using the test data. Take a screenshot of your code and the R^2.

image
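Question 9 relies on a train/test split that is not shown in this report, so the sketch below makes one up: `test_size=0.15` and `random_state=1` are assumptions, and the data is synthetic. Only `alpha=0.1` comes from the question.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data, NOT the real King County data.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame(rng.uniform(0, 1, size=(n, len(features))), columns=features)
df["price"] = (100000 + 400000 * df["sqft_living"] + 200000 * df["grade"]
               + rng.normal(0, 20000, n))

# Hold out part of the data so R^2 is measured on rows the model never saw.
x_train, x_test, y_train, y_test = train_test_split(
    df[features], df["price"], test_size=0.15, random_state=1)

# alpha is scikit-learn's name for the regularization parameter.
ridge = Ridge(alpha=0.1)
ridge.fit(x_train, y_train)

r2_test = ridge.score(x_test, y_test)
print(r2_test)
```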

Question 10: Perform a second-order polynomial transform on both the training data and the testing data. Create and fit a Ridge regression object using the training data, setting the regularization parameter to 0.1. Calculate the R^2 using the test data provided. Take a screenshot of your code and the R^2.

image
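The steps in Question 10 can be sketched as follows, under the same assumptions as before: synthetic data and a made-up split (`test_size=0.15`, `random_state=1`). The transform is fitted on the training data only and then reused on the test data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data, NOT the real King County data.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame(rng.uniform(0, 1, size=(n, len(features))), columns=features)
df["price"] = (100000 + 400000 * df["sqft_living"] + 200000 * df["grade"]
               + rng.normal(0, 20000, n))

x_train, x_test, y_train, y_test = train_test_split(
    df[features], df["price"], test_size=0.15, random_state=1)

# Second-order transform: original features plus all squares and pairwise products.
pr = PolynomialFeatures(degree=2)
x_train_pr = pr.fit_transform(x_train)  # learn the feature layout on training data
x_test_pr = pr.transform(x_test)        # apply the same layout to test data

ridge = Ridge(alpha=0.1)
ridge.fit(x_train_pr, y_train)

r2_test = ridge.score(x_test_pr, y_test)
print(r2_test)
```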

End. Thank you for reading.