Data Visualization Project (IBM Data Science Certification)

For this project I was tasked with demonstrating data visualization skills, primarily using Matplotlib. Two visualizations were required: the first summarizes the results of a survey on audience interest in different data science topics, and the second is a choropleth map of crime in the city of San Francisco.

The survey was conducted to gauge an audience's interest in the following data science topics:

  1. Big Data (Spark / Hadoop)
  2. Data Analysis / Statistics
  3. Data Journalism
  4. Data Visualization
  5. Deep Learning
  6. Machine Learning

The participants had three options for each topic: Very interested, Somewhat interested, and Not interested. A total of 2,233 respondents completed the survey.
The survey results have been saved in a CSV file and can be accessed through this link.
If you examine the CSV file, you will find that the first column lists the data science topics and the first row lists the answer choices for each topic.

Step One: Read the file into a pandas dataframe
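A minimal sketch of this step, assuming the CSV has been downloaded locally (the filename `topic_survey.csv` is a hypothetical placeholder, since the original link is not reproduced here). Passing `index_col=0` makes the first column, the topic names, the row index, so the remaining columns are the three answer choices.

```python
import pandas as pd


def load_survey(source="topic_survey.csv"):
    """Read the survey CSV so that the topic names become the row index.

    `source` can be a path, URL, or file-like object; the default filename
    is a hypothetical placeholder for the downloaded file.
    """
    return pd.read_csv(source, index_col=0)


# Usage (assumes the file exists locally):
# df_survey = load_survey()
# print(df_survey.head())
```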

Step Two: Use the artist layer of Matplotlib to create a bar chart showing the percentage of respondents interested in each of the data science topics surveyed.

  1. Sort the dataframe in descending order of the Very interested column.
  2. Convert the numbers into percentages of the total number of respondents. Recall that 2,233 respondents completed the survey. Round percentages to 2 decimal places.
  3. Style the chart as follows:
     • use a figure size of (20, 8)
     • use a bar width of 0.8
     • use color #5cb85c for the Very interested bars, color #5bc0de for the Somewhat interested bars, and color #d9534f for the Not interested bars
     • use font size 14 for the bar labels, percentages, and legend
     • use font size 16 for the title
     • display the percentages above the bars
     • remove the left, top, and right borders
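A minimal sketch of the three items above, assuming the dataframe from Step One has columns named `Very interested`, `Somewhat interested`, and `Not interested` (adjust to the actual CSV header). It works directly on `Figure`/`Axes` objects rather than `pyplot`. "Bar width of 0.8" is interpreted here as the width of each group of three bars, which is how pandas' `DataFrame.plot(kind="bar", width=0.8)` treats it; the chart title is a placeholder.

```python
import numpy as np
import pandas as pd
from matplotlib.figure import Figure

TOTAL_RESPONDENTS = 2233

# Assumed column names; colors are the ones given in the styling list.
COLORS = {
    "Very interested": "#5cb85c",
    "Somewhat interested": "#5bc0de",
    "Not interested": "#d9534f",
}


def to_percentages(df, total=TOTAL_RESPONDENTS):
    """Sort by 'Very interested' (descending), convert counts to % of total, round to 2 dp."""
    ordered = df.sort_values("Very interested", ascending=False)
    return (ordered / total * 100).round(2)


def plot_interest(pct, group_width=0.8):
    """Draw a grouped bar chart (one group per topic) on explicit Figure/Axes objects."""
    fig = Figure(figsize=(20, 8))
    ax = fig.add_subplot()

    x = np.arange(len(pct.index))                # group centers, one per topic
    bar_w = group_width / len(pct.columns)       # the three bars share the group width

    for i, col in enumerate(pct.columns):
        offsets = x - group_width / 2 + bar_w * (i + 0.5)
        bars = ax.bar(offsets, pct[col], bar_w, label=col, color=COLORS.get(col))
        # Percentage label above each bar, font size 14
        ax.bar_label(bars, labels=[f"{v:.2f}%" for v in pct[col]], fontsize=14)

    ax.set_xticks(x)
    ax.set_xticklabels(pct.index, fontsize=14)
    ax.set_title("Percentage of Respondents' Interest in Data Science Topics", fontsize=16)
    ax.legend(fontsize=14)

    # Remove the left, top, and right borders (the bottom axis line stays)
    for side in ("left", "top", "right"):
        ax.spines[side].set_visible(False)
    return fig, ax


# Usage, with df_survey being the dataframe read in Step One:
# pct = to_percentages(df_survey)
# fig, ax = plot_interest(pct)
# fig.savefig("survey_interest.png")
```

Computing the percentages in a separate function keeps the data step (items 1 and 2) testable on its own, independent of the plotting code.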

Step Three: Create a choropleth map of San Francisco that shows the total number of crimes in each neighborhood.

The dataset can be found here.


First, restructure the data so that it is in the right format for the choropleth map. You will need to create a dataframe that lists each neighborhood in San Francisco along with the corresponding total number of crimes.
According to the San Francisco crime dataset, the city consists of 10 main neighborhoods, namely:

  • Central
  • Southern
  • Bayview
  • Mission
  • Park
  • Richmond
  • Ingleside
  • Taraval
  • Northern
  • Tenderloin
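The restructuring can be sketched as a single count per neighborhood, assuming each row of the crime dataset is one incident and that the neighborhood name is stored in a column called `PdDistrict` (an assumed column name; pass the real one if it differs). The result has the two columns a choropleth needs: the neighborhood key and its total.

```python
import pandas as pd


def crimes_per_neighborhood(crimes, column="PdDistrict"):
    """Return a dataframe with columns Neighborhood and Count (total incidents).

    Assumes one row per incident; "PdDistrict" is an assumed column name.
    """
    return (
        crimes[column]
        .value_counts()                  # incidents per neighborhood, largest first
        .rename_axis("Neighborhood")     # name the index so it becomes a column
        .reset_index(name="Count")       # -> columns: Neighborhood, Count
    )


# Usage, with df_crimes being the loaded crime dataset:
# df_neighborhoods = crimes_per_neighborhood(df_crimes)
```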

Second, we need a GeoJSON file that marks the boundaries of the different neighborhoods in San Francisco. To save you the hassle of finding the right file, I have already downloaded it and made it available via this link.
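Choropleth maps like this are often built with Folium's `Choropleth` class; since this project centers on Matplotlib, the sketch below shows one alternative that draws the map with Matplotlib alone. It assumes the GeoJSON stores each neighborhood's name in a `DISTRICT` property (check the actual file) and takes `counts` as a plain dict mapping neighborhood name to total crimes. Names are matched case-insensitively, and areas without a count are left unfilled. Longitude and latitude are drawn at a 1:1 scale with no map projection, which is a simplification.

```python
import json

import numpy as np
from matplotlib.collections import PatchCollection
from matplotlib.colors import Normalize
from matplotlib.figure import Figure
from matplotlib.patches import Polygon


def load_geojson(path):
    """Load a GeoJSON file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def outer_rings(geometry):
    """Yield the outer ring of each polygon in a GeoJSON Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        yield geometry["coordinates"][0]
    elif geometry["type"] == "MultiPolygon":
        for polygon in geometry["coordinates"]:
            yield polygon[0]


def plot_choropleth(geojson, counts, key="DISTRICT", cmap="YlOrRd"):
    """Shade each neighborhood polygon by its total crime count.

    `key`, the GeoJSON property holding the neighborhood name, is an assumption.
    """
    lookup = {name.upper(): value for name, value in counts.items()}
    patches, values = [], []
    for feature in geojson["features"]:
        name = str(feature["properties"][key]).upper()
        for ring in outer_rings(feature["geometry"]):
            patches.append(Polygon(ring, closed=True))
            values.append(lookup.get(name, np.nan))  # NaN marks "no count"

    norm = Normalize(vmin=min(counts.values()), vmax=max(counts.values()))
    collection = PatchCollection(patches, cmap=cmap, norm=norm,
                                 edgecolors="black", linewidths=0.5)
    # Masked (NaN) entries get the colormap's "bad" color, transparent by default
    collection.set_array(np.ma.masked_invalid(np.asarray(values, dtype=float)))

    fig = Figure(figsize=(10, 10))
    ax = fig.add_subplot()
    ax.add_collection(collection)
    ax.autoscale_view()
    ax.set_aspect("equal")  # lon/lat at 1:1, no projection
    ax.set_axis_off()
    fig.colorbar(collection, ax=ax, label="Total number of crimes")
    return fig, ax


# Usage (the filename is a hypothetical placeholder for the downloaded GeoJSON;
# df_neighborhoods is the Neighborhood/Count dataframe from the first step):
# sf_geo = load_geojson("san-francisco.geojson")
# counts = dict(zip(df_neighborhoods["Neighborhood"], df_neighborhoods["Count"]))
# fig, ax = plot_choropleth(sf_geo, counts)
# fig.savefig("sf_crime_choropleth.png")
```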