As part of the IBM Data Science professional program Capstone Project, we worked on real datasets to experience what a data scientist goes through in real life. The main objectives of this project were to define a business problem, look for data on the web, and use Foursquare location data to compare different neighborhoods of Toronto and figure out which neighborhood is suitable for starting a new restaurant business. In this project, we will go through the whole process step by step, from problem design and data preparation to the final analysis, and will finish with a conclusion that business stakeholders can use to make their decisions.
1. Description of the Business Problem & Discussion of the Background (Introduction Section):
Toronto, the capital of the province of Ontario, is the most populous Canadian city. Its diversity is reflected in its ethnic neighborhoods, such as Chinatown, Corso Italia, Greektown, Kensington Market, Koreatown, Little India, Little Italy, Little Jamaica, Little Portugal and Roncesvalles. It is one of the most immigrant-friendly cities in North America, and with more than half of the entire Indian Canadian population residing in Toronto, it is one of the best places to start an Indian restaurant.
In this project we will go through a step-by-step process to decide whether it is a good idea to open an Indian restaurant. We analyze the neighborhoods in Toronto to identify the most profitable area, since the success of a restaurant depends on the people and the ambience. Since we already know that Toronto is home to more Indians than any other city in Canada, it is a promising place to start the restaurant, but we still need to check whether it is a profitable idea and, if so, where to place it so that it yields the most profit to the owner.
Who would be most interested in this project? What type of clients or groups of people would benefit?
a) I’m using the “List of Postal code of Canada: M” ( https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M ) wiki page to get information about the neighborhoods in Toronto. This page has the postal code, borough and name of every neighborhood in Toronto.
b) Then I’m using the “ https://cocl.us/Geospatial_data” csv file to get the geographical coordinates of the neighborhoods.
c) To get information about the distribution of the population by ethnicity I’m using the “Demographics of Toronto” ( https://en.m.wikipedia.org/wiki/Demographics_of_Toronto#Ethnic_diversity ) wiki page. Using this page I’m going to identify the neighborhoods that are densely populated with Indians, as this might help in identifying a suitable neighborhood for a new Indian restaurant.
d) To get the location and other information about venues in Toronto I’m using Foursquare’s explore API (which gives venue recommendations). With it, I fetched details about the venues present in Toronto and collected their names, categories and locations (latitude and longitude).
From the Foursquare API ( https://developer.foursquare.com/docs ), I retrieved the following for each venue:
I scraped the Wikipedia page “List of Postal code of Canada: M” to obtain the data about Toronto and its neighborhoods.
Assumptions made to obtain the DataFrame below:
The Wikipedia package is used to scrape the data from the wiki page.
After some cleaning, we get a proper dataframe with the Postal code, Borough and Neighborhood information.
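The cleaning step can be sketched roughly as follows. This is a minimal sketch rather than the notebook's actual code: the sample rows, the column names `PostalCode`, `Borough` and `Neighborhood`, the "Not assigned" placeholder and the three cleaning rules are all assumptions about what the scraped table looks like.

```python
import pandas as pd


def clean_postal_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean a scraped postal-code table using three assumed rules."""
    # Assumed rule 1: drop rows whose borough is "Not assigned".
    df = raw[raw["Borough"] != "Not assigned"].copy()
    # Assumed rule 2: an unassigned neighborhood falls back to the borough name.
    unassigned = df["Neighborhood"] == "Not assigned"
    df.loc[unassigned, "Neighborhood"] = df.loc[unassigned, "Borough"]
    # Assumed rule 3: keep one row per postal code, joining its neighborhoods.
    return (
        df.groupby(["PostalCode", "Borough"])["Neighborhood"]
        .agg(", ".join)
        .reset_index()
    )


# Hypothetical sample rows standing in for the scraped wiki table.
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Harbourfront",
                     "Regent Park", "Not assigned"],
})

neighborhoods = clean_postal_table(raw)
print(neighborhoods)
```

Grouping on the postal code keeps the later merge with the coordinates one-to-one, which is why the neighborhoods are joined into a single string here.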
The next important step is adding the geographical coordinates to these neighborhoods. To do so, I’m extracting the data from the Geospatial Data csv file and combining it with the existing neighborhood dataframe by merging the two on the postal code.
I’m renaming the columns to match the existing dataframe built from the ‘List of Postal code of Canada: M’ wiki page. After that, I’m merging both dataframes into one on the postal code.
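The rename-and-merge step might look like the sketch below. It uses small in-memory tables instead of the real files; the column names (`Postal Code` in the csv, `PostalCode` in the neighborhood table) and the coordinate values are illustrative assumptions, and the real csv would be loaded with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical sample of the cleaned neighborhood dataframe.
neighborhoods = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Victoria Village",
                     "Harbourfront, Regent Park"],
})

# Hypothetical sample of the Geospatial Data csv (illustrative coordinates).
geo = pd.DataFrame({
    "Postal Code": ["M5A", "M3A", "M4A"],
    "Latitude": [43.6543, 43.7533, 43.7259],
    "Longitude": [-79.3606, -79.3297, -79.3156],
})

# Rename the key column so both tables share the same name.
geo = geo.rename(columns={"Postal Code": "PostalCode"})

# A left merge keeps every neighborhood row, even if coordinates are missing.
toronto = neighborhoods.merge(geo, on="PostalCode", how="left")
print(toronto)
```

A left merge is a deliberate choice in this sketch: a postal code without coordinates shows up as a row with missing values instead of silently disappearing.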
Another factor that can help us decide which neighborhood would be the best option for opening a restaurant is the distribution of the population by ethnicity in each neighborhood. This helps us identify the neighborhoods that are densely populated with Indians, since such a neighborhood would be an ideal place to open an Indian restaurant.
I scraped the Wikipedia page “Demographics of Toronto” to obtain this data about Toronto and its neighborhoods. Of all the neighborhoods in Toronto, only the ones given below had a considerable Indian population. We examine the population of those neighborhoods to identify the ones most densely populated with Indians.
There were only six neighborhoods in Toronto across which the Indian population was spread, so we gather the population and its percentage for each riding in those neighborhoods.
The Foursquare API is a very useful online service used by many developers and by other applications such as Uber. In this project I have used it to retrieve information about the places in the neighborhoods of Toronto. The API returns a JSON response, which we need to turn into a dataframe. Here I’ve chosen 100 popular spots for each neighborhood within a radius of 1 km.
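Turning the JSON response into a dataframe could be sketched as below. This assumes the legacy v2 `venues/explore` endpoint and its `response.groups[0].items[*].venue` layout, which may no longer match the current Foursquare API. The helper names `explore_params` and `venues_to_frame`, the version date and the sample payload are hypothetical, and no network call is made here.

```python
import pandas as pd

# Assumed legacy endpoint; check the current Foursquare docs before using it.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_params(lat, lng, client_id, client_secret,
                   version="20180605", radius=1000, limit=100):
    """Build query parameters for one neighborhood (radius in meters)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",    # latitude,longitude of the neighborhood
        "radius": radius,
        "limit": limit,          # the service may cap this at a lower value
    }


def venues_to_frame(payload, neighborhood):
    """Flatten an explore-style JSON payload into one row per venue."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories", [])
        rows.append({
            "Neighborhood": neighborhood,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0]["name"] if categories else None,
        })
    return pd.DataFrame(rows, columns=[
        "Neighborhood", "Venue", "Venue Latitude",
        "Venue Longitude", "Venue Category",
    ])


# Hand-made payload with made-up venues, shaped like the assumed response.
# A live call would be: requests.get(EXPLORE_URL, params=...).json()
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Curry House",
               "location": {"lat": 43.7540, "lng": -79.3300},
               "categories": [{"name": "Indian Restaurant"}]}},
    {"venue": {"name": "Sample Park",
               "location": {"lat": 43.7520, "lng": -79.3310},
               "categories": []}},
]}]}}

venues = venues_to_frame(sample, "Parkwoods")
params = explore_params(43.7533, -79.3297, "ID", "SECRET")
print(venues)
```

Running `venues_to_frame` once per neighborhood and concatenating the results with `pd.concat` would give a single table of venues for the whole city.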
Folium is a Python library; I’m using it to draw an interactive Leaflet map from the coordinate data.