Exploring Toronto Neighborhoods to Open an Indian Restaurant

As part of the IBM Data Science professional program Capstone Project, we worked on real datasets to experience what a data scientist goes through in real life. The main objectives of this project were to define a business problem, look for data on the web, and use Foursquare location data to compare different neighborhoods of Toronto to figure out which neighborhood is suitable for starting a new restaurant business. In this project, we will go through the whole process step by step, from problem design and data preparation to the final analysis, and will finish with a conclusion that business stakeholders can use to make their decisions.

1. Description of the Business Problem & Discussion of the Background (Introduction Section):

Problem Statement: Prospects of opening an Indian restaurant in Toronto, Canada.

Toronto, the capital of the province of Ontario, is the most populous Canadian city. Its diversity is reflected in Toronto’s ethnic neighborhoods such as Chinatown, Corso Italia, Greektown, Kensington Market, Koreatown, Little India, Little Italy, Little Jamaica, Little Portugal and Roncesvalles. As one of the most immigrant-friendly cities in North America, with more than half of the entire Indian Canadian population residing there, Toronto is one of the best places to start an Indian restaurant.

In this project we go step by step through the process of deciding whether opening an Indian restaurant is a good idea. We analyze the neighborhoods in Toronto to identify the most profitable area, since the success of a restaurant depends on the people and the ambience. Since Toronto is home to more Indians than any other city in Canada, it is a promising place to start the restaurant, but we still need to check whether it would be profitable and, if so, where to place it so that it yields the most profit for the owner.


Target Audience

Who will be most interested in this project? What types of clients or groups of people would benefit?

  1. Business people who want to invest in or open an Indian restaurant in Toronto. This analysis will be a comprehensive guide to starting or expanding restaurants that target the Indian crowd.
  2. Freelancers who would love to have their own restaurant as a side business. This analysis will give an idea of how beneficial it is to open a restaurant and what the pros and cons of this business are.
  3. Members of the Indian community who want to find neighborhoods with lots of options for Indian restaurants.
  4. Business analysts or data scientists who wish to analyze the neighborhoods of Toronto using exploratory data analysis and other statistical and machine learning techniques to obtain the necessary data, perform some operations on it, and finally tell a story with it.

2. Data acquisition and cleaning:

2.1 Data Sources

a) I’m using the “List of postal codes of Canada: M” ( https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M ) wiki page to get information about the neighborhoods in Toronto. This page lists the postal code, borough and name of every neighborhood in Toronto.

b) Then I’m using the csv file at “https://cocl.us/Geospatial_data” to get the geographical coordinates of the neighborhoods.

c) To get information about the distribution of the population by ethnicity, I’m using the “Demographics of Toronto” ( https://en.m.wikipedia.org/wiki/Demographics_of_Toronto#Ethnic_diversity ) wiki page. Using this page, I’m going to identify the neighborhoods that are densely populated with Indians, as this might help in identifying a suitable neighborhood for a new Indian restaurant.

d) To get location and other information about various venues in Toronto, I’m using Foursquare’s explore API. Using the explore API (which gives venue recommendations), I’m fetching details about the venues present in Toronto and collecting their names, categories and locations (latitude and longitude).

From the Foursquare API ( https://developer.foursquare.com/docs ), I retrieved the following for each venue:

  • Name: The name of the venue.
  • Category: The category type as defined by the API.
  • Latitude: The latitude value of the venue.
  • Longitude: The longitude value of the venue.

2.2 Data Cleaning

a) Scraping Toronto Neighborhoods Table from Wikipedia

I scraped the Wikipedia page “List of postal codes of Canada: M” to obtain the data about Toronto and its neighborhoods.

Assumptions made to attain the below DataFrame:

  • The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood.
  • Only the cells that have an assigned borough will be processed. Rows whose borough is Not assigned are ignored.
  • More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated by a comma, as shown in row 11 of the above table.
  • If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
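
The assumptions above can be sketched in pandas roughly as follows. This is only a minimal sketch, not the exact notebook code: the function name `clean_postal_codes` is hypothetical, and the four input rows are illustrative stand-ins for the scraped table rather than the full data.

```python
import pandas as pd

# Illustrative rows standing in for the scraped Wikipedia table (not the full data).
raw = pd.DataFrame(
    {
        "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
        "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
    }
)


def clean_postal_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning assumptions to a PostalCode/Borough/Neighborhood table."""
    # Keep only the rows that have an assigned borough.
    df = df[df["Borough"] != "Not assigned"].copy()

    # A row with a borough but no assigned neighborhood reuses the borough name.
    missing = df["Neighborhood"] == "Not assigned"
    df.loc[missing, "Neighborhood"] = df.loc[missing, "Borough"]

    # Combine neighborhoods that share a postal code into one comma-separated row.
    return (
        df.groupby(["PostalCode", "Borough"], as_index=False)["Neighborhood"]
        .agg(", ".join)
    )


toronto_df = clean_postal_codes(raw)
print(toronto_df)
```

Grouping on both PostalCode and Borough keeps the borough column in the result while the neighborhood names are joined.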

The Wikipedia package is used to scrape the data from the wiki page.

Dataframe formed from the scraped wiki page

After some cleaning, we got a proper dataframe with the postal code, borough and neighborhood information.

Dataframe from the ‘List of postal codes of Canada: M’ Wikipedia table.

b) Adding geographical coordinates to the neighborhoods

The next important step is adding the geographical coordinates to these neighborhoods. To do so, I’m extracting the data from the Geospatial Data csv file and combining it with the existing neighborhood dataframe by merging the two on the postal code.

DataFrame with latitude & longitude of Postal codes in Toronto

I’m renaming the columns to match the existing dataframe formed from the ‘List of postal codes of Canada: M’ wiki page. After that, I’m merging the two dataframes into one on the postal code.

Merged new dataframe with info about Neighborhoods, borough, postalcode, latitude & longitude in Toronto
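
A minimal sketch of the rename-and-merge step is given here. It assumes the geospatial csv names its key column "Postal Code" (an assumption about the file, not something shown in this article), and the coordinates are rounded, illustrative values rather than the real file contents.

```python
import pandas as pd

# Cleaned neighborhood table (two illustrative rows).
neighborhoods = pd.DataFrame(
    {
        "PostalCode": ["M5A", "M7A"],
        "Borough": ["Downtown Toronto", "Queen's Park"],
        "Neighborhood": ["Harbourfront, Regent Park", "Queen's Park"],
    }
)

# Stand-in for the geospatial csv; column names and values are assumptions.
geo = pd.DataFrame(
    {
        "Postal Code": ["M7A", "M5A"],
        "Latitude": [43.662, 43.654],
        "Longitude": [-79.389, -79.361],
    }
)

# Rename the key column so that both dataframes share the same name.
geo = geo.rename(columns={"Postal Code": "PostalCode"})

# A left merge keeps every neighborhood row, even if a coordinate were missing.
merged = neighborhoods.merge(geo, on="PostalCode", how="left")
print(merged)
```

A left merge is used so that a postal code without coordinates would show up as a missing value instead of silently disappearing.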

c) Scrape the distribution of population from Wikipedia

Another factor that can help us decide which neighborhood would be the best option for opening a restaurant is the distribution of the population by ethnicity in each neighborhood. It helps us identify the neighborhoods that are densely populated with Indians, since such a neighborhood would be an ideal place to open an Indian restaurant.

I scraped the Wikipedia page “Demographics of Toronto” to obtain the ethnic breakdown of the population in Toronto’s neighborhoods. Of all the neighborhoods in Toronto, only the ones given below had a considerable Indian population. We examine the population of those neighborhoods to identify the ones most densely populated with Indians.

Scraping the wiki page

The Indian population was spread across only six neighborhoods in Toronto, so we gather the population and its percentage for each riding in those neighborhoods.

TORONTO & EAST YORK population distribution by ethnicity

NORTH YORK population distribution by ethnicity

SCARBOROUGH population distribution by ethnicity

ETOBICOKE & YORK population distribution by ethnicity

d) Get location data using Foursquare

The Foursquare API is a very useful online service used by many developers and by other applications such as Uber. In this project I have used it to retrieve information about the places in the neighborhoods of Toronto. The API returns a JSON response, which we need to turn into a dataframe. Here I’ve chosen 100 popular spots for each neighborhood within a radius of 1 km.

Dataframe with venues in each neighborhood along with the category info of the venues.
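
Turning the JSON into a dataframe can be sketched as below. This is a hedged sketch that makes no network call: the helper names `explore_params` and `venues_to_frame` are hypothetical, the nested layout (`response` → `groups` → `items` → `venue`) is assumed from the v2 explore endpoint, and the sample payload and venue names are made up for illustration.

```python
import pandas as pd


def explore_params(lat, lng, client_id="YOUR_ID", client_secret="YOUR_SECRET"):
    """Query parameters for one neighborhood: up to 100 venues within 1 km."""
    return {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": "20180605",                 # API version date (assumed value)
        "ll": f"{lat},{lng}",
        "radius": 1000,                  # metres
        "limit": 100,
    }


def venues_to_frame(neighborhood, payload):
    """Flatten one explore-style JSON payload into a dataframe of venues."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append(
            {
                "Neighborhood": neighborhood,
                "Venue": venue["name"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                # Use the first listed category, if there is one.
                "Venue Category": categories[0]["name"] if categories else None,
            }
        )
    return pd.DataFrame(rows)


# Made-up payload with the assumed shape; not real Foursquare data.
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Example Curry House",
                               "location": {"lat": 43.655, "lng": -79.362},
                               "categories": [{"name": "Indian Restaurant"}]}},
                    {"venue": {"name": "Example Cafe",
                               "location": {"lat": 43.653, "lng": -79.360},
                               "categories": []}},
                ]
            }
        ]
    }
}

venues_df = venues_to_frame("Harbourfront, Regent Park", sample_payload)
print(venues_df)
```

In a real run, the payload would come from a request made with `explore_params` for each neighborhood, and the per-neighborhood frames would be concatenated into one.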

3. Exploratory Data Analysis:

3.1 Folium Library and Leaflet Map

Folium is a Python library; I’m using it to draw an interactive Leaflet map from the coordinate data.

code to draw the folium map