Preliminary Study to Open a New Chinese Restaurant in West Java using KMeans Clustering Analysis

With culinary businesses booming, it is important to conduct a preliminary study using data analytics before opening a new business in a given area

Mohamad Nur Syahril Kaharu
Feb 17, 2021

West Java, the most populous province and one of the biggest economies in Indonesia, has a lot of potential for starting a new business, especially a culinary one. Restaurants are scattered across the province, and many of them are local restaurants serving Indonesian or Sundanese food. There are also many Chinese restaurants, but far fewer than local ones. This project aims to help people, particularly new business owners, decide where to open a new Chinese restaurant in West Java, based on how existing Chinese restaurants are distributed across the province. The analysis rests on a hypothetical, simplified assumption: it leaves out other variables such as the economic outlook of each city (inflation rate, unemployment, etc.) and local market behaviour, which are also important. Nonetheless, the recommendation is still a useful input to the business decision.

1. Data Collection

The dataset comes from https://simplemaps.com/data/id-cities. It consists of 8,912 prominent cities in Indonesia, with their province, longitude and latitude, and other relevant information. The data is then filtered to keep only the cities in West Java (‘Jawa Barat’ in the dataset). The final data used in this project is the list of cities in West Java with their longitude and latitude. Because of the limit on the API calls we use (explained later), the city list is intentionally limited to the top 300 cities in West Java by population. The map below shows West Java and its top 300 cities (represented as red circles).

Map of West Java and its top 300 cities

2. Foursquare API

The Foursquare API is used to find nearby Chinese restaurants through its ‘explore’ call. It is one of the most powerful location-service APIs. It lets us find venues within an area of interest, including Chinese restaurants, as long as geospatial information such as longitude and latitude is provided.

3. Methodology

Tools used in this project:

  1. Indonesia Cities Geospatial data (https://simplemaps.com/data/id-cities)
  2. Foursquare API
  3. Folium Map
  4. KMeans Clustering

The aim is to find the Chinese restaurant venues around each city. The Foursquare ‘explore’ call returns the venues around an area of interest, as long as its longitude and latitude are provided. The longitude and latitude come from the Indonesia Cities Geospatial dataset, downloaded as a csv file from https://simplemaps.com/data/id-cities. The cities are then clustered with the KMeans method based on the number of Chinese restaurants in each city. Finally, a Folium map is used to visualize the clustering result on an actual map.

Data Collection

In this stage, the data is loaded from the csv file ‘id_cities.csv’, downloaded from https://simplemaps.com/data/id-cities. The data consists of 8,912 prominent cities in Indonesia, with their province, longitude and latitude, and other relevant information. The Pandas library is used to read the file with the pd.read_csv() function.

# get the dataframe from csv file
import pandas as pd

indo = pd.read_csv('id_cities.csv')
indo.head()

Data Preprocessing

The dataset contains 8,912 rows. Since the area of interest is West Java (‘Jawa Barat’), the rows for other provinces are dropped.

wj1 = indo[indo['admin_name']=='Jawa Barat'].reset_index(drop=True)
wj1.head()

Feature Selection

Only the ‘city’, ‘lat’, ‘lng’, and ‘admin_name’ features are needed. The other features can be dropped with the drop() method from the Pandas library. The ‘admin_name’ feature can also be renamed to ‘province’ with the rename() method, also from Pandas.

wj2 = wj1.drop(['iso2', 'capital', 'population', 'population_proper', 'country'], axis=1).rename({'admin_name': 'province'}, axis=1)
wj2.head(10)

Feature Engineering

There are 1,658 cities in West Java in the dataset. Because of the limit on Foursquare API calls (only 950 regular calls per day), the city list is intentionally limited to the top 300 cities in West Java by population.

wj = wj2.head(300)
wj
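Note that head(300) simply keeps the first 300 rows, so it returns the most populous cities only if the file already lists them in descending population order. A minimal sketch of sorting explicitly instead is shown here, assuming the ‘population’ column is still available at this point (it would have to be kept until after the selection). The toy rows and the helper name top_n_by_population are made up for illustration.

```python
import pandas as pd

# Toy rows standing in for the West Java subset; all values are made up.
cities = pd.DataFrame({
    "city": ["City A", "City B", "City C", "City D"],
    "lat": [-6.9, -6.6, -6.4, -7.3],
    "lng": [107.6, 106.8, 106.8, 108.2],
    "population": [2500, 900, 1800, 700],
})


def top_n_by_population(df, n):
    """Return the n most populous rows without assuming the file is pre-sorted."""
    return df.nlargest(n, "population").reset_index(drop=True)


top2 = top_n_by_population(cities, 2)
print(top2["city"].tolist())
```

With the real data, the call would be top_n_by_population(wj1, 300), followed by the column drop and rename.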

Foursquare API Calls

The ‘explore’ call is used to find venues within an area of interest, including Chinese restaurants. A function is written to process every city on the list. The ‘Venue Category’ of each venue is then extracted so that categories can be counted for each city.

Foursquare API settings:

# Foursquare API settings
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20210213' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)
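As an optional alternative to pasting the credentials into the notebook and printing them, they could be read from environment variables. This is only a sketch: the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET and both helper functions are hypothetical and not part of the original project.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID", "")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET", "")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set")
    return client_id, client_secret


def mask(secret, visible=4):
    """Hide all but the last few characters of a secret before printing it."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]


# Example with a plain dict standing in for the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print("CLIENT_SECRET: " + mask(CLIENT_SECRET))
```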

Function to make the API calls:

# Function to get all the venue categories
import pandas as pd
import requests

def getNearbyVenues(names, latitudes, longitudes, radius=10000):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius
        )

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-city lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                             'Neighbourhood Latitude',
                             'Neighbourhood Longitude',
                             'Venue',
                             'Venue Category']

    return nearby_venues
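The function above indexes straight into the response, so a city with no results or a venue without a category would raise an error. The sketch below shows one way the parsing step could be made more defensive. The helper name parse_explore_items is hypothetical, and the payload is hand-written to mimic only the fields the function above reads, not a real Foursquare response.

```python
def parse_explore_items(name, lat, lng, payload):
    """Flatten one 'explore' payload into (city, lat, lng, venue, category) rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # A venue without a category gets None instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"], category))
    return rows


# Hand-written stand-in for a response; names and coordinates are placeholders.
fake_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Venue A",
                           "categories": [{"name": "Chinese Restaurant"}]}},
                {"venue": {"name": "Venue B", "categories": []}},
            ]
        }]
    }
}

rows = parse_explore_items("Example City", 0.0, 0.0, fake_payload)
print(rows)
```

Each request could then call this helper on requests.get(url).json() and extend a single list of rows, rather than indexing the response inline.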

One Hot Encoding

One-hot encoding converts categorical data into numeric data. In this project it is used to calculate the weight of each ‘Venue Category’ in each city, i.e. how often a given category appears there. After converting ‘Venue Category’ to indicator columns with the pandas get_dummies function, grouping the rows by city and taking the mean with pandas gives these weights. The dataframe can then be filtered down to the ‘Chinese Restaurant’ column, which is the focus of this project.

# one hot encoding
wj_onehot = pd.get_dummies(wj_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
wj_onehot['Neighbourhood'] = wj_venues['Neighbourhood']
# move neighborhood column to the first column
fixed_columns = [wj_onehot.columns[-1]] + list(wj_onehot.columns[:-1])
wj_onehot = wj_onehot[fixed_columns]
print(wj_onehot.shape)
wj_onehot.head()
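The grouping and filtering steps described above are not shown in the code. The sketch below is one plausible way to do them, on a small made-up dataframe; the resulting frame plays the role of the wj_asian dataframe used in the clustering step, although how that frame was actually built is an assumption.

```python
import pandas as pd

# Made-up venues for two cities, using the same column names as wj_venues.
wj_venues_demo = pd.DataFrame({
    "Neighbourhood": ["City A", "City A", "City A", "City A", "City B", "City B"],
    "Venue Category": ["Chinese Restaurant", "Café", "Chinese Restaurant",
                       "Hotel", "Café", "Hotel"],
})

# One indicator column per category, stored as floats so the mean is a share.
onehot = pd.get_dummies(wj_venues_demo["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", wj_venues_demo["Neighbourhood"])

# The mean of an indicator column per city is the share of venues in that category.
grouped = onehot.groupby("Neighbourhood").mean().reset_index()

# Keep only the column of interest.
chinese = grouped[["Neighbourhood", "Chinese Restaurant"]]
print(chinese)
```

In this toy example, half of City A's venues are Chinese restaurants and none of City B's are, so the two weights are 0.5 and 0.0.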

KMeans Clustering

This is the most important part, since the aim is to cluster the cities based on the distribution of Chinese restaurants. We group them into 3 clusters to see the pattern.

# import k-means for the clustering stage
from sklearn.cluster import KMeans
# set number of clusters
clusters = 3
wj_clustering = wj_asian.drop(columns=['Neighbourhood'])
# run k-means clustering
kmeans = KMeans(n_clusters=clusters, random_state=0).fit(wj_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
4. Result and Discussion

This is the clustering result visualized on the map:

Clustering Result on Map

The result above shows three clusters of cities in West Java, based on the distribution of Chinese restaurants in each city.

Cluster 0 (red circles) contains the cities with the fewest (or even zero) Chinese restaurants. Most of these cities are in suburban areas (except for Bandung). These cities are recommended for opening a new Chinese restaurant, although other variables such as market behaviour and the city's economic outlook also need to be considered and are outside the scope of this project.

Cluster 2 (green circles) contains the cities in the middle in terms of the number of Chinese restaurants. This cluster mostly lies around big cities such as Depok, Bogor, and Karawang. The recommendation for these cities depends heavily on the other variables. But because the cluster consists mostly of big cities, we still encourage people to start a Chinese restaurant business there, since the market in big cities is also large.

Cluster 1 (purple circle) contains just one city, around Cibadak. It is not recommended for opening a Chinese restaurant because there are many competitors there. This recommendation is based only on the number of competitors; many other variables should be considered, such as the city's economic outlook (inflation, unemployment, etc.) and market behaviour. Nonetheless, it can be one of the main considerations when starting a new Chinese restaurant business in West Java: we already know that there are many Chinese restaurants around Cibadak, so we do not recommend it as a location.

Limitations and Suggestions for Future Research

In this project, the question of which city is a good place to open a new Chinese restaurant in West Java is answered using only one variable: the number of Chinese restaurants in the city. In reality, many variables matter when opening a business in a given location. Among the most important are the city's economic outlook (such as GDP, inflation, and unemployment) and market behaviour. Even so, this recommendation can be one of the main considerations when starting a new Chinese restaurant business in West Java. Future research should include these other variables to obtain a better result. Another limitation is that the geospatial dataset is available only per city, not per postal code. The analysis therefore cannot be done within a single city; it has to be done at a wider level, such as a province or state.

5. Conclusion

As the results show, we recommend the cities in cluster 0 as some of the best locations to open a new Chinese restaurant. The decision should also take into account other variables, such as the city's economic outlook (GDP, inflation, and unemployment) and the market behaviour of its residents. Some of the cities in cluster 2 (around big cities) are also recommended. Cibadak is the only city in cluster 1 and is not recommended for opening a Chinese restaurant because there are many competitors there.
