In this article, we will analyze the Canada Museum dataset from Kaggle, which provides information about museums located in Canada. This list of museums could be useful for predicting the price of Airbnb accommodation in Toronto. For example, you may have noticed that a listing's rent tends to increase the closer it is to points of interest such as museums, restaurants, and cafes.
Let’s take a look at the dataset:
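```python
import pandas as pd
import geopy

# Keep only the columns that describe each museum's name and address
l_cols = ['Name', 'Street Address', 'City', 'State', 'Zipcode']

df = pd.read_csv(
    '/kaggle/input/canada-museums/museums list CAN.csv',
    encoding="ISO-8859-1",
    usecols=l_cols,
)

# Keep only the museums in Toronto; .copy() avoids a SettingWithCopyWarning
# when new columns are added to the filtered frame later on
df = df[df.City == 'Toronto'].copy()
df.head()
```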
From this overview, we can see that the address information is split across several columns: the name of the museum, the street address, the city, the state, and the zip code.
For the next steps, we need a single column that merges the information from all of these columns. How can we do it? A convenient way to concatenate more than two columns is pandas.Series.str.cat(), which lets us specify the separator placed between one column and the next:
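```python
# Add a constant country column so it becomes part of the full address
df['Country'] = 'Canada'
l_cols_concat = ['Street Address', 'City', 'State', 'Zipcode', 'Country']

# Join the name and the address fields with commas; missing values become empty strings
df['unique_address'] = df['Name'].str.cat(others=df[l_cols_concat], sep=',', na_rep='')
df.head()
```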
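To see what the na_rep argument does, here is a small sketch on a hypothetical toy DataFrame (the rows are invented for illustration, not taken from the Kaggle file). With na_rep='' a missing zip code simply leaves an empty field between two commas; without it, pandas returns a missing value for the whole row, which would silently remove that museum from any later geocoding step.

```python
import pandas as pd

# Hypothetical toy rows mimicking the museum columns; the second row has no zip code
toy = pd.DataFrame({
    "Name": ["Museum A", "Museum B"],
    "Street Address": ["1 King St W", "2 Queen St E"],
    "City": ["Toronto", "Toronto"],
    "State": ["ON", "ON"],
    "Zipcode": ["M5H 1A1", None],
    "Country": ["Canada", "Canada"],
})

cols = ["Street Address", "City", "State", "Zipcode", "Country"]

# With na_rep='' the missing zip code becomes an empty field in the joined string
with_na_rep = toy["Name"].str.cat(others=toy[cols], sep=",", na_rep="")

# Without na_rep, a missing value in any column makes the whole row missing
without_na_rep = toy["Name"].str.cat(others=toy[cols], sep=",")

print(with_na_rep.tolist())
print(without_na_rep.tolist())
```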
We can create a string variable ‘address1’ that contains the unique address of the first row:
```python
address1 = df['unique_address'].iloc[0]
print(address1)
# Bizune Event Gallery,452 Richmond St W,Toronto,ON,M5V 1Y1,Canada
```
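One detail worth noting: after filtering on City, the DataFrame keeps the original index labels of the surviving rows, so the first Toronto museum is generally not at label 0. That is why positional .iloc[0] is the safer choice here. The sketch below uses a small hypothetical DataFrame (invented rows, not the real dataset) to illustrate the difference.

```python
import pandas as pd

# Hypothetical rows; filtering keeps the original index labels of the rows that survive
sample = pd.DataFrame({
    "City": ["Montreal", "Toronto", "Toronto"],
    "unique_address": ["A,Montreal", "B,Toronto", "C,Toronto"],
})
toronto = sample[sample.City == "Toronto"]
print(toronto.index.tolist())  # labels 1 and 2, not 0 and 1

# .iloc is positional, so [0] always means "first row of the filtered frame"
first = toronto["unique_address"].iloc[0]
print(first)

# Label-based access with 0 fails, because label 0 was filtered out
try:
    toronto["unique_address"].loc[0]
except KeyError:
    print("label 0 is not in the filtered index")
```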
We will use it in the next steps to experiment with different geocoding services on a single address. Once it's clear how Geopy extracts the location, the same approach can be extended to an entire column of a pandas DataFrame. We will try the following major providers:
- Google Maps
- OpenStreetMap
- ArcGIS
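Before trying each provider, it may help to see the general pattern for extending a single-address lookup to a whole column. The sketch below assumes only that the geocoding function returns an object with latitude and longitude attributes, or None when nothing is found, which matches how the geocode() method of geopy geocoders behaves. The helper names (to_lat_lon, geocode_column) and the stub geocoder are hypothetical, chosen so the example runs without network access or API keys.

```python
from collections import namedtuple
from typing import Callable, Optional, Tuple

import pandas as pd


def to_lat_lon(location) -> Tuple[Optional[float], Optional[float]]:
    """Return (latitude, longitude) from a geopy-style Location, or (None, None)."""
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


def geocode_column(addresses: pd.Series, geocode: Callable) -> pd.DataFrame:
    """Apply a geocode callable to every address and split the result into two columns."""
    coords = addresses.apply(lambda a: to_lat_lon(geocode(a)))
    return pd.DataFrame(coords.tolist(), index=addresses.index,
                        columns=["latitude", "longitude"])


# Hypothetical stub standing in for a real geocoder: it only "knows" one address
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
_fake_db = {"Somewhere,Toronto": FakeLocation(43.65, -79.38)}


def fake_geocode(address):
    return _fake_db.get(address)


addresses = pd.Series(["Somewhere,Toronto", "Nowhere,Toronto"], index=[5, 9])
result = geocode_column(addresses, fake_geocode)
print(result)
```

With geopy installed, the stub could be replaced by a real geocoder's geocode method, for example Nominatim(user_agent="my-app").geocode for OpenStreetMap or ArcGIS().geocode (the user_agent string is a placeholder), while Google Maps through GoogleV3 requires an API key. When geocoding a whole column against Nominatim, wrapping the call in geopy.extra.rate_limiter.RateLimiter(..., min_delay_seconds=1) helps stay within its usage policy of at most one request per second.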