Calculating travel time and distance using Google Maps API in R.
Let’s say you needed to visit a location and wanted to know how far it was and how long it would take. Chances are you’d fire up your phone or computer, head over to Google Maps, punch in the locations and be on your merry way. But what happens if you need to visit dozens or hundreds or, dare I say, thousands of locations? Enter the Google Maps API and the gmapsdistance package in R.
To do this, we’re going to write code that asks the Google API for the distance and travel time from one location to another and stores the results in a data frame. The data that we’ll use is a list of IDNYC locations from NYC Open Data, because the dataset already includes latitude and longitude.
Warning, shameless self-promotion ahead. If you want to learn more about using APIs in R or NYC Open Data, check out my post “Using R to access 311 service request from NYC Open Data using Socrata Open Data API, and the RSocrata package”.
Step 1 — Set up a Google API Account and get your API key.
You’ll need a Google account and some form of payment. Google gives you $200 worth of API credit each month. I’ve been dabbling with this API for a few days now and I’ve spent a whopping $0, and Google has tons of ways to monitor your usage. See more on their pricing here.
To get started, head over to the Google Maps API page and click “Get Started”. You’ll be prompted to set up your account:
Step 2 — Set up a Google API Project
After your profile is complete, you’ll be taken to your “API Dashboard” home page. Click on “Enable APIs and Services”.
You’ll be presented with the various APIs Google has to offer.
Search for “Distance Matrix API” and go ahead and “Enable” it.
Once it’s enabled, head back to your API dashboard and click on the “Credentials” link in the navigation pane on the left side.
Here you’ll see the Distance Matrix API that was just enabled as well as the API key. We’ll need the API key later!
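Rather than pasting the key directly into a script, one option is to keep it in an environment variable and read it at run time. The sketch below assumes a variable named GOOGLE_MAPS_API_KEY (a name picked for this example, which you could set in your .Renviron file). The read_api_key helper is hypothetical and not part of any package.

```r
# Hypothetical helper: read the API key from an environment variable.
# The variable name GOOGLE_MAPS_API_KEY is an assumption for this example.
read_api_key <- function(var = "GOOGLE_MAPS_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable ", var)
  }
  key
}

# Example with a placeholder value (not a real key)
Sys.setenv(GOOGLE_MAPS_API_KEY = "YOUR API CODE HERE")
api_key <- read_api_key()
```

The value in api_key could then be passed to the key argument in Step 7, which keeps the key out of any script you share.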
Step 3 — Get the IDNYC Locations data from NYC Open Data
Head over to the IDNYC Locations page on NYC Open Data, download the CSV dataset, and save it in your R project folder.
Step 4 — Fire up R and install the gmapsdistance package and load the data.
We’re going to use the gmapsdistance package to get the information that we need, so go ahead and read the documentation. We’ll also use the stringr and dplyr packages to clean the data. Install all three:
install.packages(c("gmapsdistance", "stringr", "dplyr"))
Then load the packages and the data:
library(gmapsdistance)
library(stringr)
library(dplyr)
IDNYC <- read.csv("IDNYC_Locations.csv")
Step 5 — Clean and prep the data
Taking a look at the dataset, each row is a different IDNYC location. The data that we’re most interested in are the coordinates (latitude and longitude). After taking a look at the data, you’ll see that NYC Open Data gives them to us in a single column named “Location.1”:
This is pretty handy because the gmapsdistance package needs the coordinate data in the following format:
latitude+longitude which looks like this: 40.589779+-73.79928
So we need to clean the data up a bit. We need to:
Remove the opening and closing parentheses
#remove the "(" from the values
IDNYC$Location.1 <- str_remove(IDNYC$Location.1, "[(]")
#remove the ")" from the values
IDNYC$Location.1 <- str_remove(IDNYC$Location.1, "[)]")
Replace the comma with a “+” symbol
#replace the "," with a "+"
IDNYC$Location.1 <- str_replace(IDNYC$Location.1, ",", "+")
Remove the blank space between the “+” and the longitude value
#remove the blank space
IDNYC$Location.1 <- str_replace_all(IDNYC$Location.1, fixed(" "), "")
This changes the coordinates from:
(40.589779, -73.79928) to 40.589779+-73.79928
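The cleaning steps can also be folded into one small helper. This is a minimal sketch that uses base R’s gsub and sub rather than stringr; the function name clean_coords is made up for this example.

```r
# Hypothetical helper: turn "(lat, lon)" into "lat+lon"
clean_coords <- function(x) {
  x <- gsub("[[:space:]()]", "", x)  # drop whitespace and parentheses
  sub(",", "+", x, fixed = TRUE)     # swap the first comma for a plus sign
}

clean_coords(c("(40.589779, -73.79928)", "(40.764684, -73.973144)"))
```

Wrapping the steps in a function makes it easy to reuse the same cleanup on another coordinate column.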
Step 6 — Create a column that has your starting coordinates.
We’ll need a starting point to calculate our travels. For this exercise, I’m going to pick one of my favorite places in the city: Central Park, at the corner of 59th Street and 5th Avenue (40.764684, -73.973144). To do so, we use the following code:
#create a column with our starting points
IDNYC$start <- "40.764684+-73.973144"
Now our data frame has the starting location column and the destination column.
Step 7 — Make the Google Maps API request and save the results as a data frame.
We’ll start by creating two values, an origin and a destination:
#create origin and destinations
origin = IDNYC$start
destination = IDNYC$Location.1
Notice, the origin is the Central Park location we created in the previous step and the destination is the different IDNYC locations.
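Since each request counts against your API credit, it can be worth checking the inputs before sending anything. The sketch below is one way to do that, using toy vectors in place of IDNYC$start and IDNYC$Location.1; the regular expression is an assumption about what a cleaned “latitude+longitude” value looks like.

```r
# Toy vectors standing in for IDNYC$start and IDNYC$Location.1
origin <- c("40.764684+-73.973144", "40.764684+-73.973144")
destination <- c("40.589779+-73.79928", "(40.7, -73.9)")  # second value left uncleaned on purpose

# Pattern for "latitude+longitude": optional minus sign, digits, optional decimals
coord_pattern <- "^-?[0-9]+(\\.[0-9]+)?\\+-?[0-9]+(\\.[0-9]+)?$"

# Pairing row by row only makes sense with one origin per destination
stopifnot(length(origin) == length(destination))

# Row numbers of destinations that do not match the expected format
bad_rows <- which(!grepl(coord_pattern, destination))
bad_rows
```

Any row numbers returned in bad_rows point to values that still need cleaning.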
Now we’re going to use the gmapsdistance function nested in the as.data.frame function.
#create a data frame with travel time and distance
IDNYC_results = as.data.frame(gmapsdistance(
  origin = origin,
  destination = destination,
  mode = "driving",
  combinations = "pairwise",
  traffic_model = "optimistic",
  dep_date = "2020-11-16",
  dep_time = "09:00:00",
  key = "YOUR API CODE HERE"))
A few things to note:
origin = This is where you want to begin your travels.
destination = This is where you want to end.
mode = This is how you are going to get there (car, walking, bicycle, etc.).
combinations = We’re doing pairwise because we’re going to “pair” the origin and destination within each row.
traffic_model = This package is very neat: it gives you the option to estimate how the traffic is going to be.
dep_date = The date you will be doing your travels.
dep_time = The time you will be doing your travels.
key = Inside the quotation marks you put your API key.
Note: dep_date and dep_time must be in the future. So if you’re following along, change the date and time OR take those lines out and the package will default to now.
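One way to avoid hard-coding a date that will eventually fall in the past is to compute it relative to today. This is a small sketch; the one-week offset is an arbitrary choice for illustration.

```r
# Build a departure date one week from today in YYYY-MM-DD form
dep_date <- format(Sys.Date() + 7, "%Y-%m-%d")

# Departure time in HH:MM:SS form
dep_time <- "09:00:00"

dep_date
```

These two values could be passed to the dep_date and dep_time arguments in place of the fixed strings.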
To read more about the various functions, I highly suggest you review the documentation.
After you run this code, you should have a new data frame named IDNYC_results.
The two most noteworthy columns are Time.Time, which is the amount of time it takes to get from the origin to the destination in seconds:
and Distance.Distance, which is the total distance from the origin to the destination in meters:
Step 8 — Clean the results data frame and merge with the original dataset.
You’ll see that the results data frame that we created doesn’t have any of the information from the original dataset (it would be nice to know which location we’re going to).
So using the dplyr package:
library(dplyr)
We’re going to use the select function to keep the Time.de, Time.Time, and Distance.Distance columns, the mutate function to convert time from seconds into minutes and distance from meters into miles, and lastly the rename function to change the name of the Time.de column to Location.1 to match the column name in the original dataset.
IDNYC_results2 <- IDNYC_results %>%
  select(Time.de, Time.Time, Distance.Distance) %>%
  mutate(time_mins = Time.Time / 60,
         distance_miles = Distance.Distance * 0.000621371) %>%
  rename(Location.1 = Time.de)
Note: Time.de holds the destination coordinates. We’re going to use it to match back to the original data frame.
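To sanity-check the conversion factors, here is a tiny worked example on made-up numbers (not real API output), written with base R so it runs on its own. A trip of 1800 seconds should come out as 30 minutes, and 16093.44 meters should come out very close to 10 miles.

```r
# Made-up values for illustration only
toy <- data.frame(Time.Time = c(1800, 600),
                  Distance.Distance = c(16093.44, 1609.344))

# Same conversion factors as in the dplyr code, applied with base R
toy$time_mins <- toy$Time.Time / 60
toy$distance_miles <- toy$Distance.Distance * 0.000621371

toy
```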
Step 9 — Merge the results with our original dataset.
Now we’re going to use the merge function to join the results to the original data:
IDNYC_final <- merge(IDNYC, IDNYC_results2, by = "Location.1")
Boom! We now have travel distance and time all in one dataset.
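If you want to see what merge does with rows that have no match, the sketch below runs it on two toy data frames. All names and values here are made up and simply stand in for IDNYC and IDNYC_results2.

```r
# Toy stand-ins for IDNYC and IDNYC_results2 (all values are made up)
locations <- data.frame(
  Name = c("Site A", "Site B", "Site C"),
  Location.1 = c("40.70+-73.90", "40.60+-73.80", "40.80+-73.95"),
  stringsAsFactors = FALSE
)
results <- data.frame(
  Location.1 = c("40.60+-73.80", "40.70+-73.90"),
  time_mins = c(45, 25),
  stringsAsFactors = FALSE
)

# By default merge keeps only rows whose Location.1 appears in both data frames
inner <- merge(locations, results, by = "Location.1")

# all.x = TRUE keeps every original location and fills missing results with NA
left <- merge(locations, results, by = "Location.1", all.x = TRUE)

# Sort the matched rows from shortest to longest trip
inner[order(inner$time_mins), ]
```

Using all.x = TRUE is handy if some requests failed and you still want every location to show up in the final table.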
I hope you found this useful. Leave a comment if you have any questions. Thanks for reading!