Introduction To Web Scraping and Data Cleaning

There are many instances where a web page contains a list of useful items or a table that could enhance our analysis or even serve as the data for a project. Most often, copying and pasting from the web page does not work well and can take hours to complete. This is the situation we faced when trying to collect a list of organizations for one of our clients’ projects.

We are working with Dave Grace, who is the Director of Christian education at a prominent church in Washington DC, in order to create a program to assess the energy efficiency of churches in DC, Maryland and Virginia. In the process of planning this program, one task was to get a list of churches in these areas from the National Capital Presbytery website. The site has the list of churches in the following format:

The churches are listed alphabetically on the website as links, with the address and phone number as text below each link. In this post we describe the steps for building a routine to scrape such data from a website.

Calling libraries for web scraping

The first thing we do is load the packages required for web scraping in R. We use the rvest package for scraping and stringr to clean the data.

library(rvest)
library(stringr)

Accessing relevant data

We then define the url containing the information we need, and the read_html() function reads in the url and returns the page as an XML document.

First we want to retrieve the church names from the page. To do this we need to pull out the tags (also called elements or nodes) that contain the church name in the XML. The function html_nodes() does this; it takes a CSS selector, which can be found with a tool called SelectorGadget. It is easily installed as an extension for the Chrome browser. To use it, we go to the url, click on the SelectorGadget icon and then right click on the element that we want to pull out. The box at the bottom right of the browser shows the selector. Right clicking again deselects the element. We find that the selector for the church names is "p :nth-child(1)", which is passed into the function as html_nodes("p :nth-child(1)").

Once the nodes/elements are accessed using the html_nodes() function, the actual content is retrieved using html_text().

# url of the page listing the churches
url <- ""
church_names_temp <- url %>%
  read_html() %>%
  html_nodes("p :nth-child(1)") %>%
  html_text()

Now we move on to scrape the actual contact information for each church. The contacts paragraph is selected using SelectorGadget again, which gives us the selector ".church-info p".

church_info_temp <- url %>%
  read_html() %>%
  html_nodes(".church-info p") %>%
  html_text()
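
As a minimal sketch of how html_nodes() and html_text() work together, the example below runs the same two selectors on a small inline HTML snippet instead of the live page. The markup, church name, address and phone number are all made up for illustration; the structure of the real page may differ.

```r
library(rvest)

# Hypothetical markup, invented for this sketch (not copied from the real site)
demo_page <- read_html('
<div>
  <p><a href="#">Example Church A</a></p>
  <div class="church-info">
    <p>1 Main Street, Sometown, VA 20000
Phone: 703-555-0100</p>
  </div>
</div>')

# Elements that are the first child of their parent, somewhere inside a <p>
demo_names <- demo_page %>%
  html_nodes("p :nth-child(1)") %>%
  html_text()

# Every <p> nested inside an element with class "church-info"
demo_info <- demo_page %>%
  html_nodes(".church-info p") %>%
  html_text()

print(demo_names)
print(demo_info)
```

Testing selectors on a small snippet like this is a quick way to check what a selector matches before running it against a full page.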

Data Cleaning

This part is specific to the kind of text your scraping retrieves. We had blank rows and duplicated rows for each church, both of which need to be removed.

# Remove blanks
church_names <- church_names_temp[church_names_temp != ""]

# We notice all names are duplicated 
names_table <- table(church_names)

# A couple of Churches have the same name (seen as 4 duplicates) which we do not want to remove, save these
dont_remove <- names(names_table)[names_table == 4]

church_names <- church_names[!duplicated(church_names)]
church_names <- sort(c(church_names, dont_remove))
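
The de-duplication logic above can be traced on a toy vector. This is only a sketch with invented names: "A Church" appears twice (one real church, duplicated), while "B Church" appears four times (two real churches sharing a name, each duplicated).

```r
# Made-up input: blanks plus duplicated names
toy_names <- c("A Church", "A Church", "",
               "B Church", "B Church", "B Church", "B Church", "")

# Remove blanks
toy_names <- toy_names[toy_names != ""]

# Count how often each name occurs
toy_counts <- table(toy_names)

# Names seen 4 times stand for two distinct churches; remember them
toy_keep <- names(toy_counts)[toy_counts == 4]

# Drop all repeats, then add back one extra copy of the shared names
toy_clean <- toy_names[!duplicated(toy_names)]
toy_clean <- sort(c(toy_clean, toy_keep))

print(toy_clean)
```

Note that this approach relies on every name being listed exactly twice on the page; a different duplication pattern would need a different rule.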

Splitting the paragraphs of content

Regular expressions are useful for pulling out the appropriate parts from messy text containing escape sequences. In this case, we needed to clean up the text and extract the phone numbers, addresses, states and zip codes into separate columns.
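
To illustrate the idea on a small scale, here is a sketch that applies the same kind of capture-group pattern to two made-up contact strings using only base R. The addresses and numbers are invented. It also shows why a length check is needed afterwards: when the pattern does not match, gsub() returns the input string unchanged.

```r
# Made-up contact strings; the second one has no fax number
toy_info <- c(
  "1 Main Street, Sometown, VA 20000 Phone: 703.555.0100 Fax:703-555-0101",
  "2 Side Street, Othertown, MD 20001 Phone: 301-555-0102"
)

# Keep only the captured number, then normalize "." separators to "-"
toy_phone <- gsub("\\.", "-",
                  gsub(".*Phone: ([0-9]+[.-][0-9]+[.-][0-9]+).*", "\\1", toy_info))

toy_fax <- gsub("\\.", "-",
                gsub(".*Fax:([0-9]+[.-][0-9]+[.-][0-9]+).*", "\\1", toy_info))

# An unmatched string comes back whole, so anything longer than
# a 12-character number (e.g. 703-555-0101) is treated as missing
toy_fax[nchar(toy_fax) > 12] <- NA

print(toy_phone)
print(toy_fax)
```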

# split each entry on newlines
church_info_split <- strsplit(church_info_temp, "\n")

# some phone numbers had ".", some had "-" separators; normalize to "-"
phone_numbers <- sapply(church_info_split, function(x) {
  gsub("\\.", "-", gsub(".*Phone: ([0-9]+[.-][0-9]+[.-][0-9]+).*", "\\1", paste(x, collapse = " ")))
})

fax_numbers <- sapply(church_info_split, function(x) {
  gsub("\\.", "-", gsub(".*Fax:([0-9]+[.-][0-9]+[.-][0-9]+).*", "\\1", paste(x, collapse = " ")))
})
# entries without a fax number come back as the full unmatched string; set those to NA
fax_numbers[nchar(fax_numbers) > 12] <- NA

# everything before "Phone:" is the address
addresses <- sapply(church_info_split, function(x) {
  gsub("  ", " ", str_trim(gsub("\t", "", gsub("(.*)Phone:.*", "\\1", paste(x, collapse = " ")))))
})

addresses_split <- strsplit(addresses, ", ")
state_zip <- sapply(addresses_split, function(x) { x[length(x)] })
state_zip_split <- strsplit(state_zip, " ")

final_zips <- sapply(state_zip_split, `[`, 2)
final_states <- sapply(state_zip_split, `[`, 1)

city <- sapply(addresses_split, function(x) { x[length(x) - 1] })

addresses_temp <- sapply(addresses_split, function(x) {
  return(str_trim(paste(x[1:(length(x) - 2)], collapse = " ")))
})
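
The address splitting can be followed step by step on a single made-up address. This sketch uses base R's trimws() in place of str_trim() so that it runs without stringr; the address itself is invented.

```r
# Made-up address with a suite, so the street part spans two pieces
toy_address <- "1 Main Street, Suite 2, Sometown, VA 20000"

# Split on ", " into street piece(s), city, and "STATE ZIP"
toy_parts <- strsplit(toy_address, ", ")

# Last piece holds state and zip, separated by a space
toy_state_zip <- sapply(toy_parts, function(x) x[length(x)])
toy_state_zip_split <- strsplit(toy_state_zip, " ")
toy_state <- sapply(toy_state_zip_split, `[`, 1)
toy_zip <- sapply(toy_state_zip_split, `[`, 2)

# Second-to-last piece is the city
toy_city <- sapply(toy_parts, function(x) x[length(x) - 1])

# Everything before the city is the street address
toy_street <- sapply(toy_parts, function(x) {
  trimws(paste(x[1:(length(x) - 2)], collapse = " "))
})

print(c(toy_street, toy_city, toy_state, toy_zip))
```

The approach assumes every address has at least three comma-separated pieces (street, city, state and zip); entries with fewer pieces would need separate handling.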

Final Dataset!

The individual vectors of data are finally merged as columns in a data frame. This can be written to disk with write.csv().

final_df <- data.frame(
  Name = church_names,
  Address = addresses_temp,
  City = city,
  State = final_states,
  Zip = final_zips,
  Phone = phone_numbers,
  Fax = fax_numbers
)

# preview the first rows
head(final_df)

##                            Name                  Address         City
## 1   Adelphi Presbyterian Church          9401 Riggs Road      Adelphi
## 2     Aldie Presbyterian Church 32260 Meeting House Lane        Aldie
## 3 Arlington Presbyterian Church          P. O. Box 41810    Arlington
## 4          Ashburn Presbyterian       20962 Ashburn Road      Ashburn
## 5         Bealeton Presbyterian    6415 Schoolhouse Road     Bealeton
## 6           Berwyn Presbyterian      6301 Greenbelt Road College Park
##   State        Zip        Phone          Fax
## 1    MD      20783 301-434-6337         <NA>
## 2    VA      22001 703-327-3090         <NA>
## 3    VA      22204 703-920-5660 703-920-8474
## 4    VA      20147 703-729-2021 703-729-0051
## 5    VA 22712-0166 540-439-2375         <NA>
## 6    MD      20740 301-474-7573         <NA>
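
Writing the result to disk could look like the following sketch. The two-row data frame and its values are made up, and the file is written to a temporary path rather than a real output location. Reading the file back with colClasses = "character" is one way to keep zip codes as text, so that values such as ZIP+4 codes or codes with leading zeros are not converted to numbers.

```r
# Made-up rows standing in for the scraped data
toy_df <- data.frame(
  Name = c("Example Church A", "Example Church B"),
  Zip = c("20000", "02139-0001"),
  stringsAsFactors = FALSE
)

# Temporary file used only for this sketch
out_file <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra column of row numbers
write.csv(toy_df, out_file, row.names = FALSE)

# Read everything back as text so the zip codes stay intact
round_trip <- read.csv(out_file, colClasses = "character")
print(round_trip)
```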

Here is the full dataset for you to explore:

No. | Name | Address | City | State | Zip | Phone | Fax
1 | Adelphi Presbyterian Church | 9401 Riggs Road | Adelphi | MD | 20783 | 301-434-6337 | NA
2 | Aldie Presbyterian Church | 32260 Meeting House Lane | Aldie | VA | 22001 | 703-327-3090 | NA
3 | Arlington Presbyterian Church | P. O. Box 41810 | Arlington | VA | 22204 | 703-920-5660 | 703-920-8474
4 | Ashburn Presbyterian | 20962 Ashburn Road | Ashburn | VA | 20147 | 703-729-2021 | 703-729-0051
5 | Bealeton Presbyterian | 6415 Schoolhouse Road | Bealeton | VA | 22712-0166 | 540-439-2375 | NA
6 | Berwyn Presbyterian | 6301 Greenbelt Road | College Park | MD | 20740 | 301-474-7573 | NA
7 | Bethesda Presbyterian | 7611 Clarendon Road | Bethesda | MD | 20814 | 301-986-1137 | 301-986-1230
8 | Boyds Presbyterian | 19901 White Ground Road | Boyds | MD | 20841 | 301-540-2544 | 301-540-4975
9 | Bradley Hills Presbyterian | 6601 Bradley Boulevard | Bethesda | MD | 20817 | 301-365-2850 | 301-365-6218
10 | Brambleton Presbyterian | 42395 Ryan Road Suite 112B #633 | Brambleton | VA | 20148-4858 | 703-542-8530 | NA
11 | Brazilian Bible Church | 20701 Frederick Road | Germantown | MD | 20876 | 301-802-1743 | NA
12 | Brentsville Presbyterian | 12305 Bristow Road | Bristow | VA | 20136 | 703-368-2546 | NA
13 | Burke Presbyterian | 5690 Oak Leather Drive | Burke | VA | 22015 | 703-764-0456 | 703-764-1853
14 | Bush Hill Presbyterian | 4916 Franconia Road | Alexandria | VA | 22310 | 703-971-1171 | 703-971-9007
15 | Calvary Presbyterian | 6120 North Kings Highway | Alexandria | VA | 22303 | 703-768-8510 | 703-768-7690
16 | Capitol Hill Presbyterian Church | 201 Fourth Street SE | Washington | DC | 20003 | 202-547-8676 | 202-547-2182
17 | Catoctin (The) Presbyterian Church | 15565 High Street | Waterford | VA | 20197 | 540-882-3058 | 540-882-4683
18 | Centreville Presbyterian | 15450 Lee Highway | Centreville | VA | 20120 | 703-830-0098 | 703-830-8375
19 | Chesterbrook Taiwanese Presbyterian | 2036 Westmoreland Street | Falls Church | VA | 22043 | 703-241-2433 | NA
20 | Chevy Chase Presbyterian | One Chevy Chase Circle NW | Washington | DC | 20015 | 202-363-2202 | 202-537-2916
21 | Christ Presbyterian | 12410 Lee-Jackson Highway | Fairfax | VA | 22033 | 703-278-8365 | NA
22 | Christ the King Presbyterian Church | 6301 Greenbelt Road | Berwyn Heights | MD | 20787 | 240-217-9960 | NA
23 | Christian Community Presbyterian | 3120 Belair Drive | Bowie | MD | 20715 | 301-262-6008 | NA
24 | Church of the Covenant | 2666 Military Road | Arlington | VA | 22207 | 703-524-4115 | 703-524-4248
25 | Church of the Pilgrims | 2201 P Street NW | Washington | DC | 20037 | 202-387-6612 | 202-387-6614
26 | Church of the Redeemer Presbyterian | 1423 Girard Street NE | Washington | DC | 20017 | 202-832-0095 | NA
27 | Clarendon Presbyterian | 1305 North Jackson Street | Arlington | VA | 22201 | 703-527-9513 | 703-524-4511
28 | Clifton Presbyterian | 12748 Richards Lane | Clifton | VA | 20124 | 703-830-3175 | 703-830-6618
29 | Colesville Presbyterian | 12800 New Hampshire Avenue | Silver Spring | MD | 20904 | 301-622-4555 | 301-625-3095
30 | Community Presbyterian | 1122 Oronoco Street | Alexandria | VA | 22313 | 703-683-4164 | NA
31 | Covenant Presbyterian | 12700 Black Forest Lane #204 | Woodbridge | VA | 22192 | 703-583-4090 | NA
32 | Darnestown Presbyterian | 15120 Turkey Foot Road | Darnestown | MD | 20878 | 301-948-9127 | 301-948-9135
33 | Eastminster Presbyterian | 5601 Randolph Street | Hyattsville | MD | 20784 | 301-864-1149 | NA
34 | Ebenezer Presbyterian | 14508 Telegraph Road | Woodbridge | VA | 22182 | 703-492-7172 | 703-492-7174
35 | Emmanuel Indonesian Presbyterian Church | 215 Montgomery Avenue | Rockville | MD | 20850 | 301-500-4018 | NA
36 | Ewe Church of America | 1700 Spencerville Road | Spencerville | MD | 20914 | 240-669-9286 | NA
37 | Fairfax Presbyterian | 10723 Main Street | Fairfax | VA | 22030 | 703-273-5300 | 703-591-4246
38 | Fairlington Presbyterian | 3846 King Street | Alexandria | VA | 22302 | 703-931-7344 | 703-931-6062
39 | Faith Presbyterian | 4161 South Capitol St SW | Washington | DC | 20023 | 202-562-2035 | NA
40 | Falls Church Presbyterian | 225 East Broad Street | Falls Church | VA | 22046 | 703-532-6518 | 703-532-6594
41 | Fifteenth Street Presbyterian | 1701 15th Street NW | Washington | DC | 20009 | 202-234-0300 | NA
42 | First Korean Presbyterian | 7610 Newcastle Drive | Annandale | VA | 22003 | 703-354-9223 | NA
43 | First Presbyterian | 7610 Newcastle Drive | Annandale | VA | 22003 | 703-941-3300 | 703-941-0845
44 | First Presbyterian | 601 North Vermont Street | Arlington | VA | 22203 | 703-527-4766 | 703-527-2262
45 | First United of Dale City | 14391 Minnieville Road | Woodbridge | VA | 22193 | 703-670-7834 | 703-670-7834
46 | Furance Mountain Presbyterian Church | 12946 James Monroe Hwy | Leesburg | VA | 20176 | 12946 James Monroe Hwy, Leesburg, VA 20176 Phone: | NA
47 | Gaithersburg Presbyterian | 610 South Frederick Avenue | Gaithersburg | MD | 20877 | 301-948-9418 | 301-869-3043
48 | Garden Memorial Presbyterian | 1720 Minnesota Avenue SE | Washington | DC | 20020 | 202-678-0772 | NA
49 | Geneva Presbyterian | 11931 Seven Locks Road | Rockville | MD | 20854 | 301-424-4346 | 301-340-0265
50 | Georgetown Presbyterian | 3115 P Street NW | Washington | DC | 20007 | 202-338-1644 | 202-338-4797
51 | Good Samaritan Presbyterian | PO Box 925 | Waldorf | MD | 20604-0925 | 301-843-1335 | 301-645-4134
52 | Grace Presbyterian | 5924 Princess Garden Parkway | Lanham Seabrook | MD | 20706 | 301-577-1092 | 301-577-7483
53 | Grace Presbyterian | 7434 Bath Street | Springfield | VA | 22150 | 703-451-2900 | 703-451-3313
54 | Greenwich Presbyterian | 15305 Vint Hill Road | Nokesville | VA | 20181 | 703-754-7933 | 703-753-3683
55 | Heritage Presbyterian | 8503 Fort Hunt Road | Alexandria | VA | 22308 | 703-360-9546 | 703-360-7389
56 | Hermon Presbyterian Church | 7801 Persimmon Tree Lane | Bethesda | MD | 20817 | 301-365-4454 | NA
57 | Hope Presbyterian | 1100 Enterprise Road | Mitchellville | MD | 20721 | 301-249-7774 | 301-249-9606
58 | Idylwood Presbyterian | 7617 Idylwood Road | Falls Church | VA | 22043 | 703-573-3027 | NA
59 | Immanuel Presbyterian | 1125 Savile Lane | McLean | VA | 22101 | 703-356-3042 | 703-790-0756
60 | Indo Pak Presbyterian | 641 Dranesville Road | Herndon | VA | 20170 | 703-787-0275 | NA
61 | Indonesian-American Presbyterian | 3211 Paul Drive | Silver Spring | MD | 20902 | 240-505-5446 | NA
62 | John Calvin Presbyterian | 6531 Columbia Pike | Annandale | VA | 22003 | 703-256-3644 | 703-941-3341
63 | Kirkwood Presbyterian | 8336 Carrleigh Parkway | Springfield | VA | 22152 | 703-451-5320 | 703-451-1959
64 | Knox Presbyterian | 7416 Arlington Boulevard | Falls Church | VA | 22042 | 703-560-5288 | 703-560-6603
65 | Korean Presbyterian | 800 Hurley Avenue | Rockville | MD | 20850 | 301-838-0766 | 301-838-3060
66 | Laurel Presbyterian | 7610 Sandy Spring Road | Laurel | MD | 20707 | 301-776-6665 | 301-776-6665
67 | Leesburg Presbyterian | 207 West Market Street | Leesburg | VA | 20176 | 703-777-4163 | 703-777-4666
68 | Lewinsville Presbyterian | 1724 Chain Bridge Road | McLean | VA | 22101 | 703-356-7200 | 703-356-7334
69 | Litchfield Presbyterian | 135 West Bowen Street | Remington | VA | 22734 | 135 West Bowen Street, Remington, VA 22734 Phone: | NA
70 | Little Falls Presbyterian | 6025 Little Falls Road | Arlington | VA | 22207 | 703-538-5230 | 703-538-6725
71 | Manassas Presbyterian | 8201 Ashton Avenue | Manassas | VA | 20109 | 703-369-2058 | 703-330-8827
72 | Mizo Presbyterian | 610 South Frederick Avenue | Gaithersburg | MD | 20877 | 610 South Frederick Avenue, Gaithersburg, MD 20877 Phone: | NA
73 | Mount Vernon Presbyterian | 2001 Sherwood Hall Lane | Alexandria | VA | 22306 | 703-765-6118 | NA
74 | National Presbyterian | 4101 Nebraska Avenue NW | Washington | DC | 20016 | 202-537-0800 | 202-686-0031
75 | Neelsville Presbyterian | 20701 Frederick Road | Germantown | MD | 20876 | 301-972-3916 | 301-972-5563
76 | New Hope Presbyterian | 17930 Bowie Mill Road | Olney | MD | 20855 | 301-987-8989 | 301-987-9010
77 | New York Avenue Presbyterian | 1313 New York Avenue NW | Washington | DC | 20005 | 202-393-3700 | 202-393-3705
78 | Northeastern Presbyterian | 2112 Varnum Street NE | Washington | DC | 20018 | 202-526-1730 | 202-526-5900
79 | Northern Virginia Korean Presbyterian | 4211 Evergreen Lane | Annandale | VA | 22003 | 703-941-3338 | NA
80 | Northminster Presbyterian | 7720 Alaska Avenue NW | Washington | DC | 20012 | 202-829-5311 | NA
81 | Northwood Presbyterian | 1200 University Blvd West | Silver Spring | MD | 20902 | 301-593-1180 | 301-649-1155
82 | Oaklands Presbyterian | 14301 Laurel Bowie Road | Laurel | MD | 20708 | 301-776-5833 | NA
83 | Old Presbyterian Meeting House | 323 South Fairfax Street | Alexandria | VA | 22314 | 703-549-6670 | 703-549-9425
84 | Patuxent Presbyterian | 23421 Kingston Creek Road | California | MD | 20619 | 301-863-2033 | 301-863-8004
85 | Poolesville Presbyterian | 17800 Elgin Road | Poolesville | MD | 20837 | 301-972-7452 | NA
86 | Potomac Presbyterian | 10301 River Road | Potomac | MD | 20854 | 301-299-6007 | 301-299-9438
87 | Prince George’s Community Church | 10111 Martin Luther King Jr. Highway Suite 200A | Bowie | MD | 20720 | 301-218-4802 | NA
88 | Providence Presbyterian | 9019 Little River Turnpike | Fairfax | VA | 22031 | 703-978-3934 | 703-978-4306
89 | Riverdale Presbyterian | 6513 Queens Chapel Road | University Park | MD | 20782 | 301-927-0477 | 301-699-2156
90 | Riverside Presbyterian | 20 Pidgeon Hill Drive Suite 109 | Sterling | VA | 20165 | 703-444-3528 | 703-444-8660
91 | Rock (The) Presbyterian Church | 800 Hurley Ave | Rockville | MD | 20850 | 301-838-0766 | NA
92 | Rockville Presbyterian | 215 W. Montgomery Avenue | Rockville | MD | 20850 | 301-762-3363 | 301-762-5823
93 | Rockville United Church | 355 Linthicum Street | Rockville | MD | 20851 | 301-424-6733 | 301-738-7695
94 | Saint Mark Presbyterian | 10701 Old Georgetown Road | Rockville | MD | 20852 | 301-530-0600 | 301-530-2613
95 | Sargent Memorial Presbyterian | 5109 N.H. Burroughs Ave NE | Washington | DC | 20019 | 202-396-1710 | 202-396-0708
96 | Silver Spring Presbyterian | 580 University Blvd. East | Silver Spring | MD | 20901 | 301-439-4646 | 301-439-4647
97 | Sixth Presbyterian | 5413 16th Street NW | Washington | DC | 20011 | 202-723-5377 | 202-723-8416
98 | Southminster Presbyterian Church | 7801 Livingston Road | Oxon Hill | MD | 20745 | 301-567-1510 | NA
99 | St. Andrew Presbyterian | 711 West Main Street | Purcellville | VA | 20132 | 540-338-4332 | 540-338-4333
100 | St. Matthew Presbyterian | 4001 Bel Pre Road | Silver Spring | MD | 20906 | 301-598-4400 | 301-598-4401
101 | Taiwanese Presbyterian | 7410 Needwood Road | Derwood | MD | 20855 | 301-942-1133 | NA
102 | Takoma Park Presbyterian | 310 Tulip Avenue | Takoma Park | MD | 20912 | 301-270-5550 | 301-270-8405
103 | Trinity Presbyterian | 651 Dranesville Road | Herndon | VA | 20170 | 703-437-5500 | 703-437-4861
104 | Trinity Presbyterian Church | 5533 North 16th Street | Arlington | VA | 22205 | 703-536-5600 | 703-536-2815
105 | United Christian Parish of Reston | 11508 North Shore Drive | Reston | VA | 20190 | 703-620-3065 | 703-707-0622
106 | United Korean Presbyterian | 7009 Wilson Lane | Bethesda | MD | 20817 | 301-229-0000 | 301-229-0200
107 | United Parish of Bowie | PO Box 1571 | Bowie | MD | 20717-0171 | 301-249-6411 | 301-249-6411
108 | Unity Presbyterian | 4401 Brinkley Road | Temple Hills | MD | 20748 | 301-449-7686 | NA
109 | Universal Evangelical Church | 1523 Forest Glen Road | Silver Spring | MD | 20910 | 301-593-0861 | NA
110 | Vienna Presbyterian | 124 Park Street NE | Vienna | VA | 22180 | 703-938-9050 | 703-938-8264
111 | Warner Memorial Presbyterian | 10123 Connecticut Avenue | Kensington | MD | 20895 | 301-949-2900 | 301-933-7704
112 | Western Presbyterian | 2401 Virginia Ave. NW | Washington | DC | 20037 | 202-835-8383 | 202-835-8376
113 | Westminster Presbyterian | 400 Eye Street SW | Washington | DC | 20024 | 202-484-7700 | 202-484-8544
114 | Westminster Presbyterian Church | 2701 Cameron Mills Road | Alexandria | VA | 22302 | 703-549-4766 | 703-548-1505
115 | Wheaton Community Church | 3211 Paul Drive | Silver Spring | MD | 20902 | 301-949-2742 | NA