Analyzing Unique Ingredients in World Cuisines

Certain ingredients are staples of particular world cuisines: hard cheeses in Italian cooking and masalas in Indian cooking are two well-known examples. We set out to discover which ingredients are most uniquely associated with a wider range of cuisines.

Using the World Cuisine Recipes page on AllRecipes.com, we selected the featured recipes from all 17 available cuisines. This is not an exhaustive list: each cuisine has many more recipes, and we scraped only those displayed on the page before scrolling down. In total, 470 recipes were scraped. Each recipe appears on the page as a card.

Each card links to the full recipe. Using web scraping, we built a recipes dataset in the following format:

library(dplyr)       # sample_n(), %>%
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

recipe_df %>%
    sample_n(5) %>%
    kable("html") %>%
    kable_styling(bootstrap_options = c("striped", "hover"))
| Cuisine | Name | Ingredients |
|---|---|---|
| United States | Minnesotas Favorite Cookie | 1 cup butter, softened 1 ½ cups brown sugar 2 eggs 2 teaspoons vanilla extract 2 ½ cups all-purpose flour 1 teaspoon baking powder ¼ teaspoon salt 1 cup milk chocolate chips ½ cup semisweet chocolate chips 2/3 cup toffee baking bits 1 cup chopped pecans |
| Mediterranean | Baked Falafel | ¼ cup chopped onion 1 (15 ounce) can garbanzo beans, rinsed and drained ¼ cup chopped fresh parsley 3 cloves garlic, minced 1 teaspoon ground cumin ¼ teaspoon ground coriander ¼ teaspoon salt ¼ teaspoon baking soda 1 tablespoon all-purpose flour 1 egg, beaten 2 teaspoons olive oil |
| Australian and New Zealander | Black Bean and Salsa Soup | 2 (15 ounce) cans black beans, drained and rinsed 1 ½ cups vegetable broth 1 cup chunky salsa 1 teaspoon ground cumin 4 tablespoons sour cream 2 tablespoons thinly sliced green onion |
| Thai | Goong Tod Kratiem Prik Thai Prawns Fried with Garlic and White Pepper | 8 cloves garlic, chopped, or more to taste 2 tablespoons tapioca flour 2 tablespoons fish sauce 2 tablespoons light soy sauce 1 tablespoon white sugar ½ teaspoon ground white pepper ¼ cup vegetable oil, divided, or as needed 1 pound whole unpeeled prawns, divided |
| United States | Kendras Maid Rite Sandwiches | 2 pounds ground beef 1 chopped onion ¾ cup ketchup 2 tablespoons brown sugar 2 tablespoons distilled white vinegar 1 tablespoon Worcestershire sauce 2 teaspoons prepared yellow mustard ½ teaspoon salt 16 hamburger buns, warmed |
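The scraping code itself isn't shown in this post. As a rough sketch of the idea, the rvest package can pull card links and ingredient lists out of HTML. The markup, the CSS selectors (`a.card`, `h1.recipe-title`, `ul.ingredients li`) and the example.com URLs below are hypothetical stand-ins, not AllRecipes' actual page structure. Against a live site you would call `read_html()` on each card URL instead of building pages with `minimal_html()`.

```r
library(rvest)  # minimal_html(), html_elements(), html_attr(), html_text2(); re-exports %>%

# Hypothetical listing page: each recipe card is a link to the full recipe
cards <- minimal_html('
  <a class="card" href="https://example.com/recipe/1">Baked Falafel</a>
  <a class="card" href="https://example.com/recipe/2">Black Bean and Salsa Soup</a>')
card_urls <- cards %>% html_elements("a.card") %>% html_attr("href")

# Hypothetical recipe page (stands in for read_html(card_urls[1]))
page <- minimal_html('
  <h1 class="recipe-title">Baked Falafel</h1>
  <ul class="ingredients">
    <li>1/4 cup chopped onion</li>
    <li>3 cloves garlic, minced</li>
  </ul>')

recipe_name <- page %>% html_element("h1.recipe-title") %>% html_text2()
ingredients <- page %>% html_elements("ul.ingredients li") %>% html_text2()

# One row per recipe, ingredients joined into one string as in recipe_df
recipe_row <- data.frame(
  Cuisine     = "Mediterranean",
  Name        = recipe_name,
  Ingredients = paste(ingredients, collapse = " ")
)
```

Joining the ingredient lines with spaces reproduces the single-string `Ingredients` column shown in the table. Keeping each line as a separate row would work too, because the tokenizer splits everything into words anyway.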

The next step is to use the tidytext package to tokenize the ingredient lists for each cuisine and use the result to determine the most unique ingredients. We first create a words dataset that filters out stop words, as well as words describing measurements or preparation rather than actual ingredients.

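library(tidytext)   # unnest_tokens(), stop_words

# Words that describe quantities or preparation rather than ingredients
measure_words <- c("teaspoon", "cup", "ounce", "tablespoons",
                   "chopped", "teaspoons", "tablespoon", "ground", "fresh",
                   "can", "sauce", "cups", "plain", "piece", "temperature",
                   "jar", "round", "delicious", "degrees", "minced", "dried",
                   "grated")

recipe_words <- recipe_df %>%
    mutate(Ingredients = gsub("[0-9]", "", Ingredients)) %>%  # drop digits in quantities
    unnest_tokens(word, Ingredients) %>%                       # one lowercase word per row
    count(Cuisine, word, sort = TRUE) %>%                      # word counts per cuisine
    filter(!(word %in% measure_words)) %>%
    anti_join(stop_words, by = "word")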
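To see what each step does, here is a minimal sketch on a single made-up ingredient string. The one-row `toy` data and the shortened measurement list are illustrative only and are not taken from the scraped recipes.

```r
library(dplyr)
library(tidytext)

# Made-up one-row example, not from the scraped data
toy <- tibble(Cuisine = "Indian",
              Ingredients = "2 teaspoons garam masala 1 cup chopped onion")

toy_words <- toy %>%
    mutate(Ingredients = gsub("[0-9]", "", Ingredients)) %>%  # "2 teaspoons" -> " teaspoons"
    unnest_tokens(word, Ingredients) %>%                       # one lowercase word per row
    count(Cuisine, word) %>%                                   # rows sorted by word
    filter(!(word %in% c("teaspoons", "cup", "chopped"))) %>%  # shortened measurement list
    anti_join(stop_words, by = "word")

toy_words$word  # "garam" "masala" "onion"
```

Only the ingredient words survive. Note that a two-word ingredient like "garam masala" is split into separate tokens, which is why single words such as "masala" show up in the results.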

recipe_words %>%
    sample_n(5) %>%
    kable("html") %>%
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Cuisine | word | n |
|---|---|---|
| European | lemonade | 1 |
| Australian and New Zealander | unsalted | 1 |
| Korean | roast | 2 |
| Middle Eastern | half | 2 |
| Canadian | squash | 1 |

This gives a count of how often each word occurs in each cuisine. From here it is easy to get the top n words for each cuisine (in this post we display only Indian and Italian, for readability):

recipe_words %>%
    filter(Cuisine %in% c("Indian", "Italian")) %>%
    group_by(Cuisine) %>%
    top_n(5, wt = n) %>%   # rank by count; ties at the cutoff can return more than 5 rows
    arrange(Cuisine, desc(n)) %>%
    kable("html") %>%
    kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
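| Cuisine | word | n |
|---|---|---|
| Indian | salt | 30 |
| Indian | pepper | 26 |
| Indian | oil | 24 |
| Indian | garlic | 21 |
| Indian | onion | 20 |
| Italian | cheese | 30 |
| Italian | pepper | 30 |
| Italian | salt | 28 |
| Italian | garlic | 21 |
| Italian | oil | 19 |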

In Indian recipes, salt and pepper are the most common ingredients, while in Italian recipes, cheese rises to the top (tied with pepper). However, salt, pepper, and cheese are likely common in many cuisines. The real question is which ingredients are most unique to each cuisine. To answer that, we can treat each cuisine as a document and use Term Frequency-Inverse Document Frequency (TF-IDF) as a measure of uniqueness: it rewards words that are frequent within a cuisine but down-weights words that appear across many cuisines. We can then plot the top TF-IDF values for each cuisine to visualize the results.
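For reference, tidytext's `bind_tf_idf()` computes the following quantities. Each cuisine is treated as a document $d$ in the set of cuisines $D$, each word as a term $t$, and $n_{t,d}$ is the count from `recipe_words`:

$$
\mathrm{tf}(t,d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{|D|}{\left|\{\, d \in D : n_{t,d} > 0 \,\}\right|}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d)\,\mathrm{idf}(t)
$$

A word that appears in every one of the 17 cuisines gets $\mathrm{idf} = \ln(17/17) = 0$, so its TF-IDF is 0 however often it is used.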

## Add tf, idf and tf_idf columns (one document per cuisine)
tf_words <- recipe_words %>%
    bind_tf_idf(word, Cuisine, n)
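As a sanity check on the definition, here is a sketch that computes TF-IDF by hand on a tiny made-up table of counts (two hypothetical cuisines "A" and "B") and compares it with `bind_tf_idf()`. The manual version is only for illustration; in practice `bind_tf_idf()` does this for you.

```r
library(dplyr)
library(tidytext)

# Made-up counts: "salt" appears in both cuisines, the other words in only one
toy <- tibble::tribble(
  ~Cuisine, ~word,    ~n,
  "A",      "salt",    2,
  "A",      "masala",  1,
  "B",      "salt",    1,
  "B",      "cheese",  3
)

n_docs <- n_distinct(toy$Cuisine)

manual <- toy %>%
  group_by(Cuisine) %>%
  mutate(tf = n / sum(n)) %>%                          # share of the cuisine's words
  group_by(word) %>%
  mutate(idf = log(n_docs / n_distinct(Cuisine))) %>%  # natural log, as in tidytext
  ungroup() %>%
  mutate(tf_idf = tf * idf)

packaged <- toy %>% bind_tf_idf(word, Cuisine, n)
```

Here "salt" scores 0 in both cuisines because it appears in every document. "masala" scores ln(2)/3, since it makes up a third of cuisine A's words and appears in only one of the two cuisines.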

library(ggplot2)
library(ggthemes)   # ptol_pal()

## Plot the top 8 words per cuisine by TF-IDF
n_cuisines <- n_distinct(tf_words$Cuisine)

tf_words %>%
    mutate(word = tools::toTitleCase(word)) %>%
    group_by(Cuisine) %>%
    slice_max(tf_idf, n = 8, with_ties = FALSE) %>%   # exactly 8 words per cuisine
    ungroup() %>%
    # order bars within each facet, not by one global factor
    mutate(word = reorder_within(word, tf_idf, Cuisine)) %>%
    ggplot(aes(word, tf_idf, fill = Cuisine)) +
        geom_col(show.legend = FALSE) +
        scale_x_reordered() +
        scale_fill_manual(values = colorRampPalette(ptol_pal()(12))(n_cuisines)) +
        facet_wrap(~Cuisine, ncol = 3, scales = "free") +
        coord_flip() +
        labs(x = NULL, y = "Term Frequency - Inverse Document Frequency") +
        theme_minimal()

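[Figure: top 8 words by TF-IDF for each cuisine, one panel per cuisine; horizontal axis: Term Frequency - Inverse Document Frequency]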

Now the distinctive words rise to the top: masala in Indian cooking, sesame in Korean cooking, and garbanzo in African cooking. Best of all, these ideas apply well beyond recipes. Any text analysis can use TF-IDF to find the words that set one group apart from the others under some grouping variable. Look for more posts on text analysis that build on these ideas.
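To illustrate the point about any grouping variable, here is a sketch that wraps the same steps (tokenize, drop stop words, count, TF-IDF, keep the top k) into a reusable function. The name `top_unique_words()` and the `reviews` data are hypothetical, made up for this example.

```r
library(dplyr)
library(tidytext)

# Hypothetical helper: top k words by TF-IDF for each level of `group`
top_unique_words <- function(df, group, text, k = 5) {
  df %>%
    unnest_tokens(word, {{ text }}) %>%           # one lowercase word per row
    anti_join(stop_words, by = "word") %>%
    count({{ group }}, word) %>%                  # word counts per group
    bind_tf_idf(word, {{ group }}, n) %>%         # each group is one "document"
    group_by({{ group }}) %>%
    slice_max(tf_idf, n = k, with_ties = FALSE) %>%
    ungroup()
}

# Made-up example: short texts grouped by genre instead of cuisine
reviews <- tibble(
  genre = c("scifi", "scifi", "romance"),
  text  = c("starship laser starship", "galaxy starship", "wedding kiss wedding")
)

out <- top_unique_words(reviews, genre, text, k = 1)
```

With `k = 1`, each genre keeps only its single most distinctive word. Here that is "starship" for scifi and "wedding" for romance.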