R - Extract table values from a website


Is there a way to scrape the coordinates from this page?

I know it must be something like:

```r
library(rvest)
library(stringi)

url <- "http://www.imo.org/en/ourwork/environment/pollutionprevention/airpollution/pages/emission-control-areas-%28ecas%29-designated-under-regulation-13-of-marpol-annex-vi-%28nox-emission-control%29.aspx"
page <- read_html(url)

# "." is only a placeholder: the right selector is what I can't work out
coords <- page %>% html_nodes(".") %>% html_text()
```

But I'm not sure how to find out what to put in html_nodes().

I've been trying to use Firebug to find out, but it's a mess (I don't have any experience with web scraping or with Firebug, though).

A slightly different approach:

```r
library(sp)
library(rvest)
library(stringi)
library(hrbrthemes)
library(tidyverse)

target_url <- "http://www.imo.org/en/ourwork/environment/pollutionprevention/airpollution/pages/emission-control-areas-%28ecas%29-designated-under-regulation-13-of-marpol-annex-vi-%28nox-emission-control%29.aspx"

pg <- read_html(target_url)
```

Now that we have the page, we'll need to get the proper elements. The coordinates are in a degrees/minutes/seconds format that makes them hard to use, so we'll convert them as we go, using a helper function:

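```r
# Convert the page's degree/minute/second strings to decimal degrees
dms_to_dec <- function(x) {
  html_text(x) %>%
    stri_replace_first_regex("º ", "d") %>%   # degree mark -> "d"
    stri_replace_first_regex("′ ", "'") %>%   # prime (minutes) -> "'"
    stri_replace_first_regex("″", "") %>%     # drop double prime (seconds)
    stri_replace_all_regex("[ \\.]", "") %>%  # drop spaces and periods
    char2dms() %>%                            # parse with sp
    as.numeric.DMS()
}
```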
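
The arithmetic behind `char2dms()` plus the numeric conversion is decimal degrees = degrees + minutes/60 + seconds/3600, with the result negated for southern or western positions. Below is a minimal base-R sketch of that conversion for anyone who wants to check it without sp. `dms_string_to_dec()` is a hypothetical name. The sketch assumes every string carries three numbers and a hemisphere letter, which may not hold for every cell on the real page.

```r
# Hypothetical base-R helper (not part of the answer above): convert
# degree/minute/second strings to signed decimal degrees. Assumes each
# string holds degrees, minutes and seconds plus an N/S/E/W letter.
dms_string_to_dec <- function(x) {
  # All numbers in each string, e.g. c("48", "30", "15")
  nums <- regmatches(x, gregexpr("[0-9]+(\\.[0-9]+)?", x))
  # Last N/S/E/W letter in each string
  hemi <- sub(".*([NSEW])[^NSEW]*$", "\\1", x)
  vapply(seq_along(x), function(i) {
    parts <- as.numeric(nums[[i]])
    dec <- parts[1] + parts[2] / 60 + parts[3] / 3600
    # South latitudes and west longitudes are negative
    if (hemi[i] %in% c("S", "W")) -dec else dec
  }, numeric(1))
}

# \u escapes stand in for the page's º, ′ and ″ characters
dms_string_to_dec(c("48\u00ba 30\u2032 15\u2033 N", "60\u00ba 00\u2032 00\u2033 W"))
# roughly 48.504 and -60
```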

Now, target each table and pull out the individual data elements. These are (insanely) stored with each column in a single <td> and each value wrapped in its own <p> tag. We'll yank them out and make a single data frame, using the table number as a grouping column.

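```r
# One tibble per coordinate table; .id records which table each row came from
html_nodes(pg, "table.ms-rteTable-default") %>%
  map_df(~{
    tibble(
      # every value in a column sits in one <td>, one <p> per value
      point     = html_nodes(.x, xpath = ".//td[1]/p") %>% xml2::xml_double(),
      latitude  = html_nodes(.x, xpath = ".//td[2]/p") %>% dms_to_dec(),
      longitude = html_nodes(.x, xpath = ".//td[3]/p") %>% dms_to_dec()
    )
  }, .id = "table_num") -> regions
```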
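
`map_df(..., .id = "table_num")` row-binds one data frame per table and records which list element each row came from. The base-R sketch below does the same thing on made-up toy tables so the reshaping is easy to see. `bind_tables()` is a hypothetical name, and the columns simply mirror the `point`/`latitude`/`longitude` ones built above.

```r
# Hypothetical base-R analogue of purrr::map_df(.id = "table_num"):
# row-bind a list of per-table data frames, tagging each row with the
# (character) position of the table it came from.
bind_tables <- function(tables) {
  tagged <- Map(function(df, i) {
    data.frame(table_num = rep(as.character(i), nrow(df)), df,
               stringsAsFactors = FALSE)
  }, tables, seq_along(tables))
  out <- do.call(rbind, tagged)
  rownames(out) <- NULL
  out
}

# Made-up toy tables with the same columns the answer builds
toy <- list(
  data.frame(point = 1:2, latitude = c(60, 59.5), longitude = c(-64, -63)),
  data.frame(point = 1:3, latitude = c(18, 18.5, 19), longitude = c(-67, -66, -65))
)
regions_toy <- bind_tables(toy)
table(regions_toy$table_num)  # 2 rows tagged "1", 3 rows tagged "2"
```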

Let's take a look:

```r
group_by(regions, table_num) %>%
  summarise(n_points = n())
## # A tibble: 8 x 2
##   table_num n_points
##       <chr>    <int>
## 1         1       47
## 2         2      206
## 3         3       45
## 4         4       55
## 5         5       47
## 6         6      206
## 7         7       45
## 8         8       55
```
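
The point counts for tables 5 to 8 match those of tables 1 to 4 exactly. Matching counts alone don't prove the tables are copies. One way to check is to compare the coordinate sequences directly, as in the base-R sketch below. The sketch runs on toy data, `find_duplicate_tables()` is a hypothetical name, and it assumes the `table_num`/`latitude`/`longitude` columns built above.

```r
# Hypothetical helper: report pairs of tables whose (latitude, longitude)
# sequences are identical, in the same order.
find_duplicate_tables <- function(df) {
  # One coordinate data frame per table; reset row names so identical()
  # compares only the values
  pieces <- lapply(split(df[, c("latitude", "longitude")], df$table_num),
                   function(p) { rownames(p) <- NULL; p })
  ids <- names(pieces)
  pairs <- character(0)
  for (i in seq_along(ids)) {
    for (j in seq_along(ids)) {
      if (j > i && identical(pieces[[i]], pieces[[j]])) {
        pairs <- c(pairs, paste(ids[i], ids[j], sep = " == "))
      }
    }
  }
  pairs
}

# Toy data: table "3" repeats table "1"
toy_regions <- data.frame(
  table_num = c("1", "1", "2", "2", "3", "3"),
  latitude  = c(60, 59, 18, 19, 60, 59),
  longitude = c(-64, -63, -67, -66, -64, -63),
  stringsAsFactors = FALSE
)
find_duplicate_tables(toy_regions)  # "1 == 3"
```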

And a better "look":

```r
ggplot(regions, aes(longitude, latitude, group = table_num)) +
  geom_path(aes(color = table_num)) +
  ggthemes::scale_color_tableau() +
  coord_map("polyconic") +
  theme_ipsum_rc(grid = "XY")
```
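
`geom_path()` joins observations in the order they appear in the data, whereas `geom_line()` sorts them by x. The scraped rows already come out in table order. If `regions` were ever reshuffled, though (for example by a join), a reorder like the base-R sketch below would restore each boundary's drawing order. The toy rows are made up.

```r
# geom_path() connects rows in the order they appear, so sort by table and
# then by point number. These toy rows are deliberately shuffled.
shuffled <- data.frame(
  table_num = c("2", "1", "1", "2"),
  point     = c(2, 2, 1, 1),
  latitude  = c(19, 59, 60, 18),
  longitude = c(-66, -63, -64, -67),
  stringsAsFactors = FALSE
)
# as.integer() keeps "10" from sorting before "2" once there are >9 tables
ordered_rows <- shuffled[order(as.integer(shuffled$table_num), shuffled$point), ]
rownames(ordered_rows) <- NULL
ordered_rows
```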

It looks about right against a US outline map, too:

```r
library(rgdal)

usa <- readOGR("http://eric.clst.org/wupl/stuff/gz_2010_us_outline_500k.json")
# drop outline segments that touch Alaska (state FIPS code "02")
usa_map <- fortify(subset(usa, R_STATEFP != "02" & L_STATEFP != "02"))

ggplot() +
  geom_map(data = usa_map, map = usa_map,
           aes(x = long, y = lat, map_id = id),
           color = "#2b2b2b", size = 0.15, fill = "white") +
  geom_path(data = regions,
            aes(x = longitude, y = latitude, group = table_num, color = table_num)) +
  ggthemes::scale_color_tableau() +
  coord_map(xlim = c(-180, -47)) +
  theme_ipsum_rc(grid = "XY")
```
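
The `subset()` call keeps an outline segment only when neither the right-side nor the left-side state FIPS code is `"02"` (Alaska). The base-R sketch below applies the same logical test to a made-up attribute table. The `R_STATEFP`/`L_STATEFP` names come from the answer's code; the rows and `id` values are invented for illustration.

```r
# Toy attribute table standing in for the outline's data; each row is a
# boundary segment with the state FIPS codes on its right and left sides.
segments <- data.frame(
  id        = c("a", "b", "c", "d"),
  R_STATEFP = c("02", "06", "15", "53"),
  L_STATEFP = c("",   "06", "15", "02"),
  stringsAsFactors = FALSE
)
# Keep a segment only if neither side belongs to Alaska (FIPS "02")
keep <- segments$R_STATEFP != "02" & segments$L_STATEFP != "02"
segments[keep, "id"]  # "b" "c"
```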
