R - Extract table values from website
Is there a way to scrape the coordinates here?
I know it must be something like:
    library(rvest)
    library(stringi)

    url <- "http://www.imo.org/en/ourwork/environment/pollutionprevention/airpollution/pages/emission-control-areas-%28ecas%29-designated-under-regulation-13-of-marpol-annex-vi-%28nox-emission-control%29.aspx"
    page <- html(url)
    coords <- page %>%
      html_nodes(".") %>%
      html_text()
But I'm not sure how to find what to put in html_nodes.
I tried running Firebug to find out, but it's a mess (I don't have any experience with web scraping or with Firebug).
A slightly different approach:
    library(sp)          # char2dms() and the DMS class
    library(rvest)       # read_html(), html_nodes(), html_text()
    library(stringi)     # stri_replace_*_regex()
    library(hrbrthemes)  # theme_ipsum_rc()
    library(tidyverse)   # map_df(), group_by(), ggplot(), ...

    target_url <- "http://www.imo.org/en/ourwork/environment/pollutionprevention/airpollution/pages/emission-control-areas-%28ecas%29-designated-under-regulation-13-of-marpol-annex-vi-%28nox-emission-control%29.aspx"
    pg <- read_html(target_url)
Now that we have the page, we'll need to target the proper elements. The coordinates are in a format that makes them hard to use, so we'll convert them as we go, using a helper function:
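    # Convert nodes holding text such as 32º 32′ 10″ N. into decimal degrees.
    dms_to_dec <- function(x) {
      html_text(x) %>%
        stri_replace_first_regex("º ", "d") %>%    # degree sign -> "d"
        stri_replace_first_regex("′ ", "'") %>%    # prime -> apostrophe
        stri_replace_first_regex("″", "") %>%      # drop the seconds mark
        stri_replace_all_regex("[ \\.]", "") %>%   # drop spaces and periods
        char2dms() %>%
        as.numeric.DMS()
    }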
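To make the arithmetic behind that conversion concrete, here is a minimal base-R sketch of the same idea without sp. The function name `dms_string_to_decimal` is hypothetical, and the input format (degrees, minutes, seconds, then a hemisphere letter) is an assumption inferred from the replacements in the helper above, not something checked against the live page.

```r
# Hypothetical helper (not from sp or the answer above): convert strings in
# degrees/minutes/seconds form to signed decimal degrees using base R only.
dms_string_to_decimal <- function(x) {
  # Collect every number in each string, in order: degrees, minutes, seconds.
  parts <- regmatches(x, gregexpr("[0-9]+(\\.[0-9]+)?", x))
  vapply(seq_along(x), function(i) {
    nums <- as.numeric(parts[[i]])
    nums <- c(nums, rep(0, 3))[1:3]  # treat missing minutes/seconds as zero
    value <- nums[1] + nums[2] / 60 + nums[3] / 3600
    # Southern and western hemispheres are negative.
    if (grepl("[SW]", x[i])) -value else value
  }, numeric(1))
}

# Illustrative inputs written with Unicode escapes for the degree, prime
# and double-prime marks.
example_coords <- c("32\u00ba 32\u2032 10\u2033 N.",
                    "117\u00ba 06\u2032 11\u2033 W.")
decimal_coords <- dms_string_to_decimal(example_coords)
print(decimal_coords)
```

The first value works out to 32 + 32/60 + 10/3600 degrees, and the second is negated because it lies in the western hemisphere.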
Now, target each table and pull out the individual data elements, which are insanely stored (each) in a single <td>, with every value wrapped in its own <p> tag. We'll yank them out and make a single data frame, using the table number as a grouping column.
    html_nodes(pg, "table.ms-rtetable-default") %>%
      map_df(~{
        tibble(
          # xml2 is called explicitly in case it is not attached
          point     = html_nodes(.x, xpath=".//td[1]/p") %>% xml2::xml_double(),
          latitude  = html_nodes(.x, xpath=".//td[2]/p") %>% dms_to_dec(),
          longitude = html_nodes(.x, xpath=".//td[3]/p") %>% dms_to_dec()
        )
      }, .id = "table_num") -> regions
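For anyone unsure what to pass to html_nodes, the selector and XPath logic can be tried offline first. The sketch below is only an illustration: the HTML fragment is made up to mimic the layout described above (one `<td>` per column, one `<p>` per point), and `fragment`, `cells` and `raw_points` are hypothetical names. It leaves the coordinates as raw text rather than converting them.

```r
library(rvest)  # read_html(), html_nodes(), html_text()

# Made-up fragment imitating the page layout: a single table whose cells
# each hold several <p> elements, one per point.
fragment <- '
<table class="ms-rtetable-default">
  <tr>
    <td><p>1</p><p>2</p></td>
    <td><p>32\u00ba 32\u2032 10\u2033 N.</p><p>32\u00ba 32\u2032 04\u2033 N.</p></td>
    <td><p>117\u00ba 06\u2032 11\u2033 W.</p><p>117\u00ba 07\u2032 29\u2033 W.</p></td>
  </tr>
</table>'

page <- read_html(fragment, encoding = "UTF-8")

# CSS selector: every <table> carrying the class "ms-rtetable-default".
tables <- html_nodes(page, "table.ms-rtetable-default")

# Relative XPath: within one table, the <p> children of the 1st, 2nd and
# 3rd <td> of each row.
cells <- lapply(seq_along(tables), function(i) {
  tbl <- tables[[i]]
  data.frame(
    table_num = i,
    point     = as.numeric(html_text(html_nodes(tbl, xpath = ".//td[1]/p"))),
    latitude  = html_text(html_nodes(tbl, xpath = ".//td[2]/p")),
    longitude = html_text(html_nodes(tbl, xpath = ".//td[3]/p")),
    stringsAsFactors = FALSE
  )
})
raw_points <- do.call(rbind, cells)
print(raw_points)
```

The class name comes from inspecting the table element in the browser's developer tools; the leading `.//` keeps each XPath search inside the current table instead of the whole document.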
Let's take a look at how many points each table produced:
    group_by(regions, table_num) %>%
      summarise(n_points = n())
    ## # A tibble: 8 x 2
    ##   table_num n_points
    ##   <chr>        <int>
    ## 1 1               47
    ## 2 2              206
    ## 3 3               45
    ## 4 4               55
    ## 5 5               47
    ## 6 6              206
    ## 7 7               45
    ## 8 8               55
And a better "look":
    ggplot(regions, aes(longitude, latitude, group=table_num)) +
      geom_path(aes(color=table_num)) +
      ggthemes::scale_color_tableau() +
      coord_map("polyconic") +
      theme_ipsum_rc(grid="xy")
It looks about right on a map, too:
    library(rgdal)  # readOGR()

    usa <- readOGR("http://eric.clst.org/wupl/stuff/gz_2010_us_outline_500k.json")

    # Drop outline segments belonging to state FIPS code "02" before fortifying
    usa_map <- fortify(subset(usa, R_STATEFP != "02" & L_STATEFP != "02"))

    ggplot() +
      geom_map(data=usa_map, map=usa_map,
               aes(x=long, y=lat, map_id=id),
               color="#2b2b2b", size=0.15, fill="white") +
      geom_path(data=regions,
                aes(x=longitude, y=latitude, group=table_num, color=table_num)) +
      ggthemes::scale_color_tableau() +
      coord_map(xlim=c(-180, -47)) +
      theme_ipsum_rc(grid="xy")