web scraping - Using an API for web scraping
For a summer project, I am trying to extract particular information from a website. Upon doing some research, I came across the concept of an 'API' to perform this. As a beginner, I understand that an API can be used to communicate with the website in question.
The website I want to scrape information from does not provide its own API. I came across a great tool called WrapAPI. This tool builds an API on top of a website. My question is: once I have the API, how do I get the data from the website? Perhaps I do not understand the whole concept of an API for this project. The end result I want is to build a UI such that a user can query the data he/she requires from the website.
Sorry for the wordy question. I am a novice at this, and I would love to learn more. Any help is highly appreciated.
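I think that if you want to do scraping (I mean getting data from an HTML page), you can do it without an API, or you can implement an API that you call and that returns the data. That's the API concept as I know it (so, only a little): you ask for something, and the API returns that thing and a status code. This is my code (JavaScript); it's a function: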
var request = require('request');
var cheerio = require('cheerio');
var fs = require('fs');

var uni_e = function scrape_events_uni() {
  // URL of the page to fetch
  var url = 'http://webmagazine.unitn.it/calendario/ateneo/week';
  // Send a request to get the page's HTML
  request(url, function (error, response, html) {
    if (error) {
      return false;
    }
    // Load the HTML
    var $ = cheerio.load(html);
    // Working variables: list of titles, list of URLs, list of objects to put in the JSON
    var title = [];
    var urls = [];
    var json = [];
    // Select the elements that contain the title and save it in title
    $('.titolo-evento').each(function () {
      var data = $(this);
      title.push(data.children().first().text());
    });
    // Select the <a> tags of the events only and take their href attribute
    $('.cal-ateneo-visible').each(function () {
      var data = $(this);
      urls.push(data.children().first().attr('href'));
    });
    // Add all the objects to the json list
    for (var i = 0; i < title.length; i++) {
      // Temporary object that holds the elements as a single object to push into json
      var obj = { title: '', url: '' };
      obj.title = title[i];
      obj.url = urls[i];
      json.push(obj);
    }
    // Write all the saved objects to a JSON file
    fs.writeFile('eventi_uni.json', JSON.stringify(json, null, 2), function (err) {});
  });
  return true;
};
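I'm a newbie, so maybe this could be done in an easier way. Still, I loaded the page's HTML with cheerio (a jQuery-like library for Node.js), found the data by selecting the HTML elements with jQuery-style selectors, and added them to a list. I needed to build a JSON file, but I think you can send the data directly.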
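Sending the data directly instead of writing a file could look roughly like the sketch below. It is only an illustration: `buildEvents`, `handleRequest`, the `/events` path, and the sample titles and URLs are all made-up names and data, not part of my project. The idea is that the scraped list is kept in memory and returned as JSON together with a status code.

```javascript
// Pair each title with its URL, like the loop in the scraper above.
function buildEvents(titles, urls) {
  return titles.map(function (title, i) {
    return { title: title, url: urls[i] };
  });
}

// Hypothetical placeholder data; a real app would use the scraped lists.
var events = buildEvents(
  ['Open day', 'Guest lecture'],
  ['/events/1', '/events/2']
);

// Request handler (assumed name): GET /events returns the list as JSON
// with status 200; any other request gets a 404.
function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/events') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(events));
  } else {
    res.writeHead(404, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ error: 'not found' }));
  }
}
```

Passing the handler to Node's built-in `http` module, as in `require('http').createServer(handleRequest).listen(3000)`, would expose the data at `/events` (the port is arbitrary). A UI could then request that address to query the data.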
If instead you want data from a site's API, you call the API (if the site has one, Facebook for example) and the API should return the data. The API is implemented by the owner of the data. If there is no API, I think you'll have to scrape, as you suggested.
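To make the "you ask, and the API returns data and a status code" idea concrete, here is a minimal sketch of calling an API from JavaScript. It assumes an environment with a global `fetch` (browsers, or Node.js 18 and later). The function name `getJson` and the URL in the usage comment are hypothetical, and the `fetchImpl` parameter exists only so the function can be tried without a network.

```javascript
// Ask an API for data and check the status code before using the body.
// fetchImpl defaults to the global fetch; a stand-in can be passed instead.
async function getJson(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // response.ok is false for any status outside 200-299.
    throw new Error('Request failed with status ' + response.status);
  }
  // Parse the response body as JSON and return it.
  return response.json();
}

// Usage (hypothetical URL, left commented out so nothing is requested):
// getJson('https://api.example.com/events')
//   .then(function (events) { console.log(events); })
//   .catch(function (err) { console.error(err.message); });
```

Whether the API belongs to the site's owner or was generated by a tool on top of the site, the calling side looks the same: send a request, check the status, read the data.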
Sorry if this is not precise, it's my first answer! The code comes from one of my projects, where the comments were originally written in Italian.