full text search - Is it possible to obtain, alter and replace the tfidf document representations in Lucene?
Hi guys,
I'm working on ranking-related research. I would like to index a collection of documents with Lucene, take the tfidf representations (of each document) it generates, alter them, put them back into place, and observe how the ranking on a fixed set of queries changes accordingly.
Is there any non-hacky way to do this?
Your question is too vague to have a clear answer, esp. on what you plan to do with:
take the tfidf representations (of each document) it generates, alter them
Lucene stores only the raw values used for scoring:
- collection-wide stats: CollectionStatistics
- per-term stats: TermStatistics
- per term/doc pair stats: PostingsEnum
- per field/doc pair: norms
All of this data is managed by Lucene and is used to compute the score for a given query term. A custom Similarity class can be used to change the formula that generates this score.
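To make that concrete, here is a minimal sketch of how a per-term score can be derived from those raw values. It is plain Java with no Lucene classes; the class and method names are hypothetical, and the formulas follow the classic TF-IDF shape (square-root tf, logarithmic idf, inverse square-root length norm) rather than reproducing what any particular Lucene version computes.

```java
// Standalone sketch: no Lucene classes are used; all names are hypothetical.
final class TfIdfSketch {

    // Term frequency component: dampens the raw within-document frequency
    // (the kind of value read per term/doc pair from the postings).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse document frequency, built from a term-level stat (docFreq)
    // and a collection-level stat (docCount).
    static double idf(long docFreq, long docCount) {
        return 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
    }

    // Length normalization: the kind of per field/doc value kept in norms.
    static double lengthNorm(int fieldLength) {
        return 1.0 / Math.sqrt(fieldLength);
    }

    // Score of one term in one document, combining the three raw inputs.
    static double score(int freq, long docFreq, long docCount, int fieldLength) {
        return tf(freq) * idf(docFreq, docCount) * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // A term occurring 4 times in a 16-term field, present in 9 of 99 docs.
        System.out.println(score(4, 9, 99, 16));
    }
}
```

Changing the ranking formula then amounts to swapping the bodies of tf, idf or lengthNorm, which is roughly the role a custom Similarity plays; the stored raw values stay untouched.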
But you also have to consider that a search query is made of multiple terms, and the way the scores of the individual terms are combined can be changed as well. You could use the existing Query classes (e.g. BooleanQuery, DisjunctionMaxQuery), but you could also write your own.
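As a rough illustration of why the combination step matters, the sketch below contrasts two ways of merging per-term scores: a plain sum, which is the basic idea behind scoring optional clauses in a boolean query, and a "max plus tie-breaker" rule in the spirit of a disjunction-max query. This is plain Java with hypothetical names, not Lucene code, and it assumes the term scores are non-negative.

```java
// Standalone sketch: hypothetical names, not part of Lucene.
final class CombineSketch {

    // Sum combination: every matching term contributes its full score.
    static double sum(double[] termScores) {
        double total = 0.0;
        for (double s : termScores) {
            total += s;
        }
        return total;
    }

    // Dismax combination: the best term dominates, and the remaining
    // terms contribute only a fraction (tieBreaker) of their scores.
    // Assumes non-negative scores, so 0.0 is a safe starting maximum.
    static double disMax(double[] termScores, double tieBreaker) {
        double max = 0.0;
        double total = 0.0;
        for (double s : termScores) {
            total += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (total - max);
    }

    public static void main(String[] args) {
        double[] scores = {1.0, 2.0, 0.5};
        System.out.println(sum(scores));
        System.out.println(disMax(scores, 0.0));
        System.out.println(disMax(scores, 0.5));
    }
}
```

The same per-term scores can therefore produce different document orderings depending on the combination rule, so an experiment that alters term weights should keep the query type fixed.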
So it really depends on what you want to do. Of note: if you want to change the raw values stored by Lucene, that is going to be rather hard. You'll have to write a custom Lucene codec and a query stack that takes advantage of the new data.
One nice thing you should consider is the possibility to store arbitrary byte[] payloads. This way you could store a value that has been computed outside of Lucene and use it in a custom similarity or query. Please see the following tutorials: "Getting Started with Payloads" and "Custom Scoring with Lucene Payloads"; they may give you some ideas.
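The payload idea can be sketched without Lucene as a round trip: an externally computed weight is packed into a byte[] at index time and unpacked again at scoring time. The code below is plain Java with hypothetical names and uses java.nio.ByteBuffer as a stand-in for whatever encoding you choose; attaching the bytes to tokens and reading them back during scoring is the part the tutorials cover and is omitted here.

```java
// Standalone sketch: hypothetical names, not part of Lucene.
final class PayloadSketch {

    // Pack an externally computed weight into 4 bytes (big-endian float).
    static byte[] encode(float weight) {
        return java.nio.ByteBuffer.allocate(Float.BYTES).putFloat(weight).array();
    }

    // Recover the weight from the stored bytes.
    static float decode(byte[] payload) {
        return java.nio.ByteBuffer.wrap(payload).getFloat();
    }

    // One possible use: scale a base score by the stored weight.
    // Multiplying is an assumption; a custom scorer could combine them differently.
    static double score(double baseScore, byte[] payload) {
        return baseScore * decode(payload);
    }

    public static void main(String[] args) {
        byte[] payload = encode(1.5f);
        System.out.println(payload.length);
        System.out.println(score(2.0, payload));
    }
}
```

For the experiment in the question, this would mean computing the altered term weights offline, storing them as payloads, and letting the custom scoring code read them instead of trying to overwrite Lucene's own statistics.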