REINA at WebCLEF 2006: Mixing Fields to Improve Retrieval

Main Authors: G.-Figuerola, Carlos, Zazo, Ángel F., Alonso-Berrocal, José-Luis, Rodríguez-Vázquez-de-Aldana, Emilio
Other Authors: Nardi, A., Peter, C., Vicedo, J.L.
Format: BookSection NonPeerReviewed application/pdf
Language: eng
Published: 2006
Subjects:
Online Access: http://eprints.rclis.org/13966/1/figuerola2006reina.pdf
http://eprints.rclis.org/13966/
Table of Contents:
  • This paper describes the participation of the REINA Research Group of the University of Salamanca in WebCLEF 2006. The task in which we participated this year is the Monolingual Mixed Task in Spanish. To select the Spanish web pages of the EuroGov collection, the whole collection was processed with a language guesser to find pages in Spanish; all pages in the .es domain were also pre-selected. Our focus this year is to test pre-retrieval ways of mixing fields, or elements of information, in web pages, as well as to test the retrieval capacity of these fields. In retrieval systems based on the vector space model with a tf * idf weighting scheme, terms from several sources can be mixed in a single index by operating on the term frequency in the document. The BODY field is the most powerful from the point of view of retrieval, but the ANCHORS of backlinks add a considerable improvement. META fields, in contrast, contribute little to the improvement in retrieval.
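
The idea of mixing fields in a single index by operating on term frequency can be illustrated with a minimal sketch. It is not the authors' implementation: the field weights, the whitespace tokenizer, the plain log(N/df) form of idf, and the names `FIELD_WEIGHTS`, `mixed_tf`, and `build_index` are all assumptions made for illustration, since the abstract does not give these details.

```python
import math
from collections import Counter

# Hypothetical field weights; the abstract does not state the values used.
FIELD_WEIGHTS = {"body": 1.0, "anchors": 2.0, "meta": 0.5}


def mixed_tf(doc_fields, weights=FIELD_WEIGHTS):
    """Merge per-field term counts into one weighted term frequency."""
    tf = Counter()
    for field, text in doc_fields.items():
        w = weights.get(field, 0.0)  # unknown fields contribute nothing
        for term in text.lower().split():  # naive whitespace tokenizer
            tf[term] += w
    return tf


def build_index(docs, weights=FIELD_WEIGHTS):
    """Return {doc_id: {term: tf * idf}} computed over the mixed index."""
    tfs = {doc_id: mixed_tf(fields, weights) for doc_id, fields in docs.items()}
    df = Counter()
    for tf in tfs.values():
        # Each document counts once per term, whichever field it came from.
        df.update(t for t, v in tf.items() if v > 0)
    n = len(docs)
    return {
        doc_id: {t: v * math.log(n / df[t]) for t, v in tf.items() if v > 0}
        for doc_id, tf in tfs.items()
    }
```

Under these assumptions, a term that occurs once in BODY and once in the ANCHORS of backlinks gets a mixed frequency of 3.0 before idf is applied, so raising or lowering a field's weight changes its influence on ranking without requiring a separate index per field.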