The '''Binary Independence Model''' (BIM)<ref name="cyu76" /><ref name="jones77"/> is a probabilistic information retrieval technique that makes simplifying assumptions so that the probability that a document is relevant to a query can be estimated tractably.
 
==Definitions==
The Binary Independence Assumption is that documents are binary vectors. That is, only the presence or absence of terms in documents are recorded. Terms are [[independence (probability theory)|independently]] distributed in the set of relevant documents and they are also independently distributed in the set of irrelevant documents.
The representation is an ordered set of Boolean variables. That is, the representation of a document or query is a vector with one Boolean element for each term under consideration. More specifically, a document is represented by a vector ''d = (x<sub>1</sub>, ..., x<sub>m</sub>)'' where ''x<sub>t</sub>=1'' if term ''t'' is present in the document ''d'' and ''x<sub>t</sub>=0'' if it's not. Many documents can have the same vector representation with this simplification. Queries are represented in a similar way.
"Independence" signifies that terms in the document are considered independently from each other and  no association between terms is modeled. This assumption is very limiting, but it has been shown that it gives good enough results for many situations. This independence is the "naive" assumption of a [[Naive Bayes classifier]], where properties that imply each other are nonetheless treated as independent for the sake of simplicity. This assumption allows the representation to be treated as an instance of a [[Vector space model]] by considering each term as a value of 0 or 1 along a dimension orthogonal to the dimensions used for the other terms.
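As a minimal sketch of the representation described above (the vocabulary and document below are invented for illustration), a document can be reduced to a binary term vector like this:

```python
# Toy example of the Binary Independence representation: a document is
# reduced to a 0/1 vector over a fixed vocabulary, recording only the
# presence or absence of each term (term frequency is discarded).
vocabulary = ["retrieval", "model", "probability", "football"]

def to_binary_vector(text, vocabulary):
    """Map a document to x = (x_1, ..., x_m) with x_t = 1 iff term t occurs."""
    terms = set(text.lower().split())
    return [1 if term in terms else 0 for term in vocabulary]

doc = "a probabilistic retrieval model estimates probability of relevance"
print(to_binary_vector(doc, vocabulary))  # [1, 1, 1, 0]
```

Note that any two documents containing the same subset of vocabulary terms map to the same vector, which is exactly the simplification the model accepts.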
 
The probability ''P(R|d,q)'' that a document is relevant derives from the probability of relevance of that document's term vector, ''P(R|x,q)''. Applying Bayes' rule gives:
 
<math>P(R|x,q) = \frac{P(x|R,q)\,P(R|q)}{P(x|q)}</math>
 
where ''P(x|R=1,q)'' and ''P(x|R=0,q)'' are the probabilities that a relevant or a nonrelevant document, respectively, has the representation ''x''.
These exact probabilities cannot be known beforehand, so estimates from statistics about the collection of documents must be used.
 
''P(R=1|q)'' and ''P(R=0|q)'' denote the prior probabilities of retrieving a relevant or a nonrelevant document, respectively, for a query ''q''. If, for instance, the percentage of relevant documents in the collection were known, it could be used to estimate these probabilities.
Since a document is either relevant or nonrelevant to a query we have that:
 
<math>P(R=1|x,q) + P(R=0|x,q) = 1</math>
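With invented numbers for the likelihoods and the prior (chosen purely for illustration), Bayes' rule above can be applied directly, and the two posteriors sum to 1 as the identity requires:

```python
# Hypothetical values, for illustration only: applying Bayes' rule
# P(R|x,q) = P(x|R,q) * P(R|q) / P(x|q), where P(x|q) expands by
# total probability over R = 1 and R = 0.
p_x_given_rel = 0.40     # assumed P(x | R=1, q)
p_x_given_nonrel = 0.05  # assumed P(x | R=0, q)
p_rel = 0.10             # assumed prior P(R=1 | q)

# P(x | q) by the law of total probability
p_x = p_x_given_rel * p_rel + p_x_given_nonrel * (1 - p_rel)

p_rel_given_x = p_x_given_rel * p_rel / p_x              # P(R=1 | x, q)
p_nonrel_given_x = p_x_given_nonrel * (1 - p_rel) / p_x  # P(R=0 | x, q)

print(round(p_rel_given_x, 4))  # posterior probability of relevance
assert abs(p_rel_given_x + p_nonrel_given_x - 1.0) < 1e-9
```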
 
=== Query Term Weighting ===
Given a binary query and the [[dot product]] as the similarity function
between a document and a query, the problem is to assign weights to the
terms in the query such that retrieval effectiveness is high. Let <math>p_i</math> and <math>q_i</math> be the probabilities that a relevant document and an irrelevant
document, respectively, contain the <math>i^{th}</math> term. Yu and Salton,<ref name="cyu76" /> who first introduced BIM, proposed that the weight of the <math>i^{th}</math>
term be an increasing function of <math>Y_i = \frac{p_i (1-q_i)}{(1-p_i) q_i}</math>. Thus, if <math>Y_i</math> is higher than <math>Y_j</math>, the weight
of term <math>i</math> will be higher than that of term <math>j</math>. Yu and Salton<ref name="cyu76" /> showed that such a weight assignment yields better retrieval effectiveness than weighting all query terms equally. Robertson and Spärck Jones<ref name="jones77"/> later showed that if the <math>i^{th}</math> term is assigned the weight <math>\log Y_i</math>, then optimal retrieval effectiveness is obtained under the Binary Independence Assumption.
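A short sketch of this weighting scheme, using invented values for <math>p_i</math> and <math>q_i</math> (the values are not from any real collection):

```python
import math

# Weight of a query term i as log Y_i, where
# Y_i = p_i * (1 - q_i) / ((1 - p_i) * q_i),
# with p_i / q_i the probabilities that a relevant / irrelevant
# document contains term i.
def rsj_weight(p, q):
    """Robertson/Sparck Jones weight log Y_i for one query term."""
    return math.log((p * (1 - q)) / ((1 - p) * q))

# Invented probabilities: term "A" occurs far more often in relevant
# documents than term "B" does, so it should receive a larger weight.
terms = {"A": (0.8, 0.1), "B": (0.3, 0.2)}
weights = {t: rsj_weight(p, q) for t, (p, q) in terms.items()}
assert weights["A"] > weights["B"]  # the discriminative term wins
```

With a binary query vector and the dot product as the similarity function, these weights simply sum over the query terms that a document contains.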
 
The Binary Independence Model was introduced by Yu and Salton.<ref name="cyu76" /> The name ''Binary Independence Model'' was coined by Robertson and Spärck Jones.<ref name="jones77"/>
 
==Recent work==
Wu et al.<ref>{{citation | url=http://doi.acm.org/10.1145/1076034.1076179 | title=A retrospective study of probabilistic context-based retrieval | year=2005 | author=H. C. Wu | coauthors=R. W. P. Luk, K. F. Wong, K. L. Kwok, W. J. Li}}</ref> proposed a probabilistic retrieval model that weights terms according to their contexts in documents; the term-weighting function they used is similar to those of the [[language model]] and the binary independence model.
 
Roelleke et al.<ref>{{citation | title=Probabilistic Logical Modelling of the Binary Independence Retrieval model | url=http://www.dcs.qmul.ac.uk/~thor/papers/2007/Roelleke&Wang:ICTIR:2007.pdf | author=Thomas Roelleke | coauthors=Jun Wang | year=2007}}</ref> investigated probabilistic relational implementations of BIM, using probabilistic relational algebra to integrate information retrieval into databases.
This work was motivated by the interest of Chaudhuri et al.<ref>{{citation | author=Surajit Chaudhuri | coauthors=Gautam Das, Vagelis Hristidis, Gerhard Weikum | title=Probabilistic information retrieval approach for ranking of database query results | url=http://doi.acm.org/10.1145/1166074.1166085 | year=2006}}</ref> in applying probabilistic information retrieval models to structured data, in order to rank the answers to a database query when many tuples are returned.
 
Zhao and Callan (2010)<ref>Zhao, L. and Callan, J., Term Necessity Prediction, Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM 2010). Toronto, Canada, 2010.</ref> showed a connection between <math>p_i</math>, the probability of a term appearing in relevant documents, and the [[vocabulary mismatch]] problem. This leads to effective ways of predicting <math>p_i</math> for test queries without relevance judgments, using training queries with known relevance judgments. Term weighting using this probability could lead to a 50–80% gain in retrieval performance over stronger baseline retrieval models, such as language modeling, given perfect estimates of the probability. Further developments along this line point to a Boolean conjunctive normal form expansion technique that could yield a 50–300% gain over unexpanded keyword retrieval.<ref>Zhao, L. and Callan, J., Automatic term mismatch diagnosis for selective query expansion, SIGIR 2012.</ref>
 
== See also ==
 
* [[Bag of words model]]
 
==Further reading==
* {{citation | url=http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html | title=Introduction to Information Retrieval | author=Christopher D. Manning | coauthors=Prabhakar Raghavan & Hinrich Schütze | publisher=Cambridge University Press | year=2008}}
* {{citation | url=http://www.ir.uwaterloo.ca/book/ | title=Information Retrieval: Implementing and Evaluating Search Engines | author=Stefan Büttcher | coauthors=Charles L. A. Clarke & Gordon V. Cormack | publisher=MIT Press | year=2010}}
 
==References==
{{Reflist|refs=
<ref name="cyu76">Clement T. Yu, Gerard Salton: Precision Weighting - An Effective Automatic Indexing Method. J. ACM 23(1): 76-88 (1976)</ref>
<ref name="jones77">S.E. Robertson and Sparck Jones, K.: Relevance weighting of search terms. Journal of the American Society for Information Science, 27: 129–146 (1976)</ref>
}}
 
[[Category:Information retrieval]]
[[Category:Probabilistic models]]
