Introduction
With the rapid development of the internet and the fast growth of information on the Web, it has become difficult for users to find the information they need in such a vast resource; it is like looking for a needle in a haystack. Search engine technology addresses much of this problem. However, traditional text retrieval systems are significantly limited when searching for mathematical formulas and mathematical symbols, so they cannot satisfy the demand for formula search in fields such as science, mathematics, engineering, and technology. Meanwhile, improvements in how computers store mathematical content, together with growing browser support for displaying mathematics, make research on a search engine for mathematical formulas feasible.
MathSearch focuses on two important and difficult problems in mathematical formula search: how to establish a general, powerful mathematical query language, and how to construct an index structure for mathematical content that is easy to store and query.
For the query language, MathSearch proposes the Math Query Language (MQL), which is an extension of XML and conforms to the MathML specification. MQL implements wildcard query expressions and combination query expressions by defining a series of metadata labels based on the MathML specification. These labels carry attributes that can be used to refine the query description and make query expressions more effective.
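The idea of a wildcard query over MathML can be illustrated with a minimal sketch. The text does not give the actual MQL label names, so the `any` element below is a hypothetical wildcard tag chosen for illustration, and the matching rules (exact tag, text, and child count) are an assumption rather than MQL's defined semantics. MathML namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical wildcard tag; the real MQL label names are not given in the text.
WILDCARD = "any"

def matches(query, target):
    """Return True if the query tree matches the target tree node for node."""
    if query.tag == WILDCARD:
        return True  # a wildcard matches any single subexpression
    if query.tag != target.tag:
        return False
    if (query.text or "").strip() != (target.text or "").strip():
        return False
    q_children, t_children = list(query), list(target)
    if len(q_children) != len(t_children):
        return False
    return all(matches(q, t) for q, t in zip(q_children, t_children))

def search(query, formula):
    """Return True if the query matches the formula or any of its subformulas."""
    return any(matches(query, node) for node in formula.iter())

# Content MathML for x^2 + 1
formula = ET.fromstring(
    "<apply><plus/>"
    "<apply><power/><ci>x</ci><cn>2</cn></apply>"
    "<cn>1</cn></apply>"
)
# Query: "anything squared"
squared = ET.fromstring("<apply><power/><any/><cn>2</cn></apply>")
# Query: "anything cubed"
cubed = ET.fromstring("<apply><power/><any/><cn>3</cn></apply>")
```

Here `search(squared, formula)` succeeds because the subformula x^2 matches with the wildcard bound to x, while `search(cubed, formula)` finds no match.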
For the index, MathSearch supports both presentation queries and semantic queries on mathematical formulas by building two indexes over the mathematical content: a Content-based Index and a Presentation-based Index. The Content-based Index mainly uses an abstract-tree inverted index structure, while the Presentation-based Index mainly uses a linear N-gram inverted index structure. In addition, the paper describes a method for evaluating the weight of each subformula while a formula is being indexed. This method can be used to optimize query results and improve the recall and relevance of the search engine.
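A linear N-gram inverted index of the kind mentioned above can be sketched as follows. This is an illustrative assumption, not the MathSearch implementation: formulas are taken to be already flattened into token sequences, N is fixed at 2, and results are ranked simply by the number of distinct query N-grams a formula shares, which stands in for the subformula weighting that the text does not detail. All function and variable names are hypothetical.

```python
from collections import defaultdict

def ngrams(tokens, n):
    """Return all contiguous n-token windows of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_index(formulas, n=2):
    """Map each n-gram to the set of formula ids that contain it."""
    index = defaultdict(set)
    for fid, tokens in formulas.items():
        for gram in ngrams(tokens, n):
            index[gram].add(fid)
    return index

def search(index, query_tokens, n=2):
    """Rank formula ids by how many distinct query n-grams they contain."""
    scores = defaultdict(int)
    for gram in set(ngrams(query_tokens, n)):
        for fid in index.get(gram, ()):
            scores[fid] += 1
    # Highest score first; ties broken by formula id for stable output
    return sorted(scores, key=lambda fid: (-scores[fid], fid))

# Toy corpus: presentation-level token sequences (assumed tokenization)
formulas = {
    "f1": ["x", "^", "2", "+", "1"],
    "f2": ["x", "^", "2", "-", "y"],
    "f3": ["a", "+", "b"],
}
index = build_index(formulas)
results = search(index, ["x", "^", "2", "+", "1"])
```

With this toy corpus, `results` ranks `f1` ahead of `f2`, since `f1` shares all four query bigrams and `f2` shares only two; `f3` shares none and is not returned.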
Requirement
Various mathematical search engines have emerged to meet users' need to search for mathematical formulas and symbols in science, mathematics, engineering, and technology. MathSearch is a formula-based mathematical search engine for the Web that can retrieve mathematical content.
Field of Research
MathSearch mainly studies the scope of mathematical search data sources and the conversion between formula representations, the mathematical formula query language, the mathematical formula index, the performance and quality of the mathematical search system, and the presentation of search results.
Main components
The MathSearch server, which provides the underlying support, consists of three parts: a network server, an index server, and a search server.
Search target format
The main formula formats that MathSearch searches are MathML, OpenMath, infix notation, and LaTeX.
Specific Search
MathSearch supports several specific query types: structure queries, semantic queries, wildcard queries, combination queries, and abstract queries.
Search content
MathSearch can search for web pages, documents, and other materials that contain mathematical formulas and mathematical symbols.
