An Innovative Data Mining Tool Will Let You Create Your Own Search Engine
The Minerazzi project is a platform that lets you build search engines around a particular topic without any programming knowledge. Minerazzi is the brainchild of Dr. Edel Garcia, and it aims to enable anyone to build small, on-topic search indexes. By building these indexes, anyone, regardless of technical background, can take part in data mining and learning through discovery.
At first, the Minerazzi project was intended as an indexing project. It was later diluted and altered numerous times. A few weeks after presenting the project concept at SES New York in 2012, Dr. Garcia took the project out of the MIC (Microsoft Innovation Center at Inter American University of Puerto Rico). It was later redesigned as a self-service search platform. A year later, Minerazzi was running in beta on a regular basis with the help of administrators and public librarians.
Once an index is built, users can begin mining phone numbers, email addresses, and various other keywords from the search result pages. Minerazzi also lets you identify keyword sets that share common features such as byte size or number of occurrences.
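To make the idea of mining a result page concrete, here is a minimal sketch in Python. It is not Minerazzi's implementation: the regular expressions, the function names `mine_contacts` and `keyword_features`, and the sample text are all assumptions chosen for illustration, and the phone pattern only covers simple ten-digit North American formats.

```python
import re
from collections import Counter

# Hypothetical patterns; real-world email and phone formats vary far more widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def mine_contacts(text):
    """Return the unique email addresses and phone numbers found in text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
    }


def keyword_features(text):
    """Map each lowercase word to its occurrence count and UTF-8 byte size."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        word: {"occurrences": n, "bytes": len(word.encode("utf-8"))}
        for word, n in counts.items()
    }


# Made-up snippet standing in for the text of a search result page.
SAMPLE = (
    "Contact info@example.com or call (787) 555-0100. "
    "Sales: sales@example.org, 787-555-0101."
)
```

Calling `mine_contacts(SAMPLE)` collects the two addresses and two phone numbers in the sample, while `keyword_features` gives the per-keyword counts and byte sizes that could then be used to group keywords with matching features.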
For businesses, Minerazzi enables a company to build a small, searchable index relevant to any particular set of data. Even a competitor index can be built rapidly so that employees can search and mine it for things such as market information, services, and products. This kind of specialized, topic-specific index is ideal for researchers who need to search, share, and store information.
Minerazzi is relatively easy to operate. Choose your vertical, such as sports or news, or pick something more meaningful to you, such as internal department resources or the local music scene, and Minerazzi helps you index and search documents on that topic. Minerazzi then crawls the web looking for relevant documents and adds them to your index when it finds matches. After this, the data can be searched by co-workers, clients, friends, or anyone else who has been given shared access.
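The "find matches, then add them to the index" step can be sketched as a small topical inverted index. This is only an illustration under stated assumptions: the pages are supplied as an in-memory dictionary rather than fetched by a crawler, and `build_topic_index`, its `min_hits` threshold, and the example URLs are hypothetical, not part of Minerazzi.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_topic_index(pages, topic_terms, min_hits=1):
    """Index only the pages that mention the topic at least min_hits times.

    pages maps a URL to its already-fetched text (the crawling itself is
    out of scope here). Returns a mapping from term to the set of URLs
    whose text contains that term.
    """
    topic = {t.lower() for t in topic_terms}
    index = defaultdict(set)
    for url, text in pages.items():
        tokens = tokenize(text)
        hits = sum(1 for tok in tokens if tok in topic)
        if hits < min_hits:
            continue  # off-topic page: leave it out of the index
        for tok in tokens:
            index[tok].add(url)
    return dict(index)


# Made-up pages standing in for crawled documents.
PAGES = {
    "https://example.com/a": "Local music scene tonight",
    "https://example.com/b": "Stock market update",
}
```

With `build_topic_index(PAGES, ["music"])`, only the first page is indexed, so a later lookup of `"music"` or `"local"` returns that URL and the off-topic page never appears.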
To help manage the crawled data, Minerazzi provides 11 distinct interactive search modes. A few are familiar, such as OR, which matches documents containing any of the specified terms, and AND, which matches documents containing all of the terms in your search. Other modes include EXCLUSIVE OR, NOR, NOT AND, and PROXIMITY. PROXIMITY lets you specify two terms and a number, and matches documents in which the two terms appear, in any order, separated by no more than the number you select.
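The modes named above can be illustrated with a small two-term matcher. The semantics below follow the usual Boolean definitions and are an assumption, not Minerazzi's documented behavior. In particular, PROXIMITY is read here as "at most `window` words between the two terms", and the names `matches`, `search`, and `DOCS` are hypothetical.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(mode, tokens, a, b, window=0):
    """Decide whether a tokenized document satisfies a two-term query."""
    a, b = a.lower(), b.lower()
    has_a, has_b = a in tokens, b in tokens
    if mode == "OR":
        return has_a or has_b
    if mode == "AND":
        return has_a and has_b
    if mode == "XOR":  # EXCLUSIVE OR: exactly one of the two terms
        return has_a != has_b
    if mode == "NOR":  # neither term
        return not (has_a or has_b)
    if mode == "NAND":  # NOT AND: anything except both terms together
        return not (has_a and has_b)
    if mode == "PROXIMITY":
        pos_a = [i for i, tok in enumerate(tokens) if tok == a]
        pos_b = [j for j, tok in enumerate(tokens) if tok == b]
        # Either order counts; at most `window` words may sit between them.
        return any(i != j and abs(i - j) - 1 <= window
                   for i in pos_a for j in pos_b)
    raise ValueError(f"unknown mode: {mode}")


def search(mode, docs, a, b, window=0):
    """Return the ids of the documents in docs that satisfy the query."""
    return [doc_id for doc_id, text in docs.items()
            if matches(mode, tokenize(text), a, b, window)]


# Made-up documents for illustration.
DOCS = {
    1: "local music scene in San Juan",
    2: "music festival and local food",
    3: "sports news",
    4: "local news",
}
```

For the terms "local" and "music", AND selects documents 1 and 2, OR adds document 4, and PROXIMITY with a window of 0 keeps only document 1, where the two words are adjacent.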
The science behind these modes is sound. Two metrics, the ratio of EXACT to AND search results and the ratio of AND to OR search results, offer a few major signals. These ratios also offer significant clues about the content and nature of a search engine index.
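The two ratios are simple to compute from result counts. The sketch below assumes that EXACT means an exact-phrase match of the same query terms, so that every EXACT hit is also an AND hit and every AND hit is also an OR hit. Under that assumption both ratios fall between 0 and 1. The function name `index_signals` and the counts used with it are made up for illustration.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def index_signals(exact_count, and_count, or_count):
    """Compute the EXACT/AND and AND/OR result ratios for one query."""
    return {
        "exact_to_and": ratio(exact_count, and_count),
        "and_to_or": ratio(and_count, or_count),
    }
```

For a hypothetical query returning 20 EXACT, 50 AND, and 200 OR results, `index_signals(20, 50, 200)` gives an EXACT/AND ratio of 0.4 and an AND/OR ratio of 0.25.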
Minerazzi places users at the center of the search experience. It lets them interact with the returned data rather than just staring at a catalogue of links and clicking. With Minerazzi, users can interact with the search result pages at query time, extracting the data that matters to them and doing something with it.