Applying Findability to Mine Query Logs for BI: Preliminaries

Thanks for sharing pointers and for giving hints on the question: “Can anyone suggest references about mining query logs for BI and CEM?” (http://www.forum.santini.se/2012/05/mining-query-logs-for-bi-and-cem/). Please feel free to add comments to the blog post if more suggestions come to mind.

The question of this week is: “How can I profitably use query logs to make better business decisions and predict future trends?”

Wikipedia, citing Rud, Olivia (2009), Business Intelligence Success Factors: Tools for Aligning Your Business in the Global Economy (Hoboken, N.J.: Wiley & Sons, ISBN 978-0-470-39240-9), states: “Business intelligence (BI) is defined as the ability for an organization to take all its capabilities and convert them into knowledge, ultimately, getting the right information to the right people, at the right time, via the right channel. This produces large amounts of information which can lead to the development of new opportunities for the organization. When these opportunities have been identified and a strategy has been effectively implemented, they can provide an organization with a competitive advantage in the market, and stability in the long run (within its industry)”.

The same article also says: “BI uses both structured and unstructured data, but the former is easy to search, and the latter contains a large quantity of the information needed for analysis and decision making. Because of the difficulty of properly searching, finding and assessing unstructured or semi-structured data, organizations may not draw upon these vast reservoirs of information, which could influence a particular decision, task or project. This can ultimately lead to poorly-informed decision making”.

In my view, among the overwhelming amount of unstructured text that businesses produce, such as “e-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations and Web pages” [DM Review Magazine, February 2003 Issue], query logs appear to be a particularly handy textual genre for BI: each query is a very short text, presumably containing mostly NOUNS and, to a lesser extent, ADJECTIVES and VERBS. (This is my initial hypothesis. If you know of research on the linguistic analysis of query logs, please do not hesitate to point it out to me.)

The noun is traditionally the part of speech that bears most of the semantic meaning and most explicitly expresses the user’s information need. The other parts of speech in a query can help disambiguate the nouns and thus give a better understanding of the meaning of the whole query.
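The noun-dominance hypothesis above could be checked by measuring the share of each part of speech across a sample of queries. The sketch below only illustrates the counting step: it assumes a hand-made, hypothetical TOY_LEXICON in place of a real part-of-speech tagger, and the three example queries are invented, not drawn from any real log.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a trained POS tagger.
TOY_LEXICON = {
    "cheap": "ADJ", "red": "ADJ",
    "flights": "NOUN", "london": "NOUN", "shoes": "NOUN",
    "phone": "NOUN", "contract": "NOUN",
    "buy": "VERB", "cancel": "VERB",
}

def pos_distribution(queries, lexicon):
    """Return the share of each POS tag over all tokens found in the lexicon."""
    counts = Counter()
    for query in queries:
        for token in query.lower().split():
            tag = lexicon.get(token)
            if tag is not None:  # unknown tokens are skipped, not guessed
                counts[tag] += 1
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}

queries = ["cheap flights London", "buy red shoes", "cancel phone contract"]
print(pos_distribution(queries, TOY_LEXICON))
```

On these invented queries nouns account for 5 of the 9 tagged tokens; with a real tagger and a real log, the same counting would show whether the hypothesis holds.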

The main advantage of using a genre such as query logs for BI is the following:

query logs are a short-text genre, mostly expressed in “keywordese”, i.e. the sublanguage (or jargon) we use to communicate with search engines: a language without articles, prepositions and other stop words, with little syntax and no hedges. Query logs are therefore already skimmed texts: they need no cleaning of redundancies or rhetorical ornaments, and only limited pre-processing.
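Because queries are already in keywordese, the pre-processing can plausibly stay very light. The following is a minimal sketch, assuming a small hypothetical STOP_WORDS list and a simple lowercase-and-split tokenizer; a production pipeline would use a curated stop-word list for each language.

```python
import re

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "the", "of", "for", "in", "on", "to", "and", "with"}

def normalize_query(raw):
    """Lowercase, split into word tokens (Unicode-aware \\w), drop stop words."""
    tokens = re.findall(r"\w+", raw.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(normalize_query("Cheap flights to London"))  # ['cheap', 'flights', 'london']
print(normalize_query("the price of iPhone 5"))    # ['price', 'iphone', '5']
```

The few stop words that do appear in queries are dropped, and nothing else needs cleaning; longer genres such as e-mails or reports would typically also need sentence splitting, boilerplate removal and similar steps.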

What about aggregating query log terms and creating a faceted search tool for BI? This would mean extracting semantic relations among query terms, quantifying semantically related queries, and applying other simple text analytics techniques to create contextualized, structured and customized information through facet-ization. Facet-ization would give the BI practitioner the flexibility to decide how granular the information should be in order to become actionable…
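One possible shape for the aggregation step is to map query terms to (facet, value) pairs and count queries per value. This is a sketch under stated assumptions: the FACETS mapping is hypothetical and hand-written here, whereas in the idea above it would come from the extracted semantic relations, and the query log is invented.

```python
from collections import Counter, defaultdict

# Hypothetical term-to-facet mapping; a real one would be derived from
# semantic relations (taxonomy, thesaurus) rather than written by hand.
FACETS = {
    "iphone": ("product", "smartphone"),
    "galaxy": ("product", "smartphone"),
    "ipad": ("product", "tablet"),
    "price": ("intent", "purchase"),
    "buy": ("intent", "purchase"),
    "broken": ("intent", "support"),
    "repair": ("intent", "support"),
}

def facet_counts(queries, facets):
    """Count queries per (facet, value); a query counts once per value it hits."""
    counts = defaultdict(Counter)
    for query in queries:
        hits = {facets[t] for t in query.lower().split() if t in facets}
        for facet, value in hits:
            counts[facet][value] += 1
    return counts

log = ["iPhone price", "buy galaxy", "broken iPhone repair", "iPad price"]
counts = facet_counts(log, FACETS)
print(dict(counts["product"]))  # {'smartphone': 3, 'tablet': 1}
print(dict(counts["intent"]))   # {'purchase': 3, 'support': 1}
```

The granularity choice shows up in how the counts are read: summing a facet gives a coarse view (how many queries concern products at all), while the per-value counts give a finer one.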

You might object: we could use simple search to find “actionable information”. But simple search is unrewarding, because it is based on single terms.

“a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies” [Source: Inmon, B. & A. Nesavich, "Unstructured Textual Data in the Organization" from "Managing Unstructured data in the organization", Prentice Hall 2008, pp. 1–13]
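The difference Inmon and Nesavich describe can be illustrated by expanding a term with its narrower terms before matching. In this minimal sketch the NARROWER_TERMS map is hypothetical (in practice it would come from a thesaurus or taxonomy), the documents are invented, and matching is naive substring matching, so a word like "parson" would also match "arson".

```python
# Hypothetical narrower-term map, taken from the crimes listed in the quote.
NARROWER_TERMS = {
    "felony": {"crime", "arson", "murder", "embezzlement", "vehicular homicide"},
}

def simple_search(term, docs):
    """Return indices of documents containing the literal term."""
    return [i for i, d in enumerate(docs) if term in d.lower()]

def expanded_search(term, docs, narrower):
    """Return indices of documents mentioning the term or any narrower term."""
    terms = {term} | narrower.get(term, set())
    return [i for i, d in enumerate(docs) if any(t in d.lower() for t in terms)]

docs = [
    "The suspect was charged with a felony.",
    "Police are investigating an arson in the warehouse.",
    "The accountant admitted to embezzlement.",
    "Quarterly sales rose by 4 percent.",
]
print(simple_search("felony", docs))                    # [0]
print(expanded_search("felony", docs, NARROWER_TERMS))  # [0, 1, 2]
```

The simple search finds only the document that literally says "felony", while the expanded search also finds the arson and embezzlement documents.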

You might also object: we could use Splunk or Sematext or… but how flexible and granular are these tools? It would be nice to make a comparison [to be continued].
