As the volume of unstructured information grows, not only on the public internet but also inside the enterprise, commercial organisations increasingly need to manage this information and extract knowledge from it. There are many reasons for this interest, including improving customer satisfaction by identifying customer concerns in customer conversations, spotting equipment failures in field service technician notes, and even detecting fraud in bogus emails. In addition, there is a need to link unstructured information to master data. Examples include relating documents such as invoices, or emails, to structured customer data, and relating emails about product quality issues to product master data. Call logs in call centres are yet another example.
Business thirst for new information, and for the additional knowledge that brings greater business insight, is still growing apace. The business need is therefore to integrate and analyse structured and unstructured information together, and so expand beyond traditional BI systems focussed on structured data into the world of unstructured analytics. This market is massive, and it is clear that traditional BI vendors have still not fully switched on to its value. As a result we see a lot of vendors that are fairly unknown (at least to the traditional BI professional) pushing into the BI market and claiming market share. Vendors such as Attensity, Clarabridge, ClearForest, Endeca, Fast and InXight are all doing well in this space. All of them come from the world of enterprise search and text analytics. It is no surprise that BI vendors are starting to form partnerships with these relatively new kids on the BI block, and you could easily see some acquisitions in the next 12-18 months as large BI vendors expand their platforms to gear up for the sea of unstructured content they may be asked to analyse.
Text mining and search analytics are just two examples of technologies that make it easier to find information and to generate dynamic taxonomies from search results, which can then be used to zoom in rapidly on what you are looking for. Text analytics can also generate XML documents. For example, if a customer writes an email along the lines of “My name is Mike Ferguson and yesterday I bought a new Panasonic HD TV from ABC Electronics in Manchester…”, I can extract from this a customer name, a manufacturer name, a product name and a store name, which could be brought together in an XML document about a sales transaction. Customer detail can therefore be taken from this, and deeper insight extracted, from content such as emails, blogs, wikis and web chat. Once this is extracted I can start to analyse it with traditional BI tools and visualise new information and analyses. This extraction of terms from unstructured content can also facilitate a richer, more productive multi-faceted search experience, which to any BI professional looks like OLAP for search. Here the user can drill down and slice and dice search results any way they like to navigate quickly to the information they want.
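To make the email example concrete, here is a minimal Python sketch of that kind of extraction. It is only an illustration and not how any of the vendors above actually work: the lookup lists, the regular expressions, the function names extract_sale and to_xml, and the "sale" element name are all assumptions of mine, standing in for the much richer dictionaries and linguistic rules a real text analytics product would use.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup lists; a real product would maintain far larger
# dictionaries or use linguistic models instead.
MANUFACTURERS = ["Panasonic"]
STORES = ["ABC Electronics"]


def extract_sale(text):
    """Pull customer, manufacturer, product and store out of free text."""
    fields = {}

    # Assumption: the customer introduces themselves with "My name is ...".
    match = re.search(r"My name is ([A-Z][a-z]+(?: [A-Z][a-z]+)+)", text)
    if match:
        fields["customer"] = match.group(1)

    # Assumption: the product name sits between the manufacturer and " from ".
    for name in MANUFACTURERS:
        match = re.search(re.escape(name) + r" (.+?) from ", text)
        if match:
            fields["manufacturer"] = name
            fields["product"] = match.group(1)
            break

    # Stores are found by simple dictionary lookup.
    for store in STORES:
        if store in text:
            fields["store"] = store
            break

    return fields


def to_xml(fields):
    """Bring the extracted fields together in one small XML document."""
    root = ET.Element("sale")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


email = ("My name is Mike Ferguson and yesterday I bought a new "
         "Panasonic HD TV from ABC Electronics in Manchester")
print(to_xml(extract_sale(email)))
```

Once the fields are in a structured form like this, they can be loaded alongside conventional sales data and analysed with ordinary BI tools.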
With social network tagging exploding on the scene, we can also analyse popular tags to help people identify the dominant ways in which others are categorising information that may be of interest to them.
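A rough sketch of what analysing popular tags might look like is below. The function name dominant_tags, the document identifiers and the tags themselves are all made up for illustration; the only real assumption is that each item carries a list of user-supplied tags.

```python
from collections import Counter


def dominant_tags(tagged_items, top_n=3):
    """Return the most widely used tags across a set of tagged items."""
    counts = Counter()
    for tags in tagged_items.values():
        # Normalise case and whitespace, and count a tag once per item.
        counts.update({tag.strip().lower() for tag in tags})
    return counts.most_common(top_n)


# Hypothetical tagged content, e.g. bookmarks or wiki pages.
items = {
    "doc1": ["BI", "text-mining"],
    "doc2": ["bi", "Text-Mining", "search"],
    "doc3": ["BI "],
}
print(dominant_tags(items, top_n=2))
```

The tags that rise to the top of such a count give a first view of how people are actually categorising the content.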
This is a new area for BI and I expect it to grow very rapidly over the next year. I shall be writing an article on this in the next few months on the B-EYE-Network.
If you are already doing work in this area, please share your experiences. It would be great to hear some case studies.