Posts Tagged ‘Mike Ferguson’

Cloud Based BI – Understanding The Options Is the Biggest Barrier

Tuesday, June 22nd, 2010

Last week I was in Munich to present at the annual TDWI (The Data Warehouse Institute) conference on “Business Intelligence and Data Management in a Cloud Computing Environment”.  It was a very well attended conference with some great speakers and sessions.  My session focused on the following:

  • What is Cloud Computing and why use it as a deployment option?
  • Why Cloud BI? – What are the requirements for a public cloud or externally hosted BI system?
  • Understanding what is on offer – The Cloud BI Marketplace
  • Getting data into a cloud based BI system
  • Managing access to cloud based BI systems and analytic applications
  • Integrating cloud based BI systems with on-premise systems
  • Pros and cons of deploying on the cloud?
  • Getting started with Cloud based BI

Bear in mind that both public cloud and private cloud based BI were under discussion, even though the hype seems to be all around public cloud or externally hosted BI systems. Looking at these points, it is the third bullet down that for me is the clear inhibitor to cloud based BI adoption: in other words, the lack of understanding of exactly what is on offer. And there is a lot on offer. On the public cloud we have everything from plain Infrastructure as a Service (IaaS) all the way through to Software as a Service (SaaS) based packaged analytic applications. On the private cloud, several BI platforms are already running on virtualisation software such as VMware and/or Microsoft Hyper-V. However, there seems to be very little in the way of best practice advice on the do’s and don’ts of deploying BI systems in a private cloud based virtualised environment.

In total I came up with six options, the last of which is simply where many of us are today, i.e. BI systems not deployed on a cloud (whether public or private). The options are as follows:

1.     Public cloud based IaaS for a BI system

2.     Public cloud or externally hosted BI/DW PaaS for building your own cloud-based BI system

  • Multi-vendor or single-vendor BI PaaS offerings

3.     Public cloud or externally hosted SaaS BI packaged analytical applications

4.     Public cloud or externally hosted SaaS BI for operational reporting on cloud based operational data

5.     Private cloud based BI system running internally

6.     Dedicated hardware based BI system (this is what most companies have today)

Option 1 is simply subscribing to an IaaS vendor like Savvis, Amazon, Rackspace or GoGrid, where you pay for hardware and systems software as you use them, and then buying and deploying your own ETL, DBMS and BI software (assuming the IaaS vendor places no restrictions on what it will support). I am not sure that this is attractive enough on its own without a BI/DW Platform as a Service (PaaS) as well.

Option 2 is the BI/DW Platform as a Service (PaaS) option, on a public cloud or externally hosted. Here you face another choice, however: should you choose a multi-vendor DW/BI PaaS or a single-vendor offering? An example of a multi-vendor option is the RightScale/Talend/Vertica/Jaspersoft PaaS offering on Amazon EC2. Examples of single-vendor PaaS offerings (of which there are several) include GoodData and SAP BusinessObjects On-Demand. Others include Birst, Indicee and PivotLink. A key question here is going to be “Is data integration included?” Clearly the multi-vendor offering mentioned above includes an ETL tool, namely Talend. With BI/DW PaaS vendors, data integration is very much file based, i.e. you upload files of data and then there is some processing of that data to load it into the PaaS DW/BI database. Several single-vendor PaaS offerings give you only fairly lightweight data integration once data is uploaded. Certainly not the full-blown ETL with built-in data quality that you might be used to in a data centre. In fact, if you are looking for full-blown DQ you are going to be disappointed in most cases. The ‘get out’ clause is that you can add your own scripts, but what about metadata lineage and auditability once the script writer has left for a better job? A vendor like SAP (mentioned earlier) does have ETL (SAP BusinessObjects Data Integrator) available, but only if you subscribe to the Advanced Edition of SAP BusinessObjects On-Demand (there are three editions on offer). I was even more surprised to see that SAP’s BI/DW PaaS offering uses Microsoft SQL Server as the database and not BW. I would expect that to change to Sybase IQ fairly soon.
GoodData, on the other hand, have refreshingly recognised that you may want to go beyond the data integration you get out of the box with a subscription, and have gone the extra mile to provide pre-built integration with cloud based data integration tools such as Informatica Cloud, SnapLogic and Boomi. You can therefore use these tools to integrate your data before passing the data sets to GoodData. The alternative to all of this is to do the lion’s share of the data integration in-house before uploading data files.
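To make the in-house route a little more concrete, here is a minimal Python sketch of the kind of pre-upload cleansing script a company might run before sending a file to a BI/DW PaaS. The column names (customer_id, name, country) and the country-code mapping are hypothetical, and no vendor's upload format or API is assumed; it simply trims values, standardises country codes and drops duplicate or blank customer keys.

```python
import csv
import io

# Hypothetical mapping of free-text country values to one standard code.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB", "germany": "DE", "deutschland": "DE"}


def cleanse_for_upload(raw_csv: str) -> str:
    """Trim, standardise and de-duplicate rows before they are uploaded.

    Assumes a header row with customer_id, name and country columns
    (illustrative only; real PaaS upload formats vary by vendor).
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    seen_ids = set()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["customer_id", "name", "country"], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        customer_id = row["customer_id"].strip()
        if not customer_id or customer_id in seen_ids:
            continue  # skip blank keys and duplicate customers
        seen_ids.add(customer_id)
        country = row["country"].strip()
        writer.writerow({
            "customer_id": customer_id,
            "name": " ".join(row["name"].split()),  # collapse stray whitespace
            "country": COUNTRY_CODES.get(country.lower(), country.upper()),
        })
    return out.getvalue()
```

A script like this is exactly the kind of ‘get out’ clause that raises the lineage question: unless its rules are documented somewhere, they leave with the person who wrote it.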

Option 3 is a fast-growing market with many relatively new vendors (e.g. Cloud9 Analytics, Rosslyn Analytics, Lixto) as well as traditional mainstream vendors (e.g. SAS, IBM Cognos). The attraction here is a pre-built solution ready to go. These will clearly appeal to small and medium-sized businesses (SMBs) and even to lines of business in some large organisations. While we see horizontal applications for Salesforce.com data, spend analysis and pricing (to name a few), I predict that vertical analytic apps will appear on the cloud.

Option 4 is simply using a cloud based reporting system on operational data typically from a cloud based transaction processing system such as Salesforce.com.  In fact it would seem that Salesforce.com is dominating this space. An example here is SAP BusinessObjects CrystalReports.com for Salesforce.

Option 5 is private cloud based BI systems. The largest private cloud based BI system I know of is IBM’s internal Blue Insight, which is based on IBM System z and IBM Cognos 8 BI. An estimated 200,000 IBMers are using it. IBM have since launched the Smart Analytics Cloud, a private cloud offering for large enterprises based on the same technologies. However, it is still early days for BI deployments on internal private clouds. At present there appears to be more support coming from developer forums than from vendors. From what I can see, companies are taking a ‘toe in the water’ approach to deploying on virtualised environments. No doubt confidence will grow over time. But does everything need to move to a private cloud? Many companies with very large EDW initiatives may be reluctant to move to private clouds until those clouds prove their scalability and lower TCO. The issue here is whether ETL, DBMS and BI platform should all be on the same virtual servers. Should each have its own virtual server configuration? What is that configuration? Can I adjust it? And so on. I don’t think there will be a mad rush to put a 100TB DW on virtual servers. I do like the fact that vendors like MicroStrategy have given this some serious consideration and have released a private cloud enterprise edition of MicroStrategy 9. MicroStrategy components are packaged as Virtual Appliances and tuned for expected load. These Virtual Appliances contain fully configured software components, and the number of running virtual appliances can be adjusted to meet specific performance goals. This is a damn sight better than just saying to a customer “it’s up to you, just deploy it and you figure out the virtual server configuration”. What MicroStrategy have done is to allow you to adjust the underlying physical resources assigned in order to satisfy performance demands, and to provide administrative facilities to control the virtualised MicroStrategy environment.
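As a rough illustration of what adjusting the number of running virtual appliances to meet a performance goal can mean, the sketch below sizes a pool of identical appliances for a target load. It is not MicroStrategy's sizing method; the per-appliance capacity, the use of peak concurrent users as the load measure and the 20% headroom default are all hypothetical inputs you would replace with figures from your own load testing.

```python
import math


def appliances_needed(peak_concurrent_users: int, users_per_appliance: int, headroom: float = 0.2) -> int:
    """Number of identical virtual appliances to run for a target load.

    users_per_appliance and headroom are hypothetical tuning inputs,
    e.g. taken from load tests of one appliance configuration.
    """
    if users_per_appliance <= 0:
        raise ValueError("users_per_appliance must be positive")
    # Add headroom on top of the expected peak, then round up to whole appliances.
    target_load = peak_concurrent_users * (1 + headroom)
    return max(1, math.ceil(target_load / users_per_appliance))
```

For example, a peak of 1,000 concurrent users with a hypothetical 250 users per appliance and 20% headroom gives 5 appliances; dropping the headroom to zero gives 4. The point is that the knob is the appliance count, not hand-tuning each virtual server.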

It is early days in cloud based BI. I recommend looking at your requirements first and then matching the options available to your needs.

I would be interested to hear if any of you have experience in this area: do’s and don’ts, what works, what doesn’t. Please share them by leaving a comment.


Chasm Not Crossed as A Sensor Data Tsunami Comes Over The Horizon

Monday, June 21st, 2010

Just over a week ago I spent a day at SensorExpo in Chicago to present on Complex Event Processing (CEP), discussing how CEP engines, predictive analytics and business rules can be used to analyse sensor-emitted event data in motion to facilitate business optimisation. This was a very busy conference. I estimated at least 2,000-3,000 people on the exhibition floor, with maybe 400 at the conference. I found around 100 vendors exhibiting all kinds of sensor devices, products and services. To my surprise, however, I had only heard of two of the vendors: IBM and Texas Instruments. The floor was heaving with people looking to instrument their business operations to measure everything from movement, temperature, energy consumption, stress, heat, fluid volumes and pipeline flows to RFIDs. There were analog devices and digital devices. When talking to the vendors, the big common denominator was that they are all trying to collect the data from sensor networks and RFIDs in order to analyse it. Yet other than IBM there was not a single BI vendor in sight. Not even a single complex event processing (CEP) vendor. I was shocked, because this market is clearly booming. What was even more surprising was that I could not find an IT professional anywhere. 99.9% of all delegates and speakers were engineers.

In the case study sessions I found some fantastic applications of sensor networks and RFIDs. In healthcare, for example, there were sensors all over hospitals, with equipment and patients all tagged with RFIDs. The return on investment in this case was fraud prevention on equipment (mainly theft) and process improvement for patients. Another session I attended was on monitoring stress in all the bridges in the US, over 700,000 of them. Some of the statistics quoted by the speakers were staggering: “Well, we are emitting 3 events per minute from every sensor on a 7×24 hour basis. After 6 months operating like this we have over 20 PETABYTES of data”. You read that right: 20 PETABYTES. A lot of the technical focus at the conference was on energy harvesting to prolong sensor battery life, but the business message was as clear as a bell. Process optimisation, preventative maintenance and cost reduction come from instrumenting business operations. Manufacturing production lines, supply chains, product distribution, asset management. You name it, they’re measuring it.
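To get a feel for the arithmetic behind that quote, the sketch below counts the events a single sensor emits at 3 events per minute around the clock for roughly six months (182 days is my assumption for "six months"). The number of sensors per bridge was not given, so the functions take a sensor count as a hypothetical input rather than guessing one; implied_bytes_per_event backs out the average event size for whatever sensor count you assume.

```python
MINUTES_PER_DAY = 24 * 60


def events_per_sensor(events_per_minute: int = 3, days: int = 182) -> int:
    """Events emitted by one sensor running 7x24 for the given number of days."""
    return events_per_minute * MINUTES_PER_DAY * days


def total_events(sensor_count: int, events_per_minute: int = 3, days: int = 182) -> int:
    """Events across a (hypothetical) number of sensors."""
    return sensor_count * events_per_sensor(events_per_minute, days)


def implied_bytes_per_event(total_bytes: float, sensor_count: int, events_per_minute: int = 3, days: int = 182) -> float:
    """Average stored bytes per event implied by a total data volume."""
    return total_bytes / total_events(sensor_count, events_per_minute, days)
```

At these rates every single sensor contributes 786,240 events in six months, so the total grows linearly with however many sensors are deployed per bridge.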

So I have to ask: where are all the BI vendors? Where are all the analytical DBMS vendors? Where are the CEP products, the real-time dashboards and the predictive analytical models for automated analysis? This is an operational BI gold mine. Yet there are no mainstream vendors in sight bar IBM (at least someone there is switched on to what is happening). The volume of data coming over the horizon from the adoption of sensor networks and RFIDs is nothing short of massive. What is also clear is that this is already going on in enterprises, and IT is, in the main, blissfully unaware of it. Clearly IT BI professionals have got to get in touch with their engineering colleagues, and engineers have got to be made aware of mainstream data integration, analytical database and BI platform technologies, as well as CEP software of course. In my 29 years in the industry, I don’t think I have ever seen a chasm between IT and the business that has not even been explored, never mind crossed. Yet the value of CEP and mainstream DW/BI to this market is enormous. It is symptomatic of a young market, heaving with engineers, that has yet to be tied into mainstream IT to exploit far more robust software than is being used on this data at present. What an opportunity. What a huge opportunity. It is most certainly going to redefine large databases when we have to set them up for analysis of the historical event data emitted by these devices. CEP has to go there. CEP vendors have to get out of just being in the financial markets and wake up to the ton of data in motion being emitted by the growing number of devices. An article I read recently said that sensors empower an Internet of Things. Well, those things are coming over the horizon emitting a tsunami of data. It is time CEP and DW/BI vendors woke up and smelt the coffee and became aware of this rapidly growing market. CIOs had better take heed too, because they are going to have to integrate it into mainstream IT.
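For readers coming from the engineering side, here is a minimal sketch of the kind of rule a CEP engine evaluates on data in motion: a per-sensor sliding time window whose average is checked against a threshold as each event arrives. It is plain Python rather than any CEP product's API, the threshold and window length are hypothetical, and it assumes events for each sensor arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Reading:
    sensor_id: str
    timestamp: float  # seconds since some epoch
    value: float


class SlidingWindowRule:
    """Alert when a sensor's average over the last window_seconds exceeds threshold.

    Hypothetical rule for illustration; assumes in-order events per sensor.
    """

    def __init__(self, window_seconds: float, threshold: float):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.windows: Dict[str, Deque[Reading]] = {}

    def on_event(self, reading: Reading) -> Optional[str]:
        window = self.windows.setdefault(reading.sensor_id, deque())
        window.append(reading)
        # Evict readings that have fallen out of the time window.
        while reading.timestamp - window[0].timestamp > self.window_seconds:
            window.popleft()
        avg = sum(r.value for r in window) / len(window)
        if avg > self.threshold:
            return f"ALERT {reading.sensor_id}: avg {avg:.1f} over {len(window)} readings"
        return None
```

The key design point is that nothing is stored beyond the window: the rule reacts as each event arrives, which is what distinguishes analysing data in motion from loading it into a warehouse first.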

Microsoft Opens Up Collaborative and Self-Service BI

Monday, June 21st, 2010

Just over a week ago I was invited to attend an analyst briefing at the Microsoft BI conference in New Orleans, which was running alongside the Microsoft TechEd conference. The conference itself was very well attended, with several thousand delegates. Several things were on show at this event, including SharePoint 2010, SQL Server 2008 R2, Office 2010, PowerPivot and PerformancePoint Services 2010. Also on show was SQL Server Data Warehousing Edition (also known as the Madison project), the massively parallel edition of SQL Server that will ship later this year.

The one thing that stood out for me was the seismic shift towards collaborative BI. As my friend Colin White so aptly put it in the analyst briefing, “Microsoft have brought BI to collaboration rather than collaboration to BI”. This is an important point because what it says is that there is little point adding collaborative features to a BI platform if these are not the services associated with a mainstream collaborative platform. There is far more value in integrating a BI platform with the company’s collaboration software to tap into things like collaborative workspaces, presence awareness, unified communications, shared calendars and so on. In Microsoft’s case this is, of course, SharePoint, which has spread virally in most organisations.

It is no surprise therefore that Microsoft’s BI initiative is built around 3 main components and not just SQL Server.  These are:

  • Office,
  • SharePoint
  • Microsoft SQL Server 2008 R2

Note that SQL Server 2008 R2 includes StreamInsight, Microsoft’s complex event processing (CEP) engine, and Microsoft Master Data Services.

While there, we were taken through an excellent demo showing the power of collaboration and what it can do when integrated with BI. It even included the Microsoft Round Table device which, although it has been available for some four years, was the first one I have actually encountered.

What the demo showed me was the speed with which BI and BI ‘components’ can be spread among a community of users. My conclusion is that integration of SQL Server 2008 R2 with SharePoint 2010 takes this to another level, in that the rate at which business intelligence can be shared is almost ‘twitter speed’. Those of you using Twitter will know that as soon as something of interest breaks, re-tweets can spread it across masses of people in a matter of minutes. This is the feeling I got during the demo. It fuels mass sharing, mass reuse and mass development of BI applications and artifacts, in particular reports and dashboards. It certainly fits with Microsoft’s vision of BI for everyone.

Several new features open up the flood gates for collaborative BI, letting users share intelligence with others without the need for IT. For example:

BI reports can be managed by SharePoint in document libraries. You can also preview reports before opening them.

Microsoft is also fueling development by business users on the back of what power users have done, thereby bypassing IT. This is because Microsoft Report Builder 3.0 can now access PowerPivot workbooks uploaded to SharePoint sites. You can also export to Excel from PowerPivot. Power users using PowerPivot (originally referred to as Gemini) can take data from different data sources (including newly supported Atom feeds), and merge and join that data. Relationships between tables can be managed inside PowerPivot. PowerPivot power users can then create workbooks that process this data and upload them to SharePoint sites. Report Builder 3.0 (or any BI client) can then treat the PowerPivot workbook as a data source. Not only that, but Report Builder can create report parts which are sharable in a report part gallery, so that other users can reuse them simply by dragging and dropping the report parts onto a new report for rapid development, without having to know the detail underneath.

Hopefully by now you have got the picture: power users building their own workbooks in PowerPivot, publishing them to SharePoint, other users using them as data sources in reports, report parts created, and a gallery of parts to be shared across a community of users. Powerful stuff, and we are not done yet.

In SharePoint 2010 there is a new site template called Business Intelligence Center. You can now create a new site in SharePoint using the Business Intelligence Center template. This template includes chart web parts and Excel Services workbook access. It also includes a PerformancePoint library so that you can start building your dashboard very rapidly, including access to reports and report parts. With this mechanism, Microsoft is opening up dashboard development to the masses and also allowing ‘social’ performance management, whereby dashboards and/or dashboard components can be rated. All this, integrated with SharePoint and Office, is in my opinion going to take self-service BI development to such a level that it could easily have a ‘popcorn effect’, with masses of BI being produced rapidly and IT nowhere in sight. There is no doubt that it opens up the flood gates for business innovation and sharing, with personalised dashboard development using PerformancePoint Services 2010 integrated with SharePoint 2010.

A Question of Governance?

My only concern with this is the issue of governance. What Microsoft have done is to put mass development in the hands of the business. If you think you have seen anything yet in self-service BI, just wait until SharePoint 2010, Office 2010 and SQL Server 2008 R2 move into production in your shop. You ain’t seen nothing yet.

However, I see very little with respect to data governance. What about business glossaries? What about metadata lineage? In a world of increasing regulation and legislation to prevent corporate catastrophes, can anything be audited? Can it be tracked back to where the data came from? How has the data been transformed by the power users? What does the data mean? I have as yet seen little from Microsoft in the form of metadata management and data governance, despite the fact that Master Data Services is also delivered as part of this SQL Server release. While there is no doubt that this is coming (confirmed by the Microsoft guys I spoke with at their exhibition booth), my only fear is that it will be too late. Will the horse have already bolted, with self-service BI unstoppable and off down a track without the lineage that lets users know the data is trusted?

Equally, scorecard and dashboard development is bottom-up. Everyone (with authority) can create their own scorecards and dashboards rapidly, but there appears to be no framework whereby these can be slotted into multi-level strategy management, unlike, say, SAP with SAP Strategy Management. So what is the answer? Are all bets off, and do we just let the business figure out the best way to manage on the back of socially rated scorecards and dashboards? What happened to business strategy? Many companies set a strategy at executive level and want enterprise-wide business strategy execution. This latter approach is top-down. What Microsoft is fueling is bottom-up. My opinion is that we need both, not one or the other.

Freedom Versus Governance – A Delicate Balancing Act

It is pretty clear, then, that setting aside the new SQL Server Data Warehousing Edition, this is very much a collaborative BI release by Microsoft. It is a major leap forward in what business users can do for themselves. We have two forces at work here: freedom versus governance. We have to get the balance right. Too much freedom and we could have chaos, with no ability to audit what has been done or whether the BI is trusted. Too much governance and we put innovation in a straitjacket or kill it altogether. All I would say is that IT had better get a data governance program underway soon to control data all the way out to data marts and cubes. If that is done, then there is no doubt that the business can be empowered to innovate, which is what should happen. Without a data governance program, however, I think it is going to be really hard to align with what the business is doing, given the sheer speed of development that is now possible with this release. Let’s hope governance, innovation and collaboration are a winning combination.

MDM and Cloud Computing

Monday, September 28th, 2009

Having read David Linthicum’s blog on MDM and Cloud computing about the impact on data of applications moving off premise, I have to say that I couldn’t agree more with him. What David is pointing out is that the fracturing of data caused by the adoption of cloud computing raises the importance of MDM in keeping disparate data synchronised.

This brings back memories of Business Process Outsourcing adoption several years back and what it did to companies that had no business process integration in place before they outsourced some process activities. In many cases that strategy fractured processes even more and sent some of the data outside the enterprise, making it more difficult to get at. As applications go off premise there is a real danger that master data could get out of reach, which means MDM needs to be implemented to get control over that data. Salesforce.com data is already coming inside the enterprise via ETL tools into DWs, and several ETL vendors support this. I just don’t think many companies have been bringing it back in to populate MDM. Siperian has some case studies of its MDM customers working with cloud applications, in particular Salesforce.com. What this does say is that pursuing a cloud computing strategy on external cloud based virtualised servers without a data governance strategy could very well wreak havoc on any enterprise.

With virtualisation high on the agenda of many CIOs, I would suggest that they also keep an eye on risk management and compliance; otherwise they could well make it harder to achieve trusted data. Without MDM, a cloud computing deployment strategy certainly pushes an Enterprise Data Quality Firewall and data integration services high up the priority list!
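A first, very basic step towards the synchronisation problem David describes is simply detecting where a cloud application and an on-premise master disagree. The sketch below assumes both sides already share a customer key and that the record layouts (plain dictionaries with hypothetical name and city fields) are comparable; in practice, matching records that do not share a key is the harder part of MDM and is not shown here.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def out_of_sync(cloud_records: Dict[str, Record], master_records: Dict[str, Record],
                fields: Iterable[str]) -> List[Tuple[str, str]]:
    """Compare records keyed by a shared (assumed) customer ID and list the differences."""
    fields = list(fields)
    issues: List[Tuple[str, str]] = []
    for key in sorted(cloud_records.keys() | master_records.keys()):
        cloud = cloud_records.get(key)
        master = master_records.get(key)
        if cloud is None:
            issues.append((key, "missing in cloud app"))
        elif master is None:
            issues.append((key, "missing in master"))
        else:
            for field in fields:
                if cloud.get(field) != master.get(field):
                    issues.append((key, f"{field} differs"))
    return issues
```

A report like this tells you where the data has fractured; deciding which side is right, and stopping it recurring, is what the governance strategy has to answer.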

Enterprise Data Governance – Cheers Arthur!

Friday, September 25th, 2009

First of all, let me apologise to all my readers for not having blogged for a while. This year has turned out to be manic, crazily busy. I also confess to having become addicted to Twitter, a “tweetaholic” who has been micro-blogging there. If you want to see my tweets, you can find me on Twitter. So I return to my blog the day after “Arthur Day”: 250 years ago yesterday, a certain young Irishman named Arthur Guinness started a beer-making company in Dublin.

My topic today is the exciting subject of Enterprise Data Governance. A survey I conducted made it clear that at the end of 2008 many companies were not fully underway with Enterprise Data Governance in terms of getting their data under control and into a trusted, well-managed state. Many had more to do in terms of organising themselves and getting the necessary technology and processes in place. But the question I get asked most is: how do you know how well or poorly your company is governing its data? There are a few questions you can ask that will give you a good inkling. These are as follows:

  • Do you know what data exists in your enterprise?
  • Do you have an inventory of data items in use?
  • How many names have you got for the same data item?
  • How many metrics do you have with the same name but different formulae?
  • Do your Excel metrics formulae, DBMS metrics formulae, BI tool metrics formulae, ETL tool calculations, … all agree?

If the answer is no to any of these questions, what chance do you stand of remaining compliant or of trusting your data? If you don’t know how many different variations of a data item exist in your enterprise how can you govern your data? Some other questions to ask here from a business perspective are:

  • How many times do your core processes break because of dirty data?
  • Has your company ever messed up an order and angered a customer because of dirty data?
  • In terms of compliance, do you trust your data enough to tell it to a judge?
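Returning to the earlier questions about metrics with the same name but different formulae: if you can extract metric definitions from your spreadsheets, DBMS views, BI tool semantic layers and ETL jobs, answering them can start as simply as grouping by metric name. The following Python sketch assumes a hypothetical inventory of (source, metric name, formula) tuples; it normalises only case and whitespace, so two formulae that are mathematically equivalent but written differently would still be reported as a conflict.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def conflicting_metrics(definitions: Iterable[Tuple[str, str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Return {metric_name: {formula: [sources]}} for names with more than one formula.

    definitions is a hypothetical inventory of (source, metric_name, formula).
    """
    by_name: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for source, name, formula in definitions:
        key = name.strip().lower()
        normalized = " ".join(formula.lower().split())  # ignore case and spacing only
        by_name[key][normalized].append(source)
    return {name: dict(formulas) for name, formulas in by_name.items() if len(formulas) > 1}
```

Even this crude check tends to surface the "Gross Margin means three different things" problem quickly, which is a useful way to start the governance conversation.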

In my opinion what companies actually need is an interactive data map, so that you can press a button and see where your customer data or your order data is. To be able to do this you need common data definitions for your customer data attributes, your order data attributes and so on. In fact, you need a common set of enterprise-wide definitions for all core entity data, transaction data and metrics.

Having established this, the next step is to discover where your data actually is. Data discovery technology (albeit a new area in data management) is therefore critical to help do this. In fact I would go as far as to say that without data discovery technology it is very difficult to get data under control. Increasingly, therefore, we are seeing vendors acquire or build this kind of software. Once your data is located, you need to map disparate data definitions for the same data to common enterprise-wide definitions to be able to see where data is: physical column names, data models, BI tool semantic layers, reports, SPREADSHEETS, files, XML schemas, Access databases… If you can’t tie all these to the same corporate definitions, how do you govern data? This is where data dictionaries and business glossaries are key. Lineage matters.
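A business glossary that maps discovered physical names to common enterprise definitions is what makes the "press a button and see where your customer data is" data map possible. The sketch below is a deliberately small, hypothetical version: the physical locations and terms are invented for illustration, and a real glossary would hold far richer metadata (owners, definitions, lineage).

```python
from typing import Dict, Iterable, List

# Hypothetical glossary: discovered physical location -> common enterprise term.
GLOSSARY: Dict[str, str] = {
    "crm.cust.cust_nm": "Customer Name",
    "erp.customers.name": "Customer Name",
    "sales_mart.dim_customer.customer_name": "Customer Name",
    "finance.xlsx!Sheet1.Client": "Customer Name",
    "erp.orders.ord_dt": "Order Date",
}


def where_is(term: str, glossary: Dict[str, str]) -> List[str]:
    """All physical locations mapped to one enterprise term (the 'data map' lookup)."""
    return sorted(loc for loc, mapped in glossary.items() if mapped == term)


def unmapped(discovered: Iterable[str], glossary: Dict[str, str]) -> List[str]:
    """Discovered locations not yet tied to any corporate definition."""
    return sorted(loc for loc in discovered if loc not in glossary)
```

The unmapped list is arguably the more important output: it is the part of the spaghetti ball that data discovery has found but governance has not yet reached.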

Ultimately the objective is to get consistency across the enterprise. You have to unravel your spaghetti ball. This means you need to get organised correctly, get the right technologies in place and get the right processes in place for enterprise data governance.

Master data is also part of the program. You need to find out where your master data is maintained. How is it synchronised? What screens in which applications are used to update it? Do you know? MDM is not as simple as it looks. It is often a multi-year investment. So ask yourself: do you want to start with read-only or read-write? If you buy an MDM system and start updating master data centrally, what happens if you are still updating it in other applications too? Companies need a well-thought-out strategy for MDM as part of the Data Governance program. I will be addressing this in further blogs in the near future.

For now, though, let me salute my fellow countryman, Arthur Guinness. If you are stressed out over Data Governance at the end of a hard week, there is nothing like a decent pint to help you unwind. Cheers!

BI as a Service (BIaaS) – Will Google Move In On This Opportunity?

Thursday, August 27th, 2009

Most of you by now have probably found it difficult to avoid the hype around Software as a Service (SaaS). For many of us today this is already a reality in our business. You only have to look at the huge uptake of Salesforce.com by small and medium-sized businesses (SMBs) to realize that there is certainly a place for this in many companies. With respect to the BI market, there is no doubt that there is also considerable growth in BI as a Service (BIaaS), and it would appear that many BI vendors are eagerly setting out their stall on the net to jump into this market of hosted BI services. Given that many BI products are already service enabled, and that many BI vendors have BI portal products, there is no doubt that they are technically ready. They are, however, missing one thing: data, your data. Either they point their tools at your databases and access them over the net, or they will need a ready supply of data from every BIaaS subscriber. If you already use Salesforce.com, you can bet that all BI vendors entering this market will do so with an ETL adapter for Salesforce to get at your data on your behalf.

Of course, Salesforce.com itself is no doubt keen on the BIaaS market and is already active in offering added value in terms of BI to existing clients.

Nevertheless, while simplicity, ‘point your browser and go’ and cool pre-built reports and graphs are the obvious attraction, there are implications when adopting BI as a Service in any business. The most important of these is that companies may need to supply their data to BI SaaS providers for upload to BIaaS sites, so that ‘instant BI’ can be made available back to them via hosted web-enabled BI tools and pre-built reports. There are also privacy regulations that have to be adhered to in this kind of situation, not least the UK Data Protection Act. Companies need assurances on data protection as well as data security, and should consider the implications of giving BIaaS providers their precious operational data to be managed off-site. This is, after all, the crown jewels of any business, and there is no doubt that BIaaS providers would jump at the chance to know much more about their clients and would be sitting on a potential gold mine with all that data. Reliability of such a service is also paramount, so that BI is available when you need it. Companies considering this option should also think about what happens if they need their data back in-house, and how easy it is to get it back from a BI SaaS provider. It would be madness to subscribe to such a service and overlook this requirement.

I have wondered about the potential size of the BIaaS market, but it was not until I was looking at iGoogle a while back that I realised the real potential of BIaaS. Google have been steadily adding more and more services to their portfolio and are now making these services available over the internet for you to personalise with your own portal via iGoogle. All you have to do is click “Add Stuff” on iGoogle to see the huge number of instant services you can add to your portal. So what am I driving at here? My question is this: how long before Google enters the enterprise SaaS market with a vengeance? Both Salesforce itself and a BI vendor could easily be targets for this Internet search giant. If you could get at hosted enterprise services in a SaaS offering just by using iGoogle “Add Stuff” to add them to your portal, how many SMBs would do it? That is a huge question. My guess, however, is that if it is as easy as Google are making it to add stuff on iGoogle today, then the uptake by SMBs could be enormous. All Google have to do is solve the data upload problem and deliver vertical data marts for the industry of your choosing, and they would no doubt get the attention of SMBs. So for those watching the BI market for mergers and acquisitions, I would not exclude Google from the mix. We may well see a very big splash if Google decides to move on the BI market and open up its stall as a BIaaS provider to SMBs. An iGoogle for Business offering would certainly do it. Who knows? I’m certainly watching with interest.

Enterprise Information Management In Demand

Wednesday, August 12th, 2009

Recently I have noticed a lot of companies raising the priority of Enterprise Information Management projects as they realise that they do not have their information under control. This includes both structured and unstructured data. It is clear that for unstructured data, enterprise content management, content authoring, tagging, search and taxonomy are all key. Master data management also has a part to play in deciding the facets that can be used in taxonomies. For structured data, you need to consider data naming and definitions, data modelling, data discovery, data mapping, data profiling, data cleansing, data integration, provisioning and data quality monitoring. There is a lot of work out there to do! The big question is how you tie the two together. The secret is in MDM! What is your strategy for content authoring, content storage, content tagging, taxonomy, search, business glossaries, data integration…? Get in touch and let me know.

Data Federation – Rapid Information Delivery in a Tough Economy?

Wednesday, April 1st, 2009

Increasingly, as I speak with my clients and with CIOs I meet at various speaking engagements around the UK and Europe, it is becoming clear that data federation can potentially offer rapid value to budget-constrained IT departments that just can’t find the resources for another major database project. If you work in IT, you may be seeing increasing demand from business users for more reports requiring both BI and non-BI information to help them manage their area of business responsibility in a more dynamic way.

In a recent paper on Transforming Performance Management, Jeremy Hope states that “Most organizations want to adapt rapidly to changing events, but find that they are handicapped because of fixed budgets and poor forecasts. Adaptive organizations are able to respond more rapidly by switching resources dynamically to meet new threats and opportunities…”.

To do this, companies undoubtedly have to deliver information more rapidly, irrespective of whether or not the data is in a BI system. What many cannot afford is a formal, time-consuming data warehouse change process to bring in all the information necessary. Data federation can source data from several places, one of which would of course be a data warehouse or data mart. But with increasing amounts of valuable information residing outside BI systems (especially on the internet), it seems that data federation has a role to play as a delivery platform, avoiding the need to change data models and ETL processes and to create another cube or relational data mart. My expectation is that over the coming year we will see an increase in demand for data federation software.

Several BI vendors have been shipping data federation software as part of their BI tool bundles for some time to allow quick delivery of integrated information. Note that we are not talking about virtual data warehouses – on the contrary, data federation software rapidly integrates historical DW data with other data sources (operational data, internet feeds etc.) to deliver higher-value information more rapidly. Therefore I see data federation software as complementary to BI systems. If you would like more information on data federation, please see this article on how it works. There is also a white paper on Maximizing Business Value from Data Virtualization that talks about patterns and best practices for getting the most out of this software. If you have already purchased data federation software or are considering it, let me know.
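As a rough illustration of the idea, the following Python sketch uses two in-memory SQLite databases as stand-ins for a data warehouse and an operational system, plus a dictionary standing in for an internet feed. All table names, columns and values are hypothetical. A real data federation server would also push queries down to each source, optimise them and enforce security; here the "federation layer" is just a function that queries each source on demand and joins the results on region, without copying anything into the warehouse.

```python
import sqlite3

# Source 1: stand-in for the data warehouse (historical revenue by region).
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_history (region TEXT, year INTEGER, revenue REAL)")
dw.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [("North", 2008, 1200.0), ("South", 2008, 950.0)],
)

# Source 2: stand-in for an operational system (current open orders).
ops = sqlite3.connect(":memory:")
ops.execute("CREATE TABLE open_orders (region TEXT, order_value REAL)")
ops.executemany(
    "INSERT INTO open_orders VALUES (?, ?)",
    [("North", 40.0), ("North", 60.0), ("South", 25.0)],
)

# Source 3: stand-in for an external internet feed (e.g. a regional market index).
external_feed = {"North": 1.02, "South": 0.97}


def federated_region_view(year: int) -> list[dict]:
    """Query each source at request time and integrate the results on region."""
    history = dict(dw.execute(
        "SELECT region, revenue FROM sales_history WHERE year = ?", (year,)
    ).fetchall())
    pipeline = dict(ops.execute(
        "SELECT region, SUM(order_value) FROM open_orders GROUP BY region"
    ).fetchall())
    return [
        {
            "region": region,
            "historical_revenue": revenue,
            "open_pipeline": pipeline.get(region, 0.0),
            "market_index": external_feed.get(region),
        }
        for region, revenue in sorted(history.items())
    ]


for row in federated_region_view(2008):
    print(row)
```

The warehouse keeps its role as the source of history; the operational and external data are simply brought alongside it at query time, which is the complementary relationship described above.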

Data Governance Losing Priority in Some European Countries

Monday, March 30th, 2009

Having just got back from a presentation tour of mainland Europe, I found that in the countries where I spoke, data governance got a thumbs-down vote from the CIOs present in my sessions. In Belgium in particular, it did not appear to be on their radar at all. When I probed for feedback on what exactly is high priority among the CIOs attending my sessions, it was almost as if raw ‘survival’ is taking hold. In other words, any IT project linked to business survival in this tough economic climate will get attention, but not much else. Customer retention, self-service, cost reduction/containment and growth are high on the list. One CIO explained to me that his company’s priority over the next 12 months was to allow customers to customise the company’s products and services much more than before. Therefore, in addition to offering their own product lines on the web, they would be integrating their e-procurement with many back-end e-suppliers so that they can buy ‘on-demand’ to match what a customer wants. This means they want customers to be able to create their own custom ‘package’ before buying online, stretching beyond the company’s own products to stand out from the crowd. It seems to me that data governance and, to some extent, data quality are taking a back seat in favour of investment that will keep the revenue rolling in. I would be interested in your feedback. Is data governance a high priority in your organisation?

Flexible Fact Tables – Best Practice or Rope To Hang Business Users With?

Friday, March 20th, 2009

Blogging occasionally offers an opportunity to open up a good debate. So here goes! Over the last several years I have observed data models in many different BI systems across different vertical industries where so-called ‘generic’ fact tables have been designed with only one ‘generic’ measure. The idea behind this design approach is that the single measure column can hold ANY metric. Often this ‘generic’ measure column is accompanied by some kind of type field to indicate what the measure actually is (what it means) and by other attribute(s) to indicate the level(s) in various dimension hierarchies with which the stored measure is associated. This helps indicate the additive nature of the metric. Also, if it is a monetary measure it may have a currency field, and if it is a unit measure it may have a field to explain the kind of units used, e.g. centimetres, litres, cubic metres etc. The stated advantage of these kinds of approaches is flexibility. New measures are easy to accommodate because no change to the design is necessary. It is a perfectly good argument, and the approach certainly appears to be widely practised by designers.
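A minimal sketch of such a ‘generic’ fact table, using SQLite through Python’s standard library, may help readers who have not met the pattern. The table, column and measure names are hypothetical, and the single `is_additive` flag is a simplification of the hierarchy-level attributes described above. It shows the claimed flexibility: a brand-new measure is just another row, with no change to the table design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A 'generic' fact table: one value column plus columns describing what it means.
conn.execute("""
    CREATE TABLE fact_generic (
        date_key      INTEGER,
        product_key   INTEGER,
        measure_type  TEXT,    -- what the value means, e.g. 'sales_amount'
        measure_value REAL,    -- the single 'generic' measure
        unit          TEXT,    -- currency or unit of measure
        is_additive   INTEGER  -- simplification: 1 if the value may be summed over dates
    )
""")
conn.executemany(
    "INSERT INTO fact_generic VALUES (?, ?, ?, ?, ?, ?)",
    [
        (20090301, 1, "sales_amount", 500.0, "EUR", 1),
        (20090301, 1, "units_shipped", 20.0, "each", 1),
        (20090301, 1, "stock_level", 300.0, "each", 0),
    ],
)

# The claimed flexibility: a new measure needs no ALTER TABLE, just another row.
conn.execute(
    "INSERT INTO fact_generic VALUES (?, ?, ?, ?, ?, ?)",
    (20090301, 1, "shipped_volume", 1.5, "cubic metres", 1),
)

measures = [row[0] for row in conn.execute(
    "SELECT DISTINCT measure_type FROM fact_generic ORDER BY measure_type"
)]
print(measures)  # ['sales_amount', 'shipped_volume', 'stock_level', 'units_shipped']
```

Note that the meaning of every number now lives in the data rather than in the schema, which is exactly what a business user has to decode before writing a query.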

When it comes to navigating such designs to develop queries (or even generate them), IT professionals developing reports for the business can often figure out how to retrieve the information required (although even IT developers can struggle). However, when business users develop their own ad hoc queries and reports, I frequently see them really struggling to navigate the ‘flexible’ design, first trying to figure out what the measures mean, then whether they are additive, and so on. More often than not I see this resulting in real frustration among business users, who end up getting aggregations in reports wrong and then start to lose faith in their new BI system. Of course IT steps in to rescue the situation by building more snapshot tables, more materialised views etc., burying the generic ‘complexity’ to make the job easier for the user. These users also frequently resort to switching back to Excel, holding data outside any data mart so that they can look at it in a form they understand.
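The aggregation mistakes described above are easy to reproduce. In this minimal sketch (hypothetical table and measure names, SQLite via Python’s standard library), a naive `SUM` over a generic fact table mixes euros with stock units, and a stock level, which is a point-in-time balance, is summed across dates when only the latest snapshot is meaningful.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_generic (
        date_key INTEGER, product_key INTEGER,
        measure_type TEXT, measure_value REAL, unit TEXT
    )
""")
conn.executemany(
    "INSERT INTO fact_generic VALUES (?, ?, ?, ?, ?)",
    [
        (20090301, 1, "sales_amount", 500.0, "EUR"),
        (20090302, 1, "sales_amount", 700.0, "EUR"),
        (20090301, 1, "stock_level", 300.0, "each"),  # point-in-time balance
        (20090302, 1, "stock_level", 280.0, "each"),
    ],
)

# What a user might naively write: sums every row, mixing euros and stock units.
naive_total = conn.execute(
    "SELECT SUM(measure_value) FROM fact_generic WHERE product_key = 1"
).fetchone()[0]

# Correct sales total: filter on the measure type before summing.
sales_total = conn.execute(
    "SELECT SUM(measure_value) FROM fact_generic "
    "WHERE product_key = 1 AND measure_type = 'sales_amount'"
).fetchone()[0]

# Stock level is semi-additive: take the latest snapshot rather than summing over dates.
closing_stock = conn.execute(
    "SELECT measure_value FROM fact_generic "
    "WHERE product_key = 1 AND measure_type = 'stock_level' "
    "ORDER BY date_key DESC LIMIT 1"
).fetchone()[0]

print(naive_total, sales_total, closing_stock)  # 1780.0 1200.0 280.0
```

Nothing in the schema stops the first query, and its result looks just as plausible as the correct ones, which is why users only discover the problem when the numbers fail to reconcile.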

Have you seen this in your organisation? If so, I want your feedback. Is it the case that so-called ‘flexible’ design techniques are rope for end users to hang themselves with? My question is this: what, in your view, is the best way to design fact tables so that business users become productive and can easily understand how to get at the data when building their own reports? I am not so sure that being so generic is of business value. Sure, it is flexible. But is it usable? What use is flexible design if a business user cannot understand it and make use of all that valuable data? Is it not better to have multiple metric attributes in a fact table (if multiple metrics are needed), with each attribute name saying what the measure actually is? Let’s have your input!
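For comparison, here is a minimal sketch of the alternative suggested above: one explicitly named column per metric. Again the table and column names are hypothetical and SQLite via Python’s standard library is used only for illustration. The stock level is deliberately left out, on the assumption that a semi-additive balance like that would live in a separate snapshot fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per metric, each named for what it measures.
conn.execute("""
    CREATE TABLE fact_sales (
        date_key         INTEGER,
        product_key      INTEGER,
        sales_amount_eur REAL,     -- additive, in euros
        units_shipped    INTEGER   -- additive, in units
    )
""")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(20090301, 1, 500.0, 20), (20090302, 1, 700.0, 25)],
)

# The query a business user (or BI tool) writes reads like the question being asked.
total_sales, total_units = conn.execute(
    "SELECT SUM(sales_amount_eur), SUM(units_shipped) "
    "FROM fact_sales WHERE product_key = 1"
).fetchone()
print(total_sales, total_units)  # 1200.0 45
```

The trade-off is the one the generic design was meant to avoid: adding a metric means changing the table, for example with `ALTER TABLE fact_sales ADD COLUMN ...`. The question the post raises is whether that cost to IT is worth paying for a design business users can understand.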