Welcome

Welcome to my blog on Building the Smart Business. This blog looks at all the areas that need to be addressed so that companies can transform themselves into agile, event-driven, optimised businesses. To do this we need several building blocks:

  • Data Governance and Enterprise Information Management
  • Master Data Management
  • Process oriented Operational BI via On-demand Analytics
  • Event Processing and Automated Decisioning Management
  • Collaborative, Social and Mobile BI
  • CPM Strategy Management
  • Cloud Computing

I will discuss all of these areas in my blogs and ask you to comment on how you are using these technologies in your organisation.

For more information on Intelligent Business Strategies and how we can help you, please click here

Posted in Welcome | Leave a comment

Big Data Analytics – A Rapidly Emerging Market

Last week in London I spoke at the IRM Data Warehousing and Business Intelligence conference on a variety of topics. One of these was Big Data, which I looked at in the context of analytical processing. There is no question that the hype around this topic is reaching fever pitch, so I thought I would try to bring some order to it.

First, like many other authors in this space, I need to define Big Data in the context of analytical processing to make clear what we are talking about. Big Data is a marketing term, and not the best of terms at that. A new reader in this market may well assume that it is purely about data volumes. Actually, it is about being able to solve business problems that we could not solve before. Big data can, and more often than not does, include a variety of ‘weird’ data types. In that sense big data can be structured or poly-structured (where poly in this context means many). The former would include high-volume transaction data such as call data records in telcos, retail transaction data and pharmaceutical drug test data. Poly-structured data is more difficult to process and includes semi-structured data like XML and HTML and unstructured data like text, images, rich media etc. Graph data is also a candidate.

From my experience of working in this area to date, I would say that web data, social network data and sensor data are emerging as very popular types of data in big data analytical projects. Web data includes web logs and e-commerce logs, such as those generated by online gaming and online advertising. Social network data includes Twitter data, blogs etc. These are examples of interaction data, which has grown significantly in recent years. Sensor data is machine-generated data from ‘an Internet of Things’. In my opinion we have only seen the beginning of it, as much of it remains uncaptured. RFID tags are probably the most written-about sensors. However, these days we have sensors to measure temperature, light, movement, vibration, location, airflow, liquid flow, pressure and much more. There is no doubt that sensor data is on the increase, and in my opinion it will dwarf pretty well everything else in terms of volume. Telcos, utilities, manufacturing, insurance, airlines, oil and gas, pharmaceuticals, cities, logistics, facilities management and retail are all jumping on the opportunity to use sensor data to ‘switch on the lights’ in parts of the business where they have had no visibility before. Sensor data is massive, but we don’t want it all: it is the variance we are interested in. Many big data analytical applications are emerging, or will emerge, on the back of sensor data. These include analytical applications for use in:

  • Supply chain optimisation
  • Energy optimisation via sustainability analytics
  • Asset management
  • Location based advertising
  • Grid health monitoring
  • Fraud
  • Smart metering
  • Traffic optimisation
  • Etc., etc.
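
To make the point about variance concrete, here is a minimal Python sketch of a deadband filter, one common way of discarding sensor readings that carry no new information. The `Reading` type, the `significant_changes` function and the 0.5 threshold are hypothetical, chosen only for illustration; real deployments may filter at the device, at a gateway or in a stream processor.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: float  # seconds; hypothetical field
    value: float


def significant_changes(readings: Iterable[Reading], deadband: float) -> Iterator[Reading]:
    """Yield a reading only when it differs from the last *kept* reading for the
    same sensor by more than `deadband`. The first reading per sensor is always kept."""
    last_kept: Dict[str, float] = {}
    for reading in readings:
        previous = last_kept.get(reading.sensor_id)
        if previous is None or abs(reading.value - previous) > deadband:
            last_kept[reading.sensor_id] = reading.value
            yield reading


stream = [
    Reading("pump-1", 0.0, 20.0),
    Reading("pump-1", 1.0, 20.1),
    Reading("pump-1", 2.0, 20.2),
    Reading("pump-1", 3.0, 23.5),  # a genuine change worth keeping
    Reading("pump-1", 4.0, 23.6),
]
print([r.value for r in significant_changes(stream, deadband=0.5)])  # [20.0, 23.5]
```

Comparing against the last kept value, rather than the immediately preceding reading, stops a slow drift from slipping through unnoticed in many tiny steps.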

Text, as I already mentioned, is also a prime candidate for big data analytical processing. Sentiment analysis, case management and competitor analysis are just a few examples of popular types of analysis on textual data. Data sources like Twitter are obvious candidates, but tweet stream data suffers from data quality problems that still have to be handled even in a big data environment. How many times do you see spelling mistakes in tweets, for example?
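
As an illustration of the kind of clean-up involved, the following Python sketch normalises tweet text by collapsing exaggerated letter runs and snapping misspelled tokens to a known vocabulary with `difflib.get_close_matches` from the standard library. The vocabulary and the 0.8 similarity cutoff are assumptions made for the example; production sentiment pipelines typically use far larger lexicons and more sophisticated spelling models.

```python
import difflib
import re
from typing import List

# Hypothetical domain vocabulary; a real system would use a much larger lexicon.
VOCABULARY = ["delivery", "service", "terrible", "excellent", "refund", "late"]


def normalise_tweet(text: str, vocabulary: List[str], cutoff: float = 0.8) -> List[str]:
    """Lower-case and tokenise a tweet, then repair tokens that are close to a known word."""
    cleaned = []
    for token in re.findall(r"[a-z']+", text.lower()):
        # Collapse runs of three or more identical characters ("sooooo" -> "soo").
        token = re.sub(r"(.)\1{2,}", r"\1\1", token)
        if token not in vocabulary:
            # Snap to the closest known word if it is similar enough; otherwise keep it as-is.
            matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
            token = matches[0] if matches else token
        cleaned.append(token)
    return cleaned


print(normalise_tweet("Delivry was sooo LATE, terible servce!", VOCABULARY))
# ['delivery', 'was', 'soo', 'late', 'terrible', 'service']
```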

There is a lot going on in big data that is of interest to business, but while all of it offers a potential return on investment, it is also increasing complexity. New types of data are being captured from internal and external data sources, there is an increasing requirement for faster data capture, more complex types of analysis are now in demand, and new algorithms and tools are appearing to help us do this.

So why is analytics on big data so important – or is it?

There are several reasons why big data is attractive to business. Perhaps for the first time, entire data sets can now be analysed and not just subsets. This is now a feasible option whereas it was not before. So it is making enterprises ask: can we go down a level of detail? Is it worth it? Well, to many it most certainly is. For many large enterprises, even a 1% improvement brought about by analysing much more detailed data is significant and well worth doing. Also, schema-variant data can now be analysed for the first time, which could add a lot of valuable insight to that offered up by traditional BI systems. Think of an insurance company, for example. Any insurer whose business comes primarily from a broker network will receive much of its data in non-standard document formats. Only a small percentage of that data finds its way into underwriting transaction processing systems, while much of the valuable insight is left in the documents. Being able to analyse all of the data in these documents could offer up far more business value, improving risk management and loss ratios.
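
One way to picture analysing schema-variant data is "schema on read": documents are kept as they arrive and mapped to a common view only when they are analysed. The Python sketch below assumes broker submissions arrive as JSON with differing field names; the field names and the `FIELD_ALIASES` mapping are invented for illustration and are not drawn from any real broker format.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical mapping from a common analytical field to broker-specific spellings.
FIELD_ALIASES: Dict[str, List[str]] = {
    "sum_insured": ["sum_insured", "SumInsured", "insured_value"],
    "postcode": ["postcode", "PostCode", "zip"],
    "construction": ["construction", "BuildType"],
}


def read_submission(raw: str) -> Dict[str, Optional[Any]]:
    """Parse one broker document and project it onto the common view.
    Fields a broker did not supply come back as None rather than causing a failure."""
    doc = json.loads(raw)
    return {
        field: next((doc[alias] for alias in aliases if alias in doc), None)
        for field, aliases in FIELD_ALIASES.items()
    }


submissions = [
    '{"SumInsured": 250000, "PostCode": "SW1A 1AA", "BuildType": "timber"}',
    '{"insured_value": 90000, "zip": "10001", "notes": "flat roof, recently repaired"}',
]
for raw in submissions:
    print(read_submission(raw))
```

Fields outside the mapping, such as the free-text `notes` above, are simply ignored here; in practice that unmapped text is often where the extra insight sits, and it would need text analytics of its own.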

At the same time there are inhibitors to big data analysis. These include finding skilled people and a real lack of understanding of when to use Hadoop versus an analytical RDBMS versus a NoSQL DBMS. On the skills front, there is no question that the developers involved in big data projects are absolutely NOT your traditional DW/BI developers. Big data developers are primarily programmers, which is not a skill often seen in a BI team. Java programmers are often seen at big data meetups. In addition, the analysis is primarily batch-oriented, with Map/Reduce programs being run and chained together using scripting languages like Pig Latin and JAQL (if you use the Hadoop stack, that is).

Challenges with Big Data

There is no question that big data offers up challenges. These include challenges in the areas of:

  • Big data capture
  • Big data transformation and integration
  • Big data storage – where do you put it and what are the options?
  • Loading big data
  • Analysing big data

Over this and my next few blogs we will look at these challenges. Looking at the first one, big data capture, the issues are latency and scalability. Reducing latency calls for techniques such as change data capture and micro-batches. However, I think it is fair to say that if Hadoop is chosen as the analytical platform, it is not geared up for very low latency. Very low latency would lean towards stream processing as a big data technology, which I will address in another blog. Scaling data integration to handle big data can be tackled in a number of ways. You can use DI software that implements ELT processing, i.e. that exploits the parallel processing power of an underlying MPP-based analytical database. You can make use of data integration software that has been rewritten to exploit multi-core parallelism (e.g. Pervasive DataRush). Alternatively, you can use data integration accelerators like Syncsort DMExpress, or exploit Hadoop Map/Reduce from within data integration jobs (e.g. Pentaho Data Integrator). Or you could use specialist data integration software like the Scribe log aggregation software (originally written by Facebook). Vendors like Informatica have also announced a new HParser to help with data in a Hadoop environment.
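
As a rough illustration of the latency techniques mentioned above, the Python sketch below combines a simple query-based form of change data capture (a high-water mark on a change sequence number) with micro-batching. The `ChangeRecord` type, the function names and the in-memory "source table" are hypothetical; log-based CDC products read the database transaction log instead, which avoids repeatedly querying the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass(frozen=True)
class ChangeRecord:
    key: str
    change_seq: int  # monotonically increasing change sequence number (assumed)


def extract_changes(source: List[ChangeRecord], high_water_mark: int) -> List[ChangeRecord]:
    """Capture only the rows changed since the last successful extract, oldest first."""
    return sorted((r for r in source if r.change_seq > high_water_mark),
                  key=lambda r: r.change_seq)


def micro_batches(records: List[ChangeRecord], batch_size: int) -> Iterator[List[ChangeRecord]]:
    """Split captured changes into small batches so they can be loaded frequently."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_capture_cycle(source: List[ChangeRecord], high_water_mark: int, batch_size: int,
                      load: Callable[[List[ChangeRecord]], None]) -> int:
    """Load new changes batch by batch and return the advanced high-water mark."""
    for batch in micro_batches(extract_changes(source, high_water_mark), batch_size):
        load(batch)
        # Advance the mark only after the batch has loaded, so a failed batch can be retried.
        high_water_mark = batch[-1].change_seq
    return high_water_mark


loaded: List[List[str]] = []
table = [ChangeRecord(f"cdr-{i}", i) for i in range(1, 6)]
mark = run_capture_cycle(table, high_water_mark=2, batch_size=2,
                         load=lambda batch: loaded.append([r.key for r in batch]))
print(loaded, mark)  # [['cdr-3', 'cdr-4'], ['cdr-5']] 5
```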

With respect to storing data, there are a number of storage options for analysing big data. They range from analytical RDBMSs to Hadoop and NoSQL DBMSs.

Let’s dispel a myth right away. The idea that relational database technology cannot be used as a DBMS option for big data analytical processing is plain nonsense, and any analyst opinion claiming that should be ignored. Teradata, ExaSol, ParAccel, HP Vertica and IBM Netezza are all classic examples of analytical RDBMSs that can scale to handle big data applications, with some of these vendors having customers in the petabyte club. Improvements such as solid state disks, columnar storage, in-database analytics and in-memory processing have all helped analytical RDBMSs scale to new heights. So they are an option for a big data analytical project, perhaps more so with structured data.
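
To show why columnar storage suits analytical queries, here is a toy Python comparison over a small hypothetical sales table. A row layout has to visit every record to total one measure, whereas a column layout scans a single list, and repeated values within a column compress well even with simple run-length encoding. Real analytical RDBMSs use far more sophisticated encodings; this only illustrates the principle.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple

# Hypothetical sales table stored row by row.
rows: List[Dict[str, Any]] = [
    {"store": "London", "product": "tea", "revenue": 120.0},
    {"store": "London", "product": "coffee", "revenue": 80.0},
    {"store": "London", "product": "cake", "revenue": 45.0},
    {"store": "Paris", "product": "tea", "revenue": 60.0},
]


def to_columns(table: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot a row-oriented table into one list per column."""
    return {name: [row[name] for row in table] for name in table[0]}


def run_length_encode(column: List[Any]) -> List[Tuple[Any, int]]:
    """Store each run of identical consecutive values once, together with its length."""
    return [(value, len(list(run))) for value, run in groupby(column)]


columns = to_columns(rows)
# Row layout: every record is visited even though only 'revenue' is needed.
row_total = sum(row["revenue"] for row in rows)
# Column layout: only the 'revenue' column is scanned.
column_total = sum(columns["revenue"])
print(row_total, column_total)              # 305.0 305.0
print(run_length_encode(columns["store"]))  # [('London', 3), ('Paris', 1)]
```
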
Hadoop is an analytical big data storage option that has more often been associated with poly-structured data. Text is a common candidate. NoSQL graph databases like Neo4J or InfiniteGraph are candidates, particularly in the area of social network influencer analysis. So it depends on what you are analysing.
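
For the social network influencer analysis mentioned above, a classic starting point is PageRank. The minimal Python sketch below runs PageRank by power iteration over a hypothetical "who follows whom" graph held in memory. It is not the API of Neo4J, InfiniteGraph or any other product; a graph database would normally run such algorithms natively over much larger graphs.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Rank nodes by influence. An edge (a, b) means 'a follows b', so a passes rank to b."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for follower, followed in edges:
        out_links[follower].append(followed)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share (the 'random jump').
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node] or nodes  # a node following nobody spreads rank evenly
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


follows = [("alice", "dave"), ("bob", "dave"), ("carol", "dave"), ("dave", "alice")]
scores = pagerank(follows)
for person, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{person}: {score:.3f}")
```

In this toy graph Dave comes out on top because three people follow him, and Alice ranks second because the most influential person follows her: influence depends not just on how many followers you have but on who they are.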

Going back to Hadoop, the stack includes HDFS, a distributed file system that partitions large files across multiple machines for high-throughput access to application data. It allows us to exploit thousands of servers for massively parallel processing, and these can be rented on a public cloud if need be. To exploit the power of Hadoop, developers code programs using a programming framework known as Map/Reduce. These programs run in batch to perform analysis and exploit the power of thousands of servers in a shared-nothing architecture. Execution is done in two stages: Map and Reduce. The input file is broken into manageable chunks that are processed in parallel by map tasks, each of which emits intermediate key/value pairs. The framework then groups those pairs by key, and the Reduce stage processes each group to produce the results. Hadoop Map/Reduce is therefore NOT a good match where:

  • Low latency is critical for accessing data
  • Only a small subset of the data within a large data set needs to be processed
  • Data must be processed immediately, in real time
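
The two stages can be simulated in a few lines of plain Python. The sketch below is a single-process word count, the customary Map/Reduce example; the function names are hypothetical and it does not use the Hadoop API. In a real Hadoop job each input split would be mapped on a different node, and the framework, not your code, would perform the shuffle that groups intermediate pairs by key.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in a record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)


def run_job(splits: List[List[str]]) -> Dict[str, int]:
    # Each split could be mapped independently (in parallel) because map tasks share nothing.
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    return dict(reduce_phase(key, values) for key, values in sorted(shuffle(mapped).items()))


splits = [["big data is big"], ["data about data"]]
print(run_job(splits))  # {'about': 1, 'big': 2, 'data': 3, 'is': 1}
```

Even this toy version hints at why batch Map/Reduce fits the cases above poorly: every job scans all of its input, and no reduce result is available until the shuffle has gathered the output of every map task.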

Also, Hadoop is not normally an RDBMS competitor. On the contrary, it expands the opportunity to work with a broader range of content, and so big data analytical processing conducted on Hadoop distributions is often upstream from traditional DW/BI systems. The insight derived from that processing then often finds its way into a DW/BI system. There are a number of Hadoop distributions out there, including Cloudera, EMC GreenPlum HD (a resale of MapR), Hortonworks, IBM InfoSphere BigInsights, MapR and Oracle Big Data Appliance. Hadoop is still an immature space, with vendors like ZettaSet bolstering the management of this kind of environment. To appeal to the SQL developer community, Hive was created with a SQL-like query language. In addition, Mahout supports a lot of analytics that can be used in Map/Reduce programs. It is an exciting space but by no means a panacea. Vendors such as IBM, Informatica, Radoop, Pervasive (TurboRush for Hive and DataRush for Map/Reduce), Hadapt, Syncsort (DMExpress for Hadoop Acceleration), Oracle and many others are all trying to gain competitive advantage by adding value to it. Some enhancements appeal more to Map/Reduce developers (e.g. the Teradata, IBM Netezza and HP Vertica connectors to Cloudera) and some to SQL developers (e.g. Teradata AsterData SQL Map/Reduce, Hive). One thing is sure: both need to be accommodated.

Next time around I’ll discuss analysing big data in more detail. Look out for that, and if you need help with a big data strategy, feel free to contact me.

The Two Sides of Collaborative BI

While there is a lot of hype around collaborative BI today, the concept is not new. First attempts at introducing collaborative functionality into BI environments happened eight or more years ago, when vendors of Corporate Performance Management (CPM) products in particular added collaborative functionality to allow users to annotate scorecards and comment on performance measures. In addition, the ability to email links to reports also appeared. While a lot was marketed about these kinds of features, they achieved only limited success. A key reason for this, in my opinion, was that collaborative functionality was ‘baked into’ BI and CPM tools. In other words, vendors brought collaboration to BI. However, the MySpace and Facebook generation taught us a different approach. What these collaborative and social networking environments showed was that it is much more natural to publish content to collaborative workspaces to elicit feedback and to share that content with others who are interested in it.

In the context of BI, this turned the first generation of collaborative BI tools on their head: rather than take collaboration to BI, it is far more effective to take BI to collaborative platforms, where the range of collaborative tools available offers a lot more power. Lyzasoft was a pioneer of this new generation of social and collaborative BI technologies. Also, new releases of more widely adopted BI platform products are now being integrated with mainstream collaborative platforms such as Microsoft SharePoint and IBM Lotus Connections. Even cloud-based collaboration technologies from vendors like Google are getting in on the act. Mobile BI technology is taking this further by allowing people to collaborate on BI from mobile devices.

However, I (and others) would argue that we are still seeing only one side of the coin here with respect to BI and collaboration. That side is the classic approach: formal integration of data from multiple sources into a data warehouse, the production of intelligence, and the publishing of BI artefacts (dashboards, reports, etc.) into social and collaborative environments where they can be shared with others, rated and collaborated upon for joint decision making. But what about innovation? What about when innovative business users want to experiment, get some data and ‘play’ with it in a sandbox environment to figure out what business insight might be useful, or which new metrics would be useful to the business? Do we not need collaboration here also? Another probing question is whether this innovation should be ‘upstream’ from a data warehouse. In other words, let them play with the data until there is consensus as to what is useful, and then feed this into a more classic approach of data integration, storage, analysis and sharing. I am comforted by the fact that it is not only me asking this question. Others, like my good friend Barry Devlin, are also talking about the use of collaboration and the sharing of business insight produced in an innovative environment. I know Barry will be speaking about this here. The point is that, in my opinion (and it is admittedly only opinion), there is a place for collaborative and social BI in an innovative sandbox environment where BI is not yet ‘hardened’. We need this capability in many industries; I have come across it in both retail banking and manufacturing, for example. However, what must be controlled is the release of newly formed innovation into production. This is where governance comes in. For example, data governance would allow newly created metrics to be published in a business glossary for use by multiple BI tools in a hardened production environment.
Also at this point, new data sources may be declared to a more formal production DW/BI environment for data acquisition. Therefore we have two sides to collaborative BI: the innovation cycle, which needs to share ‘experimental’ information and elicit feedback from others, and the more formal production BI/DW environment, where well-polished business insight is shared across the enterprise for people to use and act on. One feeds the other, typically because innovators also need to collaborate with IT to take the innovation and move it into the mainstream environment.

Let me know what you are doing with social and collaborative BI. I would be grateful for your comments.

Teradata Strengthens Its Position in the BIG DATA Market

Today, my former employer Teradata (many moons ago: I left 17 years ago!) announced that it is to acquire Aster Data, effectively bolstering its position in the BIG DATA marketplace (see the announcement here). Aster Data has made its mark in the big data market with its well-crafted integration of Hadoop Map/Reduce and the SQL query language, allowing SQL developers to execute massively parallel Map/Reduce analytical functions on the Aster Data platform and leverage the power of Hadoop. Aster Data also has an IDE tool to make it easier for developers unfamiliar with Map/Reduce to generate Hadoop M/R applications (e.g. analytic functions) that can then be automatically deployed in an Aster Data nCluster database and invoked via SQL. Furthermore, Aster Data nCluster supports both row AND column-based storage. Of course, Teradata already has a relationship with Hadoop vendor Cloudera to serve up data from Teradata to Map/Reduce applications running on Cloudera’s CDH platform. It is also working on interfacing Teradata with Cloudera’s Sqoop (part of the Cloudera Enterprise offering) to move data into HDFS via the Teradata Hadoop Connector.

Adding Aster Data to the mix means that Teradata can now potentially integrate with Hadoop deployments in both directions, rather than one way as with the Cloudera partnership. For example, organisations could access Hadoop (Cloudera’s CDH and other offerings) from analytical queries running SQL M/R on Aster Data nCluster, or indeed, I would assume, in future on the Teradata DBMS itself.

There is no question this is a good move for Teradata. It gives them columnar capability, and Aster Data also has a rich library of pre-built Map/Reduce analytic functions to speed up M/R development, which can be invoked from SQL M/R on nCluster. I would have to assume that Teradata would also want to open the Aster Data IDE and the M/R functions up to Teradata developers so they can deploy these M/R functions inside Teradata. That is a no-brainer in my opinion. You would also have to say that this takes Teradata in-database analytics to a new level of depth, opening the door to more sophisticated analytic applications. The Teradata/SAS partnership is a successful one, but adding Aster Data will potentially give Teradata much more power in the in-database analytics area. This is an area that really matters in big data environments. It will also give Teradata more to compete with against IBM, whose acquisitions of Netezza (particularly with its TwinFin iClass appliance) and SPSS have given IBM much more competitive muscle recently, especially against Teradata and Oracle (Exadata). Besides IBM, Aster Data will also give Teradata much more to compete with against Oracle’s Exadata. We will have to wait and see what HP does with Vertica.

In addition, with the tsunami of sensor data coming over the horizon, this acquisition will help Teradata move into the world of sensor data analytics which, by the way, is a battle still to be fought (see my blog on this from last year). Aster Data will help Teradata accommodate the onslaught of data being generated by organisations increasingly instrumenting their business operations with sensor networks. However, in this space, adding CEP vendor technology to the Teradata portfolio would be a good move, as sensor data event correlations need to be acted upon BEFORE that event data is stored in a data warehouse. CEP, Active DW and SQL/MR. Hmm… now that is a combination worth having. It will be interesting to see what is offered across the family of Teradata appliances and whether Teradata decides to roll out nCluster on any of them. I would also think that Teradata will make sure it carefully protects the Aster Data customer base if it brings the DBMS technologies together gradually.
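
To illustrate what acting on correlated sensor events before they are stored might look like, here is a minimal Python sketch of a CEP-style rule evaluated over a sliding time window. The event types, thresholds and 30-second window are invented for the example, and this is not the API of any CEP product; commercial engines express such rules declaratively and handle far higher event rates.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple


@dataclass(frozen=True)
class Event:
    asset_id: str
    kind: str         # hypothetical event types: "temperature" or "vibration"
    value: float
    timestamp: float  # seconds


class OverheatWithVibrationRule:
    """Hypothetical rule: alert when an over-limit temperature reading and an
    over-limit vibration reading for the same asset occur within `window` seconds."""

    def __init__(self, limits: Dict[str, float], window: float) -> None:
        self.limits = limits
        self.window = window
        self.breaches: Dict[str, Deque[Tuple[float, str]]] = {}

    def on_event(self, event: Event) -> Optional[Tuple[str, float]]:
        limit = self.limits.get(event.kind)
        if limit is None or event.value <= limit:
            return None  # unknown or normal readings need no further work
        recent = self.breaches.setdefault(event.asset_id, deque())
        # Evict breaches that have slid out of the time window.
        while recent and event.timestamp - recent[0][0] > self.window:
            recent.popleft()
        recent.append((event.timestamp, event.kind))
        if {kind for _, kind in recent} >= set(self.limits):
            recent.clear()  # fire once per correlated pattern
            return event.asset_id, event.timestamp
        return None


rule = OverheatWithVibrationRule({"temperature": 90.0, "vibration": 5.0}, window=30.0)
stream = [
    Event("pump-7", "temperature", 95.0, 0.0),
    Event("pump-7", "vibration", 3.0, 10.0),    # within limits: ignored
    Event("pump-7", "vibration", 6.2, 20.0),    # completes the pattern within 30 s
    Event("pump-7", "temperature", 96.0, 100.0),
    Event("pump-7", "vibration", 7.0, 200.0),   # too late to correlate with t=100
]
alerts = [alert for alert in (rule.on_event(e) for e in stream) if alert]
print(alerts)  # [('pump-7', 20.0)]
```

In this sketch only the alert, rather than every raw reading, would need to trigger an action or be passed on to the data warehouse.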

My only question is: who will acquire Cloudera, which has partnerships with BI platform vendors and other appliance vendors? That acquisition would pull the rug from under a lot of players.

EDW on a Private Cloud? Steady as you go!

I read James Kobielus’s blog post “Innovation Transforms Data Warehousing” today, which talks about EDW in the Cloud becoming mainstream by the middle of the decade.

My take on this is that configuration management and workload management are critical to large-scale EDWs making it to the Cloud. I am thinking more about PRIVATE cloud than public cloud at this point, as there is already an abundance of DW/BI PaaS and SaaS offerings on public clouds outside the firewall. However, on-premise private cloud deployment is still very young. There is no doubt that small data marts are already moving, but large-scale EDWs are not. Why not? I believe the reason is simple: no one has any experience of how to configure virtual resources so that a production EDW (data integration, DBMS and BI platform) maximises its use of the underlying hardware.

Pervasive Rush To Take On The Challenge of Scalable Data Integration

As a member of the Boulder BI Brain Trust (BBBT), I sat in on a session given by Pervasive Software Chief Technology Officer (CTO) and Executive Vice President Mike Hoskins last week. The session started out covering Pervasive’s financial performance ($47.2 million revenue in fiscal 2010, with 38 consecutive quarters of profitability) before getting into the technology itself. Headquartered in Austin, Pervasive offers its PSQL embedded database, a data and application exchange (Pervasive Business Xchange), and its Pervasive Data Integrator and Pervasive Data Quality products, which can connect to a wide range of data sources using the Pervasive Universal Connect suite of connectors. It also offers a number of data solutions. Pervasive has had success in embedding its technology in ISV offerings and in SaaS solutions on the Cloud. However, what caught my eye in what was a very good session was its new scalable data integration engine, DataRush.

MicroStrategy Takes BI Mobile – What are The Implications of Mobile BI for BI Platforms?

Having just got back from the MicroStrategy World conference in beautiful Cannes, I thought I would cover what was announced at the event this week. CEO Michael Saylor launched MicroStrategy Mobile for iPhone, iPad and BlackBerry, describing it as “the most significant launch in MicroStrategy history”. In his opening keynote he talked about mobile as “the 5th major wave of computing”, in a sequence starting with mainframes, then mini-computers, personal computers, the desktop internet and now the mobile internet. Their vision here is a good one: BI all the time, everywhere and for everyone. Mobile device access to BI has been around for a while in some offerings, but I was impressed with the work MicroStrategy has put into the mobile user interface on touch-sensitive ‘gesture’ devices like Apple iPhones and iPads. They have taken advantage of the full set of Apple gestures and also added BI-specific gestures, including Drill Down and Page By. They have also released an Objective-C software development kit (SDK) for MicroStrategy Mobile. This allows developers to build custom widgets and embed them in the MicroStrategy Mobile application, or to embed MicroStrategy Mobile in their own applications.

BITunes on the Cloud? – The Emergence Of Subscription Based On-Demand BI

As I research more and more into the world of cloud-based BI, it is becoming pretty evident where we are headed. In my opinion we are moving down the road to an iTunes model for BI. Yesterday I spent some time with Actuate in London looking at their BIRT On-Demand platform as a service (PaaS) solution (which is very easy to use). It was only a matter of minutes before I was up and running with a Mashboard. A few weeks back in New Orleans I used Dundas Dashboard to quickly build a dashboard from pre-built components. Similarly, Report Builder 3.0 in Microsoft SQL Server 2008 R2 can be used to quickly build up a library of components that can be dragged and dropped into a report. The more I use these products to understand their capabilities, the more I see a similarity to what is happening in the information management world. Looking at cloud-based data integration solutions like Boomi, Informatica and SnapLogic, for example, you can see that these vendors are trying to create a development platform for Information as a Service. In other words, you build data integration jobs and then make the results available on subscription, so that companies can subscribe to information supplied to them by cloud-based data integration workflows running on the net. So now apply this idea to the BI produced on cloud-based PaaS solutions. Once your reports and dashboards are built, the next thing people are going to want to do is to publish these artifacts as on-demand BI services, assuming the intelligence is of business value to others.

Cloud Based BI – Understanding The Options Is the Biggest Barrier

Last week I was in Munich to present at the annual TDWI (The Data Warehousing Institute) conference on “Business Intelligence and Data Management in a Cloud Computing Environment”. It was a very well attended conference with some great speakers and sessions. My session focused on the following:

  • What is Cloud Computing and why use it as a deployment option?
  • Why Cloud BI? – What are the requirements for a public cloud or externally hosted BI system?
  • Understanding what is on offer – The Cloud BI Marketplace
  • Getting data into a cloud based BI system
  • Managing access to cloud based BI systems and analytic applications
  • Integrating cloud based BI systems with on-premise systems
  • Pros and cons of deploying on the cloud?
  • Getting started with Cloud based BI

Chasm Not Crossed as A Sensor Data Tsunami Comes Over The Horizon

Just over a week ago I spent a day at SensorExpo in Chicago, presenting on Complex Event Processing (CEP) and discussing how CEP engines, predictive analytics and business rules can be used to analyse sensor-emitted event data in motion to facilitate business optimisation. This was a very busy conference. I estimated at least 2,000-3,000 people on the exhibition floor, with maybe 400 at the conference itself. I found around 100 vendors exhibiting all kinds of sensor devices, products and services. To my surprise, however, I had heard of only two of them: IBM and Texas Instruments. The floor was heaving with people looking to instrument their business operations to measure everything from movement, temperature, energy consumption, stress, heat, fluid volumes and pipeline flows to RFIDs. There were analog devices and digital devices. When talking to the vendors, the big common denominator was that they are all trying to collect the data from sensor networks and RFIDs to analyse it. Yet other than IBM there was not a single BI vendor in sight, and not a single complex event processing (CEP) vendor either. I was shocked, because this market is clearly booming. What was even more surprising was that I could not find an IT professional anywhere: 99.9% of all delegates and speakers were engineers.

Microsoft Opens Up Collaborative and Self-Service BI

Just over a week ago I was invited to attend an analyst briefing at the Microsoft BI conference in New Orleans, which was running alongside the Microsoft TechEd conference. The conference itself was very well attended, with several thousand delegates. Several things were on show at this event, including SharePoint 2010, SQL Server 2008 R2, Office 2010, PowerPivot and PerformancePoint Services 2010. Also on show was SQL Server Data Warehousing Edition (also known as the Madison project), the massively parallel edition of SQL Server that will ship later this year.
