Posts Tagged ‘Data Integration’

Pervasive Rush To Take On The Challenge of Scalable Data Integration

Wednesday, August 18th, 2010

As a member of the Boulder BI Brain Trust (BBBT), I sat in on a session given last week by Mike Hoskins, Chief Technology Officer (CTO) and Executive Vice President of Pervasive Software. The session started by covering Pervasive's financial performance, with revenue of $47.2 million (Fiscal 2010) and 38 consecutive quarters of profitability, before getting into the technology itself. Headquartered in Austin, Pervasive offers its PSQL embedded database, a data and application exchange (Pervasive Business Xchange), and its Pervasive Data Integrator and Pervasive Data Quality products, which can connect to a wide range of data sources using the Pervasive Universal Connect suite of connectors. It also offers a number of data solutions. Pervasive has had success in embedding its technology in ISV offerings and in SaaS solutions on the Cloud. However, what caught my eye in what was a very good session was its new scalable data integration engine, DataRush.

I have had concerns for some time about how data integration tools are going to step up to the challenge of big data. We are already in an era where hundreds of terabytes and even petabytes are a reality in data warehouses. The volume of data needed for web analytics is also massive, let alone the tsunami of sensor data that is coming over the horizon (if that data ever makes it into a data warehouse, we really are going to redefine large). There is no doubt that data volumes, and the number of data sources that businesses need to integrate data from, will keep rising. We all talk about how important data warehouse appliances, columnar compression and scalable MPP databases are for handling large data volumes. But what about data integration? There is not much point in being able to manage big data in databases if we can’t get the data in there in the first place. So data integration vendors have to step up to this challenge. Of course, several already have. Many products on the market have offered pipeline parallelism during ETL processing for many years. We have also seen many vendors switch to an ELT model for better performance, so that they can exploit parallel SQL in the target DBMS engine. That, of course, makes the data integration engine dependent on the parallel DBMS. This is fine as long as the data integration workload can be fenced off and managed separately from query processing workloads by a DBMS workload manager. But is that it? What about the fact that modern hardware consists of multi-processor, multi-core systems? Can a data integration engine itself not exploit this without relying on a DBMS? Is there any way to get more bang for your buck on this kind of hardware?

Step in Pervasive DataRush. This new engine from Pervasive is designed from the ground up as an MPP data integration engine that can exploit every core on a multi-core, multi-processor server. This means that you can potentially avoid the need to go to clustered hardware, because you can scale up before you need to scale out. The DataRush architecture is shown below.


Source: Pervasive Software

What is interesting about this is not just that the DataRush engine exploits multiple cores but that it also has an analytics library and other plug-in modules. The one that caught my eye was the DataRush Recommender module. It strikes me that this engine (which can also be extended to support user-defined libraries) not only has the capability to integrate data in parallel but can also analyse that data using analytical models (data mining models) at the same time. Couple that with the DataRush Recommender module and we are bordering on complex event processing (CEP). It seems we just need a rules engine in there and all of a sudden we are into massively parallel CEP. Given that data integration is already rules driven, it certainly looks to me as though this product could potentially go well beyond just doing integration in parallel, as important as that need is. Pervasive has also made its products available in a PaaS offering on the Amazon EC2 Cloud as well as offering them on-premise, which means that they can integrate and clean data from inside and outside the enterprise. Given that you can also embed the technology, it is certainly worth a look. I plan to cover it in more detail in my upcoming Enterprise Data Governance and Master Data Management class running in London on September 22-24.
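
For readers who want a feel for what "scale up before you scale out" means in practice, here is a minimal sketch in Python of a transformation step spread across the cores of a single server. It is not DataRush code and says nothing about how Pervasive implements its engine; the function names, the round-robin partitioning and the toy name-cleansing rule are all assumptions made up for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import os


def partition(rows, n_parts):
    """Split rows into n_parts round-robin slices (hypothetical scheme)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def clean_partition(rows):
    """Illustrative transform: trim and upper-case names, drop empty ones."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().upper()
        if name:
            cleaned.append({"id": row["id"], "name": name})
    return cleaned


def run_parallel(rows, workers=None, executor_cls=ProcessPoolExecutor):
    """Run clean_partition on each slice in its own worker, then merge.

    With the default ProcessPoolExecutor each slice is handled by a separate
    process, so the work can be spread over the available cores. A
    ThreadPoolExecutor can be passed instead for I/O-bound steps.
    """
    workers = workers or os.cpu_count() or 1
    parts = partition(rows, workers)
    with executor_cls(max_workers=workers) as pool:
        results = pool.map(clean_partition, parts)
        merged = [row for part in results for row in part]
    # Restore a deterministic order after the parallel step.
    return sorted(merged, key=lambda r: r["id"])
```

Calling `run_parallel(rows)` with a list of `{"id": ..., "name": ...}` dictionaries uses one worker per core reported by the operating system. On platforms that start worker processes by spawning, the call should sit under an `if __name__ == "__main__":` guard. A real engine would also stream data between steps rather than hold whole partitions in memory, which this sketch does not attempt.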

Battle Still Raging For Data Integration Leadership

Monday, November 19th, 2007

Last month Informatica and IBM, both long regarded as among the leaders in the data integration market, announced further enhancements to their products in an attempt to keep their noses in front in this market. Informatica announced the 8.5 release of PowerCenter and PowerExchange, while IBM announced further extensions to its Information Server suite of data management tools. The Informatica 8.5 announcement includes the following:

  • PowerExchange real-time change data capture
  • Integration of data quality services with SAP operational applications for on-demand data quality as you use the SAP applications (currently this is for name and address data only)
  • An overhaul of the PowerCenter Metadata Manager to provide search, filtering and personalisation capabilities. This also includes the ability for users to annotate metadata
  • A data masking option to keep secure data safe when generating test data
  • Re-entrant data services and parallelised data quality for more scalability (this adds to the grid computing and the push down optimisation support added in release 8)
  • A Data Quality Assistant to allow Data Stewards to participate in data integration workflows so as to review and edit poor quality data records
  • Web based data quality reports and dashboards
  • Pre-built Data Migration tool on top of the Informatica platform to address this kind of data integration problem

The involvement of data stewards in the data quality process through the new Data Quality Assistant, and the Pre-built Data Migration tool, certainly stand out as differentiators. The latter is the beginning of a trend among data management vendors, in that it introduces the first of potentially multiple patterns on top of the Informatica data management platform.

Not to be outdone, IBM responded with the following enhancements to Information Server:

  • A new look WebSphere Business Glossary
  • A new product WebSphere Business Glossary Anywhere to access business metadata from your mobile device
  • Integration of Information Server and WebSphere Customer Center Master Data Management
  • A New Multi-Domain Master Data Management Server with pre-built integration with Information Server
  • WebSphere QualityStage pre-built integration with SAP and Siebel
  • Information Server Fast Track, which automatically generates DataStage ETL jobs (this adds to IBM's ability to automatically generate EII virtual views and mappings for their WebSphere Federation Server)
  • Metadata Asset Interchange to move metadata between different instances of Information Server (e.g. Development, Test and Production)
  • Data Masking for data security
  • Column level impact analysis
  • Real-time Change Data Capture and Replication (from the acquisition of DataMirror)
  • Pre-built integration of Information Server with WebSphere Integration Developer (WID) and WebSphere Portlet Factory
  • Enhancements to Information Services Director so as to invoke information services via multiple interfaces including web services, EJB, JMS, REST (XML/JSON) and RSS
  • More connectivity
  • The introduction of the Information Server Blade, i.e. Information Server pre-installed with Tivoli Workload Scheduler LoadLeveler on an IBM HS21 BladeCenter hardware blade

It seems these two vendors in particular are fighting it out at the head of the pack. Oracle, SAP and Microsoft will have to move fast if they are to keep pace.

Enterprise Information Integration Products to Watch

Wednesday, January 17th, 2007

This blog entry is co-authored by Mike Ferguson and William McKnight and is being cross-posted on our blogs.

Many questions arise about which vendors supply solutions in the marketplace for enterprise information integration (EII).

There are two main types of EII vendors in the marketplace:


1. Model driven federated query EII vendors
2. ETL tool vendors providing EII via data integration services built using traditional graphical data flows and published as web services

In the first category, the vendors with federated query EII products include:

  • Business Objects Data Federator
  • BEA AquaLogic Data Services
  • Composite Software Information Server
  • Denodo
  • IBM WebSphere Federation Server (formerly WebSphere Information Integrator) and IBM Information Server
  • Ipedo XIP
  • MetaMatrix
  • Sybase Avaki
  • XAware

It is also the case that several ETL data integration vendors have extended their data integration tools to support EII. These vendors include:

  • Ab Initio
  • Business Objects – Data Integrator
  • IBM – WebSphere DataStage SOA Edition
  • Informatica – PowerCenter
  • Microsoft – SQL Server Integration Services
  • Oracle – Warehouse Builder
  • SAS – Data Integration Studio