
“I just want to say one word to you. Just one word.
Yes, sir.
Are you listening?
Yes, I am.
Plastics.”
– The Graduate
We are undergoing a renaissance in the business intelligence space.
It started with companies like QlikTech and Tableau, which sought to empower departments and analysts frustrated with the lengthy, inflexible, and costly implementations of monolithic, traditional BI solutions (this movement is often referred to as ‘self-service BI’, part of the broader ‘consumerization of enterprise software’ trend).
More recently, companies such as SiSense, Platfora, EdgeSpring, GoodData, and others have taken up the cause, promising new insights with unprecedented speed, flexibility, and/or scale.
While I am excited by the flurry of activity and innovation in this space, there are glaring gaps that have yet to be addressed.
The Problem
Before data can be analyzed, it must first be prepared (consolidated, cleansed, munged, etc.). Curt Monash wrote a great post about this recently.
Furthermore, a major challenge for SaaS-based BI vendors is convincing/enabling their customers to upload their data (typically located on-premise) to the cloud for analysis. This has been such a problem that some vendors that started off with cloud-based solutions have pivoted to on-premise-only deployments.
The Idea
Given these challenges, someone should create ‘data management as a service’.
The idea is fairly simple (although there are some technical challenges to be sure):
- A customer stages documents (.csv, .txt, .xls, .xml, etc.) in a local folder
- The documents are uploaded to the cloud via a Dropbox-like service
- After receiving notification that the documents have been successfully uploaded and processed, the customer logs in for review
- The service identifies patterns across documents and suggests rules to be applied (normalization, classification, error handling/correction, etc.)
- The customer accepts/rejects/modifies these suggestions and defines data transformation rules as well as output format and output location
- The customer saves this ‘job’ and specifies how frequently it runs (daily, weekly, monthly, whenever a new file is added to the local staging folder, etc.); a sketch of what such a job definition might look like follows this list
- The service is priced according to the amount and frequency of data processed with additional services available such as archiving, advanced processing/transformation, multiple output locations, etc.
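To make the ‘job’ concrete, here is a minimal sketch in Python of what a saved job definition might look like. Everything in it (the DataJob and Rule classes, the action names, the locations) is a hypothetical illustration, not an existing API:

```python
# A minimal sketch of a saved 'job'; all names are illustrative.
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One transformation rule, suggested by the service or defined by the customer."""
    column: str
    action: str                 # e.g. "normalize_date", "classify", "drop_if_empty"
    params: dict = field(default_factory=dict)


@dataclass
class DataJob:
    """Everything the service needs to re-run a customer's pipeline unattended."""
    name: str
    staging_folder: str         # local folder watched by the Dropbox-like agent
    input_formats: list         # e.g. [".csv", ".xls", ".xml"]
    rules: list                 # the accepted/modified Rule objects
    output_format: str          # e.g. "csv" or a BI-tool-specific format
    output_location: str        # a local folder or a cloud bucket
    schedule: str               # "daily", "weekly", "monthly", "on_new_file", ...


job = DataJob(
    name="monthly-sales-cleanup",
    staging_folder="C:/staging/sales",
    input_formats=[".csv", ".xls"],
    rules=[
        Rule("order_date", "normalize_date", {"format": "%Y-%m-%d"}),
        Rule("region", "classify", {"map": {"NE": "Northeast"}}),
        Rule("email", "drop_if_empty"),
    ],
    output_format="csv",
    output_location="s3://customer-bucket/prepared/",
    schedule="on_new_file",
)
```

The point is that the job captures everything needed to re-run the pipeline without further customer involvement: where to look for files, which rules to apply, and where the output goes.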
The Benefits
‘Data management as a service’ would solve two major points of friction around self-service BI:
- Data access – Previously, users were beholden to IT to access ‘system of truth’ data. They also needed IT or outside consulting services to format data before analysis could begin. With this service, users could exploit ‘good enough’ data they obtained through various means (spreadsheets, reports, screen-scraping, etc.) and control the frequency of data updates.
- Choice – Companies that wanted to leverage SaaS-based BI solutions could specify that the output be pushed to a specific location in the cloud instead of back to a local folder (in fact, part of the transformation performed could be abstracting/masking sensitive data to address concerns about storing this type of data outside the firewall; a sketch of this follows below). Customers could also store their prepared data independently of their BI solution to avoid vendor lock-in.
I could even see having pre-defined output formats according to the BI tool of choice (e.g., QlikView format, etc.).
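On the masking point, here is a minimal sketch of what that step could look like, assuming it runs in the local agent before upload and the sensitive columns are known in advance; the file and column names are hypothetical:

```python
# A minimal sketch: replace sensitive values with non-reversible tokens
# before the file leaves the firewall. File/column names are hypothetical.
import csv
import hashlib

SENSITIVE = {"customer_name", "email"}


def mask(value: str) -> str:
    """Replace a sensitive value with a stable, opaque token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


with open("orders.csv", newline="") as src, \
     open("orders_masked.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in SENSITIVE & set(row):   # mask only the sensitive columns present
            row[col] = mask(row[col])
        writer.writerow(row)
```

A real service would want salted hashing or a token vault rather than a bare truncated hash (unsalted hashes of emails can be reversed by dictionary attack), but the principle holds: sensitive values never leave the firewall in the clear, while the masked tokens remain stable enough to join on.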
What do you think about ‘data management as a service’? Does something like this already exist? Let me know your thoughts.
Ken,
I’m quite late replying to this, I see, but I immediately think “you’re on to something” here.
Have a look at dataloader.io, which does a form of this for integration with Salesforce.com (more operational than analytic). I don’t think it has much of the data cleansing/correction type stuff, and Salesforce is the only target supported.
One advantage to doing this as a service is that your service should very quickly learn to suggest data cleansing/data quality rules to customer N based on rules created by customers 1..N-1 — or even to cleanse and correct certain types of data by comparison with other customers’ address databases (a tiny sketch of this follows below). There is some parallel to Boomi Suggest here: as Boomi’s users design mappings from, e.g., a Salesforce customer to a MyFavoriteSaaS customer, future Boomi users are offered the same data mappings as a labor-saver. You could also offer:
— value-added data correction based on third-party services: “For an extra $5 per 1000 customers, I can verify your incoming customer address data against Melissa Data,” etc.
— value-added master data capture: “We can build the great data hub in the sky for you off of this integration process,” etc.
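To make the customers 1..N-1 idea concrete, here is a minimal sketch of similarity-based rule suggestion; the prior-rules store, column names, and rule names are all invented for illustration:

```python
# A minimal sketch: suggest rules to a new customer based on rules that
# earlier customers applied to similarly named columns. All data is invented.
from collections import Counter
from difflib import SequenceMatcher

# Rules previously accepted by customers 1..N-1, keyed by column name.
PRIOR_RULES = {
    "order_date": ["normalize_date"],
    "cust_email": ["lowercase", "drop_if_empty"],
    "zip": ["pad_zip_to_5"],
}


def suggest_rules(new_column: str, threshold: float = 0.6) -> list:
    """Suggest rules used on similarly named columns by earlier customers."""
    scores = Counter()
    for seen_column, rules in PRIOR_RULES.items():
        similarity = SequenceMatcher(None, new_column.lower(), seen_column).ratio()
        if similarity >= threshold:
            for rule in rules:
                scores[rule] += similarity
    return [rule for rule, _ in scores.most_common()]


print(suggest_rules("customer_email"))  # ['lowercase', 'drop_if_empty']
```

A real implementation would match on value distributions and data types as well as column names, but even this naive version gets smarter with every customer who defines a rule.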
Hmm. You could white-label this as a service to be offered by SaaS analytics providers, for whom a lot of the cruder integration methods would be more than they’d want to support. You would offer to provide the “first mile” for getting on-premise data into a form that goes into their analytics/reporting engine.
Fun! (I’d love to hear your reaction.)
Thanks. I will definitely take a look at dataloader.io.
Agreed. SaaS vendors typically try to hide or shore up weaknesses as compared to on-premise solutions, but there are definitely opportunities to ‘play offense’ by delivering additional value that is difficult/impossible for traditional vendors to deliver. As you mentioned, one possible way is to use ‘the wisdom of crowds’ to make the service smarter/better through increased customer usage.