Friday, May 31, 2013

Unsupervised Machine Learning, Most Promising Ingredient Of Big Data


Orange (France Telecom), one of the largest mobile operators in the world, issued a challenge called "Data for Development" by releasing a dataset of its subscribers in Ivory Coast. The dataset contained 2.5 billion records of calls and text messages exchanged between 5 million anonymous users in Ivory Coast, Africa. Various researchers got access to this dataset and submitted proposals on how the data could be used for development purposes in Ivory Coast. It would be an understatement to say these proposals and projects were mind-blowing. I have never seen so many different ways of looking at the same data to accomplish so many different things. Here's a book [very large pdf] that contains all the proposals. My personal favorite is AllAboard, where IBM researchers used the cell-phone data to redraw optimal bus routes. The researchers used several algorithms, including supervised and unsupervised machine learning, to analyze the dataset, resulting in a variety of scenarios.

In my conversations and work with CIOs and LOB executives, the breakthrough scenarios always come from a problem that they didn't even know existed or could be solved. For example, the point-of-sale data that you use for your out-of-stock analysis could reveal new hyper segments that you didn't even know existed, using clustering algorithms such as k-means, and could also help you build a recommendation system using collaborative filtering. The data that you use to manage your fleet could help you identify outliers or unproductive routes using SOM (self-organizing maps) with dimensionality reduction. Smart meter data that you use for billing could help you identify outliers and prevent theft using a variety of ART (Adaptive Resonance Theory) algorithms. I see endless scenarios based on a variety of unsupervised machine learning algorithms, similar to using cell phone data to redraw optimal bus routes.
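To make the clustering idea concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. The shopper data, the two features (average basket value and visits per month), and the function name `kmeans` are all made up for illustration; they are not taken from any real point-of-sale system, and a real analysis would scale the features and use a tested library.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


# Hypothetical shoppers: (average basket value, visits per month).
shoppers = [
    (12.0, 9), (15.0, 8), (11.0, 10),   # small baskets, frequent visits
    (95.0, 1), (110.0, 2), (102.0, 1),  # large baskets, rare visits
]

centroids, labels = kmeans(shoppers, k=2)
for shopper, label in zip(shoppers, labels):
    print(shopper, "-> segment", label)
```

No labels are supplied anywhere: the two segments fall out of the data, which is the point of the unsupervised approach. The number of segments `k` is still something you have to choose or explore.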

Supervised and semi-supervised machine learning algorithms are equally useful, and I see them complementing unsupervised machine learning in many cases. For example, in retail, you could start with k-means to unearth new shopping behavior and end up with Bayesian regression followed by exponential smoothing to predict future behavior based on targeted campaigns to further monetize this newly discovered shopping behavior. However, unsupervised machine learning algorithms are by far the best that I have seen at unearthing breakthrough scenarios, because by their very nature they do not require you to know a lot of details (labels) upfront about the data to be analyzed. In most cases you don't even know what questions you could ask.
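As a rough illustration of the forecasting end of that pipeline, here is a sketch of simple exponential smoothing in Python. The weekly sales figures and the smoothing factor `alpha` are invented for the example, and this is the basic single-parameter variant, not a full forecasting model with trend or seasonality.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.

    Each smoothed value is alpha * observation
    + (1 - alpha) * previous smoothed value.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must not be empty")
    smoothed = [series[0]]  # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical weekly sales for one newly discovered shopper segment.
weekly_sales = [100, 110, 105, 120]

smoothed = exponential_smoothing(weekly_sales, alpha=0.5)
# The last smoothed value serves as the one-step-ahead forecast.
next_week_forecast = smoothed[-1]
print(smoothed, next_week_forecast)
```

A higher `alpha` reacts faster to recent changes, such as a campaign, while a lower `alpha` smooths out noise at the cost of lagging behind.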

Traditionally, BI has been built on pillars of highly structured data with well-understood semantics. This legacy has led most enterprise people to operate with a narrow mindset: I know the exact problem that I want to solve, I know the exact question that I want to ask, and Big Data is going to make all this possible and even faster. This is the biggest challenge that I see in embracing and realizing the full potential of Big Data. With Big Data there's an opportunity to ask a question that you never thought or imagined you could ask. Unsupervised machine learning is the most promising ingredient of Big Data.

Wednesday, May 22, 2013

Lead, Follow, Or Get Out Of The Way



If you have been following this blog you would know that I mainly blog about enterprise software, cloud, and big data, with a few occasional posts on design and design thinking. That's what I am most passionate about. Having spent my entire career building enterprise software, I have realized that success and competitive differentiation in the marketplace boil down to an organization's unique ability to get three things right, where management plays a key role: 1) people who can continuously learn and adapt to change, 2) processes that are nimble and evolve as the company evolves, and 3) products that solve a real problem and delight the end users. While I continue to blog about enterprise software, I have decided to evolve this blog further by adding a few management posts going forward.

There is a series of management topics that I am interested in, but let's start with the basic one, which is my core management philosophy. My management philosophy is "lead, follow, or get out of the way." In any situation I ask myself whether I should be leading or following someone else's lead and extending my full support to them. If neither makes sense, I simply get out of the way and let people do their job. Building, selling, and supporting software, like many other things, requires a loosely connected (to put it in software terms) organization where there are leaders who lead and follow other leaders at the same time. This gets more and more complicated as the size and portfolio of an organization grow over the years. People draw artificial boundaries and lose sight of the mission and the big picture.

Leading is hard, following is harder, and getting out of the way is the hardest, because it requires a conscious attempt to empower people to do their job without getting in their way. But it is an approach that does work, and I encourage you to try it out and share it with others.

Photo courtesy: Pison Jaujip