Traditional RDBMS & New Data Processing
Over the past two decades, relational databases have been highly successful
in serving large-scale OLTP and OLAP applications across enterprises. However,
in the past couple of years, the advent of Big Data processing, especially the
need to process massive quantities of unstructured data, has made the industry
look into non-RDBMS solutions. This has led to the popularity of NoSQL
databases as well as massively parallel processing frameworks.
However, the traditional RDBMS products were quick to react and added several
Big Data features to their offerings, so that enterprises with a heavy
investment in traditional RDBMS can have the best of both worlds by properly
leveraging these new features.
The following sections provide ideas about Big Data features in the popular
SQL Server databa... (more)
Master Data Management (MDM) is an important aspect of data governance in
enterprises. MDM enables the development of a "Single Version of Truth" by
providing common descriptions for enterprise-wide entities.
Need for MDM in Big Data Processing
Before Big Data, enterprises generally managed their transaction data in
traditional relational databases. One of the biggest strengths of relational
databases is their ability to enforce constraints like check constraints,
primary key, foreign key, etc., which ensure that the data captured i... (more)
Microsoft recently announced the PolyBase technology as part of the SQL Server
2012 Parallel Data Warehouse solution. PolyBase is a new technology in the
data processing engine of SQL Server 2012 Parallel Data Warehouse, designed
as a simple way to combine non-relational data and traditional relational
data in your analysis.
PolyBase is part of an overall Microsoft "Big Data" solution that already
includes HDInsight (a 100% Apache Hadoop-compatible distribution for Windows
Server and Windows Azure), Microsoft Business Intelligence, and SQL Server
2012 Parallel Data ... (more)
Data Warehouse as a Service
Amazon recently announced the availability of Redshift, a data warehouse
offered as a service, in beta. Amazon Redshift is a fast, fully managed,
petabyte-scale data warehouse service that makes it simple and cost-effective
to efficiently analyze all your data using your existing business
intelligence tools. It's optimized for datasets ranging from a few hundred
gigabytes to a petabyte or more and costs less than $1,000 per terabyte per
year, a tenth the cost of most traditional data warehousing solutions.
Architecture Behind Redshift
Any data warehouse ... (more)
Big Data & Text Analytics
As the analysis of large amounts of unstructured data gains a major place in
enterprise computing, we are seeing more use cases emerge in this area. While
the term "Big" in Big Data makes it nearly synonymous with Massively Parallel
Processing frameworks like Hadoop, the underlying success of Big Data relies
on effective content analytics of the unstructured data. I highlighted this
idea in my earlier article, Big Data Analytics Thinking Outside Of Hadoop.
U... (more)