Sentiment Analysis using Apache Hive
Apache Hive is a data warehouse system built on top of Hadoop. Using a SQL-like language, you can query data stored in the Hadoop filesystem (HDFS). Those queries are then translated into MapReduce jobs...
Automated Export of Cloudera Manager Configuration for Hadoop
Cloudera Manager is a web-based management application for your Apache Hadoop cluster. It makes the installation and configuration of your Hadoop cluster a whole lot easier and is free for a cluster...
How to setup MongoDB in production
Interesting story on how to set up MongoDB in production. It talks about 2, 3, 4, even 5 replica sets, their pros and cons, and shows you how these setups deal with reconfiguring primary / secondary...
Design a large scale NoSQL/DataGrid application similar to Twitter.
Design a large scale NoSQL/DataGrid application similar to Twitter with Nati Shalom (Founder and Chief Technology Officer at GigaSpaces). Nice session taking you through some of the challenges that...
Combining Neo4J and Hadoop (part I)
Why combine these two different things? Hadoop is good for data crunching, but the end results in flat files don’t present well to the customer. It’s also hard to visualize your network data in Excel....
Combining Neo4J and Hadoop (part II)
In the previous post Combining Neo4J and Hadoop (part I) we described the way we combine Hadoop and Neo4J and how we are getting the data into Neo4J. In this second part we will take you through the...
Finding important connections in a network – automatically
One of the domains for which data lends itself well to be represented as a graph is trade. We can take any tradable good, represent the trading actors as nodes, and represent the (amount of) traded...
Apache Spark
Spark is the new kid on the block when it comes to big data processing. Hadoop is also an open-source cluster computing framework, but in terms of community contribution, Spark is much more...
DevOps in a data science world
Many organisations have a new ambition to become a data-driven organisation. In essence, this means the organisation wants to make better business decisions based on insights provided by data [4]. Data...
A data-platform is just a normal platform
A data-platform is nothing more than a normal (cloud) platform with some additional functionality on top to make it specific to the requirements of the data domain. Instead of the applications that run...