Channel: BigData – Xebia Blog
Browsing all 30 articles

Sentiment Analysis using Apache Hive

Apache Hive is a data warehouse system built on top of Hadoop. Using a SQL-like language, you can query data stored in the Hadoop filesystem (HDFS). Those queries are then translated into MapReduce jobs...


Automated Export of Cloudera Manager Configuration for Hadoop

Cloudera Manager is a web-based management application for your Apache Hadoop cluster. It makes installing and configuring your Hadoop cluster a whole lot easier and is free for a cluster...


How to setup MongoDB in production

An interesting story on how to set up MongoDB in production. It talks about 2, 3, 4, even 5 replica sets and their pros and cons, and shows you how these setups deal with reconfiguring primary / secondary...

Design a large scale NoSQL/DataGrid application similar to Twitter.

Design a large scale NoSQL/DataGrid application similar to Twitter with Nati Shalom (Founder and Chief Technology Officer at GigaSpaces). A nice session taking you through some of the challenges that...

Combining Neo4J and Hadoop (part I)

Why combine these two different things? Hadoop is good for data crunching, but the end results in flat files don’t present well to the customer, and it’s hard to visualize your network data in Excel....


Combining Neo4J and Hadoop (part II)

In the previous post Combining Neo4J and Hadoop (part I) we described the way we combine Hadoop and Neo4J and how we are getting the data into Neo4J. In this second part we will take you through the...

Finding important connections in a network – automatically

One of the domains for which data lends itself well to being represented as a graph is trade. We can take any tradable good, represent the trading actors as nodes, and represent the (amount of) traded...

Apache Spark

Spark is the new kid on the block when it comes to big data processing. Hadoop is also an open-source cluster computing framework, but in terms of community contribution, Spark is much more...


DevOps in a data science world

Many organisations have a new ambition to become a data-driven organisation. In essence, this means the organisation wants to make better business decisions based on insights provided by data [4]. Data...


A data-platform is just a normal platform

A data-platform is nothing more than a normal (cloud) platform with some additional functionality on top to make it specific to the requirements of the data domain. Instead of the applications that run...
