Monday, March 23, 2015

Hadoop Scalability Challenges

Hadoop is hot, not because it necessarily represents cutting-edge technology, but because it's being rapidly adopted by more and more companies as a way to engage with the big data trend. It may be coming to your company sooner than you think.

The Hadoop framework is designed to facilitate the parallel processing of massive amounts of unstructured data. Originally intended to be the basis of Yahoo's search engine, it is now an open-source Apache project. Because Hadoop has a broad range of corporate users, a number of companies now offer commercial implementations of it.

However, certain aspects of Hadoop performance, especially scalability, are not well understood. These include:

  1. So-called flat development scalability
  2. Super scaling performance
  3. New TPC big data benchmark

See "Hadoop Superlinear Scalability: The Perpetual Motion of Parallel Performance" for a more detailed discussion.
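To make item 2 a little more concrete: a common way to quantify scalability is the speedup ratio S(p) = T(1) / T(p), where T(1) is the runtime on a single node and T(p) is the runtime on p nodes. Scaling is called superlinear when S(p) > p, i.e., adding nodes appears to help more than proportionally. The minimal Python sketch below only illustrates that definition. The function names and the runtimes are hypothetical, made up for illustration, and are not measurements from any Hadoop cluster.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup S(p) = T(1) / T(p): single-node runtime divided by p-node runtime."""
    return t1 / tp


def is_superlinear(t1: float, tp: float, p: int) -> bool:
    """True when S(p) exceeds the node count p, i.e. better than linear scaling."""
    return speedup(t1, tp) > p


# Hypothetical runtimes in seconds (illustrative only, not measured data).
t1 = 1200.0
timings = {2: 650.0, 4: 280.0, 8: 140.0}

for p, tp in sorted(timings.items()):
    s = speedup(t1, tp)
    label = "superlinear" if is_superlinear(t1, tp, p) else "linear or sublinear"
    print(f"p={p}: S={s:.2f} ({label})")
```

With these made-up numbers, the 2-node run is sublinear while the 4- and 8-node runs cross the S(p) = p line, which is the kind of result that looks like "free" performance and calls for a closer explanation.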

Friday, March 20, 2015

Performance Analysis vs. Capacity Planning

This question came up in a members-only LinkedIn discussion group:
I often find misconceptions about these terms. I'm sure this must be written in a book somewhere, but for informal discussions it is always preferable to cite sources from standards institutes or IT industry authorities.

Thanks in advance

Gian Piero
Here's how I answered it.

Monday, March 9, 2015

Guerrilla Training: New Location

Finally! We have a new location for our Guerrilla training classes in Pleasanton, California: Sheraton Four Points.

Last year we had some complaints about nighttime noise from the car parks of nearby restaurants at the previous location. Four Points is much more secluded. It also has its own restaurant, which some of you will recognize if you've attended previous Guerrilla classes (more than likely, we had lunch and/or dinner there).

The current 2015 schedule and registration page is now posted. The classroom is intimate and only holds about 10-12 people, so book early, book often.