Big data is a concept that's been as widely hyped as cloud computing, and perhaps just as misunderstood with regard to its capabilities and limitations. One poorly understood aspect of big data is how existing databases can be used alongside data storage engines that are non-relational in nature.
What's involved in moving data from a relational database management system (RDBMS) to distributed systems? And, perhaps of more interest to IT staff, what's the best way to learn about these big data systems and determine how to use them in an organization?
Currently, the most popular example of a non-relational database management system (NDBMS) is probably Apache Hadoop, a distributed data framework that seems to be the poster child for big data and so-called NoSQL databases. But even those descriptions obscure the true nature of Hadoop and how it works. What is Hadoop, really, and how can businesses and IT staffers start using it? Which businesses should use Hadoop and where can you find resources for implementing it...?