Monday, March 21, 2016

wah so big



saosebastiao 2 days ago

This is so true. I do business intelligence at Amazon, and I've seen this play out millions of times over. The fetishization of big data ends up meaning that everybody thinks their problem needs big data. After 4 years in a role where I am expected to use big data clusters regularly, I've really only needed them twice. To be fair, in a complex environment with multiple data sources (databases, flat files, Excel docs, service logs), ETL can get really absurdly complicated. But that is still no excuse to introduce big data if your data isn't actually big.
I really hate pat-myself-on-the-back stories, but I'm really proud of this moment, so I'm gonna share. One time a principal engineer came to me with a data analysis request and told me that the data would be available to me soon, only to come to me an hour later with the bad news that the data was 2 terabytes and I'd probably have to spin up an EMR cluster. I borrowed a spinning disk USB drive, loaded all the data into a SQLite database, and had his analysis done before he could even set up a cluster with Spark. The proud moment comes when he tells his boss that we already had the analysis done despite his warning that it might take a few days because "big data". It was then that I got to tell him about this phenomenal new technology called SQLite and he set up a seminar where I got to teach big data engineers how to use it :)
P.S. If you do any of this sort of large dataset analysis in SQLite, upgrade to the latest version with every release, even if it means you have to run make; make install. Seemingly every new release since about 3.8.0 has given me usable new features and noticeable query optimizations that are relevant for large-dataset analysis.
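
For anyone who wants to try the approach from the story above, here is a minimal sketch of streaming a flat file into SQLite and querying it. The comment does not say how the data was actually loaded, so this is only one plausible way to do it with Python's standard library. The load_csv helper, the "requests" table, and the sample columns are all made up for illustration.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv(conn, csv_file, table, batch_size=50_000):
    """Stream rows from a CSV file object into a SQLite table in batches.

    Hypothetical helper: the first CSV row is used as the column names.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')

    total = 0
    while True:
        # Read a bounded batch so memory use stays flat on large files.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        with conn:  # commit one transaction per batch
            conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', batch)
        total += len(batch)
    return total


# Made-up sample data standing in for a real export file.
sample = io.StringIO("service,latency_ms\napi,120\napi,80\nweb,300\n")

# ":memory:" keeps the demo self-contained; for a large dataset you would
# pass a file path instead (for example, one on an external drive).
conn = sqlite3.connect(":memory:")
rows_loaded = load_csv(conn, sample, "requests")

# Columns were created without types, so values are stored as text;
# cast before doing arithmetic on them.
summary = conn.execute(
    "SELECT service, COUNT(*), AVG(CAST(latency_ms AS REAL)) "
    "FROM requests GROUP BY service ORDER BY service"
).fetchall()
print(rows_loaded, summary)
```

Batching the inserts inside explicit transactions matters more than anything else here, since committing row by row is usually what makes bulk loads into SQLite slow.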

Buttons840 2 days ago

Me and a coworker were laughing at the parent comment, and I told him:
"I guarantee that somewhere, sometime, an engineer has been like 'hey guys, I loaded our big data into SQLite on my laptop and it ended up being faster than our fancy cluster'." We then joked that the engineer would be fired a few weeks later for not being a "cultural fit". A few minutes later you commented with your story. I hope you didn't get fired? :)

