Today is the first anniversary of the MIOsoft blog: the first post was on December 5, 2014. While we're looking back over the past year as we celebrate this milestone in the life of the blog, we're also looking forward to another year of sharing our thoughts with you here.
Over the past week, we've been featuring some of our favorite posts from the past year on social media, with introductory comments from the post authors. If you missed any of it, you can catch up below.
On Monday, we featured The Nexus of Big Data: Context. From the author:
Big data still has big problems. As Hadoop installations mature, companies are beginning to understand that their data lakes require far too many tradeoffs. To address efficiency concerns, it seems everyone is looking to Spark. Though Spark is a promising step in the right direction, it’s still very new. It will take time for the learning curve to flatten out and for all the necessary features to be developed and stabilized.
That still leaves the problem of how to make sense of all the data. Companies need to be able to transform messy, incomplete data into high quality, actionable information. That’s where the Context story begins.
Read The Nexus of Big Data: Context here.
On Tuesday, we featured Data First: Never Trust a Dashboard. From the author:
Imagine you were asked to design the dinosaurs for Jurassic Park. You would rely on ecologists, anatomists, and many other experts to be accurate. In the world of data systems, we often don’t have the luxury of relying on expert work done before us. We have to be everything from the all-star data paleontologist to the talented artist. And that also means we don’t have a global community of our peers reviewing our work.
In the Data First series, we discuss some of the core tenets that data systems projects need to follow to be successful and withstand scrutiny. In Never Trust a Dashboard, we focus on the importance of being able to get back to the assumptions and raw data you used when creating high-level information.
This post is a classic: ensuring that information and the systems that create it are accurate and reliable in the scientific sense is something that will always be important. And as we see increasingly complex systems, including those in fields like artificial intelligence and machine learning, the ideas in Never Trust a Dashboard are perhaps more timely than ever.
Read Data First: Never Trust a Dashboard here.
On Wednesday, we featured Rethinking the nature of repeating data. From the author:
The day before Thanksgiving, another client reminded me once again how important repetitions are to data wrangling. They’ve been struggling to produce a file for another vendor, partly because they need to turn vertical repetitions into fixed horizontal columns: the opposite of the scenario I presented in Rethinking the nature of repeating data.
Since our toolset supports repetitions in a universal fashion, it’s easy to go either way.
Read Rethinking the nature of repeating data here.
On Thursday, we featured Accident Explorer: Machine learning with traffic accident data. From the author:
Our technology allows us to do a lot of cool stuff, but it can be challenging to explain what that stuff is and why it’s cool to friends and family. Luckily, we can use examples like Accident Explorer to show how our technology can turn ‘data’ into ‘benefits.’ (And to identify where it might be wise to keep especially alert while driving.)
But Accident Explorer is an especially great example because of the how-to-define-a-cluster and missing-cluster issues discussed in the post. It’s an approachable way to show that even if you’re lucky enough to have relatively clean, valid data, you still have work ahead before you can get something meaningful out of it. Here at MIOsoft, we can help you get there.
When the 2014 FARS data comes out, we’ll be loading that into Accident Explorer too, and looking to see how that extra year of data changes things. If we find anything interesting, we’ll let you know!
Read Accident Explorer: Machine learning with traffic accident data here.
On Friday, we featured Snap: Programming for everyone. From the author:
Writing this post went pretty smoothly; the surprises came later. I published the post on the blog in late February, then posted on our social media channels about it. The Snap post got a fairly typical number of views over the next few months, in line with views on our previous posts. On June 1, Josh posted Snap: Programming for everyone on Hacker News. The resulting readership spike was enormous: a graph of pageviews over time for the Snap post is completely useless if it includes the first week of June.
Snap: Programming for everyone has been by far our most popular post, and we see sustained interest in it even today, six months after its big break. We’re really pleased to see so much interest in Snap and in Snap’s educational goals.
We’re planning a second post about Snap for February 2016, but in the meantime, enjoy the original!
Read Snap: Programming for everyone here.