I love Facebook. There, I said it. No, I do, it’s fantastic. The number of friends I’ve re-established real contact with and caught back up with in real life is testament enough. And not only that, but the friends I’ve made along the way, chatted with, and generally shared life’s ups and downs with, all through this ubiquitous, globally accepted, easy-to-use platform… Man, I’d pay for it if it wasn’t free. Don’t tell Zuck !
Anyway, with privacy and data ownership laws being what they are these days, Facebook, along with most of the other big social media outfits, has made it possible and easy for users to download, in a machine-readable format, all the data that has been collected about their activities on the platform since the day they joined. All you need are some semi-mad skills to interpret and derive insights from it.
So what would I want to know about my Facebook goings-on that may interest or inform me ? Well, there are the easy ones – how often do I use it and when, and how has my usage evolved over the 12 years since I joined ? I’ve typically perceived my usage to be cyclic or in phases – I’ll use it rabidly for a while, then kick it to the curb and forget about it, then pick it up again. Is my self-assessment true to the data or is there some other pattern in reality ? Also, what kinds of posts do I tend towards ? Do I talk about myself (status updates), talk to and comment upon others’ posts (comments), share stuff (photos), or just talk out loud a lot (make posts) ? Well, with my fledgling Data Science skills, I downloaded my data, dove in and, in the words of my favourite Psychology Professor, “got empirical” 🙂
Some things to note before we go on:
- The download doesn’t give you everything. It’s mostly focussed on the four categories shown in parentheses above, and doesn’t include the host of other activities like voting in polls; expressing interest in, creating or attending events; liking something; friending someone; adding life events, etc.
- There are some nulls amongst the data which aren’t explained, but they are statistically few, especially when seen in the context of a 12-year data dump.
- Feel free to share this if you like – yes it’s my activity data but there’s nothing personally identifiable in it – it’s more the patterns associated with a random FB user who happens to be me, and while I’m aware of personal security and privacy, this data does not pertain to those so share away 🙂
So with that out of the way, let’s dive in and get our charts on ! The analysis will consist of a question followed by my visualisation to inform the answer, followed by my analysis of the insights revealed.
1. How many posts have I made since day one, and what’s been the pattern to now ?
Well it turns out I’ve made a total of 1,026 posts in the dataset. Pretty sure I’ve interacted a whole lot more than that but as mentioned above this pertains to those four key categories, so while not every little thing is there, it’s enough to form a representative indication of overall activity.
We can see the initial excitement when I joined up and went crazy friending-up as the network effect took hold. Then there was a period of relative silence, followed by a spike, a dip, a larger spike, a dip, and another spike. So more or less cyclic as I’d thought, with only the amplitude varying, and this is further backed up by the trend line, which suggests a sinusoidal waveform. There are life events, and global or even local events, which I can personally identify here and which explain to me the patterns shown, suggesting a correlation between FB activity and things happening in my biosphere.
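For anyone wanting to try this on their own download, here is a minimal sketch of how monthly counts and a rough trend could be built. It is not the code behind my chart: I’m assuming each activity record carries a Unix `timestamp` in seconds, the `sample_events` list is made up, and the simple trailing moving average is only a stand-in for whatever trend line your charting tool draws.

```python
from collections import Counter
from datetime import datetime, timezone


def ts(year, month, day, hour=12):
    """Build a Unix timestamp (seconds, UTC) for the made-up sample data."""
    return datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp()


# Hypothetical records standing in for a real export.
sample_events = [
    {"timestamp": ts(2008, 1, 5)},
    {"timestamp": ts(2008, 1, 20)},
    {"timestamp": ts(2008, 2, 11)},
    {"timestamp": ts(2008, 4, 2)},
]


def monthly_counts(events):
    """Count events per (year, month), sorted chronologically.

    Months with no activity are simply absent; fill them with zeros
    if you want the quiet periods to pull the trend down.
    """
    counts = Counter()
    for event in events:
        when = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        counts[(when.year, when.month)] += 1
    return dict(sorted(counts.items()))


def moving_average(values, window=3):
    """Trailing moving average; early points use whatever history exists."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


counts = monthly_counts(sample_events)
trend = moving_average(list(counts.values()), window=2)
print(counts)
print(trend)
```

A wider `window` smooths out more of the spikes, which makes any cyclic pattern easier to see but hides the short bursts.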
2. When do I post and to what degree ?
Dividing the day up into its diurnal phases and showing the activity as a scatter plot, with the number of events determining the size of the dots, we see that the majority of the activity is during the day, but with a good amount of late-night and insomniac-style early-morning goings-on.
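As a rough illustration of that bucketing step, the sketch below sorts timestamps into phases of the day and counts them. The phase names and boundaries in `PHASES` are my assumptions for the example, not the exact ones behind the chart, and the timestamps are invented. Note that the timezone matters: convert to your local time first, or your evenings will look like somebody else’s mornings.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed phase boundaries: (name, start hour inclusive, end hour exclusive).
PHASES = [
    ("early morning", 0, 6),
    ("morning", 6, 12),
    ("afternoon", 12, 18),
    ("evening", 18, 24),
]


def phase_of(hour):
    """Map an hour of the day (0-23) to its phase name."""
    for name, start, end in PHASES:
        if start <= hour < end:
            return name
    raise ValueError(f"hour out of range: {hour}")


def counts_by_phase(timestamps, tz=timezone.utc):
    """Count events per phase, reading each timestamp in the given timezone."""
    counts = Counter()
    for stamp in timestamps:
        hour = datetime.fromtimestamp(stamp, tz=tz).hour
        counts[phase_of(hour)] += 1
    return counts


# Made-up events at 03:00, 10:00, 14:00, 14:00 and 23:00 UTC.
sample_timestamps = [
    datetime(2015, 6, 1, h, tzinfo=timezone.utc).timestamp()
    for h in (3, 10, 14, 14, 23)
]

phase_counts = counts_by_phase(sample_timestamps)
print(phase_counts)
```

The counts per phase are what would drive the dot sizes in a scatter plot like the one described.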
3. What sort of posts do I make depending on time of day ? Do I get all sharey during the day then all commenty during the night ?
Going from top to bottom following the legend, I generally post more than I do anything else, and do this with increasing frequency from midday up until about 8pm. I share, and update my status, to a medium level during the day, then really ramp up that activity from 7pm to midnight. My writing on people’s timelines is the least frequent and variable activity but does see a slight uptick after 9pm.
4. During any given time of day, what sort of posts do I make and in what relative quantities ?
This one follows on from #3 above in that it shows the general timings of my activities, but drills deeper to show the mix of posts, and the relative amounts of each, that I’ve engaged in over time. It shows that there’s not much variety in what I’m doing in the wee-small hours; that if I’m to write on someone’s timeline it’s more likely to be either at 10am or 11pm; and that the amount of these varies throughout the day. Commenting/posting kicks off around 12pm and is of a high though varied volume, while status updates – in contrast to the former two – follow a fairly predictable timing, quantity, and pattern, from midday through to midnight.
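The “mix” idea boils down to a cross-tabulation of activity type against hour of day, with each hour’s counts turned into shares. Here’s a small sketch of that, assuming the events have already been reduced to `(hour, kind)` pairs; the `sample_pairs` data and the kind labels are invented for illustration.

```python
from collections import Counter, defaultdict


def mix_by_hour(pairs):
    """Return {hour: {kind: share}}, where the shares within an hour sum to 1."""
    tallies = defaultdict(Counter)
    for hour, kind in pairs:
        tallies[hour][kind] += 1

    mix = {}
    for hour, kinds in sorted(tallies.items()):
        total = sum(kinds.values())
        mix[hour] = {kind: n / total for kind, n in kinds.items()}
    return mix


# Made-up (hour, kind) pairs.
sample_pairs = [
    (10, "timeline"),
    (12, "post"),
    (12, "post"),
    (12, "status"),
    (23, "timeline"),
    (23, "status"),
]

hourly_mix = mix_by_hour(sample_pairs)
for hour, shares in hourly_mix.items():
    print(hour, shares)
```

Keeping the raw tallies alongside the shares is worthwhile in practice, because an hour with a single event shows up as a 100% share and can look more dramatic than it is.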
In order to show the evolution of the types and quantities of posts over time I produced an animation, sped up and aggregated by week:
The next step for this puppy is to produce a smooth animation mimicking Hans Rosling’s brilliant and now famous TED talk about global health as it relates to demographics and economic development. Smooth visualisation animations involve the use and integration of many different tools and some pretty exotic libraries. It’s on my list though… on my list 🙂
Privacy and data ownership laws have made it a requirement for companies collecting personal data to make that data easily available to the party from which it was either knowingly or unknowingly collected. Sometimes the volume, variety, and velocity of what is collected and returned can surprise (yes, the three V’s of Big Data – and yes, there’s been a 4th V added, being Veracity, but personally I don’t think this applies when you’re talking about an Exabyte-scale data lake); other times it turns out to be quite minimal and pedestrian – “nothing to see here”. Examples of the latter currently being Apple and Spotify – I downloaded my data for those and it was a snooze-fest. So kudos to Facebook I suppose: even if the data is not exhaustive or complete, it’s certainly more than what I’ve seen from the other big players I’ve sampled to date.
And as for what it tells you ? Like all exploratory data analytics and visualisation – often more than what you thought it would, sometimes confirming, other times contradicting your understanding or expectations.
This to me is what good Data Science is about – finding the patterns, the outliers, confirming the expected, revealing the unexpected, gaining novel insights and making sound predictions. And above all making all of this easily understandable to whatever audience is necessary. There is no room for technical or academic arrogance in this field – it’s about being entrusted with massive amounts of oftentimes sensitive data and being relied upon to turn it into actionable intelligence for the benefit of the owner in a way that is doable, and has a measurable, material impact on the bottom line.
Bad Data Science is lazily turning numbers into pie charts and a bit of Excel, and you can expect none of that from me.
Anyways, thank you for reading – I hope you enjoyed learning about my strange habits and if you like what you’ve seen then please share, or feel free to leave a comment.