Alright, so today I wanna share something kinda random I stumbled upon while messing around with some data stuff. The title? Well, it’s “what does mimis mean”. Yeah, I know, sounds like some kinda internet slang, right? But trust me, it got a little interesting.

It all started when I was trying to clean up a dataset, you know, the usual. I grabbed the data from some open-source project (I think it was a bunch of user feedback from an app), and there was this weird column filled with what looked like units of measure. I’m sifting through it, deleting duplicates, filtering out the useless columns, when bam! A bunch of rows flagged as “mimis”.
My first thought was, “Okay, somebody made a typo.” But there were way too many “mimis” for it to be a mistake. So, naturally, I did what any self-respecting data tinkerer would do. I Googled it. Now, the results were a bit all over the place, nothing that really jumped out and made sense for my use case.
I decided to dig deeper into the dataset itself. I started by grouping the rows by the value in that column, “mimis” and everything else, to see which other units showed up, and I looked at the associated values in other columns, like number of ratings, timestamp, etc. At first, nothing. Nada. I then plotted a simple distribution of counts for each unique entry in the column, and “mimis” had a lot of rows associated with it, more than entries like “seconds”, “minutes”, or “hours”.
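If you want to try the counting step yourself, here's a rough sketch using only Python's standard library. To be clear, the rows and the column names (`unit`, `value`) are made up for illustration, since I'm not reproducing the real dataset here, and I'm printing the counts instead of plotting them.

```python
from collections import Counter

# Hypothetical sample rows; the column names "unit" and "value" are
# assumptions for this sketch, not the real dataset's schema.
rows = [
    {"unit": "mimis", "value": 1500},
    {"unit": "seconds", "value": 2},
    {"unit": "mimis", "value": 250},
    {"unit": "minutes", "value": 1},
    {"unit": "mimis", "value": 900},
    {"unit": "hours", "value": 1},
]


def count_units(rows, column="unit"):
    """Count how many rows carry each distinct label in the given column."""
    return Counter(row[column] for row in rows)


unit_counts = count_units(rows)

# Print labels from most to least frequent, a text stand-in for the plot.
for label, n in unit_counts.most_common():
    print(f"{label:10s} {n}")
```

A label that shows up this often next to perfectly normal units is a decent hint that it's a real category and not a typo.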
So, thinking this might be time, I grabbed a subset of the rows where the value was “mimis” and looked at the other time-related columns. Bingo! I noticed that the numbers attached to “mimis” were actually really small, and then it hit me. Could this be milliseconds?
To test this, I wrote a quick little script to convert all the “mimis” values to seconds by dividing them by 1000, and then I checked the range of the new “seconds” column. It all seemed to line up perfectly with the other time-based data. Awesome!
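For the curious, the conversion check looked roughly like the sketch below. It isn't my exact script: the sample rows, the column names, and the `mimis_to_seconds` helper are all hypothetical, and the whole thing rests on the guess that “mimis” means milliseconds.

```python
# Hypothetical sample rows; "unit" and "value" are assumed column names.
rows = [
    {"unit": "mimis", "value": 1500},
    {"unit": "seconds", "value": 2},
    {"unit": "mimis", "value": 250},
    {"unit": "seconds", "value": 0.5},
    {"unit": "mimis", "value": 900},
]


def mimis_to_seconds(rows, unit_col="unit", value_col="value"):
    """Divide the values of 'mimis' rows by 1000 (assumed ms -> s)."""
    return [row[value_col] / 1000 for row in rows if row[unit_col] == "mimis"]


converted = mimis_to_seconds(rows)
known_seconds = [row["value"] for row in rows if row["unit"] == "seconds"]

# Compare the two ranges by eye: if the guess is right, they should
# land in roughly the same ballpark.
print("converted mimis:", min(converted), "to", max(converted))
print("known seconds:  ", min(known_seconds), "to", max(known_seconds))
```

Overlapping ranges don't prove anything, of course. They just make the milliseconds guess a lot more believable than the alternatives.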
Then, I replaced all instances of “mimis” with “milliseconds” in that particular column. The final step was to add some documentation to the data-cleaning script so that whoever came after me would actually know what the heck I did.
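Here's a minimal sketch of what that relabel-plus-documentation step could look like. The `relabel_unit` function, the column names, and the sample rows are all invented for this example. The point is that the rename touches only the one column, and the docstring says why it was done.

```python
def relabel_unit(rows, column="unit", old="mimis", new="milliseconds"):
    """Return copies of the rows with `old` replaced by `new` in one column.

    Why: "mimis" is treated as milliseconds because dividing its values
    by 1000 gave a range consistent with the other time-based data. That
    is an inference from the data, not something the source documented.
    Other columns are left untouched, and the input rows are not mutated.
    """
    return [
        {**row, column: new} if row.get(column) == old else dict(row)
        for row in rows
    ]


# Hypothetical rows; note the free-text column that also mentions "mimis".
rows = [
    {"unit": "mimis", "value": 1500, "comment": "what are mimis?"},
    {"unit": "seconds", "value": 2, "comment": ""},
]

cleaned = relabel_unit(rows)
print(cleaned[0]["unit"])     # milliseconds
print(cleaned[0]["comment"])  # what are mimis?
```

Returning new rows instead of editing in place keeps the original data around, which is handy if the milliseconds guess ever turns out to be wrong.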
Honestly, it was a pretty simple fix, but it goes to show that sometimes the most interesting problems come from the weirdest places. And hey, I learned something new along the way! Data cleaning: not always glamorous, but always an adventure.