Lately I have been doing a lot of data processing in my job at UCL. I recently came across a problem that didn’t have a ready-made solution, so here is my take on it.
Inverted Index
An inverted index is simply a mapping from each unique word in a document to the positions at which it appears. Something like {'Hello': [1], 'world.': [2]}. In practice the entries are rarely in order, and a word that occurs more than once in the document has several positions in its list.
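To make the structure concrete, here is a minimal sketch of how such an index could be built from a piece of text. The function name `build_inverted_index` and the `start` parameter are my own illustrative choices, and I am assuming words are split on whitespace and counted from 1, as in the example above; a real API may tokenise or number things differently.

```python
from collections import defaultdict


def build_inverted_index(text, start=1):
    """Map each whitespace-separated word to the list of positions where it occurs.

    Assumption: positions are counted from `start` (1 here, to match the example).
    """
    index = defaultdict(list)
    for position, word in enumerate(text.split(), start=start):
        index[word].append(position)
    return dict(index)


print(build_inverted_index("the cat saw the dog"))
# {'the': [1, 4], 'cat': [2], 'saw': [3], 'dog': [5]}
```

Note how 'the' appears once as a key but carries two positions.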
I was given the task of ‘unpacking’ one of these, that is, rebuilding the original text from the index (actually thousands, but if you can do one...). The inverted index came in the form of a dictionary of words and their positions, returned from an API call. I couldn’t find a ready-made solution, so I wrote one in Python 3.
The solution
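One way to approach the unpacking, as a minimal sketch rather than a definitive implementation: flatten the dictionary into (position, word) pairs, sort by position, and join the words back together. The names `unpack_inverted_index` and `sample` are hypothetical, and I am assuming the words should be joined with single spaces, since the index does not record the original whitespace.

```python
def unpack_inverted_index(inverted_index):
    """Rebuild the text from a {word: [positions]} dictionary.

    Assumption: words are joined with single spaces; the original
    whitespace cannot be recovered from the index.
    """
    # One (position, word) pair per occurrence, so repeated words are kept.
    pairs = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    # Sorting by position restores the word order, whatever order the keys arrived in.
    pairs.sort(key=lambda pair: pair[0])
    return " ".join(word for _, word in pairs)


# Keys deliberately out of order, with one repeated word.
sample = {"world.": [2, 4], "Hello": [1], "again,": [3]}
print(unpack_inverted_index(sample))
# Hello world. again, world.
```

Because the pairs are sorted rather than placed into a fixed-size list, this works whether the positions start at 0 or 1, and it does not fail if a position is missing.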
You can try it out with the included example. I hope this helps someone with a similar problem. Do let me know in the comments if you have a different solution.