Solving a simple problem with ML

ChatGPT has been helping me out

A fair amount, if I am going to be honest. The problem for me always comes down to syntax. I am currently focused on Full Stack IoT development, so my day consists of switching between Python (Flask API and web back-end), JavaScript (with jQuery for the front-end) and C++ (embedded).

I find that one of the more interesting parts of any Full Stack IoT code base comes down to dealing with requests and responses. A website has a login page and sends a request to the server. An IoT device has a key to access the online API and sends a request to the server. The server needs to respond, but first it has to validate and sanitize the input.

Python, JavaScript and C++ all have their own ways of working with text input. In JavaScript, for example, you call trim(), in Python it’s strip(), and in C++ (embedded) it’s a somewhat more complicated operation involving char arrays. The point is that when switching between languages I sometimes find myself going blank on the specifics of each approach. That’s when ChatGPT comes in handy for me: as a helper, faster than a Google search.
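For the Python side, a minimal sketch of that kind of input clean-up might look like the following. The function name `clean_field` and the 64-character limit are made up for illustration; they don't come from Flask or any other framework.

```python
def clean_field(raw, max_len=64):
    """Trim surrounding whitespace and reject empty or oversized input.

    Hypothetical helper: the name and the length limit are illustrative.
    """
    if not isinstance(raw, str):
        raise ValueError("expected a string")
    cleaned = raw.strip()  # Python's counterpart of JavaScript's trim()
    if not cleaned or len(cleaned) > max_len:
        raise ValueError("field is empty or too long")
    return cleaned


print(clean_field("  device-42 \n"))  # prints: device-42
```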

I recently came across an issue where ML helped in a different way: parsing unstructured text input.

The Problem

I had a series of spreadsheets which I was extracting data from (using Pandas) but the date format was unfortunately unstructured. For example:

12th, 13th Dec 2022 and 5th-8th Jan 2023

In order to continue with the project I needed each date in the ISO format that Pandas parses directly, for example:

2022-12-12

If you give it “12th December 2022” as an input, Pandas can do this for you, but it couldn’t cope with the format in the spreadsheets. Regular expressions weren’t going to do the job either, considering the range of formats. So I asked ChatGPT. The prompt:

Please can you parse the following dates as separate dates in list format, suitable for python pandas usage:
12th, 13th Dec 2022 and 5th-8th Jan 2023

The reply:

['2022-12-12', '2022-12-13', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08']
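The reply arrives as text that happens to look like a Python list, so it still has to be turned into real values before the rest of the program can use it. One way to do that is sketched below, using only the standard library. The function name `parse_reply` is hypothetical, and the sketch assumes the model returns nothing but the list.

```python
import ast
from datetime import date


def parse_reply(reply):
    """Turn a reply such as "['2022-12-12', '2022-12-13']" into date objects.

    ast.literal_eval only accepts Python literals, so unlike eval() it
    will not execute code that happens to appear in the model's reply.
    """
    items = ast.literal_eval(reply.strip())
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError("expected a list of date strings")
    # date.fromisoformat raises ValueError for strings that are not valid ISO dates
    return [date.fromisoformat(item) for item in items]


reply = "['2022-12-12', '2022-12-13', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08']"
dates = parse_reply(reply)
print(len(dates), dates[0].isoformat(), dates[-1].isoformat())
# prints: 6 2022-12-12 2023-01-08
```

The resulting list of date objects can then be passed to pd.to_datetime if a Pandas datetime type is needed.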

As you can see, this is exactly what I was looking for! Unfortunately my ChatGPT free trial has ended. I am not opposed to paying for software, but cost is not the only thing stopping me from using the API for this. I have been burned in the past by proprietary software vendors who arbitrarily changed or cancelled a product I relied on. This is a simple use case, but I can imagine myself relying on it more and more, then OpenAI pulling the rug (or just raising prices) and leaving me with a whole bunch of useless code.

The Solution

After looking around I found NLPCloud (no affiliation, it just looks cool and works for me*). They use open-source generative AI models which in theory I could replicate and host anywhere. I decided to give it a try.

It works! The free-plan API key is now plugged into my program, and when the program encounters a date format it can’t deal with, it sends a query:

response = client.chatbot(
    f"Please can you parse the following dates as separate dates in list format, suitable for python pandas usage: {dates_unformatted}",  # text
    "Human asking AI for help with a text summarization task. ",  # context
    [],  # history
)

The response is exactly the same quality as the ChatGPT one, using the “finetuned-gpt-neox-20b” model.
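The overall flow (try to parse locally first, and only ask the model when that fails) can be sketched roughly as below. This is not my actual program: it uses datetime.strptime from the standard library in place of Pandas to keep the sketch dependency-free, and `KNOWN_FORMATS`, `parse_locally`, `parse_dates` and `fake_model` are all hypothetical names. The model is passed in as a plain callable, which is an assumption made here so that one provider can be swapped for another without touching the parsing code.

```python
from datetime import datetime

# Formats tried locally before asking the model; illustrative, not exhaustive.
KNOWN_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%d %b %Y")


def parse_locally(text):
    """Return an ISO date string, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_dates(text, ask_model):
    """Parse locally when possible; otherwise fall back to the model.

    ask_model is any callable taking the raw text and returning a list of
    ISO date strings. It is a hypothetical stand-in for whichever API
    client is in use.
    """
    local = parse_locally(text)
    if local is not None:
        return [local]
    return ask_model(text)


def fake_model(text):
    # Canned answer for the demo; a real implementation would send the
    # prompt to the hosted model and parse its reply.
    return ["2022-12-12", "2022-12-13"]


print(parse_dates("12 December 2022", fake_model))     # prints: ['2022-12-12']
print(parse_dates("12th, 13th Dec 2022", fake_model))  # prints: ['2022-12-12', '2022-12-13']
```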

What simple things has ML been helping you out with?

*Update: A couple of months later I tried out NLPCloud again and it was broken. Luckily the same code works fine using hugging-chat-api. I really need to get onto self-hosting; every API breaks sooner or later, it seems.