Over the past two days, I dove back into coding, not because I had to, but to satisfy a personal curiosity. It was a refreshing experience, reminiscent of my early programming days when the excitement of exploring something new was my main motivation.

This time, I had the advantage of using tools like GitHub Copilot and Claude 3.5 Sonnet, which made the process easier and more enjoyable. These tools have turned casual coding into a fun experience by letting me focus on creativity rather than getting stuck remembering syntax or which packages to include. They take care of the tedious parts, like writing boilerplate code for tasks such as validating input, opening files, and handling exceptions. While these tasks are important, they can be overwhelming when you’re just trying to write a simple program. That said, you still need to be familiar with the programming language you’re using.

During this coding spree, I developed two simple Python programs that I’m quite pleased with. Both projects are now available on GitHub for anyone interested in trying them out.

ProductDigest

The first project I tackled was something I called “ProductDigest.” A friend had asked for handicraft gift suggestions, and instead of sending a long list of URLs, I wanted to create a concise PDF digest. This tool automatically extracts and compiles product details from e-commerce sites into a well-formatted PDF document. It’s particularly tailored for Amazon India product pages, capturing prices and other key details, but it also works with general URLs to gather metadata and generate page previews.

Here’s how it works: the script reads a list of URLs from a file, fetches the title, timestamp, and thumbnail of each webpage, and for Amazon URLs, it retrieves product pricing as well. It then compiles these details into a structured PDF file, making it convenient to view key webpage information offline. The project uses Selenium, which automates web browsing by simulating user interactions with a headless browser (Microsoft Edge in my case) to handle the dynamic content of Amazon pages, and PyMuPDF for PDF generation. You can find the code and sample outputs on GitHub.

Sample output from the ProductDigest project showing the details of one product on a page in PDF

LLAMA-Vision-Chat

The second project, “LLAMA-Vision-Chat,” was an exploration into image analysis using the Llama 3.2-Vision model. I’ve been running LLMs on my Windows machine for over a year, thanks to Ollama, which allows these models to run locally if you have a good GPU. Recently, Ollama made available Meta’s Llama 3.2-based Vision model, which can describe what it sees in an input image. I wanted to test its accuracy, especially on photographs and Tamil text.

The script leverages the Llama 3.2-Vision model to analyze images and generate detailed descriptions. It can write the analysis to a file or display it directly in the console. I have used the 11-billion-parameter model in Ollama, which is sufficient for my current GPU setup. The Llama models are not open source, but they are freely available for local use, which makes them accessible for experimentation.

The Vision model supports multiple languages for text tasks, including English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, although image+text applications are primarily supported in English. While Llama 3.2 Vision doesn’t officially support Tamil, it still attempts to read Tamil text in images, opening new possibilities for OCR applications.

You can check out the code and sample outputs on GitHub.

Output from the Llama 3.2 Vision model running locally on my PC via Ollama, describing an image of a vegetable salad plate
Output from the Llama 3.2 Vision model describing a Tamil newspaper advertisement for a book. The results from two trial runs were incorrect, but you can see the model’s effort.

Summary

Both projects showcase how generative AI and LLM tools have made it easier for me to generate the code, removing the drudgery and allowing me to focus on the creative process. While the code generated by these tools isn’t perfect and requires tweaks, it’s incredibly useful for casual coding tasks. If you’re interested, feel free to explore the projects and see what you can create with these powerful tools.


Discover more from Mangoidiots

Subscribe to get the latest posts sent to your email.
