======================================================================
Title: Plaintext Weblog Posts
Date: 2025-01-25
Tags: smolnet
Link: https://spool-five.com/posts/2025-01-25-plaintext_weblog_posts/
Word Count: 620
======================================================================

I use sourcehut pages for both my gemini capsule and my website. It is a simple, great service where you don't need to worry about managing a server.

Sourcehut pages[1]
=>[1] https://srht.site/

One downside is that you are forced into a 'static site' mode of thinking about your content. This isn't necessarily a bad thing, but it means you can't do much with the content on the server side. For example, I needed to use a separate server instance for the interactive/games content that is available at dev.spool-five.com.

Gemini games[2]
=>[2] gemini://dev.spool-five.com

Recently Kris Occhipinti posted a video where he demonstrates the cli-friendly nature of his website. His site serves the content differently if your client is curl/wget, and lets you do things like search the site without leaving the terminal. You can see for yourself:

wget -qO- "https://filmsbykris.com"

Kris Occhipinti Video[3]
=>[3] https://www.youtube.com/watch?v=IsKN6nuTauY

I loved this idea, but in the context of a static site hosted on sourcehut pages this isn't really an option. So I wanted to try an alternative that would at least help make my site more 'grep-able'. I didn't necessarily succeed in that sense, but I ended up writing some scripts to generate plaintext versions of my web content that can sit alongside the (more 'bloated') html. It was a fun exercise.

Like a lot of people on here, I'm sure, I've always been curious about trying to build a static site generator myself, but have been dissuaded by all the great options that are already available. So this scratched that itch a bit by giving me a chance to write something that could at least generate a more stripped-down, portable version of the content I've posted here. The same 'source' content persists in the form of markdown, and the plaintext versions of the files are generated alongside them within the sourcehut build workflow.

You can see an alternative 'index' of all the plaintext content at the link below.

Index of plaintext version of site[4]
=>[4] https://spool-five.com/pt/

If you want to read this post in the terminal you can try:

wget -qO- "https://spool-five.com/pt/2025-01-25-plaintext_weblog_posts.txt"

While not as elegant as Kris Occhipinti's cli interface, you can also use the 'plaintext' index of posts to filter for information using wget. Each line of the plaintext index has four fields separated by spaces: date, title, tags, link. The formatting of these lines is similar to the Denote emacs package, where title words are separated by dashes and tags are separated by underscores:

20250101 Title-of-post _foo_bar https://example.com

In this format, if you wanted to get the link for the oldest post tagged with 'philosophy', you would filter for '_philosophy' and take the last entry:

wget -qO- "https://spool-five.com/pt/index.txt" | grep "_philosophy" | tail -n1 | awk '{print $4}'

Or, to print out a random page to the terminal:

wget -qO- "https://spool-five.com/pt/index.txt" | shuf | head -n1 | awk '{print $4}' | xargs -I {} wget -O- {} | less

The scripts for converting/building these features were written using babashka.
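
To give a rough sense of what that kind of conversion involves, here is a minimal babashka-style sketch of building one of those Denote-style index lines from a post's metadata. This is purely illustrative: the function name, the metadata keys and the /pt/ base URL are assumptions made for the example, not code taken from the actual scripts.

(require '[clojure.string :as str])

;; Illustrative sketch only: turn a post's metadata into a Denote-style index line.
;; The metadata shape and the /pt/ base URL are assumed for this example.
(defn index-line [{:keys [date title tags slug]}]
  (str/join " "
            [(str/replace date "-" "")            ; "2025-01-25" -> "20250125"
             (str/replace title #"\s+" "-")       ; title words joined with dashes
             (apply str (map #(str "_" %) tags))  ; tags marked with underscores
             (str "https://spool-five.com/pt/" slug ".txt")]))

(index-line {:date "2025-01-25"
             :title "Plaintext Weblog Posts"
             :tags ["smolnet"]
             :slug "2025-01-25-plaintext_weblog_posts"})
;; => "20250125 Plaintext-Weblog-Posts _smolnet https://spool-five.com/pt/2025-01-25-plaintext_weblog_posts.txt"

The real scripts do a bit more than this (converting the markdown body itself, and writing out the index file), but the idea is the same.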
I was pleasantly surprised at how simple it was to integrate into the sourcehut build process. I'm not a professional developer, so I don't engage with CI/CD much, but the sourcehut build instructions were quite clear and easy to follow.

Sourcehut builds[5]
=>[5] https://man.sr.ht/builds.sr.ht/

Babashka[6]
=>[6] https://babashka.org/

Source code for the scripts[7]
=>[7] https://git.sr.ht/~loopdreams/spv-plaintext