Posted: Jan 21, 2021

The magic of regex

I’m not a regex wizard in any way, but sometimes regex solves problems in no time.

Yesterday I was about to remove an old VPS that was still running Wallabag. I wanted to export all entries and import them into Shaarli. Unfortunately, none of the Wallabag export formats was accepted by Shaarli, so extracting the URLs myself was the only way.

What I did was export everything to an XML file, which of course contains a lot of noise I had to remove. I only wanted the URLs.

The rows looked like this:

<url><![CDATA[https://domain.tld]]></url>

To get rid of everything except the URL itself, I first had to grep out the matching rows:

grep '<url><!\[CDATA\[https://.*]]></url>' All\ articles.xml > wallaToShaarli

Now I have a file called wallaToShaarli which only contains the rows with a URL in them.

Next up was to open wallaToShaarli in vim and remove everything before the https:// part with visual block mode.

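The last part of each row (the part after the URL) is most easily removed with a regex. The command below replaces ]]></url> with nothing. Note that the / inside the pattern has to be escaped as \/, otherwise vim treats it as the end of the pattern.

:%s/]]><\/url>//g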
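For what it's worth, the grep and the two vim cleanups could probably be collapsed into a single sed call. This is only a sketch, not what I actually ran: the file names sample.xml and urls.txt are made up for the example, and it assumes one <url> element per line and no ] character inside the URLs.

```shell
# Hypothetical sample input standing in for the Wallabag XML export.
cat > sample.xml <<'EOF'
<entry>
<title><![CDATA[Example]]></title>
<url><![CDATA[https://domain.tld]]></url>
</entry>
<entry>
<url><![CDATA[https://example.org/page?id=1]]></url>
</entry>
EOF

# -n suppresses sed's default output; the trailing p prints only the lines
# where the substitution matched. \( \) captures the URL and \1 replaces the
# whole line with it. Using | as the delimiter avoids escaping the / in </url>.
sed -n 's|.*<url><!\[CDATA\[\(https://[^]]*\)]]></url>.*|\1|p' sample.xml > urls.txt

cat urls.txt
```

The capture group stops at the first ] it sees, which is why the sketch assumes URLs without that character.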

This took a few minutes to complete. Doing it manually would have taken hours, since I had more than 500 entries.