[python] Best way to remove random-length strings from a file with similar starting text

Okay, so I have an exported bookmarks.html file from Chrome.

I want to clean it up a little by removing these stupidly long strings of useless ICON text. I'm not sure whether I should use regex or whether there is a better solution. The problem is that the encoded ICON strings vary in length. Is there a way to scan from the word ICON until it hits "> and then delete everything in between?

For example:

 <DT><H3 ADD_DATE="1601097819" LAST_MODIFIED="1601097819">Alternative Sites</H3>
            <DL><p>
                <DT><A HREF="http://alternativeto.net/" ADD_DATE="1426023513" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAIAAACQkWg2AAABkElEQVQ4jWMUKFzGQApgQeb8//vn15vHP988/vfzOwMDAzMXH7uoLJuwDE4NP57dmhpm6euUKSYsyMDAcO/xswXrd3UdvsYhpcbIDFXJzGEZDNfw9e65jd2l3FycEK4gP6+juYEk448t526x8Ahi0fD/zy+ur28nLdsS2jbnzZOn9iZ6bKwsMhIinYvXs4vJY3ESp6xWzZ47P988ynYxNVRX/Pr9OzcXh5iwIMRLWDR8f3ytzFY5JzoD4gcCofTn87tABZ6mvAQI997jZzxcnJg6mVh+f4eg368e5EZ4Q0TrJi1QT2t78fodwmCYMia40M+Pr62MdSBmt28/w8jMitVJTFhFGRgY/v/9zcPNic8PLJw89x4/U5KVUpKV2lIZx8DAoCQrhc8GDmGpBet3QdjudmbudmYEnMQhKNF98FbdpAWv3r5nYGB49fb96u0HGBgYvn77AU8XDAwMjCI5c5EN+P7myY+3z/7++sHMxsHKK/Tj7TNGZhZOYSkucQV0P0AAp4gMpwgiefJIqaApAAAyfpCyVQiY4QAAAABJRU5ErkJggg==">AlternativeTo </A>
                <DT><A HREF="http://www.siteslike.com/" ADD_DATE="1365104276" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACYElEQVQ4jZWQTUgUcRjGn/mP7q4fM6MzrG7iQTup2SGXSjskHisUkg2lJDLoA9KCiCC6VHQJIygssLCgoIJM+txblJpZgaDhugWFXayxZbbdnVlnP2b+/w4L4uKu0HN93vf3Ps/L6cHnncmJq9ftqFaNVSpwb5wraL14RKr1fsQ64kJD2+JiU1Wxo6Yry0gtPKKRse/E5XvQsh6EWxpoYMJW+UNoemaHHnJAcKeg1JogRR7wZQ008n5xXQihjEFftkX1dxHEo+9a9ZADAEBNFVZ4hpS1VNLlJz1T0YXp5twAm6Ik8bkxLhH0n7s0Vl6dWDFZMgwr8oWUe53UeHwgJ4RYlAHgULf3AjybNqOqazhrgKUisOMLRPZSGnnYvQZCEok0CF8INnMSxh8V8YmeNTGZZYCav0jFFoNq97uyICS+8/zlubcSXE5HvkdnILYJmvpLNjSF6dI93wqEuGpaX4+wNvyYFGCEQgDH5YfQNKhlkGqvRhfvdE4BAKmU5flvahrKvmsnAIAnLlgmwFguApDULXCgJG3TTAVFUWK9B3v7Tt8YvTnQ13FqfkyEQyhGKgZQe9UuA5JRBpdYjPlxCbWHhjoAYE3eWNDf/nXo8IvGtmUYqgFSmLlsmQylVSUIjJei/tjdDrF+98u8XWNBf/tkXwWLP5OYOsgxdZBj+qjAJvsrWCzob189m5UgEAg43G63g+d5os6+8mkjZ4a377EABnzy86jcf6s7KdbNCoIQkSTJ1DQtkfPl4XBY0jRNMH+O79Kenr0NAB7fleO2p/mN0+lMKIqiy7IczVvhf/QPuLwYarnxcHYAAAAASUVORK5CYII=">Sites Like</A>
                <DT><A HREF="http://www.similarsites.com/" ADD_DATE="1426025037" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAACqklEQVQ4jXWTX2jVZRzGP8/3bBUJJVhdtP5I52xtGSisVpsXh2ZwCgoiGZa7SHIXkljYzFld9MNACBKzi/5cWCIVjd05EgxcthAN808XktmZUjmZOl15Ws5t5/d0cc7GGPS9eV54Ht7vh5fnFfOnoyNDX18ZoHPDh7c90PbU6oiom56aKo4UD/XvSrquzo1r9pQk0Tr+xGZPT5w6sqPQ39Hz6e1Lmp/chzihlCHLiyCaSb0/eSG7EzBAzGxtG2//CmvbpfMnBgGyjS2dmCuGSduLcPxL6v2Es+/0Fj8Ca/aCtsUbtoJWSZo+u2SiBBARd1nOyTxMqME4RziLNC7plqT33JsAsWz15/U2bwDYrlk63NAEMHZ5pB9TD+TtdIXEcsNSW42kziCvXP/+N/fHrffk1kmqBZDEgoX3vgrw8abCMaflzUCNFAuBnFAruJ1QHrjpzvsa1wWmYNtAajsFXm7t/u45gK0vPrijPDW5FvsGFUQJFmDXAU3GhQByklwlEBCKmi8fe31gJcC7nU27x0Yv5oFLVHyqKqH6ADJVAubozZlMbW/rpsFtJEnsXL/86F/D59ttj1V9DLJdG4Y/5xHMqCT1tI2v+IIkiQ825n+ZuP7PK0BICoGkGI6Ab20OA79VN0zNI1r1eKn9NYD31izrcyXnypukP8SN0sjXIUaBQ4QOgL+X9DNwQeg6kIb0UrWvlvhRFWSXSlc/i58+6ThYtkclD9ucMXHK5ijoIPIA5ojh2kzjbdVWCLR3e1fLYACMnTnc7VSLM/K1EGcjPIRdBJ12cAFp35zP8yji9LniybWzVf51b0/p798PdJXL6YTth1KrDnG37UfCXJ4c2LMdoGfX8afBQ38Uj+f3vPX8Ff5nlC1syTU883ZLw7Pdd8w1tuw+1jw//B/QgUlF2BCj8wAAAABJRU5ErkJggg==">Sites Similar to </A>
            </DL><p>
            <DT><H3 ADD_DATE="1601097819" LAST_MODIFIED="1601097819">Tools</H3>
            <DL><p>
                <DT><H3 ADD_DATE="1601097819" LAST_MODIFIED="1601097819">Audio</H3>
                <DL><p>
                    <DT><A HREF="http://www.listentoyoutube.com/download.php?server=srv78&hash=uNSxeYNyypewZG5wmZSacGpg5KWmqXBw4pSXbXFgoGRmZmm0sszWrJ6evKGNoq6s28XI&file=Gresik%20Aerial%20footage%202015.mp3" ADD_DATE="1458075113" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAALUlEQVQ4jWNgGAUUA0YsYv8p0Eu0AXA1TETahhMMvAEsUJrYgMNQN/BeGAYAAGmHBQxgHji5AAAAAElFTkSuQmCC">Audio - Strip audio from youtube</A>
                    <DT><A HREF="http://audio-extractor.net/#" ADD_DATE="1426139160" ICON="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAB6ElEQVQ4jX2Tv2tTURTHP+e++NLGX2CpUH+VBlySwQ4ONRKkohCpxSmrIkRE61L6B+jYKQ4SQRpBcBDsqtAhuChVofgDTBah1FitUCwEtGmeefc4tC++pDXf6Z7v/X6/3HM4V+jApbQmmsbmBC4AQ9+nDcAiMKdC8f2IVMJ6CQ7ZhLr1gzaPcgMwAb8VEMAizEQtk69TUicQZhPq1vv1OcpE2LwDDMr1BjxLlNVtBaz327ug57oY2yGc7amRB5DxUU2qtR8BJ7g/5Pzgyt6nDLtl9MkMb2twrwrVjbYY33c4YfBtLmw+HFkh33eHVHSBmNQpzTc48PMPj5Iw2NsW4DiWnFEhE2Yv75llt6y3aq+pfF72qVab3Dra0YqSMUA8zA275ZDAphHSAIsrPiP7t00j3m3imxmq2uXaM2wuSQsfvOS/QsxLQV4BDA04vKm1u0X5YkSZC5OPf2X5rbFW7UaE40ccBo9FKHzteB2UjDXmIeAH5HJzgKm128w3TlLXHs6noqz17eJqBZbqbX7fKEUBGDvjFwRu7tRkxyqHcf/dKZkwALFVMwlS+p9yG5QXG/uYhK1Vnq2I17sqYwgFwHaxWoQHUbhYSYoHod8YYHxUk6r2GkpGIP5t2niiLCmUjFJcOC2fwvq/nqaoQBHDbC0AAAAASUVORK5CYII=">Audio Extractor - Extract sound from video online</A>

Edit: Using this awesome regex site, I found this regular expression:

 ICON=".*?"

Now my problem is how to implement it in code.
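One possible way to try the pattern in Python is `re.sub`, which returns a new string with every match replaced. This is only a minimal sketch on a single shortened line: the `sample` string and the `ICON_PATTERN` name are made up for illustration, and the leading `\s*` is an addition to the pattern above so that the space in front of the attribute is removed too.

```python
import re

# Hypothetical, shortened bookmark line; real ICON values are much longer.
sample = (
    '<DT><A HREF="http://alternativeto.net/" ADD_DATE="1426023513" '
    'ICON="data:image/png;base64,iVBORw0KGgo=">AlternativeTo </A>'
)

# The non-greedy .*? stops at the first closing quote after ICON=".
# The leading \s* also consumes the space in front of the attribute.
ICON_PATTERN = re.compile(r'\s*ICON=".*?"')

# sub() returns a new string; the original string is left unchanged.
cleaned = ICON_PATTERN.sub("", sample)
print(cleaned)
```

Because the match is non-greedy, it ends at the quote that closes the ICON value, so the `>` and the link text after it are kept.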

Edit 2: I have this code, but it doesn't seem to do anything:

import re
for line in open("bookmarksTEST.html"):
    print(line)
    re.sub('/(ICON=".*?")/gm', "!!!!WoRkEd!!!!", line)
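Two things probably explain why the snippet above appears to do nothing. First, Python patterns are plain strings, so the `/.../gm` wrapper copied from the regex site is treated as literal characters and the pattern never matches. Second, `re.sub` returns a new string rather than changing `line`, and that return value is thrown away. A hedged sketch of one way to fix both, assuming the whole file fits in memory; the function names `strip_icons` and `clean_bookmarks` are invented for this example:

```python
import re
from pathlib import Path

# No /.../ delimiters and no trailing gm flags: the pattern is just a string.
ICON_ATTR = re.compile(r'\s*ICON=".*?"')


def strip_icons(text: str) -> str:
    """Return text with every ICON="..." attribute removed."""
    # sub() returns a new string, so the result has to be kept or returned.
    return ICON_ATTR.sub("", text)


def clean_bookmarks(src: str, dst: str) -> None:
    """Read src, strip the ICON attributes, and write the result to dst."""
    html = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(strip_icons(html), encoding="utf-8")
```

Calling `clean_bookmarks("bookmarksTEST.html", "bookmarks_clean.html")` would then write a cleaned copy and leave the original export untouched; the output file name here is just a placeholder.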
