I have some HTML content and was asked to extract the text inside the tags using regex. I know there must be a more elegant way to do this with Beautiful Soup, but I was asked to use regex. This is the HTML content:
<div id="sym">
<div id="Y" class="s"><a class="ey" href="/browse/o">orange</a></div>
<div id="Y" class="s"><a class="ey" href="/browse/m">mango</a></div>
<div id="Y" class="s"><a class="ey" href="/browse/b">banana</a></div>
<div id="Y" class="s"><a class="ey" href="/browse/a">apple</a></div>
</div>
I want to print:
orange
mango
banana
apple
I tried this, but it didn't work:
import re
file = open('test.html')
myfile = file.read()
lines = myfile.splitlines()
matching = re.findall(r'<div[^>]*class=.*?s[^>]*>', myfile)
for style in matching:
    for b in style:
        c = re.findall(r'<a[^>]*class=.*?ey>([^<]+)</a>', b)
        print(c)
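For comparison, here is a minimal sketch of one regex that seems to give the desired output on the sample above. Three things appear to go wrong in the attempt: the first pattern matches only the opening `<div ...>` tag, so the `<a>` element is never part of the match; iterating over a string with `for b in style` yields single characters; and the second pattern expects `ey>` right before the link text, while the real markup has `" href="...">` in between. The variable names `html`, `pattern` and `names` are illustrative, the markup is inlined instead of read from `test.html`, and the pattern assumes the link text contains no nested tags and that the class attribute is written exactly as `class="ey"`.

```python
import re

# Sample markup copied from the question; in practice this could be
# the result of reading test.html instead.
html = """<div id="sym">
<div id="Y" class="s"><a class="ey" href="/browse/o">orange</a></div>
<div id="Y" class="s"><a class="ey" href="/browse/m">mango</a></div>
<div id="Y" class="s"><a class="ey" href="/browse/b">banana</a></div>
<div id="Y" class="s"><a class="ey" href="/browse/a">apple</a></div>
</div>"""

# Match an <a> tag carrying class="ey", allow any other attributes
# before or after it, and capture the text up to the closing </a>.
pattern = re.compile(r'<a\b[^>]*\bclass="ey"[^>]*>([^<]+)</a>')

# With a single capture group, findall returns a list of the captured strings.
names = pattern.findall(html)

for name in names:
    print(name)
```

Searching the whole document once avoids the nested loops entirely. This is only suitable for small, regular markup like the sample; anything less predictable is better handled by a real HTML parser.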