Whitelisting vs. Blacklisting

Hello, fellow campers!

Hopefully I’m bringing this up in the right way. Apologies if not!

Given that whitelisting is generally more secure than blacklisting, I wonder whether the string-sanitizing approach suggested in
https://www.freecodecamp.org/challenges/convert-html-entities is the best one for teaching newbies.

Thanks for building a great resource! I’ve been enjoying working my way through the algorithms.

–Ael

Please excuse my ignorance, but what does that problem have to do with whitelisting and blacklisting?

I apologize–I should have done a better job writing my initial post! Thank you for asking.

So, the problem asks that a given input string be sanitized: that is, that characters with special meaning in HTML (&, <, >, " (double quote), and ' (apostrophe)) be replaced with the HTML entities that say ‘display this character’ rather than ‘do this thing’. You might sanitize your data so that it displays properly. You also sanitize your data to avoid injection attacks.

A blacklisting solution to the problem involves writing an algorithm that affects specific special characters only. You’re saying ‘these aren’t great and need to be tweaked’.
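
To make that concrete, here is a minimal sketch of what a blacklisting solution might look like in JavaScript. The names `BLACKLIST_ENTITIES` and `escapeBlacklist` are my own, not anything from the challenge, and the table covers only the five characters listed above:

```javascript
// Hypothetical lookup table: only these five characters are ever touched.
const BLACKLIST_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;",
};

// Blacklist approach: find the known-bad characters and replace each one.
// Every other character passes through unchanged.
function escapeBlacklist(str) {
  return str.replace(/[&<>"']/g, (ch) => BLACKLIST_ENTITIES[ch]);
}
```

Anything that is not in the character class is left alone, so the result is only as safe as the list is complete.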

A whitelisting solution to the problem involves writing an algorithm that passes through specified non-special characters (say, A-Z, a-z, 0-9, etc.) and replaces the others with their HTML entity equivalents. You’re saying ‘these are fine and go through; everything else is tweaked.’
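
A rough sketch of the whitelisting version, again in JavaScript. The names `NAMED_ENTITIES` and `escapeWhitelist` are hypothetical, and the allowed set (ASCII letters, digits, and the space character) is just an assumption for illustration; a real allow-list would likely be wider:

```javascript
// Hypothetical table of entity names for the characters the challenge lists.
const NAMED_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;",
};

// Whitelist approach: characters in the allowed set pass through;
// everything else becomes an entity. Characters without a known name
// fall back to a numeric character reference built from the code point.
function escapeWhitelist(str) {
  return str.replace(
    /[^A-Za-z0-9 ]/gu, // the u flag makes each match a whole code point
    (ch) => NAMED_ENTITIES[ch] || `&#${ch.codePointAt(0)};`
  );
}
```

The difference from the blacklist shows up with characters nobody thought to list: here an unexpected character is encoded by default rather than passed through by default.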

Yes, I get the difference between whitelisting and blacklisting; I’m just used to it being applied to access, not to characters.

This is a section on algorithms, not internet security.

Are you saying that this approach is a danger to the computer’s security? I’m no security expert by any means, but I fail to see how this poses a risk.

If you’re saying that it is “dangerous” because it might contain some unprintable character, etc. then I’d say that is true of any user input. Yes, there are better ways to “sanitize” user input, but the blurb doesn’t claim that this is a method to sanitize input completely, nor does it even say that it is user input.

I think the purpose is just to test regex skills with a silly problem. When teaching, it is important not to overwhelm the student with too many complications too early.

I appreciate your concerns, but I disagree with them in this case. Maybe someone else who has something different to say will chime in.

Yeah, it’s just an exercise to try to get you to understand how to use regex - in practice you’d use an XML/HTML parser, not regex. It’s useful to think about white/blacklisting but it doesn’t really apply here (though you could make the solution as complex as you wanted, with dictionaries of allowed/disallowed characters)


I’m saying that a blacklist mindset, as opposed to a whitelist mindset, can be dangerous to computer security, as students will take the thought-patterns we’re taught and extrapolate outwards. ‘How have I solved this problem in the past? Oh, right, I replaced special characters like this…’ The issue with using a blacklist mentality when sanitizing data is that you’ll miss something that someone can use for an attack when user input is permitted. Sorry if I answered a question you didn’t need answered; I was taking your ‘I fail to see how this poses a risk’ question literally. :slight_smile:

You’re right that the blurb doesn’t say the test strings are user input! Also, that the purpose of this is to test regex skills. I do think that it could test regex skills just as well with minor tweaks (and adding no extra complexity), but I am a student in this, not a teacher. If you do not find it bothersome, I understand, and thank you for your time and attention!

Re. the dictionaries, you’re quite right. That’s my plan, since the parser wants entity names rather than entity numbers! :slight_smile:


Yeah, you can make quite a nice exercise out of this task I think - you can make it into something quite fully featured, which will give you a real feel for the difficulties of actually doing this in a real life situation.

To be fair, if people are thinking about internet security, they are probably going to look up a better solution than regex and come across one.

Then again, I’m finding a lot of people asking for help on questions that have been solved a million times over on Google, so who knows.

Still, I don’t think MOST people going into sanitizing input will think back to this problem as the way to do it.