Search and Replace - Better Advanced Solution


#21

This solution is probably one or two years old, but it still holds up. The only change I might make is checking the casing with a RegExp instead of charCodeAt, but hey, whatever.

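function myReplace(str, before, after) {
  // Match `before` case-insensitively ("i") everywhere in the string ("g").
  return str.replace(new RegExp(before, "ig"), (toReplace) =>
    // Char codes below 97 ("a") include "A"-"Z", so treat the match as capitalized.
    toReplace.charCodeAt(0) < 97
      ? after[0].toUpperCase() + after.slice(1)
      : after
  )
}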
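Following the suggestion above, here is a sketch of the same function with the charCode comparison swapped for a RegExp test. It also escapes regex metacharacters in before, which the original doesn't do; escapeRegExp is a helper name I made up for illustration, not a built-in.

```javascript
// escapeRegExp is a hypothetical helper (not part of the JS standard library):
// it backslash-escapes characters that have special meaning in a RegExp pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function myReplace(str, before, after) {
  return str.replace(new RegExp(escapeRegExp(before), "ig"), (toReplace) =>
    // Capitalize `after` only when the matched text starts with A-Z.
    /^[A-Z]/.test(toReplace)
      ? after[0].toUpperCase() + after.slice(1)
      : after
  );
}

console.log(myReplace("He is Sleeping on the couch", "Sleeping", "sitting"));
// → He is Sitting on the couch
```

Escaping matters as soon as before contains characters like "." or "*": without it, "a.b" would also match "axb".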

#22

@lynxlynxlynx What do you mean? JS supports RegEx modifiers (flags), and dotAll is on track to be added to the spec soon too. Not sure how that relates to doing an inline replace.

Did you mean matching groups? Because JS RegEx supports those too.

Here’s a near one-liner regex that solves the problem using some of the more advanced, lesser-known features of JS regex.

@KittenHero Nice, when you said one-liner I was thinking the same thing. Here’s an alternative way to insert variables into a regex's source. If you are building a regex string that conditionally includes multiple variables, this can be a lot cleaner than calling new RegExp() on a concatenated string.

This version covers the missing edge case mentioned in my last comment. Also, you don’t need the g modifier because you’re only doing a single match.

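function myReplace(str, before, after) {
  // Swap the X placeholder in /X/'s source for `before`. .source never includes
  // the flags, so "i" is passed to the RegExp constructor explicitly.
  const pattern = new RegExp(/X/.source.replace(/X/, before), "i");
  return str.replace(pattern, (m) => {
    // Match starts uppercase: capitalize the replacement.
    if (/^[A-Z]/.test(m))
      return after[0].toUpperCase() + after.slice(1);
    // Match starts lowercase: lowercase the replacement's first letter.
    if (/^[a-z]/.test(m))
      return after[0].toLowerCase() + after.slice(1);
    // Otherwise (digit, symbol, ...) return `after` unchanged.
    return after;
  });
}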
  • /X/.source - references the regex source’s raw string
  • replace(/X/, before) - replaces X in the source with the value of before.
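A quick way to check the .source behavior in practice: source returns only the pattern text, while the flags live separately on .flags, so they have to be reapplied when the regex is rebuilt. buildPattern below is a hypothetical helper name, just a sketch of that idea.

```javascript
const placeholder = /X/i;
console.log(placeholder.source); // X  (pattern text only)
console.log(placeholder.flags);  // i

// buildPattern is a hypothetical helper: it swaps the X placeholder for
// `before` and reapplies the template's flags, since .source drops them.
function buildPattern(template, before) {
  return new RegExp(template.source.replace(/X/, before), template.flags);
}

const re = buildPattern(placeholder, "sleeping");
console.log(re);                         // /sleeping/i
console.log(re.test("He is Sleeping")); // true
```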

The next part leverages another feature of String.prototype.replace(): its second parameter can be a function. The function is called for each match and receives the full match first, then one argument per capture group (i.e. each part surrounded by parentheses), followed by the match offset and the whole string.

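const str = "xx group2 yy";
str.replace(/(group1)|(group2)|(group3)/, (m, g1, g2, g3, offset, string) => {
  console.log(m);  // the full match: group2
  console.log(g1); // undefined (group 1 didn't participate)
  console.log(g2); // group2
  console.log(g3); // undefined
  return m;        // the return value becomes the replacement text
});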

This is fully documented on MDN: String.prototype.replace()

If you don't need any additional logic, you can pass a replacement string instead and use $&, $1, $2, $3, etc. in it to reference the whole match, group 1, group 2, group 3, and so on.

str.replace(/(group1)|(group2)|(group3)/, "$& $1 $2 $3");
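To make the $ patterns concrete, here is a small runnable sketch (the sample strings are made up). Note that a group that didn't take part in the match expands to an empty string.

```javascript
// Swap two words using numbered groups.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");
console.log(swapped); // Smith, John

// $& inserts the whole match; $2 and $3 are empty because only group 1 matched.
const tagged = "group1".replace(/(group1)|(group2)|(group3)/, "[$&|$1|$2|$3]");
console.log(tagged); // [group1|group1||]
```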

When you pass a function, the replacement text is whatever the function returns. By moving the logic that builds after into this function, we can inspect the matched before text, adjust after to match its case, and return the correct version.

Even for an advanced example, this goes well beyond the typical usage of RegEx. I only know about it because I use RegExp as the lexer in the jquery-csv library.


#23

I suppose capture groups could be a bit advanced, but they do make your life easier when you search and replace in vim 🙂


#24

That’s just Firefox, btw. Chromium’s implementation will differ. Chrome, IE, Edge and Safari are closed-source products, so I can't tell for sure (though the V8 engine in Chrome and Safari's JavaScriptCore engine are open source). However, the core JS stuff (i.e. not the browser APIs) has to follow the ECMAScript spec, so you can tell what it’s doing anyway without the source.

OK, String.prototype.replace has nothing to do with regex by itself; it’s basic string substitution. However, the method checks whether a regex was passed in, and if so, it delegates to the regexp module to build the string to look for and the one to replace it with.

The browser is written in C++ (plus some Rust in Firefox's case), so where a native method is a better option than a JS implementation, it makes sense to use it, but that's decided case by case. Take Array.prototype.sort: if the array is all integers, a fast C++ integer sort from the standard library is generally (or at least used to commonly be) used.
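To illustrate the observable difference (not the engine internals described above): with a string pattern, replace() substitutes only the first exact, case-sensitive occurrence and treats the pattern literally, while a regex pattern brings flags such as g and i into play. The sample text is made up.

```javascript
const text = "a.b A.B a.b";

// String pattern: literal, case-sensitive, first occurrence only.
console.log(text.replace("a.b", "x"));    // x A.B a.b

// Regex pattern with g and i: every match, ignoring case.
// The dot is escaped so it matches a literal "." rather than any character.
console.log(text.replace(/a\.b/gi, "x")); // x x x
```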

Firefox is having parts of itself rebuilt in Rust; Rust is Mozilla’s language, and AFAIK it was funded for that reason.


#25

It supports very few of them. If it had case-affecting modifiers (e.g. \L), they could make things cleaner here. I do rescind the one-regex possibility, though, as I don’t see a way to do it with fewer than two regexes, except maybe if it also supported recursion.
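Since JS replacement strings have no case-changing escapes like \L or \U, the usual workaround is a replacement function. A minimal sketch; the regexes and sample strings are only for illustration.

```javascript
// Roughly what a Perl-style "\U$1" replacement would do:
// uppercase the first letter of every word.
const titled = "the quick brown fox".replace(/\b(\w)/g, (m, first) => first.toUpperCase());
console.log(titled); // The Quick Brown Fox

// And the "\L$&" equivalent: lowercase each run of capitals.
const lowered = "SHOUTING Text".replace(/[A-Z]+/g, (m) => m.toLowerCase());
console.log(lowered); // shouting text
```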

@luishendrix92: yes, it would need to be fixed to work on non-ASCII locales.
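For non-ASCII input, one option is Unicode property escapes (\p{Lu} for uppercase letters, \p{Ll} for lowercase), which need the u flag and an ES2018-capable engine. This is a hedged sketch of the case check only; matchCase is a hypothetical helper name.

```javascript
// matchCase (hypothetical helper): copy the case of the first character
// of `matched` onto `after`. Requires ES2018 Unicode property escapes.
function matchCase(matched, after) {
  if (/^\p{Lu}/u.test(matched)) return after[0].toUpperCase() + after.slice(1);
  if (/^\p{Ll}/u.test(matched)) return after[0].toLowerCase() + after.slice(1);
  return after; // first character is neither upper- nor lowercase
}

console.log(matchCase("Éclair", "gâteau")); // Gâteau
console.log(matchCase("über", "Straße"));   // straße
```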


#26

@DanCouper I could track where it fell back to the non-optimized version when the native C version doesn’t exist, but I couldn’t see where it delegated to RegEx, so I assumed the RegEx approach was the fallback.

I should probably take a moment to sit and study the code more closely but I’ll take your word for it. Thanks for the feedback. It’s inspiring to see the work Mozilla is doing on Rust.

@lynxlynxlynx I’m no PREG guru, I didn’t really pick up RegEx until I transitioned over to mostly writing code in JS.

  • \L does make a lot of sense for this

Maybe more modifiers will be added in the future. I was honestly surprised to see the dotAll flag proposal on TC39’s list, and even more so that it was fast-tracked to Stage 4 within a year. We can likely look forward to more being added in the near future.
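For reference, this is what the dotAll (s) flag changes: without it, . does not match line terminators such as "\n". A small sketch; the s flag needs an ES2018-capable engine.

```javascript
const text = "first line\nsecond line";

console.log(/line.second/.test(text));  // false: "." skips "\n" by default
console.log(/line.second/s.test(text)); // true: with s, "." matches "\n" too
console.log(/line.second/s.dotAll);     // true
```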

It looks like a lot more are coming soon:
https://mathiasbynens.be/notes/es-regexp-proposals
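One of the features covered by proposals like these is named capture groups (standardized in ES2018). They can be referenced as $<name> in a replace() replacement string and read from a match's .groups object. A minimal sketch with a made-up date string:

```javascript
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// $<name> refers to a named group in a replacement string.
console.log("2017-09-30".replace(re, "$<day>/$<month>/$<year>")); // 30/09/2017

// Match results expose named groups on .groups.
const { year, month } = "2017-09-30".match(re).groups;
console.log(year, month); // 2017 09
```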