Search and Replace - Better Advanced Solution

evanplaice · January 14, 2018, 6:41am

@lynxlynxlynx What do you mean? JS supports RegEx modifiers, and dotAll is on track to be added to the spec soon too. Not sure how that relates to doing an inline replace.

Did you mean matching groups? Because JS RegEx supports those too.

Here’s a relatively one-liner regex that solves the problem using some of the more advanced, and less known features of JS regex.

@KittenHero Nice, when you said one-liner I was thinking the same thing. Here’s an alternative method to insert variables into the regex source. If you are building a regex string that conditionally includes multiple variables, this can be a lot cleaner than using new RegExp with concatenation.

This version covers the missing edge case mentioned in my last comment. Also, you don’t need the g modifier because you’re only doing a single match.

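function myReplace(str, before, after) {
  // /X/i.source is the string "X" (flags are not part of .source),
  // so the first argument below ends up being a plain string.
  return str.replace(/X/i.source.replace(/X/, before), (m) => {
    // Match the case of the first letter of the matched text.
    if (/^[A-Z]/.test(m))
      return after[0].toUpperCase() + after.slice(1);
    if (/^[a-z]/.test(m))
      return after[0].toLowerCase() + after.slice(1);
    return after;
  });
}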

/X/.source - references the regex source’s raw string
replace(/X/, before) - replaces X in the source with the value of before.
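
A minimal sketch of what those two steps produce, in case it helps. The names `template`, `escapeRegExp` and `buildPattern` are hypothetical helpers of mine, not part of the solution above. One thing worth noting: `.source` is a plain string and does not carry the flags, so if you want a real case-insensitive RegExp you have to pass the flags back in yourself.

```javascript
// Hypothetical helper (not from the thread): escape regex metacharacters
// so user-supplied text is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// A template regex with a placeholder X.
const template = /X/i;
console.log(template.source); // "X"  (a plain string, no flags)
console.log(template.flags);  // "i"

// Hypothetical helper: substitute `before` for X and rebuild a RegExp,
// re-applying the template's flags.
function buildPattern(before) {
  // A function replacement avoids "$" in `before` being treated specially.
  const source = template.source.replace(/X/, () => escapeRegExp(before));
  return new RegExp(source, template.flags);
}

const pattern = buildPattern('tom');
console.log(pattern.test('His name is Tom')); // true
```

Whether you want the flags re-applied depends on the problem; the original solution effectively does a literal, case-sensitive match.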

The next part leverages another feature of replace: the second parameter can be a function. The function receives the full match first, then one argument for every capture group (i.e. each part surrounded by parentheses), followed by the match offset and the original string.

str.replace(/(group1)|(group2)|(group3)/, (m, g1, g2, g3, offset, string) => {
  console.log(m);
  console.log(g1);
  console.log(g2);
  console.log(g3);
});
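
Here is a small runnable version of the same idea, as a sketch. The sample string and the `seen` array are made up for illustration, and I've added the g flag only so the callback fires more than once.

```javascript
// Record what the callback receives on each match.
const seen = [];

const out = 'cat dog bird'.replace(
  /(cat)|(dog)|(bird)/g,
  (m, g1, g2, g3, offset) => {
    // Groups that did not take part in this match are undefined.
    seen.push({ m, g1, g2, g3, offset });
    // Whatever is returned becomes the replacement text.
    return m.toUpperCase();
  }
);

console.log(out);     // "CAT DOG BIRD"
console.log(seen[1]); // m: "dog", g2: "dog", g1 and g3 undefined, offset: 4
```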

This is fully documented at MDN - String.prototype.replace()

If you need to do a complex replace without additional logic you can use $&, $1, $2, $3 in the second parameter to reference the match, group1, group2, group3, etc.

str.replace(/(group1)|(group2)|(group3)/, "$& $1 $2 $3");
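
Two concrete cases of the replacement patterns, as a quick sketch (the sample strings are made up):

```javascript
// $1 and $2 refer to the first and second capture groups.
const swapped = 'John Smith'.replace(/(\w+)\s(\w+)/, '$2, $1');
console.log(swapped); // "Smith, John"

// $& refers to the whole match.
const wrapped = 'id 42'.replace(/\d+/, '[$&]');
console.log(wrapped); // "id [42]"
```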

The replacement string is whatever you return from the function. By inlining the logic that constructs after into this function, we can capture before in the text, construct after, and return the correctly cased version of after.

Even for an advanced example, this goes pretty far above and beyond the typical usage of RegEx. I only know about this because I use RegExp as the lexer in jquery-csv lib.

KittenHero · January 14, 2018, 6:51am

I suppose capture group could be a bit advanced, but it does make your life easier when you do search and replace in vim

DanCouper · January 14, 2018, 10:41am

That’s just Firefox btw. Chromium’s implementation will differ. Chrome/IE/Edge/Safari are all closed source so can’t tell. However, the core JS stuff (ie not the browser APIs) has to follow the ECMAScript spec, so you can tell what it’s doing anyway without the source.

Ok, string.replace has nothing to do with regex; it’s basic string substitution. However, the method does a check to see if a regex is used - if so, it delegates to the regexp module to build the string to look for and the one to replace. The browser is written in C++ (some Rust if it’s Firefox), so if using the native methods is a better option than the JS ones, then it makes sense to use them, but it’ll be on a case-by-case basis. Take the array method sort: if it’s all integers, generally a fast C++ integer sort from the standard library is (or used to commonly be) used.

Firefox is having parts of itself rebuilt in Rust; it’s Mozilla’s language and it was funded for that reason afaik

lynxlynxlynx · January 14, 2018, 9:47pm

It supports very few of them. If it had case-affecting modifiers (eg. \L), it could make things cleaner here. I do rescind the one-regex possibility though, as I don’t see a way to do it with less than two, except maybe if it also supported recursion.

@luishendrix92: yes, it would need to be fixed to work on non-ascii locales.
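
One possible way to handle non-ASCII letters, as a rough sketch only: Unicode property escapes (`\p{Lu}`, `\p{Ll}` with the u flag) can stand in for `[A-Z]` and `[a-z]`. This assumes an engine that supports them, and `matchCase` is a hypothetical helper name, not something from the solutions in this thread.

```javascript
// Unicode-aware checks for the first letter (needs the "u" flag).
const startsUpper = (s) => /^\p{Lu}/u.test(s);
const startsLower = (s) => /^\p{Ll}/u.test(s);

// Hypothetical helper: give `after` the same first-letter case as `before`.
function matchCase(before, after) {
  // Array.from splits by code point, so astral characters stay intact.
  const [first = '', ...rest] = Array.from(after);
  if (startsUpper(before)) return first.toUpperCase() + rest.join('');
  if (startsLower(before)) return first.toLowerCase() + rest.join('');
  return after;
}

console.log(matchCase('Москва', 'париж')); // "Париж"
console.log(matchCase('москва', 'Париж')); // "париж"
```

This does not cover locale-specific casing rules (toLocaleUpperCase exists for that), only the letter detection.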

evanplaice · January 15, 2018, 1:21am

@DanCouper I could track where it fell back to using the non-optimized version if the native C version doesn’t exist. I couldn’t see where it delegated to using RegEx, so I assumed the RegEx approach was the fallback alternative.

I should probably take a moment to sit and study the code more closely but I’ll take your word for it. Thanks for the feedback. It’s inspiring to see the work Mozilla is doing on Rust.

@lynxlynxlynx I’m no PREG guru, I didn’t really pick up RegEx until I transitioned over to mostly writing code in JS.

\L does make a lot of sense for this

Maybe more modifiers will be added in the future. I was honestly surprised to see the dotAll flag proposal on TC39’s list. Even more so, that the proposal has been fast-tracked to Stage 4 within a year. We can likely look forward to more being added in the near future.

It looks like a lot more are coming soon:
https://mathiasbynens.be/notes/es-regexp-proposals
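
For anyone who hasn’t seen the dotAll flag yet, a minimal sketch of what it changes, assuming an engine that already implements it (the sample text and variable names are made up):

```javascript
const text = 'first line\nsecond line';

// Without the s flag, "." does not match a newline.
const withoutS = /first.*second/.test(text);
console.log(withoutS); // false

// With the s (dotAll) flag, "." matches newlines too.
const withS = /first.*second/s.test(text);
console.log(withS); // true

// The usual workaround on engines without the flag.
const workaround = /first[\s\S]*second/.test(text);
console.log(workaround); // true
```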

jeremyfiel · August 2, 2019, 8:00pm

My Functional Programming solution. (updated for edge case where before is lcase and after is ucase)

const isUpperCase = str => (/^[A-Z]/).test(str)

const preserve = (before, after) => {
  let newStr
  if (isUpperCase(before)) {
    newStr = after.charAt(0).toUpperCase() + after.slice(1)
    return newStr
  }
  if (isUpperCase(after)) {
    newStr = after.charAt(0).toLowerCase() + after.slice(1)
    return newStr
  }
  return after
}

const myReplace = (str, before, after) => {
  let checkCase = preserve(before, after)
  return str.replace(before, checkCase)
}

console.log(
myReplace("His name is Tom", "Tom", "john"),
myReplace("He is Sleeping on the couch", "Sleeping", "sitting"),
myReplace("Let us go to the store", "store", "mall"),
myReplace("This has a spellngi error", "spellngi", "spelling"),
myReplace("Let us get back to more Coding", "Coding", "algorithms"),
myReplace("This is not the wrong caes", "caes", "Case")
)