-->

Search and replace Unicode zero-width characters <U+200B>

By: Varghese Chacko 2 years, 8 months ago

When we copy and paste text from applications such as MS Word into our HTML or text files, we can end up with invisible Unicode characters in those files, which leads to various issues. On Linux, it is easy to remove them from all files in the current directory. A typical example is the zero-width space <U+200B>.

First, let us print it and see what Linux says:

$ printf %b '\u200b' | uniname
character  byte       UTF-32   encoded as     glyph   name
        0          0  00200B   E2 80 8B               ZERO WIDTH SPACE
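
Where `uniname` is not installed, or the shell's `printf` does not understand `\u` escapes, the same check can be sketched with tools that are almost always present. This is only an illustration: it assumes a POSIX shell and `od`, and the variable name `zwsp` is made up for the example. It builds the character from the UTF-8 bytes shown in the output above (E2 80 8B) and dumps them in hex.

```shell
# Build U+200B from its UTF-8 bytes (E2 80 8B) using octal escapes,
# which POSIX printf understands even where \u is not supported.
zwsp=$(printf '\342\200\213')

# Dump the bytes in hex to confirm the encoding.
printf '%s' "$zwsp" | od -An -tx1
```

The hex dump should show the three bytes `e2 80 8b`, matching the "encoded as" column reported by `uniname`.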

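Now we can use it in grep to find the affected files. Note that -q must not be used here, because it suppresses all output, including the file list printed by -l. Here is a typical result:

$ grep -rl "$(printf %b '\u200b')"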

templates/base.html
templates/base_index.html
templates/includes/custom/blue_menu.html
templates/base_static.html
templates/staticpages/obesity.html
templates/staticpages/slimming.html
templates/staticpages/face.html
templates/staticpages/about.html
templates/staticpages/bridal.html
templates/staticpages/all-about-face.html
templates/staticpages/anti-aging.html

$

Now we can pipe the file list through xargs to sed to remove the character from each file in place:

$ grep "$(printf %b '\u200b')" -Rl | xargs sed -Ei "s/$(printf %b '\u200b')//g"
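
The one-liner above splits the file list on whitespace, so it can misbehave on file names containing spaces, and it runs `sed` with no arguments when nothing matches. A slightly more defensive variant might look like the sketch below. It assumes GNU `grep`, `xargs` and `sed`; the function name `strip_zwsp` is hypothetical and not part of any tool.

```shell
# strip_zwsp [DIR]: remove every U+200B from files under DIR (default: .).
# Hypothetical helper; assumes GNU grep, xargs and sed.
strip_zwsp() {
    dir=${1:-.}
    # U+200B as raw UTF-8 bytes, so no \u support is needed.
    zwsp=$(printf '\342\200\213')
    # -Z ends each file name with a NUL byte and xargs -0 reads them back,
    # so names with spaces survive. -r skips sed when grep found nothing.
    grep -rlZ -- "$zwsp" "$dir" | xargs -0 -r sed -i "s/$zwsp//g"
}
```

Calling `strip_zwsp templates` would clean only that directory. Because `sed -i` overwrites files without keeping a copy, it may be worth using `-i.bak` instead to keep backups, or committing to version control first.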

That's it. The command terminates silently, and all occurrences of <U+200B> are removed from the files.
