e.g. a file like this:

    dfasdfasdfaasdfsregaregeagrerg242342423ytuyutuy
    qqweqweqweqsdadsasdasdasdzxczcxzcx

I get it like this:

    dfasdfasdfaasdf-
    sregaregeagrerg-
    242342423-
    ytuyutuy
    qqweqweqweq-
    sdadsasdasdasd-
    zxczcxzcx-

Fine. So I can't just strip every newline, and a simple sed one-liner won't do it. But a little looking on the web turns up a summary of quick sed one-liners with exactly what I'm looking for (the entry for lines ending in a backslash, here with the backslash swapped for a dash), which I would never in a million years have figured out on my own:

    # if a line ends with a backslash, append the next line to it
    sed -e :a -e '/-$/N; s/-\n//; ta'

It looks for a dash followed by the end of the line (in sed fashion, the newline character is not part of a line). If it finds one, the N command appends an actual newline character -and- the next line to the pattern space (the search space). The next 's/...' then searches for the dash plus newline and removes it, and 'ta' jumps back to the label ':a' if a substitution was made, so the joined line gets checked again (a little 'goto'ing that I never knew existed in sed).
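To make the loop concrete, here is the same one-liner wrapped in a small shell function with each sed command commented. It is only a sketch, assuming a POSIX shell, Unix (LF-only) line endings, and a made-up sample; the function name join_dashed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: join every line that ends in "-" with the line after it,
# dropping the dash. Assumes LF-only (Unix) line endings.
join_dashed() {
  # :a       define a label named "a"
  # /-$/N    if the pattern space ends in "-", append a newline plus the next line
  # s/-\n//  delete the dash together with that embedded newline
  # ta       if a substitution was made, branch back to :a and check again
  sed -e :a -e '/-$/N; s/-\n//; ta' "$@"
}

# Usage with made-up input: three dash-continued pieces, then a separate line
printf 'abc-\ndef-\nghi\njkl\n' | join_dashed
# prints:
# abcdefghi
# jkl
```

The branch back to :a is what lets a run of several dash-terminated lines collapse into one, since each pass only pulls in a single extra line.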
Great. Except it doesn't work. Why not? Because... well, before the explanation, I have to complain about the hours and hours (well, 3) I spent 'debugging by permutation': trying every small variation I could think of, in case the one-liner was meant for a different shell or a slightly different sed version, or whatever. OK, enough of that... on with the solution.
As in all the Sherlock Holmes stories, there's a tiny bit of information the author withholds until the very end, which, had anybody known it, would have solved the problem at once... the file I received was in -MSDOS- format, meaning simply that each line ending is -2- characters: carriage return -and- line feed (\r\n, or \x0d\x0a).
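One way to make the invisible character visible is to dump the bytes or count the lines that contain a carriage return. A minimal sketch using only od and grep, assuming a POSIX shell; the two-line sample is made up.

```shell
#!/bin/sh
# Made-up sample with DOS (CRLF) line endings
sample='abc-\r\ndef\r\n'

# od -c prints one entry per byte; each \n is preceded by a \r
printf "$sample" | od -c

# $(printf '\r') yields a literal carriage return (only trailing newlines are stripped)
cr=$(printf '\r')

# Count lines that contain a CR: prints 2 here, and 0 for a Unix (LF-only) file
printf "$sample" | grep -c "$cr"
```

A nonzero count is a strong hint that the file needs its line endings converted before any line-oriented tool touches it.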
So the dash wasn't really at the end of the line: each line ended in '-\r', and what the script needed to look for was '-\r\n', not '-\n'.
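A simple way around it is to strip the carriage returns first and then run the one-liner unchanged. This is a sketch, assuming a POSIX shell with tr; the helper name join_dashed_dos is hypothetical, and dos2unix (if installed) would do the same conversion step.

```shell
#!/bin/sh
# Hypothetical helper: join dash-continued lines in input that may use CRLF endings.
join_dashed_dos() {
  # tr -d '\r' deletes every CR byte (not only those at line ends), so each
  # line now ends in a plain "\n" that the s/-\n// substitution can match
  tr -d '\r' | sed -e :a -e '/-$/N; s/-\n//; ta'
}

# Usage with made-up CRLF input; the output has Unix (LF) line endings
printf 'abc-\r\ndef-\r\nghi\r\njkl\r\n' | join_dashed_dos
# prints:
# abcdefghi
# jkl
```

Converting up front keeps the sed script identical to the published one-liner, instead of threading an optional '\r' through every pattern.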
In other words, an invisible character: you can't see it, but you have to know it's there to solve the problem. In my very dim memory of the far past, this used to be a 'joke' bug, something unknowable to blame (because you can't -see- it) when the bug was probably really a thinko.
Anyway, hours wasted on trivialities.
That is all.