Thanks Kaz

Yes, I did mean ASCII double quotes.

Thanks for your TXR code samples. Given the fact that TXR doesn’t seem to have any CSV reader built in, it is impressive how short an ad hoc implementation is!

On 30 Nov 2023, at 11:27, Kaz Kylheku wrote:

> On 2023-11-28 23:50, halloleo wrote:
>> Hi TXR people!
>>
>> I have CSV files like this sample:
>>
>>
>> col1,col2,col3
>> aaa,“a,b,c“,ccc
>> 111,222,“1,200.30“
>> ...
>
> Are those the actual double quote characters?
>
> You have: “ (U+201C)
>
> The ASCII double quote is: " (U+0022)
>
> I'm going to assume that some e-mail text editor ate your ASCII double quotes, replacing them with Unicode "sixty-sixes".
>
>> and I want to string-concat the first two columns into a third column directly after the two first columns. So the result file of the sample would be:
>>
>> col1,col2,col1col3,col3
>> aaa,“a,b,c“,“aaaa,b,c“,ccc
>> 111,222,“111222“,“1,200.30“
>>
>> How can I do this with TXR? Or is TXR not a good tool for this?
>
> There are many ways to solve this.
>
> How important is it to correctly treat all the quoted CSV fields?
>
> Is it something you will be running regularly on new data?
>
> Is this just a one-off problem you need solved, without any further investment in doing interesting things with the data?
>
> We can treat it as a dumb text processing problem, where the example data
> captures all the variations that we need to handle:
>
> $ cat data
> col1,col2,col3
> aaa,"a,b,c",ccc
> 111,222,"1,200.30"
>
> $ cat cols.txr
> @(repeat)
> @ (cases)
> @c1,"@c2",@rest
> @ (bind out `@c1,@c2,"@c1@c2",@rest`)
> @ (or)
> @c1,@c2,@rest
> @ (bind out `@c1,@c2,@c1@c2,@rest`)
> @ (end)
> @ (do (put-line out))
> @(end)
>
> $ txr cols.txr data
> col1,col2,col1col2,col3
> aaa,a,b,c,"aaaa,b,c",ccc
> 111,222,111222,"1,200.30"
>
> We can treat CSV with quoted fields and double quotes representing
> single quotes in TXR Lisp:
>
> $ cat csv.tl
> (defun csv-split (str)
>   (flow str
>     (tok #/[^,]*|"([^"]|"")+"/)
>     (mapcar (do if (starts-with "\"" @1)
>               (regsub `""` `"` [@1 1..-1])
>               @1))))
>
> (defun csv-fmt (list)
>   (flow list
>     (mapcar [iffi #/[,"]/ (ret `"@(regsub "\"" "\"\"" @1)"`)])
>     `@{@1 ","}`))
>
> (defun csv-test ()
>   (whilet ((str (get-line)))
>     (let* ((fields (csv-split str))
>            (csv (csv-fmt fields)))
>       (put-line `@str -> @(tostring fields) -> @csv`))))
>
> $ txr -i csv.tl
> TXR's no-spray organic production means every bug is carefully removed by hand.
> 1> (csv-test)
> a,b,c
> a,b,c -> ("a" "b" "c") -> a,b,c
> a,b c,c
> a,b c,c -> ("a" "b c" "c") -> a,b c,c
> a,"b,c",d
> a,"b,c",d -> ("a" "b,c" "d") -> a,"b,c",d
> a,"b,""c,d""e f",g
> a,"b,""c,d""e f",g -> ("a" "b,\"c,d\"e f" "g") -> a,"b,""c,d""e f",g
> nil
>
> This is a strict CSV implementation which doesn't allow spurious
> spaces around quoted fields.
>
> With these CSV functions we can write a loop like this, to do that
> col1,col2,col1col2,col3... thing:
>
> (whilet ((line (get-line)))
>   (flow line
>     csv-split
>     (tree-bind (col1 col2 . rest) @1
>       ^(,col1 ,col2 ,`@col1@col2` ,*rest))
>     csv-fmt
>     put-line))
>
> Or:
>
> (whilet ((line (get-line)))
>   (flow line
>     csv-split
>     (let f)
>     ^(,[f 0] ,[f 1] ,(join [f 0] [f 1]) ,*[f 2..:])
>     csv-fmt
>     put-line))
>
> Or other ways.
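
For anyone who wants to try this end to end, the pieces quoted above can be collected into a single TXR Lisp file. This is only a sketch assembled from Kaz's code: csv-split, csv-fmt and the tree-bind loop are copied unchanged, while the file name concat.tl, the comments, and the invocation mentioned after the block are assumptions added for illustration.

```txr
;; concat.tl (hypothetical file name)

;; Split one CSV line into a list of field strings.
;; A quoted field loses its outer quotes, and "" inside it becomes ".
(defun csv-split (str)
  (flow str
    (tok #/[^,]*|"([^"]|"")+"/)
    (mapcar (do if (starts-with "\"" @1)
              (regsub `""` `"` [@1 1..-1])
              @1))))

;; Join a list of fields back into one CSV line.
;; Only fields containing a comma or a double quote are quoted.
(defun csv-fmt (list)
  (flow list
    (mapcar [iffi #/[,"]/ (ret `"@(regsub "\"" "\"\"" @1)"`)])
    `@{@1 ","}`))

;; Read lines from standard input; after the first two fields,
;; insert a new field holding their concatenation.
(whilet ((line (get-line)))
  (flow line
    csv-split
    (tree-bind (col1 col2 . rest) @1
      ^(,col1 ,col2 ,`@col1@col2` ,*rest))
    csv-fmt
    put-line))
```

Assuming the sample is stored in a file named data, as in Kaz's transcript, this would presumably be run as `txr concat.tl < data`, because get-line reads from standard input when no stream is given. Like the original functions, it is strict about spaces around quoted fields and expects every line to have at least two fields.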