On 2023-04-25 04:58, Roger Mason wrote:
> Hello,
>
> I have files like this (Si.in):
>
> 'Si' : spsymb
> 'silicon' : spname
> -14.0000 : spzn
> 51196.73454 : spmass
> 0.534522E-06 2.2000 47.8169 400 : rminsp, rmt, rmaxsp, nrmt
> 7 : nstsp
> 1 0 1 2.00000 T : nsp, lsp, ksp, occsp, spcore
> 2 0 1 2.00000 T
> 2 1 1 2.00000 T
> 2 1 2 4.00000 T
> 3 0 1 2.00000 F
> 3 1 1 1.00000 F
> 3 1 2 1.00000 F
> 1 : apword
> 0.1500 0 F : apwe0, apwdm, apwve
> 0 : nlx
> 2 : nlorb
> 0 2 : lorbl, lorbord
> 0.1500 0 F : lorbe0, lorbdm, lorbve
> 0.1500 1 F
> 1 2 : lorbl, lorbord
> 0.1500 0 F : lorbe0, lorbdm, lorbve
> 0.1500 1 F

I have the impression that the indentation of the data indicates
a nesting level, so that there is a hierarchy.

A general approach to parsing the whole file along these lines is possible.
We define a simple data structure to represent a frame.

- A frame consists of headings, rows and children.
- The headings are a list of strings like ("lorbl" "lorbord").
- The rows are a vector of lists of items, which we can tokenize
  into strings and floating-point numbers (or possibly more finely:
  we could have T and F become the Lisp objects t and nil, or whatever).
- Children are other frames, listed below a certain frame, if
  they are indented by one relative to that frame.

Following this design, I wrote a prototype program:

;; A frame: column headings, data rows, and nested child frames.
(defstruct frame ()
  headings
  rows
  children)

;; Split a data field into tokens: floats where the token converts,
;; quoted strings with the quotes stripped, the raw token otherwise.
(defun tokenize-data (str)
  (let ((toks (tok #/'.*'|[^ ]+/ str)))
    (collect-each ((tok toks))
      (match-case tok
        (@(@f (tofloat)) f)
        (@(and @(starts-with "'") @(ends-with "'")) [tok 1..-1])
        (@else tok)))))

;; Read indented lines from stream and return the list of root frames.
;; [stack level] holds the most recent frame seen at that indentation.
(defun table-data-read (: (stream *stdin*))
  (let ((stack (vector 32))
        (prev-level 0))
    (build
      (whilet ((line (get-line stream)))
        (let ((level (match-regex line #/ */)))
          (if (< level 32)
            (match-case line
              (`@data : @headings`
               (let ((fr (new frame
                              headings (spl ", " headings)
                              rows (vec (tokenize-data data))
                              children (vec))))
                 (set [stack level] fr)
                 (if (eql 1 level) ; hack: root frames sit at level 1
                   (add fr)
                   (iflet ((parent [stack (pred level)]))
                     (vec-push parent.children fr)))))
              (`@data`
               (iflet ((current [stack level]))
                 (vec-push current.rows (tokenize-data data)))))))))))

(prinl (table-data-read))

Note that this contains a hack: the root level is 1 rather than 0,
because the sample data's root node is indented by one space.
See the expression (eql 1 level).

The program produces the following data (which I reformatted
manually).

Is this barking up the right tree?

(#S(frame headings ("spsymb")
    rows #(("Si"))
    children #())
 #S(frame headings ("spname")
    rows #(("silicon"))
    children
    #(#S(frame headings ("spzn")
         rows #((-14.0))
         children
         #(#S(frame headings ("spmass")
              rows #((51196.73454))
              children #())))
      #S(frame headings ("rminsp" "rmt" "rmaxsp" "nrmt")
         rows #((5.34522e-7 2.2 47.8169 400.0))
         children
         #(#S(frame headings ("nstsp")
              rows #((7.0))
              children #())
           #S(frame headings ("nsp" "lsp" "ksp" "occsp" "spcore")
              rows #((1.0 0.0 1.0 2.0 "T")
                     (2.0 0.0 1.0 2.0 "T")
                     (2.0 1.0 1.0 2.0 "T")
                     (2.0 1.0 2.0 4.0 "T")
                     (3.0 0.0 1.0 2.0 "F")
                     (3.0 1.0 1.0 1.0 "F")
                     (3.0 1.0 2.0 1.0 "F"))
              children #())
           #S(frame headings ("apword")
              rows #((1.0))
              children
              #(#S(frame headings ("apwe0" "apwdm" "apwve")
                   rows #((0.15 0.0 "F"))
                   children #())))
           #S(frame headings ("nlx")
              rows #((0.0))
              children #())
           #S(frame headings ("nlorb")
              rows #((2.0))
              children #())
           #S(frame headings ("lorbl" "lorbord")
              rows #((0.0 2.0))
              children
              #(#S(frame headings ("lorbe0" "lorbdm" "lorbve")
                   rows #((0.15 0.0 "F")
                          (0.15 1.0 "F"))
                   children #())))
           #S(frame headings ("lorbl" "lorbord")
              rows #((1.0 2.0))
              children
              #(#S(frame headings ("lorbe0" "lorbdm" "lorbve")
                   rows #((0.15 0.0 "F")
                          (0.15 1.0 "F"))
                   children #()))))))))
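
For comparison, here is a minimal sketch of the same stack-of-levels
idea in Python rather than in the Lisp dialect used above. It is a
rough port under stated assumptions, not the prototype itself: the names
Frame, tokenize_data and read_frames are made up for illustration, T and
F are mapped to booleans as floated earlier, and the root level is taken
from the first non-blank line instead of being hard-coded to 1.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Frame:
    """Hypothetical counterpart of the frame struct: headings, rows, children."""
    headings: list
    rows: list = field(default_factory=list)
    children: list = field(default_factory=list)


# Same token shape as the prototype: a quoted string or a run of non-spaces.
TOKEN = re.compile(r"'.*'|[^ ]+")


def tokenize_data(text):
    """Turn a data field into booleans (T/F), unquoted strings, or floats."""
    items = []
    for tok in TOKEN.findall(text):
        if tok in ("T", "F"):
            items.append(tok == "T")
        elif len(tok) >= 2 and tok.startswith("'") and tok.endswith("'"):
            items.append(tok[1:-1])
        else:
            try:
                items.append(float(tok))
            except ValueError:
                items.append(tok)   # anything else stays a plain string
    return items


def read_frames(lines):
    """Build a list of root frames from indented 'data : headings' lines."""
    stack = {}         # indentation level -> most recent frame at that level
    roots = []
    root_level = None  # assumption: the first non-blank line is a root
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        level = len(line) - len(line.lstrip(" "))
        if root_level is None:
            root_level = level
        data, sep, headings = line.partition(" : ")
        if sep:
            # A line with headings starts a new frame at this level.
            frame = Frame(headings.strip().split(", "), [tokenize_data(data)])
            stack[level] = frame
            if level == root_level:
                roots.append(frame)
            elif level - 1 in stack:
                stack[level - 1].children.append(frame)
        elif level in stack:
            # A line without headings is another row of the current frame.
            stack[level].rows.append(tokenize_data(line))
    return roots
```

Calling read_frames(open("Si.in")) on the file with its original
indentation intact should give a list of root frames analogous to the
printout above, with True/False in place of "T"/"F".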