MTran is a programming language for XML transformation based on MSO (Monadic Second-order Logic) queries.
For more information, please contact: kiki .a.t. kmonos .d.o.t. net.
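The following example gathers all <a> elements in the input document and prints them inside a <result> element. The three listings below are, in order, the program, the input document, and the output.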
result[
{gather x :: x in <a> :: {x}}
]
<html>
<head>
<title>Hello, World</title>
</head>
<body>
<a href="http://www.google.com/">Google</a> and
<a href="http://www.yahoo.com/">Yahoo!</a> are
two famous search engines.
</body>
</html>
<result>
<a href="http://www.google.com/">Google</a>
<a href="http://www.yahoo.com/">Yahoo!</a>
</result>
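For readers coming from general-purpose languages, the observable effect of this gather expression can be approximated in Python. This is only a sketch built on the standard xml.etree.ElementTree module, not a description of how MTran evaluates MSO queries; the helper name gather_by_tag is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

def gather_by_tag(root, tag, wrapper="result"):
    """Copy every element named `tag`, in document order, into a new wrapper element.

    Hypothetical helper: it mimics {gather x :: x in <tag> :: {x}} only for
    the simple "select by element name" case.
    """
    out = ET.Element(wrapper)
    for elem in root.iter(tag):        # iter() yields matches in document order
        clone = copy.deepcopy(elem)
        clone.tail = None              # drop the text that followed the element in the input
        out.append(clone)
    return out

doc = ET.fromstring(
    '<html><head><title>Hello, World</title></head><body>'
    '<a href="http://www.google.com/">Google</a> and '
    '<a href="http://www.yahoo.com/">Yahoo!</a> are '
    'two famous search engines.</body></html>'
)
result = gather_by_tag(doc, "a")
print(ET.tostring(result, encoding="unicode"))
```

The input document itself is left untouched; only copies of the matching elements end up in the result.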
"Visit" expressions, as their name shows, visits each element in the input satisfying the condition and rewrites them. Other elements are kept unchanged.
{visit x
:: <a>/@href/x :: "http://proxy/-_-" {x}
:: x in <a> :: a[ @target["_blank"] {gather y :: x/y :: {y}} ]
}
<html>
<head>
<title>Hello, World</title>
</head>
<body>
<a target="_blank" href="http://proxy/-_-http://www.google.com/">Google</a> and
<a target="_blank" href="http://proxy/-_-http://www.yahoo.com/">Yahoo!</a> are
two famous search engines.
</body>
</html>
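The net effect of the two visit clauses above can be imitated in Python as follows. This is a rough analogue under stated assumptions: it mutates the tree in place with xml.etree.ElementTree rather than rebuilding it, and the names PROXY_PREFIX and visit_links are made up for illustration.

```python
import xml.etree.ElementTree as ET

PROXY_PREFIX = "http://proxy/-_-"      # same prefix as in the MTran program

def visit_links(root):
    """Rewrite every <a>: add target="_blank" and prefix the href value.

    Elements that are not <a> are left exactly as they were, which mirrors
    the "other elements are kept unchanged" behaviour of visit.
    """
    for elem in root.iter():
        if elem.tag == "a":
            elem.set("target", "_blank")
            if "href" in elem.attrib:
                elem.set("href", PROXY_PREFIX + elem.get("href"))
    return root

page = ET.fromstring(
    '<html><head><title>Hello, World</title></head><body>'
    '<a href="http://www.google.com/">Google</a> and '
    '<a href="http://www.yahoo.com/">Yahoo!</a> are '
    'two famous search engines.</body></html>'
)
visit_links(page)
links = page.findall(".//a")
print([a.get("href") for a in links])
```

Unlike the MTran program, which states the two rewrites as independent declarative clauses, the Python version has to spell out the traversal explicitly.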
The example below is a template to add a table of contents to a given input XHTML document. It retrieves the heading elements from the input document, constructs a tree of itemized lists that reflect the hierarchical structure of the input, and prepends it to the original document.
{pred subsection(var1 a, var1 b, var2 B, var2 A) =
a<b & b in B & ~ex1 x.(a<x & x<b & x in A);
}
{visit b :: /<html>/b:<body> ::
h1["Index"]
ul[
{gather h2 :: h2 in <h2> :: li[ {h2/_:#} ul[" "
{gather h3 :: subsection(h2,h3,<h3>,<h2>) :: li[ {h3/_:#} ul[" "
{gather h4 :: subsection(h3,h4,<h4>,<h3>) :: li[ {h4/_:#} ul[" "
{gather h5 :: subsection(h4,h5,<h5>,<h4>) :: li[ {h5/_:#} ul[" "
]]} ]]} ]]} ]]}
]
{b/_}
}
<html><head><title>Title</title></head><body>
<h1>Title</h1>
<h2>Chapter 1</h2>
<h3>Section 1.1</h3> <p>The quick</p>
<h4>Section 1.1.1</h4> <p>brown fox</p>
<h3>Section 1.2</h3> <p>jumps over</p>
<h2>Chapter2</h2> <p>the lazy</p>
<h3>Section 2.1</h3> <p>dog.</p>
</body></html>
<html><head><title>Title</title></head><body>
<h1>Index</h1>
<ul><li>Chapter 1 <ul>
<li>Section 1.1 <ul>
<li>Section 1.1.1 <ul/></li>
</ul></li>
<li>Section 1.2 <ul/></li>
</ul></li>
<li>Chapter2 <ul>
<li>Section 2.1 <ul/></li>
</ul></li> </ul>
<h1>Title</h1>
<h2>Chapter 1</h2>
<h3>Section 1.1</h3> <p>The quick</p>
<h4>Section 1.1.1</h4> <p>brown fox</p>
<h3>Section 1.2</h3> <p>jumps over</p>
<h2>Chapter2</h2> <p>the lazy</p>
<h3>Section 2.1</h3> <p>dog.</p>
</body></html>
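The subsection predicate carries most of the logic in this example, so it may help to see it restated over plain integers. The sketch below is an illustration in Python, assuming each heading is represented by its position in document order; the names subsection, toc, and headings are chosen here and are not part of MTran.

```python
def subsection(a, b, B, A):
    """Python reading of: a<b & b in B & ~ex1 x.(a<x & x<b & x in A).

    a, b are document-order positions; B and A are sets of positions.
    """
    return a < b and b in B and not any(a < x < b for x in A)

def toc(headings):
    """Build a nested (title, children) list from (level, title) pairs."""
    by_level = {
        lvl: {i for i, (l, _) in enumerate(headings) if l == lvl}
        for lvl in range(2, 6)         # h2 .. h5, as in the MTran template
    }

    def children(parent, level):
        if level > 5:
            return []
        return [
            (headings[i][1], children(i, level + 1))
            for i in sorted(by_level[level])
            if subsection(parent, i, by_level[level], by_level[level - 1])
        ]

    return [(headings[i][1], children(i, 3)) for i in sorted(by_level[2])]

# Headings of the sample input, in document order.
headings = [
    (1, "Title"),
    (2, "Chapter 1"), (3, "Section 1.1"), (4, "Section 1.1.1"), (3, "Section 1.2"),
    (2, "Chapter2"), (3, "Section 2.1"),
]
outline = toc(headings)
print(outline)
```

Like the predicate it mirrors, this check only looks for an intervening heading of the immediately higher level, so it relies on the input having properly nested heading levels.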
The following example reads arithmetic expressions that use the <plus>, <minus>, and <times> operators in MathML 'content' markup and converts them into MathML 'presentation' markup. No redundant parentheses are produced.
{
pred single_arg( var1 op ) = ~ex1 c.(op.1.1=c);
pred follows( var1 x, var1 y ) = ex1 p.(p/x & p/y & x<y);
pred need_paren( var1 ap ) =
(ap.0 in <plus> | ap.0 in <minus>) & ex1 op. (
follows(op,ap) & (
(op in <minus> & single_arg(op))
| (op in <minus> & op.1 ~= ap)
| (op in <times>)
));
}
mrow[
{visit x
:: x in <ci> :: mi[ {x/_} ]
:: x in <cn> :: mn[ {x/_} ]
:: x in <apply> & need_paren(x) :: mo["("] {_=x.0} mo[")"]
:: x in <apply> :: {_=x.0}
:: x in <minus> & single_arg(x) :: mo["-"] {_=x.1}
:: x in <plus> :: {_=x.1} {gather y :: follows(x.1,y) :: mo["+"] {y}}
:: x in <minus> :: {_=x.1} {gather y :: follows(x.1,y) :: mo["-"] {y}}
:: x in <times> :: {_=x.1} {gather y :: follows(x.1,y) :: mo["*"] {y}}
}
]
<apply>
<times/>
<cn>1</cn>
<apply> <plus/> <cn>2</cn> <cn>3</cn> </apply>
<apply> <minus/> <cn>4</cn> </apply>
</apply>
<mrow>
<mn>1</mn>
<mo>*</mo>
<mo>(</mo>
<mn>2</mn>
<mo>+</mo>
<mn>3</mn>
<mo>)</mo>
<mo>*</mo>
<mo>(</mo>
<mo>-</mo>
<mn>4</mn>
<mo>)</mo>
</mrow>
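The parenthesization rule encoded by need_paren can be restated as an ordinary recursive function. The following Python sketch is an approximation for illustration only: it handles just <ci>, <cn>, and <apply> with <plus>, <minus>, or <times>, returns a flat list of (tag, text) pairs instead of building XML, and the names needs_paren and convert are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

SYMBOL = {"plus": "+", "minus": "-", "times": "*"}

def needs_paren(op, parent_op, parent_is_unary, is_first_arg):
    """Mirror of need_paren: only sums and differences are ever wrapped."""
    if op not in ("plus", "minus"):
        return False
    if parent_op == "times":
        return True
    if parent_op == "minus":
        # under a unary minus, or as a non-first argument of a binary minus
        return parent_is_unary or not is_first_arg
    return False

def convert(node, parent_op=None, parent_is_unary=False, is_first_arg=True):
    """Translate content markup into a list of presentation tokens."""
    if node.tag == "ci":
        return [("mi", node.text)]
    if node.tag == "cn":
        return [("mn", node.text)]
    kids = list(node)                  # <apply>: operator followed by arguments
    op, args = kids[0].tag, kids[1:]
    unary = len(args) == 1
    tokens = []
    if op == "minus" and unary:
        tokens.append(("mo", "-"))
        tokens += convert(args[0], op, True, True)
    else:
        for i, arg in enumerate(args):
            if i > 0:
                tokens.append(("mo", SYMBOL[op]))
            tokens += convert(arg, op, False, i == 0)
    if needs_paren(op, parent_op, parent_is_unary, is_first_arg):
        tokens = [("mo", "(")] + tokens + [("mo", ")")]
    return tokens

expr = ET.fromstring(
    "<apply><times/><cn>1</cn>"
    "<apply><plus/><cn>2</cn><cn>3</cn></apply>"
    "<apply><minus/><cn>4</cn></apply></apply>"
)
tokens = convert(expr)
print(" ".join(text for _, text in tokens))
```

The Python version threads the parent operator down through the recursion, whereas the MTran program expresses the same context check as a logical condition on sibling nodes.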
Program          ::= Expression
Expression       ::= VisitExpression
                   | GatherExpression
                   | VarExpression
                   | XmlLiteral
                   | Expression+
VisitExpression  ::= { visit x :: MSOFormula :: Expression
                               :: MSOFormula :: Expression
                               ...
                               :: MSOFormula :: Expression }
GatherExpression ::= { gather x :: MSOFormula :: Expression }
VarExpression    ::= { x }
XmlLiteral       ::= elem [ Expression ]
                   | @att [ Expression ]
                   | "string"
MSOFormula  ::= MSOFormula & MSOFormula     // and
              | MSOFormula | MSOFormula     // or
              | MSOFormula => MSOFormula    // if-then
              | MSOFormula <=> MSOFormula   // equivalent
              | ~ MSOFormula                // not
              | ex1 x. MSOFormula           // there exists an element x s.t. ...
              | ex2 X. MSOFormula           // there exists a set of elements X s.t. ...
              | all1 x. MSOFormula          // every element x satisfies ...
              | all2 X. MSOFormula          // every set of elements X satisfies ...
              | FstTerm in SndTerm
              | FstTerm = FstTerm
              | FstTerm < FstTerm           // position comparison in document order
              | true
              | false
              | PathFormula
PathFormula ::= x/y                         // y is a child of x
              | x//y                        // y is a descendant of x
              | x/Y                         // shorthand for ex1 y. (y in Y & x/y)
              | x/y/z                       // shorthand for x/y & y/z
              | x/y:Y/z                     // shorthand for x/y & y in Y & y/z
              | ...                         // etc.
FstTerm     ::= x
              | FstTerm.0                   // first child
              | FstTerm.1                   // next sibling
SndTerm     ::= X
              | <elem>                      // set of all elements tagged <elem>
              | @att                        // set of all attribute nodes with the name "att"
              | <*>                         // set of all element nodes
              | @*                          // set of all attribute nodes
              | #                           // set of all text nodes
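The FstTerm navigation steps (.0 for first child, .1 for next sibling) can be pictured with a small Python model. This is a simplified sketch: it only considers element children, whereas MTran also treats attribute and text nodes as nodes, and the names make_navigator, first_child, and next_sibling are hypothetical.

```python
import xml.etree.ElementTree as ET

def make_navigator(root):
    """Return (first_child, next_sibling) functions for the tree under root."""
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parent = {child: p for p in root.iter() for child in p}

    def first_child(x):                # models x.0
        return x[0] if x is not None and len(x) else None

    def next_sibling(x):               # models x.1
        if x is None or x not in parent:
            return None
        siblings = list(parent[x])
        i = siblings.index(x)
        return siblings[i + 1] if i + 1 < len(siblings) else None

    return first_child, next_sibling

tree = ET.fromstring("<apply><minus/><cn>4</cn></apply>")
first_child, next_sibling = make_navigator(tree)

op = first_child(tree)                 # tree.0   : the <minus/> operator
arg1 = next_sibling(op)                # tree.0.1 : the first argument
arg2 = next_sibling(arg1)              # tree.0.1.1 : no second argument here
print(op.tag, arg1.text, arg2)
```

Under this model, the single_arg macro from the MathML example corresponds to checking that the second next_sibling step yields nothing.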
Expression ::= ... | MacroDef
MSOFormula ::= ... | MacroUse
MacroDef   ::= { pred MacroName ( ParameterList ) = MSOFormula ; ... }
MacroUse   ::= MacroName ( ArgumentList )
Macros are expanded at compile time. Example:
{ pred url(var1 x) = <a>/@href/x | <link>/@href/x; }
html[body[
  {gather x :: url(x) :: {x} br[]}
]]
Expression ::= ... | { MSOFormula }
A formula with exactly one free variable, "_", can be used directly as an expression. This makes it easier to write transformations in the "gather and just print" pattern. For example:
result[ {_ in <a>} ]
means the same thing as
result[ {gather _ :: _ in <a> :: {_}} ]