Tuesday, May 3, 2011

How to limit string word count in XSLT 1.0?

How can I limit a string's word count in XSLT 1.0?

From Stack Overflow
  • How about something like:

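      <xsl:template match="data"> <!-- replace "data" with your own element -->
        <xsl:call-template name="firstWords">
          <!-- normalize-space() turns tabs, newlines and runs of spaces into single spaces -->
          <xsl:with-param name="value" select="normalize-space(.)"/>
          <xsl:with-param name="count" select="4"/>
        </xsl:call-template>
      </xsl:template>
    
      <!-- Outputs the first $count space-separated words of $value -->
      <xsl:template name="firstWords">
        <xsl:param name="value"/>
        <xsl:param name="count"/>
    
        <xsl:if test="number($count) >= 1 and string-length($value) > 0">
          <xsl:choose>
            <xsl:when test="contains($value, ' ')">
              <xsl:value-of select="substring-before($value, ' ')"/>
            </xsl:when>
            <!-- Last word: substring-before() would return an empty string here -->
            <xsl:otherwise>
              <xsl:value-of select="$value"/>
            </xsl:otherwise>
          </xsl:choose>
          <xsl:variable name="remaining" select="substring-after($value, ' ')"/>
          <xsl:if test="number($count) > 1 and string-length($remaining) > 0">
            <!-- Separator between words, so no trailing space is emitted -->
            <xsl:text> </xsl:text>
            <xsl:call-template name="firstWords">
              <xsl:with-param name="value" select="$remaining"/>
              <xsl:with-param name="count" select="number($count) - 1"/>
            </xsl:call-template>
          </xsl:if>
        </xsl:if>
      </xsl:template>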
    
  • This is an XSLT 1.0 solution:

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:ext="http://exslt.org/common"
    >
    
      <!-- FXSL 1.x stylesheet that defines the str-split-to-words template -->
      <xsl:import href="strSplit-to-Words.xsl"/>
    
      <xsl:output indent="yes" omit-xml-declaration="yes"/>
    
      <xsl:template match="/">
        <!-- Tokenize the text of the whole document into a list of word nodes,
             treating every character of pDelimiters as a separator -->
        <xsl:variable name="vwordNodes">
          <xsl:call-template name="str-split-to-words">
            <xsl:with-param name="pStr" select="/"/>
            <xsl:with-param name="pDelimiters"
                            select="', &#9;&#10;&#13;()-'"/>
          </xsl:call-template>
        </xsl:variable>
    
        <!-- ext:node-set() converts the result-tree fragment into a node-set
             so that the individual words can be navigated -->
        <xsl:call-template name="strTakeWords">
          <xsl:with-param name="pN" select="10"/>
          <xsl:with-param name="pText" select="/*"/>
          <xsl:with-param name="pWords"
               select="ext:node-set($vwordNodes)/*"/>
        </xsl:call-template>
      </xsl:template>
    
      <xsl:template match="word" priority="10">
        <xsl:value-of select="concat(position(), ' ', ., '&#10;')"/>
      </xsl:template>
    
      <!-- Accumulates in pResult the original text up to and including
           each successive word, until pN words have been taken -->
      <xsl:template name="strTakeWords">
        <xsl:param name="pN" select="10"/>
        <xsl:param name="pText"/>
        <xsl:param name="pWords"/>
        <xsl:param name="pResult"/>
    
        <xsl:choose>
          <xsl:when test="not($pN > 0)">
            <xsl:value-of select="$pResult"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:variable name="vWord" select="$pWords[1]"/>
            <!-- Delimiters between the previous word and this one -->
            <xsl:variable name="vprecDelims" select=
               "substring-before($pText, $vWord)"/>
    
            <xsl:variable name="vnewText" select=
                "concat($vprecDelims, $vWord)"/>
    
            <xsl:call-template name="strTakeWords">
              <xsl:with-param name="pN" select="$pN - 1"/>
              <xsl:with-param name="pText" select=
                    "substring-after($pText, $vnewText)"/>
              <xsl:with-param name="pWords" select=
                   "$pWords[position() > 1]"/>
              <xsl:with-param name="pResult" select=
               "concat($pResult, $vnewText)"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    
    </xsl:stylesheet>
    

    When this transformation is applied to the following XML document:

    <t>
    (CNN) -- Behind closed doors in recent days,
    senior White House aides have been saying that
    measuring President Obama's first 100 days
    is the journalistic equivalent of a Hallmark holiday.
    </t>
    

    the wanted result is returned:

    (CNN) -- Behind closed doors in recent days, senior White House

    Do note:

    1. The str-split-to-words template from FXSL is used for tokenization.

    2. This template accepts a parameter, pDelimiters, whose value is a string made up of every character that should be treated as a delimiter. Thus, unlike the other solutions, it can use any set of delimiters (not just a space); in this case there are 8 of them: comma, space, tab, line feed, carriage return, "(", ")" and "-".

    3. The named template strTakeWords calls itself recursively. On each call it appends to the result the text that precedes the next word in the word list produced by the tokenizer, plus that word itself, and it stops once the specified number of words has been taken.

    Dimitre Novatchev: @Will My solution doesn't require any extension function except the EXSLT node-set(), which is built into most XSLT 1.0 processors (such as the .NET XslCompiledTransform class). Every FXSL 1.x template is written in pure XSLT, and the only extension function it uses is the EXSLT node-set() function mentioned above. Therefore, there is absolutely no obstacle to using this solution inside an internal network: just use any XSLT 1.0 processor that implements the EXSLT node-set() function (such as .NET XslCompiledTransform, Saxon 6, Xalan, JD, etc.).
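
Where neither FXSL nor the EXSLT node-set() function can be used, a plain XSLT 1.0 sketch along the following lines should give the same result. It is not part of either answer above, and its names (vDelims, vSpaces, endOfNthWord) are hypothetical. It assumes, like the FXSL example, that the text to shorten is the string value of the document element, that the same eight delimiters apply, and that the limit is 10 words. Because translate() replaces characters one for one, every position in the translated copy matches the same position in the original text, so the recursive template only has to find where the 10th word ends; the original text, delimiters included, is then copied up to that point.

    <xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
      <xsl:output method="text"/>
    
      <!-- Assumption: the same 8 delimiters as in the FXSL example -->
      <xsl:variable name="vDelims" select="', &#9;&#10;&#13;()-'"/>
      <!-- Exactly one space per delimiter, so translate() keeps the length -->
      <xsl:variable name="vSpaces" select="'        '"/>
    
      <xsl:template match="/">
        <xsl:variable name="vText" select="string(/*)"/>
        <!-- Every delimiter becomes a space; positions are unchanged -->
        <xsl:variable name="vNorm"
             select="translate($vText, $vDelims, $vSpaces)"/>
        <xsl:variable name="vEnd">
          <xsl:call-template name="endOfNthWord">
            <xsl:with-param name="pRest" select="$vNorm"/>
            <xsl:with-param name="pN" select="10"/>
            <xsl:with-param name="pPos" select="0"/>
          </xsl:call-template>
        </xsl:variable>
        <!-- Copy the original text up to the end of the last word taken -->
        <xsl:value-of select="substring($vText, 1, $vEnd)"/>
      </xsl:template>
    
      <!-- Hypothetical helper: returns the number of characters from the
           start of the text to the end of the pN-th word (or of the last
           word, if there are fewer than pN words) -->
      <xsl:template name="endOfNthWord">
        <xsl:param name="pRest"/>
        <xsl:param name="pN"/>
        <xsl:param name="pPos"/>
    
        <xsl:choose>
          <!-- Enough words taken, or no words left -->
          <xsl:when test="not($pN > 0) or normalize-space($pRest) = ''">
            <xsl:value-of select="$pPos"/>
          </xsl:when>
          <xsl:otherwise>
            <!-- Number of spaces before the next word: everything before
                 its first character -->
            <xsl:variable name="vLead" select="string-length(substring-before(
                 $pRest, substring(normalize-space($pRest), 1, 1)))"/>
            <xsl:variable name="vFromWord"
                 select="substring($pRest, $vLead + 1)"/>
            <!-- Length of the next word -->
            <xsl:variable name="vWordLen">
              <xsl:choose>
                <xsl:when test="contains($vFromWord, ' ')">
                  <xsl:value-of
                       select="string-length(substring-before($vFromWord, ' '))"/>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:value-of select="string-length($vFromWord)"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:variable>
            <xsl:call-template name="endOfNthWord">
              <xsl:with-param name="pRest"
                   select="substring($vFromWord, $vWordLen + 1)"/>
              <xsl:with-param name="pN" select="$pN - 1"/>
              <xsl:with-param name="pPos" select="$pPos + $vLead + $vWordLen"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
    
    </xsl:stylesheet>

For the sample <t> document above this should stop right after "House", as the FXSL version does. Only the characters listed in vDelims separate words, so the whitespace characters should stay in that list.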
