Blog Splash

An Ultra Fast CSS Minify Algorithm

by Kerido Saturday, January 30, 2010 6:10 AM

Introduction

You've probably thought a lot about ways to optimize your Web site. Since the release of this new site, I've been thinking about it a lot too. In this post I would like to introduce an advanced version of the CSS minify algorithm. There are several existing implementations available, but I took the one integrated into BlogEngine.NET as a reference. Compared to it, my algorithm offers several benefits, each covered below: smaller output, faster processing, and string contents that are left intact.

Concept

Unlike the built-in CSS minify algorithm, which uses regular expressions, my algorithm works more like a state machine. That may make it look harder to read and debug, which is why I am writing this post.

First, here are valid CSS states that the algorithm uses:

enum CssState
  { Punctuation, Token, StringD, StringS }

The CSS code is always considered Punctuation if no other state applies. Curly braces, square brackets, parentheses: all of this is Punctuation. The Token state includes all tag names, element IDs and classes, properties and values, that is, anything that consists of alphanumeric characters plus a limited number of auxiliary characters: .#_-@*%(). Finally, there are two string states: StringD represents a double-quoted string and StringS a single-quoted one. The Handling Strings section explains why the latter two are required.
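The algorithm relies on two character tests, IsTokenChar and IsWhitespaceChar, whose code is not listed in this post. Here is a minimal sketch of what they might look like, based only on the description above. The class name and the exact whitespace set are assumptions, and since parentheses are mentioned both as Punctuation and among the auxiliary token characters, the sketch follows the auxiliary list:

```csharp
using System;

static class CssCharClasses
{
    // Auxiliary characters the post lists as part of the Token state.
    const string AuxiliaryTokenChars = ".#_-@*%()";

    // Hypothetical helper: true for characters that can make up a Token.
    public static bool IsTokenChar(char c)
    {
        return char.IsLetterOrDigit(c) || AuxiliaryTokenChars.IndexOf(c) >= 0;
    }

    // Hypothetical helper: true for whitespace the minifier may skip.
    // The exact set (space, tab, CR, LF, form feed) is an assumption.
    public static bool IsWhitespaceChar(char c)
    {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f';
    }
}
```

With these definitions a colon, a semicolon or a curly brace falls through to Punctuation, so whitespace around them can be dropped, while whitespace between two Token characters is kept as a single space.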

The algorithm reads the input character by character and determines if a sequence of whitespace characters can be removed from it. It also handles comments pretty much the same way. Here is the code illustrating the idea:

public static string Minify(string theCss)
{
  // Assume that the length of the output string
  // will be at most 75% the length of the incoming one to avoid
  // additional StringBuilder reallocations.
  StringBuilder aRet = new StringBuilder(theCss.Length * 3 / 4);

  int aNumChars = theCss.Length;

  CssState aPrevState = CssState.Punctuation;
  int aPrevPos = 0;

  int i = 0;

  while(i < aNumChars)
  {
    CssState aCurState = GetState(theCss, ref i, aPrevState);

    if(i > aPrevPos + 1)
    {
      // If whitespace is found between two tokens, keep it compact
      if (aPrevState != CssState.Punctuation && aCurState != CssState.Punctuation)
        aRet.Append(' ');

      // Otherwise, no whitespace is needed, skip everything between aPrevPos and i
    }

    aPrevPos = i;
    aPrevState = aCurState;
    aRet.Append(theCss[i++]);
  }

  return aRet.ToString();
}

As you can see, the one case in which whitespace must not be removed is when it separates token or string characters. Examples of this case are:

border-bottom: solid 2px #9f4c1f;
background-position: center top;
quotes: '« ' ' »';
quotes: "»" "«" "\2039" "\203A";

In all other cases, the whitespace can be trimmed. Again, there is a special case when whitespace is found inside a string, but we'll cover it in the next section.

The key to the algorithm is the GetState method. It skips any comments and whitespace characters found in the incoming string, advances the position variable i to the next meaningful character, and returns the state that character belongs to:

static CssState GetState(string theCss, ref int thePos, CssState theCurState)
{
  int aLen = theCss.Length;
  int i = thePos;

  if (theCurState == CssState.StringD)
  {
    //REMOVED FOR COMPACTNESS
  }
  else if (theCurState == CssState.StringS)
  {
    //REMOVED FOR COMPACTNESS
  }


  bool aSkip = true;

  while(aSkip)
  {
    /////////////////////////////////////////
    // Skip whitespace
    while(aSkip = (i < aLen - 1 && IsWhitespaceChar(theCss[i]) ) )
      i++;

    /////////////////////////////////////////
    // Skip comments
    if(i < aLen - 1)
      if (theCss[i] == '/' && theCss[i+1] == '*') // comment opening
      {
        aSkip = true;

        while(i < aLen - 1)
        {
          i++;

          if(theCss[i-1] == '*' && theCss[i] == '/') // comment closing
          {
            i++;
            break;
          }
        }
      }
  }

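  thePos = i;
  // Caveat: if the input ends with a closed comment, i equals aLen here
  // and the indexer below throws an IndexOutOfRangeException.
  if ( IsTokenChar( theCss[i] ) )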
    return CssState.Token;

  else if ( theCss[i] == '\"' )
    return CssState.StringD;

  else if(theCss[i] == '\'')
    return CssState.StringS;

  else
    return CssState.Punctuation;
}

For the sake of saving space I have removed several lines, so let's move on to see what they are meant for.

Handling Strings

According to the CSS specification, two types of strings are supported: single-quoted and double-quoted. My algorithm leaves the string intact, even if whitespace characters are found inside it. Of course, for content rendered as HTML it may not matter at all, since the browser collapses runs of whitespace anyway. But I just think that keeping the string unmodified is more intuitive and professional. In order to achieve that, support needs to be added for escaped quote characters. An escaped single quote character inside a single-quoted string does not close the string. Similarly, an escaped double quote character does not close a double-quoted string:

if (theCurState == CssState.StringD)
{
  if(theCss[i] == '\"')
  {
    // Make sure the double quote character is not escaped
    if(thePos > 0)
      if(theCss[i-1] == '\\')
        return CssState.StringD;

    // Enforce a whitespace afterwards
    return CssState.Token;
  }
  else
    return CssState.StringD;
}
else if (theCurState == CssState.StringS)
{
  if(theCss[i] == '\'')
  {
    // Make sure the single quote character is not escaped
    if(thePos > 0)
      if(theCss[i-1] == '\\')
        return CssState.StringS;

    // Enforce a whitespace afterwards
    return CssState.Token;
  }
  else
    return CssState.StringS;
}

There is a special example attribute in the reference CSS file that illustrates the differences in the way the two algorithms work:

angledouble: "Angle=             00deg00'00\"      ";
anglesingle: 'Angle=             00deg00\'00"      ';

Although these attributes make no sense to a browser, they represent valid CSS syntax. My version produces the same string while the BlogEngine.NET version trims several spaces from it. I consider this a flaw even though string length gets reduced.
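One thing to keep in mind: the check above looks only at the single character before the quote, so a string that ends with an escaped backslash, such as "a\\", would be treated as still open. A possible refinement is to count the backslashes that precede the quote. This is only a sketch; CssEscapes and IsEscaped are hypothetical names that are not part of the code in this post:

```csharp
static class CssEscapes
{
    // Hypothetical helper: the character at thePos is escaped only when it is
    // preceded by an odd number of consecutive backslashes.
    public static bool IsEscaped(string theCss, int thePos)
    {
        int aBackslashes = 0;

        // Walk backwards over the run of backslashes right before thePos.
        for (int i = thePos - 1; i >= 0 && theCss[i] == '\\'; i--)
            aBackslashes++;

        return (aBackslashes % 2) == 1;
    }
}
```

Under that assumption, the two escape checks in GetState could call CssEscapes.IsEscaped(theCss, i) instead of comparing theCss[i-1] with a backslash.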

Performance Testing

The suggested algorithm provides both speed and size wins. The code below lists two methods I used to measure output length and processing time of the two algorithms:

[TestMethod]
public void CssMinifyTest_Length()
{
  string aCss = Properties.Resources.style;

  string aMin1 = CssMinifier.Minify(aCss);
  string aMin2 = CssHandler.StripWhitespace(aCss);

  Console.WriteLine(
    string.Format("KO Software output ({0} bytes):", aMin1.Length) );
  Console.WriteLine(aMin1);

  Console.WriteLine("---------------------------------");

  Console.WriteLine(
    string.Format("BlogEngine.NET output ({0} bytes):", aMin2.Length) );
  Console.WriteLine(aMin2);

  Assert.IsTrue(aMin1.Length <= aMin2.Length);
}

[TestMethod]
public void CssMinifyTest_Speed()
{
  int aNumCycles = 50000;
  string aCss = Properties.Resources.style;

  DateTime aStart = DateTime.Now;
  for(int i = 0; i < aNumCycles; i++)
  {
    string aMin = CssMinifier.Minify(aCss);
  }
  DateTime aEnd = DateTime.Now;

  TimeSpan aTime_AspxGear = aEnd - aStart;
  Console.WriteLine(aTime_AspxGear.ToString());


  aStart = DateTime.Now;
  for(int i = 0; i < aNumCycles; i++)
  {
    string aMin = CssHandler.StripWhitespace(aCss);
  }
  aEnd = DateTime.Now;


  TimeSpan aTime_BE = aEnd - aStart;
  Console.WriteLine(aTime_BE.ToString());


  Assert.IsTrue(aTime_AspxGear <= aTime_BE);
}

The results are illustrated in the table below:

                                                  KO Software CSS Minifier   BlogEngine.NET CSS Minifier
Output size, bytes                                6,888                      7,274
50,000 iteration processing time (Debug build)    33 seconds                 1 minute and 25 seconds
50,000 iteration processing time (Release build)  15 seconds                 1 minute and 20 seconds
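The timings above were taken with DateTime.Now, which is not designed for high-resolution measurements. For anyone who wants to repeat the test, a Stopwatch is probably a better fit. Below is a minimal sketch of a timing helper; TimingSketch and Measure are hypothetical names and are not part of the downloadable source:

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Hypothetical helper: runs theAction theNumCycles times and
    // returns the elapsed wall-clock time.
    public static TimeSpan Measure(Action theAction, int theNumCycles)
    {
        // One warm-up call so that JIT compilation is not counted.
        theAction();

        Stopwatch aWatch = Stopwatch.StartNew();

        for (int i = 0; i < theNumCycles; i++)
            theAction();

        aWatch.Stop();
        return aWatch.Elapsed;
    }
}
```

A call could then look like TimingSketch.Measure(() => CssMinifier.Minify(aCss), 50000), and the same for CssHandler.StripWhitespace.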

Conclusion

The suggested algorithm produces output which is smaller by almost 400 bytes (386, to be exact). In the Release build it also runs more than 5 times faster, and in the Debug build more than 2.5 times faster. Interestingly, my Release build is more than twice as fast as the Debug one, while the BlogEngine.NET version gains only a slight speedup. I suspect this is because the latter relies on regular expressions, and the regular expression engine is already built with optimizations regardless of our build type.

I am going to contact the BlogEngine.NET team and e-mail my code as a patch to their source code repository. I think that, if this code is properly tested in various environments, it can improve web site stability and reduce traffic. As an old school programmer, I am still interested in ways of improving the code. So please download the source code and leave comments with bugs and improvements.

Restoring a SQL Server Database that Is Missing the Log File

by Suojatar Tuesday, January 26, 2010 9:31 AM

To restore a database without the LDF file:

  1. Create a dummy database with the same name.
  2. Stop SQL Server and replace the dummy MDF with the one in question, leaving the dummy LDF file intact.
  3. Restart SQL Server. The database will appear in Enterprise Manager with a gray icon, marked as "Suspect".
  4. Switch on the "Emergency mode" for this database by running the following script:
    EXEC sp_configure 'allow updates', 1
    RECONFIGURE WITH OVERRIDE
    GO
    
    BEGIN TRAN
    
    UPDATE master..sysdatabases
    SET status = status | 32768
    WHERE name = 'your_name_here'
    
    IF @@ROWCOUNT = 1
    BEGIN
       COMMIT TRAN
       RAISERROR('emergency mode set', 0, 1)
    END
    ELSE
    BEGIN
       ROLLBACK
       RAISERROR('unable to set emergency mode', 16, 1)
    END
    GO
    
    EXEC sp_configure 'allow updates', 0
    RECONFIGURE WITH OVERRIDE
    GO
  5. Stop SQL Server.
  6. Rename or remove the Log File.
  7. Start SQL Server.
  8. Create the new Log:
    DBCC REBUILD_LOG
    (
     'your_name_here',
     'C:\Program Files\Microsoft SQL Server\MSSQL\Data\your_name_here_Log.LDF'
    )
    
  9. Set the Multi-User mode (otherwise the database will appear as "DBO Use Only"):
    ALTER DATABASE your_name_here SET MULTI_USER
  10. Remove the emergency mode:
    EXEC sp_configure 'allow updates', 1
    RECONFIGURE WITH OVERRIDE
    GO
    
    BEGIN TRAN
    
    UPDATE master..sysdatabases
    SET status = status & ~32768
    WHERE name = 'your_name_here'
    
    IF @@ROWCOUNT = 1
    BEGIN
      COMMIT TRAN
      RAISERROR('emergency mode removed', 0, 1)
    END
    ELSE
    BEGIN
      ROLLBACK
      RAISERROR('unable to remove emergency mode', 16, 1)
    END
    GO
    
    EXEC sp_configure 'allow updates', 0
    RECONFIGURE WITH OVERRIDE
    GO

* If for some reason it is impossible to rebuild the log file, the Import data functionality in Enterprise Manager will still be available after restarting SQL Server (step 7). In that case, create another empty database and import the data and objects into it. Use the "Copy database objects" (third option) to copy tables, stored procedures and data. It may be necessary NOT to copy database roles, object-level permissions and SQL Server logins, as this may cause the transfer to fail. Copying any stored procedures that refer to non-existent database objects will also fail.

N.B. A graphical representation of a database running in Emergency mode is not available in Enterprise Manager. However, the structure of its tables can be seen (and scripted!) in Query Analyzer.

Longing for KO Approach 0.4.4

by Kerido Wednesday, January 20, 2010 3:25 AM

We're working hard on a new release of our flagship product, KO Approach. This release is going to catch up with all the new things that have happened in the Windows desktop computing world so far. The product is finally compatible with Windows 7, which will enable the users of this new OS to open files and folders even quicker!

Approach Items is being redesigned to display user-selectable items. From now on, this feature will be fully extensible, allowing developers to provide custom content to Approach Items. As an example, we've included a custom item displaying a list of recently used commands from the Run dialog box. With these new concepts, the user will be able to better organize the most frequently accessed items, files and folders.

We've added new options into Titlebar Menus to improve user experience. Browse the entire folder hierarchy, from Desktop to the current folder, with CTRL+click on an Explorer window's title bar. Instruct Titlebar Menus to order items as you like: parent-to-child or child-to-parent. The current folder can also be automatically highlighted.

As usual, Scope and InstantWave, the two companions to KO Approach, allow for previewing graphics files as well as playing sound files, right from KO Approach menus! And there are more plugins coming soon!

Approach will consume even fewer resources thanks to advanced optimization techniques. But most importantly, upgrading to KO Approach 0.4.4 will be completely free!

git: Archiving Files Changed Between Two Revisions

by Kerido Monday, January 18, 2010 8:48 AM

Just recently I started using git and I'm pretty excited about it. Today I needed to obtain a ZIP archive containing only the files that were changed or added between a known revision and the current head (I believe that's roughly the same as trunk in SVN). The solution I came up with is not fully automated, but it's still a HUGE time saver:

  1. Obtain a list of edited files:
    git diff --name-only HEAD __TAG_OR_REVISION__ > out.txt
    After running this command the file out.txt will contain the list of modified file names (including deleted ones), one per line.
  2. Open the out.txt file in a text editor and merge the contents into one line (i.e. replace CR/LF with a space). I also had to manually remove a few files which I didn't need for the archive.
  3. Copy this huge line to clipboard.
  4. Finally, generate the archive:
    git archive --format=zip HEAD __PASTE_HERE__ > out.zip

I'm not a command line expert and I'd love to know a better way. I'm sure it's possible on Linux, but I primarily use Windows, so it might take a cmd.exe geek to sort things out.
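Step 2, at least, looks easy to automate. Here is a minimal sketch in C# that turns the contents of out.txt into a single line of quoted paths, ready to paste into the git archive command. GitArchiveArgs and JoinFileList are hypothetical names, and the sketch assumes that no path contains a double quote. Deleted files still have to be removed from the list by hand:

```csharp
using System;
using System.Collections.Generic;

static class GitArchiveArgs
{
    // Hypothetical helper: converts a one-path-per-line listing into
    // a single line of space-separated, double-quoted paths.
    public static string JoinFileList(string theListing)
    {
        string[] aLines = theListing.Split(
            new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        List<string> aQuoted = new List<string>();

        foreach (string aLine in aLines)
        {
            string aPath = aLine.Trim();

            // Skip lines that contain only whitespace.
            if (aPath.Length > 0)
                aQuoted.Add("\"" + aPath + "\"");
        }

        return string.Join(" ", aQuoted.ToArray());
    }
}
```

Feeding it the text of out.txt, for example via System.IO.File.ReadAllText("out.txt"), yields the line for step 4. Quoting every path keeps file names with spaces in one piece.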

A Phrase of Inspiration

by Kerido Monday, January 18, 2010 7:06 AM

This is a phrase that came to me after watching Seth Godin's inspiring video Ideas That Spread, Win:

Many ideas were born from sharing the idea that ideas can be shared.