Performance Optimization: Part 1 (Images)
Part 1 of the performance optimization initiative was to reduce the overall size of the transfer.
A big chunk of what we send over HTTP was (and still is) images. A big offender was our slider banners. Those are large images (1020x287) and we have around 10 of them on our home page. Optimized versions weigh approximately 30 KB, while the non-optimized ones were about 300 KB. Huge difference!
So what have we done?
First of all, we exported all images from Sitecore to disk.
Then, we processed them through two different applications. For GIFs and PNGs we opted for a single format (PNG) and ran them through an amazing application called PNGOUT (http://advsys.net/ken/utils.htm).
For JPGs we used a similar lossless tool called purejpeg.
This was not enough. We had some banners that were saved as PNGs and were still too big. We then decided to go with JPG for all banners and larger images, so part of the project was converting certain images from PNG to JPG.
Another piece of it was identifying images that were unnecessarily huge. A couple of head shots uploaded to our system were at full resolution, more than 1000px wide! All of them are shown in a control that is only 150px wide. So we resized and optimized those for web distribution.
To export the images we used code that looks like this:
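// Select every item under the Images folder of the media library.
var items =
    Database.GetDatabase("master").SelectItems(
        "fast://sitecore/media library/Images//*");
Parallel.ForEach(items, item => {
    try {
        var media = (MediaItem) item;
        // Name each file after the item ID so it can be matched again on re-import.
        var filename = string.Format(@"C:\SITECORE_ALL_IMAGES\{0}.{1}",
            item.ID.ToShortID(),
            media.Extension);
        using (var stream = media.GetMediaStream()) {
            if (stream == null) return; // nothing to export for this item
            // File.Create truncates any file left over from a previous run.
            using (Stream file = File.Create(filename)) {
                CopyStream(stream, file);
            }
        }
    }
    catch (Exception) {
        // Best effort: skip any item that fails to export.
    }
});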
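CopyStream is not a framework method and is not shown above. A minimal sketch of what such a helper might look like, assuming a plain buffered copy (the class name StreamHelpers and the buffer size are made up for illustration):

```csharp
using System.IO;

public static class StreamHelpers
{
    // Hypothetical stand-in for the CopyStream helper called in the export code.
    // Reads the input in chunks and writes each chunk to the output.
    public static void CopyStream(Stream input, Stream output)
    {
        var buffer = new byte[81920];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
        }
    }
}
```

On .NET 4 and later, input.CopyTo(output) does the same job without a custom helper.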
Then, to re-import the images after processing, we used code that looks like this:
var master = Database.GetDatabase("master");
var pngs = Directory.GetFiles(@"C:\SITECORE_ALL_IMAGES\PNG");
foreach (var png in pngs) {
    try {
        // The file name (without extension) is the ID of the media item it came from.
        var mediaItem = master.GetItem(new ID(Path.GetFileNameWithoutExtension(png)));
        var m = MediaManager.GetMedia(mediaItem);
        // Replace the stored image with the optimized file.
        using (var stream = new FileStream(png, FileMode.Open)) {
            m.SetStream(stream, "png");
        }
    } catch (Exception e) {
        context.Response.Write(string.Format("error in {0} : {1}", png, e.Message));
    }
}
Performance Optimization
I’ll add a number of posts detailing changes that will directly impact the time it takes for members to load ASTD website pages. Stay tuned!
New Control on Community Pages!
We added a new control to our Community of Practice pages! This control is called “Stay Connected” and can feature a variety of links to Social Sites and RSS feeds.
The social items can be customized per Community of Practice (CoP), which means you can link to a specific LinkedIn group or Facebook page.
Community Preferences
We believe we have addressed some issues that were seen in our “Engage with the Community” control.
Also, we now notify Community Managers by email when users change their profiles. This gives them a way to engage quickly with members who have shown interest in their community.
Event Tracking: Google Analytics to the Rescue
To see if something is efficient we need to measure its performance.
There is really no way around that.
On our new Home Page and Communities of Practice pages we have a big piece of real estate dedicated to an article slider. Is it effective? We don’t really know.
With this release, we should be able to see slider click events in Google Analytics.
The event structure is:
- Slider
- Page Name (home page or the community name)
- The destination URL (the Title or, if not present, the Href)
We expect this change to give more power to our content editors and marketing teams.
Canonicals and Google Analytics
On our site we have a notion of a ‘virtual item’ page. I will elaborate on this later, but it means the same article can be served from a “Blog” page and also within a Community page, keeping its frame and navigation.
In scenarios like this Google suggests the use of canonical URLs:
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
And that’s what we’ve been doing since we went live.
Now, we noticed some URLs were showing up in GA as separate entities and we don’t want that :) Of course, other people had come across the same issue, and luckily there is a StackOverflow question that helped us out:
http://stackoverflow.com/questions/9103794/canonical-url-in-analytics
Now we shouldn’t see different URLs for the same content in analytics!
Anchor tags everywhere!
Our marketing department came up with a great idea: adding anchor tags to pages that are structured and therefore a bit longer than landing pages.
Take this page, for example: http://www.astd.org/Education/Certificate-Programs/Action-Learning-Certificate.aspx
We will now be able to send visitors directly to the Facilitators or Audience sections just by using an anchor:
<a name="facilitators"></a>
Did you know?
The name attribute on anchors is obsolete in HTML5. The spec suggests using the id attribute instead.
More here: http://dev.w3.org/html5/markup/a.html
Where is my format?
It’s tricky. Our AMS solution has an HTML editor that lets users enter the brief and long descriptions of products. We did a project a couple of months back that used the HtmlAgilityPack to clean the HTML. We had it all: Word XML tags, font tags specifying size and type, and the regular bolds (strong and b) and italics (i and em). To ensure the site had a consistent look and not a mix of different fonts and sizes, we opted to remove all HTML tags. Of course, sometimes editors purposely wanted italics (for book titles in descriptions, for example) and those were being removed too.
The way our process works is that we have a stored procedure that feeds a Solr index (more on Solr later!). This stored procedure made use of a .NET CLR scalar function that used a simple Regex to clean the HTML. That was the “StripHtml” function. Now, to support the new requirements, I added a new function called “CleanHtml” which leaves only <strong>, <b>, <i> and <em> tags. I’d like to use the AgilityPack for this as it’s a proper HTML parser, but SQL Server doesn’t like it too much. So I just tweaked some Regex I found online.
Here is what the new SQL function looks like:
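[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString CleanHtml(string input) {
    try {
        // Strip every tag except <strong>, <b>, <em> and <i> (opening or closing).
        // The \b stops "b" and "i" from also matching <br>, <body>, <img> and the like.
        var result = Regex.Replace(input, @"<(?!/?(?:strong|b|em|i)\b)[^>]*>", string.Empty, RegexOptions.IgnoreCase);
        return new SqlString(result);
    } catch (Exception) {
        // Fall back to the unmodified input if the replace fails.
        return new SqlString(input);
    }
}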
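A quick way to sanity-check a whitelist pattern like this is to run it against a few samples outside SQL Server. The sketch below is a standalone helper (the class name CleanHtmlDemo is made up for illustration). It assumes a pattern with a \b word boundary after the tag names, so that <br> or <img> are not mistaken for <b> or <i>:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CleanHtmlDemo
{
    // Matches any tag that is NOT an opening or closing strong, b, em or i tag.
    // The verbatim string (@"...") matters: in a regular string "\b" is a backspace.
    private static readonly Regex DisallowedTag = new Regex(
        @"<(?!/?(?:strong|b|em|i)\b)[^>]*>", RegexOptions.IgnoreCase);

    // Removes every disallowed tag and leaves the text and whitelisted tags alone.
    public static string Clean(string html)
    {
        return DisallowedTag.Replace(html, string.Empty);
    }
}
```

For example, CleanHtmlDemo.Clean("<p>Read <i>Moby Dick</i><br/> <B>today</B></p>") keeps the <i> and <B> pairs and drops <p> and <br/>. Note that a whitelisted tag keeps its attributes, so something like <b style="..."> would survive as is.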
We are approaching version 1.1.5 of our website! I’ll share some details of what’s in this release.
Gravatar and DISQUS
I really like Gravatar’s idea. A centralized and easy-to-use place to store your avatars and simple profile information.
Here we are using DISQUS premium, which gives us author-specific email notifications and a seamless integration with our login system through their SSO.
Currently, we do not ask members and staff for images. So if they want their picture shown in comments they can easily set up a Gravatar account.
Gravatar works by using an MD5 hash of an email address, so the integration was extremely simple.
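To give an idea of how little is involved, here is a minimal sketch of building a Gravatar image URL. It is not our production code: the class and method names are made up for illustration. The recipe itself is Gravatar's: trim and lowercase the address, MD5 it, and append the hex digest to the avatar URL (s is the requested size in pixels).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class GravatarSketch
{
    // Hypothetical helper: builds the avatar URL for an email address.
    public static string GetAvatarUrl(string email, int size)
    {
        // Gravatar hashes the trimmed, lowercased address.
        var normalized = email.Trim().ToLowerInvariant();
        using (var md5 = MD5.Create())
        {
            var hash = md5.ComputeHash(Encoding.UTF8.GetBytes(normalized));
            // Render the 16 hash bytes as 32 lowercase hex characters.
            var hex = new StringBuilder(hash.Length * 2);
            foreach (var b in hash)
            {
                hex.Append(b.ToString("x2"));
            }
            return string.Format("https://www.gravatar.com/avatar/{0}?s={1}", hex, size);
        }
    }
}
```

Because the address is normalized first, " Foo@Example.com " and "foo@example.com" resolve to the same avatar.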
Want to learn more about Gravatar? Check it out here.
