Saturday, October 11, 2014

String Concatenation

One of the things beginner .NET developers should have learned is that it’s not a good idea to add strings (e.g. s = s + “text”)  in .NET languages to concatenate them. The reason for this is that .NET string are immutable, meaning they cannot be changed once created. Because of that, each time you add strings it’s creates a new string object which replaces the existing string, thus leaving an object that needs to be disposed of. For the occasional concatenation this isn’t a big problem, but if you are doing this in a loop you could end up creating a lot of unnecessary objects. The solution to this problem is to use the System.Text.StringBuilder object which let’s you concatenate strings without creating extra objects.
In this article I want to take a closer look at string concatenation to understand what is actually happening behind the scenes. Let’s start with this block of code

string s1 = "one";
string s2 = "two";
string s3 = s1 + s2;

If we compile this code and then de-compile it  (in this case I am using Telerik’s JustDecompile) here is the resulting code:

string s1 = "one";
string s2 = "two";
string s3 = string.Concat(s1, s2);

You will notice that the addition of strings was replaced by the compiler with a call to the string.Concat function. Let’s take a look at the code for Concat. I have stripped out some of the code from the function just to show the important part.
int length = str0.Length;
string str = string.FastAllocateString(length + str1.Length);
string.FillStringChecked(str, 0, str0);
string.FillStringChecked(str, length, str1);
return str;

This code uses FastAllocateString to create a new string object that is big enough to hold both of the strings being concatenated. It then uses FillStringChecked to place the two strings in the correct spots in the new string. Finally it returns the new string. Here is where you can see the new object being created which replaces the old object that was in s3 in our original code.

There are versions of the String.Concat function that take up to four strings. So if you were to do s5 = s1+s2+s3+s4, this would call String.Concat(s1,s2,s3,s4). What happens after four strings? For example this code:

string s1 = "one";
string s2 = "two";
string s3 = "three";
string s4 = "four";
string s5 = "five";
string s6 = s1 + s2 +s3 +s4 +s5;

compiles to:
string s1 = "one";
string s2 = "two";
string s3 = "three";
string s4 = "four";
string s5 = "five";
string[] strArrays = new string[5];
strArrays[0] = s1;
strArrays[1] = s2;
strArrays[2] = s3;
strArrays[3] = s4;
strArrays[4] = s5;
string s6 = string.Concat(strArrays);

This version of Concat will scan through the array adding up the total length of all the strings, create a new string this length and then load each string into the correct location. This does create one extra object, the array, but it doesn’t create a new object for each concatenation.

One other thing to note about string concatenations. If for some reason you did something like this, maybe for purposes of code clarity:

string s1 = "one" + "two" + "three" + "four";
Console.WriteLine(s1);

the resulting code will look like this:

string s1 = "onetwothreefour";
Console.WriteLine(s1);

The compiler is smart enough to know that you are adding together 4 string literals so it automatically creates a single literal for you.
In conclusion, concatenation isn’t always bad in .NET programs. As long as all the concatenation happens in one line you don’t end up with a lot of extra objects. But if you are concatenating to a single string over multiple lines or within a loop, it’s a good idea to use a StringBuilder instead.

Sunday, June 29, 2014

Excel Graphs with EPPlus

In my last post I showed how to write data to an Excel file using the EPPlus library. I this post we will look at a more advanced topic, how to create a graph on an Excel worksheet.
We will start by using the same code as in my last post to create a new workbook, worksheet, and then add some sample data to it.

using (var package = new ExcelPackage())
{
var workbook = package.Workbook;
var worksheet = workbook.Worksheets.Add("Sheet1");

worksheet.Cells["C1"].Value = "Widgets";
worksheet.Cells["C2"].Value = 20;
worksheet.Cells["C3"].Value = 5;
worksheet.Cells["C4"].Value = 30;
worksheet.Cells["C5"].Value = 32;
worksheet.Cells["C6"].Value = 17;
           
worksheet.Cells["B2"].Value = "Jan";
worksheet.Cells["B3"].Value = "Feb";
worksheet.Cells["B4"].Value = "Mar";
worksheet.Cells["B5"].Value = "Apr";
worksheet.Cells["B6"].Value = "May";

In this sample data we have labels in column B and values in column C. The first step in creating the chart is to create a new chart object in our worksheet, we do this with the AddChart function:

var chart = worksheet.Drawings.AddChart("chart", eChartType.ColumnStacked);

AddChart takes two parameters, the first is a name for the chart and the second is the type of chart. EPPlus supports a large number of different chart types, in this case we will use a stacked column (bar) chart. Next we need to add a data series to our chart:

var series = chart.Series.Add(“C2:C6", "B2:B6");           

The Series.Add function takes two parameters. The first specifies a range of cells that will be used for the values in the series, so we set this to C2:C6 which contains our numeric data. We skip C1 since it contains the column label, we will come back to this in a minute. The second parameter specifies a range of cells that contain the labels for our X-Axis, so we point this to B2-B6 which contains our month names.

If we were to run this now the series would end up with a default name in the legend of the chart. We can fix this using this line:

series.HeaderAddress = new ExcelAddress("'Sheet1'!C1");

With the HeaderAddress property of the series we can set which cell we want the name of the series to come from. Note that for this to work you must specify the full address of the cell including the sheet name. If you leave out the sheet name the name of the series will remain blank.

Since this is a stacked column chart we can add more then one series. If we had additional numeric data in column D we could create a second series like this:


var series2 = chart.Series.Add("D2:D6", "B2:B6");               
series2.HeaderAddress = new ExcelAddress("'Sheet1'!D1");

One thing that is missing from EPPlus is the ability to change the style (color, etc.) of a series, you have to stay with the default styles.

The final step is to save the workbook:

package.SaveAs(new System.IO.FileInfo(@"c:\temp\demoOut.xlsx"));

If you run this you will get the following results:

image

Saturday, May 24, 2014

Writing Excel Files using EPPLus

In my last post I introduced the EPPLus library and showed how to use it to read Microsoft Excel XLSX files. In this post I will show how we can use it to write these files. Writing is pretty similar to reading but there are a lot more options you are likely to use when writing. Here is the basic code you need to create an XLSX file:

using (var package = new ExcelPackage())
{
var workbook = package.Workbook;
var worksheet = workbook.Worksheets.Add("Sheet1");
var cell = worksheet.Cells["A1"];

cell.Value = 100;
package.SaveAs(new System.IO.FileInfo(@"c:\temp\demoOut.xlsx"));
}

We start out by creating an ExcelPackage object. Since we are going to be creating a new file, we don’t pass a FileInfo object to the constructor like we did when reading an existing file. Next we add a new worksheet to the Worksheets collection of the package. Now that we have a worksheet we can access the cells just like we did when reading, in this case we have a variable called ‘cell’ which points to cell A1. Next we put a value in the cell, in this case the number 100. Finally we need to save the file, we do this by calling the SaveAs function on the package. Note, just like when we open a file to read, SaveAs does NOT take a file name as a parameter, but a FileInfo object instead.

Beyond setting cell values, EPPLus also allows you to change the appearance of the sheet. To change the appearance of a single cell we use the Style property on the cell. For example here is how we would change the font properties on a cell using the cell variable we created above.

cell.Style.Font.Name = "Arial";
cell.Style.Font.Bold = true;
cell.Style.Font.Size = 12;

We can change the background color of a cell like this:

cell.Style.Fill.PatternType = OfficeOpenXml.Style.ExcelFillStyle.Solid;
cell.Style.Fill.BackgroundColor.SetColor(System.Drawing.Color.Yellow);

First we have to set the PatternType for the background, in this case we will make it solid. Setting the PatternType is required before you can change the color, the second line will throw an exception if the PatternType is not set. In the second line we set the color. If you are doing this from a console application you will need to add a reference to System.Drawing to be able to use System.Drawing.Color.

Another common Excel formatting task is to create borders around a cell. Here is how we would set a thick border around an entire cell:

cell.Style.Border.BorderAround(OfficeOpenXml.Style.ExcelBorderStyle.Thick);

You can also set each part of the border individually by using the Bottom, Top, Left, and Right properties of the Border. Each of these has a Color and Style property to set the appearance of the border.

So far we have been setting the style on a single cell, but we can also use these properties on a group of cells. For example this code will set the color and border on the block of cells between B1 and C10:

var cells = worksheet.Cells["B1:C10"];
cells.Style.Fill.PatternType = OfficeOpenXml.Style.ExcelFillStyle.Solid;
cells.Style.Fill.BackgroundColor.SetColor(System.Drawing.Color.Blue);                cells.Style.Border.BorderAround(OfficeOpenXml.Style.ExcelBorderStyle.MediumDashed);

You will notice that each cell is set to blue, but the border is put around the whole block, not around each cell.

Besides setting properties on cells you can also work with the properties on columns and rows. For example here is how to set the width of a column:
worksheet.Column(1).Width = 100;

It’s important to remember that columns and rows start numbering at one no zero. EPPLus will allow you to use column and row index 0, but you will get an error when you try to open the workbook in Excel. To hide a column you can do this:

worksheet.Column(1).Hidden = true;

There are also similar properties for rows:

worksheet.Row(1).Height = 20;
worksheet.Row(1).Hidden = true;

These are some of the basic tasks you will do when creating an Excel file, but EPPLus lets you do almost anything with a workbook that you can do directly in Excel.

Saturday, May 10, 2014

Working with XLSX files

It’s not unusual when writing applications, especially business applications, then you will have a need to work with Microsoft Excel files. If you are working with .XLSX format files there is an excellent open source library called EPPlus that makes it’s easy to both read and write these files. In my next few posts I want to show how to use this library to handle various tasks. I will start with a discussion of how to use EPPlus to read from existing XLSX files.
First you will need to download the library. The easiest way to do this is to open your Visual Studio Project and in the Package Manager Console type:
Install-Package EPPlus
For the purposes of these posts I will be using version 3.1.3. At the time of this writing Version 4.0 is in the works but it is currently Beta, so I am going to stick with the older stable version for now.
I am going to be doing this demo in a Console application but you could also use the library in a Winforms or even a web application. First you need to import the following namespace:

using OfficeOpenXml;

Now let’s say that I know I have a number in cell B2 of the spreadsheet, here is the code to read that value:

static void Main(string[] args)
{
  using (var package = new ExcelPackage(new System.IO.FileInfo(@"c:\temp\demo.xlsx"))) {
       var workbook = package.Workbook;
       var worksheet = workbook.Worksheets[1];

       double data = (double)worksheet.Cells["B2"].Value;
       Console.WriteLine(data);
  }

  Console.ReadLine();
}

First we need to open the Excel file we want to read. We do this by creating a new ExcelPackage object. ExcelPackage won’t take a filename as a parameter so you have to either pass in an existing FileStream or, as I did in this case, a FileInfo object. I have put everything in a using block to be sure the ExcelPackage gets disposed properly.

Once we have the package open, we get the Workbook object. Workbook contains a collection of all the Worksheets in the file. In this case we will assume there is only one worksheet so we will get the first one into worksheet variable. Note that the Worksheet starts at 1 not 0. Now that we have the worksheet we want to work with we can access any cell by using the Cells property. Here we are getting the Value in cell B2 which will return an object type that in this case we case to a double. In a real application you will want to do some error checking to be sure that cell really contains the type you are expecting.

Another way you can access cells is to use their column and row coordinate like this:

double data = (double)worksheet.Cells[2,2].Value;

Again, note that the rows and columns starting numbering at 1 not 0. This method is useful when you need to iterate over a series of cells. You can also iterate over a specific range of cells like this:


var range = worksheet.Cells["B2:B5"];
foreach (var cell in range)
{
    Console.WriteLine(c.Value);
}

Up until this point we have been using the Value property of the cell which get the raw value. There is also a property called Text which returns what would be displayed in the cell as a string. For example if a cell contained the value .5 and it was formatted as a percentage, Value would return the double value .5, but Text would return the string “50%”. You can also access the formula in a cell by using the Formula property which will return a string containing the formula, or an empty string if the cell does not have a formula.

One final way of querying cells is to use a LINQ query:

var query = (from c in worksheet.Cells["B2:B13"]
                  where c.Value is double
                  && (double)c.Value > 7
                  select c);
foreach (var c in query) Console.WriteLine(c.Address + " " + c.Value.ToString());

This piece of code searches for all cells in the range of B2 to B13 that have a value greater than seven and then prints that address (e.g. B2) and the value. Note that in the where clause we first check if the value is a double in case the cell contains something that can’t be cast to a double.

Saturday, April 19, 2014

Enum Tricks

In this article I want to talk about a few tricks for using enums in .NET programs. The examples I will provide are in C# but the concepts apply to VB.NET programs also.
Enums are useful when you have a variable that needs to hold only a specific set of values. For example what if we had a class that represented a sales order and it had a status property that could be set to Pending, Open, Shipped or Closed. We could simply use a string for this but this is very error prone, for example at one point in our program we might set the status to Complete instead of the correct value Closed. This problem can be solved by using an enum like this:

public enum StatusEnum
{
    Pending,
    Open,
    Shipped,
    Closed
}
public StatusEnum OrderStatus {get; set;}  


Now we can only set OrderStatus to one of the values in the enum. What if we have a situation where we still need to deal with the string value for Status, for example if we want to store it in a database. We can handle this by adding a public string Status property to our class then changing the OrderStatus property to a StatusEnum type and having it get and set the string value. Here is what this code would look like:

public string Status { get; set; }

public StatusEnum OrderStatus
{
    get
    {
        return (StatusEnum)Enum.Parse(typeof(StatusEnum), Status);
    }
    set
    {
        Status = value.ToString();
    }
}

The setter part is pretty simple, you just call ToString() on the enum value to convert it to a string. The getter is a little more complicated but still one line. To convert from a string back to an enum we use the Enum.Parse function passing it the type of the enum we want to convert to and the string we want to convert. Any time we want to work with the status in code we can use the OrderStatus property so we can work with the value safely, and then we can use the string Status property when we want to read/write to a database or display the status value.

One important thing to note about this getter function. If the string contains a value that isn’t in the enum, the get will throw an exception. Whenever you want to the user to input a value for status it is best to provide a drop-down list of options so that they cannot enter an invalid value. The Enum class provides a function that makes this easy:

UIOrderStatus.DataSource = Enum.GetNames(typeof(SalesOrder.StatusEnum)); 

This will fill the ComboBox control UIOrderStatus with the values in the enum, this way if you add a value to the enum you don’t have to remember to add it to your drop downs. If you still have situations where the user could provide an incorrect status value, for example if you are importing data from a file, or in the case of a web application were you can’t trust the values coming back from the page you can validate the value like this:

Enum.IsDefined(typeof(StatusEnum), statusName)) 

This will check if the string statusName matches one of the values in the enum. This will return true if there is a match, and false if there isn’t.

Even with all these safeguards you could still have a bug in your program that allows an invalid status value to get through. The exception thrown by the get will be a good thing in this case but it still could be a little tricky to find the bug since we won’t necessarily know where the invalid value was set. To help with this we can make the following change to the string Status property:

public string Status
{
    get { return status; }
    set
    {
        if (!Enum.IsDefined(typeof(StatusEnum), value)) throw new ArgumentException("Status value " + value + " is invalid");
        status = value;
    }
} 

With this code the value is checked at the time the status is set, so we will get the exception immediately which will make it easier to track down the bug.

Now, what if we wanted to provide longer descriptions for each enum value? We could easily write a function that uses a switch statement to provide a description for each enum value, but a better way is put an attribute on each enum item that contains the description, this will allow us to keep the enum and descriptions all in one place. To do this we will use the Description attribute from the System.ComponentModel namespace:

public enum StatusEnum
{
    [Description("Order is pending Sales review")]
    Pending,
    [Description("Order is open and ready for fulfillment")]
    Open,
    [Description("Order has shipped")]
    Shipped,
    [Description("Order has been invoiced and closed")]
    Closed
}

To easily access the description we will write an extension method that can be used on any enum to get an item desription:

public static string ToDescription(this Enum en) 
{
    Type type = en.GetType(); 

    MemberInfo[] memInfo = type.GetMember(en.ToString()); 

    if (memInfo != null && memInfo.Length > 0)
    {
        object[] attrs = memInfo[0].GetCustomAttributes(typeof(DescriptionAttribute), false);
        if (attrs != null && attrs.Length > 0) return ((DescriptionAttribute)attrs[0]).Description;
    } 

    return en.ToString();
}

This function is used like this:

Console.WriteLine(order.OrderStatus.ToDescription());

This will display the description of the the OrderStatus. If the enum value doesn’t have a description attribute that name of the value will be displayed instead.