application|asp.net|session|对象
Figure 3 Caching
The amount of data you can reach at any level is different, but the right doses are determined on a per-application basis.
Also different from layer to layer is the time needed to retrieve data. Session, in most cases, is an in-process and in-memory object. Nothing could be faster. Keeping Session lean is critical because it is duplicated for each connected user. For quick access to data that can be shared between users, nothing is better than Cache or Application. Cache is faster and provides for automatic decay and prioritization. Relatively large amounts of frequently used static data can be effectively stored in any of these containers.
Disk files serve as an emergency copy of data. Use them when you don't need or can't afford to keep all the data in memory, but when going to the database is too costly. Finally, DBMS views are just like virtual tables that represent the data from one or more tables in an alternative way. Views are normally used for read-only data, but under certain conditions they can be updateable.
Views can also be used as a security mechanism to restrict the data that a certain user can access. For example, some data can be available to users for query and/or update purposes, while the rest of the table remains invisible. And table views can constitute an intermediate storage for preprocessed or post-processed data. Therefore, accessing a view has the same effect for the application, but doesn't cause preprocessing delays or place any locks on the physical table.
XML Server-side Data Islands
Caching is particularly useful when you have a large amount of data to load. However, when the amount of data is really huge, any technique—either on the client or the server—can hardly be optimal. When you have one million records to fetch, you're out of luck. In such situations, you can reduce the impact of the data bulk by using a layered architecture for caching by bringing the concept of client-side data islands to the server. An XML data island is a block of XML that is embedded in HTML and can be retrieved through the page's DOM. They're good at storing read-only information on the client, saving round-trips.
Used on the server, an XML data island becomes a persistent bulk of information that you can store in memory, or (for scalability) on disk. But, how do you read it back? Typically, in .NET you would use DataSet XML facilities to read and write. For lots of data (say, one million records), caching this way is not effective if you don't need all records in memory. Keeping all the records in a single file makes it heavier for the system. What about splitting records into different XML files that are organized like those in Figure 4? This expands the level of XML disk files shown in Figure 3.
Figure 4 Dividing Records for Performance
You can build up an extensible tree of XML files, each representing a page of database records. Each time you need a block of non-cached records, you fetch them from the database and add them to a new or existing XML data island. You would use a special naming convention to distinguish files on a per-session basis, for example, by appending a progressive index to the session ID. An index file can help you locate the right data island where a piece of data is cached. For really huge bulks of data, this minimizes the processing on all tiers. However, with one million records to manage there is no perfect tool or approach.
Automatic Cache Bubble-up
Once you have a layered caching system, how you move data from one tier to the next is up to you. However, ASP.NET provides a facility that can involve both a disk file and a Cache object. The Cache object works like an application-wide repository for data and objects. Cache looks quite different from the plain old Application object. For one thing, it is thread-safe and does not require locks on the repository prior to reading or writing.
Some of the items stored in the Cache can be bound to the timestamp of one or more files or directories as well as an array of other cached items. When any of these resources change, the cached object becomes obsolete and is removed from the cache. By using a proper try/catch block you can catch the empty item and refresh the cache.
String strFile;
strFile = Server.MapPath(Session.SessionID + ".xml");
CacheDependency fd = new CacheDependency(strFile);
DataSet ds = DeserializedDataSource();
Cache.Insert("MyDataSet", ds, fd);
To help the scavenging routines of the Cache object, you can assign some of your cache items with a priority and even a decay factor that lowers the priority of the keys that have limited use. When working with the Cache object, you should never assume that an item is there when you need it. Always be ready to handle an exception due to null or invalid values. If your application needs to be notified of an item's removal, then register for the cache's OnRemove event by creating an instance of the CacheItemRemovedCallback delegate and passing it to the Cache's Insert or Add method.
CacheItemRemovedCallback onRemove = new
CacheItemRemovedCallback(DoSomething);
The signature of the event handler looks like this:
void DoSomething(String key, Object value,
CacheItemRemovedReason reason)
From DataSet to XML
When stored in memory, the DataSet is represented through a custom binary structure like any .NET class. Each and every data row is bound with two arrays: one for the current value and one for the original value. The DataSet is not kept in memory as XML, but XML is used for output when the DataSet is remoted through app domains and networks or serialized to disk. The XML representation of a DataSet object is based on diffgrams—a subset of the SQL Server 2000 updategrams. It is an optimized XML schema that describes changes the object has undergone since it was created or since the last time changes were committed.
If the DataSet—or any contained DataTable and DataRow object—has no changes pending, then the XML representation is a description of the child tables. If there are changes pending, then the remoted and serialized XML representation of the DataSet is the diffgram. The structure of a diffgram is shown in Figure 5. It is based on two nodes, <before> and <after>. A <before> node describes the original state of the record, while <after> exposes the contents of the modified record. An empty <before> node means the record has been added and an empty <after> node means the node has been deleted.
The method that returns the current XML format is GetXml, which returns a string. WriteXml saves the content to a stream while ReadXml rebuilds a living instance of the DataSet object. If you want to save a DataSet to XML, use WriteXml directly (instead of getting the text through GetXml) then save using file classes. When using WriteXml and ReadXml, you can control how data is written and read. You can choose between the diffgram and the basic format and decide if the schema information should be saved or not.
Working with Paged Data Sources
There is a subtler reason that makes caching vital in ASP.NET. ASP.NET relies heavily on postback events, so when posted back to the server for update, any page must rebuild a consistent state. Each control saves a portion of its internal state to the page's view state bag. This information travels back and forth as part of the HTML. ASP.NET can restore this information when the postback event is processed on the Web server. But what about the rest? Let's consider the DataGrid control.
The DataGrid gets its contents through the DataSource property. In most cases, this content is a DataTable. The grid control does not store this potentially large block of data to the page's view bag. So, you need to retrieve the DataTable each time a postback event fires, and whenever a new grid page is requested per view. If you don't cache data, you're at risk. You repeatedly download all the data—say, hundreds of records—just to display the few that fit into the single grid page. If data is cached, you significantly reduce this overhead. This said, custom paging is probably the optimal approach for improving the overall performance of pagination. I covered the DataGrid custom paging in the April 2001 issue. Although that code was based on Beta 1, the key points apply. I'll review some of them here.
To enable custom pagination, you must set both the AllowPaging and AllowCustomPaging properties to True. You can do that declaratively or programmatically. Next, you arrange your code for pagination as usual and define a proper event handler for PageIndexChanged. The difference between custom and default pagination for a DataGrid control is that when custom paging is enabled, the control assumes that all the elements currently stored in its Items collection—the content of the object bound to the DataSource property—are part of the current page. It does not even attempt to extract a subset of records based on the page index and the page size. With custom paging, the programmer is responsible for providing the right content when a new page is requested. Once again, caching improves performance and scalability. The caching architecture is mostly application-specific, but I consider caching and custom pagination vital for a data-driven application.
Data Readers
To gain scalability I'd always consider caching. However, there might be circumstances (such as highly volatile tables) in which project requirements lead you to consider alternative approaches. If you opt for getting data each time you need it, then you should use the DataReader classes instead of DataSets. A DataReader class is filled and returned by command classes like SqlCommand and OleDbCommand. DataReaders act like read-only, firehose cursors. They work connected, and to be lightweight they never cache a single byte of data. DataReader classes are extremely lean and are ideal for reading small portions of data frequently. Starting with Beta 2, a DataReader object can be assigned to the DataSource property of a DataGrid, or to any data-bound control.
By combining DataReaders with the grid's custom pagination, and both with an appropriate query command that loads only the necessary portions of records for a given page, you can obtain a good mix that enhances scalability and performance. Figure 6 illustrates some C# ASP.NET code that uses custom pagination and data readers.
As mentioned earlier, a DataReader works while connected, and while the reader is open, the attached connection results in busy. It's clear that this is the price to pay for getting up-to-date rows and to keep the Web server's memory free. To avoid the overturn of the expected results, the connection must be released as soon as possible. This can happen only if you code it explicitly. The procedure that performs data access ends as follows:
conn.Open();
dr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
return dr;
You open the connection, execute the command, and return an open DataReader object. When the grid is going to move to a new page, the code looks like this:
grid.DataSource = CreateDataSource(grid.CurrentPageIndex);
grid.DataBind();
dr.Close();
Once the grid has been refreshed (DataBind does that), explicitly closing the reader is key, not only to preserve scalability, but also to prevent the application's collapse. Under normal conditions, closing the DataReader does not guarantee that the connection will be closed. So do that explicitly through the connection's Close or the Dispose method. You could synchronize reader and connection by assigning the reader a particular command behavior, like so:
dr = cmd.ExecuteReader(CommandBehavior.CloseConnection);
In this way, the reader enables an internal flag that automatically leads to closing the associated connection when the reader itself gets closed.
SQL Statements
The standards of the SQL language do not provide special support for pagination. Records can be retrieved only by condition and according to the values of their fields, not based on absolute or relative positions. Retrieving records by position—for example, the second group of 20 records in an sorted table—can be simulated in various ways. For instance, you could use an existing or custom field that contains a regular series of values (such as 1-2-3-4) and guarantee its content to stay consistent across deletions and updates. Alternatively, you could use a stored procedure made of a sequence of SELECT statements that, through sorting and temporary tables, reduces the number of records returned from a particular subset. This is outlined in this pseudo SQL:
— first n records are, in reverse order, what you need
SELECT INTO tmp TOP page*size field_names
FROM table ORDER BY field_name DESC
— only the first "size" records are, in reverse order,
— copied in a temp table
SELECT INTO tmp1 TOP size field_names FROM tmp
— the records are reversed and returned
SELECT field_names FROM tmp1 ORDER BY field_name DESC
You could also consider T-SQL cursors for this, but normally server cursors are the option to choose when you have no other option left. The previous SQL code could be optimized to do without temporary tables which, in a session-oriented scenario, could create serious management issues as you have to continuously create and destroy them while ensuring unique names.
More efficient SQL can be written if you omit the requirement of performing random access to a given page. If you allow only moving to the next or previous page, and assume to know the last and the first key of the current page, then the SQL code is simpler and faster.
Conclusion
Caching was already a key technique in ASP, but it's even more important in ASP.NET—not just because ASP.NET provides better infrastructural support for it, but because of the architecture of the Web Forms model. A lot of natural postback events, along with a programming style that transmits a false sense of total statefulness, could lead you to bad design choices like repeatedly reloading the whole DataSet just to show a refreshed page. To make design even trickier, many examples apply programming styles that are only safe in applications whose ultimate goal is not directly concerned with pagination or caching.
The take-home message is that you should always try to cache data on the server. The Session object has been significantly improved with ASP.NET and tuned to work in most common programming scenarios. In addition, the Cache object provides you with a flexible, dynamic, and efficient caching mechanism. And remember, if you can't afford caching, custom paging is a good way to improve your applications.