HPC - Data Preloading (Avoiding Commonly Accessed I/O)



dgavin
2012-Aug-18, 05:17 PM
One High Performance Computing (HPC) technique is to eliminate complex joins or lookups against data that usually lives in a lookup-style table (a description/title table, for example) by preloading it into a memory-based table that can be searched with a binary search or a B-tree.

This can vary by programming language and file system, and there are many examples available on Google, so I will just cover this at a summary level.

A. Flat Files (Indexed or not):

A.1. In the case of COBOL, this is almost always advisable. You basically preload the data into an indexed, memory-resident table and look up values there.
A.2. In the case of most other languages, this is also almost always advisable. You basically preload the data into an indexed array, hashtable, or dictionary and look up values there.
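
As a rough illustration of A.2, here is a minimal sketch in Python (chosen only for brevity; a C# Dictionary or Java HashMap would have the same shape). The file contents, the column names `code` and `description`, and the `preload_lookup` helper are all made up for the example, and an in-memory string stands in for the real flat file.

```python
import csv
import io

# Hypothetical lookup file: code,description (stands in for a real flat file on disk).
SAMPLE_FLAT_FILE = """\
code,description
AK,Alaska
OR,Oregon
WA,Washington
"""

def preload_lookup(flat_file):
    """Read the whole lookup file once and return a code -> description dict."""
    reader = csv.DictReader(flat_file)
    return {row["code"]: row["description"] for row in reader}

# One pass over the file at startup; with a real file this would be
# open(path, newline="") instead of io.StringIO(...).
lookup = preload_lookup(io.StringIO(SAMPLE_FLAT_FILE))

# Every later lookup is an in-memory hash probe, with no file I/O.
orders = [("1001", "OR"), ("1002", "WA"), ("1003", "ZZ")]
for order_id, state in orders:
    print(order_id, lookup.get(state, "<unknown>"))
```

The trade-off is memory for I/O: this only makes sense when the lookup data is small enough to hold in memory and changes rarely while the program runs.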

B. SQL or CODASYL databases (it may not always be advisable to eliminate lookup joins; for report generation especially, it's usually better to organize your database with appropriate indexing to speed up those joins)

B.1. Web sites and high-usage applications. Cache the data tables that are often read but not often modified (such as ones used to build drop-down lists or trees) in an application state object pool as a dictionary or hashtable (a pool that does not go out of scope when users log out).
B.2. C#/VB .NET. Use the fastest access method, in the following order from fastest to slowest: raw ADO.NET, Entity Framework (this is only 1.05 times slower than ADO), LINQ (up to twelve times slower than the others).
B.2.1. LINQ's slowness comes from how the platform works: if a query returns a collection of rows, then each time you enumerate that collection, LINQ re-runs the underlying query it generated, every time! Cache the results to a dictionary object once, though, and performance is actually the same as the others.
B.2.2. LINQ to SQL, according to Microsoft, is no longer going to be upgraded for newer database engines. It is therefore best avoided; instead, use Entity Framework as the primary access technology, with the LINQ namespace for the extensions on its collections.
B.2.3. My recommendation, after using all three, is to use Entity Framework. It allows stored procedures to be mapped to language-native methods (functions) in the native code, without any coding by the designer. For a change, I'm actually impressed with what Microsoft has done with this framework.
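
The application-level cache from B.1, and the "cache it once to a dictionary" fix from B.2.1, could look roughly like the sketch below. It is in Python with an in-memory SQLite database so it can run on its own; it is not .NET code and does not model LINQ itself. The `LookupCache` class, the `status` table, the `ALLOWED_TABLES` whitelist, and the `queries_run` counter are all assumptions made up for the illustration.

```python
import sqlite3
import threading

ALLOWED_TABLES = {"status"}  # hypothetical whitelist of cacheable lookup tables

class LookupCache:
    """Application-scoped cache of small, rarely modified lookup tables."""

    def __init__(self, conn):
        self._conn = conn
        self._lock = threading.Lock()
        self._tables = {}      # table name -> {id: label}
        self.queries_run = 0   # counts real database round trips

    def get(self, table):
        # Table names cannot be bound as SQL parameters, so only accept known names.
        if table not in ALLOWED_TABLES:
            raise KeyError(table)
        with self._lock:
            if table not in self._tables:
                rows = self._conn.execute(f"SELECT id, label FROM {table}").fetchall()
                self.queries_run += 1
                self._tables[table] = dict(rows)
            return self._tables[table]

    def invalidate(self, table):
        """Drop a cached table so the next get() reloads it after a modification."""
        with self._lock:
            self._tables.pop(table, None)

# Made-up sample data standing in for a real lookup table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO status VALUES (?, ?)", [(1, "Open"), (2, "Closed")])

cache = LookupCache(conn)   # created once, outlives any single user session
for _ in range(1000):
    label = cache.get("status")[2]
print(label, cache.queries_run)   # prints: Closed 1
```

The cache object has to live at application scope rather than session scope, and whatever code modifies the table needs to call something like `invalidate` so readers do not keep seeing stale rows.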

dgavin
2012-Aug-18, 05:30 PM
Basically, these techniques use built-in high-speed algorithms (binary search, binary tree search, etc.) to eliminate the slowness of common I/O, as swampyyakee and cjameshuff indicate in another post.
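
For anyone who has not used the built-in routines, a binary search over a sorted, memory-resident table can be sketched like this in Python using the standard `bisect` module (C# has `Array.BinarySearch` and Java has `Arrays.binarySearch` for the same job). The `KEYS`/`VALUES` data and the `find` helper are invented for the example.

```python
from bisect import bisect_left

# Hypothetical lookup table, kept sorted by key so it can be binary searched.
KEYS = [100, 205, 310, 480, 950]
VALUES = ["Widget", "Gasket", "Flange", "Valve", "Pump"]

def find(key):
    """Return the value for key via binary search, or None if the key is absent."""
    i = bisect_left(KEYS, key)          # O(log n) comparisons, no I/O
    if i < len(KEYS) and KEYS[i] == key:
        return VALUES[i]
    return None

print(find(310))
print(find(311))
```

A hashtable or dictionary is usually simpler for exact-match lookups; a sorted table like this is more useful when you also need range or nearest-key searches.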

In a week or so, I'll be posting how to use these built-in algorithms (from a C# or Java point of view) to actually perform dynamic code switching without pointers, of all things.