iamjerryyeung

Wednesday, February 23, 2005

jdbc:performance

http://www.precisejava.com/javaperf/j2ee/JDBC.htm
Choose optimal isolation level
Isolation level represent how a database maintains data integrity against the problems like dirty reads, phantom reads and non-repeatable reads which can occur due to concurrent transactions. java.sql.Connection interface provides methods and constants to avoid the above mentioned problems by setting different isolation levels.
public interface Connection {
public static final int TRANSACTION_NONE = 0
public static final int TRANSACTION_READ_COMMITTED = 2
public static final int TRANSACTION_READ_UNCOMMITTED = 1
public static final int TRANSACTION_REPEATABLE_READ = 4
public static final int TRANSACTION_SERIALIZABLE = 8
int getTransactionIsolation();
void setTransactionIsolation(int isolationlevelconstant);
}
You can get the existing isolation level with getTransactionIsolation() method and set the isolation level with setTransactionIsolation(int isolationlevelconstant) by passing above constants to this method.
The following table describes isolation level against the problem that it prevents :

Transaction Level
Permitted Phenomena
Performance impact

Dirty reads
Non Repeatable reads
Phantom reads

TRANSACTION_NONE
N/A
N/A
N/A
FASTEST
TRANSACTION_READ_UNCOMMITED
YES
YES
YES
FASTEST
TRANSACTION_READ_COMMITED
NO
YES
YES
FAST
TRANSACTION_REPEATABLE_READ
NO
NO
YES
MEDIUM
TRANSACTION_SERIALIZABLE
NO
NO
NO
SLOW
YES means that the Isolation level does not prevent the problem
NO means that the Isolation level prevents the problem
By setting isolation levels, you are having an impact on the performance as mentioned in the above table. Database use read and write locks to control above isolation levels. Let us have a look at each of these problems and then look at the impact on the performance.
Dirty read problem :
The following figure illustrates Dirty read problem :


Step 1: Database row has PRODUCT = A001 and PRICE = 10
Step 2: Connection1 starts Transaction1 (T1) .
Step 3: Connection2 starts Transaction2 (T2) .
Step 4: T1 updates PRICE =20 for PRODUCT = A001
Step 5: Database has now PRICE = 20 for PRODUCT = A001
Step 6: T2 reads PRICE = 20 for PRODUCT = A001
Step 7: T2 commits transaction
Step 8: T1 rollbacks the transaction because of some problem
The problem is that T2 gets wrong PRICE=20 for PRODUCT = A001 instead of 10 because of uncommitted read. Obviously it is very dangerous in critical transactions if you read inconsistent data. If you are sure about not accessing data concurrently then you can allow this problem by setting TRANSACTION_READ_UNCOMMITED or TRANSACTION_NONE that in turn improves performance otherwise you have to use TRANSACTION_READ_COMMITED to avoid this problem.

Unrepeatable read problem :
The following figure illustrates Unrepeatable read problem :

Step 1: Database row has PRODUCT = A001 and PRICE = 10
Step 2: Connection1 starts Transaction1 (T1) .
Step 3: Connection2 starts Transaction2 (T2) .
Step 4: T1 reads PRICE =10 for PRODUCT = A001
Step 5: T2 updates PRICE = 20 for PRODUCT = A001
Step 6: T2 commits transaction
Step 7: Database row has PRODUCT = A001 and PRICE = 20
Step 8: T1 reads PRICE = 20 for PRODUCT = A001
Step 9: T1 commits transaction
Here the problem is that Transaction1 reads 10 first time and reads 20 second time but it is supposed to be 10 always whenever it reads a record in that transaction. You can control this problem by setting isolation level as TRANSACTION_REPEATABLE_READ.
Phantom read problem :
The following figure illustrates Phantom read problem :
Step 1: Database has a row PRODUCT = A001 and COMPANY_ID = 10
Step 2: Connection1 starts Transaction1 (T1) .
Step 3: Connection2 starts Transaction2 (T2) .
Step 4: T1 selects a row with a condition SELECT PRODUCT WHERE COMPANY_ID = 10
Step 5: T2 inserts a row with a condition INSERT PRODUCT=A002 WHERE
COMPANY_ID= 10
Step 6: T2 commits transaction
Step 7: Database has 2 rows with that condition
Step 8: T1 select again with a condition SELECT PRODUCT WHERE COMPANY_ID=10
and gets 2 rows instead of 1 row
Step 9: T1 commits transaction
Here the problem is that T1 gets 2 rows instead of 1 row up on selecting the same condition second time. You can control this problem by setting isolation level as TRANSACTION_SERIALIZABLE
Choosing a right isolation level for your program:
Choosing a right isolation level for your program depends upon your application's requirement. In single application itself the requirement generally changes, suppose if you write a program for searching a product catalog from your database then you can easily choose TRANSACTION_READ_UNCOMMITED because you need not worry about the problems that are mentioned above, some other program can insert records at the same time, you don't have to bother much about that insertion. Obviously this improves performance significantly.
If you write a critical program like bank or stocks analysis program where you want to control all of the above mentioned problems, you can choose TRANSACTION_SERIALIZABLE for maximum safety. Here it is the tradeoff between the safety and performance. Ultimately we need safety here.
If you don't have to deal with concurrent transactions your application, then the best choice is TRANSACTION_NONE to improve performance.
Other two isolation levels need good understanding of your requirement. If your application needs only committed records, then TRANSACTION_READ_COMMITED isolation is the good choice. If your application needs to read a row exclusively till you finish your work, then TRANSACTION_REPEATABLE_READ is the best choice.
Note: Be aware of your database server's support for these isolation levels. Database servers may not support all of these isolation levels. Oracle server supports only two isolation levels, TRANSACTION_READ_COMMITED and TRANSACTION_SERIALIZABLE isolation level, default isolation level is TRANSACTION_READ_COMMITED.
5. Close Connection when finished
Closing connection explicitly allows garbage collector to recollect memory as early as possible. Remember that when you use the connection pool, closing connection means that it returns back to the connection pool rather than closing direct connection to the database.

Optimization with Statement
Statement interface represents SQL query and execution and they provide number of methods and constants to work with queries. They also provide some methods to fine tune performance. Programmer may overlook these fine tuning methods that result in poor performance. The following are the tips to improve performance by using statement interfaces
1. Choose the right Statement interface
2. Do batch update
3. Do batch retrieval using Statement
2. Close Statement when finished
1. Choose right Statement interface
There are three types of Statement interfaces in JDBC to represent the SQL query and execute that query, they are Statement, PreparedStatement and CallableStatement.
Statement is used for static SQL statement with no input and output parameters, PreparedStatement is used for dynamic SQL statement with input parameters and CallableStatement is used for dynamic SQL satement with both input and output parameters, but PreparedStatement and CallableStatement can be used for static SQL statements as well. CallableStatement is mainly meant for stored procedures.
PreparedStatement gives better performance when compared to Statement because it is pre-parsed and pre-compiled by the database once for the first time and then onwards it reuses the parsed and compiled statement. Because of this feature, it significantly improves performance when a statement executes repeatedly, It reduces the overload incurred by parsing and compiling.
CallableStatement gives better performance when compared to PreparedStatement and Statement when there is a requirement for single request to process multiple complex statements. It parses and stores the stored procedures in the database and does all the work at database itself that in turn improves performance. But we loose java portability and we have to depend up on database specific stored procedures.
2. Do batch update
You can send multiple queries to the database at a time using batch update feature of statement objects this reduces the number of JDBC calls and improves performance. Here is an example of how you can do batch update,
statement.addBatch( "sql query1");
statement.addBatch(" sql query2");
statement.addBatch(" sql query3");
statement.executeBatch();
All three types of statements have these methods to do batch update.
3. Do batch retrieval using Statement
You can get the default number of rows that is provided by the driver. You can improve performance by increasing number of rows to be fetched at a time from database using setFetchSize() method of the statement object.
Initially find the default size by using
Statement.getFetchSize(); and then set the size as per your requirement
Statement.setFetchSize(30);
Here it retrieves 30 rows at a time for all result sets of this statement.
4. Close Statement when finished
Close statement object as soon as you finish working with that, it explicitly gives a chance to garbage collector to recollect memory as early as possible which in turn effects performance.
Statement.close();

Optimization with ResultSet
ResultSet interface represents data that contains the results of executing an SQL Query and it provides a number of methods and constants to work with that data. It also provides methods to fine tune retrieval of data to improve performance. The following are the fine tuning tips to improve performance by using ResultSet interface.
1. Do batch retrieval using ResultSet
2. Set up proper direction for processing the rows
3. Use proper get methods
4. Close ResultSet when finished
1. Do batch retrieval using ResultSet
ResultSet interface also provides batch retrieval facility like Statement as mentioned above. It overrides the Statement behaviour.
Initially find the default size by using
ResultSet.getFetchSize(); and then set the size as per requirement
ResultSet.setFetchSize(50);
This feature significantly improves performance when you are dealing with retrieval of large number of rows like search functionality.
2. Setup proper direction of processing rows
ResultSet has the capability of setting the direction in which you want to process the results, it has three constants for this purpose, they are
FETCH_FORWARD, FETCH_REVERSE, FETCH_UNKNOWN
Initially find the direction by using
ResultSet.getFetchDirection(); and then set the direction accordingly
ResultSet.setFetchDirection(FETCH_REVERSE);
3. Use proper getxxx() methods
ResultSet interface provides lot of getxxx() methods to get and convert database data types to java data types and is flexibile in converting non feasible data types. For example,
getString(String columnName) returns java String object.
columnName is recommended to be a VARCHAR OR CHAR type of database but it can also be a NUMERIC, DATE etc.
If you give non recommended parameters, it needs to cast it to proper java data type that is expensive. For example consider that you select a product's id from huge database which returns millions of records from search functionality, it needs to convert all these records that is very expensive.
So always use proper getxxx() methods according to JDBC recommendations.
4. Close ResultSet when finished
Close ResultSet object as soon as you finish working with ResultSet object even though Statement object closes the ResultSet object implicitly when it closes, closing ResultSet explicitly gives chance to garbage collector to recollect memory as early as possible because ResultSet object may occupy lot of memory depending on query.
ResultSet.close();

Optimization with SQL Query
This is one of the area where programmers generally make a mistake
If you give a query like
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery("select * from employee where name=RR");
The returned result set contains all the columns data. you may not need all the column data and want only salary for RR.
The better query is "select salary from employee where name=RR"
It returns the required data and reduces unnecessary data retrieval.

Cache the read-only and read-mostly data
Every database schema generally has read-only and read-mostly tables. These tables are called as lookup tables. Read-only tables contain static data that never changes in its life time. Read-mostly tables contain semi dynamic data that changes often. There will not be any sort of writing operations in these tables.
If an application reads data from these tables for every client request, then it is redundant, unnecessary and expensive. The solution for this problem is to cache the read-only table data by reading the data from that table once and caching the read-mostly table data by reading and refreshing with time limit. This solution improves performance significantly. See the following link for source code of such caching mechanism.
http://www.javaworld.com/javaworld/jw-07-2001/jw-0720-cache.html
You can tweak this code as per application requirement. For read-only data, you need not refresh data in its life time. For read-mostly data, you need to refresh the data with time limit. It is better to set this refreshing time limit in properties file so that it can be changed at any time.

Fetch small amount of data iteratively instead of fetching whole data at once
Applications generally require to retrieve huge data from the database using JDBC in operations like searching data. If the client request for a search, the application might return the whole result set at once. This process takes lot of time and has an impact on performance. The solution for the problem is
1. Cache the search data at the server-side and return the data iteratively to the client. For example, the search returns 1000 records, return data to the client in 10 iterations where each iteration has 100 records.
2. Use Stored procedures to return data iteratively. This does not use server-side caching rather server-side application uses Stored procedures to return small amount of data iteratively.
Out of these solutions the second solution gives better performance because it need not keep the data in the cache (in-memory). The first procedure is useful when the total amount of data to be returned is not huge.

Key Points
Use Type two driver for two tiered applications to communicate from java client to database that gives better performance than Type1 driver.
Use Type four driver for applet to database communication that is two tiered applications and three tiered applications when compared to other drivers.
Use Type one driver if you don't have a driver for your database. This is a rare situation because all major databases support drivers or you will get a driver from third party vendors.
Use Type three driver to communicate between client and proxy server ( weblogic, websphere etc) for three tiered applications that gives better performance when compared to Type 1 &2 drivers.
Pass database specific properties like defaultPrefetch if your database supports any of them.
Get database connection from connection pool rather than getting it directly
Use batch transactions.
Choose right isolation level as per your requirement. TRANSACTION_READ_UNCOMMITED gives best performance for concurrent transaction based applications. TRANSACTION_NONE gives best performance for non-concurrent transaction based applications.
Your database server may not support all isolation levels, be aware of your database server features.
Use PreparedStatement when you execute the same statement more than once.
Use CallableStatement when you want result from multiple and complex statements for a single request.
Use batch update facility available in Statements.
Use batch retrieval facility available in Statements or ResultSet.
Set up proper direction for processing rows.
Use proper getXXX() methods.
Close ResultSet, Statement and Connection whenever you finish your work with them.
Write precise SQL queries.
Cache read-only and read-mostly tables data.
Fetch small amount of data iteratively rather than whole data at once when retrieving large amount of data like searching database etc.

0 Comments:

Post a Comment

<< Home