Big DataSet insertion to multiple tables in MySql

2014-06-05


I have integrated a web service which gives all the hotels & its details found in a city. The response data is so big that when the data is converted to XML file it has size of any where b/w 1-5 MB. The response when converted to dataset  contains 7 tables. I need to save this data to database. The webservice need to be called for all the cities in every country that service provider has. This count of cities is 11500.

The number of records in each of these tables is 

Table
Expected Records
HotelDetails
10 – 1000
HotelImages
100 – 10,000
HotelFacilities
100 – 10,000
HotelAreas
100 – 10,000
HotelReports
100 – 10,000
HotelRoomTypes
100 – 12,000
HotelLocations
100 – 10,000


What's the best way for saving the data to MySql DB in this scenario. The normal insert query method is very slow & it took about 5-10 minutes to save the data of a single service call. This is because of huge number of records to save in this case. In a response on average I will be getting hotels in a city b/w 10-1000 hotels and for each hotel I will be getting images, facilities, areas, reportdetails, roomtypes, locations. 
So I need to save records in a range of 610-63,000 rows in single service call. This to be saved inside a single transaction. So it's clear that the normal insert query is not the best method here since performance is very poor. The reason is described in the MySql doc online Speed of INSERT Statements . It says for an insert query the time required is determined by the following factors, where the numbers indicate propotions:

  1. Connecting: (3)
  2. Sending query to server: (2)
  3. Parsing query: (2)
  4. Inserting row: (1 × size of row)
  5. Inserting indexes: (1 × number of indexes)
  6. Closing: (1)
In addition to this the initial overhead to open tables, which is done for each concurrently executing query.

So sending insert queries to database in this range 610-63,000 is really time consuming & is bad design decision. As I explored options I have gone through the following stages & implemented the last option finally.






The next option after normal insert query option is passing data to a storedprocedure.But saving this XML data not worked, because of the huge data size. An average of 100 hotels & its details such as Images(No of images 10,000 rows), Locations(1000s), Areas(1000s), Facilities(1000s). After converting to xml successfully, Only small number of hotels(about 20-24 hotels) gets saved & the more importantly query time out occurs.
One advantage of this method over the rest options is that I can get the primary key of parent table after saving each parent record to save to its child tables using LAST_INSERT_ID() method.


This is how I tried to save passing data as xml:

XML bulk insertion

A sample of this XML data can be found here




StoredProcedure:>
DELIMITER ;
create procedure gtaCacheSP(in xmlData LONGTEXT)
begin

set @htlIndex = 1;
SET @imgIndex = 1;
set @locIndex = 1;
set @facDetlsIndex = 1;
set @roomTypeIndex = 1;
set @reptDetlsIndex = 1;
set @areaDetlsIndex = 1;


set @totalHotelCount = extractValue(xmlData,'count(/GTACacheDataSet/HotelDetails)');

    while @htlIndex <= @totalHotelCount do

        INSERT INTO atsgen_m_gtahoteldetails 
                    (
                        City_Id,
                        CityCode,
                        HotelCode,
                        HotelName,
                        Copyright,
                        TotalRooms,
                        Latitude,
                        Longitude,
                        StarRating,
                        Category,
                        MapPageLink,
                        Address1,
                        Address2,
                        Address3,
                        Telephone,
                        Fax,
                        Website
                    )
        VALUES    
                    (
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/City_Id'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/CityCode'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/HotelCode'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/HotelName'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Copyright'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/TotalRooms'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Latitude'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Longitude'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/StarRating'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Category'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/MapPageLink'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Address1'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Address2'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Address3'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Telephone'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Fax'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Website')
                    );
        SET @lastHotelID = LAST_INSERT_ID();
         
        ####### INSERT IMAGES AGAINST AN HOTEL ##############
        set @totalImageCount = extractValue(xmlData,'count(/GTACacheDataSet/HotelDetails[$@htlIndex]/Images)');
        set @imgIndex = 1;
        while @imgIndex <= @totalImageCount do

            INSERT INTO atsgen_m_gtahotelimages
                    (
                        Hotel_ID,
                        ItemCode,
                        ImageLink,
                        ThumbNail,
                        ImageDetails,
                        Height,
                        Width
                    )
            VALUES    
                    (
                        @lastHotelID,
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Images[$@imgIndex]/ItemCode'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Images[$@imgIndex]/ImageLink'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Images[$@imgIndex]/ThumbNail'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Images[$@imgIndex]/ImageDetails'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Images[$@imgIndex]/Height'),
                        extractvalue(xmlData,'/GTACacheDataSet/HotelDetails[$@htlIndex]/Images[$@imgIndex]/Width')
                    );
            SET @imgIndex = @imgIndex + 1;
        END WHILE;

        SET @htlIndex = @htlIndex + 1;
    END WHILE;
END //
DELIMITER ;


I have shown only the first 2 tables save, full stored procedure can be found here


Errors occurred while executing this sp:



/*
199 16:30:39 START TRANSACTION 0 row(s) affected 0.000 sec
200 16:30:39 call gtaCacheSP(@inputXML) Error Code: 2013. Lost connection to MySQL server during query 600.526 sec
201 16:40:40 COMMIT Error Code: 2006. MySQL server has gone away 0.000 sec
*/

/*

15:35:24 call gtaCacheSP(@inputXML) Error Code: 3. Error writing file 'C:\Windows\TEMP\MLA315.tmp' (Errcode: 28 - No space left on device) 587.328 sec

*/

Since the method 1 is not working, even if it worked it's not a best method since it's performance is so bad that it took about 10 minutes to save just 20 hotels of total 250 in a city.As you know I have to call this web service 11500 times and the hotel range varies b/w 10-600 in a city. It takes time to parse xml to extract the data.




So I reached to the next method :LOAD data method. This method takes the data from csv files or txt files where each column values of a row are seperated by a terminator like comma (') operator and each rows by a terminator like new line operator('\r\n') in txt. So to do this you need the data in a file. For that I converted the response coming from webservice to a file.




LOAD DATA LOCAL INFILE 'E:\\GTACache\\htlDetailsTableData.txt' 
INTO TABLE atsgen_m_gtahoteldetails 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'; 

or in a more detailed way:


START transaction;
LOAD DATA LOCAL INFILE 'E:\\GTACache\\QueryFiles\\CSVFiles\\HotelDetails.csv' 
INTO TABLE atsgen_m_gtahoteldetails 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
(
Hotel_ID,
City_Id,
CityCode,
HotelCode,
HotelName,
Copyright,
TotalRooms,
Latitude,
Longitude,
StarRating,
Category,
MapPageLink,
Address1,
Address2,
Address3,
Telephone,
Fax,
Website);
COMMIT;

This seems very fast than xml insertion. But there are 2 problems here: one is I cannot save to children tables bcoz no value of the field Hotel_ID is available which is PK of the table atsgen_m_gtahoteldetails & value of this is saved in all the other tables. And the second is even if I generate value for the Pkey manually by removing AutoIncrement , save using this method is sometimes showing more records than actual no. of records. So could not verify the duplicates. This forced me to go to next method:


INSERT INTO table_name ( field1, field2,...fieldN ) VALUES
(a1, a2,...aN ),( b1, b2,...bN ),..;

For using this method I should generate Pkey Value manually for the parent tables PkeyField, then only I can pass this value to rest of the tables. So that I can do the save to all tables inside a single transation. This I had done in the previous method. I have disabled the AutoIncrement of the parent table. And generated the Pkey field HotelID=CityID*10000, CityID will be unique for all the cities. So increment this value by 1 for each hotel in a city. Now performance wise when I compared the LOAD data & multiple rows insert it's slower than LOAD DATA method but very fast than xml insertion method.

 The difference b/w normal insert query & multiple row insert is that only 1 time the following factors occurs in the case of multiple row insert.
  1. Connecting: (3)
  2. Sending query to server: (2)
  3. Parsing query: (2)
  4. Inserting row: (1 × size of row)
  5. Inserting indexes: (1 × number of indexes)
  6. Closing: (1)
In addition to this the intial overhead to open tables, which is done for each concurrently executing query. 

Only the 4th factor is same in both normal & multiple row inserts. Where as in normal consecutive insert queries, these factors comes repeatedly. It took only 3 secs using Multiple row insert in the place of 10 minutes using single insert queries. Using LOAD DATA method it's less than 2 secs. 

The multiple row insert query is generated using this function

Private Sub CacheCities()
        selectQuery = "select Country_Code,City_ID,city_code from atsgen_m_gtacity order by city_ID ASC;"
        Dim dtAllCityCodeInThisCountry As DataTable = New DataTable()
        Dim htlIDPKey As ULong = 0
        Dim countryCode As String = String.Empty
        Dim cityCode As String = String.Empty
        Dim cityID As UShort = 0
        Dim htlIndex As UShort = 0
        Dim queryCollection As List(Of String) = New List(Of String)
        Dim ndxSizeCollection As List(Of Decimal) = New List(Of Decimal)
        Dim cacheSystemToRun As Int16 = 0
        Dim insertScript As String = String.Empty
        Dim indexOfValueClause As Int32 = 0
        Dim HotelInformation_Id As Int32 = 0
        Dim htlDetailsColumnValues As StringBuilder = New StringBuilder(String.Empty)
        Dim htlImagesColumnValues As StringBuilder = New StringBuilder(String.Empty)
        Dim htlRoomTypesColumnValues As StringBuilder = New StringBuilder(String.Empty)

        Dim columnValues As String = String.Empty
        Dim totalHotels As UShort = 0
        Dim noOfCachedCities As UInt16 = 0


        Try
            'Create ErrorLog file if not exist
            If Not System.IO.File.Exists(errorLogPath) Then
                Using fs = System.IO.File.Create(errorLogPath)

                End Using
            End If

            'Clear ErrorLog file
            'Using ErrorLog = New FileStream(errorLogPath, FileMode.Truncate)

            'End Using

            selectQuery = "select Country_Code,Country_Name from atsgen_m_gtacountry;"
            Dim dtCountryCode As DataTable = ReadFromDB()
            Console.WriteLine("Do not close the application; bcoz GTA caching In Progress...")
            For countryIndex = 0 To dtCountryCode.Rows.Count - 1
                noOfCachedCities = 0
                dtAllCityCodeInThisCountry.Reset()
                selectQuery = String.Format("select City_ID,city_code from atsgen_m_gtacity where Country_Code='{0}'", dtCountryCode.Rows(countryIndex)("Country_Code"))
                dtAllCityCodeInThisCountry.Reset()
                dtAllCityCodeInThisCountry = ReadFromDB()
                For cityIndex = 0 To dtAllCityCodeInThisCountry.Rows.Count - 1
                    selectQuery = String.Format("select count(*) from gtacachestatus where City_ID ={0}", dtAllCityCodeInThisCountry.Rows(cityIndex)("City_ID"))
                    If ReadCount() = 0 Then
                        selectQuery = "select EnableCache from gtacachesystemsettings;"
                        cacheSystemToRun = ReadCount()
                        'If cacheSystemToRun = 1 And dtAllCityCodeInThisCountry.Rows(cityIndex)("city_code") = "DXB" Then
                        If cacheSystemToRun = 1 Then
                            queryCollection.Clear()

                            'Reset the columnScript values
                            htlDetailsColumnValues.Clear()
                            htlImagesColumnValues.Clear()
                            htlRoomTypesColumnValues.Clear()

                            'Reset the counts at each Each City



                            countryCode = dtCountryCode.Rows(countryIndex)("Country_Code")
                            cityCode = dtAllCityCodeInThisCountry.Rows(cityIndex)("city_code")
                            cityID = dtAllCityCodeInThisCountry.Rows(cityIndex)("City_ID")
                            Try

                                htlIndex = 0
                                htlIDPKey = dtAllCityCodeInThisCountry.Rows(cityIndex)("City_ID") * 10000
                                GenerateCityCacheRequest(dtAllCityCodeInThisCountry.Rows(cityIndex)("city_code"))


                                GTAServiceCall()


                                SaveToCacheStatusTable(countryCode, cityID, cityCode)


                                If String.IsNullOrEmpty(gtaServiceErrorMessage) AndAlso IsMandatoryTablesExists() Then
                                    If totalRecords > 0 Then
                                        noOfCachedCities += 1
                                    End If

                                    totalHotels = dsGtaCacheResponse.Tables("HotelInformation").Rows.Count
                                    For hotelRowIndex = 0 To dsGtaCacheResponse.Tables("HotelInformation").Rows.Count - 1
                                        HotelInformation_Id = dsGtaCacheResponse.Tables("HotelInformation").Rows(hotelRowIndex)("HotelInformation_Id")

                                        columnValues = HotelDetailsValues(HotelInformation_Id, htlIDPKey, cityID, cityCode)
                                        If Not String.IsNullOrEmpty(columnValues) Then
                                            If htlDetailsColumnValues.Length > 0 Then
                                                htlDetailsColumnValues.Append(",")
                                            End If
                                            htlDetailsColumnValues.Append(columnValues)
                                        End If
                                        If htlIndex = totalHotels - 1 AndAlso htlDetailsColumnValues.Length > 0 Then
                                            queryCollection.Add(String.Format("INSERT INTO atsgen_m_gtahoteldetails (Hotel_ID,City_Id,CityCode,HotelCode,HotelName,Copyright,TotalRooms,Latitude,Longitude,StarRating,Category,MapPageLink,Address1,Address2,Address3,Telephone,Fax,Website) VALUES {0};", htlDetailsColumnValues.ToString()))
                                        End If


                                        columnValues = HotelImagesValues(HotelInformation_Id, htlIDPKey)
                                        If Not String.IsNullOrEmpty(columnValues) Then
                                            If htlImagesColumnValues.Length > 0 Then
                                                htlImagesColumnValues.Append(",")
                                            End If
                                            htlImagesColumnValues.Append(columnValues)
                                        End If
                                        If htlIndex = totalHotels - 1 AndAlso htlImagesColumnValues.Length > 0 Then
                                            queryCollection.Add(String.Format("INSERT INTO atsgen_m_gtahotelimages (Hotel_ID,ItemCode,ImageLink,ThumbNail,ImageDetails,Height,Width) VALUES {0};", htlImagesColumnValues.ToString()))
                                        End If



                                        columnValues = HotelRoomTypeValues(HotelInformation_Id, htlIDPKey)
                                        If Not String.IsNullOrEmpty(columnValues) Then
                                            If htlRoomTypesColumnValues.Length > 0 Then
                                                htlRoomTypesColumnValues.Append(",")
                                            End If
                                            htlRoomTypesColumnValues.Append(columnValues)
                                        End If
                                        If htlIndex = totalHotels - 1 AndAlso htlRoomTypesColumnValues.Length > 0 Then
                                            queryCollection.Add(String.Format("INSERT INTO atsgen_m_gtaroomtypes (Hotel_ID,ItemCode,RoomTypes) VALUES {0};", htlRoomTypesColumnValues.ToString()))
                                        End If



                                        htlIndex += 1
                                        htlIDPKey += 1
                                    Next
                                    Try
                                        'Bulk insertion using Multiple row inserts
                                        SaveToDB(queryCollection)

                                        
                                    Catch mySqlError As MySqlException
                                        Console.WriteLine(String.Format("An Error occured for the City: {0}, ErrorDecription: {1}", cityCode, mySqlError.Message))
                                        Console.WriteLine("Press any key to continue caching or enter N to stop caching.")
                                        Dim key = Console.Read()
                                        If Convert.ToChar(key).ToString().ToUpper() = "N" Then
                                            Exit Sub
                                        End If
                                        LogError(mySqlError)
                                    End Try
                                End If
                                If cityIndex = dtAllCityCodeInThisCountry.Rows.Count - 1 Then
                                    'Save the Index size of tables to DB per country(Save occurs at each countrys last city). 
                                    Console.WriteLine(String.Format("Cached the Country {0}:{1}, Total/Cached: {2}/{3}", dtCountryCode.Rows(countryIndex)("Country_Code"), dtCountryCode.Rows(countryIndex)("Country_Name"), dtAllCityCodeInThisCountry.Rows.Count, noOfCachedCities))


                                    'Get the Index Size of tables & Save the Ndex sizes. 
                                    ndxSizeCollection = GetNdxSizes()
                                    insertQuery = String.Format("INSERT INTO gtacachendxsize (Country_Code,City_ID,CityCode,TotalNdxInMB,HtlDetailsNdxInMB,HtlImagesNdxInMB,HtlRoomTypesNdxInMB,HtlAreaDtlsNdxInMB,HtlFacilitiesNdxInMB,HtlLocationsNdxInMB,HtlReportsNdxInMB) VALUES ('{0}',{1},'{2}',{3},{4},{5},{6},{7},{8},{9},{10})", countryCode, cityID, cityCode, ndxSizeCollection.Sum(), ndxSizeCollection(0), ndxSizeCollection(1), ndxSizeCollection(2), ndxSizeCollection(3), ndxSizeCollection(4), ndxSizeCollection(5), ndxSizeCollection(6))
                                    Try
                                        SaveToDB()
                                    Catch mySqlError As MySqlException
                                        LogError(mySqlError)
                                    End Try
                                End If


                            Catch ex As Exception
                                LogError(ex)
                            End Try
                        ElseIf cacheSystemToRun = 0 Then
                            Console.WriteLine("Caching is disabled.")
                            Exit Sub
                        End If
                    End If                 
                Next

                
            Next
            Console.WriteLine("GTA caching completed.")


        Catch ex As Exception
            LogError(ex)
        End Try
    End Sub

Column values are generated like this for example:


Private Function HotelDetailsValues(ByVal HotelInformation_Id As Int32, ByVal htlIDPkey As Int32, ByVal cityID As Int32, ByVal cityCode As String) As String

        Dim filteredRowsItemDetail() As DataRow = Nothing
        Dim filteredRowsItem() As DataRow = Nothing
        Dim filteredRowsRoomTypes() As DataRow = Nothing
        Dim filteredRowsGeoCodes() As DataRow = Nothing
        Dim filteredRowsLinks() As DataRow = Nothing
        Dim filteredRowsAddressLines() As DataRow = Nothing
        Dim address3And4 As StringBuilder = New StringBuilder(String.Empty)
        Dim htlColumnValues As StringBuilder = New StringBuilder(String.Empty)
        Dim columnValues As StringBuilder = New StringBuilder(String.Empty)
        Try

            Dim HotelInformationRows() As DataRow = dsGtaCacheResponse.Tables("HotelInformation").Select(String.Format("HotelInformation_Id='{0}'", HotelInformation_Id))
            If HotelInformationRows.Count <> 0 Then
                htlColumnValues.Clear()
                With htlColumnValues
                    .Append(String.Format("{0},", htlIDPkey))
                    .Append(String.Format("{0},", cityID))
                    txtFieldValuePrprty = cityCode
                    .Append(String.Format("{0},", txtFieldValue))
                    filteredRowsItemDetail = dsGtaCacheResponse.Tables("ItemDetail").Select(String.Format("ItemDetail_Id={0}", HotelInformation_Id))
                    For Each ItemDetailRow As DataRow In filteredRowsItemDetail
                        filteredRowsItem = dsGtaCacheResponse.Tables("Item").Select(String.Format("ItemDetail_Id={0}", ItemDetailRow("ItemDetail_Id")))
                        If filteredRowsItem.Count > 0 Then
                            txtFieldValuePrprty = getColumnValue(filteredRowsItem(0), "Code")
                            htlCode = txtFieldValue
                            .Append(String.Format("{0},", txtFieldValue))
                            txtFieldValuePrprty = getColumnValue(filteredRowsItem(0), "Item_Text")
                            .Append(String.Format("{0},", txtFieldValue))
                        End If
                    Next
                    If filteredRowsItemDetail.Count > 0 Then
                        txtFieldValuePrprty = getColumnValue(filteredRowsItemDetail(0), "Copyright")
                        .Append(String.Format("{0},", txtFieldValue))
                    Else
                        txtFieldValuePrprty = String.Empty
                        .Append(String.Format("{0},", txtFieldValue))
                    End If
                    If IsMandatoryTablesExists("RoomTypes") Then
                        filteredRowsRoomTypes = dsGtaCacheResponse.Tables("RoomTypes").Select(String.Format("HotelInformation_Id={0}", HotelInformation_Id))
                        If filteredRowsRoomTypes.Count > 0 Then
                            txtFieldValuePrprty = getColumnValue(filteredRowsRoomTypes(0), "RoomCount")
                            .Append(String.Format("{0},", txtFieldValue))
                        Else
                            txtFieldValuePrprty = String.Empty
                            .Append(String.Format("{0},", txtFieldValue))
                        End If
                    Else
                        txtFieldValuePrprty = String.Empty
                        .Append(String.Format("{0},", txtFieldValue))
                    End If
                    If IsMandatoryTablesExists("GeoCodes") Then
                        filteredRowsGeoCodes = dsGtaCacheResponse.Tables("GeoCodes").Select(String.Format("HotelInformation_Id={0}", HotelInformation_Id))
                        If filteredRowsGeoCodes.Count > 0 Then
                            txtFieldValuePrprty = getColumnValue(filteredRowsGeoCodes(0), "Latitude")
                            .Append(String.Format("{0},", txtFieldValue))
                            txtFieldValuePrprty = getColumnValue(filteredRowsGeoCodes(0), "Longitude")
                            .Append(String.Format("{0},", txtFieldValue))
                        Else
                            txtFieldValuePrprty = String.Empty
                            .Append(String.Format("{0},", txtFieldValue))
                            .Append(String.Format("{0},", txtFieldValue))
                        End If
                    Else
                        txtFieldValuePrprty = String.Empty
                        .Append(String.Format("{0},", txtFieldValue))
                        .Append(String.Format("{0},", txtFieldValue))
                    End If
                    
                    txtFieldValuePrprty = getColumnValue(HotelInformationRows(0), "StarRating")
                    .Append(String.Format("{0},", txtFieldValue))
                    txtFieldValuePrprty = getColumnValue(HotelInformationRows(0), "Category")
                    .Append(String.Format("{0},", txtFieldValue))
                    If IsMandatoryTablesExists("Links") Then
                        filteredRowsLinks = dsGtaCacheResponse.Tables("Links").Select(String.Format("HotelInformation_Id={0}", HotelInformation_Id))
                        If filteredRowsLinks.Count > 0 Then
                            For Each LinksRow As DataRow In filteredRowsLinks
                                If IsMandatoryTablesExists("MapLinks") Then
                                    filteredRowsLinks = dsGtaCacheResponse.Tables("MapLinks").Select(String.Format("Links_Id={0}", LinksRow("Links_Id")))
                                    If filteredRowsLinks.Count > 0 Then
                                        txtFieldValuePrprty = getColumnValue(filteredRowsLinks(0), "MapPageLink")
                                        .Append(String.Format("{0},", txtFieldValue))
                                    Else
                                        txtFieldValuePrprty = String.Empty
                                        .Append(String.Format("{0},", txtFieldValue))
                                    End If
                                Else
                                    txtFieldValuePrprty = String.Empty
                                    .Append(String.Format("{0},", txtFieldValue))
                                End If

                            Next
                        Else
                            txtFieldValuePrprty = String.Empty
                            .Append(String.Format("{0},", txtFieldValue))
                        End If
                    Else
                        txtFieldValuePrprty = String.Empty
                        .Append(String.Format("{0},", txtFieldValue))
                    End If

                    If IsMandatoryTablesExists("AddressLines") Then
                        filteredRowsAddressLines = dsGtaCacheResponse.Tables("AddressLines").Select(String.Format("HotelInformation_Id={0}", HotelInformation_Id))
                        If filteredRowsAddressLines.Count > 0 Then
                            txtFieldValuePrprty = getColumnValue(filteredRowsAddressLines(0), "AddressLine1")
                            .Append(String.Format("{0},", txtFieldValue))
                            txtFieldValuePrprty = getColumnValue(filteredRowsAddressLines(0), "AddressLine2")
                            .Append(String.Format("{0},", txtFieldValue))
                            address3And4.Clear()
                            address3And4.Append(getColumnValue(filteredRowsAddressLines(0), "AddressLine3"))
                            If getColumnValue(filteredRowsAddressLines(0), "AddressLine4").Length > 0 Then
                                address3And4.Append(",")
                                address3And4.Append(getColumnValue(filteredRowsAddressLines(0), "AddressLine4"))
                            End If
                            txtFieldValuePrprty = address3And4.ToString()
                            .Append(String.Format("{0},", txtFieldValue))
                            txtFieldValuePrprty = getColumnValue(filteredRowsAddressLines(0), "Telephone")
                            .Append(String.Format("{0},", txtFieldValue))
                            txtFieldValuePrprty = getColumnValue(filteredRowsAddressLines(0), "Fax")
                            .Append(String.Format("{0},", txtFieldValue))
                            txtFieldValuePrprty = getColumnValue(filteredRowsAddressLines(0), "WebSite")
                            .Append(String.Format("{0}", txtFieldValue))
                        Else
                            txtFieldValuePrprty = String.Empty
                            .Append(String.Format("{0},", txtFieldValue))
                            .Append(String.Format("{0},", txtFieldValue))
                            .Append(String.Format("{0},", txtFieldValue))
                            .Append(String.Format("{0},", txtFieldValue))
                            .Append(String.Format("{0},", txtFieldValue))
                            .Append(String.Format("{0}", txtFieldValue))
                        End If
                    Else
                        txtFieldValuePrprty = String.Empty
                        .Append(String.Format("{0},", txtFieldValue))
                        .Append(String.Format("{0},", txtFieldValue))
                        .Append(String.Format("{0},", txtFieldValue))
                        .Append(String.Format("{0},", txtFieldValue))
                        .Append(String.Format("{0},", txtFieldValue))
                        .Append(String.Format("{0}", txtFieldValue))
                    End If

                End With
                columnValues.Append(String.Format("({0})", htlColumnValues.ToString()))
               
            End If
        Catch ex As Exception
            LogError(ex)
        End Try
        Return columnValues.ToString()
    End Function

The Pros of LOAD DATA is : It's fast & Cons: 1.Could not verify the duplicates. Always showing more records than the rows in the csv file. 2. Need to save all the data of all cities to file. 3. Need permission to execute this method.

About the multiple row insert there are no such cons only thing is that it takes more time than LOAD DATA but this time is very negligible.

Another interesting thing occurred while doing this work is that there was about 9 crores of records found in all these 7 tables. Out of this 9 crores, most were duplicates. This occurred due to the missing of proper unique constraints, when caching service called for the same city it got saved each time. This caused duplicates.So I truncated all these 7 tables & added proper unique keys & indexes into these tables, then called the services for all the cities in each country, the number of records came to 12 lakhs. Also the size of the tables when it was 9 crores was 20 GB came to 700 MB in the production database. I didn't expected the records will be reduced to this much, I thought the opposite, that the records will be much more than 9 crores since there are large number of cities yet to cache, even though the existing 9 crores contains duplicates. It took around 5 hours to cache the all 11500 cities, out of which only 6500 cities has got the cache response. The average time to save per city was about 1.5-3 secs.

 The full code can be found here

0 comments: