Oracle "Data Warehouse Concepts" (28 pages).ppt
Xu Xin, Presales Consultant, Oracle (China) Co., Ltd.

Data Warehouse Concepts

What is ...
  - Data Warehouse / Data Mart
  - Decision Support System
  - Online Analytical Processing (OLAP) / ROLAP / MOLAP
  - Metadata
  - Measure / Dimension
  - Star Schema / Snowflake Schema
  - Drill Down / Drill Up
  - Table Rotation
  - Data Mining

Main functions of a data warehouse
  - Query / Report
  - Drill Up / Drill Down
  - Compare
  - Exception
  - Forecast, What-if
  - Data Mining

Data warehouse implementation methodology

Factors to consider when building a data warehouse
  - Scalability
  - Flexibility
  - Integration
  - Reliability

Advice from data warehouse experts
  - Business staff must participate actively
  - Validate requirements through prototyping
  - Define the scope of the data warehouse; do not try to warehouse all data
  - Choose the right tool for each requirement
  - Control risk
  - Draw on the experience of external consultants
  - Focus on integrating the different systems

Example: building a data warehouse
  Use a Building Estate OLTP database as an example to illustrate the concepts and to show how to build a successful data warehouse for checking and forecasting rental rates and sales amounts in Hong Kong.

Step 1: Define the problem scope of the data warehouse
  - List daily housing sales in Hong Kong for April
  - Find residential projects with sales above 4 million
  - Compare last month's sales in Whampoo and Kornhill
  - Find the top 3 districts by number of homes sold
  - Cumulative sales volume up to the current month
  - Use charts to show the best sales patterns
  - Time-series analysis

Define the problem scope of the data warehouse
  Identify the business requirements and user requirements:
  - How often users run queries
  - How many years of data the system retains
  - From which perspectives and at which levels users mainly want to analyze the data
  - Which systems are the data sources

Step 2: Choose a suitable software and hardware platform
  - Reliable vendor
  - Data modeling and management tools
  - Ease of use
  - Openness
  - Centralized management
  - Performance
  - Parallel processing

Criteria for choosing a database platform
  Top 3 considerations (chart: Data warehouse considerations):
  - Ease of use: 92.4%
  - Centralized management: 65.2%
  - Reliable vendor: 65.2%
  (Source: Data Warehouse Institute - February 96)

MOLAP or ROLAP?
  Functional differences between ROLAP and MOLAP
  Diagram labels: Transaction Systems, Decision Support Systems, Strategic, Tactical, MDB, RDBMS, Data Cache, linkage

Step 3: Create new entities as needed
  Entities in the OLTP database:
  - Apartment: #Code_no, No_of_transaction, Constructor_ID, Developer_ID, Buildingdate, Purchasedate, Purchaseprice, Address, Area
  - Owner: #Code_no, #Transaction_no, Name/Company, HKID, ContactPhone#, ContactAddress, PurchaseDate, PurchasePrice
  - Occupant: #Code_no, #Flat, #Transaction_no, Name, HKID, Occupy_type (P, R), ContactPhone#, ContactAddress, Date, Price
  - Constructor: Contractor_ID, CompanyName, Address, ContactPhone#
  - FlatDetails: #Code_no, #Flat, No_of_trans, Type, Floor, Area (Building), Area (Actual)
  - Developer: Developer_ID, CompanyName, Address, ContactPhone#
  New entities:
  - Time: Day, Month, Quarter, Year
  - Geographic Location: Territory, District, Region, Building/Estate
  - Housing Types: Type, Size, Area

Step 4: Identify the dimension tables and remove unnecessary tables

Step 5: Build the hierarchies
  Date: 1-Jan-94, 13-Jun-95, 12-Jan-96, 12-Apr-96, 15-Apr-96, 20-Oct-96, 20-Oct-96, 12-Dec-96, 1-Jan-97, 31-Mar-97, 15-Apr-97, ...
  Time Hierarchy: Time > Year > Quarter > Month > Day

Step 6: Determine the attributes
  - Type, Size, Area, Class: attributes of Housing Type
  - Housing Type dimension lookup table
  Diagram labels: Housing Type, Occupant, Attributes

Step 7: Build the fact table and choose the appropriate granularity
  Sales Fact Table: Time, Location, Type, Area, OccupantName, PurchasePrice, Rent, ...

Step 8: Build the data warehouse model
  Building Estate OLTP Environment
  - Apartment, Owner, Occupant, Constructor, FlatDetails, Developer (attributes as in Step 3; FlatDetails lists HousingType in place of Type)
  Transform
  Building Estate Data Warehouse OLAP Environment
  - Sales Fact Table: Time, Location, Type, Area, OccupantName, PurchasePrice, Rent, ...
  - Time: Day, Month, Quarter, Year
  - Geographic Location: Territory, District, Region, Building/Estate
  - Housing Types: Type, Size, Area

Step 9: Optimize the data warehouse model
  Diagram labels: Star schema, Snowflake schema
  - Sales Fact Table: Time, Location, Type, Area, OccupantName, PurchasePrice, Rent, ...
  - Time: Day, Month, Quarter, Year
  - Geographic Location: Territory, District, Region, Building/Estate
  - Housing Types: Type, Size, Area

Principles for optimizing the data warehouse design
  - Avoid aggregating data on the fly (build summary tables)
  - Reduce table joins (no more than 3-5)
  - Use ID codes as keys
  - Reduce I/O contention
  - Use partitioning to improve performance and manageability

Algorithm for estimating data warehouse capacity
  Estimated size of database = 98 * 96 * 20 * 1000 * 0.75 = 141.12 Mb

Step 10: Extract data from the operational systems into the data warehouse
  Requirements for data extraction:
  - Can access a variety of data sources
  - Can meet the timing requirements
  - Can meet the data transformation requirements
  - Can detect data changes in the source systems

Step 11: Develop the front-end applications

Step 12: Manage the data warehouse
  - Security management
  - Backup and recovery
  - High availability
  - Data timeliness
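
The time hierarchy of Step 5 (Year > Quarter > Month > Day) can be derived directly from dates written the way the slide lists them. The following is a minimal sketch, not the deck's own method; the function name `time_hierarchy` and the dictionary keys are assumptions made for illustration.

```python
from datetime import datetime


def time_hierarchy(date_text):
    """Split a date such as '12-Apr-96' into the Year > Quarter > Month > Day levels.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    parsed = datetime.strptime(date_text, "%d-%b-%y")
    return {
        "year": parsed.year,
        "quarter": (parsed.month - 1) // 3 + 1,  # months 1-3 -> Q1, 4-6 -> Q2, ...
        "month": parsed.month,
        "day": parsed.day,
    }


if __name__ == "__main__":
    # Sample dates taken from the Step 5 slide.
    for text in ["1-Jan-94", "12-Apr-96", "20-Oct-96"]:
        print(text, time_hierarchy(text))
```

Storing each level as its own column in the Time dimension is what lets a query drill up from Day to Month to Quarter without recomputing the calendar each time.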
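
The star schema of Steps 7 to 9 can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions rather than the model from the deck: the table names, column names, integer surrogate keys, and every sample row are hypothetical, chosen only to mirror the Sales Fact Table and its Time, Geographic Location, and Housing Types dimensions. The query answers a Step 1 style question (number and value of sales per district in a given month).

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and three dimension tables (all names are assumptions)."""
    conn.executescript("""
        CREATE TABLE time_dim (
            time_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
            quarter INTEGER, year INTEGER);
        CREATE TABLE location_dim (
            location_id INTEGER PRIMARY KEY, territory TEXT, district TEXT,
            region TEXT, estate TEXT);
        CREATE TABLE housing_type_dim (
            type_id INTEGER PRIMARY KEY, type TEXT, size TEXT, area REAL);
        CREATE TABLE sales_fact (
            time_id INTEGER REFERENCES time_dim(time_id),
            location_id INTEGER REFERENCES location_dim(location_id),
            type_id INTEGER REFERENCES housing_type_dim(type_id),
            occupant_name TEXT, purchase_price REAL, rent REAL);
    """)


def load_sample_rows(conn):
    """Insert a few invented rows; none of these values come from the slides."""
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
        (1, 12, 4, 2, 1996), (2, 15, 4, 2, 1996), (3, 20, 10, 4, 1996)])
    conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?, ?, ?)", [
        (1, "T1", "Whampoo", "R1", "Estate A"),
        (2, "T1", "Kornhill", "R2", "Estate B")])
    conn.execute("INSERT INTO housing_type_dim VALUES (1, 'Residential', 'Medium', 60.0)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, "Occupant A", 4500000.0, None),
        (2, 1, 1, "Occupant B", 3200000.0, None),
        (2, 2, 1, "Occupant C", 5100000.0, None),
        (3, 2, 1, "Occupant D", 2000000.0, None)])


def sales_by_district(conn, year, month):
    """Return (district, number_of_sales, total_price) rows, busiest district first."""
    return conn.execute("""
        SELECT l.district, COUNT(*), SUM(f.purchase_price)
        FROM sales_fact AS f
        JOIN time_dim AS t ON t.time_id = f.time_id
        JOIN location_dim AS l ON l.location_id = f.location_id
        WHERE t.year = ? AND t.month = ?
        GROUP BY l.district
        ORDER BY COUNT(*) DESC, l.district
    """, (year, month)).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample_rows(connection)
    print(sales_by_district(connection, 1996, 4))
```

The query touches only two joins, which stays within the "no more than 3-5 joins" guideline from the optimization principles, and the integer keys follow the advice to use ID codes as keys.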
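
The capacity estimate on the sizing slide is a plain product of five factors. The slide does not say what each factor stands for, so the sketch below treats them as opaque multipliers and only reproduces the arithmetic; the function name `estimate_size` and the choice to express the result in units of one million are assumptions.

```python
import math


def estimate_size(factors, unit=1_000_000):
    """Multiply the sizing factors and express the product in multiples of `unit`.

    Hypothetical helper: the slide gives only the numbers, not their meaning.
    """
    return math.prod(factors) / unit


if __name__ == "__main__":
    # Factors exactly as they appear on the slide.
    slide_factors = [98, 96, 20, 1000, 0.75]
    print(f"Estimated size: {estimate_size(slide_factors):.2f}")
```

Keeping the factors in a list makes it easy to rerun the estimate when one input changes, for example when the retention period from Step 1 is revised.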