Loose list of possible optimizations in the xml module

This is a loose list of optimization items found by the people working on the xml project. The list grows whenever anyone finds more items. The status shows what has happened to each item. When an item is implemented, its status changes and the build version is added.

The list also includes all items that have to be done in modules such as sw, sc and graphics, for example to optimize the save and load performance of the new file format.

If you find further places where the code could be changed, please send a mail to me (Sascha.Ballach@germany.sun.com) or add a task in IssueZilla.

Here anyone can see the current state of our optimizations and the current problems.

The complete overview is available as an OpenOffice.org 6.0 file.

Status | Module | Type | Description

D | ALL | Runtime

At the moment, the getPropertyStates method is implemented as a loop that calls getPropertyState for every property, so each property is looked up again and again. This costs a lot of string comparisons. A better implementation should exploit the fact that getPropertyStates receives the property names in alphabetical order.
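A minimal sketch of the merge-style lookup the item asks for. It assumes that both the property map and the requested names are sorted in the same (byte-wise) order. PropertyEntry, PropertyState and this free getPropertyStates function are hypothetical stand-ins, not the UNO types. The loop avoids one search per requested name. Instead, a single cursor walks the map once, so the whole request runs in linear time.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not the real UNO types.
enum class PropertyState { Direct, Default, Unknown };

struct PropertyEntry {
    std::string name;      // entries are kept sorted by name
    PropertyState state;
};

// Answers a batch request in one pass over both sorted sequences instead of
// searching the map once per requested name.
// Precondition: `map` and `names` are both sorted in ascending order.
std::vector<PropertyState> getPropertyStates(const std::vector<PropertyEntry>& map,
                                             const std::vector<std::string>& names)
{
    assert(std::is_sorted(names.begin(), names.end()));
    std::vector<PropertyState> result;
    result.reserve(names.size());
    std::size_t i = 0;
    for (const std::string& name : names) {
        // The cursor never moves backwards, because the names are sorted.
        while (i < map.size() && map[i].name < name)
            ++i;
        if (i < map.size() && map[i].name == name)
            result.push_back(map[i].state);
        else
            result.push_back(PropertyState::Unknown);  // name is not in the map
    }
    return result;
}
```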

P | ALL | Runtime

We have to implement the XMultiPropertySet interface and use it instead of the XPropertySet interface to get and set the style properties (already done in some modules).

P | ALL | Code/Runtime

Don't export the XML tokens as character pointers from a separate object file. This increases the size of every DLL that uses any of these XML tokens. It can be changed to GetToken(nId) and IsToken(nId).
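The following sketch shows the direction the item suggests, assuming tokens are identified by an enum. The strings live in one private table, and other libraries only call two functions. Without this, every library would import one exported character-pointer symbol per token. The enum values, the table contents and the GetToken/IsToken signatures are hypothetical illustrations.

```cpp
#include <string>

// Hypothetical token ids; a real list would be much longer.
enum XMLTokenId { XML_TOKEN_STYLE, XML_TOKEN_NAME, XML_TOKEN_FAMILY, XML_TOKEN_COUNT };

namespace {
// One private table instead of one exported symbol per token.
const char* const aTokenTable[] = { "style", "name", "family" };
static_assert(sizeof(aTokenTable) / sizeof(aTokenTable[0]) == XML_TOKEN_COUNT,
              "token table and enum must have the same length");
}

// Returns the token string for an id.
const char* GetToken(XMLTokenId nId) { return aTokenTable[nId]; }

// Checks whether a string equals the token with the given id.
bool IsToken(const std::string& rString, XMLTokenId nId) { return rString == aTokenTable[nId]; }
```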

D | SC | Runtime

Remember the style names of the cells so that they do not have to be retrieved again (saving a spreadsheet document).
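One way to remember the style names is a small memo cache keyed by whatever identifies a cell's formatting. In this sketch that key is a hypothetical integer format id, and the expensive lookup is passed in as a callback. Both are assumptions, not the actual sc code.

```cpp
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Memoizes the style name for each distinct cell format during export.
// The integer format id and the lookup callback are hypothetical placeholders.
class StyleNameCache {
public:
    using Lookup = std::function<std::string(int)>;
    explicit StyleNameCache(Lookup lookup) : m_lookup(std::move(lookup)) {}

    const std::string& get(int formatId) {
        auto it = m_cache.find(formatId);
        if (it == m_cache.end()) {
            ++m_misses;  // only the first cell with this format pays for the lookup
            it = m_cache.emplace(formatId, m_lookup(formatId)).first;
        }
        return it->second;
    }
    int misses() const { return m_misses; }

private:
    Lookup m_lookup;
    std::unordered_map<int, std::string> m_cache;
    int m_misses = 0;
};
```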

D | SC | Runtime

Collect all cells and their styles, merge the cells into ranges, and set the styles on these ranges (loading a spreadsheet document).
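Below is a simplified sketch of the merge step. It only merges vertically adjacent cells in the same column. A real implementation would probably also merge across columns into rectangles. CellStyle, StyledRange and mergeToRanges are hypothetical names, and the input is assumed to be sorted by column and then by row.

```cpp
#include <string>
#include <vector>

// Hypothetical types: one imported cell and one merged vertical range.
struct CellStyle { int col; int row; std::string style; };
struct StyledRange { int col; int firstRow; int lastRow; std::string style; };

// Merges vertically adjacent cells with the same style into ranges, so the
// style can be set once per range instead of once per cell.
// Assumes `cells` is sorted by column, then by row.
std::vector<StyledRange> mergeToRanges(const std::vector<CellStyle>& cells)
{
    std::vector<StyledRange> ranges;
    for (const CellStyle& c : cells) {
        if (!ranges.empty()) {
            StyledRange& last = ranges.back();
            if (last.col == c.col && last.lastRow + 1 == c.row && last.style == c.style) {
                last.lastRow = c.row;  // extend the current range downwards
                continue;
            }
        }
        ranges.push_back({c.col, c.row, c.row, c.style});
    }
    return ranges;
}
```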

D | SC | Runtime

Remember all API objects that are used again and again, so that they do not have to be created so often.

D | SC | Runtime

Find a better way to determine whether a cell contains an annotation.

D | SC | Runtime

In every document whose language is not English, the method getFormula() switches the language to English to get the English formula. This is not necessary.

I | ALL | Code

Some attributes still contain the old virtual importXML and exportXML methods. These can be removed, because the XML attribute export has changed.

I | ALL | Runtime

Use XMultiPropertySet in the API for PageDescriptors.

I | XML | Runtime

The package component has two major performance problems:

  1. When saving, all zip file data is written to a buffer in memory, which is then passed to the UCB to be written to disk. With multiple OLE objects this could be crippling. The data should instead be written to disk through a buffer.

  2. When loading, whole streams are read into memory at once and then read from memory. Instead, they should be uncompressed into temporary files and read from there.

P | SW | Runtime

The XML load/save creates a DrawModel every time (SwDoc::MakeDrawModel). This must change so that the model is only created when it is needed.
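The change amounts to lazy creation. In this sketch, LazyDocument and DrawModel are hypothetical stand-ins for SwDoc and its drawing model, not the sw classes. Load/save code that never touches drawing objects never pays for building the model, and hasDrawModel lets an exporter skip drawing objects entirely when no model exists.

```cpp
#include <memory>

// Hypothetical stand-in for the (expensive) drawing layer model.
struct DrawModel {};

// Creates the draw model on first use instead of on every XML load/save.
class LazyDocument {
public:
    // Returns the model, constructing it the first time it is requested.
    DrawModel& getOrCreateDrawModel() {
        if (!m_drawModel)
            m_drawModel = std::make_unique<DrawModel>();
        return *m_drawModel;
    }
    // Lets callers skip drawing objects when no model has been created.
    bool hasDrawModel() const { return m_drawModel != nullptr; }

private:
    std::unique_ptr<DrawModel> m_drawModel;
};
```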

D | SC | Runtime

When edit cells are saved and loaded, the method GetTextForwarder() does too much work too often. This is not necessary. Every new API object (XTextRange, XTextCursor and so on) created on the same cell creates a new object and sets all its properties again. The method should remember the created object and return it every time.
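A sketch of the caching the item asks for, with hypothetical CellTextSource and TextForwarder types. The forwarder is built on the first request, and the same instance is handed out afterwards. Dropping the cached object when the cell text changes is an assumption about how stale data would be avoided.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical forwarder giving access to a cell's text; assumed to be
// expensive to build because all of its properties have to be set up.
struct TextForwarder {
    explicit TextForwarder(std::string t) : text(std::move(t)) {}
    std::string text;
};

// Builds the forwarder once and returns the same object to every API object
// (text range, cursor, ...) created on the cell, until the cell changes.
class CellTextSource {
public:
    explicit CellTextSource(std::string cellText) : m_cellText(std::move(cellText)) {}

    TextForwarder& getTextForwarder() {
        if (!m_forwarder) {
            m_forwarder = std::make_unique<TextForwarder>(m_cellText);
            ++m_created;
        }
        return *m_forwarder;
    }
    // Discards the cached forwarder when the cell content is replaced.
    void setCellText(std::string text) {
        m_cellText = std::move(text);
        m_forwarder.reset();
    }
    int createdCount() const { return m_created; }

private:
    std::string m_cellText;
    std::unique_ptr<TextForwarder> m_forwarder;
    int m_created = 0;
};
```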

C | SC | Runtime

When a document is saved, the method GetOutputStringImpl asks for the number format of every cell. In documents with several languages this takes very long, because the language is changed every time. The calling method knows the number format, but the API offers no way to pass it to the method.

This optimization was replaced by an optimization of the ChangeIntl method and the use of an English NumberFormatter to determine whether a string cell uses a NumberFormat.

D | SC | Runtime

Extend the API to get all formatted ranges with one call. That means making it possible to get all ranges with the same format as a single XCellRanges object, so that the autostyle only has to be requested once.

The same should be possible when loading a document. Here we need a way to create an XCellRanges object, give it all ranges with the same format, and then set the format on all these ranges at once.

Finally, these extensions have to be used.
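On the export side, the effect of such an API can be approximated by grouping ranges by format. Each autostyle is then looked up once, together with all the ranges that use it. CellRange, FormattedRange and groupByFormat are hypothetical names. In the extended API, each group would correspond to one XCellRanges object.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical cell range and formatted-range record.
struct CellRange { int startCol, startRow, endCol, endRow; };
struct FormattedRange { CellRange range; std::string format; };

// Groups all ranges by their format, so each format (autostyle) is handled
// once together with the complete list of ranges that use it.
std::map<std::string, std::vector<CellRange>>
groupByFormat(const std::vector<FormattedRange>& ranges)
{
    std::map<std::string, std::vector<CellRange>> groups;
    for (const FormattedRange& r : ranges)
        groups[r.format].push_back(r.range);
    return groups;
}
```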

I | SC | Runtime

Use an English NumberFormatter to reduce the number of calls to the ChangeIntl method.

I | SC | Runtime

We also have to optimize the GetTextForwarder method for saving the PageStyles.

I | XML | Runtime

The GetKeyByAttrName method should be optimized so that each string is created only once and then cached. This increases performance, because the same strings are used very often.
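A sketch of such a cache. It assumes the method splits a qualified name like "style:name" into a prefix and a local name. AttrNameCache and AttrKey are hypothetical. The real method also maps the prefix to a namespace key, which is left out here. Each distinct qualified name is parsed, and its strings are allocated, only once. Later calls return the cached entry.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Result of splitting a qualified attribute name such as "style:name".
struct AttrKey {
    std::string prefix;     // empty if the name has no prefix
    std::string localName;
};

// Caches the split result per qualified name; a cache hit creates no new strings.
class AttrNameCache {
public:
    const AttrKey& getKeyByAttrName(const std::string& qName) {
        auto it = m_cache.find(qName);
        if (it != m_cache.end())
            return it->second;
        AttrKey key;
        const std::string::size_type colon = qName.find(':');
        if (colon == std::string::npos) {
            key.localName = qName;
        } else {
            key.prefix = qName.substr(0, colon);
            key.localName = qName.substr(colon + 1);
        }
        return m_cache.emplace(qName, std::move(key)).first->second;
    }
    std::size_t size() const { return m_cache.size(); }

private:
    std::unordered_map<std::string, AttrKey> m_cache;
};
```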

P | XML | Runtime

Creating the PropertySetMappers takes a long time, mainly due to string creation. This could be sped up by storing the string length in the maps.

dvo: As it turns out, the string length doesn't help a whole lot. Copying the string from sal_Char* to sal_Unicode* and the corresponding malloc seem to take most of the time. So I'm afraid there's nothing we can do about this.

I | XML | Runtime

When loading or saving small documents, reading and setting the document settings takes a long time. In Writer, reading the document settings takes about as long as importing 200 paragraphs. This should be investigated.

I | XML | Runtime

The NumberFormatter takes quite long to find the first data style for a particular language. Thus, exporting the first text field that uses a number format takes about as long as exporting 25 paragraphs.

dvo: I can't think of any way to avoid this, though.

P | XML | Runtime

The NumberFormatsSupplier is always obtained from the document, even when it is not used. That takes quite some time in small documents that don't use it.

dvo: Creating the NumberFormatsSupplier on demand fixes the problem. However, this requires an incompatible build, so it is not checked in yet.

I | XML | Runtime

Resetting existing styles to default takes a long time during import. Currently this uses getPropertySet and then sets the properties to default one by one. The XMultiPropertyStates interface should be used instead.

P | XML/Writer | Runtime

All styles of a style family are obtained from the document through an XIndexAccess, which is then used to iterate over all styles. The current Writer implementation of this is really, really slow (probably true for the other applications as well, but I haven't checked).

dvo: I have tried replacing the XIndexAccess with an XNameAccess. This speeds up obtaining all styles from 280 ms to 10 ms, which is almost a factor of 30. However, I'm not sure whether other applications have the same performance characteristics. Also, a properly implemented getByIndex should resolve this issue, too. Therefore, I will not commit this 'fix'.

dvo: I submitted a performance bug about this (#90251#).
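The sketch below shows why a badly implemented getByIndex hurts when the underlying container has no random access. Purely for illustration, it assumes that styles are kept in a linked list. If getByIndex walks from the front on every call, iterating over n styles costs n(n-1)/2 steps. Remembering the last position makes a sequential pass linear. StyleList and both getByIndex variants are hypothetical. How the Writer implementation actually stores its styles is not known from this item.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical style container backed by a linked list. Both accessors
// require index < getCount(); steps() counts list traversal steps.
class StyleList {
public:
    explicit StyleList(std::list<std::string> names) : m_names(std::move(names)) {}

    std::size_t getCount() const { return m_names.size(); }

    // Naive: walks from the front on every call (quadratic for a full pass).
    const std::string& getByIndexNaive(std::size_t index) {
        auto it = m_names.begin();
        for (std::size_t i = 0; i < index; ++i) {
            ++it;
            ++m_steps;
        }
        return *it;
    }

    // Remembers the last position, so sequential access costs one step each.
    const std::string& getByIndexCached(std::size_t index) {
        if (!m_cursorValid || index < m_cursorIndex) {
            m_cursor = m_names.begin();
            m_cursorIndex = 0;
            m_cursorValid = true;
        }
        while (m_cursorIndex < index) {
            ++m_cursor;
            ++m_cursorIndex;
            ++m_steps;
        }
        return *m_cursor;
    }

    std::size_t steps() const { return m_steps; }
    void resetSteps() { m_steps = 0; }

private:
    std::list<std::string> m_names;
    std::list<std::string>::iterator m_cursor;
    std::size_t m_cursorIndex = 0;
    bool m_cursorValid = false;
    std::size_t m_steps = 0;
};
```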

D | Writer | Runtime

The conversion between UI names and internationalized names is slow.

I | Writer | Runtime

The IsPhysical property of styles is much slower than most other properties.

I | XML/Writer | Runtime

In exportTextContentEnumeration, determining which (relevant) service the current XTextContent implements is quite slow. For a simple document with 14 text content objects (each of which runs through the method twice), determining the service takes as much time in total as exporting 80 simple paragraphs.

dvo: A more radical approach to this problem would be to give XTextContent a required property that returns this type as a string or an integer key. However, this would require a modified TextContent service.

dvo: Most of the time seems to be spent asking a shape whether it happens to support some type of service. Given the way the interfaces are, there aren't many ways to speed up the shape implementation. Simply asking the shape early solves this problem nicely. For the 'einfach' test document, checking for shapes early reduces the time for the entire exportTextContentEnumeration by 10%.

dvo: I submitted performance bug #90240# about this.

I | Writer | Runtime

The implementation of the 'FileLink' property of text sections is fairly slow.

D | XML | Runtime

The FilterPropertyInfo_mpl::AddProperty method is quite slow because it uses an (albeit hacked-up) version of insertion sort.

dvo: Using STL sort during GetApiNames() speeds up the operation by about 30%.
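The change dvo describes amounts to replacing incremental insertion sort with "append now, sort once". Here is a minimal sketch with a hypothetical PropertyNameCollector class, not the real xmloff class. Appending costs O(1) per name, and the single std::sort costs O(n log n). Keeping the list sorted on every insertion costs O(n^2) overall.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collector of API property names: appends unsorted and sorts
// once, on demand, when the names are requested.
class PropertyNameCollector {
public:
    void addProperty(std::string name) {
        m_names.push_back(std::move(name));
        m_sorted = false;
    }
    // Sorts only if names were added since the last call.
    const std::vector<std::string>& getApiNames() {
        if (!m_sorted) {
            std::sort(m_names.begin(), m_names.end());
            m_sorted = true;
        }
        return m_names;
    }

private:
    std::vector<std::string> m_names;
    bool m_sorted = true;  // an empty list is trivially sorted
};
```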

C | XML | Runtime

Importing a single control is about as slow as importing 150 simple paragraphs.

dvo: The time is spent loading several DLLs required for the controls. This is probably unavoidable. The second control is quite fast.

Last change: 07/25/2001

Legend:

Status symbols:

I = Idea
D = Done
P = In progress
C = Cancelled