Posted on April 29, 2010 · Posted in Siyavula

For the Siyavula project we partnered with Connexions as our online content platform (some of the reasons can be found here). As part of this partnership we try to do more than use Connexions as a repository: we also support its continuous development by sourcing more content and helping to enhance the technical offering as much as possible. There are always many potential avenues for development in any software project, and we’ve been trying to help on those that let teachers in South Africa be more effective and efficient when using the site.

One area that has been flagged by a number of people, not just our teachers at workshops, is that the authoring side of the site can be very slow at times. Some people say this is simply because the site is built on Plone, but that isn’t fair. So, to support our teachers, the broader Connexions community and the Connexions Consortium, I commissioned Upfront Systems to do a performance analysis of the authoring side of the site.

Before we get into details, the bottom line is that Upfront Systems have shown that there are massive potential performance improvements (in some cases 5 times faster) for Connexions on the authoring side, and have analysed what it will take to implement them. I estimate that the total cost of implementing the specification is $7500.

This should make it a lot easier to raise the money needed to implement the fixes: we know where the problem lies, we have a detailed specification of how to fix it, and we have a proper analysis of the expected improvements, so funders know what their return on investment will be.

The Gory Details As Reported on the Rhaptos Mailing List

Roché Compaan
Mon Apr 26 08:36:28 CDT 2010

I just checked in our analysis of the performance problems associated
with authoring in Connexions:
https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/report.odt
The full text of the report is available below for discussion here on
the mailing list.

The draft specification to develop the fixes to the problems identified
is available here:
https://software.cnx.rice.edu/svn/devsets/performance-authoring/specification/cnx-performance-specification-2010-04-24.odt

Comments/edits on the specification are welcome.

Connexions Performance Analysis
===============================

Hypothesis
----------
A large number of objects indexed in the portal_catalog makes rhaptos
slow. Many of those objects might not need indexing, because postgresql
is used for searching published modules. The content types Module, CNXML
Document and PublishedContentPointer need not be indexed in the
portal_catalog, since users mostly work with these objects in their
personal workspace and group workspaces and there are no site wide
searches for these objects.

Many of the standard plone indexes can be done away with, since they are
never referenced, and the parts of plone that use them are not used by
Rhaptos in any case. New objects are however still added to these
indexes and this wastes time.

We also suspected that the MyCNX page can do with some optimisation.

Methodology
-----------
We developed a funkload test that creates a module. We used this to run
a benchmark against rhaptos, using cycles with two, five and ten
concurrent users.

The aim was to see how many modules can be created in a 5 minute period,
with and without disabling indexing of modules and running with and
without a full catalog.

We used funkload’s authentication server to provide different login
details for each concurrent user, so that the results represent a
real-life scenario where several people are creating modules at the same
time.
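The shape of such a benchmark cycle can be sketched in plain Python.
This is not the funkload test itself: create_module, the credentials
list and the short duration are all hypothetical stand-ins, chosen so
that the sketch runs without a Rhaptos instance.

```python
import threading
import time


def run_cycle(create_module, credentials, duration_s):
    """Run one cycle: one thread per user, counting completed modules.

    create_module(login, password) is a hypothetical stand-in for the
    module-creation scenario; it is not part of funkload.
    """
    deadline = time.monotonic() + duration_s
    lock = threading.Lock()
    completed = [0]

    def user_loop(login, password):
        # Each simulated user keeps creating modules until time runs out.
        while time.monotonic() < deadline:
            create_module(login, password)
            with lock:
                completed[0] += 1

    threads = [
        threading.Thread(target=user_loop, args=cred) for cred in credentials
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed[0]


def fake_create_module(login, password):
    # Placeholder workload instead of real HTTP requests.
    time.sleep(0.01)


if __name__ == "__main__":
    for user_count in (2, 5, 10):
        users = [("user%d" % i, "secret") for i in range(user_count)]
        print(user_count, run_cycle(fake_create_module, users, duration_s=1.0))
```

Giving every thread its own login mirrors the point made above: shared
credentials would not exercise per-user workspaces.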

To check which of the standard plone indexes are in use, we
monkey-patched ZCatalog and logged the catalog queries while running the
existing selenium tests.
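A minimal sketch of this kind of monkey-patch is shown below. It
assumes a catalog class whose searchResults method takes the query as
keyword arguments; StandInCatalog is a hypothetical stand-in, not the
real ZCatalog, and the patch actually used for the analysis may have
looked different.

```python
import collections


class StandInCatalog:
    """Hypothetical stand-in for a catalog class; not the real ZCatalog."""

    def searchResults(self, **query):
        return []  # a real catalog would return matching records


def log_index_usage(catalog_cls, usage):
    """Monkey-patch catalog_cls.searchResults to count the indexes queried."""
    original = catalog_cls.searchResults

    def patched(self, **query):
        for key, value in query.items():
            if key == "sort_on":
                usage[value] += 1  # sort_on names the index used for sorting
            else:
                usage[key] += 1
        return original(self, **query)

    catalog_cls.searchResults = patched
    return original  # keep a handle so the patch can be undone


usage = collections.Counter()
log_index_usage(StandInCatalog, usage)
catalog = StandInCatalog()
catalog.searchResults(portal_type="Module", Creator="alice")
catalog.searchResults(portal_type="Collection", sort_on="modified")
print(sorted(usage.items()))
```

Any index that never shows up in the counter after a full test run is
a candidate for removal, subject to how complete the test coverage is.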

To optimise the MyCNX page, we used PTProfiler to see where it spends
its time.

Results
=======
Benchmark results
-----------------

We ran four benchmarks: with and without indexing, each with a full and
an empty catalog. Each test in a benchmark consists of 8 pages and is
equivalent to a user creating a module on cnx.org. The benchmark was
conducted with 2, 5 and 10 concurrent users.

Listed below are the number of modules we managed to create for 2, 5 and
10 concurrent users over a 5 minute period:

Full catalog, with indexing: 18, 22, 18
Full catalog, no indexing: 21, 36, 32
Empty catalog, with indexing: 24, 25, 26
Empty catalog, no indexing: 27, 37, 36
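To make the comparison easier to read, the figures above can be turned
into throughput ratios. The short script below is only a worked
calculation on the numbers already listed; the dictionary layout and
function name are illustrative.

```python
# Modules created in 5 minutes for 2, 5 and 10 concurrent users,
# copied from the list above.
results = {
    ("full", "indexing"): (18, 22, 18),
    ("full", "no indexing"): (21, 36, 32),
    ("empty", "indexing"): (24, 25, 26),
    ("empty", "no indexing"): (27, 37, 36),
}
USERS = (2, 5, 10)


def throughput_gain(catalog):
    """Ratio of modules created without indexing to modules created with it."""
    with_idx = results[(catalog, "indexing")]
    without_idx = results[(catalog, "no indexing")]
    return [round(b / a, 2) for a, b in zip(with_idx, without_idx)]


for catalog in ("full", "empty"):
    for users, gain in zip(USERS, throughput_gain(catalog)):
        print("%s catalog, %2d users: %.2fx" % (catalog, users, gain))
```

With a full catalog, disabling indexing gives about 1.2 times the
throughput at 2 users and about 1.6 to 1.8 times at 5 and 10 users.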

At higher concurrencies (5 and 10) the occasional ConflictError occurred
when indexing was turned on, but these disappeared completely once
indexing was disabled.

The slowest requests were those that involved the creation of the
Module, initially when creating the temporary object within
portal_factory, and again later when the final object was stored.
Turning off indexing halved the time it took for these requests.

The performance improvement gained by bypassing the catalog is already
obvious when looking at the overall number of successful tests per
second, but it is even more visible when looking at specific requests.

When posting to the URL /mycnx/cc_license, a blank module is created for
the first time. Here already the catalog comes into play. The results
below compare posting to this URL with and without indexing of the
module. The tables show the minimum, average and maximum response
times in seconds.

* Req: 001, post, url /mycnx/cc_license

Full indexing:

Concurrent users | Min | Avg | Max
-----------------------------------------
2 | 1.291 | 4.440 | 27.300
5 | 2.459 | 8.966 | 44.649
10 | 5.139 | 13.298 | 32.103

No indexing:

Concurrent users | Min | Avg | Max
-----------------------------------------
2 | 0.885 | 1.279 | 2.448
5 | 1.978 | 3.849 | 21.619
10 | 2.512 | 5.691 | 10.281

With no indexing, average response times are roughly 2.3 to 3.5 times
shorter than with indexing enabled.

The performance improvement given by not indexing the module remains
visible on the first save of the new module:

* Req: 001, post, url
/Members//portal_factory/Module/module.2010-02-17.6655525069//content_title

Full indexing:

Concurrent users | Min | Avg | Max
-----------------------------------------
2 | 2.829 | 4.525 | 13.870
5 | 5.059 | 11.285 | 70.596
10 | 8.250 | 20.566 | 45.279

No indexing:

Concurrent users | Min | Avg | Max
-----------------------------------------
2 | 1.973 | 2.585 | 5.613
5 | 3.098 | 6.409 | 21.761
10 | 4.344 | 11.044 | 22.722
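The ratios between the average response times in the two sets of
tables above can be worked out directly. The script below only
re-computes figures already shown; the labels used as dictionary keys
are illustrative.

```python
# Average response times in seconds for 2, 5 and 10 concurrent users,
# copied from the tables above as (with indexing, without indexing).
avg_times = {
    "/mycnx/cc_license": [(4.440, 1.279), (8.966, 3.849), (13.298, 5.691)],
    "first save (content_title)": [
        (4.525, 2.585),
        (11.285, 6.409),
        (20.566, 11.044),
    ],
}


def speedups(pairs):
    """Return with-indexing / no-indexing ratios, rounded to two decimals."""
    return [round(indexed / unindexed, 2) for indexed, unindexed in pairs]


for request, pairs in avg_times.items():
    print(request, speedups(pairs))
```

For /mycnx/cc_license the ratios are 3.47, 2.33 and 2.34; for the first
save they are 1.75, 1.76 and 1.86, so the gain on the first save is
smaller but still close to a halving of the average response time.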

It should be noted that we did spot performance problems that did not
relate to indexing but to expressions in templates. An example of this
is the validation of a module’s xml content. The xml is currently
validated by making a call to an external java validator. This
validation happens both when opening up the editor (HTTP GET) and saving
content (HTTP POST) on a module. One would expect that this validation
should only occur when saving the module. Ideally this should not be
validated by an external call to a Java process, and one should
investigate a pure Python alternative. This call to the Java validator
adds about 3 seconds to the rendering time of the module editor.
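The idea of validating only on save can be sketched as follows. The
function and validator names are hypothetical and do not correspond to
actual Rhaptos code; the validator here is a trivial placeholder for
the external one.

```python
def render_editor(method, xml_text, validate):
    """Return (status, errors), calling the validator only when saving.

    `validate` is a hypothetical callable standing in for the external
    validator; it takes the XML text and returns a list of error strings.
    """
    if method == "POST":
        errors = validate(xml_text)
        return ("rejected" if errors else "saved", errors)
    # Opening the editor (GET) skips validation entirely.
    return ("editing", [])


calls = []


def slow_validator(xml_text):
    calls.append(xml_text)  # record each (expensive) validation call
    return [] if xml_text.startswith("<document") else ["not a CNXML document"]


print(render_editor("GET", "<document/>", slow_validator))
print(render_editor("POST", "<document/>", slow_validator))
print(len(calls))
```

One trade-off to keep in mind: if the editor currently shows validation
errors for existing content when it is opened, skipping the check on
GET would change that behaviour.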

The complete results of the benchmarks are available in SVN.

Full catalog, with indexing:

https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/test1.txt

Full catalog, without indexing:

https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/test2.txt

Empty catalog, with indexing:

https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/test3.txt

Empty catalog, without indexing:

https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/test4.txt

Rhaptos Content types to be removed from indexing
-------------------------------------------------
Shown below is the number of cataloged Rhaptos content types:

65035: PublishedContentPointer
27331: Module
19199: CNXML Document
12958: Workspace
12426: SubCollection
8304: UnifiedFile
5635: Collection
1989: LensFolder
1628: ChangeSet
1528: Workgroup
492: Patch

It is unnecessary to index “CNXML Document”, as the catalog is never
queried for this content type. This content type lives inside a Module
and stores the actual textual content of the module. Each module will
likely have at least one of these, in other words, there could
potentially be as many of these as there are modules. We need not index
them, because we can just list the contents of a module object to find
them.

“Module” is queried for in two places:

Products/RhaptosSite/skins/rhaptos_site/all_editable_content.py

Products/RhaptosCollection/skins/rhaptos_collection/searchWorkspace.py
Products/RhaptosCollection/Field.py

all_editable_content is used to render a list of modules in various
places, for example on your MyCNX page or when you click on By Type:
Modules. all_editable_content needs to be adapted to handle the
suggested changes.

searchWorkspace.py is called when someone searches his workspace for
modules. This functionality was disabled in svn revision 1864, so we can
probably get away with just removing this old code.

Products/RhaptosCollection/Field.py defines WorkspaceReferenceField, a
field for referencing content that uses the catalog to construct a
vocabulary of possible modules it can reference. This will need work.
Note, however, that this field is not in use at the moment, since
collections only reference published modules through the
PublishedContentPointer.

PublishedContentPointer is another content type of which there are a
great many in the catalog. It is used inside Collections as pointers to
the actual modules. It is never explicitly queried for, and the proper
zope API (objectValues()) is used on the containing
collection/subcollection objects. Collections continue to function
normally even if PublishedContentPointer is removed from the catalog.
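Listing contained objects instead of querying the catalog can be
illustrated with a small stand-in. StandInFolder below is hypothetical
and only mimics the objectValues() call mentioned above; it also
assumes that each object's meta_type matches the type names used in
this report, which may not hold in the real code.

```python
class StandInFolder:
    """Hypothetical stand-in for a Zope folderish object (not the real class)."""

    def __init__(self, meta_type):
        self.meta_type = meta_type
        self._children = []

    def add(self, child):
        self._children.append(child)
        return child

    def objectValues(self, spec=None):
        """Return children, optionally only those whose meta_type is in spec."""
        if spec is None:
            return list(self._children)
        wanted = {spec} if isinstance(spec, str) else set(spec)
        return [c for c in self._children if c.meta_type in wanted]


def pointers(container):
    """Walk a collection tree and collect pointers without a catalog query."""
    found = container.objectValues("PublishedContentPointer")
    for child in container.objectValues("SubCollection"):
        found.extend(pointers(child))
    return found


collection = StandInFolder("Collection")
collection.add(StandInFolder("PublishedContentPointer"))
subcollection = collection.add(StandInFolder("SubCollection"))
subcollection.add(StandInFolder("PublishedContentPointer"))

print(len(pointers(collection)))
```

The same pattern applies to finding the CNXML Document objects inside
a Module: the container already knows its children, so no index is
needed to locate them.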

Plone indexes
-------------
The following indexes are consulted during normal rhaptos usage (this
list might not be exhaustive). This was determined by logging catalog
queries while running the selenium tests as explained above under
methodology:

Creator
portal_type
effectiveRange (index is empty)
allowedRolesAndUsers
orig_id (used only by Patches)
review_state
path
getObjPositionInParent*
sortable_title
modified
created
Date

MyCNX page
----------
To test this properly, we started zope cleanly before each test and
loaded MyCNX once to avoid object fetches from skewing the result before
turning on PTProfiler and profiling the page.

As is, it takes 11.5 seconds for the MyCNX page to render. The major
culprits are:

1. The lensorganizers view is called in order to show lensorganizers you
recently created on your MyCNX page. This takes 5.5 seconds.

2. all_editable_content is called to render a list of recently
modified modules. This takes 3 seconds.

3. showEditableBorder is a standard plone macro that is called to
determine whether the green editing border should be shown. This takes
0.6 seconds to render.

The results from the lensorganizers view are only used for siyavula
users, but due to the way TAL works (“define” is evaluated before
“condition”) it is queried for all users. It uses a catalog query to
find your lens organizers, and this uses a path index. Since users
cannot create content outside their workspaces, the path index can be
removed. Filtering on Creator is already sufficient. This can be further
optimised by restructuring the template so that it is only called for
siyavula users.
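The effect of evaluating the expensive lookup before the condition can
be shown with a Python analogy. This is not TAL and not the actual
template; lensorganizers_view and the user dictionaries are
hypothetical stand-ins.

```python
calls = {"lensorganizers": 0}


def lensorganizers_view(user):
    """Hypothetical stand-in for the expensive lens organizer lookup."""
    calls["lensorganizers"] += 1
    return ["organizer owned by %s" % user["name"]]


def render_eager(user):
    # Mirrors define-before-condition: the lookup runs for every user.
    organizers = lensorganizers_view(user)
    return organizers if user["is_siyavula"] else []


def render_lazy(user):
    # Check the cheap condition first; only then pay for the lookup.
    if not user["is_siyavula"]:
        return []
    return lensorganizers_view(user)


regular = {"name": "alice", "is_siyavula": False}
render_eager(regular)
eager_calls = calls["lensorganizers"]
render_lazy(regular)
lazy_calls = calls["lensorganizers"] - eager_calls
print(eager_calls, lazy_calls)
```

In the template, the equivalent restructuring would be to place the
condition on an enclosing element so that the define is only reached
for siyavula users.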

all_editable_content was already discussed in the earlier discussion on
the catalog. The 3 seconds it takes to render is likely because of the
size of the catalog.

The slow part of showEditableBorder is a call to getAllowedTypes. Since
users can only add content in places where they have “Add portal
content”, this check only wastes time and can be removed.

After optimising as above, the lensorganizers view takes half a second
and showEditableBorder becomes insignificant, bringing the entire
render time to about a second. For some reason all_editable_content
also runs faster; we tested this several times.

Recommendation
--------------
We recommend that one stops indexing Module, CNXML Document and
PublishedContentPointer entirely.

Listing modules in your workspace breaks if Modules are not indexed,
so this needs to be refactored.

Since a lot of the Plone UI is dependent on the indexing of a module,
one could phase the implementation and initially only prevent indexing
of CNXML Document and PublishedContentPointer.

Products/RhaptosCollection/Field.py needs to be refactored to
live without modules in the catalog.

The unused plone indexes will be removed.

MyCNX will be optimised as suggested above.


Roché Compaan
Upfront Systems http://www.upfrontsystems.co.za

About the Author

Mark Horner is the CEO of Siyavula Education, a social enterprise working in the school sector in South Africa. While working as the Shuttleworth Foundation Fellow for Open and Collaborative Resources, Mark was able to transform the Free High School Science Texts (FHSST) project, which he co-founded, into Siyavula Education. In this process, openly-licensed, collaboratively authored textbooks have been printed and distributed nationally in South Africa. Working at the intersection of community, openness and technology, Mark intends to leverage this success to make Siyavula an innovative technology provider in education that works effectively as part of the education community to ensure better learning opportunities for all. A recent notable event was the delivery of Siyavula's textbooks over Mxit, the most popular mobile chat solution in South Africa. Mark has a PhD in physics from the University of Cape Town and conducted his research at Lawrence Berkeley National Laboratory in California on the results from the STAR experiment at Brookhaven National Laboratory in New York. His work is carried out in the belief that the liberation of information and support of education in South Africa will lead to a peaceful and prosperous future for all South Africans.