<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mark Horner &#187; software development</title>
	<atom:link href="http://www.markhorner.net/tag/software-development/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.markhorner.net</link>
	<description>A blog about mixing technology, education, openness, and experience in South Africa.</description>
	<lastBuildDate>Mon, 05 Dec 2011 08:46:52 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Connexions Authoring Performance</title>
		<link>http://www.markhorner.net/2010/04/29/connexions-authoring-performance/</link>
		<comments>http://www.markhorner.net/2010/04/29/connexions-authoring-performance/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 11:40:50 +0000</pubDate>
		<dc:creator>Mark</dc:creator>
				<category><![CDATA[Siyavula]]></category>
		<category><![CDATA[Connexions]]></category>
		<category><![CDATA[OERs]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[software development]]></category>

		<guid isPermaLink="false">http://www.markhorner.net/?p=537</guid>
		<description><![CDATA[<img src="http://www.markhorner.net/wp-content/uploads/SiyavulaBadgeSmall-TextFinal.png" width="50" height="50" alt="" title="Siyavula" /><br/>I forecast a much more responsive authoring experience on Connexions in the near future: the results of a detailed performance analysis are in, and some massive potential improvements have been identified. Now to turn my attention to getting them implemented!]]></description>
			<content:encoded><![CDATA[<img src="http://www.markhorner.net/wp-content/uploads/SiyavulaBadgeSmall-TextFinal.png" width="50" height="50" alt="" title="Siyavula" /><br/><p>For the <a href="http://www.siyavula.org.za">Siyavula</a> project we partnered with <a href="http://www.cnx.org">Connexions</a> as our online content platform (some of the reasons can be found <a href="http://www.markhorner.net/2009/12/01/reflections-on-choosing-connexions/">here</a>). As part of this partnership we try to do more than use Connexions as a repository: we also support its continuous development by sourcing more content and helping enhance the technical offering as much as possible. There are always many potential avenues for development in any software project, and we&#8217;ve been trying to help with those that make teachers in South Africa more effective and efficient when using the site.</p>
<p>One area that has been flagged by a number of people, not just our teachers at workshops, is that the authoring side of the site can be very slow at times. Some people just say that it is because the site is built on <a href="http://www.plone.org">Plone</a>, but that isn&#8217;t fair. So, to support our teachers, the broader Connexions community and the Connexions Consortium, I commissioned <a href="http://www.upfrontsystems.co.za">Upfront Systems</a> to do a performance analysis of the authoring side of the site.</p>
<p>Before we get into details, the bottom line is that Upfront Systems have shown that there are massive potential performance improvements (in some cases 5 times faster) for Connexions on the authoring side and have analysed what it will take to implement them. I estimate that the total cost of implementing the specification is $7500.</p>
<p>This should make it a lot easier to raise the money needed to implement the fixes as we know where the problem lies, we have a detailed specification on how to fix it and have a proper analysis of the expected improvements so funders know what their return on investment will be.</p>
<h3>The Gory Details As Reported on the Rhaptos Mailing List</h3>
<div style="font-size:.9em;margin-left:6px;margin-right:6px;"><p>Roché Compaan<br />
Mon Apr 26 08:36:28 CDT 2010</p>
<p>I just checked in our analysis of the performance problems associated<br />
with authoring in Connexions:</p>
<p>https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/report.odt</p>
<p>The full text of the report is available below for discussion here on<br />
the mailing list.</p>
<p>The draft specification to develop the fixes to the problems identified<br />
is available here:</p>
<p>https://software.cnx.rice.edu/svn/devsets/performance-authoring/specification/cnx-performance-specification-2010-04-24.odt</p>
<p>Comments/edits on the specification are welcome.</p>
<p>Connexions Performance Analysis<br />
===============================</p>
<p>Hypothesis<br />
----------<br />
A large number of objects indexed in the portal_catalog makes rhaptos<br />
slow. Many of those objects might not need indexing, because postgresql<br />
is used for searching published modules. The content types Module, CNXML<br />
Document and PublishedContentPointer need not be indexed in the<br />
portal_catalog, since users mostly work with these objects in their<br />
personal workspace and group workspaces and there are no site wide<br />
searches for these objects.</p>
<p>Many of the standard plone indexes can be done away with, since they are<br />
never referenced, and the parts of plone that use them are not used by<br />
Rhaptos in any case. New objects are however still added to these<br />
indexes and this wastes time.</p>
<p>We also suspected that the MyCNX page can do with some optimisation.</p>
<p>Methodology<br />
-----------<br />
We developed a funkload test that creates a module. We used this to run<br />
a benchmark against rhaptos, using cycles with two, five and ten<br />
concurrent users.</p>
<p>The aim was to see how many modules can be created in a 5 minute period,<br />
with and without disabling indexing of modules and running with and<br />
without a full catalog.</p>
<p>We used funkload&#8217;s authentication server to provide different login<br />
details for each concurrent user, so that the results represent a<br />
real-life scenario where several people are creating modules at the same<br />
time.</p>
<p>To check which of the standard plone indexes are in use, we<br />
monkey-patched ZCatalog and logged the catalog queries while running the<br />
existing selenium tests.</p>
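<p>The monkey-patching step described above can be sketched roughly as follows. This is only an illustration of the pattern, not the code used for the analysis: StandInCatalog, log_queried_indexes and the example queries are hypothetical names, and the real ZCatalog search method takes more arguments than the stand-in does.</p>

```python
import functools
import logging

logger = logging.getLogger("catalog-queries")


class StandInCatalog:
    """Hypothetical stand-in for a catalog class; not the real ZCatalog."""

    def searchResults(self, **query):
        return []


def log_queried_indexes(cls, method_name, seen):
    """Wrap cls.method_name so every keyword used in a query is recorded."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, **query):
        seen.update(query)  # adds the query's keyword names to the set
        logger.info("catalog query: %s", sorted(query))
        return original(self, **query)

    setattr(cls, method_name, wrapper)


used_indexes = set()
log_queried_indexes(StandInCatalog, "searchResults", used_indexes)

# Assumed example queries, standing in for what a test run would trigger.
catalog = StandInCatalog()
catalog.searchResults(portal_type="Module", Creator="someone")
catalog.searchResults(path="/Members/someone", review_state="published")
print(sorted(used_indexes))
```

<p>Any index name that never shows up in the collected set after a full test run is a candidate for removal, subject to the caveat that the tests might not exercise every query.</p>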
<p>To optimise the MyCNX page, we used PTProfiler to see where it spends<br />
its time.</p>
<p>Results<br />
=======<br />
Benchmark results<br />
-----------------</p>
<p>We ran four benchmarks, with and without indexing, with a full and<br />
empty catalog. Each test in a benchmark consists of 8 pages and is<br />
equivalent to a user creating a module on cnx.org. The benchmark was<br />
conducted with 2, 5 and 10 concurrent users.</p>
<p>Listed below are the number of modules we managed to create for 2, 5 and<br />
10 concurrent users over a 5 minute period:</p>
<p>        Full catalog, with indexing: 18, 22, 18<br />
        Full catalog, no indexing: 21, 36, 32<br />
        Empty catalog, with indexing: 24, 25, 26<br />
        Empty catalog, no indexing: 27, 37, 36</p>
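<p>To make the comparison easier to read, the relative gain from disabling indexing can be worked out from the figures above. The snippet below is a small sketch that uses only the reported numbers; the names results and gain are chosen for illustration.</p>

```python
# Modules created in 5 minutes for 2, 5 and 10 concurrent users (from the report).
results = {
    ("full", "indexing"): (18, 22, 18),
    ("full", "no indexing"): (21, 36, 32),
    ("empty", "indexing"): (24, 25, 26),
    ("empty", "no indexing"): (27, 37, 36),
}
users = (2, 5, 10)


def gain(catalog):
    """Throughput ratio (no indexing / indexing) per concurrency level."""
    with_indexing = results[(catalog, "indexing")]
    without_indexing = results[(catalog, "no indexing")]
    return [off / on for on, off in zip(with_indexing, without_indexing)]


for catalog in ("full", "empty"):
    for count, ratio in zip(users, gain(catalog)):
        print(f"{catalog} catalog, {count} users: {ratio:.2f}x as many modules")
```

<p>With a full catalog the gain is modest at 2 users and largest at 5 and 10 users, which is also where the ConflictErrors were seen.</p>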
<p>At higher concurrencies (5 and 10) the occasional ConflictError occurred<br />
when indexing was turned on, but this completely disappeared when<br />
disabling indexing.</p>
<p>The slowest requests were those that involved the creation of the<br />
Module, initially when creating the temporary object within<br />
portal_factory, and again later when the final object was stored.<br />
Turning off indexing halved the time it took for these requests.</p>
<p>The performance improvement from turning off catalog indexing is already<br />
obvious when looking at the overall number of successful tests per second, but it is<br />
even more visible when looking at specific requests.</p>
<p>When posting to the URL /mycnx/cc_license, a blank module is created for<br />
the first time. The catalog already comes into play here. The results<br />
below compare posting to this URL with indexing of the module enabled and<br />
with no indexing. The tables show the minimum, average and maximum response<br />
times in seconds.</p>
<p>* Req: 001, post, url /mycnx/cc_license </p>
<p>    Full indexing:</p>
<p>        Concurrent users | Min   |  Avg   |  Max<br />
        -----------------------------------------<br />
        2                | 1.291 |  4.440 | 27.300<br />
        5                | 2.459 |  8.966 | 44.649<br />
        10               | 5.139 | 13.298 | 32.103</p>
<p>    No indexing:</p>
<p>        Concurrent users | Min   |  Avg   |  Max<br />
        -----------------------------------------<br />
        2                | 0.885 |  1.279 | 2.448<br />
        5                | 1.978 |  3.849 | 21.619<br />
        10               | 2.512 |  5.691 | 10.281</p>
<p>With no indexing the response times are between 200% and 300% faster<br />
than with indexing enabled.</p>
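<p>The averages in the two tables above can be compared directly. The following is a small sketch using only the reported average response times; speedup is a name chosen for illustration.</p>

```python
# Average response times in seconds for POST /mycnx/cc_license (from the tables above).
avg_indexing = {2: 4.440, 5: 8.966, 10: 13.298}
avg_no_indexing = {2: 1.279, 5: 3.849, 10: 5.691}


def speedup(users):
    """Ratio of the average response time with indexing to that without."""
    return avg_indexing[users] / avg_no_indexing[users]


for users in sorted(avg_indexing):
    print(f"{users} users: average response {speedup(users):.2f}x shorter without indexing")
```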
<p>The performance improvement given by not indexing the module remains<br />
visible on the first save of the new module:</p>
<p>* Req: 001, post, url<br />
/Members/&lt;memberid&gt;/portal_factory/Module/module.2010-02-17.6655525069//content_title </p>
<p>    Full indexing:</p>
<p>        Concurrent users | Min   |  Avg   |   Max<br />
        -------------------------------------------<br />
        2                | 2.829 |  4.525 |  13.870<br />
        5                | 5.059 | 11.285 |  70.596<br />
        10               | 8.250 | 20.566 |  45.279</p>
<p>    No indexing:</p>
<p>        Concurrent users |  Min  |  Avg   |   Max<br />
        -------------------------------------------<br />
        2                | 1.973 |  2.585 |   5.613<br />
        5                | 3.098 |  6.409 |  21.761<br />
        10               | 4.344 | 11.044 |  22.722</p>
<p>It should be noted that we did spot performance problems that did not<br />
relate to indexing but to expressions in templates. An example of this<br />
is the validation of a module&#8217;s xml content. The xml is currently<br />
validated by making a call to an external Java validator. This<br />
validation happens both when opening up the editor (HTTP GET) and saving<br />
content (HTTP POST) on a module. One would expect that this validation<br />
should only occur when saving the module. Ideally this should not be<br />
validated by an external call to a Java process and one should<br />
investigate a pure Python alternative. This call to the Java validator<br />
adds about 3 seconds to the rendering time of the module editor.</p>
<p>The complete results of the benchmarks are available in SVN.</p>
<p>    Full catalog, with indexing:</p>
<p>https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/test1.txt</p>
<p>    Full catalog, without indexing:</p>
<p>https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/test2.txt</p>
<p>    Empty catalog, with indexing:</p>
<p>https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/test3.txt</p>
<p>    Empty catalog, without indexing:</p>
<p>https://software.cnx.rice.edu/svn/devsets/performance-authoring/analysis/test4.txt</p>
<p>Rhaptos Content types to be removed from indexing<br />
-------------------------------------------------<br />
Shown below is the number of cataloged Rhaptos content types:</p>
<p>65035: PublishedContentPointer<br />
27331: Module<br />
19199: CNXML Document<br />
12958: Workspace<br />
12426: SubCollection<br />
8304:  UnifiedFile<br />
5635:  Collection<br />
1989:  LensFolder<br />
1628:  ChangeSet<br />
1528:  Workgroup<br />
492:   Patch</p>
<p>It is unnecessary to index &#8220;CNXML Document&#8221;, as the catalog is never<br />
queried for this content type. This content type lives inside a Module<br />
and stores the actual textual content of the module. Each module will<br />
likely have at least one of these, in other words, there could<br />
potentially be as many of these as there are modules. We need not index<br />
them, because we can just list the contents of a module object to find<br />
them.</p>
<p>&#8220;Module&#8221; is queried for in two places:</p>
<p>   Products/RhaptosSite/skins/rhaptos_site/all_editable_content.py</p>
<p>Products/RhaptosCollection/skins/rhaptos_collection/searchWorkspace.py<br />
   Products/RhaptosCollection/Field.py</p>
<p>all_editable_content is used to render a list of modules in various<br />
places, for example on your MyCNX page or when you click on By Type:<br />
Modules. all_editable_content needs to be adapted to handle the<br />
suggested changes.</p>
<p>searchWorkspace.py is called when someone searches his workspace for<br />
modules. This functionality was disabled in svn revision 1864, so we can<br />
probably get away with just removing this old code.</p>
<p>Products/RhaptosCollection/Field.py defines a WorkspaceReferenceField<br />
referencing content type that uses the catalog to construct a vocabulary<br />
of possible modules it can reference. This will need work. Note however<br />
that this field is not in use at the moment since collections only<br />
reference published modules through the PublishedContentPointer.</p>
<p>PublishedContentPointer is another content type of which there are a<br />
great many in the catalog. It is used inside Collections as pointers to<br />
the actual modules. It is never explicitly queried for, and the proper<br />
zope API (objectValues()) is used on the containing<br />
collection/subcollection objects. Collections continue to function<br />
normally even if PublishedContentPointer is removed from the catalog.</p>
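<p>The idea of listing a container's contents instead of querying the catalog can be sketched like this. StandInCollection, StandInItem, pointers_in and the ids used are hypothetical stand-ins for illustration; only the method name objectValues() comes from the Zope API mentioned above.</p>

```python
class StandInItem:
    """Hypothetical content object with just an id and a portal_type."""

    def __init__(self, item_id, portal_type):
        self.id = item_id
        self.portal_type = portal_type


class StandInCollection:
    """Hypothetical stand-in for a Zope container; only mimics objectValues()."""

    def __init__(self):
        self._children = []

    def add(self, child):
        self._children.append(child)

    def objectValues(self):
        # The real Zope method also accepts an optional meta_type filter.
        return list(self._children)


def pointers_in(collection):
    """List pointer ids by walking the container, with no catalog query."""
    return [
        item.id
        for item in collection.objectValues()
        if item.portal_type == "PublishedContentPointer"
    ]


collection = StandInCollection()
collection.add(StandInItem("pointer-1", "PublishedContentPointer"))
collection.add(StandInItem("chapter-1", "SubCollection"))
collection.add(StandInItem("pointer-2", "PublishedContentPointer"))
print(pointers_in(collection))
```

<p>Because the lookup only touches the children of one container, it does not depend on the pointers being present in the catalog at all.</p>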
<p>Plone indexes<br />
-------------<br />
The following indexes are consulted during normal rhaptos usage (this<br />
list might not be exhaustive). This was determined by logging catalog<br />
queries while running the selenium tests as explained above under<br />
methodology:</p>
<p>    Creator<br />
    portal_type<br />
    effectiveRange (index is empty)<br />
    allowedRolesAndUsers<br />
    orig_id (used only by Patches)<br />
    review_state<br />
    path<br />
    getObjPositionInParent*<br />
    sortable_title<br />
    modified<br />
    created<br />
    Date</p>
<p>MyCNX page<br />
----------<br />
To test this properly, we started zope cleanly before each test and<br />
loaded MyCNX once to keep object fetches from skewing the result before<br />
turning on PTProfiler and profiling the page.</p>
<p>As is, it takes 11.5 seconds for the MyCNX page to render. The major<br />
culprits are:</p>
<p>1. The lensorganizers view is called in order to show lensorganizers you<br />
recently created on your MyCNX page. This takes 5.5 seconds.</p>
<p>2. all_editable_content is called to render a list of recently<br />
modified modules. This takes 3 seconds.</p>
<p>3. showEditableBorder is a standard plone macro that is called to<br />
determine whether the green editing border should be shown. This takes<br />
0.6 seconds to render.</p>
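<p>As a quick check on how much of the render time these three items explain, the figures above can be summed. This is plain arithmetic on the reported numbers; the variable names are chosen for illustration.</p>

```python
# Figures reported above, in seconds.
total_render = 11.5
culprits = {
    "lensorganizers view": 5.5,
    "all_editable_content": 3.0,
    "showEditableBorder": 0.6,
}

accounted = sum(culprits.values())
share = accounted / total_render

for name, seconds in culprits.items():
    print(f"{name}: {seconds:.1f}s ({seconds / total_render:.0%} of render time)")
print(f"accounted for: {accounted:.1f}s of {total_render}s ({share:.0%})")
```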
<p>The results from the lensorganizers view are only used for siyavula<br />
users, but due to the way TAL works (&#8220;define&#8221; is evaluated before<br />
&#8220;condition&#8221;) it is queried for all users. It uses a catalog query to<br />
find your lens organizers, and this uses a path index. Since users<br />
cannot create content outside their workspaces, the path index can be<br />
removed. Filtering on Creator is already sufficient. This can be further<br />
optimised by restructuring the template so that it is only called for<br />
siyavula users.</p>
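<p>The effect of the evaluation order can be illustrated in plain Python. This is an analogy rather than TAL itself: lensorganizers here is a hypothetical stand-in for the expensive view, and the two render functions mimic a template that defines the value first versus one that checks the condition first.</p>

```python
calls = []


def lensorganizers(user):
    """Hypothetical stand-in for the expensive lensorganizers view."""
    calls.append(user)
    return [f"{user}-organizer"]


def render_define_first(user, is_siyavula):
    # Mirrors "define" before "condition": the lookup always runs.
    organizers = lensorganizers(user)
    return organizers if is_siyavula else []


def render_condition_first(user, is_siyavula):
    # Restructured: the condition guards the expensive lookup.
    if not is_siyavula:
        return []
    return lensorganizers(user)


render_define_first("alice", is_siyavula=False)
print("lookups after define-first:", len(calls))

render_condition_first("alice", is_siyavula=False)
print("lookups after condition-first:", len(calls))
```

<p>In the restructured version a non-siyavula user never triggers the lookup, which is the saving the report is after.</p>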
<p>all_editable_content was already discussed in the earlier discussion on<br />
the catalog. The 3 seconds it takes to render is likely because of the<br />
size of the catalog.</p>
<p>The slow part of showEditableBorder is a call to getAllowedTypes. Since<br />
users can only add content in places where they have &#8220;Add portal<br />
content&#8221;, this check only wastes time and can be removed.</p>
<p>After optimising as above, it takes half a second for the lensorganizers<br />
view and showEditableBorder becomes insignificant, bringing the entire<br />
render time to about a second. For some reason all_editable_content also<br />
runs faster; we tested this several times.</p>
<p>Recommendation<br />
--------------<br />
We recommend that one stops indexing Module, CNXML Document and<br />
PublishedContentPointer entirely.</p>
<p>Listing modules in your workspace breaks if Modules are not indexed;<br />
this needs to be refactored.</p>
<p>Since a lot of the Plone UI is dependent on the indexing of a module, one<br />
could phase the implementation and initially only prevent indexing of CNXML<br />
Document and PublishedContentPointer.</p>
<p>Products/RhaptosCollection/Field.py needs to be refactored to<br />
live without modules in the catalog.</p>
<p>The unused plone indexes will be removed.</p>
<p>MyCNX will be optimised as suggested above.</p>
<p>--<br />
Roché Compaan<br />
Upfront Systems                   http://www.upfrontsystems.co.za
</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.markhorner.net/2010/04/29/connexions-authoring-performance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

