<div xmlns="http://www.w3.org/1999/xhtml"><div>Greetings,</div><div> </div><div>i am interested in databases and would like to make a contribution to the</div><div>PostgreSQL by participating in GSoC 2019. Currently i am studying in HSE[1],</div><div>doing last year of master's program that mostly build on top of collaboration</div><div>with ISP RAS[2].</div><div> </div><div>In the previous year i have been working on llvm_jit extension for</div><div>PostgreSQL 9.6, that was developed in ISP RAS and presented at PGCON[3].</div><div>Specifically, my work consisted of adding support for several missing</div><div>nodes(bitmapscan, mergejoin, subqueryscan, etc)</div><div>by rewriting them with LLVM API, as well as other functionality(e.g. distinct in group by)</div><div>that is required to fully support TPC-H of SCALE 100.</div><div> </div><div>Originally i wanted to pursue "TOAST" tasks from ideas list, but noticed</div><div>that couple of students have already mentioned them in mailing list. So, instead</div><div>of increasing the queue for single possible idea, i would like to offer other</div><div>ones, that sound interesting to me and can potentially be useful for PostgreSQL</div><div>and community:</div><div> </div><div>1) The so-called Adaptive join, that exists in modern Oracle[4] and MSSQL[5] </div><div> versions. This type of node is designed to mitigate cardinality estimation</div><div> errors in queries that are somewhere inbetween NL(nested loop with indexscan)</div><div> and HJ(hashjoin).</div><div> </div><div> One possible implementation of that is to start execution in HJ fasion, by accumulating</div><div> rows in hashtable with certain threshold. If threshold is not exceeded, then</div><div> continue with indexscan, otherwise switch to usual HJ.</div><div> </div><div>2) Changing buffer manager strategy.</div><div> Somewhere in 2016 Andres Freund made a presention[6] of possible improvements</div><div> that can be done in buffer manager. I find the idea of changing hashtable to</div><div> trees of radix trees[7] promising. Most likely, taking into account program's</div><div> time constraints, this task won't be done as "ready to deploy" solution.<br /> Instead, some kind of prototype can be implemented and benchmarked. </div><div> </div><div>3) Improvements in jit component.</div><div> Great progress has been made in this direction in 10 and 11 versions, but</div><div> still there's a lot to be done. Possible subareas: compiled code caching/sharing,</div><div> cost-based optimizer improvements, push-based execution with bytecode</div><div> transformation, compiling plpgsql, etc.</div><div> </div><div>At this stage i would like to receive some feedback from the community,</div><div>which of those ideas are more useful for the near future of PostgreSQL and</div><div>more suitable for GSoC itself. With that information i can dive into particular</div><div>topic, extract additional information and prepare required proposal.</div><div> </div><div>p.s. my preferred order: 2,1,3</div><div> </div><div>--------------------------------------------------------------------------------</div><div>[1] https://www.hse.ru/en/ma/sp</div><div>[2] http://www.ispras.ru/en/</div><div>[3] http://www.pgcon.org/2017/schedule/events/1092.en.html</div><div>[4] https://www.oracle.com/technetwork/database/bi-datawarehousing/twp-optimizer-with-oracledb-12c-1963236.pdf</div><div>[5] https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/04/19/introducing-batch-mode-adaptive-joins/</div><div>[6] https://pgconf.ru/media/2016/05/13/1io.pdf</div><div>[7] http://events17.linuxfoundation.org/sites/events/files/slides/LinuxConNA2016%20-%20Radix%20Tree.pdf</div><div> </div><div>Best regards,</div><div> </div><div>Michael.</div></div>