はじめに

ABEJAアドベントカレンダー2023の14日目の記事になります。こんにちは、ABEJAでデータサイエンティストをしている中西 @cfiken です。

LLM の登場のおかげで arxiv に上がる論文やテクニカルレポートの概要をチェックするのが簡単な時代になりました。例えば ChatGPT に Web で pdf を投げるだけで要約してくれる上に質疑まで答えてくれるのは、以前では考えられないほど楽ちんです。 ...なのですが、楽になるともっと楽をしたくなるのが人の性で、私も OpenAI API で GPT-4 を使って "いい感じ" の要約の自動生成を試みていました（下記画像）。

十分便利ではあるものの、図表や箇条書きを用いてもう少し見やすくできないかと考えた結果、今回はスライド風のデザインでの論文まとめ（以下、スライド）を、プロンプトテクニックを使いつつ GPT-4 を使って自動作成できないか試してみました。

本記事で目指すこと

本記事では、arxiv で参照できる学術論文やテクニカルレポートを、OpenAI API を使って4枚のスライド風にまとめることを目指します。得られた結果を使ってキャッチアップを早めることが目的なので、スライドはサクッと見やすい4枚としました。

モデルは GPT-4、特に gpt-4-1106-preview を使用します。

今回対象とする論文は、最近 Stability.ai からこの手法を使ったモデルが出たことでも話題*1の「Zephyr: Direct Distillation of LM Alignment」*2 を使用させていただきました。

スライドの作成には Reveal.js *3 を使用します。スライドは HTML として出力します。なお、私は Reveal.js の知識は全くなく、完全に GPT-4 頼りです。

※ 本記事の取り組みはあくまで自分用にキャッチアップを高速化するためのものです。

スライド作成のステップ

人間がやる場合を想定し、次のようなステップでスライドを作成することを考えます。

論文内容（Latex）を読み込んで、要約のために内容を理解する。
4枚のスライドにどのような情報を含めるか検討する。
各スライドについて、どのようなコンポーネントを使うか検討する。
3 で作成したレイアウトを元に、日本語でスライドに載せるコンテンツを作成する。
4 の情報を元に、Reveal.js のコードを作成する。

実際に GPT-4 に処理をさせる際は、1~3までを1回のリクエストで、その後 4, 5 でまた API を叩く形で処理させています。

プロンプト作成

ステップ1~3: スライドコンテンツ作成までの準備

内容は英語なのでざっくり説明すると、下記のようなプロンプトになっています。

上述のステップ1~4について説明した上で、その中の 1, 2, 3 について実行するように指示しています。
ステップ2 では、一般的なスライドの作成方法として、1枚目に概要やキーコンセプトを、2枚目に関連研究との差分を、3,4枚目に提案手法と実験内容とその結果という構成を提示しています。
ステップ3 では、テキストボックス、画像、表の3つのコンポーネントから、複数選択する形でレイアウトを作成するように依頼しています。
ステップ3 以降で画像を参照する場合、latex ファイルから参照先のファイルパスを取得するよう指示しています。（スライド作成時に使いたいため）

From now on, You are an excellent researcher in the field of machine learning and artificial intelligence,\
and good at explaining and skilled at making the content of the academic paper understandable to others.
[Task description]
Your ultimate goal is to create presentation slides in Japanese that explain the contents of an academic paper to researchers and engineers in the field of machine learning.
You will be consolidating the content of the paper into four slides.
There is no need to convey all the information which is not important for reader, like Acknowledgements; it's essential to summarize and explain the key points\
(abstract, conclusion, difference from related papers, novel method, experiment, results, and so on.) of the paper concisely and clearly.
Think step-by-step in the following order and create the slides.

[Order of steps]
1. (Now) Read and understand the content of the academic paper, and prepare it for summarization.\
Consider what kind of knowledge is required to understand and explain the contents of the paper concisely in slide layout.
2. (Now) Consider what information should be included in the four slides.\
This is not about the actual text and layout information to be placed on the slides, but rather a preliminary step of extracting and organizing the necessary content from the paper.\
This should include not only the content of the paper, but also the use of figures and tables found within the paper.\
Generally, the four slides will be as follows. You don't necessarily have to follow this, but use it as a reference.
- Slide 1: Key points such as the abstract, conclusion, and key concepts of the paper
- Slide 2: The position of this study, including motivation and differences from related research
- Slide 3, 4: The proposed methods and experimental methods and their results
Meta information such as the title and authors is not necessary.
3. (Now) For each slide, decide on the layout for the slide creation.\
There are 3 component options available for use on the slides, each consisting of its content and position: (a) Text box (b) Figure (c) Table.\
You can use the same component for multiple times on a single slide, like Text box for the title and Text box for bullet points.
As for (b) Figure, you must only specify the image file relative path like `figure/figure1.png` or `ModelNetwork.png` or so on. Do not explain other.
As for (c) Table, you must refer the reference number and use markdown syntax to represent the table.
4. (Not Now) Create the contents in Japanese to be placed in each block on the slides.\
In the (a) text box, you can use Markdown syntax to format the text, such as bold, italic, bullet points, numbered lists, and tables.\
As for (b) Figure, refer to images using the provided relative path like `figure/figure1.png` or `ModelNetwork.png` or so on.
As for (c) Table, use markdown syntax to represent the table contents.

[paper]
{paper}

[Task on this step]
Do the steps 1, 2, 3.

ステップ4: 日本語でスライドコンテンツを作成

内容はほとんど前述のものと同じで、今度はステップ4を実行するように指示しています。また、スライドを1枚ずつ作成するため、現在何枚目を作成させようとしているかを最後に明示しています。

[Order of steps]
1. (Done) Read and understand the content of the academic paper, and prepare it for summarization.\
Consider what kind of knowledge is required to understand and explain the contents of the paper concisely in slide layout.
2. (Done) Consider what information should be included in the four slides.\
This is not about the actual text and layout information to be placed on the slides, but rather a preliminary step of extracting and organizing the necessary content from the paper.\
This should include not only the content of the paper, but also the use of figures and tables found within the paper.\
Generally, the four slides will be as follows. You don't necessarily have to follow this, but use it as a reference.
- Slide 1: Key points such as the abstract, conclusion, and key concepts of the paper
- Slide 2: The position of this study, including motivation and differences from related research
- Slide 3, 4: The proposed methods and experimental methods and their results
Meta information such as the title and authors is not necessary.
3. (Done) For each slide, decide on the layout for the slide creation.\
There are 3 component options available for use on the slides, each consisting of its content and position: (a) Text box (b) Figure (c) Table.\
You can use the same component for multiple times on a single slide, like Text box for the title and Text box for bullet points.
As for (b) Figure, you must only specify the image file relative path like `figure/figure1.png` or `ModelNetwork.png` or so on. Do not explain other.
4. (Now) Create the contents in Japanese to be placed in each block on the slides.\
In the (a) text box, you can use Markdown syntax to format the text, such as bold, italic, bullet points, numbered lists, and tables.\
As for (b) Figure, refer to images using the provided relative path like `figure/figure1.png` or `ModelNetwork.png` or so on.
As for (c) Table, use markdown syntax to represent the table contents.

[paper]
{paper}

[Previous Step Output]
{output}

[Task on this step]
4.{order}. Create {order}th slide's content in Japanese.

ステップ5: Reveal.js のコードを作成

ステップ4でのアウトプットを渡しつつ、 Reveal.js を使ってスライドを作るよう指示しています。スライドの特性を活かして、視覚的に見やすくなることを期待できるような内容にしています。

文章や結果の表がしばしば省略されてしまっていたため、内容を書き切るように複数回念押ししています。

From now on, You are an excellent researcher in the field of machine learning and artificial intelligence,\
 and good at explaining and skilled at making the content of the academic paper understandable to others.
[Task description]
You will be provided with the structure and content of slides summarizing an academic paper.
First and foremost, consider what kind of slides would make it easy and concise to understand the content of the paper.
Next, please create the slides by using Reveal.js in Japanese from the given information [Slide Information].\
 If a reference to the image is included in [Slide Information], please use the image file path to embed it in the slide.
Utilize the characteristics of the slides and aim for a visually appealing presentation.\
 Instead of just plain explanatory text, for example, use bullet points or similar formatting to enhance readability.
Please complete the slides without any omissions or shortcuts, ensuring they are fully finished.

The HTML is organized by basic HTML and Reveal.js component. The template is provided in [template HTML]

[template HTML]
{html}

[Slide Information]
{content}

**Ensure to include ALL explanations and contents of tables without any omissions, and write everything in full detail.**

[template HTML] {html} の部分には下記を入れています。

<!DOCTYPE html>
<html>
<head>
    <title>論文解説スライド</title>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/reveal.js/4.1.0/reveal.min.css" />
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/reveal.js/4.1.0/theme/black.min.css" />
    <style>
        .reveal {
            font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
            font-size: 20px;
        }
        .reveal h1 {
            font-size: 1.8em;
        }
        .reveal h1, .reveal h2, .reveal h3, .reveal h4, .reveal ul {
            text-align: left;
            display: block;
        }
    </style>
</head>
<body>
    <div class="reveal">
        <div class="slides">
        # ここに作成する日本語スライドが入ります
        # ここ以外も必要に応じて編集してください
        </div>
    </div>

    <script src="https://cdnjs.cloudflare.com/ajax/libs/reveal.js/4.1.0/reveal.min.js"></script>
    <script>
        Reveal.initialize();
    </script>
</body>
</html>

工夫点

明示的に CoT をさせるため、ステップの中で複数回に分けて GPT-4 にリクエストを送る

最初に人間が行う場合のフローをトレースして、各ステップに分割して GPT-4 を使うようにしました。これにより、うまく前段の内容を引き継ぎつつ、内容が薄くならないような生成ができています。

Role-Play Prompting を先頭に仕込む

Better Zero-Shot Reasoning with Role-Play Prompting *4 で提案されている Role-Play Prompting （のようなもの）を各プロンプトに入れています。 Role-Play Prompting は一言でいうと、ロールプレイ対話を通じて LLM に特定の役割を与え、その役割に基づいて質問に答えるように誘導した上で対話を行うことで性能が向上すると報告されている手法です。提案手法内では Role-Play 用に1ターンの会話をしたあとに目的のやりとりをしていますが、今回は面倒だったのでプロンプトの頭に足す形で試しました。

試してみたものの大きな差は感じませんでしたが、生成の質は安定した気がします。

STEP-BACK PROMPTING として途中にメタ質問を加える

Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models *5 で提案されている STEP-BACK PROMPTING も、各プロンプト内に入れています。 STEP-BACK PROMPTING は、本来行いたい推論の前段で、抽象化のステップとして、LLM に高次の概念や原則について一般的な質問をし（例: 力学の計算問題を解かせる前に、推論に必要な前提となる方程式について説明させる）、その後元の推論を行うことで性能を向上する手法です。これも Role-Play Prompting と同様に本来は別の対話として1ステップ挟むものですが、本記事では、次のようにプロンプト内に組み込んで使用しています。

例1: 論文要約時

1. (Now) Read and understand the content of the academic paper, and prepare it for summarization.\
 Consider what kind of knowledge is required to understand and explain the contents of the paper concisely in slide layout.

例2: スライド作成時

First and foremost, consider what kind of slides would make it easy and concise to understand the content of the paper.

作成した結果

Zephyr: Direct Distillation of LM Alignment の論文をスライド風にまとめてみた結果を2パターン載せます。スライド内の内容・画像は論文内から引用しています。

パターン1

パターン2

結果を見ると分かりますが、スライドデザイン内にコンテンツを収めるのが難しく、画像が入ると多くの場合見切れてしまいます。内容も極端に悪いわけではないですが、安定して読みやすいかと言われると難しく、かつステップを複数通っているのもあり生成の度にクオリティにある程度差が出てしまいます。

一方で、テキストによる要約では見づらい図や表についてスライド内でうまく参照できており、スライドというインターフェースを活かせています。すぐに活用とはいきませんが、一定の価値は感じられる結果となりました。

見づらい部分は CSS, Reveal.js 側を調整することによりある程度見やすく改善できそうです。また、レイアウトに関しても今回はゼロから GPT-4 に作らせたものの、別でテンプレートを用意して選ぶ形式にすることで統一させることはできました。

その他のトライ

上手く行かなかった取り組みを供養しておきます。

ステップ1~5を一括でのスライド作成
- 想像通りではありますが、ほとんどのケースで内容が減ってスカスカになってしまい、うまくいきませんでした。明示的 CoT はまだまだ必要そうです。
スライドの内容作成時に4枚分を1回のリクエストで作成
- 上に同じで、内容が減ってしまいました。表だと最初の1,2項目しか生成してくれない、のようなものも。
Google Slides API でのスライド作成
- GPT-4 は Reveal.js の方が得意そうでした。
- Google Slides API で実際にスライドリクエストを作らせても、該当のオブジェクトがないなどで失敗することが多く、手動での修正が必要でした。

さいごに

本記事では、GPT-4 を用いて論文やテクニカルレポートのスライド風まとめの作成にチャレンジしました。結果として、あまりスライド風デザインの長所（自由なレイアウトや簡潔な説明など）は活かせているか怪しいものになってしまいましたが、チューニングすることで改善すれば使えそうだと考えています。今後、マルチモーダルな生成が一般化していく中で、単なるテキスト要約ではなく、視覚的にも分かりやすく物事をまとめる必要性は上がっていくと考えています。本記事は arxiv にあるような PDF 形式の論文を対象にしましたが、このようなスライド風まとめは広く需要があるものだと思います。今回の取り組みが何かに活きれば幸いです。

We Are Hiring!

ABEJAは、テクノロジーの社会実装に取り組んでいます。技術はもちろん、技術をどのようにして社会やビジネスに組み込んでいくかを考えるのが好きな方は、下記採用ページからエントリーください！（新卒の方のエントリーもお待ちしております）

careers.abejainc.com

*1: StableLM Zephyr 3B のご紹介：StableLMに新たな機能を追加し、エッジデバイスに強力な LLM アシスタントを提供 — Stability AI Japan

*2:[2310.16944] Zephyr: Direct Distillation of LM Alignment

*3:The HTML presentation framework | reveal.js

*4:[2308.07702] Better Zero-Shot Reasoning with Role-Play Prompting

*5:[2310.06117] Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models