How to print pages without truncation using pdftools in R

I read the following PDF file into R using:

library(pdftools)
download.file("http://www.nba.com/data/html/nbacom/2017/gameinfo/20180117/0021700653_Book.pdf", "mydf", mode = "wb")
txt <- pdf_text("mydf")

Now, how can I read my text object?

> str(txt)
 chr [1:18] "NATIONAL BASKETBALL ASSOCIATION                                                                                "| __truncated__ ...

If I simply type its name, R truncates the output.

> txt
 [1] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                                      FINAL BOX\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                                Game Duration: 2:14\r\n                                                                                                                                                 Attendance: 11528\r\nVISITOR: Washington Wizards (25-20)\r\n                                                    POS       MIN         FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                                 TO  BS    +/- PTS\r\n22  O... <truncated>
 [2] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                           1st QUARTER ONLY\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                              Period Duration: 0:25\r\n                                                                                                                                                Attendance: 11528\r\nVISITOR: Washington Wizards\r\n                                                    POS      MIN          FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                                TO   BS    +/- PTS\r\n22  Otto Porter Jr... <truncated>
 [3] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                         2nd QUARTER ONLY\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                             Period Duration: 0:28\r\n                                                                                                                                               Attendance: 11528\r\nVISITOR: Washington Wizards\r\n                                                    POS      MIN          FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                               TO   BS +/-     PTS\r\n22  Otto Porter Jr.   ... <truncated>
 [4] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                                     FIRST HALF\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                              Period Duration: 0:56\r\n                                                                                                                                                Attendance: 11528\r\nVISITOR: Washington Wizards\r\n                                                    POS       MIN         FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                                TO   BS +/-    PTS\r\n22  Otto Porte... <truncated>
 [5] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                         3rd QUARTER ONLY\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                             Period Duration: 0:26\r\n                                                                                                                                               Attendance: 11528\r\nVISITOR: Washington Wizards\r\n                                                    POS      MIN          FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                               TO   BS +/-    PTS\r\n22  Otto Porter Jr.    ... <truncated>
 [6] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                 1st QUARTER - 3rd QUARTER\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                              Period Duration: 1:38\r\n                                                                                                                                                Attendance: 11528\r\nVISITOR: Washington Wizards\r\n                                                    POS       MIN         FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                                TO   BS    +/- PTS\r\n22  Otto Porter Jr.... <truncated>
 [7] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                          4th QUARTER ONLY\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                             Period Duration: 0:32\r\n                                                                                                                                               Attendance: 11528\r\nVISITOR: Washington Wizards\r\n                                                    POS      MIN          FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                               TO   BS    +/- PTS\r\n22  Otto Porter Jr.   ... <truncated>
 [8] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                                 SECOND HALF\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                              Period Duration: 1:02\r\n                                                                                                                                                Attendance: 11528\r\nVISITOR: Washington Wizards\r\n                                                    POS       MIN         FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                                TO   BS +/-    PTS\r\n22  Otto Porter J... <truncated>
 [9] "NATIONAL BASKETBALL ASSOCIATION                                                                                      OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                   1st QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: M.Kidd-Gilchrist M.Williams D.Howard N.Batum K.Walker\r\nWizards Starters: O.PorterJr. MarkMorris M.Gortat B.Beal J.Wall\r\nTime    HORNETS                                                     Score           Lead           Wizards\r\n12:00                                                           Start of Period (7:10 PM)\r\n12:00                                      JUMP BALL D.Howard VS. M.Gortat: TIP TO J.Wall\r\n11:42                                                                                              MISS J.Wall 18' Pullup Shot\r\n11:38   N.Batum REBOUND\r\n11:22 ... <truncated>
[10] "NATIONAL BASKETBALL ASSOCIATION                                                                                     OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                    1st QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: M.Kidd-Gilchrist M.Williams D.Howard N.Batum K.Walker\r\nWizards Starters: O.PorterJr. MarkMorris M.Gortat B.Beal J.Wall\r\nTime    HORNETS                                                     Score           Lead           Wizards\r\n04:01                                                               30-24           +6             B.Beal 25' 3PT FB Jump Shot\r\n03:46                                                                                              I.Mahinmi P.FOUL (P1, T3) (D.Collins)\r\n03:46   SUB: F.Kaminsky FOR M.Williams\r\n03:33   MISS D.Howard Layup          ... <truncated>
[11] "NATIONAL BASKETBALL ASSOCIATION                                                                                     OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                  2nd QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: J.Lamb M.Carter-Williams F.Kaminsky J.O'BryantIII T.Graham\r\nWizards Starters: K.OubreJr. I.Mahinmi M.Scott T.Satoransky J.Meeks\r\nTime    HORNETS                                                    Score           Lead           Wizards\r\n12:00                                                          Start of Period (7:38 PM)\r\n11:40   F.Kaminsky 12' Jump Shot                                   40-36           +4\r\n11:16                                                              40-39           +1             M.Scott 26' 3PT Jump Shot (T.Satoransky)\r\n11:04   F.Kaminsky 1... <truncated>
[12] "NATIONAL BASKETBALL ASSOCIATION                                                                                     OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                 2nd QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: J.Lamb M.Carter-Williams F.Kaminsky J.O'BryantIII T.Graham\r\nWizards Starters: K.OubreJr. I.Mahinmi M.Scott T.Satoransky J.Meeks\r\nTime    HORNETS                                                    Score           Lead           Wizards\r\n05:23   D.Howard REBOUND\r\n05:23                                                                                             K.OubreJr. S.FOUL (P2, T4) (Z.Zarba)\r\n05:23   D.Howard Free Throw 1 of 2                                 58-50           +8\r\n05:23   MISS D.Howard Free Throw 2 of 2\r\n05:20                                         ... <truncated>
[13] "BIG HOME LEAD 16                                                                              *LEAD CHANGES 0\r\nBIG VISITOR LEAD 0                                                                                TIMES TIED 0\r\n    1 FOR 0 PTS                            TURNOVERS                                              4 FOR 8 PTS\r\n  15/24 FOR 62.5%                          FIELD GOALS                                           10/20 FOR 50%\r\n   5/8 FOR 62.5%                          FREE THROWS                                             2/5 FOR 40%\r\n   OFF: 3 DEF: 10                           REBOUNDS                                              OFF: 3 DEF: 9\r\n     K.Walker: 9                          HIGH SCORER                                            B.Beal, J.Wall: 6\r\n M.Kidd-Gilchrist: 3                  HIGH REBOUNDER                                   O.PorterJr., M.Gortat, K.OubreJr.: 2\r\nM.Carter-Williams: 3                      HIGH ASSISTS             ... <truncated>
[14] "NATIONAL BASKETBALL ASSOCIATION                                                                                      OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                  3rd QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: M.Williams N.Batum D.Howard M.Kidd-Gilchrist K.Walker\r\nWizards Starters: J.Wall B.Beal M.Gortat O.PorterJr. MarkMorris\r\nTime    HORNETS                                                     Score           Lead           Wizards\r\n12:00                                                           Start of Period (8:22 PM)\r\n11:44   MISS D.Howard 9' Hook\r\n11:41                                                                                              MarkMorris REBOUND\r\n11:30                                                                                              MISS... <truncated>
[15] "NATIONAL BASKETBALL ASSOCIATION                                                                                      OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                    3rd QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: M.Williams N.Batum D.Howard M.Kidd-Gilchrist K.Walker\r\nWizards Starters: J.Wall B.Beal M.Gortat O.PorterJr. MarkMorris\r\nTime    HORNETS                                                     Score           Lead           Wizards\r\n04:50   MISS K.Walker Driving Layup\r\n04:46   K.Walker REBOUND\r\n04:43   M.Kidd-Gilchrist Driving Layup (K.Walker)                   97-72           +25\r\n04:30                                                                                              MISS MarkMorris 26' 3PT Jump Shot\r\n04:28   HORNETS REBOUND\r\n04:28                      ... <truncated>
[16] "NATIONAL BASKETBALL ASSOCIATION                                                                                     OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                  4th QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: J.Lamb T.Graham F.Kaminsky J.O'BryantIII M.Carter-Williams\r\nWizards Starters: K.OubreJr. J.Smith J.Meeks M.Scott T.Satoransky\r\nTime    HORNETS                                                    Score           Lead           Wizards\r\n12:00                                                          Start of Period (8:51 PM)\r\n11:52   T.Graham P.FOUL (P1, T1) (D.Collins)\r\n11:44                                                              102-81          +21            K.OubreJr. 20' Jump Shot (J.Smith)\r\n11:30   J.O'BryantIII Offensive (P1) (C.Washington)\r\n11:30   J.O'Br... <truncated>
[17] "NATIONAL BASKETBALL ASSOCIATION                                                                                      OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                    4th QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: J.Lamb T.Graham F.Kaminsky J.O'BryantIII M.Carter-Williams\r\nWizards Starters: K.OubreJr. J.Smith J.Meeks M.Scott T.Satoransky\r\nTime    HORNETS                                                     Score           Lead           Wizards\r\n05:15                                                                                              MISS T.Frazier Free Throw 2 of 2\r\n05:08                                                                                              J.Smith REBOUND\r\n05:05                                                               119-99          +20 ... <truncated>
[18] "NATIONAL BASKETBALL ASSOCIATION                                                                                    OFFICIAL PLAY-BY-PLAY\r\n                                                                                                                                  4th QUARTER\r\nWashington Wizards at CHARLOTTE HORNETS\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nHORNETS Starters: J.Lamb T.Graham F.Kaminsky J.O'BryantIII M.Carter-Williams\r\nWizards Starters: K.OubreJr. J.Smith J.Meeks M.Scott T.Satoransky\r\nTime     HORNETS                                                   Score           Lead           Wizards\r\n:45.4    J.O'BryantIII REBOUND\r\n:37.0    M.Monk 27' 3PT Jump Shot                                  133-107         +26\r\n:26.5                                                              133-109         +24            J.Smith 19' Jump Shot (T.Satoransky)\r\n:01.6    HORNETS Shot Clock TURNOVER #10\r\n                                        ... <truncated>

Even if I try to print just a single page, it is still truncated.

> txt[1]
[1] "NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT\r\n                                                                                                                                                      FINAL BOX\r\nWednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\nOfficials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n                                                                                                                                                Game Duration: 2:14\r\n                                                                                                                                                 Attendance: 11528\r\nVISITOR: Washington Wizards (25-20)\r\n                                                    POS       MIN         FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                                 TO  BS    +/- PTS\r\n22  Ot... <truncated>

Following some other advice I received on this forum, I also tried:

> cat(txt[1])
NATIONAL BASKETBALL ASSOCIATION                                                                                                  OFFICIAL SCORER'S REPORT
                                                                                                                                                      FINAL BOX
Wednesday, January 17, 2018 Spectrum Center, Charlotte, NC
Officials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington
                                                                                                                                                Game Duration: 2:14
                                                                                                                                                 Attendance: 11528
VISITOR: Washington Wizards (25-20)
                                                    POS       MIN         FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                                 TO  BS    +/- PTS
22  Otto Porter Jr.                                   F      22:45         2      6      1       3        1       2       0    2    2  2 0 0      0   0   -24   6
 5  Markieff Morris                                   F      15:26         1      5      0       2        0       0       0    5    5  1 4 0      1   0   -10   2
13  Marcin Gortat                                     C      19:42         0      3      0       0        0       2       3    5    8  1 2 2      0   0   -23   0
 3  Bradley Beal                                      G      27:43        10 19          4       6        2       2       1    2    3  2 1 0      5   1   -14   26
 2  John Wall                                         G      24:20         5     11      2       2        0       0       0    2    2  9 2 0      3   2   -20   12
30  Mike Scott                                               25:13 7             10      2       2        2       2       0    2    2  3  3   1 0     0    -8   18
12  Kelly Oubre Jr.                                          26:32 5              9      3       5        3       4       0    5    5  0  4   0 3     0     3   16
28  Ian Mahinmi                                              08:47 1              2      0       0        2       2       0    1    1  2  1   0 0     1    -1   4
31  Tomas Satoransky                                         25:49 2              3      0       0        2       2       1    2    3  7  1   0 1     0   -13   6
20  Jodie Meeks                                              20:17 2              3      1       2        2       3       1    4    5  0  0   0 2     0   -10   7
14  Jason Smith                                              13:50 4              8      0       0        2       2       1    2    3  3  5   0 1     2     0   10
 1  Chris McCullough                                         06:48 1              3      0       1        0       0       0    1    1  0  0   0 0     1    -1   2
 8  Tim Frazier                                              02:48 0              0      0       0        0       2       0    0    0  1  0   0 0     0     1   0
                                                            240:00 40            82     13      23       16      23       7   33   40 31 23   3 16    7   -24  109
                                                                           48.8%         56.5%             69.6%           TM REB: 7     TOT TO: 16 (20 PTS)
HOME: CHARLOTTE HORNETS (18-25)
                                                    POS       MIN         FG FGA 3P 3PA FT FTA OR DR TOT A PF ST                                 TO  BS    +/- PTS
14  Michael Kidd-Gilchrist                            F      22:59         8     11      0       0        5       6       0    4    4  2 1 3      0   0    26   21
 2  Marvin Williams                                   F      21:38         4      7      3       4        1       1       1    2    3  1 0 0      1   0    25   12
12  Dwight Howard                                     C      28:54         7     13      0       0        4       5       3   12 15 2 2 2         2   2    17   18
 5  Nicolas Batum                                     G      26:00         4      8      2       4        1       2       0    3    3  4 1 1      2   0    17   11
15  Kemba Walker                                      G      28:48         6     15      4       8        3       3       1    2    3  7 1 0      1   1    14   19
 3  Jeremy Lamb                                              21:02 7              9      2       2        0       0       3    0    3  0  4   0   0   1    -4   16
44  Frank Kaminsky                                           23:36 6             14      1       4        1       1       0    2    2  2  1   1   0   0    -5   14
10  Michael Carter-Williams                                  15:12 0              2      0       1        3       4       0    3    3  5  2   0   0   1     8   3
 8  Johnny O'Bryant III                                      19:07 2              6      1       2        2       2       3    3    6  1  1   1   2   0     7   7
21  Treveon Graham                                           22:00 3              6      1       2        2       2       2    2    4  1  4   1   1   0     7   9
 1  Malik Monk                                               03:59 1              5      1       3        0       0       0    1    1  1  0   1   0   0     2   3
 7  Dwayne Bacon                                             03:59 0              2      0       1        0       0       0    0    0  0  0   0   0   0     2   0
32  Julyan Stone                                             02:46 0              0      0       0        0       0       0    2    2  1  1   0   0   0     4   0
                                                            240:00 48            98     15      31       22      26      13   36   49 27 18  10   9   5    24  133
                                                                             49%         48.4%             84.6%           TM REB: 7     TOT TO: 10 (15 PTS)
SCORE BY PERIOD 1                    2       3       4     FINAL
               Wizards 36           25      18      30        109
            HORNETS 38              39      25      31        133
Inactive: Wizards - Mac (Injury/Illness - left achilles surgery), Robinson (G League Team - two-way player)
Inactive: Hornets - Mathiang, Paige (G League Team - two-way player), Zeller (Injury/Illness - left knee surgery)
Points in the Paint: Wizards 30 (15/27), HORNETS 50 (25/48)                                Biggest Lead: Wizards 2, HORNETS 28
2nd Chance Points: Wizards 9 (4/7), HORNETS 21 (6/12)                                      Lead Changes: 2
Fast Break Points: Wizards 16 (6/8), HORNETS 10 (5/8)                                      Times Tied: 5
Technical fouls - Individual
Wizards (3): Wall 4:16 1st , Brooks 6:49 2nd , Frazier 4:00 4th
HORNETS (2): Kidd-Gilchrist 3:08 2nd , Carter-Williams 4:00 4th
Technical fouls - Defensive Three Seconds
Wizards (0) : NONE
HORNETS (1) : Howard 2:27 1st
Ejections
Wizards (1): Frazier 4:00 4th
HORNETS (1): Carter-Williams 4:00 4th
MEMO: Ejected for excessive communication and contact during stoppage in play.
MEMO: Ejected for excessive communication and contact during stoppage in play.
                                                         Copyright (c ) 2017-2018 NBA Properties, INC. All Rights Reserved

This works well for printing the whole page without truncation; however, this output is of no use for the next steps I need to take. cat() renders the \r and \n characters as real line breaks instead of showing them, and I need to see them in order to parse the text. For example, I would like to extract the officials' names from the first page. The following works fine:

> library(stringr)
> officials <- str_sub(str_extract(txt[1], "(?<=\\b\r\nOfficials).+?.(\\b\r\n)"), start = 3L, end = -3L)
> officials
[1] "#15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington"

However, if I had seen only the output of cat(txt[1]), I would not have known to include \r\n in "(?<=\\b\r\nOfficials).+?.(\\b\r\n)".
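As a side note, a pattern does not necessarily have to spell out the line endings. The sketch below is only an illustration under assumptions: `page` is a short hypothetical stand-in for txt[1], and it uses base R's regexpr() with perl = TRUE rather than stringr. It anchors on the "Officials: " label and stops at the first carriage return or newline.

```r
# Hypothetical stand-in for txt[1]; the real page is much longer.
page <- paste0(
  "Wednesday, January 17, 2018 Spectrum Center, Charlotte, NC\r\n",
  "Officials: #15 Zach Zarba, #11 Derrick Collins, #12 CJ Washington\r\n",
  "VISITOR: Washington Wizards (25-20)\r\n"
)

# Look behind for the label, then take everything up to the first CR or LF.
pattern <- "(?<=Officials: )[^\r\n]+"
officials <- regmatches(page, regexpr(pattern, page, perl = TRUE))
officials
```

Because the character class excludes both \r and \n, the same pattern should behave the same way whether the text uses \r\n or plain \n line endings.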

How can I print a single page in full, in its raw escaped form?


Asked by DataProphets on 23.01.2018


Answers (1)


I figured it out. It was an RStudio issue. I had to go to Tools > Global Options > Code > Display and set "Limit length of lines displayed in console to:" to 0. RStudio now prints my entire text string. I found this solution here:

Avoid string printed to console getting truncated (in RStudio)

in case anyone else runs into the same problem.
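For anyone who would rather not change the IDE setting, here is a rough sketch of a workaround using only base R. The helper name show_escaped and the sample string `page` are hypothetical, introduced just for illustration. It escapes the string with encodeString(), so \r and \n show up as visible escapes, and then writes it out in short fixed-width chunks, so each printed line should stay below a console line-length limit.

```r
# Hypothetical helper: print one string in escaped form, `width` characters per line.
show_escaped <- function(x, width = 80L) {
  # encodeString() escapes control characters the same way print() does
  escaped <- encodeString(x, quote = '"')
  # Start positions of consecutive chunks of at most `width` characters
  starts <- seq(1L, nchar(escaped), by = width)
  chunks <- substring(escaped, starts, starts + width - 1L)
  writeLines(chunks)
  invisible(chunks)
}

# Hypothetical stand-in for txt[1]
page <- "NATIONAL BASKETBALL ASSOCIATION\r\nOfficials: #15 Zach Zarba\r\nAttendance: 11528\r\n"
chunks <- show_escaped(page, width = 20L)
```

With the real data the call would be show_escaped(txt[1]). A chunk boundary can fall in the middle of an escape such as \r, so the output has to be read across line ends.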

Answered by DataProphets on 24.01.2018