How can I get the score from a question-answering pipeline? Is there a bug in the question-answering pipeline?

When I run the following code:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
model = AutoModelForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")

text = r"""
As checked Dis is not yet on boarded to ARB portal, hence we cannot upload the invoices in portal
"""

questions = [
    "Dis asked if it is possible to post the two invoice in ARB.I have not access so I wanted to check if you would be able to do it.",
]

for question in questions:
    inputs = tokenizer.encode_plus(question, text, add_special_tokens=True, return_tensors="pt")
    input_ids = inputs["input_ids"].tolist()[0]

    text_tokens = tokenizer.convert_ids_to_tokens(input_ids)
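    # Index the output so this works whether the model returns a tuple or a ModelOutput
    outputs = model(**inputs)
    answer_start_scores, answer_end_scores = outputs[0], outputs[1]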

    answer_start = torch.argmax(
        answer_start_scores
    )  # Get the most likely beginning of answer with the argmax of the score
    answer_end = torch.argmax(answer_end_scores) + 1  # Get the most likely end of answer with the argmax of the score

    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end]))

    print(f"Question: {question}")
    print(f"Answer: {answer}\n")

This is what I get:

Question: Dis asked if it is possible to post the two invoice in ARB.I have not access so I wanted to check if you would be able to do it.
Answer: dis is not yet on boarded to ARB portal

How do I get a score for this answer? The score I am after is the same kind of score I get when I run the question-answering pipeline.

I have to use this approach because the question-answering pipeline raises a KeyError for the code below.

from transformers import pipeline

nlp = pipeline("question-answering")

context = r"""
As checked Dis is not yet on boarded to ARB portal, hence we cannot upload the invoices in portal.
"""

print(nlp(question="Dis asked if it is possible to post the two invoice in ARB?", context=context))

huggingface-transformers

Saumyadip 22.08.2020

The score is calculated by the decode method. - cronoik 25.08.2020

Thanks, cronoik. Much appreciated. However, I think both pieces of code should do the same thing. So why does the question-answering pipeline fail while the first snippet gives me a result? - Saumyadip 26.08.2020

Hi cronoik, could you also show me how to use the decode method? I followed the link you gave, but I cannot work out how to use it in my context. Looking forward to your suggestions. - Saumyadip 26.08.2020

The two snippets will produce the same result when they actually do the same thing. Currently the pipeline loads DistilBertForQuestionAnswering, not bert-large-uncased-whole-word-masking-finetuned-squad. You can change that by specifying the model parameter: nlp = pipeline("question-answering", model='bert-large-uncased-whole-word-masking-finetuned-squad'). That also returns a result for your example. Keep in mind as well that the pipeline is more complex than your code and covers more cases in pre- and post-processing. So there will still be examples that give 1/2 - cronoik 26.08.2020

different outputs. As for the _decode method, I do not have time right now to explain what happens there. If you can wait, I will come back to your question in two weeks. If not, feel free to post an answer yourself 2/2 - cronoik 26.08.2020

Hi cronoik, I really appreciate your time, and thank you for your reply. I will certainly wait to hear from you. - Saumyadip 29.08.2020

So much is going on in there. - Prayson W. Daniel 11.12.2020

Answers (1)


This is my attempt at getting the score. I cannot figure out what feature.p_mask is, so for now I could not remove the non-context indexes that contribute to the softmax.

import numpy as np
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# ... assuming question and context are already defined

model_name="deepset/roberta-base-squad2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

inputs = tokenizer(question, context, 
                       add_special_tokens=True, 
                       return_tensors='pt')
input_ids = inputs['input_ids'].tolist()[0]

outputs = model(**inputs)
    

# used to compute score
start = outputs.start_logits.detach().numpy()
end = outputs.end_logits.detach().numpy()

# from source code

# Ensure padded tokens & question tokens cannot belong to the set of candidate answers.
#?? undesired_tokens = np.abs(np.array(feature.p_mask) - 1) & feature.attention_mask

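# Generate mask (only padding positions are masked; question tokens are not,
# because feature.p_mask is not available here)

undesired_tokens = inputs['attention_mask'].numpy()
undesired_tokens_mask = undesired_tokens == 0.0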

# Make sure non-context indexes in the tensor cannot contribute to the softmax
start_ = np.where(undesired_tokens_mask, -10000.0, start)
end_ = np.where(undesired_tokens_mask, -10000.0, end)

# Normalize logits and spans to retrieve the answer
start_ = np.exp(start_ - np.log(np.sum(np.exp(start_), axis=-1, keepdims=True)))
end_ = np.exp(end_ - np.log(np.sum(np.exp(end_), axis=-1, keepdims=True)))

# Compute the score of each tuple(start, end) to be the real answer
outer = np.matmul(np.expand_dims(start_, -1), np.expand_dims(end_, 1))

# Remove candidate with end < start and end - start > max_answer_len
max_answer_len = 15
candidates = np.tril(np.triu(outer), max_answer_len - 1)
scores_flat = candidates.flatten()

idx_sort = [np.argmax(scores_flat)]
start, end = np.unravel_index(idx_sort, candidates.shape)[1:]
end += 1
score = candidates[0, start, end-1]
start, end, score = start.item(), end.item(), score.item()


print(tokenizer.decode(input_ids[start:end]))
print(score)
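
The step the code above leaves out is masking the question and special tokens before the softmax. Here is a minimal sketch of that idea in plain Python, so it can be read without the model. The helper names softmax and best_span and the toy logits are hypothetical, not taken from the transformers source. I am assuming you can build context_mask yourself; with a fast tokenizer, inputs.sequence_ids(0) should mark context tokens with 1, but check that against your tokenizer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(start_logits, end_logits, context_mask, max_answer_len=15):
    """Return (start, end, score) for the best span; end is inclusive.

    context_mask[i] is True for context tokens and False for question,
    special, and padding tokens (hypothetical input, built by the caller).
    """
    # Push non-context positions to a very low logit before the softmax,
    # the same -10000.0 trick the code above applies to padding.
    start = [s if keep else -10000.0 for s, keep in zip(start_logits, context_mask)]
    end = [e if keep else -10000.0 for e, keep in zip(end_logits, context_mask)]
    p_start, p_end = softmax(start), softmax(end)

    best = (0, 0, 0.0)
    for i, ps in enumerate(p_start):
        # Only consider spans with start <= end and at most max_answer_len tokens.
        for j in range(i, min(i + max_answer_len, len(p_end))):
            score = ps * p_end[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Toy logits for six tokens: the first four are non-context, the last two are context.
start_logits = [5.0, 0.1, 0.2, 0.0, 2.0, 0.5]
end_logits = [5.0, 0.1, 0.3, 0.0, 0.4, 3.0]
context_mask = [False, False, False, False, True, True]

start, end, score = best_span(start_logits, end_logits, context_mask)
print(start, end, score)
```

Without the mask, the first token would win in this toy example because its logits are the largest, which is why the non-context positions have to be excluded before normalizing. The score is the product of the start and end probabilities, so it is only comparable to the pipeline's score if the same positions are masked.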

See the source code for more details.

Prayson W. Daniel 11.12.2020

Hi Prayson, nice implementation. Is it possible to get the score if you use the default return type (list) instead of pt? (A list is faster.) Thanks! - jokol 17.12.2020
